from:"Chris Male"


[ 
https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13914954#comment-13914954
 ] 

Chris Male commented on LUCENE-5468:


Those are some pretty amazing reductions, well done!

 Hunspell very high memory use when loading dictionary
 -

 Key: LUCENE-5468
 URL: https://issues.apache.org/jira/browse/LUCENE-5468
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Maciej Lisiewski
Priority: Minor
 Attachments: patch.txt


 Hunspell stemmer requires gigantic (for the task) amounts of memory to load 
 dictionary/rules files. 
 For example, loading a 4.5 MB Polish dictionary (with an empty index!) will cause 
 the whole core to crash with various out-of-memory errors unless you set the max 
 heap size close to 2 GB or more.
 By comparison, Stempel using the same dictionary file works just fine with 1/8 
 of that (and possibly lower values as well).
 Sample error log entries:
 http://pastebin.com/fSrdd5W1
 http://pastebin.com/Lmi0re7Z



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary


[ 
https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915045#comment-13915045
 ] 

Chris Male commented on LUCENE-5468:


Is the longestOnly option a standard Hunspell thing? (more a question of 
general interest)

 Hunspell very high memory use when loading dictionary
 -

 Key: LUCENE-5468
 URL: https://issues.apache.org/jira/browse/LUCENE-5468
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Maciej Lisiewski
Priority: Minor
 Attachments: LUCENE-5468.patch, patch.txt



[jira] [Commented] (LUCENE-5376) Add a demo search server


[ 
https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915047#comment-13915047
 ] 

Chris Male commented on LUCENE-5376:


Hey Mike,

What's the endzone here? Any thoughts on it coming back into trunk?

 Add a demo search server
 

 Key: LUCENE-5376
 URL: https://issues.apache.org/jira/browse/LUCENE-5376
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: lucene-demo-server.tgz


 I think it'd be useful to have a demo search server for Lucene.
 Rather than being fully featured, like Solr, it would be minimal, just 
 wrapping the existing Lucene modules to show how you can make use of these 
 features in a server setting.
 The purpose is to demonstrate how one can build a minimal search server on 
 top of APIs like SearcherManager, SearcherLifetimeManager, etc.
 This is also useful for finding rough edges / issues in Lucene's APIs that 
 make building a server unnecessarily hard.
 I don't think it should have back compatibility promises (except Lucene's 
 index back compatibility), so it's free to improve as Lucene's APIs change.
 As a starting point, I'll post what I built for the "eating your own dog 
 food" search app for Lucene's & Solr's jira issues 
 http://jirasearch.mikemccandless.com (blog: 
 http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It 
 uses Netty to expose basic indexing & searching APIs via JSON, but it's very 
 rough (lots of nocommits).




[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary


[ 
https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915053#comment-13915053
 ] 

Chris Male commented on LUCENE-5468:


Awesome, sounds like a great addition then.


[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary


[ 
https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915231#comment-13915231
 ] 

Chris Male commented on LUCENE-5468:


I don't think we should make the recursionCap any more complex.  I put it in 
there simply to prevent languages from getting into infinite loops.
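
To illustrate the point about infinite loops, here is a minimal sketch of how
a depth cap bounds recursive rule application. It is a toy, not Lucene's
HunspellStemmer: the class name, the two suffix rules and the candidates()
method are all assumptions made up for this example. The rules deliberately
undo each other, so without the cap the recursion would never end.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; not Lucene's HunspellStemmer. */
final class RecursionCapSketch {
  /** Hypothetical suffix rules: each entry is {suffixToStrip, replacement}. */
  private static final String[][] RULES = { {"a", "b"}, {"b", "a"} };

  private final int recursionCap;

  RecursionCapSketch(int recursionCap) {
    this.recursionCap = recursionCap;
  }

  /** Returns every candidate produced by the rules, recursing at most recursionCap levels. */
  List<String> candidates(String word) {
    List<String> out = new ArrayList<>();
    apply(word, 0, out);
    return out;
  }

  private void apply(String word, int depth, List<String> out) {
    for (String[] rule : RULES) {
      if (word.endsWith(rule[0])) {
        String next = word.substring(0, word.length() - rule[0].length()) + rule[1];
        out.add(next);
        // Without this check the two rules would rewrite each other's output forever.
        if (depth < recursionCap) {
          apply(next, depth + 1, out);
        }
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(new RecursionCapSketch(2).candidates("xa")); // prints [xb, xa, xb]
  }
}
```

A cap of 0 yields a single candidate here, and each extra level adds one more,
which is why a plain integer is enough to guarantee termination.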

 Hunspell very high memory use when loading dictionary
 -

 Key: LUCENE-5468
 URL: https://issues.apache.org/jira/browse/LUCENE-5468
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Maciej Lisiewski
Priority: Minor
 Fix For: 4.8, 5.0

 Attachments: LUCENE-5468.patch, patch.txt



[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary


[ 
https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915240#comment-13915240
 ] 

Chris Male commented on LUCENE-5468:


Yeah I guess.  We can go over that in a new issue.


[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary

2014-02-23 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13909943#comment-13909943
 ] 

Chris Male commented on LUCENE-5468:


Multiple dictionaries were never in the original design either.  Having an 
efficient and usable design seems to be of higher priority so +1 to not forking 
and doing this in place.


[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary

2014-02-23 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13909952#comment-13909952
 ] 

Chris Male commented on LUCENE-5468:


Sounds good


[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SearcherManager

2014-01-09 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867291#comment-13867291
 ] 

Chris Male commented on SOLR-5621:
--

+1 for trunk

 Let Solr use Lucene's SearcherManager
 

 Key: SOLR-5621
 URL: https://issues.apache.org/jira/browse/SOLR-5621
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.0
Reporter: Tomás Fernández Löbbe
 Fix For: 5.0

 Attachments: SOLR-5621.patch


 It would be nice if Solr could take advantage of Lucene's SearcherManager and 
 get rid of most of the logic related to managing Searchers in SolrCore. 
 I've been taking a look at how possible it is to achieve this, and even though I 
 haven't finished the changes (there are some use cases that are still not 
 working exactly the same), it looks like it is possible to do. Some things 
 still could use a lot of improvement (like the realtime searcher management), 
 and some others are not yet implemented, like Searchers on deck or 
 IndexReaderFactory.
 I'm attaching an initial patch (many TODOs yet). 




[jira] [Commented] (LUCENE-5057) Hunspell stemmer generates multiple tokens

2013-09-05 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13759514#comment-13759514
]

Chris Male commented on LUCENE-5057:

The example you describe is sort of at the heart of the Hunspell algorithm and
outputting those three different tokens is one of its major advantages. When
we're doing analysis we don't know which of those different meanings the user
intended, so we're providing them all as options. I don't see that as
something negative about Hunspell, in fact quite the opposite.

Hunspell stemmer generates multiple tokens
--

Key: LUCENE-5057
URL: https://issues.apache.org/jira/browse/LUCENE-5057
Project: Lucene - Core
Issue Type: Improvement
Affects Versions: 4.3
Reporter: Luca Cavanna
Assignee: Adrien Grand

The hunspell stemmer seems to be generating multiple tokens: the original
token plus the available stems.
It might be a good thing in some cases but it seems to be a different
behaviour compared to the other stemmers and causes problems as well. I would
rather have an option to decide whether it should output only the available
stems, or the stems plus the original token. I'm not sure though if it's
possible to have only a single stem indexed, which would be even better in my
opinion. When I look at how snowball works only one token is indexed, the
stem, and that works great. Probably there's something I'm missing in how
hunspell works.
Here is my issue: I have a query composed of multiple terms, which is
analyzed using stemming, and a boolean query is generated out of it. All is fine
when adding all clauses as should (OR operator), but if I add all clauses as
must (AND operator), then I can get back only the documents that contain the
stem originating from exactly the same original word.
Example for the Dutch language I'm working with: fiets (means bicycle in
Dutch), its plural is fietsen.
If I index fietsen I get both fietsen and fiets indexed, but if I index
fiets I get only fiets indexed.
When I query for "fietsen whatever" I get the following boolean query:
field:fiets field:fietsen field:whatever.
If I apply the AND operator and use must clauses for each subquery, then I
can only find the documents that originally contained fietsen, not the ones
that originally contained fiets, which is not really what stemming is about.
Any thoughts on this? I also wonder if it can be a dictionary issue, since I
see that different words that have the word fiets as root don't get the
same stems, and using the AND operator at query time is a big issue.
I would love to contribute on this and am looking forward to your feedback.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (LUCENE-5057) Hunspell stemmer generates multiple tokens

2013-09-05 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13759863#comment-13759863
]

Chris Male commented on LUCENE-5057:

I don't think the problem is related to Hunspell. Any analysis could produce
multiple tokens (synonyms for example) and whatever query parser is used needs
to reflect that correctly in how it creates BooleanQuerys. Consequently I
don't think there is an issue that needs to be re/opened?
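
As a rough sketch of what "reflecting that correctly" could look like: tokens
that the analyzer emits at the same position are treated as alternatives (OR),
while each position as a whole is required (AND). This is a toy query-string
builder and not the Lucene query parser; the class name, the method name and
the assumption that the stem and the original token share a position are all
illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy query-string builder; the class and method names are hypothetical. */
final class PositionAwareQuerySketch {
  /**
   * @param field     field name to query
   * @param positions one inner list per token position; an inner list holds
   *                  every token the analyzer emitted at that position
   */
  static String allPositionsRequired(String field, List<List<String>> positions) {
    List<String> clauses = new ArrayList<>();
    for (List<String> alternatives : positions) {
      List<String> terms = new ArrayList<>();
      for (String token : alternatives) {
        terms.add(field + ":" + token);
      }
      // Alternatives at one position are optional relative to each other (OR) ...
      String group = terms.size() == 1 ? terms.get(0) : "(" + String.join(" ", terms) + ")";
      // ... while each position as a whole is required (AND).
      clauses.add("+" + group);
    }
    return String.join(" ", clauses);
  }

  public static void main(String[] args) {
    List<List<String>> positions = Arrays.asList(
        Arrays.asList("fiets", "fietsen"), // stem and original share a position
        Arrays.asList("whatever"));
    // prints +(field:fiets field:fietsen) +field:whatever
    System.out.println(allPositionsRequired("field", positions));
  }
}
```

With this grouping a document that only contains fiets still matches the first
required clause, which is the behaviour the reporter expected from the AND
operator.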


[jira] [Commented] (LUCENE-4616) Clarify what the score means in SpatialStrategy#makeQuery()

2012-12-11 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529343#comment-13529343
 ] 

Chris Male commented on LUCENE-4616:


I agree with Ryan, we shouldn't try to over-define this.  Returning Query gives 
the Strategies freedom to have a meaningful score if they support it.  But we 
should just add a simple comment stating that the score from the Query may or 
may not be meaningful and the Strategy used should be checked for further 
details.

 Clarify what the score means in SpatialStrategy#makeQuery()
 ---

 Key: LUCENE-4616
 URL: https://issues.apache.org/jira/browse/LUCENE-4616
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Trivial

 SpatialStrategy#makeQuery() returns a Query, but the docs don't make it clear 
 what the score value should be.


[jira] [Commented] (LUCENE-4616) Clarify what the score means in SpatialStrategy#makeQuery()

2012-12-11 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529349#comment-13529349
 ] 

Chris Male commented on LUCENE-4616:


Another option, more big picture I guess, is to take this opportunity and 
remove the Strategy abstraction.  We've touched upon this in other issues, but 
the fact is that each Strategy (including those not contributed) behaves 
differently and the notion of score is a big example of this.  There is some 
consistency in the Prefix Strategies so having an abstraction there probably 
helps but otherwise I think we should just dump Strategy and let some 
Strategies return a Query with meaningful score and some return a CSQ showing 
that their score is meaningless.


[jira] [Commented] (LUCENE-4569) Allow customization of column stride field and norms via indexing chain

2012-11-28 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505294#comment-13505294
 ] 

Chris Male commented on LUCENE-4569:


John,

I don't really know much about the API you're wanting to change, but to help me 
understand are you able to explain more what you're trying to do in your custom 
indexing format / code? 

I think one of the major motivations for Codecs is to allow this sort of 
customization through their API (there are already Codecs for holding this in 
memory).

 Allow customization of column stride field and norms via indexing chain
 ---

 Key: LUCENE-4569
 URL: https://issues.apache.org/jira/browse/LUCENE-4569
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: John Wang
 Attachments: patch.diff


 We are building an in-memory indexing format and managing our own segments. 
 We are doing this by implementing a custom IndexingChain. We would like to 
 support column-stride-fields and norms without having to wire in a codec 
 (since we are managing our postings differently)
 Suggested change is consistent with the api support for passing in a custom 
 InvertedDocConsumer.


[jira] [Commented] (LUCENE-4271) Solr LocalParams for Lucene Query Parser

2012-11-17 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13499565#comment-13499565
]

Chris Male commented on LUCENE-4271:

{quote}
I think it's odd to add syntax to Lucene's query parser that does ... nothing?

And it's strange to make Lucene's QP aware of Solr QP's syntax if it cannot do
anything with it. It seems like Solr's QP should have this logic instead ...
{quote}

{quote}
Indeed - but it requires changes to the parser grammar, so subclassing doesn't
cut it.
I suppose the next best thing would be to make a QP specific to Solr.
{quote}

I don't think we should consider that a bad thing. Solr has different needs
and the classic QP is sort of the lowest common denominator of parsers.

bq. I don't mean to suggest that the Lucene Query Parser should know directly
about the Solr-level structures such as the Solr schema, Solr params, and
Solr Q Parser plugins, but I am suggesting that Lucene could declare and
support abstractions for those sorts of interfaces

I don't think we can practically extend the classic QP in every way just to meet
Solr's needs.

bq. There are lots of useful features which are available in the Solr query
parsers which are unavailable directly to Lucene apps without a lot of effort,
and for no good reason.

... then the Lucene apps should use the Solr QPs or a version thereof. The
Classic QP was moved out of Lucene core for many reasons, but one was to combat
this perspective that it's 'the' QP when it is in fact just one particular
implementation (an implementation which has lots of limitations). Users should
be encouraged to use whatever QP meets their needs and we shouldn't make the
classic QP a kitchen sink.

bq. The current estrangement between the Lucene and Solr query parsers is
quite a black eye for Lucene/Solr that can easily be remedied, at least from a
technical perspective.

I think we should go further and fully divorce them. Solr has its needs and
the handling of LocalParams clearly seems to be confusing users but it isn't
something the classic QP should have to resolve. Equally, Solr development
shouldn't be saddled with having to compromise its query features just so they
fit into the classic QP. As I say, the classic QP is the lowest common
denominator of query syntax and parsing and I would recommend to any user (Solr
or not) that when they need to make a large syntactical change, that they roll
their own parser.

Solr LocalParams for Lucene Query Parser

Key: LUCENE-4271
URL: https://issues.apache.org/jira/browse/LUCENE-4271
Project: Lucene - Core
Issue Type: New Feature
Reporter: Yonik Seeley
Attachments: LUCENE-4271.patch

The Lucene QueryParser should implement Solr's LocalParams syntax directly so
that instead of
{code}
_query_:{!geodist d=10 p=20.5,30.2}
{code}
one could directly use
{code}
{!geodist d=10 p=20.5,30.2}
{code}
references: http://wiki.apache.org/solr/LocalParams


[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2012-11-07 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492892#comment-13492892
 ] 

Chris Male commented on LUCENE-4542:


Rafał,

Thanks for creating the patches, they are looking great.  Couple of very small 
improvements:

- Can we mark recursionCap as final?
- Can we improve the javadoc for the recursionCap parameter so it's clear what 
purpose it serves?
- Maybe also drop in a comment at the field about how the recursion cap of 2 is 
the default value based on documentation about Hunspell (as opposed to 
something we arbitrarily chose).
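
The three suggestions above could be sketched roughly as follows. This is not
the patch itself: the class name, the DEFAULT_RECURSION_CAP constant, the
accessor and the negative-value check are assumptions added for illustration.

```java
/** Hypothetical stand-in for a stemmer with a configurable recursion cap. */
final class ConfigurableStemmerSketch {
  /**
   * Default cap. The value 2 mirrors the existing RECURSION_CAP constant and,
   * per the comment above, follows Hunspell documentation rather than being
   * an arbitrary choice.
   */
  static final int DEFAULT_RECURSION_CAP = 2;

  /** Maximum depth of recursive affix stripping; fixed for the lifetime of the instance. */
  private final int recursionCap;

  /** Uses the default cap, so callers that pass nothing keep the old behaviour. */
  ConfigurableStemmerSketch() {
    this(DEFAULT_RECURSION_CAP);
  }

  /**
   * @param recursionCap maximum number of nested rule applications per word;
   *                     lower values trade stemming depth for speed
   *                     (rejecting negatives is an assumption of this sketch)
   */
  ConfigurableStemmerSketch(int recursionCap) {
    if (recursionCap < 0) {
      throw new IllegalArgumentException("recursionCap must not be negative: " + recursionCap);
    }
    this.recursionCap = recursionCap;
  }

  int recursionCap() {
    return recursionCap;
  }

  public static void main(String[] args) {
    System.out.println(new ConfigurableStemmerSketch().recursionCap());  // prints 2
    System.out.println(new ConfigurableStemmerSketch(1).recursionCap()); // prints 1
  }
}
```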

 Make RECURSION_CAP in HunspellStemmer configurable
 --

 Key: LUCENE-4542
 URL: https://issues.apache.org/jira/browse/LUCENE-4542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Assignee: Chris Male
 Attachments: LUCENE-4542.patch, LUCENE-4542-with-solr.patch


 Currently there is 
 private static final int RECURSION_CAP = 2;
 in the code of the class HunspellStemmer. It makes using hunspell with 
 several dictionaries almost unusable, due to bad performance (e.g. it costs 
 36 ms to stem a long sentence in Latvian for recursion_cap=2 and 5 ms for 
 recursion_cap=1). It would be nice to be able to tune this number as needed.
 AFAIK this number (2) was chosen arbitrarily.
 (It's the first issue in my life, so please forgive me any mistakes.)


[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2012-11-06 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491845#comment-13491845
 ] 

Chris Male commented on LUCENE-4542:


+1 I absolutely agree we need to make this change.  There is another issue (I 
can't remember what just yet and I'm using a bad connection) where the 
recursion cap was causing analysis loops.  

Do you want to create a patch? We need to maintain backwards compatibility so 
the default experience should be using RECURSION_CAP as it is today.  However 
users should be able to pass in a value as well (that includes the 
HunspellStemFilterFactory).
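
For the factory side, one way to keep the default experience unchanged is to
treat the cap as an optional argument. The sketch below is only an assumption
of how that could look: the "recursionCap" argument name, the class and the
helper method are hypothetical and not taken from HunspellStemFilterFactory.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper showing an optional factory argument with a default. */
final class StemFilterFactoryArgsSketch {
  /** Same value as today's RECURSION_CAP, used when no argument is given. */
  static final int DEFAULT_RECURSION_CAP = 2;

  /** Reads the assumed "recursionCap" argument, falling back to the default when absent. */
  static int recursionCap(Map<String, String> args) {
    String raw = args.get("recursionCap");
    return raw == null ? DEFAULT_RECURSION_CAP : Integer.parseInt(raw.trim());
  }

  public static void main(String[] argv) {
    Map<String, String> args = new HashMap<>();
    System.out.println(recursionCap(args)); // prints 2 (existing configs keep working)
    args.put("recursionCap", "1");
    System.out.println(recursionCap(args)); // prints 1
  }
}
```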


[jira] [Assigned] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2012-11-06 Thread Chris Male (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male reassigned LUCENE-4542:
--

Assignee: Chris Male


[jira] [Commented] (LUCENE-4511) TermsFilter might return wrong results if a field is not indexed or not present in the index

2012-10-30 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13487417#comment-13487417
 ] 

Chris Male commented on LUCENE-4511:


+1 to these improvements.

Another typo: "to optimize for this case and to be fitler-cache friendly we"
-> "filter-cache"

 TermsFilter might return wrong results if a field is not indexed or not 
 present in the index
 

 Key: LUCENE-4511
 URL: https://issues.apache.org/jira/browse/LUCENE-4511
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/other
Affects Versions: 4.0, 4.1, 5.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.1, 5.0

 Attachments: LUCENE-4511.patch, LUCENE-4511.patch, LUCENE-4511.patch, 
 LUCENE-4511.patch


 TermsFilter returns if a term returns null from AIR#terms(term) while it 
 should just continue. I will upload a test & fix shortly.


Re: Welcome Alan Woodward as Lucene/Solr committer

2012-10-17 Thread Chris Male

Welcome Alan!

On Wed, Oct 17, 2012 at 6:36 PM, Robert Muir <rcm...@gmail.com> wrote:

 I'm pleased to announce that the Lucene PMC has voted Alan as a
 Lucene/Solr committer.

 Alan has been contributing patches on various tricky stuff: positions
 iterators, span queries, highlighters, codecs, and so on.

 Alan: it's tradition that you introduce yourself with your background.

 I think your account is fully working and you should be able to add
 yourself to the "who we are" page on the website as well.

 Congratulations!





-- 
Chris Male | Open Source Search Developer | elasticsearch |
www.elasticsearch.com

Re: [ANNOUNCE] Apache Lucene 4.0 released.

2012-10-12 Thread Chris Male

 releases (see
 http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html).

 * A new spell checker, DirectSpellChecker, finds possible corrections directly
   against the main search index without requiring a separate index.

 * Various in-memory data structures such as the term dictionary and
   FieldCache are represented more efficiently with less object overhead (see
   http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html).

 * All search logic is now required to work per segment, IndexReader was
   therefore refactored to differentiate between atomic and composite readers
   (see http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).

 * Lucene 4.0 provides a modular API, consolidating components such as
   Analyzers and Queries that were previously scattered across Lucene core,
   contrib, and Solr. These modules also include additional functionality such
   as UIMA analyzer integration and a completely reworked spatial search
   implementation.

 Noteworthy changes since 4.0-BETA:

 * A new Block PostingsFormat offering improved search performance and
   index compression. This will likely become the default format in a future
   release. (see
   http://blog.mikemccandless.com/2012/08/lucenes-new-blockpostingsformat-thanks.html).

 * All non-default codec implementations were moved to a separated codecs
   module. Just add lucene-codecs-4.0.0.jar to your classpath to test these out.

 * Payloads can be optionally stored on the term vectors.

 * Many bugfixes and optimizations.

 Please read CHANGES.txt and MIGRATE.txt for a full list of new features and
 notes on upgrading. Particularly, the new apis are not compatible with previous
 versions of Lucene, however, file format backwards compatibility is provided
 for indexes from the 3.0 series and the 4.0-alpha and -beta releases.

 Please report any feedback to the mailing lists
 (http://lucene.apache.org/core/discussion.html)
 
  Note: The Apache Software Foundation uses an extensive mirroring network
 for
  distributing releases.  It is possible that the mirror you are using may
 not have
  replicated the release yet.  If that is the case, please try another
 mirror.  This
  also goes for Maven access.
 
  Happy searching,
 
  Apache Lucene/Solr Developers






-- 
Chris Male | Open Source Search Developer | elasticsearch |
www.elasticsearch.com

Re: VOTE: release 4.0 (take two)

2012-09-28 Thread Chris Male

+1

On Fri, Sep 28, 2012 at 7:15 AM, Robert Muir rcm...@gmail.com wrote:

 artifacts are here: http://s.apache.org/lusolr40rc1

 By the way, thanks for all the help improving smoketesting and
 packaging and so on. This will pay off in the future!





-- 
Chris Male | Open Source Search Developer | elasticsearch |
www.elasticsearch.com

[jira] [Commented] (LUCENE-4427) remove webapp from lucene/demo

2012-09-25 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13463384#comment-13463384
 ] 

Chris Male commented on LUCENE-4427:


I've no great attachment to this code, but it's trying to demonstrate the XML 
QueryParser; that's its point.  If we think that has no value then sure, let's 
remove it. But if we just want to make it lighter weight and a little easier to 
maintain, then we could convert it to a simple console app and fix the problems.

 remove webapp from lucene/demo
 --

 Key: LUCENE-4427
 URL: https://issues.apache.org/jira/browse/LUCENE-4427
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless

 Spinoff of SOLR-3879:
 I think the webapp in lucene/demo is a poor demo ... we should remove it.
 EG it does not close its IndexReader, it uses the [very expert] XML 
 QueryParser, 
 it passes Version.LUCENE_CURRENT when creating the StandardAnalyzer ...


[jira] [Commented] (LUCENE-4419) Test RecursivePrefixTree indexing non-point data

2012-09-23 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461548#comment-13461548
]

Chris Male commented on LUCENE-4419:

I really don't see the benefit of randomly generating Shapes. There isn't much
to be revealed by a rectangle that, say, covers one small part of the Pacific
Ocean and another rectangle that covers another small part. The number of
possible Shapes is just too massive for random generation to reveal anything.

What I feel would be better is if we defined Shapes that test particularly
troublesome areas. Datelines, equators, poles. We can also include massive
Shapes and tiny Shapes, circles, points, and whatever else we end up supporting.

Having this standardized Shape suite would be a big benefit to testing all the
Strategys. I don't think it would be particularly difficult to create and once
created, it wouldn't require much maintenance at all.

Test RecursivePrefixTree indexing non-point data

Key: LUCENE-4419
URL: https://issues.apache.org/jira/browse/LUCENE-4419
Project: Lucene - Core
Issue Type: Improvement
Components: modules/spatial
Reporter: David Smiley

RecursivePrefixTreeFilter was modified in ~July 2011 to support spatial
filtering of non-point indexed shapes. It seems to work when playing with
the capability but it isn't tested. It really needs to be as this is a major
feature.
I imagine an approach in which some randomly generated rectangles are indexed
and then a randomly generated rectangle is queried. The right answer can be
calculated brute-force and then compared with the filter. In order to deal
with shape imprecision, the randomly generated shapes could be generated to
fit a coarse grid (e.g. round everything to a 1 degree interval).
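
The approach in this description can be sketched roughly as follows, in plain Java with no Lucene or spatial library. Everything here is an assumption for illustration: `Rect`, the cell-by-cell brute-force check, and `intersectsUnderTest`, which merely stands in for the real filter. Rectangles are snapped to whole degrees and do not cross the dateline.

```java
import java.util.Random;

class CoarseGridRandomTest {

    /** Rectangle snapped to whole degrees; covers grid cells [minX, maxX) x [minY, maxY). */
    static final class Rect {
        final int minX, maxX, minY, maxY;

        Rect(int minX, int maxX, int minY, int maxY) {
            this.minX = minX;
            this.maxX = maxX;
            this.minY = minY;
            this.maxY = maxY;
        }
    }

    /** Random non-empty rectangle inside [-180, 180] x [-90, 90], on a 1 degree grid. */
    static Rect randomRect(Random random) {
        int minX = random.nextInt(360) - 180;              // -180..179
        int maxX = minX + 1 + random.nextInt(180 - minX);  // at most 180
        int minY = random.nextInt(180) - 90;               // -90..89
        int maxY = minY + 1 + random.nextInt(90 - minY);   // at most 90
        return new Rect(minX, maxX, minY, maxY);
    }

    /** Brute force: walk every grid cell of a and check whether b covers it. */
    static boolean intersectsBruteForce(Rect a, Rect b) {
        for (int x = a.minX; x < a.maxX; x++) {
            for (int y = a.minY; y < a.maxY; y++) {
                if (x >= b.minX && x < b.maxX && y >= b.minY && y < b.maxY) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Stand-in for the implementation under test (here a plain interval overlap check). */
    static boolean intersectsUnderTest(Rect a, Rect b) {
        return a.minX < b.maxX && b.minX < a.maxX && a.minY < b.maxY && b.minY < a.maxY;
    }

    /** Runs the randomized comparison and reports how often the two answers disagree. */
    static int countMismatches(long seed, int iterations) {
        Random random = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < iterations; i++) {
            Rect indexed = randomRect(random);
            Rect query = randomRect(random);
            if (intersectsBruteForce(indexed, query) != intersectsUnderTest(indexed, query)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(42L, 200));
    }
}
```

Fixing the seed keeps a failing run reproducible, and the coarse grid means both sides can use exact integer comparisons.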


[jira] [Commented] (LUCENE-4419) Test RecursivePrefixTree indexing non-point data

2012-09-23 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461564#comment-13461564
]

Chris Male commented on LUCENE-4419:

bq. I'm all for what you suggest – a test that could be used by multiple
strategies

I didn't suggest that. I suggested a common suite of Shapes. I don't like the
idea of having a single test for all Strategys since they work in different
ways and support different things.

bq. I like randomized tests because it can catch errors that a static test
simply didn't test for

There's a difference between randomized tests and randomized Shape generation
(again, I didn't suggest we stop randomized testing). The world is massive, and
much of it isn't remotely interesting or challenging to our spatial
implementations. Just generating arbitrary Shapes somewhere on the globe seems
a total waste of time.

If we have a standard set of Shapes then we can use randomized testing to
handle the permutations between them, but we shouldn't waste days waiting for
tests to hit an interesting Shape.

Test RecursivePrefixTree indexing non-point data

Key: LUCENE-4419
URL: https://issues.apache.org/jira/browse/LUCENE-4419
Project: Lucene - Core
Issue Type: Improvement
Components: modules/spatial
Reporter: David Smiley


[jira] [Comment Edited] (LUCENE-4412) Reconsider FunctionValues / ValueSource API

2012-09-22 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461320#comment-13461320
 ] 

Chris Male edited comment on LUCENE-4412 at 9/23/12 4:03 PM:
-

One of the big challenges for this API is the issue of multiple values.  
Applying a function to two lots of multiple values is difficult as you begin to 
run into order problems and the issue of what to do when the cardinalities are 
different.

  was (Author: cmale):
One of the big challenges for this API is the issue of multiple-values.  
Applying a function to two lots of multiple-values is different as you begin to 
run into order problems and issue of what to do when the cardinalities are 
different.
  
 Reconsider FunctionValues / ValueSource API
 ---

 Key: LUCENE-4412
 URL: https://issues.apache.org/jira/browse/LUCENE-4412
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/other
Reporter: Chris Male
 Fix For: 5.0


 When documenting a lot of these classes today I found myself confused and it 
 isn't the first time with this API.  
 I think we need to step back and reassess what we want from this API, what 
 use cases it's designed to meet, and redesign it from the ground up.


[jira] [Commented] (LUCENE-4412) Reconsider FunctionValues / ValueSource API

2012-09-22 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461320#comment-13461320
 ] 

Chris Male commented on LUCENE-4412:


One of the big challenges for this API is the issue of multiple-values.  
Applying a function to two lots of multiple-values is different as you begin to 
run into order problems and issue of what to do when the cardinalities are 
different.

 Reconsider FunctionValues / ValueSource API
 ---

 Key: LUCENE-4412
 URL: https://issues.apache.org/jira/browse/LUCENE-4412
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/other
Reporter: Chris Male
 Fix For: 5.0


 When documenting a lot of these classes today I found myself confused and it 
 isn't the first time with this API.  
 I think we need to step back and reassess what we want from this API, what 
 use cases it's designed to meet, and redesign it from the ground up.


[jira] [Commented] (LUCENE-4412) Reconsider FunctionValues / ValueSource API

2012-09-21 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460439#comment-13460439
 ] 

Chris Male commented on LUCENE-4412:


Thanks for raising those concerns, David.  They're exactly what I'm referring to 
and what concerns me greatly.  If you have any thoughts on how we can better 
design this API (and let's not be bound by what the current API looks like), 
please put them in this issue.

 Reconsider FunctionValues / ValueSource API
 ---

 Key: LUCENE-4412
 URL: https://issues.apache.org/jira/browse/LUCENE-4412
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/other
Reporter: Chris Male
 Fix For: 5.0


 When documenting a lot of these classes today I found myself confused and it 
 isn't the first time with this API.  
 I think we need to step back and reassess what we want from this API, what 
 use cases it's designed to meet, and redesign it from the ground up.


[jira] [Created] (LUCENE-4412) Reconsider FunctionValues / ValueSource API

2012-09-20 Thread Chris Male (JIRA)

Chris Male created LUCENE-4412:
--

 Summary: Reconsider FunctionValues / ValueSource API
 Key: LUCENE-4412
 URL: https://issues.apache.org/jira/browse/LUCENE-4412
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/other
Reporter: Chris Male
 Fix For: 5.0


When documenting a lot of these classes today I found myself confused and it 
isn't the first time with this API.  

I think we need to step back and reassess what we want from this API, what use 
cases it's designed to meet, and redesign it from the ground up.


[jira] [Commented] (LUCENE-4409) implement javadocs linting with eclipse ecj compiler

2012-09-19 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459295#comment-13459295
 ] 

Chris Male commented on LUCENE-4409:


+1 That's pretty damn cool

 implement javadocs linting with eclipse ecj compiler
 

 Key: LUCENE-4409
 URL: https://issues.apache.org/jira/browse/LUCENE-4409
 Project: Lucene - Core
  Issue Type: Task
  Components: general/build
Reporter: Robert Muir

 today we have a lot of custom python scripts checking javadocs (checking for 
 missing stuff too).
 Most of this is implemented by parsing html etc (some of this should stay 
 this way, like broken-link detection)
 But actually the eclipse compiler can do most of this type of linting, and 
 has a lot of options for it. We can pull it via ivy and run it from the 
 command-line.
 I tested this manually by adding a bogus throws clause to Codec.java, 
 downloading the ecj.jar from maven and running it manually:
 {noformat}
 rmuir@beast:~/workspace/lucene-trunk/lucene/core/src/java$ java -cp ~/Downloads/ecj-3.7.2.jar \
     org.eclipse.jdt.internal.compiler.batch.Main -source 1.6 -d none -enableJavadoc \
     -properties ~/workspace/lucene-trunk/dev-tools/eclipse/.settings/org.eclipse.jdt.core.prefs .
 ...
 --
 120. ERROR in /home/rmuir/workspace/lucene-trunk/lucene/core/src/java/./org/apache/lucene/codecs/Codec.java (at line 59)
   * @throws IOException */
 ^^^
 Javadoc: Exception IOException is not declared
 --
 {noformat}
 here i specified -d none (don't generate class files), and essentially told 
 it to read the compiler warnings/errors options set in the dev-tools config. 
 For javadocs-lint we would want our own separate properties file that 
 disables the ordinary java warnings (because eclipse can warn/error/ignore on 
 lots of things, not just javadocs, and does by default).
 Separately we could also use this to check/fail/warn on other things besides 
 javadoc...


[jira] [Commented] (LUCENE-4175) Include BBox Spatial Strategy

2012-09-19 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459341#comment-13459341
 ] 

Chris Male commented on LUCENE-4175:


With the very near release of 4.0, I don't think we should backport anything 
untested.  I also don't think we're in any immediate hurry for this since we've 
got other options in 4.0.  But we should definitely work on the testing and 
push it for 4.1.

 Include BBox Spatial Strategy
 -

 Key: LUCENE-4175
 URL: https://issues.apache.org/jira/browse/LUCENE-4175
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Attachments: LUCENE-4175-bbox-strategy.patch


 This is an approach to indexing bounding boxes using 4 numeric fields 
 (xmin,ymin,xmax,ymax) and a flag to say if it crosses the dateline.
 This is a modification from the Apache 2.0 code from the ESRI Geoportal:
 http://geoportal.svn.sourceforge.net/svnroot/geoportal/Geoportal/trunk/src/com/esri/gpt/catalog/lucene/SpatialClauseAdapter.java
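
A rough sketch of how four numeric bounds plus a dateline flag can drive an intersection test, in plain Java. This is not the code from the attached patch or from the ESRI Geoportal. The `BBox` class, the convention that `xmin > xmax` marks a box crossing the dateline, and the method names are all assumptions for illustration.

```java
class BBoxSketch {

    /** Bounding box in degrees; crossesDateline is derived from xmin > xmax (assumed convention). */
    static final class BBox {
        final double xmin, ymin, xmax, ymax;
        final boolean crossesDateline;

        BBox(double xmin, double ymin, double xmax, double ymax) {
            this.xmin = xmin;
            this.ymin = ymin;
            this.xmax = xmax;
            this.ymax = ymax;
            this.crossesDateline = xmin > xmax;
        }
    }

    /** Closed-interval overlap of [aMin, aMax] and [bMin, bMax]. */
    static boolean rangesOverlap(double aMin, double aMax, double bMin, double bMax) {
        return aMin <= bMax && bMin <= aMax;
    }

    static boolean longitudesOverlap(BBox a, BBox b) {
        if (a.crossesDateline && b.crossesDateline) {
            return true; // both boxes contain longitude 180
        }
        if (a.crossesDateline) {
            // Split a into [xmin, 180] and [-180, xmax] and test each half against b.
            return rangesOverlap(a.xmin, 180.0, b.xmin, b.xmax)
                || rangesOverlap(-180.0, a.xmax, b.xmin, b.xmax);
        }
        if (b.crossesDateline) {
            return longitudesOverlap(b, a);
        }
        return rangesOverlap(a.xmin, a.xmax, b.xmin, b.xmax);
    }

    static boolean intersects(BBox a, BBox b) {
        return rangesOverlap(a.ymin, a.ymax, b.ymin, b.ymax) && longitudesOverlap(a, b);
    }

    public static void main(String[] args) {
        BBox acrossDateline = new BBox(170, -10, -170, 10);
        System.out.println(intersects(acrossDateline, new BBox(175, 0, 179, 5)));
        System.out.println(intersects(acrossDateline, new BBox(-160, 0, -150, 5)));
    }
}
```

Keeping the flag as its own value means a query can branch on it up front instead of comparing xmin and xmax for every document.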


[jira] [Commented] (LUCENE-3997) join module should not depend on grouping module

2012-09-17 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456892#comment-13456892
 ] 

Chris Male commented on LUCENE-3997:


bq. I propose, instead of using lucene-core as the location for code used by 
multiple modules, that we create a (single) new module that serves this 
purpose, something like lucene-shared or lucene-common (though common analyzers 
already use this name...)

I actually created lucene-common when I first refactored out the 
FunctionQuery codebase.  After some time it was decided (in an issue I can't 
remember) that the code would go into lucene-core.  I agree with your 
assessment that we shouldn't use lucene-core as a dumping ground, but we should 
get a discussion about this going.

 join module should not depend on grouping module
 

 Key: LUCENE-3997
 URL: https://issues.apache.org/jira/browse/LUCENE-3997
 Project: Lucene - Core
  Issue Type: Task
Affects Versions: 4.0-ALPHA
Reporter: Robert Muir
 Fix For: 4.1

 Attachments: LUCENE-3997.patch, LUCENE-3997.patch


 I think TopGroups/GroupDocs should simply be in core? 
 Both grouping and join modules use these trivial classes, but join depends on 
 grouping just for them.
 I think it's better that we try to minimize these inter-module dependencies.
 Of course, another option is to combine grouping and join into one module, but
 last time I brought that up nobody could agree on a name. 
 Anyway I think the change is pretty clean: it's similar to having basic stuff 
 like Analyzer.java in core,
 so other things can work with Analyzer without depending on any specific 
 implementing modules.


Re: being a good citizen is hard when you can't successfully run tests....

2012-09-17 Thread Chris Male

On Tue, Sep 18, 2012 at 12:45 AM, Dawid Weiss dawid.we...@gmail.com wrote:

 I think we can even integrate hossman's suggestion and generate a
 stability report like weekly or something.

 I will take a look at this this week but it is definitely something that
 will require everyone's consensus.

What would they add in addition to the test histories you can see on
jenkins?


 Dawid

 Sent from mobile phone.
 On Sep 17, 2012 2:42 PM, Michael McCandless luc...@mikemccandless.com
 wrote:

 I agree that a test that frequently fails, and does not get fixed, is
 nearly pointless: everybody ignores it so it's as if the test didn't
 exist.  And so it should be disabled.

 I say *nearly* because the failures are in fact useful to devs who do
 have the itch/time to debug/fix them.

 So I think we need some middle ground here, where the tests keep
 failing but only those that are interested in the failures see the
 notifications.  We need to switch from a push model (any failure is
 broadcast to everybody) to a pull model (those devs that want to
 debug the failures go and check the logs), for such tests.

 When someone wants to make sure their change didn't break something
 (Erick's original use case) then these tests should not run.

 I like Dawid's idea (a separate test plan that Jenkins runs with these
 difficult tests, and it wouldn't email dev on failure).

 Mike McCandless

 http://blog.mikemccandless.com

 On Mon, Sep 17, 2012 at 7:58 AM, Robert Muir rcm...@gmail.com wrote:
  On Sun, Sep 16, 2012 at 11:10 PM, Mark Miller markrmil...@gmail.com
 wrote:
  I get value from this test - if it was disabled, I'd probably
 re-enable it.
  would be great if it didn't fail so much, but the type of fail tells me
  something.
 
  That means the assert in question isn't important at all. I'll remove it.

  Again my problem is the idea that having a failing build is ok
  because certain types of failures don't matter. If they don't matter
  they should be removed.

  It causes a ton of noise when people are lazy about tests in this way,
  and it wastes a ton of people's time.

  Remember every time one of these tests fails it sends an email that I
  must read (we don't yet have a way to put in the subject header that it's a
  SOLR test fail versus a LUCENE one, or I'd filter the solr ones and
  not be complaining as much).
 
  --
  lucidworks.com
 
 





-- 
Chris Male | Open Source Search Developer | elasticsearch |
www.elasticsearch.com

Re: being a good citizen is hard when you can't successfully run tests....

2012-09-17 Thread Chris Male

On Tue, Sep 18, 2012 at 1:11 AM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:

  What would they add in addition to the test histories you can see on
  jenkins?

 Is there a per-test history on jenkins too? I'm more familiar with
 Atlassian Bamboo. Obviously if it already is in Jenkins there's no
 need to do anything other than just run tests with


Yeah there is.  It's a little messy and hard to navigate, but an example:

https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java6/661/testReport/junit/org.apache.solr.cloud/SyncSliceTest/testDistribSearch/history/

(wait for it to load)


 -Dtests.haltonfailure=false

 I'm wondering if jenkins also considers a build failed if tests fail
 but ant returns with success (i.e. does it parse log XMLs and derive
 this information from there)?


No idea sorry.







-- 
Chris Male | Open Source Search Developer | elasticsearch |
www.elasticsearch.com

[jira] [Commented] (LUCENE-4388) ShapeMatcher and ShapeValues

2012-09-16 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456561#comment-13456561
]

Chris Male commented on LUCENE-4388:

Interesting idea. I like the idea of Strategys exposing ShapeValues and then
having a standard DistanceValueSource which accepted a Shape, ShapeValues and a
DistanceCalculator. I like that it would also make it easier to retrieve the
Shape if it was needed in other places.

I am a little worried that this could encourage consumers, whether they be other
Strategy impls or something else, to use un-inverted index structures instead
of inverted and subsequently suffer in performance and in memory consumption.

bq. And a strategy could support any query shape simply by implementing
makeShapeValues().

I don't understand this. Can you elaborate?

bq. I've been thinking about how the API handles strategies supporting indexing
multiple shapes and I wonder if that could happen simply via a new
MultiShapeShape

One of the challenges with this API is that whether multiple values are
supported is a per Strategy decision, yet whether there are multiple values is
a per Document decision. Document 1 might have only a single Shape, Document 2
might have multiple Shapes. I just wonder whether we want to force Strategys
which support multiple values to always use MultiShape, or whether it should
change per Document and then force the consumer to check.

ShapeMatcher and ShapeValues

Key: LUCENE-4388
URL: https://issues.apache.org/jira/browse/LUCENE-4388
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: David Smiley
Attachments: LUCENE-4388_ShapeValues_and_ShapeMatcher.patch

This patch provides two key interfaces: ShapeMatcher and ShapeValues. The
ShapeMatcher concept is borrowed from [~ryantxu]'s JtsGeoStrategy which has a
similar GeometryTester. ShapeValues is basically a
ValueSource/FunctionValues for shapes. This isn't working; I didn't modify
any existing classes.
I haven't completely thought this through but a SpatialStrategy might expose
a makeShapeValues(IndexReader) and/or makeCenterShapeValues(IndexReader) (the
latter is the center points of indexed data). A generic Distance ValueSource
could easily be implemented in terms of makeCenterShapeValues(). And a
strategy could support any query shape simply by implementing
makeShapeValues().
I've been thinking about how the API handles strategies supporting indexing
multiple shapes and I wonder if that could happen simply via a new
MultiShapeShape.


Re: being a good citizen is hard when you can't successfully run tests....

2012-09-16 Thread Chris Male

On Mon, Sep 17, 2012 at 1:30 PM, Yonik Seeley yo...@lucidworks.com wrote:

 On Sun, Sep 16, 2012 at 2:51 PM, Robert Muir rcm...@gmail.com wrote:
  not so much energy spent fixing these few shitty solr tests, some of
  which (like TestReplicationHandler) are totally useless and have been
  failing sporadically for, like, years.

 Can you explain why it's useless (without the derogatory adjectives
 please)?


I'm not wanting to get into issues of usefulness of tests or not, but I did
just look at the build failure messages over the last few months and I've
received a build failure message for this test almost every single day.  I
appreciate that this doesn't happen locally and makes it hard to fix, but
it's hard to work with continuous integration that so commonly fails on one
test.



 I didn't write the test to begin with, so I don't know off the top of
 my head all of the functionality it covers.  I'd be surprised if it
 was all redundant and covered by other test suites of course.

 Notes:
  - I remember it passing for *long* periods of time
  - I just ran it in a loop 30 times on my linux box and it passed 100%
 of the time, and in a timely manner
  - It *has* found many bugs when it started failing (i.e. useful, not
 useless)
  - Many of us (including you) *have* worked to improve the situation
 over time when it does deteriorate - check the logs.

 It's not clear what you are suggesting (unless you are volunteering to
 look into this issue with OS-X apparently, or volunteering to write a
 new replication test from scratch or something).

 -Yonik
 http://lucidworks.com





-- 
Chris Male | Open Source Search Developer | elasticsearch |
www.elasticsearch.com

[jira] [Commented] (LUCENE-4388) ShapeMatcher and ShapeValues

2012-09-16 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456757#comment-13456757
]

Chris Male commented on LUCENE-4388:

bq. The reasoning is similar to how a standard DistanceValueSource could then
exist. For a makeFilter / makeQuery, there could be a standard ShapeFilter that
consults makeShapeValues to intersect with the query shape. Of course, it
should be preceded by a bbox filter or something similar.

That's going to be so slow. Iterating over every Shape of every Document to
see if it intersects? That harks back to WildcardQuery performance of old. Even
with a BBox, you could have 100,000 points within a city. I don't think we
should ever support this. If a user wants to create it themselves then fine,
but we should be striving for performance.

bq. I'm not sure what you mean. But a problem with the other approach (forcing
MultiShape for createFields) is that it would make Solr support difficult,
perhaps requiring a UpdateRequestProcessor to join separate field values into
one. But even putting that aside, I don't think use of a MultiShape needs to be
forced, but it should be supported by the Strategy if it declares that it
handles multi-valued shapes.

Given this issue is about ShapeValues, I'm talking about retrieving Shapes
through ShapeValues, not about indexing. What I was saying is given the
ShapeValues interface:

{code}
S shape(int docId, IndexReader reader);
{code}

We need to decide what S is going to be. If S is always Shape then the
consumer would need to check if the actual value returned was a MultiShape or
not, in order to retrieve the multiple Shapes. If S was always MultiShape,
then the ShapeValues impl would need to return a MultiShape even when there
might only be one Shape associated with the given docId.

This isn't a blocking problem, I was merely suggesting that we need to think
through the use cases we want to support and how MultiShape fits in.
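
The two choices for S discussed in this comment can be sketched side by side. This is plain Java with stand-in types, not the interfaces from the attached patch: `Shape`, `Point`, `MultiShape` and the single-argument `shape(int docId)` (the IndexReader parameter is dropped) are all simplifying assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ShapeValuesSketch {

    interface Shape {}

    static final class Point implements Shape {
        final double x, y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Hypothetical container for several shapes belonging to one document. */
    static final class MultiShape implements Shape {
        final List<Shape> shapes;

        MultiShape(List<Shape> shapes) {
            this.shapes = shapes;
        }
    }

    /** Simplified form of the interface under discussion. */
    interface ShapeValues<S extends Shape> {
        S shape(int docId);
    }

    /** Option 1: S is Shape, so every consumer has to check for MultiShape itself. */
    static List<Shape> unwrap(Shape shape) {
        if (shape instanceof MultiShape) {
            return ((MultiShape) shape).shapes;
        }
        return Collections.singletonList(shape);
    }

    /** Option 2: S is always MultiShape, so the implementation wraps single shapes. */
    static ShapeValues<MultiShape> alwaysMulti(final ShapeValues<Shape> delegate) {
        return docId -> new MultiShape(unwrap(delegate.shape(docId)));
    }

    /** Sample data: document 0 has one shape, every other document has two. */
    static ShapeValues<Shape> sample() {
        final Shape single = new Point(4.9, 52.4);
        final Shape multi = new MultiShape(Arrays.<Shape>asList(new Point(0, 0), new Point(1, 1)));
        return docId -> docId == 0 ? single : multi;
    }

    public static void main(String[] args) {
        System.out.println(unwrap(sample().shape(0)).size());
        System.out.println(alwaysMulti(sample()).shape(1).shapes.size());
    }
}
```

Option 1 puts the instanceof check in each consumer. Option 2 keeps consumers uniform but costs a wrapper object for single-valued documents.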

ShapeMatcher and ShapeValues


[jira] [Commented] (LUCENE-4389) Fix TwoDoubles dateline support

2012-09-15 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456405#comment-13456405
 ] 

Chris Male commented on LUCENE-4389:


I have faith in your knowledge on this and there seems to be adequate testing, 
so let's go ahead and commit that.

 Fix TwoDoubles dateline support
 ---

 Key: LUCENE-4389
 URL: https://issues.apache.org/jira/browse/LUCENE-4389
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.0

 Attachments: 
 LUCENE-4389_Support_dateline_and_circles_for_TwoDoubles.patch, LUCENE-4389 
 Support dateline for TwoDoubles.patch


 The dateline support can easily be fixed.  After this, the TwoDoublesStrategy 
 might not be particularly useful but at least it won't be buggy if you stay 
 with Rectangle query shapes.


[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance

2012-09-14 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455768#comment-13455768
]

Chris Male commented on LUCENE-4208:

Things are looking pretty good, we're almost there.

- Where are we on multi-valued fields? The documentation on
makeDistanceValueSource doesn't say what happens when multiple values are
indexed. Do we support that in the ValueSource implementations? Is the
behaviour undefined? If it is supposed to be defined, can we document it?
- "Returns a ValueSource useful as a score": can we drop this claim? Part of the
reason we've moved to having ConstantScoreQuerys is that it isn't clear what
the score for the queries should be. This value isn't useful for every spatial
operation or implementation.

Once these have gotten addressed, I'm +1 for committing.

Spatial distance relevancy should use score of 1/distance
-

Key: LUCENE-4208
URL: https://issues.apache.org/jira/browse/LUCENE-4208
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: David Smiley
Fix For: 4.0

Attachments:
LUCENE-4208_makeQuery_return_ConstantScoreQuery_and_remake_TwoDoublesStrategy.patch,

LUCENE-4208_makeQuery_return_ConstantScoreQuery,_standardize_makeDistanceValueSource_behav.patch,

LUCENE-4208_makeQuery_return_ConstantScoreQuery,_standardize_makeDistanceValueSource_behav.patch

The SpatialStrategy.makeQuery() at the moment uses the distance as the score
(although some strategies, TwoDoubles if I recall, might not do anything,
which would be a bug). The distance is a poor value to use as the score
because the score should be related to relevancy, and the distance itself is
inversely related to that. A score of 1/distance would be nice. Another
alternative is earthCircumference/2 - distance, although I like 1/distance
better. Maybe use a different constant than 1.
Credit: this is Chris Male's idea.
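
To make the two candidate formulas concrete, here is a minimal sketch in plain Java. It is not code from any attached patch: the class name, the constant parameter c, the kilometre value for half the Earth's circumference, and the handling of a zero distance are all assumptions made for illustration.

```java
/** Illustrative only: two ways to turn a distance into a relevancy-style score. */
final class DistanceScoreSketch {
    // Assumed value: half of the Earth's circumference in kilometres (about 40,075 km / 2).
    static final double HALF_EARTH_CIRCUMFERENCE_KM = 40075.0 / 2.0;

    /** Reciprocal score c / distance, where c is the "different constant than 1" option. */
    static double reciprocalScore(double distance, double c) {
        if (distance < 0) {
            throw new IllegalArgumentException("distance must be >= 0");
        }
        // Assumption: a zero distance maps to the largest finite score instead of infinity.
        return distance == 0 ? Double.MAX_VALUE : c / distance;
    }

    /** Linear alternative: earthCircumference/2 - distance. */
    static double linearScore(double distanceKm) {
        return HALF_EARTH_CIRCUMFERENCE_KM - distanceKm;
    }

    public static void main(String[] args) {
        for (double d : new double[] {1, 10, 100}) {
            System.out.printf("d=%.0f reciprocal=%.4f linear=%.1f%n",
                d, reciprocalScore(d, 1.0), linearScore(d));
        }
    }
}
```

Both functions decrease as the distance grows, which is the property the description asks for; they differ in how sharply nearby documents are favoured.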


[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance

2012-09-12 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453746#comment-13453746
]

Chris Male commented on LUCENE-4208:

I disagree that makeQuery shouldn't exist. There are optimizations to be had
in Query code, such as using BooleanQuery and its associated highly optimized
scorer algorithms. I think it should continue to exist but should have a default
implementation that creates a ConstantScoreQuery (CSQ) by calling makeFilter.
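
A rough sketch of that default, using stand-in types rather than the real Lucene classes. SketchFilter, SketchQuery, SketchConstantScoreQuery and SpatialStrategySketch are hypothetical names, and the String argument is a placeholder for the real spatial arguments object.

```java
/** Stand-in for a filter: decides whether a document matches. Not the Lucene Filter class. */
interface SketchFilter {
    boolean accept(int docId);
}

/** Stand-in for a query: assigns a score to a document. Not the Lucene Query class. */
interface SketchQuery {
    float score(int docId);
}

/** Every matching document gets the same score, so no ranking is implied. */
final class SketchConstantScoreQuery implements SketchQuery {
    private final SketchFilter filter;

    SketchConstantScoreQuery(SketchFilter filter) {
        this.filter = filter;
    }

    @Override
    public float score(int docId) {
        return filter.accept(docId) ? 1.0f : 0.0f;
    }
}

abstract class SpatialStrategySketch {
    /** Each strategy must say which documents match. */
    abstract SketchFilter makeFilter(String spatialArgs);

    /** Default: wrap the filter in a constant-score query. Strategies may override this. */
    SketchQuery makeQuery(String spatialArgs) {
        return new SketchConstantScoreQuery(makeFilter(spatialArgs));
    }
}
```

A strategy with a faster query path can still override makeQuery, while simple strategies only need to implement makeFilter.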

Spatial distance relevancy should use score of 1/distance
-

Key: LUCENE-4208
URL: https://issues.apache.org/jira/browse/LUCENE-4208
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: David Smiley
Fix For: 4.0


[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance

2012-09-12 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454581#comment-13454581
]

Chris Male commented on LUCENE-4208:

bq. TwoDoubles is getting overhauled to support the dateline and any query
shape--should probably go into another issue.

Yes please!

Spatial distance relevancy should use score of 1/distance
-

Key: LUCENE-4208
URL: https://issues.apache.org/jira/browse/LUCENE-4208
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: David Smiley
Fix For: 4.0

Attachments:
LUCENE-4208_makeQuery_return_ConstantScoreQuery_and_remake_TwoDoublesStrategy.patch


[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful


[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452884#comment-13452884
 ] 

Chris Male commented on LUCENE-4369:


As I say, I totally support renaming this field to something.  I think calling 
it anything else will help with distinguishing it from TextField so I'm +1 for 
MatchOnly.  Perhaps that'll encourage people to read the docs about it not 
being analyzed.

 StringFields name is unintuitive and not helpful
 

 Key: LUCENE-4369
 URL: https://issues.apache.org/jira/browse/LUCENE-4369
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

 There's a huge difference between TextField and StringField: StringField 
 screws up scoring and bypasses your Analyzer.
 (See the java-user thread "Custom Analyzer Not Called When Indexing" for an 
 example.)
 The name we use here is vital, otherwise people will get bad results.
 I think we should rename StringField to MatchOnlyField.
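
The difference at stake can be illustrated without Lucene at all. The sketch below is a simplified stand-in: analyzedTerms imitates what a typical analyzer might do for a TextField (lowercase, split on punctuation and whitespace), while verbatimTerm imitates a field that bypasses analysis and keeps the whole value as a single term. The class and method names are hypothetical, and real analyzers are considerably more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Plain-Java stand-in; it does not use the Lucene field or Analyzer classes. */
final class AnalysisSketch {
    /** Analyzed: lowercase the value and split on anything that is not a letter or digit. */
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Not analyzed: the whole value becomes one verbatim term. */
    static List<String> verbatimTerm(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String value = "Quick Brown-Fox";
        System.out.println(analyzedTerms(value)); // several lowercase terms
        System.out.println(verbatimTerm(value));  // one term, exactly as given
    }
}
```

A search for the term fox would find the analyzed value but not the verbatim one, which only matches the exact original string.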


[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful


[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452924#comment-13452924
 ] 

Chris Male commented on LUCENE-4369:


I like ExactMatchField, good suggestion.

 StringFields name is unintuitive and not helpful
 

 Key: LUCENE-4369
 URL: https://issues.apache.org/jira/browse/LUCENE-4369
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4369.patch


 There's a huge difference between TextField and StringField: StringField 
 screws up scoring and bypasses your Analyzer.
 (See the java-user thread "Custom Analyzer Not Called When Indexing" for an 
 example.)
 The name we use here is vital, otherwise people will get bad results.
 I think we should rename StringField to MatchOnlyField.


[jira] [Commented] (LUCENE-4375) Spatial BBoxIntersects and BBoxWithin are used incorrectly

[
https://issues.apache.org/jira/browse/LUCENE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453674#comment-13453674
]

Chris Male commented on LUCENE-4375:

In the future, once we remove many of the restrictions on the SpatialStrategy
interface, we could have an implementation of PTS that is limited to Points and
supports isWithin. Till then, I don't think we should include hacks just to
support isWithin for Points. Let's leave the API nice and we'll make
improvements when we can.

Spatial BBoxIntersects and BBoxWithin are used incorrectly
--

Key: LUCENE-4375
URL: https://issues.apache.org/jira/browse/LUCENE-4375
Project: Lucene - Core
Issue Type: Bug
Reporter: David Smiley
Assignee: David Smiley
Fix For: 4.0

Attachments:
LUCENE-4375_Fix_use_of_BBoxWithin_BBoxIntersects_and_IsWithin.patch

SpatialOperation has two special BBoxIntersects and BBoxWithin choices. I
assumed these were the bounding boxes of the query shape but [~ryantxu]
informed me these are supposed to be for the *indexed shape*. There is no
strategy in Lucene spatial that could use this but there is one externally --
JtsGeoStrategy. Javadocs should be added to clarify, and various places like
SpatialArgs.getShape() should be fixed to not use it incorrectly.
This does remove a feature from the Solr adapters side; the test there will
need to change.


[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance

[
https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453708#comment-13453708
]

Chris Male commented on LUCENE-4208:

I don't think there is a clear solution here. But I feel ValueSource provides
maximum flexibility going forward. If we continue to support makeValueSource
then people can sort, or include it in their query if they want, or just
retrieve the value at some later stage. makeQuery() should just return a
ConstantScoreQuery. We can consider in future versions what if anything we
want to do around its score.

WRT TwoDoubles: this Strategy was a nice start to this work a while back and
was designed to replicate existing point-distance functionality. But it has
huge limitations and it constantly feels like we're being held back by it.
Every Strategy has its limitations, and I don't feel we should hold back changes
just because they impact TwoDoubles.

Spatial distance relevancy should use score of 1/distance
-

Key: LUCENE-4208
URL: https://issues.apache.org/jira/browse/LUCENE-4208
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: David Smiley
Fix For: 4.0


[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance

2012-09-09 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451508#comment-13451508
]

Chris Male commented on LUCENE-4208:

I actually totally agree with David here. Using ValueSource (instead of my
SpatialSimilarity idea) is an excellent solution which leverages existing
Lucene code. Having it this way means that even if a Strategy has a custom
Query implementation (maybe for performance reasons) it would still be possible
to make use of the ValueSource in scoring.

I definitely think we should expose this on a per-Strategy basis rather than
on all Strategies, as some Strategies may not be able to compute distance and we
shouldn't force them to.

Spatial distance relevancy should use score of 1/distance
-

Key: LUCENE-4208
URL: https://issues.apache.org/jira/browse/LUCENE-4208
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: David Smiley
Fix For: 4.0


[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-09-09 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451579#comment-13451579
 ] 

Chris Male commented on LUCENE-3312:


David, at a guess, I imagine the branch used in this issue was created 
before we changed createIndexableFields to not handle storing.  To satisfy the 
conditions at the time (indexing and storing) Nikola changed it to return 
Field.  Let's just fix it and we'll be fine.

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: 5.0

 Attachments: LUCENE-3312-DocumentIterators-uwe.patch, 
 lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, 
 lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, 
 lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, 
 lucene-3312-patch-09.patch, lucene-3312-patch-10.patch, 
 lucene-3312-patch-11.patch, lucene-3312-patch-12a.patch, 
 lucene-3312-patch-12.patch, lucene-3312-patch-13.patch, 
 lucene-3312-patch-14.patch, LUCENE-3312-reintegration.patch, 
 LUCENE-3312-reintegration.patch


 In the field type branch we have strongly decoupled
 Document/Field/FieldType impl from the indexer, by having only a
 narrow API (IndexableField) passed to IndexWriter.  This frees apps up
 to use their own documents instead of the user-space impls we provide
 in oal.document.
 Similarly, with LUCENE-3309, we've done the same thing on the
 doc/field retrieval side (from IndexReader), with the
 StoredFieldsVisitor.
 But, maybe we should break out StorableField from IndexableField,
 such that when you index a doc you provide two Iterables -- one for the
 IndexableFields and one for the StorableFields.  Either can be null.
 One downside is a possible perf hit for fields that are both indexed and 
 stored (ie, we visit them twice, lookup their name in a hash twice,
 etc.).  But the upside is a cleaner separation of concerns in the API.
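
 The shape of the proposed API can be sketched with stand-in types. Everything below is hypothetical (IndexableFieldSketch, StorableFieldSketch, WriterSketch and its counters are not Lucene classes); it only shows the two-Iterable signature, the "either can be null" rule, and why a field that is both indexed and stored is visited twice.

```java
/** Stand-in for the narrow indexing-side view of a field. */
interface IndexableFieldSketch {
    String name();
}

/** Stand-in for the narrow storing-side view of a field. */
interface StorableFieldSketch {
    String name();
}

/** Hypothetical writer that accepts the two views separately. */
final class WriterSketch {
    int indexedCount;
    int storedCount;

    /** Either argument may be null, meaning "nothing to index" or "nothing to store". */
    void addDocument(Iterable<? extends IndexableFieldSketch> indexable,
                     Iterable<? extends StorableFieldSketch> storable) {
        if (indexable != null) {
            for (IndexableFieldSketch f : indexable) {
                indexedCount++; // a real indexer would invert the field here
            }
        }
        if (storable != null) {
            for (StorableFieldSketch f : storable) {
                storedCount++; // a real indexer would write the stored value here
            }
        }
    }
}
```

 A field that is both indexed and stored appears once in each Iterable, so it is visited in both loops; that is the possible perf hit mentioned above.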


[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-09 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451730#comment-13451730
 ] 

Chris Male commented on LUCENE-4369:


I'm +1 for renaming this field (and even considering its long-term future); I'm 
just not sure how MatchOnlyField conveys the fact that it bypasses analysis.

 StringFields name is unintuitive and not helpful
 

 Key: LUCENE-4369
 URL: https://issues.apache.org/jira/browse/LUCENE-4369
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

 There's a huge difference between TextField and StringField: StringField 
 screws up scoring and bypasses your Analyzer.
 (See the java-user thread "Custom Analyzer Not Called When Indexing" for an 
 example.)
 The name we use here is vital, otherwise people will get bad results.
 I think we should rename StringField to MatchOnlyField.


[jira] [Commented] (SOLR-3686) fix solr/core and solr/solrj not to share a lib/ directory

2012-09-08 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451507#comment-13451507
 ] 

Chris Male commented on SOLR-3686:
--

bq. If I'm the only intellij user left, I guess I should go back to just 
maintaining my own simple config again since it seems like I'm getting blown 
out of the water every other week or so.

I'm also still using IntelliJ.  If you have any tweaks or fixes, please 
contribute them!

 fix solr/core and solr/solrj not to share a lib/ directory
 --

 Key: SOLR-3686
 URL: https://issues.apache.org/jira/browse/SOLR-3686
 Project: Solr
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: SOLR-3686.patch, SOLR-3686.patch


 This makes the build system hairy.
 It also prevents us from using ivy's sync=true (LUCENE-4262), 
 which would entirely prevent the issue of outdated jars.
 We should fix this so each has its own lib/.


[jira] [Commented] (LUCENE-4186) Lucene spatial's distErrPct is treated as a fraction, not a percent.

2012-09-05 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448758#comment-13448758
]

Chris Male commented on LUCENE-4186:

bq. SpatialArgs.toString()'s logic was moved to SpatialArgsParser as
writeSpatialArgs(args) since it looks so close to the parsed format and I'd
like to see it parsed and written in the same class.

+1 Makes sense

bq. SpatialArgs.toString() fixes the bug in displaying the error percent that
Itamar noticed.

bq. Standardizes distErrPct terminology in variables and method names.
Despite the pct it's actually a fraction [0 to 0.5].

+1 Do we validate somewhere that the values are between 0 and 0.5?
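
The kind of range check being asked about could look like the sketch below. It is not taken from the patch; the class and method names are hypothetical, and the only fact it relies on is the stated range of 0 to 0.5.

```java
/** Hypothetical helper: rejects distErrPct values outside the fraction range [0, 0.5]. */
final class DistErrPctCheck {
    static double validate(double distErrPct) {
        // NaN fails both comparisons below, so it is tested explicitly.
        if (Double.isNaN(distErrPct) || distErrPct < 0.0 || distErrPct > 0.5) {
            throw new IllegalArgumentException(
                "distErrPct must be a fraction between 0 and 0.5, got " + distErrPct);
        }
        return distErrPct;
    }
}
```

Failing fast here would also catch the mistake the issue title describes, where someone passes 2.5 meaning 2.5%.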

bq. Instead of SpatialArgs.distErrPct defaulting to 0.025 it defaults to null.
Now the Strategy's own distErrPct (which defaults to 0.025) is supplied to
args.resolveDistErr(...) so it can see if the args overrides the one in
strategy or not.

If I understand correctly, your motivation for doing this is so in the default
scenario (when no pct is defined) you have the same value at both index time
and query time, correct? I'm starting to wonder whether it makes sense to allow
the value to be set per request. Having the same value at both index and query
time seems ideal so perhaps we should force the value, whether it be the pct or
absolute value, be provided at construction of the Strategy.

bq. SpatialArgs gains a distErr field, parsed from SpatialArgsParser. This is
an alternative means that a search request can specify the distance in a more
direct way.

So can the user now provide either the distErr or distErrPct and if they
provide the latter, it is converted to the former seamlessly? Or must the user
do the conversion themselves? I'm +1 for the first option.
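
For the "seamless" option, the resolution order might be sketched as follows. This is an assumption about how it could work, not a description of the patch: DistErrResolver and its parameters are hypothetical, and shapeExtent stands in for whatever size measure of the query shape the fraction is applied to.

```java
/** Hypothetical resolver: picks the absolute distance error for one request. */
final class DistErrResolver {
    /**
     * @param distErr            absolute error from the request, or null if not given
     * @param distErrPct         fraction from the request, or null if not given
     * @param strategyDefaultPct fraction configured on the strategy (e.g. 0.025)
     * @param shapeExtent        assumed size measure of the query shape
     */
    static double resolve(Double distErr, Double distErrPct,
                          double strategyDefaultPct, double shapeExtent) {
        if (distErr != null) {
            return distErr; // an explicit absolute value wins
        }
        // Otherwise fall back from the request's fraction to the strategy's default.
        double pct = (distErrPct != null) ? distErrPct : strategyDefaultPct;
        return pct * shapeExtent;
    }
}
```

With nothing set on the request, the strategy's own fraction is used, which gives the same value at index time and query time in the default scenario.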

bq. One thing I didn't do, is move the distErrPct getter setter up from
PrefixTreeStrategy to the base SpatialStrategy.

Why would we want to move it to SpatialStrategy? It seems unrelated to the
other Strategies.

Lucene spatial's distErrPct is treated as a fraction, not a percent.
--

Key: LUCENE-4186
URL: https://issues.apache.org/jira/browse/LUCENE-4186
Project: Lucene - Core
Issue Type: Bug
Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
Priority: Critical
Fix For: 4.0

Attachments: LUCENE-4186_distErrPct_upgrade.patch

The distance-error-percent of a query shape in Lucene spatial is, in a
nutshell, the percent of the shape's area that is an error epsilon when
considering search detail at its edges. The default is 2.5%, for reference.
However, as configured, it is read in as a fraction:
{code:xml}
<fieldType name="location_2d_trie"
class="solr.SpatialRecursivePrefixTreeFieldType"
distErrPct="0.025" maxDetailDist="0.001" />
{code}


[jira] [Commented] (LUCENE-4186) Lucene spatial's distErrPct is treated as a fraction, not a percent.

2012-09-05 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448801#comment-13448801
 ] 

Chris Male commented on LUCENE-4186:


bq. It'd be nice if this could be done at index time too but I'm not sure how 
it would fit into the API. Maybe an overloaded 
createIndexableFields(shape,distErr)

I've always thought it was a little unusual createIndexableFields didn't also 
accept SpatialArgs, so why don't we change it so it does? 

 Lucene spatial's distErrPct is treated as a fraction, not a percent.
 --

 Key: LUCENE-4186
 URL: https://issues.apache.org/jira/browse/LUCENE-4186
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
Priority: Critical
 Fix For: 4.0

 Attachments: LUCENE-4186_distErrPct_upgrade.patch


 The distance-error-percent of a query shape in Lucene spatial is, in a 
 nutshell, the percent of the shape's area that is an error epsilon when 
 considering search detail at its edges.  The default is 2.5%, for reference.  
 However, as configured, it is read in as a fraction:
 {code:xml}
 <fieldType name="location_2d_trie" 
 class="solr.SpatialRecursivePrefixTreeFieldType"
 distErrPct="0.025" maxDetailDist="0.001" />
 {code}


[jira] [Commented] (LUCENE-4365) The Maven build can't directly handle complex inter-module dependencies involving the test-framework modules

2012-09-05 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449400#comment-13449400
 ] 

Chris Male commented on LUCENE-4365:


Great work improving this Steven, what a mess!

 The Maven build can't directly handle complex inter-module dependencies 
 involving the test-framework modules
 

 Key: LUCENE-4365
 URL: https://issues.apache.org/jira/browse/LUCENE-4365
 Project: Lucene - Core
  Issue Type: Improvement
  Components: general/build
Reporter: Steven Rowe
Assignee: Steven Rowe
Priority: Minor
 Attachments: LUCENE-4365.patch, 
 lucene.solr.cyclic.dependencies.removed.png, 
 lucene.solr.dependency.cycles.png.jpg


 The Maven dependency model disallows cyclic dependencies, of which there are 
 now several in the Ant build (considering test and compile dependencies 
 together, as Maven does).  All of these cycles involve either the Lucene 
 test-framework or the Solr test-framework.
 The current Maven build works around this problem by incorporating 
 dependencies' sources into dependent modules' test sources, rather than 
 literally declaring the problematic dependencies as such. (See SOLR-3780 for 
 a recent example of putting this workaround in place for the Solrj module.)  
 But with the factoring out of the Lucene Codecs module, upon which Lucene 
 test-framework has a compile-time dependency, the complexity of the 
 workarounds required to make it all hang together is great enough that I want 
 to attempt a (Maven-build-only) module refactoring.  It should require fewer 
 contortions and be more maintainable.
 The Maven build is currently broken, as of the addition of the Codecs module 
 (LUCENE-4340).


[jira] [Commented] (LUCENE-4354) add validate-maven task to check maven dependencies


[ 
https://issues.apache.org/jira/browse/LUCENE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447617#comment-13447617
 ] 

Chris Male commented on LUCENE-4354:


Hamcrest is a transitive dependency of JUnit, so we'll need to exclude it 
specifically in our poms.

 add validate-maven task to check maven dependencies
 ---

 Key: LUCENE-4354
 URL: https://issues.apache.org/jira/browse/LUCENE-4354
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-4354.patch


 We had a situation where the maven artifacts depended on the wrong version of 
 tika: we should test that the maven dependencies are correct.
 An easy way to do this is to force it to download all of its dependencies, 
 and then run our existing license checks over that.
 This currently fails: maven is bringing in some extra 3rd party libraries.


[jira] [Commented] (LUCENE-4354) add validate-maven task to check maven dependencies


[ 
https://issues.apache.org/jira/browse/LUCENE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447634#comment-13447634
 ] 

Chris Male commented on LUCENE-4354:


Ignoring the scope issue, the validation has revealed valid issues.  For 
example, the jdom, rome and servlet dependencies all have versions that differ 
from those in our license files.

 add validate-maven task to check maven dependencies
 ---

 Key: LUCENE-4354
 URL: https://issues.apache.org/jira/browse/LUCENE-4354
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-4354.patch


 We had a situation where the maven artifacts depended on the wrong version of 
 tika: we should test that the maven dependencies are correct.
 An easy way to do this is to force it to download all of its dependencies, 
 and then run our existing license checks over that.
 This currently fails: maven is bringing in some extra 3rd party libraries.


[jira] [Commented] (LUCENE-4354) add validate-maven task to check maven dependencies


[ 
https://issues.apache.org/jira/browse/LUCENE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447638#comment-13447638
 ] 

Chris Male commented on LUCENE-4354:


It's not just an issue of what ends up in the war since we also publish 
individual artifacts / poms.

 add validate-maven task to check maven dependencies
 ---

 Key: LUCENE-4354
 URL: https://issues.apache.org/jira/browse/LUCENE-4354
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-4354.patch


 We had a situation where the maven artifacts depended on the wrong version of 
 tika: we should test that the maven dependencies are correct.
 An easy way to do this is to force it to download all of its dependencies, 
 and then run our existing license checks over that.
 This currently fails: maven is bringing in some extra 3rd party libraries.


[jira] [Commented] (LUCENE-4354) add validate-maven task to check maven dependencies


[ 
https://issues.apache.org/jira/browse/LUCENE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447714#comment-13447714
 ] 

Chris Male commented on LUCENE-4354:


Yeah I think tests should be catching them.  Do you have any examples?

 add validate-maven task to check maven dependencies
 ---

 Key: LUCENE-4354
 URL: https://issues.apache.org/jira/browse/LUCENE-4354
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-4354-dep-fix.patch, 
 LUCENE-4354_hacked_lucene_only.patch, LUCENE-4354.patch, LUCENE-4354.patch, 
 LUCENE-4354.patch


 We had a situation where the maven artifacts depended on the wrong version of 
 tika: we should test that the maven dependencies are correct.
 An easy way to do this is to force it to download all of its dependencies, 
 and then run our existing license checks over that.
 This currently fails: maven is bringing in some extra 3rd party libraries.


[jira] [Commented] (LUCENE-4362) ban tab-indented source


[ 
https://issues.apache.org/jira/browse/LUCENE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448478#comment-13448478
 ] 

Chris Male commented on LUCENE-4362:


Well damn.

 ban tab-indented source
 ---

 Key: LUCENE-4362
 URL: https://issues.apache.org/jira/browse/LUCENE-4362
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-4362_core.patch


 This makes code really difficult to read and work with.
 It's easy enough to prevent.
 {noformat}
 Index: build.xml
 ===
 --- build.xml (revision 1380979)
 +++ build.xml (working copy)
 @@ -77,11 +77,12 @@
        <or>
          <containsregexp expression="@author\b" casesensitive="yes"/>
          <containsregexp expression="\bno(n|)commit\b" casesensitive="no"/>
 +        <containsregexp expression="\t" casesensitive="no"/>
        </or>
      </fileset>
      <map from="${validate.currDir}${file.separator}" to="*" />
    </pathconvert>
 -  <fail if="validate.patternsFound">The following files contain @author tags or nocommits:${line.separator}${validate.patternsFound}</fail>
 +  <fail if="validate.patternsFound">The following files contain @author tags, tabs or nocommits:${line.separator}${validate.patternsFound}</fail>
  </target>
 {noformat}
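
 For readers less familiar with Ant's containsregexp selector, the same three checks can be sketched in plain Java. This is an illustration of the idea only, not part of the build: SourcePatternCheck and its in-memory map of file names to contents are hypothetical, while the three regular expressions are the ones from the patch above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical checker mirroring the three expressions in the patch. */
final class SourcePatternCheck {
    private static final List<Pattern> FORBIDDEN = Arrays.asList(
        Pattern.compile("@author\\b"),                                   // case sensitive
        Pattern.compile("\\bno(n|)commit\\b", Pattern.CASE_INSENSITIVE), // nocommit / noncommit
        Pattern.compile("\\t"));                                         // any tab character

    /** Returns the names of the sources that contain at least one forbidden pattern. */
    static List<String> offending(Map<String, String> sourcesByName) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, String> e : sourcesByName.entrySet()) {
            for (Pattern p : FORBIDDEN) {
                if (p.matcher(e.getValue()).find()) {
                    bad.add(e.getKey());
                    break; // one hit is enough to report the file
                }
            }
        }
        return bad;
    }
}
```

 Note that the tab expression matches a tab anywhere in a file, not only in indentation, so a tab inside a string literal would be reported as well.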


[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-09-02 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446962#comment-13446962
 ] 

Chris Male commented on LUCENE-3312:


Thanks Uwe and Nikola!

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: 5.0

 Attachments: LUCENE-3312-DocumentIterators-uwe.patch, 
 lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, 
 lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, 
 lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, 
 lucene-3312-patch-09.patch, lucene-3312-patch-10.patch, 
 lucene-3312-patch-11.patch, lucene-3312-patch-12a.patch, 
 lucene-3312-patch-12.patch, lucene-3312-patch-13.patch, 
 lucene-3312-patch-14.patch, LUCENE-3312-reintegration.patch, 
 LUCENE-3312-reintegration.patch


 In the field type branch we have strongly decoupled
 Document/Field/FieldType impl from the indexer, by having only a
 narrow API (IndexableField) passed to IndexWriter.  This frees apps up
 to use their own documents instead of the user-space impls we provide
 in oal.document.
 Similarly, with LUCENE-3309, we've done the same thing on the
 doc/field retrieval side (from IndexReader), with the
 StoredFieldsVisitor.
 But, maybe we should break out StorableField from IndexableField,
 such that when you index a doc you provide two Iterables -- one for the
 IndexableFields and one for the StorableFields.  Either can be null.
 One downside is a possible perf hit for fields that are both indexed and 
 stored (ie, we visit them twice, lookup their name in a hash twice,
 etc.).  But the upside is a cleaner separation of concerns in the API.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-08-31 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445842#comment-13445842
]

Chris Male commented on LUCENE-3312:

I've thought about this a little bit.

{quote}
To me storing needs no 'type' information at all: But I guess the problem with
that is that we need DocValues types since DocValues are stored fields here.
{quote}

We've gone back and forth on this a lot since the Fields cleanup began
but it would be nice to actually have the DocValues Types on the StorableField
itself rather than on StorableFieldType. In the end the type is related to the
type of the value itself, not disconnected metadata. Having it this way would
also alleviate the need for StorableFieldType and make storing values as simple
as possible.

{quote}
This basically is the same problem all over again.
* You make a Document with N StorableFields
* You call IR.document and get a StorableDocument back, with N-3 StorableFields.
* You wonder: what happened to the other 3 fields?

They were DocValues.
{quote}

What if they were returned? Because you're absolutely right, it seems odd for
DocValues Fields to be StorableFields and then not accessible like all other
StorableFields. So what if we changed how IR.document worked so you could pull
DocValues Fields too? Is that something users might want?

Break out StorableField from IndexableField
---

Key: LUCENE-3312
URL: https://issues.apache.org/jira/browse/LUCENE-3312
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
Labels: gsoc2012, lucene-gsoc-12
Fix For: Field Type branch

Attachments: LUCENE-3312-DocumentIterators-uwe.patch,
lucene-3312-patch-01.patch, lucene-3312-patch-02.patch,
lucene-3312-patch-03.patch, lucene-3312-patch-04.patch,
lucene-3312-patch-05.patch, lucene-3312-patch-06.patch,
lucene-3312-patch-07.patch, lucene-3312-patch-08.patch,
lucene-3312-patch-09.patch, lucene-3312-patch-10.patch,
lucene-3312-patch-11.patch, lucene-3312-patch-12a.patch,
lucene-3312-patch-12.patch, lucene-3312-patch-13.patch,
lucene-3312-patch-14.patch, LUCENE-3312-reintegration.patch


[jira] [Commented] (SOLR-3700) Create a Classification component

2012-08-30 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444841#comment-13444841
 ] 

Chris Male commented on SOLR-3700:
--

bq. Is there any reason not to develop it as a Lucene module? I haven't looked 
at the patch, but if it's not Solr-specific, or depends on Solr API, perhaps we 
can make this issue a LUCENE- one?

+1

 Create a Classification component
 -

 Key: SOLR-3700
 URL: https://issues.apache.org/jira/browse/SOLR-3700
 Project: Solr
  Issue Type: New Feature
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
 Attachments: SOLR-3700_2.patch, SOLR-3700.patch


 Lucene/Solr can host huge sets of documents containing lots of information in 
 fields so that these can be used as training examples (w/ features) in order 
 to very quickly create classifier algorithms to use on new documents and / 
 or to provide an additional service.
 So the idea is to create a contrib module (called 'classification') to host a 
 ClassificationComponent that will use already seen data (the indexed 
 documents / fields) to classify new documents / text fragments.
 The first version will contain a (simplistic) Lucene based Naive Bayes 
 classifier but more implementations should be added in the future.


[jira] [Commented] (LUCENE-4343) Clear up more Tokenizer.setReader/TokenStream.reset issues

2012-08-30 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444908#comment-13444908
]

Chris Male commented on LUCENE-4343:

+1 to the improvements and pursuing making it final.

Clear up more Tokenizer.setReader/TokenStream.reset issues
--

Key: LUCENE-4343
URL: https://issues.apache.org/jira/browse/LUCENE-4343
Project: Lucene - Core
Issue Type: Task
Components: modules/analysis
Reporter: Robert Muir
Attachments: LUCENE-4343.patch

spinoff from user-list thread.
I think the rename helps, but the javadocs still have problems: they seem to
only describe a totally wacky case (CachingTokenFilter) and not the normal
case.
Ideally setReader would be final I think, but there are a few crazy
tokenstreams to fix before I could make that work. Would also need something
hackish so MockTokenizer's state machine is still functional.
But I worked on fixing up the mess in our various tokenstreams, which is easy
for the most part.
As part of this I found it was really useful in flushing out test bugs (ones
that don't use MockTokenizer, which they really should), if we can do some
best-effort exceptions when the consumer is broken and it costs nothing.
For example:
{noformat}
- private int offset = 0, bufferIndex = 0, dataLen = 0, finalOffset = 0;
+ // note: bufferIndex is -1 here to best-effort AIOOBE consumers that don't call reset()
+ private int offset = 0, bufferIndex = -1, dataLen = 0, finalOffset = 0;
{noformat}
I think this is worth exploring more... this was really effective at finding
broken tests etc. We should see if we can be more thorough/ideally throw
better exceptions when consumers are broken and it's free.
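
A rough sketch of why the -1 start value works as a free, best-effort check. SentinelCharSource is a hypothetical stand-in, not one of the Lucene tokenizers, and only the sentinel idea is taken from the excerpt above. A consumer that skips reset() hits an ArrayIndexOutOfBoundsException on the first read instead of silently seeing empty input:

```java
// Hypothetical character source; only the -1 sentinel idea comes from the excerpt.
class SentinelCharSource {
  private final char[] buffer = new char[16];
  private int bufferIndex = -1; // deliberately invalid until reset() is called
  private int dataLen = 0;

  // Loads new input (truncated to the buffer size) and makes the source usable.
  void reset(String input) {
    dataLen = Math.min(input.length(), buffer.length);
    input.getChars(0, dataLen, buffer, 0);
    bufferIndex = 0;
  }

  // Returns the next char, or -1 at end of input.
  int next() {
    if (bufferIndex >= dataLen) {
      return -1;
    }
    return buffer[bufferIndex++]; // an index of -1 here means reset() was skipped
  }

  // True if reading without reset() fails fast rather than looking like empty input.
  static boolean failsWithoutReset() {
    try {
      new SentinelCharSource().next();
      return false;
    } catch (ArrayIndexOutOfBoundsException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(failsWithoutReset());
  }
}
```

The check costs nothing on the normal path because no extra branch is added; the array access itself does the work.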


[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-08-25 Thread Chris Male (JIRA)

Chris Male commented on LUCENE-3312:

Break out StorableField from IndexableField

bq. Apply only to trunk (5.0) - so it has more time to bake? I think this change would be too big for Lucene 4.0 - and too late??

+1 to 5.0 only.  It's another big change to the Document/Field API that we may want to evolve more as it bakes and early adopters begin to use it.

bq. Are there any other things to change? One open point is StorableFieldType.

StorableFieldType seems like the only thing at this stage that needs to be addressed.


[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-08-25 Thread Chris Male (JIRA)

Chris Male commented on LUCENE-3312:

Break out StorableField from IndexableField

+1


[jira] [Commented] (LUCENE-4197) Small improvements to Lucene Spatial Module for v4

2012-08-18 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437458#comment-13437458
 ] 

Chris Male commented on LUCENE-4197:


+1

 Small improvements to Lucene Spatial Module for v4
 --

 Key: LUCENE-4197
 URL: https://issues.apache.org/jira/browse/LUCENE-4197
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spatial
Reporter: David Smiley
 Fix For: 4.0

 Attachments: LUCENE-4197_rename_CachedDistanceValueSource.patch, 
 LUCENE-4197_SpatialArgs_doesn_t_need_overloaded_toString()_with_a_ctx_param_.patch,
  SpatialArgs-_remove_unused_min_and_max_params.patch


 This issue is to capture small changes to the Lucene spatial module that 
 don't deserve their own issue.


[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-08-17 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13436755#comment-13436755
 ] 

Chris Male commented on LUCENE-3312:


We definitely need to clean up the StorableFieldType situation, but I think we can
tackle that afterwards.  I think it's best to ensure what we have now works and
we're comfortable with the API.

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, 
 lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, 
 lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, 
 lucene-3312-patch-09.patch, lucene-3312-patch-10.patch, 
 lucene-3312-patch-11.patch, lucene-3312-patch-12a.patch, 
 lucene-3312-patch-12.patch, lucene-3312-patch-13.patch



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-08-15 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13434970#comment-13434970
 ] 

Chris Male commented on LUCENE-3312:


Is it going to be possible to get the IndexableFieldType vs StorableFieldType
situation resolved before this lands? I can assist if that would help.

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, 
 lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, 
 lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, 
 lucene-3312-patch-09.patch, lucene-3312-patch-10.patch, 
 lucene-3312-patch-11.patch, lucene-3312-patch-12a.patch, 
 lucene-3312-patch-12.patch



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField


[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431719#comment-13431719
 ] 

Chris Male commented on LUCENE-3312:


Hey Nikola,

bq. except for mentioned TestQualityRun.testTrecQuality.

I'm happy to help work out what is going wrong here. Have you done any
debugging of the test yourself? What have you worked out so far?

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, 
 lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, 
 lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, 
 lucene-3312-patch-09.patch



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField


[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431828#comment-13431828
 ] 

Chris Male commented on LUCENE-3312:


Wow, I have replicated the same behaviour.  On the branch the number of fields 
per doc is... wow.

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, 
 lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, 
 lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, 
 lucene-3312-patch-09.patch



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField


[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431833#comment-13431833
 ] 

Chris Male commented on LUCENE-3312:


Ah, I think I found the problem: it's in Document. I'll verify in a few seconds.

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, 
 lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, 
 lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, 
 lucene-3312-patch-09.patch



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField


[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431839#comment-13431839
 ] 

Chris Male commented on LUCENE-3312:


Yup found it.  

The problem is that, in the branch, {{Document#getFields()}} creates a new List
on every call, while {{DocMaker}} in the benchmark module pulls the fields and
clears them (using {{clear()}}).  Since a new List is created each time, it is
that new List that gets cleared rather than the Document's actual fields.
Hence each iteration just adds more fields without the previous ones being
cleared.
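
To make the failure mode concrete, here is a small stand-alone sketch. CopyingDocument and CopyingDocumentDemo are hypothetical stand-ins, not the real Document or DocMaker classes, and plain strings replace real fields. The caller clears the list it was handed, but that list is a copy, so the document keeps growing on every reuse:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the branch's Document; not the real Lucene class.
class CopyingDocument {
  private final List<String> fields = new ArrayList<>();

  void add(String field) {
    fields.add(field);
  }

  // Returns a fresh copy on every call, as described for the branch.
  List<String> getFields() {
    return new ArrayList<>(fields);
  }

  int size() {
    return fields.size();
  }
}

class CopyingDocumentDemo {
  // Mimics a caller that reuses one document: clear it, then add two fields.
  static int fieldsAfter(int iterations) {
    CopyingDocument doc = new CopyingDocument();
    for (int i = 0; i < iterations; i++) {
      doc.getFields().clear(); // clears the copy, not the document
      doc.add("title");
      doc.add("body");
    }
    return doc.size();
  }

  public static void main(String[] args) {
    // Two fields are added per iteration and none are ever removed.
    System.out.println(fieldsAfter(3));
  }
}
```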

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, 
 lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, 
 lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, 
 lucene-3312-patch-09.patch



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField


[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431841#comment-13431841
 ] 

Chris Male commented on LUCENE-3312:


Nikola, we should probably move all of Document's methods over to just working 
with Field (and not IndexableField).  I don't mind if we want to make 
getFields() return an immutable list but we then need to provide a clear() 
method so people can reuse Document instances.
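
A sketch of what that combination could look like, assuming nothing about the final patch. ReusableDocument is a hypothetical name and strings stand in for Field objects. getFields() hands out a read-only view of the live list, and clear() is the one supported way to reset an instance for reuse:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; the names follow the discussion, not the committed code.
class ReusableDocument {
  private final List<String> fields = new ArrayList<>();

  void add(String field) {
    fields.add(field);
  }

  // Read-only view over the live list: no copy is made and callers cannot mutate it.
  List<String> getFields() {
    return Collections.unmodifiableList(fields);
  }

  // Explicit reset so one instance can be reused for the next document.
  void clear() {
    fields.clear();
  }
}

class ReusableDocumentDemo {
  // Reuses one document across iterations, clearing it through the document itself.
  static int fieldsAfter(int iterations) {
    ReusableDocument doc = new ReusableDocument();
    for (int i = 0; i < iterations; i++) {
      doc.clear();
      doc.add("title");
      doc.add("body");
    }
    return doc.getFields().size();
  }

  // True if clearing through the returned view is rejected.
  static boolean viewRejectsClear() {
    ReusableDocument doc = new ReusableDocument();
    doc.add("title");
    try {
      doc.getFields().clear();
      return false;
    } catch (UnsupportedOperationException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(fieldsAfter(3) + " fields, view rejects clear: " + viewRejectsClear());
  }
}
```

With this shape a caller that still tries to clear the returned list fails loudly instead of silently accumulating fields.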

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, 
 lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, 
 lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, 
 lucene-3312-patch-09.patch



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField


[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431849#comment-13431849
 ] 

Chris Male commented on LUCENE-3312:


Yeah we definitely shouldn't return a new list.  I think the immutable list and 
Document.clear() combo will suffice.

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, 
 lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, 
 lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, 
 lucene-3312-patch-09.patch



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField


[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431851#comment-13431851
 ] 

Chris Male commented on LUCENE-3312:


Oh we should also include a unit test that verifies this behaviour.

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, 
 lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, 
 lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, 
 lucene-3312-patch-09.patch



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

[
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431867#comment-13431867
]

Chris Male commented on LUCENE-3312:

Nikola,

On a note totally unrelated to the bug, I noticed that StorableField
still returns an IndexableFieldType for type(). This led me to GeneralField.
I don't think we need this. IndexableField should only need name(),
tokenStream() and type(). StorableField needs name(), type() and the various
xyzValue() accessors. Its type() should be a StorableFieldType and some of the
functionality from IndexableFieldType should go there.
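
Roughly the split being suggested, as an illustrative sketch only. Every type below is a hypothetical stand-in with a Sketch prefix, the members of the two type interfaces are invented for the example, and an Iterable of strings replaces a real TokenStream. One thing the sketch brings out: a field that is both indexed and stored can still implement both interfaces without a shared GeneralField, as long as its concrete type() result satisfies both type interfaces:

```java
import java.util.Arrays;

// Illustrative stand-ins only; none of these are Lucene's real interfaces.
interface SketchIndexableType {
  boolean tokenized();
}

interface SketchStorableType {
  String docValueType(); // null when the field carries no DocValues
}

interface SketchIndexableField {
  String name();
  SketchIndexableType type();
  Iterable<String> tokenStream(); // placeholder for a real TokenStream
}

interface SketchStorableField {
  String name();
  SketchStorableType type();
  String stringValue(); // one of the "xyzValue()" accessors
}

// One type object can serve both sides.
final class SketchFieldType implements SketchIndexableType, SketchStorableType {
  public boolean tokenized() {
    return true;
  }

  public String docValueType() {
    return null;
  }
}

// Indexed and stored: the covariant return type of type() satisfies both interfaces.
final class SketchTextField implements SketchIndexableField, SketchStorableField {
  private final String name;
  private final String value;
  private final SketchFieldType type = new SketchFieldType();

  SketchTextField(String name, String value) {
    this.name = name;
    this.value = value;
  }

  public String name() {
    return name;
  }

  public SketchFieldType type() {
    return type;
  }

  public Iterable<String> tokenStream() {
    return Arrays.asList(value.split(" "));
  }

  public String stringValue() {
    return value;
  }
}
```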

Break out StorableField from IndexableField
---

Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch,
lucene-3312-patch-03.patch, lucene-3312-patch-04.patch,
lucene-3312-patch-05.patch, lucene-3312-patch-06.patch,
lucene-3312-patch-07.patch, lucene-3312-patch-08.patch,
lucene-3312-patch-09.patch


[jira] [Comment Edited] (LUCENE-3312) Break out StorableField from IndexableField

[
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431867#comment-13431867
]

Chris Male edited comment on LUCENE-3312 at 8/9/12 2:51 PM:

Nikola,

On a note totally unrelated to the bug, I noticed that StorableField still
returns an IndexableFieldType for type(). This led me to GeneralField. I
don't think we need this. IndexableField should only need name(),
tokenStream() and type(). StorableField needs name(), type() and the various
xyzValue() accessors. Its type() should be a StorableFieldType and some of the
functionality from IndexableFieldType should go there.

was (Author: cmale):
Nikola,

Break out StorableField from IndexableField
---


[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField


[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431880#comment-13431880
 ] 

Chris Male commented on LUCENE-3312:


{code}
public final List<Field> getFields() {
  return Collections.unmodifiableList(fields);
}
{code}

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, 
 lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, 
 lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, 
 lucene-3312-patch-09.patch




[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions

2012-08-04 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428583#comment-13428583
 ] 

Chris Male commented on LUCENE-3616:


With all the various typed XYZField implementations we have now, what do we see 
as the role of Field? Is it just serving as a parent class to the 
implementations or do we expect users will be using it too?

 Illegal Field Configurations should throw exceptions
 

 Key: LUCENE-3616
 URL: https://issues.apache.org/jira/browse/LUCENE-3616
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.0-ALPHA
Reporter: Grant Ingersoll
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-3616.patch


 When working on LUCENE-3615, I came across:
 {quote}
 java.lang.IllegalArgumentException: field field is stored but does not have 
 binaryValue, stringValue nor numericValue
   at 
 org.apache.lucene.index.codecs.DefaultStoredFieldsWriter.writeField(DefaultStoredFieldsWriter.java:177)
   at 
 org.apache.lucene.index.StoredFieldsConsumer.finishDocument(StoredFieldsConsumer.java:119)
   at 
 org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:295)
   at 
 org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:255)
   at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
   at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1480)
   at 
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1242)
   at 
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1223)
   at org.apache.lucene.index.Test2BTerms.test2BTerms(Test2BTerms.java:194)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
   at 
 org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:525)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
   at 
 org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:168)
   at 
 org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:47)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
   at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
   at 
 com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:71)
   at 
 com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:199)
   at 
 com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:62)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
 {quote}
 which is due to using TextField.TYPE_STORED with a TokenStream.  
 Since this is an illegal combination, we should throw an exception upon 
 construction of the Field, not later when actually trying to do the indexing.
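
The fail-fast idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchField` and `SketchFieldType` are hypothetical stand-ins, not Lucene's `Field`/`FieldType`, and a plain `Reader` stands in for a `TokenStream`.

```java
import java.io.Reader;
import java.io.StringReader;

// Hypothetical, simplified stand-ins. These are NOT Lucene's Field/FieldType classes.
final class SketchFieldType {
    final boolean indexed;
    final boolean stored;

    SketchFieldType(boolean indexed, boolean stored) {
        this.indexed = indexed;
        this.stored = stored;
    }
}

final class SketchField {
    final String name;
    final SketchFieldType type;
    final Reader tokens; // stands in for a TokenStream

    // A field backed only by a token source has no value that could be stored,
    // so the illegal combination is rejected here instead of at indexing time.
    SketchField(String name, Reader tokens, SketchFieldType type) {
        if (name == null || tokens == null || type == null) {
            throw new IllegalArgumentException("name, tokens and type must not be null");
        }
        if (type.stored) {
            throw new IllegalArgumentException(
                "field \"" + name + "\" is stored but only has a token source");
        }
        this.name = name;
        this.tokens = tokens;
        this.type = type;
    }
}

class EagerValidationDemo {
    public static void main(String[] args) {
        SketchFieldType storedType = new SketchFieldType(true, true);
        try {
            new SketchField("body", new StringReader("some text"), storedType);
        } catch (IllegalArgumentException e) {
            // The caller sees the problem where the field is created.
            System.out.println("rejected at construction: " + e.getMessage());
        }
    }
}
```

With a check like this, the caller gets the exception at the line that builds the field, rather than a deep stack trace out of the indexing chain.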


[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions

2012-08-04 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428594#comment-13428594
 ] 

Chris Male commented on LUCENE-3616:


bq. In my opinion if i have a ShortDocValuesField, it shouldnt have a setReader 
method

Agreed.  The setABC() methods are extremely confusing and add another level of 
validation (using your example, we have to validate that you're not setting a 
Reader on a NumericField).

Perhaps we can re-arrange this a little.  If we genuinely feel that there are 
use cases out there that we haven't covered with the typed impls and that we 
don't want to cover, then why not make a GenericField or something, which is 
abstract and accepts just name, FieldType and maybe an Object value?  We can 
then emphasize in documentation that it is expert only, should only be 
subclassed in the extremely rare situations that our typed impls are 
insufficient, and won't be validated, so it's a buyer-beware kind of thing.  

We can then gut Field down to a very simple abstract class / interface, and 
promote our typed impls to being 1st class and the recommended entry points for 
users.

Of course if we feel we have provided adequate support through the typed impls, 
then we can skip straight to the gutting.
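
A rough sketch of how that arrangement might look, purely for discussion. Every name here (`MinimalField`, `GenericField`, `SketchIntField`, `GeoHashField`) is hypothetical, and a plain `String` type name stands in for a FieldType.

```java
// Hypothetical sketch of the proposed split; none of these are Lucene classes.
interface MinimalField {
    String name();
}

// Expert-only escape hatch: accepts any value and performs no validation.
abstract class GenericField implements MinimalField {
    private final String name;
    private final String typeName; // stands in for a FieldType
    private final Object value;

    protected GenericField(String name, String typeName, Object value) {
        this.name = name;
        this.typeName = typeName;
        this.value = value;
    }

    public String name() { return name; }
    public String typeName() { return typeName; }
    public Object value() { return value; }
}

// First-class typed impl: the constructor fixes the value type, so there are
// no setABC() methods whose arguments would need validating.
final class SketchIntField implements MinimalField {
    private final String name;
    private final int value;

    SketchIntField(String name, int value) {
        if (name == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        this.name = name;
        this.value = value;
    }

    public String name() { return name; }
    public int intValue() { return value; }
}

// What an expert user might write for a case the typed impls do not cover.
final class GeoHashField extends GenericField {
    GeoHashField(String name, String geoHash) {
        super(name, "geohash", geoHash);
    }
}

class TypedFieldDemo {
    public static void main(String[] args) {
        SketchIntField age = new SketchIntField("age", 42);
        GenericField location = new GeoHashField("loc", "u4pruyd");
        System.out.println(age.name() + "=" + age.intValue());
        System.out.println(location.name() + "=" + location.value());
    }
}
```

The typed class exposes only the accessor that makes sense for it, while the generic one leaves all responsibility with the subclass author.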

 Illegal Field Configurations should throw exceptions
 

 Key: LUCENE-3616
 URL: https://issues.apache.org/jira/browse/LUCENE-3616
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.0-ALPHA
Reporter: Grant Ingersoll
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-3616.patch



[jira] [Updated] (LUCENE-4256) Improve Analysis Factory configuration workflow

2012-08-02 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Chris Male updated LUCENE-4256:
---

Attachment: LUCENE-4256-version.patch

Going to do this in smaller steps so they are easier to review and be sure
about.

This patch moves the Version back into the args Map.

Once this is committed I'll tackle the constructor stuff.

Improve Analysis Factory configuration workflow
---

Key: LUCENE-4256
URL: https://issues.apache.org/jira/browse/LUCENE-4256
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Reporter: Chris Male
Attachments: LUCENE-4256-further.patch, LUCENE-4256-version.patch,
LUCENE-4256_incomplete.patch

With the Factories now available for more general use, I'd like to look at
ways to improve the configuration workflow. Currently it's a little disjointed
and confusing, especially around using {{inform(ResourceLoader)}}.
What I think we should do is:
- Remove the need for {{ResourceLoaderAware}} and pass in the ResourceLoader
in {{init}}, so it'd become {{init(Map<String, String> args, ResourceLoader
loader)}}
- Consider moving away from the generic args Map and using setters. This
gives us better typing and could mitigate bugs due to using the wrong
configuration key. However it does force the consumer to invoke each setter.
- If we're going to stick with using the args Map, then move the Version
parameter into {{init}} as well, rather than being a setter as I currently
made it.
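
To make the first and third bullets concrete, here is a minimal sketch of a factory that receives its args Map (including the version) and the loader in one `init` call. It is an assumption-laden illustration: `SketchResourceLoader` and `SketchStopFilterFactory` are hypothetical, and the `luceneMatchVersion` and `words` keys are used only as example argument names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the ResourceLoader discussed above; not Lucene's interface.
interface SketchResourceLoader {
    InputStream openResource(String name) throws IOException;
}

// Hypothetical factory that gets everything it needs in a single init call,
// so no separate inform(ResourceLoader) step is required.
class SketchStopFilterFactory {
    private String matchVersion;
    private final List<String> stopWords = new ArrayList<>();

    void init(Map<String, String> args, SketchResourceLoader loader) throws IOException {
        // The version travels in the args Map rather than through a setter.
        matchVersion = args.get("luceneMatchVersion");
        if (matchVersion == null) {
            throw new IllegalArgumentException("missing required argument: luceneMatchVersion");
        }
        String wordsFile = args.get("words");
        if (wordsFile == null) {
            return; // no word list configured
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(loader.openResource(wordsFile), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    stopWords.add(word);
                }
            }
        }
    }

    String matchVersion() { return matchVersion; }
    List<String> stopWords() { return stopWords; }
}

class FactoryInitDemo {
    public static void main(String[] args) throws IOException {
        Map<String, String> config = new HashMap<>();
        config.put("luceneMatchVersion", "4.0");
        config.put("words", "stopwords.txt");

        // In-memory loader, used only to keep the demo self-contained.
        SketchResourceLoader loader = name ->
            new ByteArrayInputStream("the\nand\n".getBytes(StandardCharsets.UTF_8));

        SketchStopFilterFactory factory = new SketchStopFilterFactory();
        factory.init(config, loader);
        System.out.println(factory.stopWords());
    }
}
```

The setter-based alternative from the second bullet would replace the string keys with typed methods, at the cost of the caller having to invoke each one.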


Re: Output class folders (Eclipse).

2012-08-01 Thread Chris Male

Couldn't we classpath scan ourselves and detect services rather than
generating the file?

On Wed, Aug 1, 2012 at 10:13 PM, Uwe Schindler u...@thetaphi.de wrote:

 That's Kohsuke's annotation processor:
 http://weblogs.java.net/blog/kohsuke/archive/2009/03/my_project_of_t.html

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


  -Original Message-
  From: Uwe Schindler [mailto:u...@thetaphi.de]
  Sent: Wednesday, August 01, 2012 12:07 PM
  To: dev@lucene.apache.org
  Subject: RE: Output class folders (Eclipse).
 
  That is unfortunately needed because of SPI. The problem is: If Eclipse
  copies all files to one folder, then all META-INF/services/ files that are
  in more than one module will overwrite each other. This is only solvable
  maybe at a later stage, when we will create META-INF files using a javac
  annotation processor (in that case, Eclipse would create one merged
  META-INF file for all modules).
 
  I have not yet opened an issue, but the SPI file creation for analyzers is
  error prone (it's easy to miss a new factory), so the idea is to automate
  this. Either:
  - Each codec or analyzer factory gets a compile-only annotation and a javac
  annotation processor will put the class name into META-INF (that would
  solve the Eclipse issue). There are 2 packages available that can do this
  (itself loaded by SPI into javac/eclipse, haha): One from Jenkins' Kohsuke
  and another one. The downside is: You have to mark each
  Codec/PostingsFormat/Analysis Factory with an annotation: @Provider or
  similar
  - The 2nd option: Use ASM to find all classes in output folder that extend
  a base class (e.g., Codec) or interface and generate
  META-INF/services/oal.Codec files. Downside: Would not work with Eclipse at
  all, as it must be run by ANT.
 
  Uwe
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
   -Original Message-
   From: Dawid Weiss [mailto:dawid.we...@gmail.com]
   Sent: Wednesday, August 01, 2012 11:56 AM
   To: dev@lucene.apache.org
   Subject: Output class folders (Eclipse).
  
   The template file separates output folders for class files into many
   bin.* folders at the root level:
  
   output=bin.analysis-kuromoji
  
   Is this intentional? It's annoying, I'd rather move it under one
   folder or even put it into a single folder (since at Eclipse level
   there's no distinction of modules anyway).
  
   Dawid
  




-- 
Chris Male | Open Source Search Developer | elasticsearch |
www.elasticsearch.com

[jira] [Commented] (LUCENE-4271) Solr LocalParams for Lucene Query Parser

2012-07-31 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425627#comment-13425627
 ] 

Chris Male commented on LUCENE-4271:


Is the intention to mandate the !var syntax too? That seems like a pretty Solr 
specific thing (being able to delegate the parsing to a QParser) but I can 
imagine someone just wanting a map of values, e.g.

{code}
{arg_1=val_1 arg_2=val_2} word
{code}
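
For illustration, parsing such a plain map of values could look roughly like the sketch below. It is not the Solr LocalParams parser: `LocalParamsSketch` and `ParsedQuery` are hypothetical names, and the sketch deliberately handles only whitespace-separated `key=value` pairs (no `!type` prefix, no quoted values).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical result holder: the parsed parameters plus the rest of the query string.
final class ParsedQuery {
    final Map<String, String> params;
    final String remainder;

    ParsedQuery(Map<String, String> params, String remainder) {
        this.params = params;
        this.remainder = remainder;
    }
}

// Hypothetical helper, not part of Lucene or Solr.
final class LocalParamsSketch {
    private LocalParamsSketch() {}

    static ParsedQuery parse(String input) {
        Map<String, String> params = new LinkedHashMap<>();
        String trimmed = input.trim();
        if (!trimmed.startsWith("{")) {
            return new ParsedQuery(params, trimmed); // no parameter block present
        }
        int close = trimmed.indexOf('}');
        if (close < 0) {
            throw new IllegalArgumentException("unterminated parameter block");
        }
        String block = trimmed.substring(1, close).trim();
        if (!block.isEmpty()) {
            for (String pair : block.split("\\s+")) {
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("expected key=value but got: " + pair);
                }
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return new ParsedQuery(params, trimmed.substring(close + 1).trim());
    }
}

class LocalParamsDemo {
    public static void main(String[] args) {
        ParsedQuery parsed = LocalParamsSketch.parse("{arg_1=val_1 arg_2=val_2} word");
        System.out.println(parsed.params + " | " + parsed.remainder);
    }
}
```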

 Solr LocalParams for Lucene Query Parser
 

 Key: LUCENE-4271
 URL: https://issues.apache.org/jira/browse/LUCENE-4271
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Yonik Seeley

 The Lucene QueryParser should implement Solr's LocalParams syntax directly so 
 that instead of
 {code}
 _query_:{!geodist d=10 p=20.5,30.2}
 {code}
 one could directly use
 {code}
 {!geodist d=10 p=20.5,30.2}
  {code}
 references: http://wiki.apache.org/solr/LocalParams


[jira] [Commented] (LUCENE-4271) Solr LocalParams for Lucene Query Parser

2012-07-30 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13424903#comment-13424903
 ] 

Chris Male commented on LUCENE-4271:


Will a query have a single set of params, or will each clause potentially have 
its own?

 Solr LocalParams for Lucene Query Parser
 

 Key: LUCENE-4271
 URL: https://issues.apache.org/jira/browse/LUCENE-4271
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Yonik Seeley



[jira] [Commented] (LUCENE-4268) Rename ResourceAsStreamReasourceLoader to ClasspathResourceLoader, supply simple FilesystemResourceLoader

2012-07-28 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13424322#comment-13424322
 ] 

Chris Male commented on LUCENE-4268:


+1

 Rename ResourceAsStreamReasourceLoader to ClasspathResourceLoader, supply 
 simple FilesystemResourceLoader
 -

 Key: LUCENE-4268
 URL: https://issues.apache.org/jira/browse/LUCENE-4268
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4268.patch, LUCENE-4268.patch


 We should rename the class and also fix some bugs:
 - Class/ClassLoader.getResourceAsStream() returns null when resource not 
 found (which is a Java bug in my opinion) and does not throw IOException. 
 SolrResourceLoader throws IOException, the Lucene example one should do the 
 same. This prevents NPEs everywhere.
 Improvements:
 - Add no-arg CTOR that uses context class loader instead of a given class. This 
 is more what users want. Resource names must then include package name, of 
 course.
 We should also provide a second implementation that allows resource names to 
 be full filesystem paths. I think for loading the resources like custom word 
 lists, this is the most wanted implementation. Loading of classes would be 
 delegated to ClassLoader (of course).
 I don't like ResourceLoader also supplying newInstance(), can we remove this 
 for analysis?
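
The null-to-IOException fix and the no-arg constructor described above might be sketched like this. `ClasspathLoaderSketch` is a hypothetical class written for illustration, not the actual ClasspathResourceLoader from the patch; it relies only on the standard `ClassLoader.getResourceAsStream`, which returns null when a resource is missing.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; not the class from the attached patch.
final class ClasspathLoaderSketch {
    private final ClassLoader loader;

    // No-arg constructor: use the context class loader, falling back to the
    // system class loader if the thread has none set.
    ClasspathLoaderSketch() {
        this(Thread.currentThread().getContextClassLoader());
    }

    ClasspathLoaderSketch(ClassLoader loader) {
        this.loader = (loader != null) ? loader : ClassLoader.getSystemClassLoader();
    }

    // Resource names are resolved from the classpath root, so they must
    // include the package path, e.g. "org/example/stopwords.txt".
    InputStream openResource(String resource) throws IOException {
        InputStream stream = loader.getResourceAsStream(resource);
        if (stream == null) {
            // getResourceAsStream signals "not found" with null; convert it
            // into an exception so callers never dereference a null stream.
            throw new IOException("Resource not found: " + resource);
        }
        return stream;
    }
}

class ClasspathLoaderDemo {
    public static void main(String[] args) {
        ClasspathLoaderSketch loader = new ClasspathLoaderSketch();
        try (InputStream in = loader.openResource("no/such/resource.txt")) {
            System.out.println("opened resource");
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```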


[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance

2012-07-28 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13424358#comment-13424358
]

Chris Male commented on LUCENE-4208:

Having thought about this more I think the best way forward is to just emulate
free-text queries and have a {{SpatialSimilarity}} abstraction. I'm not sure
of the exact nature of the API for this but I think there are times when using
1/x is sufficient and there are probably times when a more convoluted algorithm
fits. We should allow the consumer to control what they choose.

I think the Similarity should be given the Query Shape, the matched docID and
the current SpatialOperation as a minimum. I'd like to somehow see a way to
also pass in a pre-computed distance (for Queries that compute it as part of
their matching) and possibly the matched grid hash for anything using the
PrefixTrees. We might have to have subclasses for those, or maybe a Command or
something, I'm not sure.

Other benefits:
- We immediately open up the ability to have more complex similarity scores
based on overlap percentage or anything really.
- It is plausible that a SpatialSimilarity might use a cache of indexed Shapes
to facilitate more complex algorithms. By having this abstraction we offload
the caching from the main API.
- It is also plausible that a SpatialSimilarity instance could be misused to
cache calculated distances if the consumer so wanted.

I think we should consider whether we want SpatialSimilarities to also be given
the current IndexReader (and so be able to use it in any caches or other
lookups) or whether we want them to be IR independent.

We will also need some custom Queries to actually make use of the
SpatialSimilarity. Need to think this one through a little.
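
As a starting point for that thinking, a very reduced sketch of the abstraction follows. Everything in it is hypothetical: the interface takes only a docID and a pre-computed distance (the query Shape and SpatialOperation mentioned above are left out), and the `minDistance` clamp is an assumption added so that a distance of zero does not produce an infinite score.

```java
// Hypothetical interface; the real API would likely also receive the query
// Shape and the current SpatialOperation.
interface SpatialSimilaritySketch {
    float score(int docId, double distance);
}

// The simple 1/x case.
final class ReciprocalDistanceSimilarity implements SpatialSimilaritySketch {
    private final double minDistance;

    ReciprocalDistanceSimilarity(double minDistance) {
        if (minDistance <= 0) {
            throw new IllegalArgumentException("minDistance must be positive");
        }
        this.minDistance = minDistance;
    }

    @Override
    public float score(int docId, double distance) {
        if (distance < 0) {
            throw new IllegalArgumentException("distance must not be negative");
        }
        // Clamp so that distance == 0 yields a large finite score.
        return (float) (1.0 / Math.max(distance, minDistance));
    }
}

// An alternative a consumer might plug in: linear falloff up to a cutoff.
final class LinearDecaySimilarity implements SpatialSimilaritySketch {
    private final double maxDistance;

    LinearDecaySimilarity(double maxDistance) {
        if (maxDistance <= 0) {
            throw new IllegalArgumentException("maxDistance must be positive");
        }
        this.maxDistance = maxDistance;
    }

    @Override
    public float score(int docId, double distance) {
        if (distance < 0) {
            throw new IllegalArgumentException("distance must not be negative");
        }
        return (float) Math.max(0.0, 1.0 - distance / maxDistance);
    }
}

class SpatialSimilarityDemo {
    public static void main(String[] args) {
        SpatialSimilaritySketch reciprocal = new ReciprocalDistanceSimilarity(1e-9);
        SpatialSimilaritySketch linear = new LinearDecaySimilarity(10.0);
        System.out.println(reciprocal.score(0, 4.0));
        System.out.println(linear.score(0, 4.0));
    }
}
```

Because both implementations share one interface, a query could accept either without knowing which scoring rule the consumer picked.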

Spatial distance relevancy should use score of 1/distance
-

Key: LUCENE-4208
URL: https://issues.apache.org/jira/browse/LUCENE-4208
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: David Smiley
Fix For: 4.0


[jira] [Commented] (LUCENE-4260) Factor subPackages out of resourceloader interface

2012-07-27 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423811#comment-13423811
 ] 

Chris Male commented on LUCENE-4260:


+1

 Factor subPackages out of resourceloader interface
 --

 Key: LUCENE-4260
 URL: https://issues.apache.org/jira/browse/LUCENE-4260
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4260.patch


 From Uwe on LUCENE-4257:
 The comment about the subpackages: This should in reality not be in 
 ResourceLoader, it's too Solr-specific. It is used internally by Solr, to 
 resolve those solr. fake packages depending on the context. We should 
 remove that from the general interface and only handle it internally in 
 SolrResourceLoader.


[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance

2012-07-27 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423851#comment-13423851
]

Chris Male commented on LUCENE-4208:

I'm concerned this obfuscates the actual distance too much, making it
difficult to retrieve x again. It's not impossible but suddenly anybody
wanting to retrieve the actual distance must calculate c again.

Spatial distance relevancy should use score of 1/distance
-

Key: LUCENE-4208
URL: https://issues.apache.org/jira/browse/LUCENE-4208
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: David Smiley
Fix For: 4.0


[jira] [Comment Edited] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance

2012-07-27 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423851#comment-13423851
]

Chris Male edited comment on LUCENE-4208 at 7/27/12 1:02 PM:
-

I'm concerned this obfuscates the actual distance too much, making it difficult
to retrieve x again. It's not impossible but suddenly anybody wanting to
retrieve the actual distance must calculate c again.

was (Author: cmale):
I'm considered this obfuscates the actual distance too much, making it
difficult to retrieve x again. It's not impossible but suddenly anybody
wanting to retrieve the actual distance must calculate c again.

Spatial distance relevancy should use score of 1/distance
-

Key: LUCENE-4208
URL: https://issues.apache.org/jira/browse/LUCENE-4208
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: David Smiley
Fix For: 4.0


[jira] [Commented] (LUCENE-4257) factor the getLines in ResourceLoader to WordListLoader


[ 
https://issues.apache.org/jira/browse/LUCENE-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422869#comment-13422869
 ] 

Chris Male commented on LUCENE-4257:


Thanks for getting to this Robert, it's a good improvement.

 factor the getLines in ResourceLoader to WordListLoader
 ---

 Key: LUCENE-4257
 URL: https://issues.apache.org/jira/browse/LUCENE-4257
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-4257.patch


 This is costly to have as a mandatory method on an interface, and it's 
 unrelated to resource loading, and only the factories use it.


[jira] [Commented] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys


[ 
https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422878#comment-13422878
 ] 

Chris Male commented on LUCENE-4173:


I thought you were anti-degradation at indexing and querying? 

 Remove IgnoreIncompatibleGeometry for SpatialStrategys
 --

 Key: LUCENE-4173
 URL: https://issues.apache.org/jira/browse/LUCENE-4173
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
 Attachments: LUCENE-4173.patch


 Silently not indexing anything for a Shape is not okay.  Users should get an 
 Exception and then they can decide how to proceed.


[jira] [Updated] (LUCENE-4256) Improve Analysis Factory configuration workflow


 [ 
https://issues.apache.org/jira/browse/LUCENE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-4256:
---

Attachment: LUCENE-4256-further.patch

I took Robert's patch and extended it further, fixing the tests and adding some 
preliminary support for adding params through ResourceLoader.newInstance.  

 Improve Analysis Factory configuration workflow
 ---

 Key: LUCENE-4256
 URL: https://issues.apache.org/jira/browse/LUCENE-4256
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Chris Male
 Attachments: LUCENE-4256-further.patch, LUCENE-4256_incomplete.patch


 With the Factorys now available for more general use, I'd like to look at 
 ways to improve the configuration workflow.  Currently it's a little disjointed 
 and confusing, especially around using {{inform(ResourceLoader)}}.
 What I think we should do is:
 - Remove the need for {{ResourceLoaderAware}} and pass in the ResourceLoader 
 in {{init}}, so it'd become {{init(Map<String, String> args, ResourceLoader 
 loader)}}
 - Consider moving away from the generic args Map and using setters.  This 
 gives us better typing and could mitigate bugs due to using the wrong 
 configuration key.  However it does force the consumer to invoke each setter.
 - If we're going to stick with using the args Map, then move the Version 
 parameter into {{init}} as well, rather than keeping it as a setter as I 
 currently have it.
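
 As a rough illustration of the first bullet, here is a minimal sketch of a 
 factory that receives its args and its loader together in {{init}}.  
 Everything in it is a hypothetical stand-in written for this example 
 (FactoryInitSketch, the one-method ResourceLoader, AnalysisFactory, 
 StopWordsFactory, the "words" key); it is not the Lucene API and not the 
 attached patch.

```java
import java.util.HashMap;
import java.util.Map;

public class FactoryInitSketch {

    /** Hypothetical stand-in for a resource loader; not Lucene's actual interface. */
    public interface ResourceLoader {
        String loadText(String resourceName);
    }

    /** Hypothetical base class: args and loader arrive together, so no separate inform() step. */
    public abstract static class AnalysisFactory {
        protected Map<String, String> args;
        protected ResourceLoader loader;

        public void init(Map<String, String> args, ResourceLoader loader) {
            this.args = args;
            this.loader = loader;
        }

        /** Fails fast on a missing key instead of silently continuing. */
        protected String require(String key) {
            String value = args.get(key);
            if (value == null) {
                throw new IllegalArgumentException("Missing required argument: " + key);
            }
            return value;
        }
    }

    /** Hypothetical factory that needs both an argument and a resource. */
    public static class StopWordsFactory extends AnalysisFactory {
        public String[] stopWords;

        @Override
        public void init(Map<String, String> args, ResourceLoader loader) {
            super.init(args, loader);
            // The resource can be read immediately because the loader is already here.
            stopWords = loader.loadText(require("words")).split(",");
        }
    }

    public static void main(String[] unused) {
        Map<String, String> args = new HashMap<>();
        args.put("words", "stopwords.txt");

        StopWordsFactory factory = new StopWordsFactory();
        // A lambda stands in for a real loader and returns canned content.
        factory.init(args, name -> "a,an,the");
        System.out.println(factory.stopWords.length + " stop words loaded");
    }
}
```

 The point of the sketch is only that a single {{init}} call leaves the 
 factory fully configured; the setter-based alternative from the second 
 bullet would trade the string keys for typed methods.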


[jira] [Commented] (LUCENE-4256) Improve Analysis Factory configuration workflow


[ 
https://issues.apache.org/jira/browse/LUCENE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422941#comment-13422941
 ] 

Chris Male commented on LUCENE-4256:


Thanks Robert, I'll take your prototype and roll with it.

 Improve Analysis Factory configuration workflow
 ---

 Key: LUCENE-4256
 URL: https://issues.apache.org/jira/browse/LUCENE-4256
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Chris Male
 Attachments: LUCENE-4256_incomplete.patch


 With the Factorys now available for more general use, I'd like to look at 
 ways to improve the configuration workflow.  Currently it's a little disjointed 
 and confusing, especially around using {{inform(ResourceLoader)}}.
 What I think we should do is:
 - Remove the need for {{ResourceLoaderAware}} and pass in the ResourceLoader 
 in {{init}}, so it'd become {{init(Map<String, String> args, ResourceLoader 
 loader)}}
 - Consider moving away from the generic args Map and using setters.  This 
 gives us better typing and could mitigate bugs due to using the wrong 
 configuration key.  However it does force the consumer to invoke each setter.
 - If we're going to stick with using the args Map, then move the Version 
 parameter into {{init}} as well, rather than keeping it as a setter as I 
 currently have it.


[jira] [Commented] (LUCENE-4256) Improve Analysis Factory configuration workflow

[
https://issues.apache.org/jira/browse/LUCENE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422876#comment-13422876
]

Chris Male commented on LUCENE-4256:

{quote}
I think of it as just a way of taking String/String args mostly though.
The other stuff is actually already supported by Analyzer easily: it's just
that you have to write code to make use of it since it's strongly typed.
{quote}

Yeah, good point. I guess I was overthinking the purpose of the Factorys.

bq. Maybe we could start with this? It should be a relatively rote change.

Do you think I should put the Version back as a String in the args map, or
leave it typed?

Improve Analysis Factory configuration workflow
---

Key: LUCENE-4256
URL: https://issues.apache.org/jira/browse/LUCENE-4256
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Reporter: Chris Male


[jira] [Commented] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys


[ 
https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423073#comment-13423073
 ] 

Chris Male commented on LUCENE-4173:


I do like it, yeah.  I think it improves 'simple' Strategies like TwoDoubles.  
I'm not sure we need to define it per query and actually I don't think it needs 
to be on the Strategy interface.  Instead I think we should have it in the 
constructors of the appropriate Strategys.  That way the consumer is forced to 
decide how they want to proceed at instantiation.
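
To make the constructor idea concrete, here is a minimal sketch.  The names 
in it (StrategyConstructorSketch, Shape, Point, Circle, TwoDoublesStrategy, 
createFields) are hypothetical and written only for this example, not taken 
from the spatial module or the attached patch; the assumption is simply that 
a TwoDoubles-style strategy can index points and nothing else.

```java
public class StrategyConstructorSketch {

    /** Hypothetical shape types, just enough to show the two cases. */
    public interface Shape {}

    public static final class Point implements Shape {
        public final double x;
        public final double y;

        public Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    public static final class Circle implements Shape {
        public final Point center;
        public final double radius;

        public Circle(Point center, double radius) {
            this.center = center;
            this.radius = radius;
        }
    }

    /** Hypothetical point-only strategy; the behaviour is fixed at instantiation. */
    public static class TwoDoublesStrategy {
        private final boolean ignoreIncompatibleGeometry;

        public TwoDoublesStrategy(boolean ignoreIncompatibleGeometry) {
            this.ignoreIncompatibleGeometry = ignoreIncompatibleGeometry;
        }

        /** Returns the values to index for a shape, or an empty array if it is skipped. */
        public double[] createFields(Shape shape) {
            if (shape instanceof Point) {
                Point p = (Point) shape;
                return new double[] {p.x, p.y};
            }
            if (ignoreIncompatibleGeometry) {
                // The consumer explicitly opted in to skipping unsupported shapes.
                return new double[0];
            }
            throw new UnsupportedOperationException(
                "Cannot index shape of type " + shape.getClass().getSimpleName());
        }
    }

    public static void main(String[] unused) {
        TwoDoublesStrategy strict = new TwoDoublesStrategy(false);
        System.out.println(strict.createFields(new Point(1.0, 2.0)).length);

        TwoDoublesStrategy lenient = new TwoDoublesStrategy(true);
        System.out.println(lenient.createFields(new Circle(new Point(0, 0), 5)).length);
    }
}
```

Since the flag is a constructor argument, nothing has to be added to the 
Strategy interface or to individual queries, and a strategy that supports 
every shape would not need the flag at all.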

 Remove IgnoreIncompatibleGeometry for SpatialStrategys
 --

 Key: LUCENE-4173
 URL: https://issues.apache.org/jira/browse/LUCENE-4173
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
 Attachments: LUCENE-4173.patch


 Silently not indexing anything for a Shape is not okay.  Users should get an 
 Exception and then they can decide how to proceed.


[jira] [Commented] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys