Re: IndexWriter.rollback() logic

2009-03-18 Thread Nadav Har'El
On Mon, Feb 23, 2009, Jason Rutherglen wrote about Re: IndexWriter.rollback() logic: Howdy An, Commit means the changes are committed; there's no rollback at that point. Also, in the future please post your questions to java-dev@lucene.apache.org. Actually, An does make a good point that

Re: Make TermScorer non final

2009-03-18 Thread Simon Willnauer
Nothing different, I'm just concerned about the performance, as the SpanQuerys take about twice as long as a term query. I ran a little benchmark and found BoostingTermQuery to be 1.5 times slower than TermQuery without any payloads in the index. In some use cases this could be important especially

Re: IndexWriter.rollback() logic

2009-03-18 Thread Michael McCandless
Nadav Har'El wrote: On Mon, Feb 23, 2009, Jason Rutherglen wrote about Re: IndexWriter.rollback() logic: Howdy An, Commit means the changes are committed; there's no rollback at that point. Also, in the future please post your questions to java-dev@lucene.apache.org. Actually, An does

Re: IndexWriter.rollback() logic

2009-03-18 Thread Michael McCandless
Also, rollback is still possible after a commit as long as you're using a deletion policy that keeps more than one commit around, by opening the IndexWriter on a prior commit point. Mike Nadav Har'El wrote: On Mon, Feb 23, 2009, Jason Rutherglen wrote about Re: IndexWriter.rollback() logic:
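
A rough illustration of the point above, as a toy model only: the class and method names below are hypothetical and are not Lucene's API. It assumes a "deletion policy" that simply keeps the last N commits, and shows that an older state can be restored only while its commit point is still retained.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: hypothetical names, not Lucene's IndexWriter or IndexDeletionPolicy. */
final class CommitPointSketch {
    private final int keepLast;                                    // commits the assumed policy retains
    private final List<List<String>> commits = new ArrayList<>();  // retained commit points, oldest first
    private List<String> pending = new ArrayList<>();              // current, possibly uncommitted, state

    CommitPointSketch(int keepLast) {
        this.keepLast = keepLast;
    }

    void add(String doc) {
        pending.add(doc);
    }

    /** Records a commit point and drops the oldest ones beyond the retention limit. */
    void commit() {
        commits.add(new ArrayList<>(pending));
        while (commits.size() > keepLast) {
            commits.remove(0);
        }
    }

    /** Plain rollback: discards only the changes made since the latest commit. */
    void rollback() {
        pending = commits.isEmpty()
                ? new ArrayList<>()
                : new ArrayList<>(commits.get(commits.size() - 1));
    }

    /** Restores the working state from an older commit that is still retained. */
    void openOnCommit(int index) {
        pending = new ArrayList<>(commits.get(index));
    }

    List<String> docs() {
        return new ArrayList<>(pending);
    }

    int retainedCommits() {
        return commits.size();
    }

    public static void main(String[] args) {
        CommitPointSketch index = new CommitPointSketch(2);
        index.add("a");
        index.commit();
        index.add("b");
        index.commit();
        index.openOnCommit(0); // go back past the latest commit
        System.out.println(index.docs());
    }
}
```

With a retention limit of 1 there would be no prior commit to open, which is the case Jason's reply describes.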

Re: Make TermScorer non final

2009-03-18 Thread Grant Ingersoll
See https://issues.apache.org/jira/browse/LUCENE-1017 for some background. Have you measured BTQ versus the SpanTermQuery? Position based stuff is often slower. SpanQueries could use some performance assessments, that is for sure. Ideally, I think you should compare: TermQuery v.

[jira] Commented: (LUCENE-1522) another highlighter

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682987#action_12682987 ] Michael McCandless commented on LUCENE-1522: OK to sum up here with

[jira] Commented: (LUCENE-1522) another highlighter

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682985#action_12682985 ] Michael McCandless commented on LUCENE-1522: {quote} ANDQuery, ORQuery, and

Re: Make TermScorer non final

2009-03-18 Thread Michael McCandless
Coming from the discussions in LUCENE-1522 (improving highlighter), I think at some point we should merge Span*Query into their normal counterparts, if possible. Ie, there should be only one TermQuery that can do both what the current TermQuery does, and also what SpanTermQuery does. It's able

Re: Make TermScorer non final

2009-03-18 Thread Mark Miller
In some use cases this could be important especially where the power of a span query is not required. I think the power of a span query is required for payloads though - the term query will not hit each position to do payload loading - there is no need for a term query to enumerate positions.

Re: Make TermScorer non final

2009-03-18 Thread Simon Willnauer
On Wed, Mar 18, 2009 at 1:32 PM, Mark Miller markrmil...@gmail.com wrote: In some use cases this could be important especially where the power of a span query is not required. I think the power of a span query is required for payloads though - the term query will not hit each position to do

[jira] Commented: (LUCENE-1522) another highlighter

2009-03-18 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683030#action_12683030 ] Marvin Humphrey commented on LUCENE-1522: - I think we may need a tree-structured

[jira] Assigned: (LUCENE-1550) Add N-Gram String Matching for Spell Checking

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned LUCENE-1550: --- Assignee: Grant Ingersoll Add N-Gram String Matching for Spell Checking

[jira] Commented: (LUCENE-1522) another highlighter

2009-03-18 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683032#action_12683032 ] Mark Miller commented on LUCENE-1522: - bq. Lucene H1. Too many ellipses, and fragments

[jira] Commented: (LUCENE-1522) another highlighter

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683053#action_12683053 ] Michael McCandless commented on LUCENE-1522: {quote} Something like that. An

[jira] Assigned: (LUCENE-1145) DisjunctionSumScorer small tweak

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1145: -- Assignee: Michael McCandless DisjunctionSumScorer small tweak

GSoC 09 project ideas...

2009-03-18 Thread Zaid Md. Abdul Wahab Sheikh
Hi lucene, In this link http://wiki.apache.org/general/SummerOfCode2009 , there are no project ideas for Lucene proper. (Only ideas for Mahout listed). Please put up some ideas for Lucene there or please mention some popular open issues that might be suitable as a GSoC project. I would very much

[jira] Commented: (LUCENE-1145) DisjunctionSumScorer small tweak

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683058#action_12683058 ] Michael McCandless commented on LUCENE-1145: I plan to commit shortly.

[jira] Commented: (LUCENE-1522) another highlighter

2009-03-18 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683064#action_12683064 ] Marvin Humphrey commented on LUCENE-1522: - OK, it sounds like one can simply use

Re: GSoC 09 project ideas...

2009-03-18 Thread Jason Rutherglen
Hi Z.S., I'll update LUCENE-1313 after LUCENE-1516 is committed. I can post the basic new patch I have for LUCENE-1313 (heavily simplified compared to the previous patches), however it will assume LUCENE-1516. The other area that will need to be addressed is standard benchmarking for different

Re: GSoC 09 project ideas...

2009-03-18 Thread Michael McCandless
I think creating a better Highlighter for Lucene, which is actively being discussed: https://issues.apache.org/jira/browse/LUCENE-1522 would make a good GSoC project, but I don't think I have time to mentor. Realtime search is currently in progress already, being tracked/iterated here:

[jira] Resolved: (LUCENE-1145) DisjunctionSumScorer small tweak

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1145. Resolution: Fixed Thanks Eks and Paul! DisjunctionSumScorer small tweak

[jira] Updated: (LUCENE-1472) DateTools.stringToDate() can cause lock contention under load

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1472: --- Fix Version/s: (was: 2.9) Removing 2.9 target. DateTools.stringToDate() can

[jira] Commented: (LUCENE-1522) another highlighter

2009-03-18 Thread David Kaelbling (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683079#action_12683079 ] David Kaelbling commented on LUCENE-1522: - Hi, Our application wants to find and

[jira] Updated: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1561: --- Attachment: LUCENE-1561.patch Attached patch. I renamed to

[jira] Assigned: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1561: -- Assignee: Michael McCandless Maybe rename Field.omitTf, and strengthen the

[jira] Assigned: (LUCENE-1490) CJKTokenizer convert HALFWIDTH_AND_FULLWIDTH_FORMS wrong

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1490: -- Assignee: Michael McCandless CJKTokenizer convert

[jira] Resolved: (LUCENE-1490) CJKTokenizer convert HALFWIDTH_AND_FULLWIDTH_FORMS wrong

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1490. Resolution: Fixed Thanks Daniel! CJKTokenizer convert

Re: Make TermScorer non final

2009-03-18 Thread Grant Ingersoll
On Mar 18, 2009, at 7:57 AM, Michael McCandless wrote: Coming from the discussions in LUCENE-1522 (improving highlighter), I think at some point we should merge Span*Query into their normal counterparts, if possible. Ie, there should be only one TermQuery that can do both what the current

[jira] Updated: (LUCENE-1526) Tombstone deletions in IndexReader

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1526: --- Fix Version/s: (was: 2.9) I don't think we should block 2.9 for this.

[jira] Updated: (LUCENE-1533) Deleted documents as a Filter or top level Query

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1533: --- Fix Version/s: (was: 2.9) Clearing fix version. Deleted documents as a Filter

Re: GSoC 09 project ideas...

2009-03-18 Thread Grant Ingersoll
On Mar 18, 2009, at 12:04 PM, Zaid Md. Abdul Wahab Sheikh wrote: Hi lucene, In this link http://wiki.apache.org/general/SummerOfCode2009 , there are no project ideas for Lucene proper. (Only ideas for Mahout listed). This requires someone (has to be a committer) willing to mentor. I'd

[jira] Assigned: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-652: - Assignee: Michael McCandless Compressed fields should be externalized (from

move TrieRange* to core?

2009-03-18 Thread Michael McCandless
I think we should move TrieRange* into core before 2.9? It's received a lot of attention, both from developers (Uwe & Yonik did lots of iterations, and Solr is folding it in) and from users. It's a simpler, more scalable way to index numeric fields that you intend to sort and/or do range

[jira] Updated: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-652: -- Attachment: LUCENE-652.patch I added o.a.l.document.CompressionTools, with static

Re: move TrieRange* to core?

2009-03-18 Thread Andi Vajda
On Mar 18, 2009, at 13:01, Michael McCandless luc...@mikemccandless.com wrote: I think we should move TrieRange* into core before 2.9? It's received a lot of attention, both from developers (Uwe & Yonik did lots of iterations, and Solr is folding it in) and from users. It's a simpler

[jira] Commented: (LUCENE-1496) Move solr NumberUtils to lucene

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683149#action_12683149 ] Michael McCandless commented on LUCENE-1496: If we move trie/* into core, what

Re: move TrieRange* to core?

2009-03-18 Thread Earwin Burrfoot
On Wed, Mar 18, 2009 at 23:08, Andi Vajda va...@osafoundation.org wrote: On Mar 18, 2009, at 13:01, Michael McCandless luc...@mikemccandless.com wrote: I think we should move TrieRange* into core before 2.9? It's received a lot of attention, both from developers (Uwe & Yonik did lots of

[jira] Assigned: (LUCENE-1435) CollationKeyFilter: convert tokens into CollationKeys encoded using IndexableBinaryStringTools

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1435: -- Assignee: Michael McCandless CollationKeyFilter: convert tokens into

[jira] Commented: (LUCENE-1435) CollationKeyFilter: convert tokens into CollationKeys encoded using IndexableBinaryStringTools

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683155#action_12683155 ] Michael McCandless commented on LUCENE-1435: I think we should commit this to

[jira] Assigned: (LUCENE-1434) IndexableBinaryStringTools: convert arbitrary byte sequences into Strings that can be used as index terms, and vice versa

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1434: -- Assignee: Michael McCandless IndexableBinaryStringTools: convert arbitrary

RE: move TrieRange* to core?

2009-03-18 Thread Uwe Schindler
I have no problem with it! Thanks! What I would like to be fixed before moving it to core is the fact that an additional helper field is needed for the trie values. If everything could be in one field and the field is still sortable, it would be fine. For that, the order of terms in the FieldCache

[jira] Commented: (LUCENE-1435) CollationKeyFilter: convert tokens into CollationKeys encoded using IndexableBinaryStringTools

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683167#action_12683167 ] Michael McCandless commented on LUCENE-1435: Steven, I'm hitting compilation

[jira] Commented: (LUCENE-1434) IndexableBinaryStringTools: convert arbitrary byte sequences into Strings that can be used as index terms, and vice versa

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683171#action_12683171 ] Michael McCandless commented on LUCENE-1434: This looks good. I plan to

[jira] Commented: (LUCENE-1435) CollationKeyFilter: convert tokens into CollationKeys encoded using IndexableBinaryStringTools

2009-03-18 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683174#action_12683174 ] Steven Rowe commented on LUCENE-1435: - It's in contrib/miscellaneous/ I used

File Formats Correction

2009-03-18 Thread Mark Miller
Just a note so I don't forget: The file formats page says there are 4 files used for term vectors, but there are only 3 that I can see: tvx, tvd, tvf. http://lucene.apache.org/java/2_4_1/fileformats.html -- - Mark http://www.lucidimagination.com

[jira] Resolved: (LUCENE-1434) IndexableBinaryStringTools: convert arbitrary byte sequences into Strings that can be used as index terms, and vice versa

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1434. Resolution: Fixed Thanks Steven! IndexableBinaryStringTools: convert arbitrary

[jira] Commented: (LUCENE-1435) CollationKeyFilter: convert tokens into CollationKeys encoded using IndexableBinaryStringTools

2009-03-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683182#action_12683182 ] Michael McCandless commented on LUCENE-1435: OK, thanks for the pointer -- I

Re: File Formats Correction

2009-03-18 Thread Michael McCandless
Indeed! I'll fix on trunk. Mike Mark Miller wrote: Just a note so I don't forget: The file formats page says there are 4 files used for term vectors, but there are only 3 that I can see: tvx, tvd, tvf. http://lucene.apache.org/java/2_4_1/fileformats.html -- - Mark

RE: move TrieRange* to core?

2009-03-18 Thread Uwe Schindler
I think we should move TrieRange* into core before 2.9? It's received a lot of attention, both from developers (Uwe & Yonik did lots of iterations, and Solr is folding it in) and from users. It's a simpler, more scalable way to index numeric fields that you intend to sort and/or do

Re: move TrieRange* to core?

2009-03-18 Thread Michael McCandless
Uwe Schindler wrote: I would be happy with a renaming to NumberRangeFilter, but trie should appear somewhere in the docs. I like this approach (and referencing the original paper); I think it's important the javadocs give enough detail about how it works so that one can understand the big

Re: move TrieRange* to core?

2009-03-18 Thread Michael McCandless
Uwe Schindler wrote: I have no problem with it! Thanks! What I would like to be fixed before moving it to core is the fact that an additional helper field is needed for the trie values. If everything could be in one field and the field is still sortable, it would be fine. For that, the

Re: move TrieRange* to core?

2009-03-18 Thread Michael McCandless
Michael McCandless luc...@mikemccandless.com wrote: Though, won't this make loading the field cache more costly since you'll iterate through many more terms? Or... do the full precision fields always order above all lower precision fields across all docs? If so... maybe we could extend

RE: move TrieRange* to core?

2009-03-18 Thread Uwe Schindler
Though, won't this make loading the field cache more costly since you'll iterate through many more terms? Or... do the full precision fields always order above all lower precision fields across all docs? The highest precision terms have a shift value of 0. As the first char of the encoded
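
To make the ordering argument concrete, here is a minimal sketch under stated assumptions. The term format below (one character derived from the shift, then 16 hex digits of the shifted value) is invented for illustration and is not the actual TrieUtils encoding. It only shows why putting the shift in the first character makes every full-precision term (shift 0) sort ahead of all lower-precision terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: a hypothetical term format, not Lucene's real trie encoding. */
final class TrieShiftSketch {

    /** Encodes value at the given shift: first char carries the shift, the rest the shifted bits. */
    static String encode(long value, int shift) {
        long sortable = value ^ Long.MIN_VALUE;   // flip the sign bit so unsigned order matches signed order
        long prefix = sortable >>> shift;         // drop the lowest 'shift' bits
        return (char) ('A' + shift) + String.format("%016x", prefix);
    }

    /** All terms indexed for one value: full precision first, then coarser prefixes. */
    static List<String> terms(long value, int precisionStep) {
        List<String> out = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            out.add(encode(value, shift));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        for (long v : new long[] {5L, -3L, 1000L}) {
            all.addAll(terms(v, 8));
        }
        Collections.sort(all);                    // the order a term dictionary would see
        for (String term : all) {
            System.out.println(term);
        }
    }
}
```

Because the shift-0 terms share the smallest leading character, they form one contiguous block at the start of the sorted term list, in numeric order.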

[jira] Commented: (LUCENE-1490) CJKTokenizer convert HALFWIDTH_AND_FULLWIDTH_FORMS wrong

2009-03-18 Thread Daniel Cheng (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683240#action_12683240 ] Daniel Cheng commented on LUCENE-1490: -- This was discovered by Chan

Re: move TrieRange* to core?

2009-03-18 Thread Michael McCandless
Uwe Schindler u...@thetaphi.de wrote: If so... maybe we could extend FieldCache's parser to allow it to stop-early? Ie it'd get the TermEnum, iterate through all the full precision terms first, asking your parser to convert to long/int, and then when your parser sees the very first
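
The stop-early idea might look roughly like the sketch below. Everything here is hypothetical: the `EarlyStopParser` interface, the `null` return used as the stop signal, and the toy term format (a leading 'A' marks a full-precision term) are assumptions made for illustration, not FieldCache's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: hypothetical parser contract, not Lucene's FieldCache API. */
final class StopEarlySketch {

    /** Returns the parsed value, or null to tell the loader to stop iterating. */
    interface EarlyStopParser {
        Long parse(String term);
    }

    /** Assumed term format: 'A' + decimal digits is full precision; anything else is lower precision. */
    static final EarlyStopParser FULL_PRECISION_ONLY =
            term -> term.charAt(0) == 'A' ? Long.valueOf(term.substring(1)) : null;

    /** Walks terms in sorted order and stops at the first term the parser rejects. */
    static List<Long> load(List<String> sortedTerms, EarlyStopParser parser) {
        List<Long> values = new ArrayList<>();
        for (String term : sortedTerms) {
            Long value = parser.parse(term);
            if (value == null) {
                break;                 // first lower-precision term: nothing useful follows
            }
            values.add(value);
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("A1", "A7", "I0", "Q0");
        System.out.println(load(terms, FULL_PRECISION_ONLY));
    }
}
```

This only works if the full-precision terms sort before all others, which is the property Uwe describes for shift 0.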

[jira] Created: (LUCENE-1567) New flexible query parser

2009-03-18 Thread Luis Alves (JIRA)
New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A

[jira] Commented: (LUCENE-1567) New flexible query parser

2009-03-18 Thread Luis Alves (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683308#action_12683308 ] Luis Alves commented on LUCENE-1567: Should the Flexible Query Parser patch be

[jira] Commented: (LUCENE-1567) New flexible query parser

2009-03-18 Thread Adriano Crestani (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683313#action_12683313 ] Adriano Crestani commented on LUCENE-1567: -- It's probably not ok, since lucene