Re: Problem sort on TermEnum

2009-04-08 Thread Federica Falini Data Management S.p.A
Hi Steve, in fact the list of terms returned is for user consumption. From every term it is possible, via a link, to run a search on the term itself and access the document. Annales cafe Caf zucche Thanks Federica Steven A Rowe wrote: On 4/7/2009 at 1:19 PM, Michael McCandless

Re: Future projects

2009-04-08 Thread Michael McCandless
On Tue, Apr 7, 2009 at 7:05 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I think we should keep it simple, unless we discover real perf problems with the current approach. Simple is good, however the indexing performance will lag because we're back to the indexing speed of pre

[jira] Updated: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1590: -- Attachment: LUCENE-1590.patch Here is the final patch. I added two tests (one for the bug

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696972#action_12696972 ] Michael McCandless commented on LUCENE-1231: bq. If you e.g. want to show 5

[jira] Assigned: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1590: -- Assignee: Michael McCandless Stored-only fields automatically enable norms

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696994#action_12696994 ] Michael McCandless commented on LUCENE-1590: Patch looks good! All tests

[jira] Commented: (LUCENE-1539) Improve Benchmark

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696971#action_12696971 ] Michael McCandless commented on LUCENE-1539: I think DeleteByPercentTask.java

omitTF comment

2009-04-08 Thread Mark Miller
The omitTf comment is: /** Expert: * * If set, omit term freq, positions and payloads from postings for this field. * <p><b>NOTE</b>: this is a dangerous option to enable. * While it reduces storage space required in the index, * it also means any query requiring positional *

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-08 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696977#action_12696977 ] Earwin Burrfoot commented on LUCENE-1231: - I can share my design for doc loading,

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697001#action_12697001 ] Michael McCandless commented on LUCENE-1589: {quote} The deletes are coming

Re: omitTF comment

2009-04-08 Thread Michael McCandless
How about simply: /** Expert: * * If set, omit term freq, positions and payloads from postings for this field. * * <p><b>NOTE</b>: While this option reduces storage space required in the index, * it also means any query requiring positional * information, such as {@link PhraseQuery} or

Re: omitTF comment

2009-04-08 Thread Mark Miller
Yeah, you got my vote. I think this one actually felt a bit more dangerous when it was just called omitTf(). Michael McCandless wrote: How about simply: /** Expert: * * If set, omit term freq, positions and payloads from postings for this field. * * <p><b>NOTE</b>: While this option reduces

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697024#action_12697024 ] Uwe Schindler commented on LUCENE-1590: --- {quote} Patch looks good! All tests pass.

Re: omitTF comment

2009-04-08 Thread Michael McCandless
OK I'll make this change in the pending patch on LUCENE-1561. Mike On Wed, Apr 8, 2009 at 8:52 AM, Mark Miller markrmil...@gmail.com wrote: Yeah, you got my vote. I think this one actually felt a bit more dangerous when it was just called omitTf(). Michael McCandless wrote: How about

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697034#action_12697034 ] Michael McCandless commented on LUCENE-1590: bq. In principle the Field

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697052#action_12697052 ] Michael McCandless commented on LUCENE-1575: I wonder if we should break out

[jira] Commented: (LUCENE-1539) Improve Benchmark

2009-04-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697057#action_12697057 ] Shai Erera commented on LUCENE-1539: Is it also interesting to add extensions to

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697061#action_12697061 ] Shai Erera commented on LUCENE-1575: That's actually what's done in

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-08 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697076#action_12697076 ] Marvin Humphrey commented on LUCENE-1231: - FWIW, I think priority for document

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697083#action_12697083 ] Michael McCandless commented on LUCENE-1575: bq. maxScore is only tracked in

[jira] Commented: (LUCENE-1539) Improve Benchmark

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697088#action_12697088 ] Michael McCandless commented on LUCENE-1539: Enabling bzip compression sounds

[jira] Commented: (LUCENE-1539) Improve Benchmark

2009-04-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697117#action_12697117 ] Shai Erera commented on LUCENE-1539: bq. Can you open a new issue? Will do. Improve

[jira] Created: (LUCENE-1591) Enable bzip compression in benchmark

2009-04-08 Thread Shai Erera (JIRA)
Enable bzip compression in benchmark Key: LUCENE-1591 URL: https://issues.apache.org/jira/browse/LUCENE-1591 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697122#action_12697122 ] Shai Erera commented on LUCENE-1575: Right ... so basically we're talking about

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697124#action_12697124 ] Michael McCandless commented on LUCENE-1575: Sounds right! Wanna update the

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697130#action_12697130 ] Shai Erera commented on LUCENE-1575: of course ! Refactoring Lucene collectors

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697143#action_12697143 ] Michael McCandless commented on LUCENE-1591: I'm hitting this, when trying to

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697152#action_12697152 ] Michael McCandless commented on LUCENE-1575: I came across another simple

[jira] Resolved: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1561. Resolution: Fixed Maybe rename Field.omitTf, and strengthen the javadocs

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697164#action_12697164 ] Michael McCandless commented on LUCENE-1591: So, after upgrading to xerces

possible TermInfosReader speedup

2009-04-08 Thread Earwin Burrfoot
Currently, when we're seeking a given Term, it does a binary search across all term space, including terms belonging to other fields. I propose augmenting fields file with two pointers (firstTerm, lastTerm) for each field. That reduces range we need to search, and instead of comparing Terms we
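
A minimal sketch of the idea in the snippet above, not Lucene code. It models the term dictionary as two parallel arrays sorted by (field, text) and compares a search over the whole term space with one bounded by per-field (firstTerm, lastTerm) pointers. The class name FieldBoundedSeek, the methods seekGlobal and seekInField, and the in-memory map of ranges are all hypothetical names chosen for illustration; the message does not say how the pointers would actually be stored in the fields file.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a term dictionary sorted by (field, text). Assumption: not Lucene's real layout. */
final class FieldBoundedSeek {
    private final String[] fields;
    private final String[] texts;
    // Hypothetical per-field pointers: field -> {firstTerm, lastTerm}, inclusive indexes.
    private final Map<String, int[]> ranges = new HashMap<>();

    FieldBoundedSeek(String[] fields, String[] texts) {
        this.fields = fields;
        this.texts = texts;
        // Terms of one field are contiguous because the arrays are sorted by field first.
        for (int i = 0; i < fields.length; i++) {
            int[] range = ranges.get(fields[i]);
            if (range == null) {
                ranges.put(fields[i], new int[] {i, i});
            } else {
                range[1] = i;
            }
        }
    }

    /** Baseline: binary search over all terms, comparing field and then text. */
    int seekGlobal(String field, String text) {
        int lo = 0;
        int hi = fields.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = fields[mid].compareTo(field);
            if (cmp == 0) {
                cmp = texts[mid].compareTo(text);
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1; // term not present
    }

    /** Proposed: search only inside the field's range and compare term text only. */
    int seekInField(String field, String text) {
        int[] range = ranges.get(field);
        if (range == null) {
            return -1; // unknown field
        }
        int lo = range[0];
        int hi = range[1];
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = texts[mid].compareTo(text);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1; // term not present in this field
    }
}
```

Both methods return the same index for a term that exists. The bounded version starts from a smaller range and skips the field comparison at each step, which is the saving the proposal describes.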

Re: possible TermInfosReader speedup

2009-04-08 Thread Michael McCandless
On Wed, Apr 8, 2009 at 3:46 PM, Earwin Burrfoot ear...@gmail.com wrote: Currently, when we're seeking a given Term, it does a binary search across all term space, including terms belonging to other fields. I propose augmenting fields file with two pointers (firstTerm, lastTerm) for each

[jira] Commented: (LUCENE-1284) Set of Java classes that allow the Lucene search engine to use morphological information developed for the Apertium open-source machine translation platform (http://www

2009-04-08 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697185#action_12697185 ] Otis Gospodnetic commented on LUCENE-1284: -- Felipe: I took another look at this.

[jira] Commented: (LUCENE-1284) Set of Java classes that allow the Lucene search engine to use morphological information developed for the Apertium open-source machine translation platform (http://www

2009-04-08 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697208#action_12697208 ] Otis Gospodnetic commented on LUCENE-1284: -- One more for Felipe. Is there a page

Re: possible TermInfosReader speedup

2009-04-08 Thread Earwin Burrfoot
On Thu, Apr 9, 2009 at 00:14, Michael McCandless luc...@mikemccandless.com wrote: On Wed, Apr 8, 2009 at 3:46 PM, Earwin Burrfoot ear...@gmail.com wrote: Currently, when we're seeking a given Term, it does a binary search across all term space, including terms belonging to other fields. I

[jira] Updated: (LUCENE-1539) Improve Benchmark

2009-04-08 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1539: - Attachment: LUCENE-1539.patch Above mentioned issues fixed. It seems a bit awkward

[jira] Commented: (LUCENE-1313) Realtime Search

2009-04-08 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697226#action_12697226 ] Jason Rutherglen commented on LUCENE-1313: -- {quote} Still, it's synthetic. If you

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697238#action_12697238 ] Uwe Schindler commented on LUCENE-1590: --- bq. Since FieldInfos is per-segment, one

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697233#action_12697233 ] Michael McCandless commented on LUCENE-1591: After some iterations on

RE: possible TermInfosReader speedup

2009-04-08 Thread Uwe Schindler
Also, on the other topic - how hard is it to boost TermEnum.skipTo(term) speed to IndexReader.terms(term) level? Would be nice for TrieRangeFilter and probably some other filters. I think all that's needed is to implement SegmentTermEnum.skipTo, calling something like tis.terms(Term) but

Re: possible TermInfosReader speedup

2009-04-08 Thread Earwin Burrfoot
On Thu, Apr 9, 2009 at 02:01, Uwe Schindler u...@thetaphi.de wrote: Also, on the other topic - how hard is it to boost TermEnum.skipTo(term) speed to IndexReader.terms(term) level? Would be nice for TrieRangeFilter and probably some other filters. I think all that's needed is to implement