Re: MoreLikeThisQuery term frequency caching

2009-04-07 Thread Michael McCandless
I don't have direct experience with MLT, but this sounds like a great improvement, so in answer to (3) I would say "definitely!". Mike On Tue, Apr 7, 2009 at 2:28 AM, Richard Marr wrote: > Hi all, > > I've been exploring MoreLikeThisQuery as part of a recent project and > something that came out

Re: HitCollector#collect(int,float,Collection)

2009-04-07 Thread Michael McCandless
Do you mean tracking the "atomic queries" that caused a given hit to match (where "atomic query" is a query that actually uses TermDocs/Positions to check matching, vs other queries like BooleanQuery that "glomm together" sub-query matches)? EG for a boolean query w/ N clauses, which of those N cl

[jira] Resolved: (LUCENE-1586) add IndexReader.getUniqueTermCount

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1586. Resolution: Fixed Thanks Derek! > add IndexReader.getUniqueTermCount > --

Re: Future projects

2009-04-07 Thread Michael McCandless
On Mon, Apr 6, 2009 at 6:43 PM, Jason Rutherglen wrote: >> The realtime reader would have to have sub-readers per thread, > and an aggregate reader that "joins" them by interleaving the > docIDs > > Nice (i.e. nice and complex)! Right, this is why I like the current [simple] near real-time approa

[jira] Commented: (LUCENE-1313) Realtime Search

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696438#action_12696438 ] Michael McCandless commented on LUCENE-1313: {quote} > I'd be very interested

[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696444#action_12696444 ] Michael McCandless commented on LUCENE-1516: I think NRT search is finally rea

[jira] Commented: (LUCENE-1584) Callback for intercepting merging segments in IndexWriter

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696445#action_12696445 ] Michael McCandless commented on LUCENE-1584: Jason once LUCENE-1516 is in, can

[jira] Commented: (LUCENE-1582) Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696475#action_12696475 ] Michael McCandless commented on LUCENE-1582: OK I committed the FieldCache par

Probelm sort on TermEnum

2009-04-07 Thread Federica Falini Data Management S.p.A
Good morning, In Lucene 2.2 I have made modifications to Term.java, TermBuffer.java (see below) in order to have Term enumerations sorted case-insensitive (when a field is not-tokenized): TermEnum terms = reader.terms(new Term("myFieldNotTokenized", "")); while ("myFieldNotT

[jira] Commented: (LUCENE-1582) Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696485#action_12696485 ] Uwe Schindler commented on LUCENE-1582: --- Thanks, I will then go forward with this. F

[jira] Commented: (LUCENE-1582) Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696497#action_12696497 ] Michael McCandless commented on LUCENE-1582: bq. Finally: Let's go on with 831

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696434#action_12696434 ] Michael McCandless commented on LUCENE-1589: Jason are you working on a patch

[jira] Updated: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1575: --- Attachment: LUCENE-1575.patch Attached new patch: * Changed members & methods in

[jira] Commented: (LUCENE-1582) Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696471#action_12696471 ] Michael McCandless commented on LUCENE-1582: OK the changes to FieldCache look

[jira] Updated: (LUCENE-1582) Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1582: -- Attachment: LUCENE-1582.patch New patch. In my opinion, it is now stable. New features/change

Re: HitCollector#collect(int,float,Collection)

2009-04-07 Thread Karl Wettin
7 apr 2009 kl. 10.23 skrev Michael McCandless: Do you mean tracking the "atomic queries" that caused a given hit to match (where "atomic query" is a query that actually uses TermDocs/Positions to check matching, vs other queries like BooleanQuery that "glomm together" sub-query matches)? EG fo

Re: HitCollector#collect(int,float,Collection)

2009-04-07 Thread Michael McCandless
On Tue, Apr 7, 2009 at 6:13 AM, Karl Wettin wrote: > > 7 apr 2009 kl. 10.23 skrev Michael McCandless: > >> Do you mean tracking the "atomic queries" that caused a given hit to >> match (where "atomic query" is a query that actually uses >> TermDocs/Positions to check matching, vs other queries lik

[jira] Resolved: (LUCENE-1582) Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-1582. --- Resolution: Fixed Committed Revision: 762710 I only added term number statistics in the filt

[jira] Commented: (LUCENE-1584) Callback for intercepting merging segments in IndexWriter

2009-04-07 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696511#action_12696511 ] Earwin Burrfoot commented on LUCENE-1584: - bq. I'd like to step back and understan

[jira] Commented: (LUCENE-1584) Callback for intercepting merging segments in IndexWriter

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696550#action_12696550 ] Michael McCandless commented on LUCENE-1584: bq. This is required in one form

omitNorms, omitTermFreqAndPositions in combination with stored-only fields

2009-04-07 Thread Uwe Schindler
Hi, while updating my internal components to the new TrieAPI, I have seen the following: I index a lot of numeric fields with trie encoding, omitting norms and term frequency. This works great. Luke shows that both are omitted. As I sometimes also want to have the components of the field stored a

Re: omitNorms, omitTermFreqAndPositions in combination with stored-only fields

2009-04-07 Thread Michael McCandless
That sounds like a real bug to me. If the field is not indexed, then the norm/omitTFAP should be ignored. Can you open a Jira/patch? Thanks, and good catch! Mike On Tue, Apr 7, 2009 at 10:46 AM, Uwe Schindler wrote: > Hi, > > during updating my internal components to the new TrieAPI, I have s

[jira] Commented: (LUCENE-1584) Callback for intercepting merging segments in IndexWriter

2009-04-07 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696583#action_12696583 ] Earwin Burrfoot commented on LUCENE-1584: - bq. The problem is you need more inform

[jira] Created: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-07 Thread Uwe Schindler (JIRA)
Stored-only fields automatically enable norms and tf when added to document --- Key: LUCENE-1590 URL: https://issues.apache.org/jira/browse/LUCENE-1590 Project: Lucene - Java

Re: MoreLikeThisQuery term frequency caching

2009-04-07 Thread Richard Marr
Thanks Mike, I'll leave it a few days to give people time to respond then start looking into creating a Jira ticket and a patch. 2009/4/7 Michael McCandless : > I don't have direct experience with MLT, but this sounds like a great > improvement, so in answer to (3) I would say "definitely!". > >

[jira] Reopened: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened LUCENE-1561: --- > Maybe rename Field.omitTf, and strengthen the javadocs > -

[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696603#action_12696603 ] Uwe Schindler commented on LUCENE-1561: --- I found a deprecation bug: setOmitTf() and

[jira] Updated: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1590: --- Fix Version/s: 2.9 > Stored-only fields automatically enable norms and tf when added

[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696628#action_12696628 ] Michael McCandless commented on LUCENE-1561: bq. setOmitTf() and other are onl

Re: Probelm sort on TermEnum

2009-04-07 Thread Michael McCandless
I think the new contrib/collation package may address this use case? It converts each term to its CollationKey, outside of Lucene. Mike On Tue, Apr 7, 2009 at 7:36 AM, Federica Falini Data Management S.p.A wrote: > Good morning, > In Lucene 2.2 i have made modification to Term.java, TermBuffer.j

Re: Probelm sort on TermEnum

2009-04-07 Thread Michael McCandless
Though, this is not yet released: it's on trunk (will be included in 2.9). Mike On Tue, Apr 7, 2009 at 1:19 PM, Michael McCandless wrote: > I think the new contrib/collation package may address this use case? > It converts each term to its CollationKey, outside of Lucene. > > Mike > > On Tue, Ap

[jira] Updated: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1575: --- Attachment: LUCENE-1575.patch New patch which just fixes contrib/spatial's cutover t

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696643#action_12696643 ] Jason Rutherglen commented on LUCENE-1589: -- Yes, because this will block the RAMD

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696650#action_12696650 ] Jason Rutherglen commented on LUCENE-1589: -- I started, but because MergePolicy.On

RE: Probelm sort on TermEnum

2009-04-07 Thread Steven A Rowe
On 4/7/2009 at 1:19 PM, Michael McCandless wrote: > I think the new contrib/collation package may address this use case? > It converts each term to its CollationKey, outside of Lucene. Since AFAIK CollationKey creation is a one-way process, CollationKeyFilter may not be useful for Federica. Fede
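The point about one-way keys can be illustrated without Lucene. The following is a minimal sketch, not the contrib/collation code and not Federica's patch: it uses only java.text.Collator from the JDK, and the class name CaseInsensitiveTermSort and the method sortCaseInsensitive are hypothetical names chosen for illustration. It sorts terms by their CollationKey at SECONDARY strength (which, for the English locale assumed here, ignores case) while keeping each original term next to its key, since the raw key bytes alone would not give the term text back.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical helper, for illustration only; not part of Lucene.
public class CaseInsensitiveTermSort {

    // Returns the terms ordered by collation key, ignoring case.
    // Terms that differ only in case keep their input order.
    static List<String> sortCaseInsensitive(List<String> terms) {
        // Assumption: English collation rules are acceptable for the field.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // SECONDARY strength treats "Apple" and "apple" as equal.
        collator.setStrength(Collator.SECONDARY);

        // Map each key to the original terms that produced it, so the
        // original text is never reconstructed from the key itself.
        TreeMap<CollationKey, List<String>> byKey = new TreeMap<>();
        for (String term : terms) {
            CollationKey key = collator.getCollationKey(term);
            byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(term);
        }

        List<String> sorted = new ArrayList<>();
        for (List<String> group : byKey.values()) {
            sorted.addAll(group);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("banana", "Apple", "cherry", "apple");
        System.out.println(sortCaseInsensitive(terms));
    }
}
```

If the sort order were needed inside the index itself, the original term would still have to be stored or carried separately, which is the limitation raised in this reply.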

[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696661#action_12696661 ] Uwe Schindler commented on LUCENE-1561: --- Wasn't it the plan to remove these interfac

[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696758#action_12696758 ] Michael McCandless commented on LUCENE-1561: bq. Wasn't it the plan to remove

Size of IndexReaders(potentially leading into an OOM)?

2009-04-07 Thread MakMak
I have a map of index paths against readers which I cache. For every new search, I know the index path, get the reader, reopen it and perform the search. Problem is, after the system runs for a while the size of the readers grows alarmingly. Does anyone know how much a typical reader holds on to? Do

[jira] Updated: (LUCENE-1546) Add IndexReader.flush(commitUserData)

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1546: -- Attachment: LUCENE-1546-deprecation.patch This patch fixes deprecation errors: I wrote a class

[jira] Updated: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1561: --- Attachment: LUCENE-1561.patch Attached patch, also deprecating omitTf in AbstractFie

[jira] Commented: (LUCENE-1546) Add IndexReader.flush(commitUserData)

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696803#action_12696803 ] Michael McCandless commented on LUCENE-1546: OK I just committed that, thanks

Re: Size of IndexReaders(potentially leading into an OOM)?

2009-04-07 Thread Michael McCandless
Could you re-ask this on java-user instead? Thanks. Mike On Tue, Apr 7, 2009 at 5:13 PM, MakMak wrote: > > I have a map of indexpaths against readers which I cache. For every new > search, I know the indexpath, get the reader, reopen it and perform the > search. Problem is, after the system run

Re: Size of IndexReaders(potentially leading into an OOM)?

2009-04-07 Thread Goddard, Michael J.
- Original Message - From: java-dev-return-31898-michael.j.goddard=saic@lucene.apache.org To: java-dev@lucene.apache.org Sent: Tue Apr 07 17:13:45 2009 Subject: Size of IndexReaders(potentially leading into an OOM)? I have a map of indexpaths against readers which I cache. For ev

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696809#action_12696809 ] Michael McCandless commented on LUCENE-1589: Hmm yes. This is also tricky: ho

Re: Future projects

2009-04-07 Thread Jason Rutherglen
> I think we should keep it simple, unless we discover real perf problems with the current approach. Simple is good, however the indexing performance will lag because we're back to the indexing speed of pre ram buffer? (i.e. merging segments using a ramdirectory). > need to do a merge sort (acr

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696810#action_12696810 ] Michael McCandless commented on LUCENE-1590: Uwe are you working out a patch f

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696812#action_12696812 ] Jason Rutherglen commented on LUCENE-1589: -- The deletes are coming into the exist

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696811#action_12696811 ] Michael McCandless commented on LUCENE-1231: One interesting idea, from Earwin

[jira] Updated: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1590: -- Attachment: LUCENE-1590.patch Here it is, not fully tested, but seems to work at least for nor

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696821#action_12696821 ] Uwe Schindler commented on LUCENE-1590: --- bq. The problem is: Luke does not show the

[jira] Updated: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1590: -- Attachment: LUCENE-1590.patch Here is the patch that also fixes the missing omitTf settings in Fi

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696837#action_12696837 ] Jason Rutherglen commented on LUCENE-1589: -- I took a walk and thought about this,

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696838#action_12696838 ] Jason Rutherglen commented on LUCENE-1231: -- +1 Making it automatic makes sense.

[jira] Updated: (LUCENE-1313) Realtime Search

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1313: - Attachment: LUCENE-1313.jar Latest realtime code, transactions are removed. * Needs to

[jira] Updated: (LUCENE-1539) Improve Benchmark

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1539: - Attachment: LUCENE-1539.patch Fixed the above mentioned problems. When LUCENE-1516 is i

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-07 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696873#action_12696873 ] Michael Busch commented on LUCENE-1231: --- {quote} is for column-stride fields to be a