How to not span fields with phrase query?

2014-08-28 Thread Rob Nikander
Hi, if I have a document with multiple values for the field `title` -- title: "A B C" and title: "X Y Z" -- a phrase search for title:"B C X" matches this document. Can I prevent that? Thanks, Rob

Re: How to not span fields with phrase query?

2014-08-28 Thread Michael Sokolov
Usually that's referred to as multiple values for the same field; in the index there is no distinction between title:C and title:X as far as which field they are in -- they're in the same field. If you want to prevent phrase queries from matching "B C X", insert a position gap between C and X.
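
The effect of that position gap can be illustrated without Lucene at all. The following is a toy sketch in plain Java (not the Lucene API) of how a gap between field values stops a phrase from spanning them:

```java
// Toy model (not Lucene code) of a position increment gap between the
// values of a multi-valued field. Without a gap, "A B C" and "X Y Z" get
// positions 0..5, so the phrase "B C X" lines up across the boundary;
// with a gap of 100, the second value starts at position 103 instead.
import java.util.*;

public class PositionGapDemo {
    // Build term -> positions for a multi-valued field.
    static Map<String, List<Integer>> index(List<String> values, int gap) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int pos = 0;
        for (String value : values) {
            for (String term : value.split(" ")) {
                postings.computeIfAbsent(term, t -> new ArrayList<>()).add(pos++);
            }
            pos += gap; // position increment gap between values
        }
        return postings;
    }

    // Exact phrase match: consecutive positions for consecutive terms.
    static boolean phraseMatches(Map<String, List<Integer>> postings, String... terms) {
        for (int start : postings.getOrDefault(terms[0], Collections.emptyList())) {
            boolean ok = true;
            for (int i = 1; i < terms.length; i++) {
                List<Integer> p = postings.getOrDefault(terms[i], Collections.emptyList());
                if (!p.contains(start + i)) { ok = false; break; }
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("A B C", "X Y Z");
        // gap = 0: the phrase "B C X" spans the value boundary and matches
        System.out.println(phraseMatches(index(values, 0), "B", "C", "X"));   // true
        // gap = 100: the phrase can no longer span the boundary
        System.out.println(phraseMatches(index(values, 100), "B", "C", "X")); // false
    }
}
```

Any gap larger than the slop you allow on phrase queries works; the class and method names here are invented for the sketch.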

Re: How to not span fields with phrase query?

2014-08-28 Thread Rob Nikander
Thank you for the explanation. I subclassed Analyzer and overrode `getPositionIncrementGap` for this field. It appears to have worked. Rob

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Michael McCandless
Hmm, the screen shot didn't make it ... can you post a link? If you are using an NRT reader, then when a new one is opened, it won't open new SegmentReaders for all segments, just for newly flushed/merged segments since the last reader was opened. So for your N commit points that you have readers open for,

indexing all suffixes to support leading wildcard?

2014-08-28 Thread Rob Nikander
Hi, I've got some short fields (phone number, email) that I'd like to search using good old string matching. (The full query is a boolean OR that also uses real text fields.) I see the warnings about wildcard queries that start with *, and I'm wondering... do you think it would be a good idea to

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Vitaly Funstein
Here's the link: https://drive.google.com/file/d/0B5eRTXMELFjjbUhSUW9pd2lVN00/edit?usp=sharing I'm indexing let's say 11 unique fields per document. Also, the NRT reader is opened continually, and regular searches use that one. But a special kind of feature allows searching a particular point in

RE: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Uwe Schindler
Hi, if you open the 2nd instance (the back in time reader) using DirectoryReader.open(IndexCommit), then it has of course nothing in common with the IndexWriter, so how can they share the SegmentReaders? NRT readers from DirectoryReader.open(IndexWriter) are cached inside IndexWriter, but the

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Michael McCandless
You can actually use IndexReader.openIfChanged(latestNRTReader, IndexCommit): this should pull/share SegmentReaders from the pool inside IW, when available. But it will fail to share e.g. a SegmentReader that is no longer part of the NRT view but is shared by e.g. two back-in-time readers. Really we need to
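
The pooling idea being discussed can be sketched abstractly. This toy model in plain Java (not Lucene's actual IndexWriter reader pool; all names here are invented) shows why sharing per-segment readers across commit points saves work -- only segments new to a commit need fresh readers:

```java
// Toy sketch of a per-segment reader pool. A commit point is just a list
// of segment names; opening a reader for a new commit reuses any cached
// per-segment "reader" and creates only the missing ones.
import java.util.*;

public class ReaderPoolDemo {
    final Map<String, String> pool = new HashMap<>(); // segment name -> "reader"
    int opened = 0;                                   // count of readers actually created

    List<String> openCommit(List<String> segments) {
        List<String> readers = new ArrayList<>();
        for (String seg : segments) {
            readers.add(pool.computeIfAbsent(seg, s -> {
                opened++;                 // only reached for segments not yet pooled
                return "reader(" + s + ")";
            }));
        }
        return readers;
    }

    public static void main(String[] args) {
        ReaderPoolDemo demo = new ReaderPoolDemo();
        demo.openCommit(Arrays.asList("_0", "_1", "_2"));       // creates 3 readers
        demo.openCommit(Arrays.asList("_0", "_1", "_2", "_3")); // reuses 3, creates 1
        System.out.println(demo.opened); // 4, not 7
    }
}
```

The failure mode in the thread corresponds to two separate pools that don't see each other's entries, so common segments get opened twice anyway.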

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Michael McCandless
Can you drill down some more to see what's using those ~46 MB? Is it the FSTs in the terms index? But we need to decouple "a single segment is opened with multiple SegmentReaders" from "a single SegmentReader is using too much RAM to hold the terms index". E.g. from this screen shot it looks

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Vitaly Funstein
Thanks, Mike - I think the issue is actually the latter, i.e. SegmentReader on its own can certainly use enough heap to cause problems, which of course would be made that much worse by failure to pool readers for unchanged segments. But where are you seeing the behavior that would result in reuse

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Michael McCandless
Ugh, you're right: this still won't re-use from IW's reader pool. Can you open an issue? Somehow we should make this easier. In the meantime, I guess you can use openIfChanged from your back in time reader to open another back in time reader. This way you have two pools... IW's pool for the

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Vitaly Funstein
Thanks for the suggestions! I'll file an enhancement request. But I am still a little skeptical about the approach of pooling segment readers from prior DirectoryReader instances, opened at earlier commit points. It looks like the up to date check for non-NRT directory reader just compares the

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Michael McCandless
On Thu, Aug 28, 2014 at 4:18 PM, Vitaly Funstein vfunst...@gmail.com wrote: Thanks for the suggestions! I'll file an enhancement request. But I am still a little skeptical about the approach of pooling segment readers from prior DirectoryReader instances, opened at earlier commit points. It

Can I update one field in doc?

2014-08-28 Thread Rob Nikander
I tried something like this, to loop through all docs in my index and patch a field. But it appears to wipe out some parts of the stored values in the document. For example, highlighting stopped working. [Scala code:]
val q = new MatchAllDocsQuery()
val topDocs = searcher.search(q, 100)
val

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Vitaly Funstein
On Thu, Aug 28, 2014 at 1:25 PM, Michael McCandless luc...@mikemccandless.com wrote: The segments_N file can be different, that's fine: after that, we then re-use SegmentReaders when they are in common between the two commit points. Each segments_N file refers to many segments... Yes, you

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Vitaly Funstein
On Thu, Aug 28, 2014 at 2:38 PM, Vitaly Funstein vfunst...@gmail.com wrote: Looks like this is used inside Lucene41PostingsFormat, which simply passes in those defaults - so you are effectively saying the minimum (and therefore, maximum) block size can be raised to reduce the size of the terms

Re: Can I update one field in doc?

2014-08-28 Thread Rob Nikander
I used the Luke tool to look at my documents. It shows that the positions and offsets in the term vectors get wiped out, in all fields. I'm using Lucene 4.8. I guess I'll just rebuild the entire doc. Rob
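
The "rebuild the entire doc" conclusion follows from Lucene's update semantics: an update is a delete plus an add of a whole new document, so any field you don't carry over from the original is lost. Here is a toy illustration in plain Java (not Lucene code; names are invented, and note that in real Lucene even copying stored fields doesn't round-trip everything -- term vectors, for instance, must be regenerated by re-analyzing the content):

```java
// Toy model of delete-plus-add update semantics: the new document fully
// replaces the old one, so a safe "patch" reads the stored original,
// copies every field, and changes just the one field being updated.
import java.util.*;

public class RebuildDocDemo {
    static Map<String, Map<String, String>> index = new HashMap<>(); // id -> fields

    // updateDocument semantics: the new document fully replaces the old one.
    static void updateDocument(String id, Map<String, String> doc) {
        index.put(id, doc);
    }

    static Map<String, String> patchField(String id, String field, String value) {
        // Copy every stored field from the original, then change just one.
        Map<String, String> rebuilt = new HashMap<>(index.get(id));
        rebuilt.put(field, value);
        return rebuilt;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "A B C");
        doc.put("body", "hello world");
        updateDocument("1", doc);

        // Patch only "title"; "body" survives because we rebuilt the whole doc.
        updateDocument("1", patchField("1", "title", "X Y Z"));
        System.out.println(index.get("1").get("body")); // hello world
    }
}
```

If patchField instead built a document containing only the new title, the body field would vanish -- which is the behavior Rob observed.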

Re: indexing all suffixes to support leading wildcard?

2014-08-28 Thread Erick Erickson
The usual approach is to index into a second field, but backwards. See ReverseStringFilter... Then all your leading wildcards are really trailing wildcards in the reversed field. Best, Erick
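
Erick's reversed-field trick can be sketched without Lucene. In this toy version in plain Java (the real approach applies ReverseStringFilter at index time to a parallel field; class and method names here are invented), a leading-wildcard search like *ander becomes a simple prefix test against reversed terms:

```java
// Toy model of the reversed-field trick: reverse each term at index time,
// and a leading wildcard *suffix turns into the trailing wildcard
// reverse(suffix)* -- an efficient prefix match instead of a full scan.
import java.util.*;

public class ReverseWildcardDemo {
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Leading-wildcard search *suffix, answered via the reversed terms.
    static List<String> endsWith(List<String> terms, String suffix) {
        String prefix = reverse(suffix); // *ander -> redna*
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (reverse(term).startsWith(prefix)) hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("nikander", "sokolov", "alexander");
        System.out.println(endsWith(terms, "ander")); // [nikander, alexander]
    }
}
```

In a real index the reversed field's sorted term dictionary makes the prefix step a seek rather than a linear scan, which is the whole point of the trick.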

Re: indexing all suffixes to support leading wildcard?

2014-08-28 Thread Jack Krupansky
Use the ngram token filter, and then a query of "512" would match by itself: http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html -- Jack Krupansky
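
Jack's n-gram suggestion can likewise be sketched without Lucene: if every n-gram of a term is indexed, an infix query such as "512" becomes an exact term lookup, with no wildcard scan at all. A toy version in plain Java (not NGramTokenFilter itself; the phone number and gram sizes are made up for the example):

```java
// Toy model of n-gram indexing: emit every substring of length min..max
// for a term. An infix query then matches by simple set membership.
import java.util.*;

public class NGramDemo {
    static Set<String> ngrams(String term, int min, int max) {
        Set<String> grams = new HashSet<>();
        for (int n = min; n <= max; n++) {
            for (int i = 0; i + n <= term.length(); i++) {
                grams.add(term.substring(i, i + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // A phone-number term indexed as 1- to 4-grams:
        Set<String> indexed = ngrams("5125551234", 1, 4);
        System.out.println(indexed.contains("512"));  // true: infix match by term lookup
        System.out.println(indexed.contains("9999")); // false
    }
}
```

The trade-off versus the reversed-field approach is index size: n-gramming short fields like phone numbers is cheap, but it multiplies the number of indexed terms per value.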