Hi,
If I have a document with multiple title fields:
title: A B C
title: X Y Z
A phrase search for title:B C X matches this document. Can I prevent
that?
thanks,
Rob
Usually that's referred to as multiple values for the same field; in
the index there is no distinction between title:C and title:X as far as
which field they are in -- they're in the same field.
If you want to prevent phrase queries from matching B C X, insert a
position gap between C and X;
Thank you for the explanation. I subclassed Analyzer and overrode
`getPositionIncrementGap` for this field. It appears to have worked.
Rob
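To see why a position gap stops the phrase from matching across values, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class and method names (`PositionGapDemo`, `index`, `phraseMatches`) are made up for illustration, and the gap of 100 is an arbitrary choice. It assumes that positions continue from one value to the next and that the gap is added between values, which is the role `getPositionIncrementGap` plays in the analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of token positions in a multi-valued field. This is NOT Lucene code.
class PositionGapDemo {

    // One indexed term and the position it was assigned.
    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Assigns consecutive positions within each value and skips `gap`
    // extra positions between one value and the next.
    static List<Posting> index(List<String> values, int gap) {
        List<Posting> postings = new ArrayList<>();
        int position = -1;
        boolean firstValue = true;
        for (String value : values) {
            if (!firstValue) {
                position += gap;
            }
            firstValue = false;
            for (String term : value.trim().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                position++;
                postings.add(new Posting(term, position));
            }
        }
        return postings;
    }

    // True if `term` was indexed at exactly `position`.
    static boolean hasTermAt(List<Posting> postings, String term, int position) {
        for (Posting p : postings) {
            if (p.position == position && p.term.equals(term)) {
                return true;
            }
        }
        return false;
    }

    // Exact phrase match (slop 0): the terms must sit at adjacent positions.
    static boolean phraseMatches(List<Posting> postings, String... phrase) {
        if (phrase.length == 0) {
            return false;
        }
        for (Posting start : postings) {
            boolean matched = true;
            for (int i = 0; i < phrase.length && matched; i++) {
                matched = hasTermAt(postings, phrase[i], start.position + i);
            }
            if (matched) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("A B C", "X Y Z");
        System.out.println(phraseMatches(index(titles, 0), "B", "C", "X"));   // true
        System.out.println(phraseMatches(index(titles, 100), "B", "C", "X")); // false
        System.out.println(phraseMatches(index(titles, 100), "X", "Y", "Z")); // true
    }
}
```

With a gap of 0, C and X end up at adjacent positions, so the phrase spans both values. With a gap, X lands far from C and only phrases inside a single value match. A sloppy phrase query whose slop is at least as large as the gap could still span values, so the gap should be larger than any slop you plan to use.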
On Thu, Aug 28, 2014 at 10:21 AM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
Usually that's referred to as multiple values for the same
`getPositionIncrementGap`
Sent from my BlackBerry® smartphone
-Original Message-
From: Rob Nikander rob.nikan...@gmail.com
Date: Thu, 28 Aug 2014 10:26:00
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: How to not span fields with phrase query?
Thank
Hmm screen shot didn't make it ... can you post link?
If you are using NRT reader then when a new one is opened, it won't
open new SegmentReaders for all segments, just for newly
flushed/merged segments since the last reader was opened. So for your
N commit points that you have readers open for,
Hi,
I've got some short fields (phone num, email) that I'd like to search using
good old string matching. (The full query is a boolean OR that also uses
real text fields.) I see the warnings about wildcard queries that start
with *, and I'm wondering... do you think it would be a good idea to
Here's the link:
https://drive.google.com/file/d/0B5eRTXMELFjjbUhSUW9pd2lVN00/edit?usp=sharing
I'm indexing let's say 11 unique fields per document. Also, the NRT reader
is opened continually, and regular searches use that one. But a special
kind of feature allows searching a particular point in
Hi,
If you open the 2nd instance (the back in time reader) using
DirectoryReader.open(IndexCommit), then it has of course nothing in common with
the IndexWriter, so how can they share the SegmentReaders?
NRT readers from DirectoryReader.open(IndexWriter) are cached inside
IndexWriter, but the
You can actually use DirectoryReader.openIfChanged(latestNRTReader,
IndexCommit): this should pull/share SegmentReaders from the pool
inside IW, when available. But it will fail to share e.g. a
SegmentReader that is no longer part of the NRT view but is shared by
e.g. two back in time readers.
Really we need to
Can you drill down some more to see what's using those ~46 MB? Is it
the FSTs in the terms index?
But, we need to decouple the "single segment is opened with multiple
SegmentReaders" problem from e.g. the "single SegmentReader is using too
much RAM to hold terms index" problem. E.g. from this screen shot it looks
Thanks, Mike - I think the issue is actually the latter, i.e. SegmentReader
on its own can certainly use enough heap to cause problems, which of course
would be made that much worse by failure to pool readers for unchanged
segments.
But where are you seeing the behavior that would result in reuse
Sent from my BlackBerry® smartphone
-Original Message-
From: Vitaly Funstein vfunst...@gmail.com
Date: Thu, 28 Aug 2014 10:56:17
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
Here's the link:
Ugh, you're right: this still won't re-use from IW's reader pool. Can
you open an issue? Somehow we should make this easier.
In the meantime, I guess you can use openIfChanged from your back in
time reader to open another back in time reader. This way you have
two pools... IW's pool for the
if (commit != null) {
  return doOpenFromCommit(commit);
}
Sent from my BlackBerry® smartphone
-Original Message-
From: Michael McCandless luc...@mikemccandless.com
Date: Thu, 28 Aug 2014 14:25:11
To: Lucene Users java-user@lucene.apache.org
Thanks for the suggestions! I'll file an enhancement request.
But I am still a little skeptical about the approach of pooling segment
readers from prior DirectoryReader instances, opened at earlier commit
points. It looks like the up to date check for non-NRT directory reader
just compares the
On Thu, Aug 28, 2014 at 4:18 PM, Vitaly Funstein vfunst...@gmail.com wrote:
Thanks for the suggestions! I'll file an enhancement request.
But I am still a little skeptical about the approach of pooling segment
readers from prior DirectoryReader instances, opened at earlier commit
points. It
Sent from my BlackBerry® smartphone
-Original Message-
From: Vitaly Funstein vfunst...@gmail.com
Date: Thu, 28 Aug 2014 11:50:34
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
I tried something like this, to loop through all docs in my index and patch
a field. But it appears to wipe out some parts of the stored values in the
document. For example, highlighting stopped working.
[ scala code ]
val q = new MatchAllDocsQuery()
val topDocs = searcher.search(q, 100)
val
On Thu, Aug 28, 2014 at 1:25 PM, Michael McCandless
luc...@mikemccandless.com wrote:
The segments_N file can be different, that's fine: after that, we then
re-use SegmentReaders when they are in common between the two commit
points. Each segments_N file refers to many segments...
Yes, you
On Thu, Aug 28, 2014 at 2:38 PM, Vitaly Funstein vfunst...@gmail.com
wrote:
Looks like this is used inside Lucene41PostingsFormat, which simply passes
in those defaults - so you are effectively saying the minimum (and
therefore, maximum) block size can be raised to reduce the size of the terms
I used the Luke tool to look at my documents. It shows that the positions
and offsets in the term vectors get wiped out, in all fields. I'm using
Lucene 4.8. I guess I'll just rebuild the entire doc.
Rob
On Thu, Aug 28, 2014 at 5:33 PM, Rob Nikander rob.nikan...@gmail.com
wrote:
I tried
openIfChanged(latestNRTReader, IndexCommit):
Sent from my BlackBerry® smartphone
-Original Message-
From: Michael McCandless luc...@mikemccandless.com
Date: Thu, 28 Aug 2014 15:49:30
To: Lucene Users java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re:
Sent from my BlackBerry® smartphone
-Original Message-
From: Rob Nikander rob.nikan...@gmail.com
Date: Thu, 28 Aug 2014 19:34:13
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: Can I update one field in doc?
= 2*(min-1),
Sent from my BlackBerry® smartphone
-Original Message-
From: Vitaly Funstein vfunst...@gmail.com
Date: Thu, 28 Aug 2014 14:38:37
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
On Thu,
Sent from my BlackBerry® smartphone
-Original Message-
From: Vitaly Funstein vfunst...@gmail.com
Date: Thu, 28 Aug 2014 13:18:08
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
Thanks for
Yes!
Sent from my BlackBerry® smartphone
-Original Message-
From: Vitaly Funstein vfunst...@gmail.com
Date: Thu, 28 Aug 2014 14:39:50
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
On Thu, Aug 28,
Sent from my BlackBerry® smartphone
-Original Message-
From: Vitaly Funstein vfunst...@gmail.com
Date: Thu, 28 Aug 2014 13:18:08
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
Thanks
if (writer != null) {
  return doOpenFromWriter(commit);
} else {
  return doOpenNoWriter(commit);
}
Sent from my BlackBerry® smartphone
-Original Message-
From: Michael McCandless luc...@mikemccandless.com
Date: Thu, 28 Aug 2014 15:49:30
Sent from my BlackBerry® smartphone
-Original Message-
From: Michael McCandless luc...@mikemccandless.com
Date: Thu, 28 Aug 2014 15:49:30
To: Lucene Users java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of
doOpenIfChanged(final IndexCommit commit)
    throws IOException {
  ensureOpen();
Sent from my BlackBerry® smartphone
-Original Message-
From: craiglan...@gmail.com
Date: Fri, 29 Aug 2014 00:40:23
To: java-user@lucene.apache.org
Reply-To: craiglan...@gmail.com
Subject: Re:
doOpenIfChanged(final IndexCommit commit)
    throws IOException {
  ensureOpen();
Sent from my BlackBerry® smartphone
-Original Message-
From: craiglan...@gmail.com
Date: Fri, 29 Aug 2014 00:42:46
To: java-user@lucene.apache.org
Reply-To: craiglan...@gmail.com
Subject: Re:
The usual approach is to index to a second field but backwards.
See ReverseStringFilter... Then all your leading wildcards
are really trailing wildcards in the reversed field.
Best,
Erick
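The reversed-field idea can be sketched without Lucene. The example below is a toy model in plain Java with hypothetical names (`ReversedFieldDemo`, `endingWith`) and made-up sample values; a sorted `TreeSet` stands in for the sorted term dictionary. It assumes each value is stored reversed in a second field, so a leading wildcard such as `*example.com` becomes a prefix lookup for `moc.elpmaxe`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model of the "index a reversed copy" trick. This is NOT Lucene code.
class ReversedFieldDemo {

    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    // Finds the original terms that end with `suffix` by running a prefix
    // scan over the sorted set of reversed terms.
    static List<String> endingWith(NavigableSet<String> reversedTerms, String suffix) {
        String prefix = reverse(suffix);
        List<String> hits = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, so the scan
        // can start at the prefix and stop at the first non-matching term.
        for (String reversed : reversedTerms.tailSet(prefix, true)) {
            if (!reversed.startsWith(prefix)) {
                break;
            }
            hits.add(reverse(reversed));
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> reversedTerms = new TreeSet<>();
        for (String term : new String[] {"rob@example.com", "ann@example.org", "555-1234"}) {
            reversedTerms.add(reverse(term));
        }
        System.out.println(endingWith(reversedTerms, "example.com")); // [rob@example.com]
        System.out.println(endingWith(reversedTerms, "1234"));        // [555-1234]
    }
}
```

The point of the design is that a prefix lookup only touches the matching slice of the sorted terms, while a leading wildcard on the original field has to examine every term.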
On Thu, Aug 28, 2014 at 10:38 AM, Rob Nikander rob.nikan...@gmail.com
wrote:
Hi,
I've got some
Use the ngram token filter, and then a query of 512 would match by itself:
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html
-- Jack Krupansky
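As a rough illustration of the n-gram approach, the sketch below generates character n-grams in plain Java. It is a toy model, not the NGramTokenFilter implementation: the class name `NGramDemo`, the sample phone number, and the gram sizes are assumptions chosen for the example. It assumes every substring of length minGram to maxGram is indexed as its own term, so a substring query becomes an ordinary term lookup.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of character n-gram indexing. This is NOT Lucene code.
class NGramDemo {

    // Returns every substring of `text` whose length is between
    // minGram and maxGram (inclusive).
    static Set<String> ngrams(String text, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Hypothetical phone number, indexed as 3-grams only.
        Set<String> indexed = ngrams("4155120199", 3, 3);
        System.out.println(indexed.contains("512")); // true
        System.out.println(indexed.contains("999")); // false
    }
}
```

In this model a query longer than maxGram is never a single indexed term, so the gram sizes have to cover the query lengths you expect, and wider ranges produce more terms per value.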
-Original Message-
From: Erick Erickson
Sent: Thursday, August 28, 2014 11:52 PM
To: