Lucene is not designed for retrieving that many results. What are you doing
with those 5 lakh (500,000) documents? I suspect this is too much to display, so
you probably perform some computations on them? If so, maybe you could move
them to Lucene using e.g. facets? If that does not work, I'm afraid that Lucene
Lucene by default will search all segments, because it does not know that
your field is a primary key.
Trejkaz's suggestion to early-terminate should work well. You could also
write custom code that uses TermsEnum on each segment.
Mike McCandless
http://blog.mikemccandless.com
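The per-segment TermsEnum approach Mike mentions might look like the following sketch. This is an assumption-laden illustration, not code from the thread: the field name "id", the key value, and the class name are invented, and the API calls assume a Lucene 6.x-era reader.

```java
// Sketch (hypothetical): look up a primary-key term segment by segment and
// stop at the first hit, instead of letting a normal search visit every
// segment. Field "id" and the key are made-up examples.
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

public class PrimaryKeyLookup {
  public static Document lookup(IndexReader reader, String key) throws IOException {
    BytesRef target = new BytesRef(key);
    for (LeafReaderContext ctx : reader.leaves()) {
      Terms terms = ctx.reader().terms("id");  // null if this segment lacks the field
      if (terms == null) continue;
      TermsEnum te = terms.iterator();
      if (te.seekExact(target)) {
        PostingsEnum postings = te.postings(null, PostingsEnum.NONE);
        int doc = postings.nextDoc();
        if (doc != DocIdSetIterator.NO_MORE_DOCS) {
          // The field is a primary key, so the first match is the only match.
          return ctx.reader().document(doc);
        }
      }
    }
    return null;  // key not present in any segment
  }
}
```

The early termination (returning as soon as one segment matches) is exactly what a plain search cannot do on its own, since Lucene does not know the field is unique.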
On Thu, Apr 20,
Then which one is the right tool for text searching in files? Please can you
suggest one?
On Fri, Apr 21, 2017 at 2:01 PM, Adrien Grand wrote:
> Lucene is not designed for retrieving that many results. What are you doing
> with those 5 lakh (500,000) documents? I suspect this is too much to display, so
> p
Hi,
for full text search, Lucene is the right tool. The problem is that inverted
indexes and the software on top of them (like Lucene) are optimized to return the
best-ranking results very fast. This is what users normally do, e.g. when they
search Google: you get a page with 10 or 20 results displayed
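In code, that usage pattern means asking Lucene for a small top-n page instead of materializing every hit. A minimal sketch of that idea (the searcher, query, and page size of 10 are placeholders):

```java
// Sketch: fetch only the 10 best-ranking hits, then page with searchAfter,
// rather than loading hundreds of thousands of stored documents at once.
import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class PagedSearch {
  static void firstTwoPages(IndexSearcher searcher, Query query) throws Exception {
    TopDocs page1 = searcher.search(query, 10);  // top 10 only
    for (ScoreDoc sd : page1.scoreDocs) {
      Document doc = searcher.doc(sd.doc);       // load only what you display
    }
    if (page1.scoreDocs.length > 0) {
      ScoreDoc last = page1.scoreDocs[page1.scoreDocs.length - 1];
      TopDocs page2 = searcher.searchAfter(last, query, 10);  // next 10 hits
    }
  }
}
```

Calling searcher.doc() only for the hits actually shown is what keeps this fast.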
Hi all.
I’m relatively new to Lucene, so I have a couple of questions about writing custom
filters.
The way I understand it, one would extend
org.apache.lucene.analysis.TokenFilter and override #incrementToken to examine
the current token provided by an upstream token stream.
I’d like to write so
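That understanding matches how TokenFilter works. A minimal skeleton of such a filter might look like the following sketch; the class name and the example transformation are invented for illustration:

```java
// Sketch: a TokenFilter that examines each token from the upstream
// TokenStream inside incrementToken(). The transformation is a placeholder.
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class ExamineTokenFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public ExamineTokenFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;                      // upstream is exhausted
    }
    // Examine or rewrite the current token through its attributes, e.g.:
    if (termAtt.length() > 0 && Character.isUpperCase(termAtt.charAt(0))) {
      termAtt.setEmpty().append("marked");  // hypothetical rewrite
    }
    return true;
  }
}
```

The filter never stores tokens itself; it only reads and mutates the shared attributes of the stream it wraps.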
Hi,
LimitTokenCountFilter is used to index only the first n tokens. Maybe it can
inspire you.
Ahmet
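For reference, wiring LimitTokenCountFilter into an analysis chain looks roughly like this sketch (the tokenizer choice and the limit of 100 are arbitrary examples):

```java
// Sketch: keep only the first 100 tokens of a stream; the count is arbitrary.
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class LimitExample {
  static TokenStream limitedStream() {
    Tokenizer source = new StandardTokenizer();
    return new LimitTokenCountFilter(source, 100);  // drop everything after token 100
  }
}
```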
On Friday, April 21, 2017, 6:20:11 PM GMT+3, Edoardo Causarano
wrote:
Hi all.
I’m relatively new to Lucene, so I have a couple questions about writing custom
filters.
The way I understand it, one woul
Hi,
thanks for your reply. In several other implementations I’ve seen the pattern
of a while(input.incrementToken()) loop within the filter’s incrementToken
method. Is this approach recommended, or are there hidden traps (e.g. memory
consumption, dependency on filter ordering, and so on)?
Best
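For what it's worth, the while(input.incrementToken()) pattern is the standard way to *drop* tokens, and in that form it buffers nothing, so memory stays constant; the traps appear mainly when a filter buffers unbounded numbers of tokens between calls. A minimal sketch of the safe dropping case (the filter name and length threshold are invented):

```java
// Sketch: loop over upstream tokens until one passes the predicate.
// Tokens are consumed (dropped), not buffered, so memory use is constant.
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class DropShortTokensFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public DropShortTokensFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    while (input.incrementToken()) {   // keep pulling until a token qualifies
      if (termAtt.length() >= 3) {     // arbitrary example predicate
        return true;
      }
    }
    return false;                      // upstream exhausted
  }
}
```

Ordering can still matter in the sense that any filter only sees what earlier filters in the chain emit.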
: Then which one is the right tool for text searching in files? Please can you
: suggest one?
So far all you've done is show us your *indexing* code and say that
after you do a search, calling searcher.doc(docid) on 500,000 documents is
slow.
But you still haven't described the use case you are tr
: Lucene by default will search all segments, because it does not know that
: your field is a primary key.
:
: Trejkaz's suggestion to early-terminate should work well. You could also
: write custom code that uses TermsEnum on each segment.
Before you go too far down the rabbit hole of writing a
Hello,
Let me explain my case:
- Suppose I am searching for ("pain" (in same chapter) "head"). This
is my query.
Now what I need to do is first search for "pain", then search for "head"
separately, and then I need the file names common to both search results.
Now the criteria is, suppose:
10 matches
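One way to get the documents matching both terms in a single query, rather than intersecting two result sets by hand, is a BooleanQuery with two MUST clauses. A sketch (the field name "contents" is a placeholder for whatever field is indexed):

```java
// Sketch: require both "pain" and "head" in the same document.
// Field name "contents" is made up; substitute the real indexed field.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class BothTermsQuery {
  static BooleanQuery build() {
    return new BooleanQuery.Builder()
        .add(new TermQuery(new Term("contents", "pain")), BooleanClause.Occur.MUST)
        .add(new TermQuery(new Term("contents", "head")), BooleanClause.Occur.MUST)
        .build();
  }
}
```

If "in the same chapter" means the terms must occur near each other rather than merely in the same document, a span or proximity query with a suitable slop is the usual tool instead.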