Re: How to get document effectively. or FieldCache example

2017-04-21 Thread Adrien Grand
Lucene is not designed for retrieving that many results. What are you doing with those 5 lakh (500,000) documents? I suspect this is too much to display, so you probably perform some computations on them? If so, maybe you could move them to Lucene using e.g. facets? If that does not work, I'm afraid that Lucene
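Adrien's pointer to facets refers to Lucene's faceting module, which aggregates counts over all matching documents inside the engine instead of loading 500,000 stored documents into the application. A minimal sketch using the taxonomy-based facet API is below; the index paths, the `"category"` dimension, and the assumption that documents were indexed with `FacetField` values are all hypothetical, not from the thread.

```java
import java.nio.file.Paths;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts;
import org.apache.lucene.facet.taxonomy.TaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.FSDirectory;

public class FacetCountsSketch {
  public static void main(String[] args) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(
             FSDirectory.open(Paths.get("/path/to/index")));      // hypothetical path
         TaxonomyReader taxo = new DirectoryTaxonomyReader(
             FSDirectory.open(Paths.get("/path/to/taxonomy")))) { // hypothetical path
      IndexSearcher searcher = new IndexSearcher(reader);
      FacetsConfig config = new FacetsConfig();
      FacetsCollector fc = new FacetsCollector();
      // One pass over the matching docs; no stored fields are fetched.
      FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, fc);
      Facets facets = new FastTaxonomyFacetCounts(taxo, config, fc);
      // Top 10 values of a hypothetical "category" facet dimension.
      System.out.println(facets.getTopChildren(10, "category"));
    }
  }
}
```

The point of the design: the aggregation happens per segment during collection, so the cost scales with the number of matches, not with the number of `searcher.doc()` calls.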

Re: will lucene traverse all segments to search a 'primary key'term or will it stop as soon as it get one?

2017-04-21 Thread Michael McCandless
Lucene by default will search all segments, because it does not know that your field is a primary key. Trejkaz's suggestion to early-terminate should work well. You could also write custom code that uses TermsEnum on each segment. Mike McCandless http://blog.mikemccandless.com On Thu, Apr 20,
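The per-segment `TermsEnum` approach Mike mentions can be sketched as follows. The field name `"id"` and the assumption that each key occurs in at most one live document are mine, not stated in the thread.

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

public class PrimaryKeyLookup {
  /** Returns the global docID of the first document whose "id" term equals
   *  {@code key}, or -1 if no segment contains it. Stops at the first hit
   *  instead of visiting every segment. */
  static int lookup(IndexReader reader, String key) throws IOException {
    BytesRef term = new BytesRef(key);
    for (LeafReaderContext ctx : reader.leaves()) {
      Terms terms = ctx.reader().terms("id");
      if (terms == null) continue;            // segment has no "id" field
      TermsEnum te = terms.iterator();
      if (te.seekExact(term)) {               // term exists in this segment
        PostingsEnum pe = te.postings(null, PostingsEnum.NONE);
        int doc = pe.nextDoc();
        if (doc != DocIdSetIterator.NO_MORE_DOCS) {
          return ctx.docBase + doc;           // early-terminate on first match
        }
      }
    }
    return -1;
  }
}
```

A production version would also check the segment's live-docs bitset, since a deleted document can still match a postings lookup.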

Re: How to get document effectively. or FieldCache example

2017-04-21 Thread neeraj shah
Then which one is the right tool for text searching in files? Can you please suggest one? On Fri, Apr 21, 2017 at 2:01 PM, Adrien Grand wrote: > Lucene is not designed for retrieving that many results. What are you doing > with those 5 lacs documents, I suspect this is too much to display so you > p

RE: How to get document effectively. or FieldCache example

2017-04-21 Thread Uwe Schindler
Hi, for full text search, Lucene is the right tool. The problem is that inverted indexes and the software (like Lucene) on top are optimized to return the best ranking results very fast. This is what users normally do, e.g. when they search Google. You get a page with 10 or 20 results displayed
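Uwe's point, that Lucene is optimized for returning the best-ranked few hits quickly, corresponds to the standard top-k search pattern. A minimal sketch (the index path and the `"contents"`/`"path"` field names are assumptions):

```java
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class TopKSearch {
  public static void main(String[] args) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(
             FSDirectory.open(Paths.get("/path/to/index")))) {  // hypothetical path
      IndexSearcher searcher = new IndexSearcher(reader);
      Query q = new QueryParser("contents", new StandardAnalyzer())
                    .parse("full text search");
      TopDocs top = searcher.search(q, 10);   // only the 10 best-ranked hits
      for (ScoreDoc sd : top.scoreDocs) {
        // Stored fields are fetched for 10 docs, not for every match.
        System.out.println(searcher.doc(sd.doc).get("path") + " score=" + sd.score);
      }
    }
  }
}
```

Fetching stored fields for 10 or 20 hits per page is cheap; doing it for all 500,000 matches is what makes the original poster's code slow.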

A question over TokenFilters

2017-04-21 Thread Edoardo Causarano
Hi all. I’m relatively new to Lucene, so I have a couple questions about writing custom filters. The way I understand it, one would extend org.apache.lucene.analysis.TokenFilter and override #incrementToken to examine the current token provided by a stream token producer. I’d like to write so

Re: A question over TokenFilters

2017-04-21 Thread Ahmet Arslan
Hi, LimitTokenCountFilter is used to index only the first n tokens. Maybe it can inspire you. Ahmet On Friday, April 21, 2017, 6:20:11 PM GMT+3, Edoardo Causarano wrote: Hi all. I’m relatively new to Lucene, so I have a couple questions about writing custom filters. The way I understand it, one woul
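For reference, using LimitTokenCountFilter inside a custom Analyzer looks roughly like this (the limit of 5 tokens and the use of StandardTokenizer are illustrative choices, not from the thread):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

/** Analyzer that indexes only the first 5 tokens of each field. */
public class FirstFiveAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new StandardTokenizer();
    // Silently drops every token after the 5th.
    TokenStream sink = new LimitTokenCountFilter(source, 5);
    return new TokenStreamComponents(source, sink);
  }
}
```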

Re: A question over TokenFilters

2017-04-21 Thread Edoardo Causarano
Hi, thanks for your reply. In several other implementations I’ve seen this pattern of a while(input.incrementToken()) loop within the filter’s incrementToken method. Is this approach recommended, or are there hidden traps (e.g. memory consumption, dependency on filter ordering, and so on)? Best
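The `while (input.incrementToken())` pattern Edoardo describes is the standard way to write a buffering TokenFilter: it drains the upstream stream first, then replays (possibly modified) tokens. A sketch of the bare pattern, with the trap he asks about noted in comments:

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.AttributeSource;

/** Buffers the whole upstream stream on the first call, then replays it.
 *  Trap: memory grows with the number of buffered tokens, so this is only
 *  safe when the input is bounded (or a filter like LimitTokenCountFilter
 *  runs earlier in the chain). */
public final class BufferingFilter extends TokenFilter {
  private final Deque<AttributeSource.State> buffer = new ArrayDeque<>();
  private boolean consumed = false;

  public BufferingFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!consumed) {
      while (input.incrementToken()) {
        buffer.add(captureState());  // snapshot all attributes of this token
      }
      consumed = true;
      // ...a real filter would inspect/reorder the buffered states here...
    }
    if (buffer.isEmpty()) {
      return false;
    }
    restoreState(buffer.removeFirst());  // replay the next buffered token
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    buffer.clear();
    consumed = false;
  }
}
```

Forgetting to clear the buffer in `reset()` is another common trap, since analyzers reuse TokenStream instances across documents.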

Re: How to get document effectively. or FieldCache example

2017-04-21 Thread Chris Hostetter
: then which one is right tool for text searching in files. please can you : suggest me? so far all you've done is show us your *indexing* code; and said that after you do a search, calling searcher.doc(docid) on 500,000 documents is slow. But you still haven't described the usecase you are tr

Re: will lucene traverse all segments to search a 'primary key'term or will it stop as soon as it get one?

2017-04-21 Thread Chris Hostetter
: Lucene by default will search all segments, because it does not know that : your field is a primary key. : : Trejkaz's suggestion to early-terminate should work well. You could also : write custom code that uses TermsEnum on each segment. Before you go too far down the rabbit hole of writing a

Re: How to get document effectively. or FieldCache example

2017-04-21 Thread neeraj shah
Hello, Let me explain my case: suppose I am searching for ("pain" (in same chapter) "head"). This is my query. What I need to do is first search for "pain", then search for "head" separately, and then I need the common file names from both search results. Now the criteria is: Suppose:
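The message is cut off, but if each chapter is indexed as its own document with a stored file name, the "common files" intersection does not need two separate searches: a single BooleanQuery with two MUST clauses matches only documents containing both words. A sketch under that assumption (the `"contents"` and `"filename"` field names and the index path are hypothetical):

```java
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.FSDirectory;

public class CommonFilesSketch {
  public static void main(String[] args) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(
             FSDirectory.open(Paths.get("/path/to/index")))) {  // hypothetical path
      IndexSearcher searcher = new IndexSearcher(reader);
      // Both terms are required, so only chapters containing
      // "pain" AND "head" match.
      Query q = new BooleanQuery.Builder()
          .add(new TermQuery(new Term("contents", "pain")), BooleanClause.Occur.MUST)
          .add(new TermQuery(new Term("contents", "head")), BooleanClause.Occur.MUST)
          .build();
      Set<String> files = new LinkedHashSet<>();
      for (ScoreDoc sd : searcher.search(q, 100).scoreDocs) {
        files.add(searcher.doc(sd.doc).get("filename"));  // hypothetical stored field
      }
      System.out.println(files);
    }
  }
}
```

This keeps the intersection inside Lucene and fetches stored fields only for the matching chapters, avoiding the 500,000-document retrieval discussed earlier in the thread.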