Re: Automata and Transducer on Lucene 6

2017-04-18 Thread Robert Muir
On Tue, Apr 18, 2017 at 5:16 PM, Michael McCandless wrote:
> +1 to use the tests to learn how things work; I don't know of any guide /
> high level documentation for these low level classes, sorry. Maybe write
> it up yourself and set it free somewhere online ;)

Re: Automata and Transducer on Lucene 6

2017-04-18 Thread Michael McCandless
On Tue, Apr 18, 2017 at 2:33 PM, Dawid Weiss wrote:
> - Automaton etc. are completely independent and used for slightly different
> purposes (it's brics library ported to Lucene). Again -- tests will be
> helpful to understand how they work. These classes use object

Re: Early Termination of Queries

2017-04-18 Thread Michael McCandless
Each segment in Lucene is its own little index, and you can get the SegmentReader for it (use IndexReader.leaves() API from the full reader you opened), pass that to IndexSearcher, and search it. But be careful: the "last" segment is an unpredictable thing, because the default merge policy merges

Re: Total of term frequencies

2017-04-18 Thread Michael McCandless
Ahh I see. Term vectors are actually an inverted index for a single document, and they also have the same postings API as the whole index (including TermsEnum.totalTermFreq), but that method likely always returns -1 for term vectors because it's not implemented? Maybe Lucene's default codec
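
For illustration only, here is a minimal sketch of the arithmetic behind the question in this thread: if a term vector is viewed as a tiny inverted index for one document, the total count of terms in that document is the sum of its per-term frequencies. The class and method names below (TermVectorTotals, totalTermCount) are hypothetical and the map is a stand-in for a term vector; this is not Lucene's API and makes no claim about what TermsEnum.totalTermFreq returns for term vectors.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for one document's term vector: term -> frequency.
// This does not call Lucene; it only illustrates the summation.
class TermVectorTotals {

    // Sum the per-term frequencies to get the document's total term count.
    static long totalTermCount(Map<String, Long> termFreqs) {
        long total = 0;
        for (long freq : termFreqs.values()) {
            total += freq;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up frequencies for a single illustrative document.
        Map<String, Long> termFreqs = new LinkedHashMap<>();
        termFreqs.put("lucene", 3L);
        termFreqs.put("automaton", 1L);
        termFreqs.put("segment", 2L);
        System.out.println(totalTermCount(termFreqs));
    }
}
```

With a real index, the same loop would presumably run over the terms enumerated from the document's term vector, reading each term's frequency from whichever postings call proves reliable there.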

Re: Automata and Transducer on Lucene 6

2017-04-18 Thread Dawid Weiss
> I'd like to read something written by who designed these classes. What
> motivated, usage examples, what it is good for and what it is not good for.
> Maybe a history of the development of Automata on Lucene
Are you looking for a historical book on Lucene development or are you looking to solve

Automata and Transducer on Lucene 6

2017-04-18 Thread Juarez Sampaio
Hello everyone, Recently I've watched a few videos and read a few blog posts on Lucene's Automata and how one can speed up things by 100x when properly using Automata and Transducers. "I can definitely use a boost like this", right? The problem is that this material I've read was written to Lucene

Early Termination of Queries

2017-04-18 Thread aravinth thangasami
Hi all, *EarlyTerminatingSortingCollector* in Lucene takes N documents from each segment. I have a case where the results from the latest segment alone are enough. On finding N results in the latest segment, I will stop searching. What is your opinion on this?

Re: Total of term frequencies

2017-04-18 Thread Michael McCandless
I think you want to use the TermsEnum.totalTermFreq method?
Mike McCandless
http://blog.mikemccandless.com
On Sun, Apr 16, 2017 at 11:36 AM, Manjula Wijewickrema wrote:
> Hi,
>
> Is there any way to get the total count of terms in the Term Frequency
> Vector (tvf)? I