QueryParserUtil, big query with wildcards - runs endlessly and produces heavy load

2014-06-26 Thread Clemens Wyss DEV
The following testcase runs endlessly and produces VERY heavy load. ... String query = Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut + labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et

SortedDocValuesField

2014-06-26 Thread Sandeep Khanzode
Hi, I was checking SortedDocValuesField and its performance in Sort as opposed to a normal field, i.e. StringField, in the same sort. So, I used the same string/BytesRef value in both fields and launched the two sorts in separate JVM processes. I used a RAMDirectory and

Re: SortedDocValuesField

2014-06-26 Thread Robert Muir
Don't use RAMDirectory: it's not very performant and is really intended for e.g. testing and so on. Also, using a RAMDirectory here defeats the purpose: the idea behind using a doc values field in most cases is to keep (most of) such data structures out of heap memory. The data structures and even the

Re: QueryParserUtil, big query with wildcards - runs endlessly and produces heavy load

2014-06-26 Thread Jack Krupansky
I'll defer to the hard-core Lucene committers for the technical details, but I would suggest that a very large term with dozens of wildcards is a known limitation (albeit not well documented). IOW, to use wildcards in Lucene in a performant manner, they need to be brief. -- Jack Krupansky

Batch wise Indexing Structured Documents

2014-06-26 Thread Venkata krishna
Hi, I have to index millions of files, which is why I think batch-wise indexing is a good fit. Is it possible to do batch indexing using Lucene? If so, please provide a sample snippet. I would appreciate your suggestions. Thanks Venkata

Re: QueryParserUtil, big query with wildcards - runs endlessly and produces heavy load

2014-06-26 Thread Erick Erickson
I suspect you're getting leading wildcard searches as well, which must do entire term scans unless you're doing the reverse trick. Replacing all successive whitespace gives you:
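A minimal sketch of the ideas named in this reply, using only the Java standard library: collapsing runs of whitespace, spotting terms that start with a wildcard, and reversing a term. It assumes the "reverse trick" means indexing reversed terms in a second field so that a leading wildcard can be rewritten as a trailing one. The class and method names are hypothetical and are not part of Lucene or QueryParserUtil.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class.
final class WildcardQueryPrep {

    // Collapse every run of whitespace into a single space and trim the ends.
    public static String collapseWhitespace(String raw) {
        return raw.trim().replaceAll("\\s+", " ");
    }

    // Return the whitespace-separated terms that begin with '*' or '?'.
    // These are the terms that would force a scan over the whole term dictionary.
    public static List<String> leadingWildcardTerms(String query) {
        List<String> hits = new ArrayList<>();
        String normalized = collapseWhitespace(query);
        if (normalized.isEmpty()) {
            return hits;
        }
        for (String term : normalized.split(" ")) {
            if (term.startsWith("*") || term.startsWith("?")) {
                hits.add(term);
            }
        }
        return hits;
    }

    // Reverse a term character by character. Applied to a pattern such as
    // "*ing" this yields "gni*", which is a trailing-wildcard pattern that
    // could be matched against a field holding reversed terms (assumption).
    public static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    public static void main(String[] args) {
        String query = "Lorem   *ipsum \t dolor\n?sit amet*";
        System.out.println(collapseWhitespace(query));
        for (String term : leadingWildcardTerms(query)) {
            System.out.println(term + " -> " + reverse(term));
        }
    }
}
```

A check like `leadingWildcardTerms` could run before the query reaches the parser, so oversized or leading-wildcard input is rejected or rewritten early.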

Re: Batch wise Indexing Structured Documents

2014-06-26 Thread parnab kumar
Download the Lucene source code and check the demo source files that are shipped with it; you should find a sample indexing file. On Thu, Jun 26, 2014 at 9:27 PM, Venkata krishna venkat1...@gmail.com wrote: Hi, I have to index millions of files, that's why i am thinking batch wise
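For illustration only, here is a library-free sketch of the batching idea from the question: walk the input once and hand fixed-size groups to a callback. `BatchFeeder` and `feedInBatches` are hypothetical names. In a real Lucene setup the callback would presumably add each document through `IndexWriter.addDocument` and call `IndexWriter.commit()` once per batch, but that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not a Lucene class.
final class BatchFeeder {

    // Pass the items to the sink in lists of at most batchSize elements,
    // preserving order. Returns the number of batches delivered.
    public static <T> int feedInBatches(Iterable<T> items, int batchSize,
                                        Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        for (T item : items) {
            batch.add(item);
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batches++;
                // Start a fresh list so the sink may keep the previous one.
                batch = new ArrayList<>(batchSize);
            }
        }
        // Deliver the final, possibly smaller, batch.
        if (!batch.isEmpty()) {
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.txt", "b.txt", "c.txt", "d.txt", "e.txt");
        int count = feedInBatches(files, 2,
                batch -> System.out.println("indexing batch " + batch));
        System.out.println(count + " batches");
    }
}
```

Because the input is consumed as an `Iterable`, the full list of files never has to be held in memory at once; only the current batch is.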

Re: QueryParserUtil, big query with wildcards - runs endlessly and produces heavy load

2014-06-26 Thread Michael McCandless
The test case is only parsing this query, not trying to run it, right? So it doesn't involve automaton/FST ... just the flexible query parser code? It seems bad that flexible QP would take so long, even if the query is strange. Can you open an issue, and maybe attach a thread dump so we can see