The following testcase runs endlessly and produces VERY heavy load.
...
String query = Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed
diam nonumy eirmod tempor invidunt ut
+ labore et dolore magna aliquyam erat, sed
diam voluptua. At vero eos et
Hi,
I was comparing the sort performance of a SortedDocValuesField with that of
a normal field, e.g. a StringField, in the same sort. So, I used
the same string/bytesref value in both fields and in separate JVM processes, I
launched the two sorts.
I used a RAMDirectory and
don't use RAMDirectory: it's not very performant and is really intended
for e.g. testing and so on.
also, using a ramdirectory here defeats the purpose: the idea behind
using a docvaluesfield in most cases is to keep (most of) such
datastructures out of heap memory. The datastructures and even the
I'll defer to the hard-core Lucene committers for the technical details,
but I would suggest that a very large term with dozens of wildcards is a
known limitation (albeit not well-documented.) IOW, to use wildcards in
Lucene in a performant manner, they need to be brief.
-- Jack Krupansky
Hi,
I have to index millions of files, which is why I think batch-wise
indexing would be a good fit.
Is it possible to do batch indexing using Lucene?
If so, please provide a sample snippet.
So could you please provide your valuable suggestions.
Thanks
Venkata
I suspect you're getting leading wildcard searches as well, which must
do entire term scans unless you're doing the reverse trick.
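To make the "reverse trick" concrete, here is a toy sketch in plain Java. It does not use Lucene at all: a TreeSet stands in for a sorted term dictionary, and the names ReversedTermIndex and endingWith are made up for illustration. The idea is that if each term is also stored reversed, a leading-wildcard query such as *ing becomes a prefix lookup for "gni", which can seek straight to the matching range instead of scanning every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy term dictionary illustrating the reversed-term trick (hypothetical, not Lucene code). */
final class ReversedTermIndex {
    // Sorted set of reversed terms, standing in for a sorted term dictionary.
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    /** Stores the term in reversed form. */
    void add(String term) {
        reversedTerms.add(reverse(term));
    }

    /** Answers the leading-wildcard query "*suffix" as a prefix scan over reversed terms. */
    List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> matches = new ArrayList<>();
        // tailSet seeks to the first reversed term >= prefix; stop at the first non-match.
        for (String reversed : reversedTerms.tailSet(prefix, true)) {
            if (!reversed.startsWith(prefix)) {
                break;
            }
            matches.add(reverse(reversed));
        }
        return matches;
    }

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        ReversedTermIndex index = new ReversedTermIndex();
        for (String term : new String[] {"indexing", "parsing", "lucene", "sorting", "query"}) {
            index.add(term);
        }
        // Only terms ending in "ing" are visited, not the whole dictionary.
        System.out.println(index.endingWith("ing"));
    }
}
```

The cost is a second copy of every term, so this only pays off when leading wildcards are common enough to justify the extra index size.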
Replacing all successive whitespace gives you:
Download the Lucene source code and check the demo source files that are
shipped with it; you should find a sample indexing file there.
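As a rough illustration of the batching part only, here is a plain-Java sketch with no Lucene dependency. BatchRunner and forEachBatch are hypothetical names, not part of Lucene or its demo. The handler is the place where, in a real indexer, one would presumably call IndexWriter.addDocument for each file in the batch and then IndexWriter.commit once per batch; that mapping is an assumption about how "batch-wise" would be applied here, not something stated in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper that hands items to a handler in fixed-size batches. */
final class BatchRunner {

    /** Feeds items to the handler in chunks of at most batchSize; returns the number of batches. */
    static <T> int forEachBatch(Iterable<T> items, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        for (T item : items) {
            batch.add(item);
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batch = new ArrayList<>(batchSize); // fresh list so the handler may keep the old one
                batches++;
            }
        }
        if (!batch.isEmpty()) { // flush the final, possibly smaller, batch
            handler.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.txt", "b.txt", "c.txt", "d.txt", "e.txt");
        // Assumption: a real handler would add each file as a document, then commit.
        int batches = forEachBatch(files, 2, batch -> System.out.println("indexing batch " + batch));
        System.out.println(batches + " batches");
    }
}
```

The batch size trades commit overhead against how much work is lost if the process dies between commits.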
On Thu, Jun 26, 2014 at 9:27 PM, Venkata krishna venkat1...@gmail.com
wrote:
Hi,
I have to index millions of files, which is why I think batch-wise
The test case is only parsing this query, not trying to run it,
right? So it doesn't involve automaton/FST ... just the flexible
query parser code?
It seems bad that flexible QP would take so long, even if the query is
strange.
Can you open an issue, and maybe attach a thread dump so we can see