The following "testcase" runs endlessly and produces VERY heavy load.
...
String query = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut "
+ "labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et
Hi,
I was comparing the performance of sorting on a SortedDocValuesField with
that of sorting on a normal field, i.e. a StringField. I used the same
string/BytesRef value in both fields and launched the two sorts in separate
JVM processes.
I used a RAMDirectory and cre
Don't use RAMDirectory: it's not very performant and is really intended
for things like testing.
Also, using a RAMDirectory here defeats the purpose: the idea behind
using a DocValues field in most cases is to keep (most of) such
data structures out of heap memory. The data structures and even the
com
I'll defer to the hard-core Lucene committers for the technical details,
but I would suggest that a very large term with dozens of wildcards is a
"known limitation" (albeit not a well-documented one). IOW, to use wildcards
in Lucene in a performant manner, they need to be "brief".
-- Jack Krupansky
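One way to act on that advice is to check user-supplied patterns before handing them to the query parser. The sketch below is plain Java, not a Lucene API; the class name `WildcardGuard` and the limits `MAX_WILDCARDS` and `MAX_LENGTH` are hypothetical values chosen only for illustration.

```java
// Hypothetical pre-check (not part of Lucene): reject wildcard patterns that
// are too long or contain too many wildcards before they reach the parser.
class WildcardGuard {
    // Assumed limits; tune them for your own index and latency budget.
    static final int MAX_WILDCARDS = 4;
    static final int MAX_LENGTH = 64;

    // Counts the wildcard characters '*' and '?'.
    // Note: escaped wildcards such as "\*" are NOT handled by this sketch.
    static int countWildcards(String pattern) {
        int n = 0;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '*' || c == '?') {
                n++;
            }
        }
        return n;
    }

    // True if the pattern is short and uses only a few wildcards.
    static boolean isBrief(String pattern) {
        return pattern.length() <= MAX_LENGTH
                && countWildcards(pattern) <= MAX_WILDCARDS;
    }

    public static void main(String[] args) {
        System.out.println(isBrief("lore*"));                                  // true
        System.out.println(isBrief("Lorem*ipsum*dolor*sit*amet*consetetur"));  // false: 5 wildcards
    }
}
```

A real check would also need to understand the parser's escaping rules; the point is only that a cheap length and count test can stop pathological patterns early.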
Hi,
I have to index millions of files, which is why I think batch-wise
indexing would be a good fit.
Is it possible to do batch indexing with Lucene?
If it is, please provide a sample snippet.
Could you please share your suggestions?
Thanks
Venkata krishna
I suspect you're getting leading wildcard searches as well, which must
do entire term scans unless you're using the reverse trick (also indexing
each term reversed, so that a leading wildcard becomes a trailing one).
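A minimal illustration of why the reverse trick helps, using a `java.util.TreeSet` as a stand-in for the sorted term dictionary. The class and method names are made up for this sketch, and it is not how Lucene or Solr implement the trick internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy illustration of the "reverse trick"; a TreeSet stands in for a sorted term dictionary.
class ReversedTermsDemo {
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Builds the reversed dictionary that would be indexed alongside the normal one.
    static SortedSet<String> reversedCopy(SortedSet<String> terms) {
        SortedSet<String> out = new TreeSet<>();
        for (String t : terms) {
            out.add(reverse(t));
        }
        return out;
    }

    // Leading wildcard "*suffix" without the trick: every term must be visited.
    static List<String> scanAll(SortedSet<String> terms, String suffix) {
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (t.endsWith(suffix)) {
                hits.add(t);
            }
        }
        return hits;
    }

    // With the trick: "*suffix" becomes the prefix reverse(suffix), so we seek to
    // the first candidate and stop as soon as we leave the prefix range.
    // Results come back in reversed-term order.
    static List<String> viaReversed(SortedSet<String> reversedTerms, String suffix) {
        String prefix = reverse(suffix);
        List<String> hits = new ArrayList<>();
        for (String r : reversedTerms.tailSet(prefix)) {
            if (!r.startsWith(prefix)) {
                break; // all strings with this prefix are contiguous in sorted order
            }
            hits.add(reverse(r));
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
                new TreeSet<>(List.of("dolor", "dolore", "labore", "lorem", "tempor"));
        System.out.println(scanAll(terms, "or"));                   // [dolor, tempor]
        System.out.println(viaReversed(reversedCopy(terms), "or")); // [dolor, tempor]
    }
}
```

Both methods find the same terms, but the reversed lookup only touches the matching range, while the plain scan visits the whole dictionary. The price is indexing every term twice.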
Replacing each run of successive whitespace with "*" gives you:
Lorem*ipsum*dolor*sit*amet,*consetetur*sadipscing*elitr,*sed*diam*nonumy*eirmod*tempor*invidunt*ut*labore*et*dolore*magn
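That transformation can be reproduced with a single regular expression. This assumes "successive whitespace" means any run of spaces, tabs, or newlines; the class name `CollapseWhitespace` is hypothetical, and the input is a shortened version of the query.

```java
// Replaces every run of whitespace with a single '*' (shortened input).
class CollapseWhitespace {
    static String toWildcards(String text) {
        // "\\s+" matches one or more whitespace characters (spaces, tabs,
        // newlines), so each run collapses into a single '*'.
        return text.replaceAll("\\s+", "*");
    }

    public static void main(String[] args) {
        String q = "Lorem ipsum dolor sit amet,\n  consetetur sadipscing elitr";
        System.out.println(toWildcards(q));
        // prints Lorem*ipsum*dolor*sit*amet,*consetetur*sadipscing*elitr
    }
}
```

Every former word boundary becomes a `*`, so a long sentence turns into one long term with dozens of wildcards.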
Download the Lucene source code and check the demo source files that are
shipped with it; you should find a sample indexing file there.
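For the batching part of the question, here is a rough sketch in plain Java: walk a directory, split the file list into fixed-size batches, and hand each batch to an indexing step. `BatchIndexSketch`, `BATCH_SIZE`, and `indexBatch` are hypothetical; the Lucene-specific work (building a `Document` per file and adding it with `IndexWriter.addDocument`) is left as a placeholder, since the demo shows that part.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of batch-wise processing of many files; the Lucene part is a placeholder.
class BatchIndexSketch {
    static final int BATCH_SIZE = 1000; // assumed value; tune to taste

    // Splits a list into consecutive chunks of at most batchSize elements
    // (the chunks are subList views of the input).
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    // Hypothetical placeholder: build one Document per file, add it to a shared
    // IndexWriter, then commit once the batch is done.
    static void indexBatch(List<Path> batch) {
        System.out.println("would index " + batch.size() + " files");
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (List<Path> batch : partition(files, BATCH_SIZE)) {
            indexBatch(batch);
        }
    }
}
```

A common pattern is to keep one `IndexWriter` open for the whole run and call `commit()` after each batch, rather than opening a new writer per batch.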
On Thu, Jun 26, 2014 at 9:27 PM, Venkata krishna wrote:
> Hi,
>
> I have to index millions of files, that's why i am thinking batch wise
> indexing is good.
The test case is "only" parsing this query, not trying to run it,
right? So it doesn't involve the automaton/FST code, just the flexible
query parser?
It seems bad that the flexible QueryParser would take so long, even if
the query is "strange".
Can you open an issue, and maybe attach a thread dump so we can
I came across this type when I read this blog post:
http://blog.trifork.com/2011/10/27/introducing-lucene-index-doc-values/
The post explains that IndexDocValues are built at index time specifically
for purposes such as sorting, and that they reduce the overhead of the FieldCache.
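A toy model of that idea, assuming the usual description of sorted doc values: a sorted dictionary of unique values plus one ordinal per document, both built at index time. This is plain Java for illustration, not Lucene's on-disk format, and `OrdinalColumnDemo` is a made-up name. Once the ordinals exist, sorting documents compares ints rather than strings; with FieldCache, comparable structures have to be rebuilt in heap at search time by un-inverting the index.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

// Toy model of sorted doc values (not Lucene's real format): a sorted
// dictionary of unique values plus one ordinal per document.
class OrdinalColumnDemo {
    final String[] dictionary; // unique values in ascending order
    final int[] ords;          // ords[doc] = position of the doc's value in dictionary

    OrdinalColumnDemo(String[] valuePerDoc) {
        dictionary = new TreeSet<>(Arrays.asList(valuePerDoc)).toArray(new String[0]);
        ords = new int[valuePerDoc.length];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            ords[doc] = Arrays.binarySearch(dictionary, valuePerDoc[doc]);
        }
    }

    // Returns doc ids ordered by value, comparing only the int ordinals.
    // Arrays.sort on objects is stable, so ties keep doc-id order.
    Integer[] sortedDocs() {
        Integer[] docs = new Integer[ords.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, Comparator.comparingInt(d -> ords[d]));
        return docs;
    }

    public static void main(String[] args) {
        OrdinalColumnDemo col =
                new OrdinalColumnDemo(new String[] {"lorem", "dolor", "ipsum", "dolor"});
        System.out.println(Arrays.toString(col.dictionary));   // [dolor, ipsum, lorem]
        System.out.println(Arrays.toString(col.ords));         // [2, 0, 1, 0]
        System.out.println(Arrays.toString(col.sortedDocs())); // [1, 3, 2, 0]
    }
}
```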
I could not l