Hi,
Yes, NIOFS would work. Please don’t use SimpleFSDirectory unless really needed.
The problem with both implementations is a large slowdown when using DocValues
(e.g., for sorting). Standard index queries are also slower due to additional
buffering and copying, but the difference is not as large.
Hi
I am trying to pinpoint a mismatch in the offsets produced by the
Lucene indexing process: when I use them to take substrings of the
original document content, they do not line up.
I debugged as far as I could, but I lose track of Lucene at
line 298 of DefaultIndexingChain (Lucene 5.3.0):
Thanks Uwe
Unfortunately I am using a company server and the system admin refuses
to change those settings. For now my only option is to explicitly use
either SimpleFSDirectory or NIOFSDirectory. But at least it is working!
On 01/10/2015 20:53, Uwe Schindler wrote:
Hi,
You must ask the s
Hi,
Lucene does not remove the \r\n while indexing or storing fields. The Analyzer
just splits, e.g., at whitespace (depending on the Analyzer). So if your original
data has \r\n, then the offsets would be according to that (it counts 2 chars).
Could it be that you read it using a BufferedReader per li
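
To illustrate the point about \r\n counting as 2 chars, here is a minimal
sketch using only the JDK (no Lucene or Solr involved). The class name
OffsetDemo and the method readPerLine are made up for this example; they
only imitate what a hypothetical per-line reading step would do to the
text before it reaches the analyzer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class OffsetDemo {

    // Hypothetical helper: rebuilds the text the way a naive per-line
    // reader would. BufferedReader.readLine() strips the line terminator,
    // so every "\r\n" in the input comes back as a single '\n' here.
    public static String readPerLine(String original) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(original))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "first line\r\nsecond line\r\n";
        String rebuilt = readPerLine(original);

        // Offsets computed on the rebuilt text drift by one char for
        // every "\r\n" that precedes the token in the original text.
        System.out.println("offset in original: " + original.indexOf("second"));
        System.out.println("offset in rebuilt:  " + rebuilt.indexOf("second"));
    }
}
```

If offsets computed on one version of the text are used to substring the
other, the result shifts by one char per preceding line break, which is the
kind of mismatch described earlier in the thread.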
bq: Unfortunately I am using a company server and the system admin
refuses to change those settings. For now my only option is to
explicitly use either SimpleFSDirectory or NIOFSDirectory. But at
least it is working!
Then find another employer ;). Really, if the system admin is
unwilling to listen
Are you using MappingCharFilter?
It unfortunately has known bugs which require controversial API
changes to fix: https://issues.apache.org/jira/browse/LUCENE-6595
Mike McCandless
http://blog.mikemccandless.com
On Sat, Oct 3, 2015 at 6:02 PM, Uwe Schindler wrote:
> Hi,
>
> Lucene does not remov
Well, this is very strange then. If I knew where exactly those
"IndexableField" instances are constructed in the pipeline, I could
possibly pin down the bug...
In any case, no I did not use MappingCharFilter or a BufferedReader.
The way I pass content to analyse is straightforward:
>>>
SolrInputDocument
Hi,
I have the feeling Solr is causing this. It may be better to ask on their side;
I am almost 100% sure this has nothing to do with Lucene! The ReusableStringReader
you see is caused by the way Solr sets the field contents (as String). If
the StringReader has no \r anymore, then it is Solr's fa