RE: java.io.IOException: Map failed

2015-10-03 Thread Uwe Schindler
Hi, yes, NIOFS would work. Please don’t use SimpleFSDirectory unless really needed. The problem with both implementations is a large slowdown when using DocValues (e.g., for sorting). Standard index queries are also slower because of the additional buffering and copying, but that slowdown is not as large.
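To make the trade-off concrete, here is a plain-JDK sketch (not Lucene code) of the two read strategies involved. It first tries a read-only memory mapping, which is what MMapDirectory relies on and what fails with "Map failed" when the OS limits address space. If mapping throws, it falls back to positional `FileChannel` reads, roughly what NIOFSDirectory does; the fallback copies into a heap buffer, which is the extra copying mentioned above. The class and method names (`MapOrRead`, `readRange`) are hypothetical. In Lucene itself you would not write this; you would open an `NIOFSDirectory` explicitly instead of the mmap-backed default.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating mmap-with-fallback; not part of Lucene.
public class MapOrRead {
    // Reads `len` bytes starting at `pos`, preferring a memory mapping and
    // falling back to positional channel reads if the mapping fails.
    static byte[] readRange(Path file, long pos, int len) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            byte[] out = new byte[len];
            try {
                // Memory-mapped path: the OS pages data in, no explicit read calls.
                MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                mapped.get(out);
                return out;
            } catch (IOException mapFailed) {
                // The JDK reports exhausted address space as IOException("Map failed").
                // Fallback: positional reads, copying into a heap buffer.
                ByteBuffer buf = ByteBuffer.wrap(out);
                while (buf.hasRemaining()) {
                    int n = ch.read(buf, pos + buf.position());
                    if (n < 0) {
                        throw new IOException("unexpected end of file");
                    }
                }
                return out;
            }
        }
    }
}
```

The fallback loop is needed because a single positional `read` may return fewer bytes than requested.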

lucene deliberately removes \r (windows carriage char)

2015-10-03 Thread Ziqi Zhang
Hi, I am trying to pinpoint a mismatch that appears when I use the offsets produced by the Lucene indexing process to take substrings of the original document content. I debugged as far as I could, but I lost track of Lucene at line 298 of DefaultIndexingChain (Lucene 5.3.0):

Re: java.io.IOException: Map failed

2015-10-03 Thread Ziqi Zhang
Thanks, Uwe. Unfortunately, I am using a company server and the system admin refuses to change those settings. For now, my only option is to explicitly use either SimpleFSDirectory or NIOFSDirectory. But at least it is working! On 01/10/2015 20:53, Uwe Schindler wrote: Hi, You must ask the s

RE: lucene deliberately removes \r (windows carriage char)

2015-10-03 Thread Uwe Schindler
Hi, Lucene does not remove the \r\n while indexing or storing fields. The Analyzer just splits the text, e.g., at whitespace (depending on the Analyzer). So if your original data has \r\n, the offsets account for it (it counts as 2 chars). Could it be that you read it using a BufferedReader per li
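A small plain-JDK sketch of the point about offsets: a toy whitespace splitter (an assumption, not Lucene's tokenizer) computes offsets over every char, so each "\r\n" shifts later offsets by 2, and those offsets substring the original text correctly. If the text is instead rebuilt line by line with `BufferedReader.readLine()`, which drops line terminators, offsets computed on the rebuilt text no longer line up with the original. `OffsetCheck`, `whitespaceOffsets` and `viaReadLine` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Lucene or Solr.
public class OffsetCheck {
    // Returns [start, end) offsets of whitespace-separated tokens. Every char
    // of the input is counted, so "\r\n" contributes 2 to later offsets.
    static List<int[]> whitespaceOffsets(String text) {
        List<int[]> offsets = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) offsets.add(new int[] {start, i});
        }
        return offsets;
    }

    // Rebuilds the text line by line, as a typical readLine() loop does.
    // readLine() strips "\r\n" (and "\n"), so each "\r\n" becomes a single '\n'.
    static String viaReadLine(String text) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

For `"foo\r\nbar baz"`, the token `bar` gets offsets [5, 8) and `text.substring(5, 8)` is `"bar"`. On the readLine-rebuilt text it gets [4, 7), and applying those offsets to the original yields `"\nba"`, which is the kind of off-by-one-per-line drift described in this thread.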

Re: java.io.IOException: Map failed

2015-10-03 Thread Erick Erickson
bq: Unfortunately I am using a company server and the system admin refuses to change those settings. For now my only option is to explicitly use either SimpleFSDirectory or NIOFSDirectory. But at least it is working! Then find another employer ;). Really, if the system admin is unwilling to listen

Re: lucene deliberately removes \r (windows carriage char)

2015-10-03 Thread Michael McCandless
Are you using MappingCharFilter? Unfortunately, it has known bugs that require controversial API changes to fix: https://issues.apache.org/jira/browse/LUCENE-6595 Mike McCandless http://blog.mikemccandless.com On Sat, Oct 3, 2015 at 6:02 PM, Uwe Schindler wrote: > Hi, > > Lucene does not remov

Re: lucene deliberately removes \r (windows carriage char)

2015-10-03 Thread Ziqi Zhang
Well, this is very strange then. If I knew exactly where those "IndexableField" instances are constructed in the pipeline, I could possibly pin down the bug... In any case, no, I did not use MappingCharFilter or a BufferedReader. The way I pass content to be analysed is straightforward: >>> SolrInputDocument

RE: lucene deliberately removes \r (windows carriage char)

2015-10-03 Thread Uwe Schindler
Hi, I have the feeling Solr is causing this. Maybe better to ask on their side; I am almost 100% sure this has nothing to do with Lucene! The ReusableStringReader you see comes from the way Solr sets the field contents (as a String). If the StringReader has no \r anymore, then it is Solr's fa
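The check Uwe suggests (does the reader still contain \r?) can be sketched in plain Java. The helper below is hypothetical, not a Solr or Lucene API: it drains a `Reader` and counts carriage returns. Since it consumes the reader, one assumed way to use it is on a fresh `new StringReader(value)` built from the same String you put into the `SolrInputDocument`, and then on the field value Solr actually stores or passes on. A count of 0 where the source had "\r\n" line endings means the \r was lost before analysis.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical diagnostic helper; not part of Solr or Lucene.
public class CarriageReturnProbe {
    // Drains the reader completely and returns how many '\r' chars it contained.
    static int countCarriageReturns(Reader reader) throws IOException {
        int count = 0;
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\r') count++;
            }
        }
        return count;
    }
}
```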