SynonymMap through filesystem

2014-03-03 Thread Geet Gangwar
Hi, is there a way I can search in a file to match my synonyms, instead of building a SynonymMap? My synonym list is going to be very large and I don't want to keep it in memory. Regards, Geet

encoding problem when retrieving document field value

2014-03-03 Thread G.Long
Hi :) My index (Lucene 3.5) contains a field called title. Its value is indexed (analyzed and stored) with the WhitespaceAnalyzer and can contain HTML entities such as #146; or #176;. My problem is that when I retrieve values from this field, some of the HTML entities are missing. For

Re: SynonymMap through filesystem

2014-03-03 Thread Michael McCandless
Currently, no. Patches welcome! The synonyms are compiled into a compact FST, which is RAM resident (but, GC efficient: it's a massive byte[], or several if your FST is really large). Just how large is the FST in your case? Mike McCandless http://blog.mikemccandless.com On Mon, Mar 3, 2014

RE: encoding problem when retrieving document field value

2014-03-03 Thread Uwe Schindler
Hi G. Long, Most likely, the problem is in your application. Lucene does not change the value stored in the index. For stored fields, Lucene does not deal with entities; it's just binary data to Lucene. From your application's perspective, it is String in, String out. I think maybe you strip

Re: encoding problem when retrieving document field value

2014-03-03 Thread G.Long
Hi :) I've got this result directly from tncTitle in the following code: field = doc.getFieldable(IndexConstants.FIELD_TNC_TITLE); if (field != null) { tncTitle = field.stringValue(); } PS: in my previous email, the copy/paste of the apostrophe HTML number made it appear correctly

Re: encoding problem when retrieving document field value

2014-03-03 Thread Jack Krupansky
What is the hex value for that second character returned that appears to display as an apostrophe? Hex 92 (decimal 146) is listed as Private Use 2, so who knows what it might display as. All that is important is the binary/hex value. Out of curiosity, how did your application come about

Re: encoding problem when retrieving document field value

2014-03-03 Thread Trejkaz
On Tue, Mar 4, 2014 at 4:44 AM, Jack Krupansky j...@basetechnology.com wrote: What is the hex value for that second character returned that appears to display as an apostrophe? Hex 92 (decimal 146) is listed as Private Use 2, so who knows what it might display as. Well, if they're dealing
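
A small sketch, not taken from the thread, of why decimal 146 (hex 92) can read as either an apostrophe or a control character. It assumes the value originated as a windows-1252 byte, which the messages above do not confirm; the class name Cp1252Check and the method decodeCp1252 are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;

public class Cp1252Check {
    // Decode a single byte value as windows-1252 and return the resulting
    // Unicode code point. Assumption: the data was originally windows-1252.
    static int decodeCp1252(int byteValue) {
        byte[] raw = new byte[] { (byte) byteValue };
        String decoded = new String(raw, Charset.forName("windows-1252"));
        return decoded.codePointAt(0);
    }

    public static void main(String[] args) {
        // In windows-1252, byte 0x92 is the right single quotation mark.
        int cp = decodeCp1252(0x92);
        System.out.printf("byte 0x92 as windows-1252 -> U+%04X%n", cp);

        // Taken as a Unicode code point instead, U+0092 falls in the C1
        // control range, so it has no defined visible glyph.
        System.out.printf("U+0092 is an ISO control: %b%n",
                Character.isISOControl(0x92));
    }
}
```

Comparing the code point of the character actually returned by field.stringValue() against these two values (U+2019 versus U+0092) may help tell whether the text was transcoded before indexing or is being reinterpreted at display time.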