Re: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread KK
st. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: KK [mailto:dioxide.softw...@gmail.com] > > Sent: Thursday, May 21, 2009 7:01 PM >

RE: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread Uwe Schindler
Hi KK, > right? and remove this conversion that I'm doing later , > > byte [] utfEncodeByteArray = textOnly.getBytes(); > String utfString = new String(utfEncodeByteArray, Charset.forName("UTF-8")); > > This will make sure I'm not depending on the platform encoding, right? In principle, ye

RE: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread Uwe Schindler
user@lucene.apache.org > Subject: Re: Posting unicode data to lucene not working during > searching/retreival! > > I did all the changes but no improvement. the data is getting indexed > properly, I think because I'm able to see the results through luke and > luke > has opti

Re: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread KK
I did all the changes but no improvement. the data is getting indexed properly, I think because I'm able to see the results through luke and luke has option for seeing the results in both utf-8 encoding and string default encoding. I tried to use both but no difference. In both the cases I'm able t

Re: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread KK
Thanks @Uwe. #To answer your last mails query, textOnly is the output of the method downloadPage(), complete text thing includeing all html tags etc... #Instead of doing the encode/decode later, what i should do is when downloading the page through buffered reader put the charset as utf-8 as you me
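
A minimal sketch of the approach KK describes here: decoding the downloaded bytes once, with an explicit charset on the reader, instead of re-encoding the String afterwards. The class name PageReader and the method readAll are hypothetical and are not taken from the thread. The sketch also assumes the page really is UTF-8; in practice the charset would normally come from the HTTP Content-Type header or the page's meta tag.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class PageReader {
    /**
     * Reads the whole stream into a String, decoding the bytes with the
     * given charset rather than the platform default.
     * (Hypothetical helper; stands in for the thread's downloadPage().)
     */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            // Copy decoded characters until end of stream.
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

With the bytes decoded correctly at this point, no later getBytes()/new String(...) conversion should be needed before handing the text to Lucene.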

RE: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread Uwe Schindler
I forgot: > byte [] utfEncodeByteArray = textOnly.getBytes(); > String utfString = new String(utfEncodeByteArray, Charset.forName("UTF-8")); > > here textonly is the text extracted from the downloaded page What is textonly here? A String, if yes, why decode and then again encode it? The impor

RE: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread Uwe Schindler
Hallo KK., > Thanks for your quick response. Let me explain the whole thing. > I'm downloading the pages for give urls and then extracting text and > converting that to unicode utf-8 this way, > > byte [] utfEncodeByteArray = textOnly.getBytes(); > String utfString = new String(utfEncodeByteArray

Re: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread KK
Thanks for your quick response. Let me explain the whole thing. I'm downloading the pages for give urls and then extracting text and converting that to unicode utf-8 this way, byte [] utfEncodeByteArray = textOnly.getBytes(); String utfString = new String(utfEncodeByteArray, Charset.forName("UTF-8

RE: Posting unicode data to lucene not working during searching/retreival!

2009-05-20 Thread Uwe Schindler
Indexed data is coming out in the same way as put in. Lucene works with Java Strings, so encoding is irrelevant. When you index your values, you must be sure, to construct your index string/char arrays correctly using the UTF-8 encoding (e.g. by using a standard Java Reader, new String byte[], char
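
To illustrate the point about building the Java String correctly before indexing, here is a small sketch. It is not code from the thread: the class Utf8RoundTrip and its methods are hypothetical, and the platform charset is passed in explicitly only so that the effect of the no-argument getBytes() call quoted in this thread can be reproduced on any machine.

```java
import java.nio.charset.Charset;

public class Utf8RoundTrip {
    static final Charset UTF8 = Charset.forName("UTF-8");

    /** Decodes raw bytes that are known to be UTF-8 into a Java String. */
    static String decodeUtf8(byte[] raw) {
        return new String(raw, UTF8);
    }

    /**
     * Mimics the conversion quoted in the thread. The original code calls
     * textOnly.getBytes(), which uses the platform default charset; here
     * that charset is a parameter (an assumption made for illustration).
     */
    static String reencode(String text, Charset platformDefault) {
        byte[] bytes = text.getBytes(platformDefault);
        return new String(bytes, UTF8);
    }

    public static void main(String[] args) {
        String original = "Gr\u00FC\u00DFe";

        // Bytes decoded as UTF-8 give back the original String.
        System.out.println(decodeUtf8(original.getBytes(UTF8)).equals(original));

        // If the platform default is UTF-8, the round trip is a no-op.
        System.out.println(reencode(original, UTF8).equals(original));

        // If the platform default is something else, the text is damaged.
        System.out.println(
            reencode(original, Charset.forName("ISO-8859-1")).equals(original));
    }
}
```

The round trip is harmless only when the platform default happens to be UTF-8; on any other default it corrupts non-ASCII text, which is why the decode should happen exactly once, at the point where bytes become characters.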