st.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: KK [mailto:dioxide.softw...@gmail.com]
> > Sent: Thursday, May 21, 2009 7:01 PM
>
Hi KK,
> right? And remove this conversion that I'm doing later,
>
> byte [] utfEncodeByteArray = textOnly.getBytes();
> String utfString = new String(utfEncodeByteArray, Charset.forName("UTF-8"));
>
> This will make sure I'm not depending on the platform encoding, right?
In principle, ye
user@lucene.apache.org
> Subject: Re: Posting unicode data to lucene not working during
> searching/retreival!
>
> I made all the changes, but there is no improvement. I think the data is
> getting indexed properly, because I can see the results through Luke, and Luke
> has opti
I made all the changes, but there is no improvement. I think the data is
getting indexed properly, because I can see the results through Luke, and Luke
has an option for viewing the results in both UTF-8 encoding and the default
string encoding. I tried both, but it made no difference. In both cases I'm able
t
Thanks @Uwe.
# To answer the question in your last mail: textOnly is the output of the method
downloadPage(), that is, the complete page text including all HTML tags etc.
# Instead of doing the encode/decode later, what I should do is set the charset
to UTF-8 when downloading the page through the BufferedReader, as you
me
I forgot:
> byte [] utfEncodeByteArray = textOnly.getBytes();
> String utfString = new String(utfEncodeByteArray, Charset.forName("UTF-8"));
>
> here textonly is the text extracted from the downloaded page
What is textOnly here? If it is a String, why encode it to bytes and then
decode it again? The impor
Hello KK,
> Thanks for your quick response. Let me explain the whole thing.
> I'm downloading the pages for the given URLs, then extracting the text and
> converting it to Unicode (UTF-8) this way:
>
> byte [] utfEncodeByteArray = textOnly.getBytes();
> String utfString = new String(utfEncodeByteArray
Thanks for your quick response. Let me explain the whole thing.
I'm downloading the pages for the given URLs, then extracting the text and
converting it to Unicode (UTF-8) this way:
byte [] utfEncodeByteArray = textOnly.getBytes();
String utfString = new String(utfEncodeByteArray, Charset.forName("UTF-8
Indexed data comes out exactly as it was put in. Lucene works with Java
Strings, so encoding is irrelevant once the text is a String. When you index
your values, you must make sure to construct your index string/char arrays
correctly from the UTF-8 encoded bytes (e.g. by using a standard Java Reader, new String byte[], char
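
A minimal sketch of the point being made in this thread: decode the raw bytes as UTF-8 once, at the moment they are read, rather than calling getBytes() on an existing String afterwards. getBytes() with no argument uses the platform default charset, so the later new String(..., UTF-8) can mismatch. The class name Utf8Decode and the helper readUtf8 are made up for illustration, and the ByteArrayInputStream only stands in for the downloaded page stream; this is not the poster's downloadPage() method.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class Utf8Decode {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Hypothetical helper: reads the whole stream, decoding the bytes
    // explicitly as UTF-8 instead of relying on the platform default.
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, UTF8));
        try {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample non-ASCII text, written with escapes so the source file
        // encoding does not matter.
        String original = "Gr\u00FC\u00DFe";
        // Simulates the bytes of a page that was served as UTF-8.
        byte[] wire = original.getBytes(UTF8);
        String decoded = readUtf8(new ByteArrayInputStream(wire));
        System.out.println(decoded.equals(original)); // prints "true"
    }
}
```

The resulting String can be handed to the indexing code as is; no further encode/decode step is needed, assuming the page really was served as UTF-8.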