Re: extracting chinese text

Andy Clark 25 May 2001 05:25:05 -0000

Vipul Veera wrote:
> I have a DOM which contains Chinese text inside it. I want to traverse the
> DOM and extract the text string out of it. But when I get a Text node and call
> getNodeValue on it, it returns a String which contains only '?' characters
> and no Chinese characters.
> How do I extract these Chinese characters?


If the car doesn't start, try turning the key before hauling
it into the shop for repairs... ;)

Why do you say it contains all '?' characters? Is it because
that's what you see when you print it out to the console or
display it in the application? If so, then it's probably a
font problem. That is very common, but it is definitely not a
problem with the parser or the DOM implementation.
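One quick way to tell a display problem from a data problem is to print the code points of the string instead of the characters themselves. Here is a minimal sketch. The class name DumpCodePoints and the method describe are hypothetical, the hard-coded string stands in for the value returned by getNodeValue(), and it uses String.codePointAt, which needs Java 5 or later.

```java
public class DumpCodePoints {
    // Returns the code points of the text as "U+XXXX" values separated by spaces,
    // so what the String really holds is visible regardless of the font in use.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for textNode.getNodeValue()
        String fromDom = "\u4E2D\u6587";
        System.out.println(describe(fromDom));
    }
}
```

Chinese ideographs show up as values such as U+4E2D, while a genuine question mark is U+003F. If you see the former, the data is fine and only the display is at fault.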

If the characters *really* are '?', then it sounds like a
transcoding problem when you're parsing. Are you wrapping
the input stream in a Java Reader? If so, the default transcoder
may blindly convert bytes it doesn't recognize into '?'. But again,
this is not a problem with the parser or DOM implementation.
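To illustrate, here is a small sketch using the JAXP API that ships with current JDKs (not any particular 2001-era Xerces API); the class and method names are hypothetical. It parses the same UTF-8 bytes twice: once as a raw byte stream, so the parser can read the encoding declaration itself, and once through an InputStreamReader, where the Reader's charset decides how the bytes are decoded before the parser ever sees them. Note that in a current JDK the decoder substitutes U+FFFD, not '?', for bytes it cannot decode, although U+FFFD often appears as '?' once printed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReaderTranscoding {
    // A tiny UTF-8 document whose text is the two Chinese characters U+4E2D U+6587.
    static final byte[] XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p>\u4E2D\u6587</p>"
            .getBytes(StandardCharsets.UTF_8);

    // Parses the source and returns the value of the root element's first Text node.
    static String firstText(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(source);
            return doc.getDocumentElement().getFirstChild().getNodeValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Preferred: hand the parser raw bytes so it can honour the encoding declaration.
    static String fromBytes(byte[] xml) {
        return firstText(new InputSource(new ByteArrayInputStream(xml)));
    }

    // Risky: the Reader decodes the bytes first, so the declaration is ignored and
    // the charset passed in here determines which characters the parser receives.
    static String fromReader(byte[] xml, Charset charset) {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), charset);
        return firstText(new InputSource(reader));
    }

    public static void main(String[] args) {
        System.out.println(fromBytes(XML).equals("\u4E2D\u6587"));
        System.out.println(fromReader(XML, StandardCharsets.UTF_8).equals("\u4E2D\u6587"));
        // US-ASCII cannot decode these bytes, so they are replaced with U+FFFD
        System.out.println(fromReader(XML, StandardCharsets.US_ASCII).equals("\u4E2D\u6587"));
    }
}
```

The design point is simply to let the parser see bytes whenever possible. If a Reader is unavoidable, pass the charset explicitly instead of relying on the platform default.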

-- 
Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]
