Hi Suraj,
I absolutely agree with you. I am writing the output String to the
shell/terminal/console, and I can see the ? symbols there. I also see the
? symbols when I open the output file in the vi editor.
I think I am going wrong while reading the input, but I can't be sure: I
have tried almost every variation I could think of and nothing worked. The
surprising thing is that it works fine on one Solaris machine but fails on
another Solaris machine, which makes the issue harder to pin down.
Hope I find the solution very soon!
Thanx,
Pramodh.
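One thing that can differ between two otherwise similar Solaris machines is
the JVM's default charset, which is typically derived from the locale the
process runs under. A Reader created without an explicit charset, such as
new InputStreamReader(stream), decodes with that default. Below is a minimal
sketch for comparing the two machines. The class name DefaultCharsetCheck is
made up for illustration, and Charset.defaultCharset() assumes Java 5 or
later (on 1.4.2 the file.encoding property and getEncoding() are available).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    /** Returns the encoding an InputStreamReader picks when none is given. */
    static String readerDefaultEncoding() {
        InputStreamReader reader =
            new InputStreamReader(new ByteArrayInputStream(new byte[0]));
        return reader.getEncoding();
    }

    public static void main(String[] args) {
        // Run this on both machines, under the same user and environment
        // that launch the real application, and compare the three values.
        System.out.println("file.encoding      : " + System.getProperty("file.encoding"));
        System.out.println("default charset    : " + Charset.defaultCharset());
        System.out.println("InputStreamReader  : " + readerDefaultEncoding());
    }
}
```

If the two machines print different values, the same input bytes would be
decoded differently on each, which would be consistent with the symptom.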
----- Original Message -----
Sent: Monday, November 10, 2003 4:21 PM
Subject: RE: Directly referenced special characters as "?"
Did you try opening the output XML in XMLSpy? I ran the transformation
with the Xalan command line and generated the attached HTML file. IE is
able to open this file in UTF-8 encoding (View->Encoding menu). However,
if you change the encoding to something else, the HTML does not show
correctly and displays gibberish. I got the same result running the
transformation on Windows and on Unix.
How do you intend to consume the output XML? I guess the output will be
read correctly if the editor/consumer supports UTF-8; otherwise it will
render the data however it sees fit.
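The effect described above can be reproduced in a few lines. This is only a
sketch: the class and method names are invented for illustration, and it
uses StandardCharsets, which assumes Java 7 or later (on 1.4.2 the
charset-name overloads of getBytes and the String constructor do the same
job). The characters are written as \u escapes so the source file's own
encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongDecoderDemo {
    /** Encodes text as UTF-8, then decodes those bytes with the given charset. */
    static String utf8ThenDecode(String text, Charset readAs) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, readAs);
    }

    public static void main(String[] args) {
        String registered = "\u00AE"; // the registered sign, two bytes (C2 AE) in UTF-8

        // A consumer that reads the bytes as UTF-8 gets the character back.
        System.out.println(utf8ThenDecode(registered, StandardCharsets.UTF_8).length());      // 1

        // A consumer that assumes ISO-8859-1 sees each byte as its own
        // character: U+00C2 followed by U+00AE, i.e. an extra stray letter.
        System.out.println(utf8ThenDecode(registered, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```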
-----Original Message-----
From: Pramodh Peddi [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 10, 2003 3:20 PM
To: Christopher Ebert; [EMAIL PROTECTED]; Kumar, Suraj
Subject: Re: Directly referenced special characters as "?"
I still couldn't get this solved. We put in a hack to meet the deadline:
replacing the bytes for ® with the bytes for the String "(R)", which is a
really nasty hack. We are still trying to figure out why it isn't working.
Chris, I think ® is part of the windows-1252 encoding;
http://www.juha.karvonen.name/hyoty/char/ says so, though I am not sure how
reliable that site is. As far as I can tell, ® is part of both the
windows-1252 and iso-8859-1 encodings. What do you mean by "If you can,
check to see what the ® and © characters are in the Java system"?
Suraj, I opened the source XML file in XMLSpy and I am able to view the ®
as is.
We still can't figure out what the problem is or how to solve it. I really
wonder whether this is that tough, or whether I am missing something basic.
I am doing really simple stuff: either reading the source into a String and
passing that to the Transformer, or passing the InputStream to the
Transformer.
Please let me know if there are any solutions for this.
Thanks in advance,
Pramodh.
----- Original Message -----
From: "Christopher Ebert" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, November 06, 2003 8:15 AM
Subject: RE: Directly referenced special characters as "?"
If you can, check to see what the ® and © characters are in the Java
system. You have to be careful, because nearly anything you do to print
them out may serialize them to a character set that doesn't have them (and
so print a ?). The surest way is to print out the characters as bytes along
with a '?' and see if they match. This will tell you whether you're losing
the characters because they're not in the input character set or because
the bytes aren't in the correct encoding for that character set (e.g. not
in ISO-8859-1). This often happens with Windows: Windows uses Cp1252 as the
standard encoding, which is very similar to ISO-8859-1, but not the same,
so it can look like it's working for a long time*. If so, fix the input
encoding, or change all special characters to entities.
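A rough sketch of that check follows. It prints numeric values instead of
the characters themselves, so the console encoding cannot turn them into ?.
The class and method names are invented for illustration, and it assumes a
current JDK in which the windows-1252 charset is available. A byte value of
3F is the '?' that Java substitutes when a character cannot be encoded in
the target charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharDump {
    /** Formats each char of s as U+XXXX (UTF-16 code units). */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    /** Formats bytes as space-separated two-digit hex values. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Registered sign, copyright sign, trade mark sign.
        String s = "\u00AE\u00A9\u2122";
        System.out.println(codePoints(s));                                     // U+00AE U+00A9 U+2122
        System.out.println(hex(s.getBytes(Charset.forName("windows-1252")))); // AE A9 99
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1)));     // AE A9 3F
        System.out.println(hex(s.getBytes(StandardCharsets.US_ASCII)));       // 3F 3F 3F
    }
}
```

If the String already shows U+003F where ® should be, the character was
lost while reading the input; if the String is right but the output bytes
are 3F, it was lost while writing.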
Chris
* See 'Dogg's Hamlet' for further discussion of the nature of this
problem: http://buedg.daig-kastura.de/stoppard/stopp2.htm
-----Original Message-----
From: Pramodh Peddi [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 06, 2003 12:06 AM
To: [EMAIL PROTECTED]
Subject: Directly referenced special characters as "?"
Hi, I couldn't solve my problem fully yet. I posted a request a couple of
days ago and the responses helped a bit, but not entirely.
I have an XML (source) file that contains various special characters, some
of which are referenced through entities (like ™) and others directly (like
® and ©). The entity-referenced characters come out fine when transforming,
but the directly referenced characters come out as "?" characters.
I am using Java 1.4.2's Transformer for transforming.
This is what I am doing in the Java code:
*********************************************************
if (filePath != null) {
    sftp.get(filePath, rawfileOutputStream);
    rawfileOutputStream.close();
}
ByteArrayInputStream rawfileInputStream =
    new ByteArrayInputStream(rawfileOutputStream.toByteArray());
ByteArrayOutputStream transformedFileOutputStream = new ByteArrayOutputStream();
File transformedFile = new File("../server/ic/deploy/data.war/"
    + this.taxXSLTResult);
FileOutputStream out = new FileOutputStream(transformedFile);
transformer.transform(
    new StreamSource(new InputStreamReader(rawfileInputStream),
        this.dtdURL),
    new StreamResult(out));
rawfileInputStream.close();
transformedFileOutputStream.close();
*********************************************************
The source file has a "windows-1252" encoding header. In the XSL, I tried
xsl:output encoding="iso-8859-1" and xsl:output encoding="windows-1252".
Neither of these worked. I even tried converting the bytes into a String
and back into bytes. Nothing works.
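One detail in the code above may be worth checking: new
InputStreamReader(rawfileInputStream) decodes the bytes with the JVM's
default charset before the XML parser sees them, so the "windows-1252"
header in the file plays no part in decoding. The sketch below shows the
alternative of handing the raw InputStream to StreamSource, which lets the
parser use the declared encoding. It is not the original application: the
class name is invented, an identity transformer stands in for the real
stylesheet, the input is an in-memory sample, and it is written against a
current JDK rather than 1.4.2.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ByteStreamTransform {
    /** Runs an identity transform over raw XML bytes and returns UTF-8 output bytes. */
    static byte[] transform(byte[] rawXml) {
        try {
            // Identity transformer: a stand-in for the real stylesheet (assumption).
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // An InputStream source (not a Reader) leaves decoding to the XML
            // parser, which reads the encoding from the XML declaration.
            t.transform(new StreamSource(new ByteArrayInputStream(rawXml)),
                        new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample document with directly written registered and copyright signs.
        String xml = "<?xml version=\"1.0\" encoding=\"windows-1252\"?><p>\u00AE \u00A9</p>";
        byte[] raw = xml.getBytes(Charset.forName("windows-1252"));

        String result = new String(transform(raw), StandardCharsets.UTF_8);
        // Report whether both characters survived, without printing them.
        System.out.println(result.contains("\u00AE \u00A9"));
    }
}
```

The output is written to a byte stream as well, so the encoding chosen for
the result is applied by the serializer rather than by a Writer with its
own default.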
I would really appreciate any help I can get!
Thanks in advance,
Pramodh.