Title: Message
I don't know whether this is the best way, but you can build a DOM from the input XML file and then serialize that DOM as XML with UTF-8 encoding. Here is some sample code:
 
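// "inputs" is assumed to be an InputStream over the raw file bytes (not a Reader),
// so the parser can pick up the encoding from the XML declaration.
InputSource is = new InputSource(inputs);
 
// "parser" is assumed to be an org.apache.xerces.parsers.DOMParser.
parser.parse(is);
Document document = parser.getDocument();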
 
org.apache.xml.serialize.OutputFormat outputFormat = new org.apache.xml.serialize.OutputFormat();
 
outputFormat.setPreserveSpace(false);
outputFormat.setIndenting(true);
outputFormat.setIndent(4);
outputFormat.setLineSeparator(System.getProperty("line.separator"));
outputFormat.setLineWidth(0);
outputFormat.setEncoding("utf-8");
 
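// Declared as ByteArrayOutputStream so the UTF-8 bytes can be read back with toByteArray().
ByteArrayOutputStream outArray = new ByteArrayOutputStream();
XMLSerializer serializer = new XMLSerializer(outArray, outputFormat);
serializer.serialize(document);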
.....
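 
If the Xerces serializer classes are not available, roughly the same idea can be sketched with the standard JAXP classes alone. This is only a minimal sketch, not a tested drop-in: the class name Utf8Converter and the method toUtf8 are made up for illustration, and it assumes the input file declares its encoding in the XML declaration (for example encoding="windows-1252"). Without that declaration the parser will assume UTF-8 and the special characters will still be misread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class, named here only for illustration.
final class Utf8Converter {

    // Parses XML from raw bytes and re-serializes it as UTF-8.
    static byte[] toUtf8(byte[] xmlBytes) throws Exception {
        // Hand the parser bytes, not a Reader, so that it decodes them
        // using the encoding named in the XML declaration.
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));

        // An identity transform writes the DOM back out; the ENCODING
        // property controls both the bytes and the declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toByteArray();
    }

    // Usage: java Utf8Converter <input.xml> <output.xml>
    public static void main(String[] args) throws Exception {
        byte[] converted = toUtf8(Files.readAllBytes(Paths.get(args[0])));
        Files.write(Paths.get(args[1]), converted);
    }
}
```

The important part in both versions is that the file is never decoded with the OS default encoding: the parser gets the raw bytes and decides for itself.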
 
Benson.
-----Original Message-----
From: Praveen Peddi [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 13, 2004 8:36 AM
To: [EMAIL PROTECTED]
Subject: Best way to read non-utf xml documents

I have input XML files in "windows-1252" encoding, and I have to convert them to UTF-8 and send them to a server (the server assumes that all input XML files are UTF-8 encoded). When I read the files and write them out in UTF-8, I lose some special characters such as the registered-trademark and copyright signs.
 
I am reading the file in the OS native encoding (I do not specify any encoding for the input stream) and writing the output in UTF-8.
 
What's the best way to read non-UTF-8 encoded XML files and output them in UTF-8?
 
Any help would be appreciated...
 
 
Thanks
Praveen
 
**************************************************************
Praveen Peddi
Sr Software Engg, Context Media, Inc.
email:[EMAIL PROTECTED]
Tel:  401.854.3475
Fax:  401.861.3596
web: http://www.contextmedia.com
**************************************************************
Context Media- "The Leader in Enterprise Content Integration"
