I've been out on vacation for a few days, but I'm back now and I'm continuing to work on the output formatting stuff. I'm still working on the low-level stuff that provides the basic formatting support for the stuff to come later. I'm incorporating it into SAXPrint for testing, so you might want to play with it.

The current SAXPrint code in the repository has a new parameter:

    -x=encodingname

where encodingname is the output encoding you want the file printed in. The resulting output should be in the encoding you indicate, and should have the appropriate characters escaped so that it's legal XML again.

Current limitations:

1. The ability to escape characters not representable in the target encoding is not in there yet, so for now any such character is treated as an error. Of course, encodings like UTF-16B, UTF-16L, or UTF-8 can represent anything, but if you choose some other encoding, it might be a problem.

2. It does not automatically pick up the source encoding. The reason is that the plugged-in display handler in this sample is just one that delegates to cout, and obviously cout cannot handle UTF-16 or UCS-4 and such. So if you choose one of those, you are going to get gorp for output. For now I'm making you select the output format, and it will auto-select UTF-8 if you don't provide one.

3. Since the SAX output doesn't let us know whether "" or '' was used on an attribute, the output always uses "" and escapes any " within the attribute value.

4. Even though it is theoretically legal to have "&someref;" where the content of someref has a " inside of it, the SAXPrint code is not smart enough to keep track of nested entities, so it cannot leave unescaped a " that comes from an entity reference inside the attribute value.
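To make limitations 1 and 3 a bit more concrete, here is a minimal C++ sketch of the kind of escaping involved. It assumes an ISO-8859-1 target (only U+0000 through U+00FF representable) and input that has already been decoded to UTF-32 code points. The names formatLatin1, EscapeStyle and UnrepPolicy are hypothetical and are not the XMLFormatter API; the CharRef policy illustrates the char-ref fallback that is not implemented yet, while Fail corresponds to the current error behavior.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical names for illustration only; not the actual XMLFormatter enums.
enum class EscapeStyle { AttrValue, Content };  // where the text will be written
enum class UnrepPolicy { Fail, CharRef };       // what to do with unrepresentable chars

// Format Unicode code points as ISO-8859-1 bytes, escaping markup characters.
// Attribute values are assumed to always be written inside "" (limitation 3),
// so a " inside an attribute value must become &quot;.
std::string formatLatin1(const std::u32string& text, EscapeStyle style,
                         UnrepPolicy policy) {
    std::string out;
    for (char32_t ch : text) {
        if (ch == U'&') {
            out += "&amp;";
        } else if (ch == U'<') {
            out += "&lt;";
        } else if (ch == U'>') {
            out += "&gt;";
        } else if (ch == U'"' && style == EscapeStyle::AttrValue) {
            out += "&quot;";
        } else if (ch <= 0xFF) {
            // Representable: the ISO-8859-1 byte value equals the code point.
            out += static_cast<char>(static_cast<unsigned char>(ch));
        } else if (policy == UnrepPolicy::CharRef) {
            // Not representable: emit a decimal character reference instead.
            out += "&#" + std::to_string(static_cast<unsigned long>(ch)) + ";";
        } else {
            // Current behavior described in limitation 1: treat it as an error.
            throw std::runtime_error("character not representable in ISO-8859-1");
        }
    }
    return out;
}
```

With this sketch, <" in an attribute value comes out as &lt;&quot;, while a " in element content is left alone. The euro sign (U+20AC) becomes &#8364; under CharRef and throws under Fail. Decimal character references keep the fallback simple; hex references would be equally legal XML.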
As an example, the file tmp.xml:

------------------------
<?xml version='1.0' encoding="ISO-8859-1"?>
<root foo="&lt;&quot;">
&amp;&lt;
</root>
------------------------

when run like this:

------------------------
SAXPrint -x=ISO-8859-1 tmp.xml
------------------------

will come out like this:

------------------------
<?xml version='1.0' encoding="ISO-8859-1"?>
<root foo="&lt;&quot;">
&amp;&lt;
</root>
------------------------

which is of course a legal file that can be re-parsed again.

If you look inside the SAXPrint handler class, you'll see that it's almost all just delegation to an XMLFormatter object that the print handler creates during construction. There are some enums in XMLFormatter that indicate what style of escaping you want and how to deal with characters that cannot be represented. The first flag works, but the second flag is ignored for now. I'll work on that next, but using the flag that says to output unrepresentable characters as character references is going to be slower (at runtime, I mean).

Anyway, if you want to start playing with SAXPrint, or with the XMLFormatter stuff in some of your own code, feel free to do so and to give some feedback. Just be aware that it could be a bit unstable at this early point.

If you just want lower-level transcoding support, the formatter stuff is based on the new two-way transcoding as well, so you might want to play with some of that too. You can now create an XMLTranscoder for a named encoding and use it for transcoding in both directions, which means you can use it to transcode Unicode to your target encoding. Your options at this level are either to have unrepresentable characters replaced with a replacement character, or to have them treated as an error.

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley