[gwt-contrib] Re: Illegal XML characters in SOYC XML files

2009-08-19 Thread spoon
LGTM. There is a tricky problem here, probably deserving a comment in the SOYC code. Ideally, the XML file should be non-lossy, and the original string text should be recoverable. The best way I have run into to accomplish that would be to convert the string data back into string literal

[gwt-contrib] Re: Illegal XML characters in SOYC XML files

2009-08-19 Thread jat
Personally, I would just transform every character ==0 or 127 into a \x or \u escape (or since this is XML you could use a character reference, &#x...;). There shouldn't be a ton of them and it isn't like XML is small anyway. http://gwt-code-reviews.appspot.com/61801
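
For illustration, the escaping idea can be sketched as below. This is a minimal sketch, not the SOYC patch: the function names are hypothetical, and the choice of which characters to escape (anything outside the XML 1.0 Char production, plus the backslash itself so the text stays recoverable) is an assumption rather than the exact range proposed in the thread.

```python
import re

# Assumption: escape everything XML 1.0 forbids (outside the Char production),
# and also the backslash, so that the escaping can be reversed without loss.
_NEEDS_ESCAPE = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]|\\"
)
_ESCAPED = re.compile(r"\\(\\|u([0-9A-Fa-f]{4}))")


def escape_illegal(text):
    """Replace XML-illegal characters with \\uXXXX and double each backslash."""
    def repl(match):
        ch = match.group(0)
        if ch == "\\":
            return "\\\\"
        # Every character illegal in XML 1.0 lies in the BMP, so 4 hex digits suffice.
        return "\\u%04X" % ord(ch)
    return _NEEDS_ESCAPE.sub(repl, text)


def unescape_illegal(text):
    """Invert escape_illegal: restore doubled backslashes and \\uXXXX escapes."""
    def repl(match):
        if match.group(2) is None:
            return "\\"
        return chr(int(match.group(2), 16))
    return _ESCAPED.sub(repl, text)
```

Escaping the backslash as well is what makes the round trip exact: a literal `\u0000` that was already in the string text is kept distinct from an escaped NUL character.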

[gwt-contrib] Re: Illegal XML characters in SOYC XML files

2009-08-19 Thread spoon
I like the &#x...; idea. There is just one potential problem: will XML readers support it? The linked XML spec has the same restrictions on encoded character entities as on raw characters appearing in the file. Does anyone know if that restriction is honored in practice? Anyone want to test on
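
One way to try this against a single reader is sketched below. It is only a quick probe using Python's standard-library ElementTree (backed by expat), not any of the parsers the SOYC tools use; the helper name `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(document):
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# A character reference to a legal character, then references to characters
# that XML 1.0 excludes from its Char production.
legal = parses("<a>&#x41;</a>")
nul = parses("<a>&#x0;</a>")
control = parses("<a>&#x1;</a>")
```

With this parser the reference to a legal character is accepted and the references to NUL and U+0001 are rejected, so here the restriction on character references is enforced. Other readers may behave differently and would need their own check.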

[gwt-contrib] Re: Illegal XML characters in SOYC XML files

2009-08-19 Thread Ian Petersen
On Wed, Aug 19, 2009 at 6:28 AM, sp...@google.com wrote: I like the &#x...; idea. There is just one potential problem: will XML readers support it? The linked XML spec has the same restrictions on encoded character entities as on raw characters appearing in the file. Does anyone know if that

[gwt-contrib] Re: Illegal XML characters in SOYC XML files

2009-08-19 Thread kprobst
Thanks, Lex. I didn't try the &#x...; idea (see Ian's comment), but I also added the other illegal characters. I'll leave the recoverability (in the dashboard) for another day: (x00) and (u) seem good to me for human consumption, and the surrogate blocks characters shouldn't really ever in an