I'm a Python programmer, and your whole approach to strings and Unicode needs work. The encoding issue you're seeing isn't caused by the library; it's a coding error. If you want to jump on IRC, I can talk you through the issues.
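
For anyone following along, here is a minimal Python sketch of the text-versus-bytes distinction this kind of problem usually turns on. It is not the original PHP code and not a diagnosis of the error in the gist; it only mirrors the sample record from the quoted message below. The names `record` and `to_xml` are hypothetical, and only the standard library (`json`, `xml.etree.ElementTree`) is used.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record mirroring the sample in the quoted message.
record = {"siruta": "47", "nume": "Județul Bacău"}


def to_xml(root_tag, fields):
    """Serialize a flat dict of strings to UTF-8 encoded XML bytes."""
    root = ET.Element(root_tag)
    for key, value in fields.items():
        child = ET.SubElement(root, key)
        child.text = value
    # Passing encoding="utf-8" makes tostring() return bytes, not str,
    # so the encoding step happens exactly once, at the output boundary.
    return ET.tostring(root, encoding="utf-8")


# Work with text (str) internally; produce bytes only when exporting.
xml_bytes = to_xml("județ", record)

# ensure_ascii=False keeps the diacritics readable in the JSON keys and values.
json_text = json.dumps({"județ": record}, ensure_ascii=False)
```

The sketch keeps everything as text until the very end and encodes once on output. Whether a particular validator or parser accepts ț in an element name is a separate question that this example does not settle.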
On Sun, Jul 14, 2013 at 4:23 PM, Strainu <[email protected]> wrote:

> 2013/7/14 MZMcBride <[email protected]>:
> > Strainu wrote:
> >>I'm trying to parse the following xml (abridged for brevity):
> >>
> >><?xml version="1.0" encoding="UTF-8"?>
> >><județ>
> >>  <siruta>47</siruta>
> >>  <nume>Județul Bacău</nume>
> >></județ>
> >>
> >>Every validator I've tried marks an error on the ț in the tag named
> >>județ.
> >
> > Hi.
> >
> > This list is a fine place to ask. :-)
>
> Hi,
>
> >
> > Are you having trouble with validation or parsing? Validators can simply
> > be wrong. Which validators are you using? And which parsers are you using?
>
> I'm having trouble with both. I used the W3C validator [1], which
> wasn't designed for random XML files, but can still find a good number
> of errors, and xmlvalidation.com [2]. On the parsing side, I tried with
> Python's lxml; the output is available at [3].
>
> >
> > Can you be more specific about what you're trying to do (feel free to link
> > to or include sample code) and the tools you're trying to do it with?
>
> Well, I have a PHP website which gathers public data about Romania's
> administrative units, which I then try to export in
> programming-friendly formats (CSV, JSON, XML). The workflow is:
> extract the data from the database, put it in a PHP array, then use
> this array to generate all the output formats. You have an example of
> such an array at [4] (since my initial email I've worked around the
> diacritics problem, but I'm still searching for a solution). For
> converting to XML I have a custom array_walk function [5].
>
> I know that some potential reusers are heavy XML fans, so I wanted to
> give them an easy way to reuse the data. Having the XML tags/JSON keys
> with diacritics is not a must-have, but is definitely a very nice
> feature, because those keys could be used directly as labels when
> printing the data somewhere.
>
> Regards,
>    Strainu
>
> [1] http://validator.w3.org/
> [2] http://www.xmlvalidation.com/
> [3] https://gist.github.com/mgax/f6a3edc5b4883b3377e8
> [4] https://github.com/strainu/despresate/blob/master/include/sat_functions.php#L278
> [5] https://github.com/strainu/despresate/blob/master/include/common.php#L57
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
