[issue14852] json and ElementTree parsers misbehave on streams containing more than a single object
Changes by Eli Bendersky eli...@gmail.com:

--
resolution: -> wont fix
stage: -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14852
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Eli Bendersky eli...@gmail.com added the comment:

I propose to close this issue. If the problem in json is real and someone thinks it has to be fixed, a separate issue specifically for json should be opened.
Eli Bendersky eli...@gmail.com added the comment:

I don't think this is an enhancement to ET, because ET was not designed to be a streaming parser, which is what is required here. ET was designed to read a whole valid XML document. There is 'iterparse', as Antoine mentioned, but it is designed to track changes to the tree while it is being built - mostly to save memory.

You have streaming XML parsers in Python - for example xml.sax. You can also relatively easily use xml.sax to find the end of your document and then parse the buffer with ET.

I don't see how a comparison with Parsec (a parser generator/DSL library) makes sense. There are tons of such libraries for Python - just pick one.
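A minimal sketch of the kind of use iterparse is described as serving above: walking one well-formed document and discarding elements as they complete. It assumes Python 3; the tag names `results` and `item` and the helper `item_texts` are hypothetical. It still expects a single document per source.

```python
import io
import xml.etree.ElementTree as ET


def item_texts(source):
    """Collect the text of each <item> while the tree is being built."""
    texts = []
    # "end" events fire once an element has been completely parsed.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            texts.append(elem.text)
            # Drop the element's contents so memory use stays bounded.
            elem.clear()
    return texts


# Hypothetical sample document with one root element.
xml_bytes = b"<results><item>first</item><item>second</item></results>"
texts = item_texts(io.BytesIO(xml_bytes))
```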
Éric Araujo mer...@netwok.org added the comment:

I am not sure the parsers should be lenient. One could argue that it’s the stream that is broken if it contains non-compliant XML or JSON. Can you tell us more about the use case?

--
nosy: +eli.bendersky, eric.araujo, ezio.melotti, pitrou, rhettinger
versions: -Python 2.6
Antoine Pitrou pit...@free.fr added the comment:

ElementTree supports incremental parsing with the iterparse() function; I'm not sure it fits your use case:
http://docs.python.org/dev/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse

As for the json module, it doesn't have such a facility.
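The json module indeed has no streaming mode, but for text that is already in memory, `json.JSONDecoder.raw_decode` returns both the decoded value and the index where it stopped. A minimal sketch under that assumption (Python 3; the helper name `iter_json_values` and the sample data are hypothetical):

```python
import json


def iter_json_values(text):
    """Yield each JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        # Returns the decoded value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value


values = list(iter_json_values('{}{"hits": [1, 2]} {"hits": []}'))
```

This does not read from a stream; it only splits a buffer that the caller has already collected.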
Frederick Ross madhad...@gmail.com added the comment:

Antoine, It's not iterative parsing, it's a sequence of XML docs or json objects.

Eric, the server I'm retrieving from, for real time searches, steadily produces a stream of (each properly formed) XML or json documents containing new search results. However, at the moment I have to edit the stream on the fly to wrap an outer tag around it and remove any DTD in inner elements, or I can't use the XML parser. Such a workaround isn't possible with the json parser, since it has no iterative parsing mode.
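The wrapping workaround described above might look roughly like the following. This is only a sketch under simplifying assumptions: the whole text is in memory, prologs are XML declarations or simple DOCTYPE lines without an internal subset, and the wrapper tag name `stream` and helper `parse_concatenated` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: prologs are XML declarations or simple DOCTYPEs
# (no internal subset in square brackets).
_PROLOG = re.compile(r"<\?xml[^>]*\?>|<!DOCTYPE[^>\[]*>")


def parse_concatenated(text, wrapper="stream"):
    """Strip per-document prologs, wrap everything in one root, and parse."""
    cleaned = _PROLOG.sub("", text)
    root = ET.fromstring("<{0}>{1}</{0}>".format(wrapper, cleaned))
    # Each child of the artificial root is one of the original documents.
    return list(root)


docs = parse_concatenated(
    '<?xml version="1.0"?><a>x</a><?xml version="1.0"?><a>y</a>'
)
doc_texts = [d.text for d in docs]
```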
R. David Murray rdmur...@bitdance.com added the comment:

I think it is perfectly reasonable for a parser to leave the file pointer in some undefined further location into the file when it detects extra stuff and produces an error message. One can certainly argue that producing that error message is a feature (detect badly formed documents).

I also think that your use case is a perfectly reasonable one, but I think a mode that supports your use case would be an enhancement.

--
nosy: +r.david.murray
type: -> enhancement
versions: +Python 3.3 -Python 2.7
Frederick Ross madhad...@gmail.com added the comment:

In the case of files, sure, it's fine. The error gives me the offset, and I can go pull it out and buffer it, and it's fine. Plus XML is strict about having only one document per file.

For streams, none of this is applicable. I can't seek in a streaming network connection. If the parser leaves it in an unusable state, then I lose everything that may follow. It makes Python unusable in certain, not very rare, cases of network programming.

I'll just add that Haskell's Parsec does this right, and should be used as an example.
R. David Murray rdmur...@bitdance.com added the comment:

Well, if the stream isn't seekable then I don't see how it can be left in any state other than the same one it leaves a file (read ahead as much as it read to generate the error). So unfortunately by our backward compatibility rules I still think this will be a new feature.
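Since a non-seekable stream cannot be rewound, one option is for the caller to keep the read-ahead data in its own buffer. The sketch below is not an existing standard-library feature; the class `JSONStreamReader` is hypothetical. It assumes Python 3, that every top-level value is an object or array, and that a decode error means "more data needed", so malformed input would simply keep accumulating.

```python
import json


class JSONStreamReader:
    """Buffer chunks and return each complete top-level JSON value."""

    def __init__(self):
        self._decoder = json.JSONDecoder()
        self._buffer = ""

    def feed(self, chunk):
        """Add a chunk; return the list of values completed so far."""
        self._buffer += chunk
        values = []
        while True:
            pending = self._buffer.lstrip()
            if not pending:
                self._buffer = ""
                break
            try:
                value, end = self._decoder.raw_decode(pending)
            except ValueError:
                # Assumption: the value is incomplete, so keep it buffered.
                self._buffer = pending
                break
            values.append(value)
            # Keep whatever was read past the end of this value.
            self._buffer = pending[end:]
        return values


reader = JSONStreamReader()
first = reader.feed('{"id": 1}{"id"')
second = reader.feed(': 2} {"id": 3}')
```

Here the second object arrives split across two chunks and is only returned once it is complete.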
New submission from Frederick Ross madhad...@gmail.com:

When parsing something like '<a>x</a><a>y</a>' with xml.etree.ElementTree, or '{}{}' with json, these parsers throw exceptions instead of reading a single element of the kind they understand off the stream (or throwing an exception if there is no element they understand) and leaving the stream in a sane state.

So I should be able to write

    import xml.etree.ElementTree as et
    import StringIO
    s = StringIO.StringIO('<a>x</a><a>y</a>')
    elem1 = et.parse(s)
    elem2 = et.parse(s)

and have elem1 correspond to <a>x</a> and elem2 correspond to <a>y</a>.

At the very least, if the parsers refuse to parse partial streams, they should not destroy the streams.

--
components: Library (Lib)
messages: 161068
nosy: Frederick.Ross
priority: normal
severity: normal
status: open
title: json and ElementTree parsers misbehave on streams containing more than a single object
versions: Python 2.6, Python 2.7
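The behaviour described in the report can be reproduced with a short sketch. It assumes Python 3.5 or later (where json raises `JSONDecodeError`, a `ValueError` subclass); the helper names `try_parse_xml` and `try_parse_json` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def try_parse_xml(text):
    """Return the root tag, or the exception class name on failure."""
    try:
        return ET.fromstring(text).tag
    except ET.ParseError as exc:
        return type(exc).__name__


def try_parse_json(text):
    """Return the decoded value, or the exception class name on failure."""
    try:
        return json.loads(text)
    except ValueError as exc:  # JSONDecodeError subclasses ValueError
        return type(exc).__name__


# A single document parses; two concatenated documents are rejected.
single_xml = try_parse_xml("<a>x</a>")
xml_result = try_parse_xml("<a>x</a><a>y</a>")
single_json = try_parse_json("{}")
json_result = try_parse_json("{}{}")
```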