[issue14852] json and ElementTree parsers misbehave on streams containing more than a single object

2012-06-08 Thread Eli Bendersky

Changes by Eli Bendersky eli...@gmail.com:


--
resolution:  -> wont fix
stage:  -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14852
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14852] json and ElementTree parsers misbehave on streams containing more than a single object

2012-06-01 Thread Eli Bendersky

Eli Bendersky eli...@gmail.com added the comment:

I propose to close this issue. If the problem in json is real and someone 
thinks it has to be fixed, a separate issue specifically for json should be 
opened.

--




[issue14852] json and ElementTree parsers misbehave on streams containing more than a single object

2012-05-28 Thread Eli Bendersky

Eli Bendersky eli...@gmail.com added the comment:

I don't think this is an enhancement to ET, because ET was not designed to be a 
streaming parser, which is what is required here. ET was designed to read a 
whole valid XML document. There is 'iterparse', as Antoine mentioned, but it is 
designed to track changes to the tree while it is being built - mostly to 
save memory.

You have streaming XML parsers in Python - for example xml.sax. You can also 
relatively easily use xml.sax to find the end of your document and then parse 
the buffer with ET.

I don't see how a comparison with Parsec (a parser generator/DSL library) makes 
sense. There are tons of such libraries for Python - just pick one.

--




[issue14852] json and ElementTree parsers misbehave on streams containing more than a single object

2012-05-25 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

I am not sure the parsers should be lenient.  One could argue that it’s the 
stream that is broken if it contains non-compliant XML or JSON.  Can you tell 
more about the use case?

--
nosy: +eli.bendersky, eric.araujo, ezio.melotti, pitrou, rhettinger
versions:  -Python 2.6




[issue14852] json and ElementTree parsers misbehave on streams containing more than a single object

2012-05-25 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

ElementTree supports incremental parsing with the iterparse() function, though 
I'm not sure it fits your use case:
http://docs.python.org/dev/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse

As for the json module, it doesn't have such a facility.
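
For readers following the thread: although the json module has no incremental
mode, concatenated JSON values that are already in memory can often be split
with JSONDecoder.raw_decode(), which returns the decoded value together with
the index where it ended. The sketch below is only an illustration; the helper
name iter_json_objects is hypothetical and is not part of the json module.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value found back to back in the string `text`.

    Hypothetical helper for illustration; not part of the json module.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the decoded value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


print(list(iter_json_objects('{}{"a": 1} [2, 3]')))
# prints [{}, {'a': 1}, [2, 3]]
```

This assumes the whole input is available as one string; it raises ValueError
(json.JSONDecodeError on Python 3.5+) at the first value that is malformed or
incomplete.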

--




[issue14852] json and ElementTree parsers misbehave on streams containing more than a single object

2012-05-25 Thread Frederick Ross

Frederick Ross madhad...@gmail.com added the comment:

Antoine, it's not iterative parsing; it's a sequence of XML docs or json 
objects.

Eric, the server I'm retrieving from, for real time searches, steadily produces 
a stream of (each properly formed) XML or json documents containing new search 
results. However, at the moment I have to edit the stream on the fly to wrap an 
outer tag around it and remove any DTD in inner elements, or I can't use the 
XML parser. Such a workaround isn't possible with the json parser, since it has 
no iterative parsing mode.

--




[issue14852] json and ElementTree parsers misbehave on streams containing more than a single object

2012-05-25 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

I think it is perfectly reasonable for a parser to leave the file pointer in 
some undefined further location into the file when it detects extra stuff and 
produces an error message.  One can certainly argue that producing that error 
message is a feature (detect badly formed documents).  

I also think that your use case is a perfectly reasonable one, but I think a 
mode that supports your use case would be an enhancement.

--
nosy: +r.david.murray
type:  -> enhancement
versions: +Python 3.3 -Python 2.7




[issue14852] json and ElementTree parsers misbehave on streams containing more than a single object

2012-05-25 Thread Frederick Ross

Frederick Ross madhad...@gmail.com added the comment:

In the case of files, sure, it's fine. The error gives me the offset, and I can 
go pull it out and buffer it, and it's fine. Plus XML is strict about having 
only one document per file.
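
A rough sketch of that offset-based approach for data that is already buffered,
written for Python 3.2 or later. It uses xml.parsers.expat directly rather than
ElementTree's own error reporting, because the expat parser exposes the byte
offset of the error as ErrorByteIndex. The helper name split_documents is
hypothetical, and the sketch only handles the "junk after document element"
error; a later document that carries its own XML declaration may fail with a
different error, which is re-raised.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Numeric code expat reports when content follows the root element.
JUNK_AFTER_DOC = expat.errors.codes[
    expat.errors.XML_ERROR_JUNK_AFTER_DOC_ELEMENT]


def split_documents(data):
    """Yield one Element per XML document concatenated in the bytes `data`.

    Hypothetical helper for illustration only.
    """
    while data.strip():
        parser = expat.ParserCreate()
        try:
            parser.Parse(data, True)
        except expat.ExpatError as exc:
            if exc.code != JUNK_AFTER_DOC:
                raise
            # Byte offset where the content after the first document starts.
            cut = parser.ErrorByteIndex
            yield ET.fromstring(data[:cut])
            data = data[cut:]
        else:
            # The remaining buffer is exactly one well-formed document.
            yield ET.fromstring(data)
            return


for elem in split_documents(b'<a>x</a><a>y</a>'):
    print(elem.tag, elem.text)
```

Each pass re-parses the remaining buffer from the start, so this is quadratic
in the number of documents; it is meant to show the idea, not to be efficient.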

For streams, none of this is applicable. I can't seek in a streaming network 
connection. If the parser leaves it in an unusable state, then I lose 
everything that may follow. It makes Python unusable in certain, not very rare, 
cases of network programming.
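
For JSON, one possible workaround on a non-seekable stream is to keep the
unconsumed text in a buffer owned by the caller and retry decoding as more data
arrives. The sketch below assumes a text-mode stream whose read() returns an
empty string at end of input; the names iter_json_stream and chunk_size are
hypothetical. It cannot tell malformed data from incomplete data until the
stream ends, and a bare top-level number split across two reads may be decoded
as two numbers.

```python
import io
import json


def iter_json_stream(stream, chunk_size=4096):
    """Yield JSON values from a text stream holding several values in a row.

    Hypothetical helper for illustration only.
    """
    decoder = json.JSONDecoder()
    buf = ""
    while True:
        chunk = stream.read(chunk_size)
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except ValueError:
                if not chunk:
                    raise  # end of stream with undecodable data left over
                break  # probably incomplete: wait for more data
            yield obj
            buf = buf[end:]
        if not chunk:
            return


stream = io.StringIO('{"a": 1}{"b": 2}')
print(list(iter_json_stream(stream, chunk_size=5)))
# prints [{'a': 1}, {'b': 2}]
```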

I'll just add that Haskell's Parsec does this right, and should be used as an 
example.

--




[issue14852] json and ElementTree parsers misbehave on streams containing more than a single object

2012-05-25 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Well, if the stream isn't seekable then I don't see how it can be left in any 
state other than the same one it leaves a file (read ahead as much as it read 
to generate the error).  So unfortunately by our backward compatibility rules I 
still think this will be a new feature.

--




[issue14852] json and ElementTree parsers misbehave on streams containing more than a single object

2012-05-18 Thread Frederick Ross

New submission from Frederick Ross madhad...@gmail.com:

When parsing something like '<a>x</a><a>y</a>' with xml.etree.ElementTree, or 
'{}{}' with json, these parsers throw exceptions instead of reading a single 
element of the kind they understand off the stream (or throwing an exception if 
there is no element they understand) and leaving the stream in a sane state.

So I should be able to write

import xml.etree.ElementTree as et
import StringIO
s = StringIO.StringIO('<a>x</a><a>y</a>')
elem1 = et.parse(s)
elem2 = et.parse(s)

and have elem1 correspond to <a>x</a> and elem2 correspond to <a>y</a>.

At the very least, if the parsers refuse to parse partial streams, they should 
not destroy the streams.

--
components: Library (Lib)
messages: 161068
nosy: Frederick.Ross
priority: normal
severity: normal
status: open
title: json and ElementTree parsers misbehave on streams containing more than a 
single object
versions: Python 2.6, Python 2.7
