Re: [Python-Dev] [Python-checkins] cpython: whatsnew: XMLPullParser, plus some doc updates.

2014-01-06 Thread Nick Coghlan
On 5 Jan 2014 12:54, r.david.murray python-check...@python.org wrote:

 http://hg.python.org/cpython/rev/069f88f4935f
 changeset:   88308:069f88f4935f
 user:R David Murray rdmur...@bitdance.com
 date:Sat Jan 04 23:52:50 2014 -0500
 summary:
   whatsnew: XMLPullParser, plus some doc updates.

 I was confused by the text saying that read_events iterated, since it
 actually returns an iterator (that's what a generator does) that the
 caller must then iterate.  So I tidied up the language.  I'm not sure
 what the sentence "Events provided in a previous call to read_events()
 will not be yielded again." is trying to convey, so I didn't try to fix
that.

It's a mutating API - once the events have been retrieved, that's it,
they're gone from the internal state. Suggestions for wording improvements
welcome :)
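
A minimal sketch of the "mutating" behaviour described above, assuming a current CPython where XMLPullParser.close() makes all pending events available. The sample document and the variable names are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Feed a complete (made-up) document, then close the parser so that
# every pending event is queued before we start reading.
parser = ET.XMLPullParser(['start', 'end'])
parser.feed('<root><child>text</child></root>')
parser.close()

# The first call drains the internal queue as the iterator is consumed.
first = [(event, elem.tag) for event, elem in parser.read_events()]

# A second call finds nothing left: already-retrieved events are gone.
second = list(parser.read_events())

print(first)   # [('start', 'root'), ('start', 'child'), ('end', 'child'), ('end', 'root')]
print(second)  # []
```

The second call returns an empty list because retrieving an event removes it from the parser's internal state; there is no way to rewind.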

Cheers,
Nick.


 Also fixed a couple more news items.

 files:
   Doc/library/xml.etree.elementtree.rst |  23 +-
   Doc/whatsnew/3.4.rst  |   7 ++-
   Lib/xml/etree/ElementTree.py  |   2 +-
   Misc/NEWS |  12 +++---
   4 files changed, 25 insertions(+), 19 deletions(-)


 diff --git a/Doc/library/xml.etree.elementtree.rst
b/Doc/library/xml.etree.elementtree.rst
 --- a/Doc/library/xml.etree.elementtree.rst
 +++ b/Doc/library/xml.etree.elementtree.rst
 @@ -105,12 +105,15 @@
  root[0][1].text
 '2008'

 +
 +.. _elementtree-pull-parsing:
 +
  Pull API for non-blocking parsing
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 -Most parsing functions provided by this module require to read the whole
 -document at once before returning any result.  It is possible to use a
 -:class:`XMLParser` and feed data into it incrementally, but it's a push
API that
 +Most parsing functions provided by this module require the whole document
 +to be read at once before returning any result.  It is possible to use an
 +:class:`XMLParser` and feed data into it incrementally, but it is a push
API that
  calls methods on a callback target, which is too low-level and
inconvenient for
  most needs.  Sometimes what the user really wants is to be able to parse
XML
  incrementally, without blocking operations, while enjoying the
convenience of
 @@ -119,7 +122,7 @@
  The most powerful tool for doing this is :class:`XMLPullParser`.  It
does not
  require a blocking read to obtain the XML data, and is instead fed with
data
  incrementally with :meth:`XMLPullParser.feed` calls.  To get the parsed
XML
 -elements, call :meth:`XMLPullParser.read_events`.  Here's an example::
 +elements, call :meth:`XMLPullParser.read_events`.  Here is an example::

  parser = ET.XMLPullParser(['start', 'end'])
  parser.feed('<mytag>sometext')
 @@ -1038,15 +1041,17 @@

 .. method:: read_events()

 -  Iterate over the events which have been encountered in the data
fed to the
 -  parser.  This method yields ``(event, elem)`` pairs, where *event*
is a
 +  Return an iterator over the events which have been encountered in
the
 +  data fed to the
 +  parser.  The iterator yields ``(event, elem)`` pairs, where
*event* is a
string representing the type of event (e.g. ``end``) and *elem*
is the
encountered :class:`Element` object.

Events provided in a previous call to :meth:`read_events` will not
be
 -  yielded again. As events are consumed from the internal queue only
as
 -  they are retrieved from the iterator, multiple readers calling
 -  :meth:`read_events` in parallel will have unpredictable results.
 +  yielded again.  Events are consumed from the internal queue only
when
 +  they are retrieved from the iterator, so multiple readers
iterating in
 +  parallel over iterators obtained from :meth:`read_events` will have
 +  unpredictable results.

 .. note::

 diff --git a/Doc/whatsnew/3.4.rst b/Doc/whatsnew/3.4.rst
 --- a/Doc/whatsnew/3.4.rst
 +++ b/Doc/whatsnew/3.4.rst
 @@ -1088,9 +1088,10 @@
  xml.etree
  ---------

 -Add an event-driven parser for non-blocking applications,
 -:class:`~xml.etree.ElementTree.XMLPullParser`.
 -(Contributed by Antoine Pitrou in :issue:`17741`.)
 +A new parser, :class:`~xml.etree.ElementTree.XMLPullParser`, allows a
 +non-blocking applications to parse XML documents.  An example can be
 +seen at :ref:`elementtree-pull-parsing`.  (Contributed by Antoine
 +Pitrou in :issue:`17741`.)

  The :mod:`xml.etree.ElementTree` :func:`~xml.etree.ElementTree.tostring`
and
  :func:`~xml.etree.ElementTree.tostringlist` functions, and the
 diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py
 --- a/Lib/xml/etree/ElementTree.py
 +++ b/Lib/xml/etree/ElementTree.py
 @@ -1251,7 +1251,7 @@
  self._close_and_return_root()

  def read_events(self):
 -Iterate over currently available (event, elem) pairs.
 +Return an iterator over currently available (event, elem)
pairs.

  Events are consumed from the internal event queue as they are
  

Re: [Python-Dev] [Python-checkins] cpython: whatsnew: XMLPullParser, plus some doc updates.

2014-01-06 Thread R. David Murray
On Tue, 07 Jan 2014 01:22:21 +1000, Nick Coghlan ncogh...@gmail.com wrote:
 On 5 Jan 2014 12:54, r.david.murray python-check...@python.org wrote:
 
  http://hg.python.org/cpython/rev/069f88f4935f
  changeset:   88308:069f88f4935f
  user:R David Murray rdmur...@bitdance.com
  date:Sat Jan 04 23:52:50 2014 -0500
  summary:
whatsnew: XMLPullParser, plus some doc updates.
 
  I was confused by the text saying that read_events iterated, since it
  actually returns an iterator (that's what a generator does) that the
  caller must then iterate.  So I tidied up the language.  I'm not sure
  what the sentence "Events provided in a previous call to read_events()
  will not be yielded again." is trying to convey, so I didn't try to fix
 that.
 
 It's a mutating API - once the events have been retrieved, that's it,
 they're gone from the internal state. Suggestions for wording improvements
 welcome :)

Well, my guess as to what it meant was roughly:

An Event will be yielded exactly once regardless of how many read_events
iterators are processed.

Looking at the code, though, I'm not sure that's actually true.  The
code does not appear to be thread-safe.  Of course, it isn't intended to
be used in a threaded context, but the docs don't quite make that
explicit.  I imagine that's the intent of the statement about parallel
reading, but it doesn't actually say that the code is not thread-safe.
It reads more as if it is warning that the order of retrieval would be
unpredictable.
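
As a rough, single-threaded illustration of the "exactly once" reading, here is a sketch that alternates between two iterators obtained from read_events(). The document and the names reader_a / reader_b are hypothetical, and the sketch says nothing about behaviour under real threads:

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(['start', 'end'])
parser.feed('<root><child>text</child></root>')
parser.close()  # make sure all four events are queued

# Two iterators over the same internal queue.
reader_a = parser.read_events()
reader_b = parser.read_events()

seen_a, seen_b = [], []
for _ in range(2):
    # Alternate between the readers; each next() consumes one event.
    event, elem = next(reader_a)
    seen_a.append((event, elem.tag))
    event, elem = next(reader_b)
    seen_b.append((event, elem.tag))

# Nothing remains for either reader once the queue is empty.
leftover = list(reader_a) + list(reader_b)

print(seen_a)    # [('start', 'root'), ('end', 'child')]
print(seen_b)    # [('start', 'child'), ('end', 'root')]
print(leftover)  # []
```

Each event shows up in exactly one reader, but which reader gets which event depends entirely on the interleaving, which is presumably what the "unpredictable results" wording is warning about.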

--David