Re: ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-27 Thread Stefan Behnel
Kee Nethery wrote:
 On Jun 25, 2009, at 11:39 PM, Stefan Behnel wrote:
 parsing a
 document from a string does not have its own function, because it is
 trivial to write

 tree = parse(BytesIO(some_byte_string))
 
 :-) Trivial for someone familiar with the language. For a newbie like
 me, that step was non-obvious.

I actually meant the code complexity, not the fact that you need to know
BytesIO to do the above.


 If what you meant is actually parsing from a byte string, this is easily
 done using BytesIO(), or StringIO() in Py2.x (x<6).
 
 Yes, thanks! Looks like BytesIO is a v.3.x enhancement.

It should be available in 2.6 AFAIR, simply as an alias for StringIO.


 Looks like the
 StringIO does what I need since all I'm doing is pulling the unicode
 string into et.parse.

As I said, this won't work, unless you are either

a) passing a unicode string with plain ASCII characters in Py2.x
or
b) confusing UTF-8 and Unicode


 theXmlDataTree =
 et.parse(makeThisUnicodeStringLookLikeAFileSoParseWillDealWithIt(theXmlData))

 This will not work because ET cannot parse from unicode strings (unless
 they only contain plain ASCII characters and you happen to be using
 Python
 2.x). lxml can parse from unicode strings, but it requires that the XML
 must not have an encoding declaration (which would render it non
 well-formed). This is convenient for parsing HTML, it's less
 convenient for XML usually.
 
 Right for my example, if the data is coming in as UTF-8 I believe I can do:
 theXmlDataTree = et.parse(StringIO.StringIO(theXmlData), encoding='utf-8')

Yes, although in this case you are not parsing a unicode string but a UTF-8
encoded byte string. Plus, passing 'UTF-8' as encoding to the parser is
redundant, as it is the default for XML.
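A minimal sketch of that last point, assuming Python 3 and a made-up UTF-8 payload. As far as I can tell, the standard library's parse() takes only a source and an optional parser, so the sketch passes no encoding at all and relies on the UTF-8 default:

```python
from io import BytesIO
import xml.etree.ElementTree as ET

# Hypothetical UTF-8 encoded payload; the element and attribute names are made up.
xml_bytes = '<city name="Zürich">café</city>'.encode("utf-8")

# No encoding argument: the parser assumes UTF-8 unless the document's
# own XML declaration says otherwise.
tree = ET.parse(BytesIO(xml_bytes))
root = tree.getroot()

print(root.get("name"), root.text)
```

The non-ASCII characters come back as ordinary text because the input was a byte string in the encoding the parser expects.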

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-26 Thread Stefan Behnel
Carl Banks wrote:
 On Jun 25, 10:11 pm, Stefan Behnel wrote:
 Carl Banks wrote:
 Why isn't et.parse the only way to do this? Why have XML or fromstring  
 at all?
 Because Fredrik Lundh wanted it that way.  Unlike most Python
 libraries ElementTree is under the control of one person, which means
 it was not designed or vetted by the community, which means it would
 tend to have some interface quirks.
 Just for the record: Fredrik doesn't actually consider it a design quirk.
 
 Well of course he wouldn't--it's his library.

That's not an argument at all. Fredrik put out an alpha of ET 1.3 (long ago,
actually), which is (or was?) meant as a clean-up release for a number of
real quirks in the library (lxml also fixes most of them since 2.0). The
above definitely hasn't changed, simply because it's not considered 'wrong'
by the author(s).

Stefan


Re: ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-26 Thread Stefan Behnel
Hi,

Kee Nethery wrote:
 Why isn't et.parse the only way to do this? Why have XML or fromstring
 at all?

Well, use cases. XML() is an alias for fromstring(), because it's
convenient (and well readable) to write

   section = XML('<section id="XYZ"><title>A to Z</title></section>')
   section.append(paragraphs)

for XML literals in source code. fromstring() is there because when you
want to parse a fragment from a string that you got from whatever source,
it's easy to express that with exactly that function, as in

el = fromstring(some_string)

If you want to parse a document from a file or file-like object, use
parse(). Three use cases, three functions. The fourth use case of parsing a
document from a string does not have its own function, because it is
trivial to write

tree = parse(BytesIO(some_byte_string))

I do not argue that fromstring() should necessarily return an Element, as
parsing fragments is more likely for literals than for strings that come
from somewhere else. However, given that the use case of parsing a document
from a string is so easily handled with parse(), I find it ok to give the
second use case its own function, simply because

tree = fromstring(some_string)
fragment_top_element = tree.getroot()

absolutely does not catch it.
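The use cases described above can be sketched as follows, assuming Python 3 and the standard library's xml.etree.ElementTree. The sample strings are invented for illustration; only the function names come from the discussion:

```python
from io import BytesIO
from xml.etree.ElementTree import XML, fromstring, parse

# Use case 1: an XML literal written directly in source code -> Element
section = XML('<section id="XYZ"><title>A to Z</title></section>')

# Use case 2: a fragment held in a string from some other source -> Element
some_string = "<para>Hello</para>"
el = fromstring(some_string)

# Use cases 3 and 4: a document from a file-like object -> ElementTree.
# Wrapping a byte string in BytesIO covers "document from a string".
some_byte_string = b"<doc><section/></doc>"
tree = parse(BytesIO(some_byte_string))

print(section.tag, el.tag, tree.getroot().tag)
```

XML() and fromstring() hand back the top-level Element directly, while parse() hands back the ElementTree wrapper, which is why only the latter has getroot().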


 Why not enhance parse and deprecate XML and fromstring with
 something like:

 formPostData = cgi.FieldStorage()
 theXmlData = formPostData['theXml'].value
 theXmlDataTree =
et.parse(makeThisUnicodeStringLookLikeAFileSoParseWillDealWithIt(theXmlData))

This will not work because ET cannot parse from unicode strings (unless
they only contain plain ASCII characters and you happen to be using Python
2.x). lxml can parse from unicode strings, but it requires that the XML
must not have an encoding declaration (which would render it non
well-formed). This is convenient for parsing HTML, it's less convenient for
XML usually.

If what you meant is actually parsing from a byte string, this is easily
done using BytesIO(), or StringIO() in Py2.x (x<6).
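One way to sidestep the unicode-string question altogether is to encode the text to bytes before parsing. This is a sketch only, assuming Python 3; the sample document and variable names are hypothetical:

```python
from io import BytesIO
import xml.etree.ElementTree as ET

# Hypothetical decoded text that still carries an encoding declaration.
xml_text = '<?xml version="1.0" encoding="UTF-8"?>\n<note>naïve café</note>'

# Encode with the same encoding the declaration names, then parse the bytes,
# so the declaration and the actual byte content agree.
xml_bytes = xml_text.encode("utf-8")
tree = ET.parse(BytesIO(xml_bytes))

print(tree.getroot().text)
```

The important detail is that the encoding used in encode() matches the one in the declaration; otherwise the parser would be told one thing and given another.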

Stefan


Re: ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-26 Thread Carl Banks
On Jun 25, 11:20 pm, Stefan Behnel <stefan...@behnel.de> wrote:
 Carl Banks wrote:
  On Jun 25, 10:11 pm, Stefan Behnel wrote:
  Carl Banks wrote:
  Why isn't et.parse the only way to do this? Why have XML or fromstring  
  at all?
  Because Fredrik Lundh wanted it that way.  Unlike most Python
  libraries ElementTree is under the control of one person, which means
  it was not designed or vetted by the community, which means it would
  tend to have some interface quirks.
  Just for the record: Fredrik doesn't actually consider it a design quirk.

  Well of course he wouldn't--it's his library.

 That's not an argument at all.

I can't even imagine what you think I was arguing when I wrote this,
or what issue you could have with this statement.


Carl Banks



Re: ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-26 Thread Kee Nethery
First, thanks to everyone who responded. Figured I'd test all the  
suggestions and provide a response to the list. Here goes ...


On Jun 25, 2009, at 7:38 PM, Nobody wrote:

Why do you need an ElementTree rather than an Element? XML(string) returns
the root element, as if you had used et.parse(f).getroot(). You can turn
this into an ElementTree with e.g. et.ElementTree(XML(string)).


I tried this:
   et.ElementTree(XML(theXmlData))
and it did not work.

I had to modify it to this to get it to work:
   et.ElementTree(et.XML(theXmlData))



formPostData = cgi.FieldStorage()
theXmlData = formPostData['theXml'].value
theXmlDataTree =
et.parse(makeThisUnicodeStringLookLikeAFileSoParseWillDealWithIt(theXmlData))


If you want to treat a string as a file, use StringIO.


I tried this:
   import StringIO
   theXmlDataTree = et.parse(StringIO.StringIO(theXmlData))
   orderXml = theXmlDataTree.findall('purchase')

and it did work. StringIO converts the string into what looks like a  
file so parse can process it as a file. Cool.
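In Python 3 the same idea might look like the sketch below, where io.StringIO stands in for the Python 2 StringIO module. The data is a shortened, hypothetical version of the posted order:

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for the POSTed string (hypothetical sample data).
the_xml_data = """<xml>
  <purchase id="1" lang="en">
    <item id="1" productId="369369"><name>Autumn</name></item>
  </purchase>
  <customer id="123456"/>
</xml>"""

# StringIO gives the string a file-like read() interface, which is what parse() wants.
tree = ET.parse(io.StringIO(the_xml_data))

# findall() on the tree searches relative to the root element.
purchases = tree.findall("purchase")
for purchase in purchases:
    print(purchase.tag, purchase.attrib)
```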


On Jun 25, 2009, at 7:47 PM, unayok wrote:


I'm not sure what you're expecting.  It looks to me like things are
working okay:

My test script:

[snip]


I agree your code works.

When I tried:
   theXmlDataTree = et.fromstring(theXmlData)
   orderXml = theXmlDataTree.findall('purchase')

When I modified mine to look inside programmatically, using a
'for element in theXmlDataTree' loop, I was able to see the contents. The
debugger I am using does not offer me a window into the ElementTree
data, and that was part of the problem. So yes, et.fromstring is
working correctly. It helps to have someone show me the missing step
needed to confirm that the code works and the IDE does not.
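A small sketch of that kind of inspection, assuming Python 3 and a cut-down, made-up version of the data. It shows why a debugger view limited to attrib, tag, tail and text can look empty even though the children are all there:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<xml><purchase id='1'/><customer id='123456'/></xml>")

# The root's own text and attrib can be empty even though it has children;
# len() reports the number of direct children.
print(repr(root.text), root.attrib, len(root))

# Iterating over an Element yields its direct children.
for child in root:
    print(child.tag, child.attrib)

# Serialising the element back to text shows the whole subtree.
print(ET.tostring(root, encoding="unicode"))
```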




On Jun 25, 2009, at 8:04 PM, Carl Banks wrote:

I believe you are misunderstanding something.  et.XML and
et.fromstring return Elements, whereas et.parse returns an
ElementTree.  These are two different things; however, both of them
contain all the XML.  In fact, an ElementTree (which is returned by
et.parse) is just a container for the root Element (returned by
et.fromstring)--and it adds no important functionality to the root
Element as far as I can tell.


Thank you for explaining the difference. I absolutely was  
misunderstanding this.



Given an Element (as returned by et.XML or et.fromstring) you can pass
it to the ElementTree constructor to get an ElementTree instance.  The
following line should give you something you can play with:

theXmlDataTree = et.ElementTree(et.fromstring(theXmlData))


Yes this works.



On Jun 25, 2009, at 11:39 PM, Stefan Behnel wrote:


If you want to parse a document from a file or file-like object, use
parse(). Three use cases, three functions. The fourth use case of parsing a
document from a string does not have its own function, because it is
trivial to write

tree = parse(BytesIO(some_byte_string))


:-) Trivial for someone familiar with the language. For a newbie like  
me, that step was non-obvious.


If what you meant is actually parsing from a byte string, this is easily
done using BytesIO(), or StringIO() in Py2.x (x<6).


Yes, thanks! Looks like BytesIO is a v.3.x enhancement. Looks like the  
StringIO does what I need since all I'm doing is pulling the unicode  
string into et.parse. Am guessing that either would work equally well.




theXmlDataTree =
et.parse(makeThisUnicodeStringLookLikeAFileSoParseWillDealWithIt(theXmlData))


This will not work because ET cannot parse from unicode strings (unless
they only contain plain ASCII characters and you happen to be using Python
2.x). lxml can parse from unicode strings, but it requires that the XML
must not have an encoding declaration (which would render it non
well-formed). This is convenient for parsing HTML, it's less convenient for
XML usually.


Right for my example, if the data is coming in as UTF-8 I believe I  
can do:
   theXmlDataTree = et.parse(StringIO.StringIO(theXmlData), encoding='utf-8')



Again, as a newbie, thanks to everyone who took the time to respond.  
Very helpful.

Kee


ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-25 Thread Kee Nethery
Summary: I have XML as string and I want to pull it into ElementTree  
so that I can play with it but it is not working for me. XML and  
fromstring when used with a string do not do the same thing as parse  
does with a file. How do I get this to work?


Details:
I have a CGI that receives XML via an HTTP POST as a POST variable  
named 'theXml'. The POST data is a string that the CGI receives, it is  
not a file on a hard disk.


The POSTed string looks like this when viewed in pretty format:

<xml>
        <purchase id="1" lang="en">
                <item id="1" productId="369369">
                        <name>Autumn</name>
                        <quantity>1</quantity>
                        <price>8.46</price>
                </item>
                <javascript>YES</javascript>
        </purchase>
        <customer id="123456" time="1227449322">
                <shipping>
                        <street>19 Any Street</street>
                        <city>Berkeley</city>
                        <state>California</state>
                        <zip>12345</zip>
                        <country>People's Republic of Berkeley</country>
                        <name>Jon Roberts</name>
                </shipping>
                <email>ju...@shrimp.edu</email>
        </customer>
</xml>


The pseudocode in Python 2.6.2 looks like:

import xml.etree.ElementTree as et

formPostData = cgi.FieldStorage()
theXmlData = formPostData['theXml'].value
theXmlDataTree = et.XML(theXmlData)

and when this runs, theXmlDataTree is set to:

theXmlDataTree  instance        <Element xml at 7167b0>
        attrib  dict    {}
        tag     str     xml
        tail    NoneType        None
        text    NoneType        None

I get the same result with fromstring:

formPostData = cgi.FieldStorage()
theXmlData = formPostData['theXml'].value
theXmlDataTree = et.fromstring(theXmlData)

I can put the xml in a file and reference the file by its URL and use:

et.parse(urllib.urlopen(theUrl))

and that will set theXmlDataTree to:

theXmlDataTree  instance        <xml.etree.ElementTree.ElementTree instance at 0x67cb48>


This result I can play with. It contains all the XML.

et.parse seems to pull in the entire XML document and give me  
something to play with whereas et.XML and et.fromstring do not.


Questions:
How do I get this to work?
Where in the docs did it give me an example of how to make this work  
(what did I miss from reading the docs)?


... and for bonus points ...

Why isn't et.parse the only way to do this? Why have XML or fromstring  
at all? Why not enhance parse and deprecate XML and fromstring with  
something like:


formPostData = cgi.FieldStorage()
theXmlData = formPostData['theXml'].value
theXmlDataTree =
et.parse(makeThisUnicodeStringLookLikeAFileSoParseWillDealWithIt(theXmlData))


Thanks in advance,
Kee Nethery


Re: ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-25 Thread Nobody
On Thu, 25 Jun 2009 18:02:25 -0700, Kee Nethery wrote:

 Summary: I have XML as string and I want to pull it into ElementTree  
 so that I can play with it but it is not working for me. XML and  
 fromstring when used with a string do not do the same thing as parse  
 does with a file. How do I get this to work?

Why do you need an ElementTree rather than an Element? XML(string) returns
the root element, as if you had used et.parse(f).getroot(). You can turn
this into an ElementTree with e.g. et.ElementTree(XML(string)).

 Why isn't et.parse the only way to do this? Why have XML or fromstring  
 at all? Why not enhance parse and deprecate XML and fromstring with  
 something like:
 
 formPostData = cgi.FieldStorage()
 theXmlData = formPostData['theXml'].value
 theXmlDataTree = 
 et.parse(makeThisUnicodeStringLookLikeAFileSoParseWillDealWithIt(theXmlData))

If you want to treat a string as a file, use StringIO.



Re: ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-25 Thread unayok
On Jun 25, 9:02 pm, Kee Nethery <k...@kagi.com> wrote:
 Summary: I have XML as string and I want to pull it into ElementTree
 so that I can play with it but it is not working for me. XML and
 fromstring when used with a string do not do the same thing as parse
 does with a file. How do I get this to work?

 Details:
 I have a CGI that receives XML via an HTTP POST as a POST variable
 named 'theXml'. The POST data is a string that the CGI receives, it is
 not a file on a hard disk.

 The POSTed string looks like this when viewed in pretty format:
[...]
 et.parse seems to pull in the entire XML document and give me
 something to play with whereas et.XML and et.fromstring do not.

 Questions:
 How do I get this to work?
 Where in the docs did it give me an example of how to make this work
 (what did I miss from reading the docs)?

[skipping bonus points question]

I'm not sure what you're expecting.  It looks to me like things are
working okay:

My test script:

import xml.etree.ElementTree as ET

data="""<xml>
<purchase id="1" lang="en">
<item id="1" productId="369369">
<name>Autumn</name>
<quantity>1</quantity>
<price>8.46</price>
</item>
<javascript>YES</javascript>
</purchase>
<customer id="123456" time="1227449322">
<shipping>
<street>19 Any Street</street>
<city>Berkeley</city>
<state>California</state>
<zip>12345</zip>
<country>People's Republic of Berkeley</country>
<name>Jon Roberts</name>
</shipping>
<email>ju...@shrimp.edu</email>
</customer>
</xml>"""

xml = ET.fromstring( data )

print xml
print "attrib   ", xml.attrib
print "tag      ", xml.tag
print "text     ", xml.text
print "contents "
for element in xml :
    print element
print "tostring"
print ET.tostring( xml )

when run, produces:

<Element xml at 7f582c2e82d8>
attrib    {}
tag       xml
text

contents
<Element purchase at 7f582c2e8320>
<Element customer at 7f582c2e85a8>
tostring
<xml>
<purchase id="1" lang="en">
<item id="1" productId="369369">
<name>Autumn</name>
<quantity>1</quantity>
<price>8.46</price>
</item>
<javascript>YES</javascript>
</purchase>
<customer id="123456" time="1227449322">
<shipping>
<street>19 Any Street</street>
<city>Berkeley</city>
<state>California</state>
<zip>12345</zip>
<country>People's Republic of Berkeley</country>
<name>Jon Roberts</name>
</shipping>
<email>ju...@shrimp.edu</email>
</customer>
</xml>

Which seems to me quite useful (i.e. it has the full XML available).
Maybe you can explain how you were trying to play with the results
of fromstring() that you can't do from parse().

The documentation for elementtree indicates:

 The ElementTree wrapper type adds code to load XML files as trees
 of Element objects, and save them back again.

and

 The Element type can be used to represent XML files in memory.
 The ElementTree wrapper class is used to read and write XML files.

In the above case, you should find that the getroot() of your loaded
ElementTree instance ( parse().getroot() ) to be the same as the
Element generated by fromstring().
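That claim can be checked with a short sketch, assuming Python 3 and a reduced, made-up sample. The two calls build separate objects, so the comparison is done on their serialised form rather than on identity:

```python
from io import BytesIO
import xml.etree.ElementTree as ET

data = b"<xml><purchase id='1'/><customer id='123456'/></xml>"

# Root element obtained via the ElementTree wrapper...
from_parse = ET.parse(BytesIO(data)).getroot()
# ...and the Element returned directly by fromstring().
from_string = ET.fromstring(data)

# Distinct objects with the same content, so compare the serialised XML.
same = ET.tostring(from_parse) == ET.tostring(from_string)
print(same)
```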


Re: ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-25 Thread Carl Banks
On Jun 25, 6:02 pm, Kee Nethery <k...@kagi.com> wrote:
 Summary: I have XML as string and I want to pull it into ElementTree  
 so that I can play with it but it is not working for me. XML and  
 fromstring when used with a string do not do the same thing as parse  
 does with a file. How do I get this to work?

 Details:
 I have a CGI that receives XML via an HTTP POST as a POST variable  
 named 'theXml'. The POST data is a string that the CGI receives, it is  
 not a file on a hard disk.

 The POSTed string looks like this when viewed in pretty format:

 <xml>
         <purchase id="1" lang="en">
                 <item id="1" productId="369369">
                         <name>Autumn</name>
                         <quantity>1</quantity>
                         <price>8.46</price>
                 </item>
                 <javascript>YES</javascript>
         </purchase>
         <customer id="123456" time="1227449322">
                 <shipping>
                         <street>19 Any Street</street>
                         <city>Berkeley</city>
                         <state>California</state>
                         <zip>12345</zip>
                         <country>People's Republic of Berkeley</country>
                         <name>Jon Roberts</name>
                 </shipping>
                 <email>ju...@shrimp.edu</email>
         </customer>
 </xml>

 The pseudocode in Python 2.6.2 looks like:

 import xml.etree.ElementTree as et

 formPostData = cgi.FieldStorage()
 theXmlData = formPostData['theXml'].value
 theXmlDataTree = et.XML(theXmlData)

 and when this runs, theXmlDataTree is set to:

 theXmlDataTree  instance        <Element xml at 7167b0>
         attrib  dict    {}
         tag     str     xml
         tail    NoneType        None
         text    NoneType        None

 I get the same result with fromstring:

 formPostData = cgi.FieldStorage()
 theXmlData = formPostData['theXml'].value
 theXmlDataTree = et.fromstring(theXmlData)

 I can put the xml in a file and reference the file by its URL and use:

 et.parse(urllib.urlopen(theUrl))

 and that will set theXmlDataTree to:

 theXmlDataTree  instance        <xml.etree.ElementTree.ElementTree instance at 0x67cb48>

 This result I can play with. It contains all the XML.

I believe you are misunderstanding something.  et.XML and
et.fromstring return Elements, whereas et.parse returns an
ElementTree.  These are two different things; however, both of them
contain all the XML.  In fact, an ElementTree (which is returned by
et.parse) is just a container for the root Element (returned by
et.fromstring)--and it adds no important functionality to the root
Element as far as I can tell.

Given an Element (as returned by et.XML or et.fromstring) you can pass
it to the ElementTree constructor to get an ElementTree instance.  The
following line should give you something you can play with:

theXmlDataTree = et.ElementTree(et.fromstring(theXmlData))

Conversely, given an ElementTree (as returned by et.parse) you can
call the getroot method to obtain the root Element, like so:

theXmlRootElement = et.parse(xmlfile).getroot()

I have no use for ElementTree instances so I always call getroot right
away and only store the root element.  You may prefer to work with
ElementTrees rather than with Elements directly, and that's perfectly
fine; just use the technique above to wrap up the root Element if you
use et.fromstring.
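The round trip between the two types might look like this sketch, assuming Python 3 and a tiny made-up document. It also shows one thing the wrapper is handy for, namely writing the tree out with an XML declaration:

```python
from io import BytesIO
import xml.etree.ElementTree as et

# Element from a string, then wrapped in an ElementTree.
root = et.fromstring("<xml><purchase id='1'/></xml>")
tree = et.ElementTree(root)

# Unwrapping returns the very same root Element object.
unwrapped = tree.getroot()

# The wrapper can serialise the document to a file-like object.
buffer = BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
serialised = buffer.getvalue()
print(serialised.decode("utf-8"))
```

Whether to keep the wrapper or only the root is a matter of taste; the wrapper is cheap to create whenever file-style output is needed.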


[snip]
 Why isn't et.parse the only way to do this? Why have XML or fromstring  
 at all?

Because Fredrik Lundh wanted it that way.  Unlike most Python
libraries ElementTree is under the control of one person, which means
it was not designed or vetted by the community, which means it would
tend to have some interface quirks.  You shouldn't complain: the
library is superb compared to XML solutions like DOM.  A few minor
things should be no big deal.


Carl Banks


Re: ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-25 Thread Kee Nethery
thank you to everyone, I'll play with these suggestions tomorrow at  
work and report back.


On Jun 25, 2009, at 8:04 PM, Carl Banks wrote:


Because Fredrik Lundh wanted it that way.  Unlike most Python
libraries ElementTree is under the control of one person, which means
it was not designed or vetted by the community, which means it would
tend to have some interface quirks.


Yep


You shouldn't complain: the
library is superb compared to XML solutions like DOM.


Which is why I want to use it.


A few minor
things should be no big deal.


True and I will eventually get past the minor quirks. As a newbie,  
figured I'd point out the difficult portions, things that conceptually  
are confusing. I know that after lots of use I'm not going to notice  
that it is strange that I have to stand on my head and touch my nose 3  
times to open the fridge door. The contortions will seem normal.


Results tomorrow, thanks everyone for the assistance.

Kee Nethery


Re: ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-25 Thread Carl Banks
On Jun 25, 8:53 pm, Kee Nethery <k...@kagi.com> wrote:
 On Jun 25, 2009, at 8:04 PM, Carl Banks wrote:
  A few minor
  things should be no big deal.

 True and I will eventually get past the minor quirks. As a newbie,  
 figured I'd point out the difficult portions, things that conceptually  
 are confusing. I know that after lots of use I'm not going to notice  
 that it is strange that I have to stand on my head and touch my nose 3  
 times to open the fridge door. The contortions will seem normal.

Well it's not *that* bad.

(That would be PIL. :)


Carl Banks


Re: ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-25 Thread Stefan Behnel
Carl Banks wrote:
 Why isn't et.parse the only way to do this? Why have XML or fromstring  
 at all?
 
 Because Fredrik Lundh wanted it that way.  Unlike most Python
 libraries ElementTree is under the control of one person, which means
 it was not designed or vetted by the community, which means it would
 tend to have some interface quirks.

Just for the record: Fredrik doesn't actually consider it a design quirk.
He argues that it's designed for different use cases. While parse() parses
a file, which normally contains a complete document (represented in ET as
an ElementTree object), fromstring() and especially the 'literal wrapper'
XML() are made for parsing strings, which (most?) often only contain XML
fragments. With a fragment, you normally want to continue doing things like
inserting it into another tree, so you need the top-level element in almost
all cases.
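A sketch of the fragment use case, assuming Python 3; the document and fragment contents are invented. The fragment is parsed with XML() and the resulting Element is inserted straight into a tree obtained from parse():

```python
from io import BytesIO
from xml.etree.ElementTree import XML, parse, tostring

# A complete document, parsed from a file-like object -> ElementTree.
tree = parse(BytesIO(b"<book><chapter/></book>"))
chapter = tree.getroot().find("chapter")

# A fragment written as a literal -> Element, ready to be inserted.
fragment = XML("<section><title>A to Z</title></section>")
chapter.append(fragment)

result = tostring(tree.getroot(), encoding="unicode")
print(result)
```

Had XML() returned an ElementTree, the append would need an extra getroot() call each time, which is the inconvenience the design avoids.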

Stefan


Re: ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

2009-06-25 Thread Carl Banks
On Jun 25, 10:11 pm, Stefan Behnel <stefan...@behnel.de> wrote:
 Carl Banks wrote:
  Why isn't et.parse the only way to do this? Why have XML or fromstring  
  at all?

  Because Fredrik Lundh wanted it that way.  Unlike most Python
  libraries ElementTree is under the control of one person, which means
  it was not designed or vetted by the community, which means it would
  tend to have some interface quirks.

 Just for the record: Fredrik doesn't actually consider it a design quirk.

Well of course he wouldn't--it's his library.

 He argues that it's designed for different use cases. While parse() parses
 a file, which normally contains a complete document (represented in ET as
 an ElementTree object), fromstring() and especially the 'literal wrapper'
 XML() are made for parsing strings, which (most?) often only contain XML
 fragments. With a fragment, you normally want to continue doing things like
 inserting it into another tree, so you need the top-level element in almost
 all cases.

Whatever, like I said I am not going to nit-pick over small things,
when all the big things are done right.


Carl Banks