A recently reported bug (https://sourceforge.net/apps/trac/pyxb/ticket/153)
is:

  A mixed element with a sequence of <xs:any> moves all of the
embedded text to the end:

  <v>Test <xhtml:b>something</xhtml:b> else</v>

  becomes

  <v><xhtml:b>something</xhtml:b>Test else</v>

This isn't so much a bug as a design decision.  I'm opening up
discussion on how to resolve this.

The issue is that the PyXB representation of a complex element is an
instance of a Python object.  XML attributes and elements are named,
and are naturally represented as attributes on that object.
Non-element content doesn't have a natural home.

In the case of simple elements, the element itself holds the value,
and is an instance of a (subclass of a) Python type rather than an
object.  If it's a complex type with simple content, there's a value()
method on the object which distinguishes the content from any
attributes.

For complex elements with mixed content, PyXB attempts to maintain a
list of element and non-element content, in the order it was added, in
the content() method
(http://pyxb.sourceforge.net/api/pyxb.binding.basis.complexTypeDefinition-class.html#content).
When validation is in force, though, the generated DOM representation
does not necessarily maintain this order (it cannot if the order would
not validate).  Consequently, there's no way to preserve the position of
non-element content relative to the elements, and it gets emitted in a
lump at the end (if at all, as with ticket 154).
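To make the symptom concrete, here is a minimal sketch using only the
standard library's xml.dom.minidom.  It is not PyXB code: the helper
names (ordered_content, demultiplexed_xml) are made up for this
illustration, and the "elements first, text last" serializer is just a
toy stand-in for a binding that stores elements in named slots and keeps
text separately.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
DOC = '<v xmlns:xhtml="%s">Test <xhtml:b>something</xhtml:b> else</v>' % XHTML_NS


def ordered_content(element):
    """Return child content in document order: str for text, Node for elements."""
    items = []
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            items.append(node.data)
        elif node.nodeType == node.ELEMENT_NODE:
            items.append(node)
    return items


def demultiplexed_xml(element):
    """Toy binding (hypothetical): collect elements and text separately,
    then emit all elements first and the concatenated text afterwards."""
    elements = [n for n in element.childNodes if n.nodeType == n.ELEMENT_NODE]
    text = "".join(n.data for n in element.childNodes if n.nodeType == n.TEXT_NODE)
    body = "".join(e.toxml() for e in elements) + text
    return "<%s>%s</%s>" % (element.tagName, body, element.tagName)


root = minidom.parseString(DOC).documentElement

# Document order is still available while walking the DOM ...
in_order = [i if isinstance(i, str) else i.toxml() for i in ordered_content(root)]

# ... but is gone once text and elements have been stored separately.
lumped = demultiplexed_xml(root)

print(in_order)
print(lumped)
```

The first print shows text and element interleaved as in the input; the
second shows the element followed by all of the text, which is the shape
of the output reported in the ticket.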

A similar issue exists with multiple-occurrence xs:any, because once
PyXB has demultiplexed the element content into the correct fields in
the containing instance, the original order has been lost.
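The same loss of order can be sketched for repeated elements.  Again this
is stdlib-only illustration, not PyXB: the demultiplex helper and the
<root>/<a>/<b> document are assumptions made up for the example, standing
in for a binding that files each child under a per-name field.

```python
from xml.dom import minidom

DOC = "<root><a>1</a><b>2</b><a>3</a></root>"


def demultiplex(element):
    """Group child element values by tag name (hypothetical stand-in for
    storing children in per-name fields of a binding instance)."""
    fields = {}
    for node in element.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            fields.setdefault(node.tagName, []).append(node.firstChild.data)
    return fields


root = minidom.parseString(DOC).documentElement
fields = demultiplex(root)

# Order as it appeared in the document.
document_order = [n.tagName for n in root.childNodes
                  if n.nodeType == n.ELEMENT_NODE]

# Order recoverable from the per-name fields alone.
field_order = [name for name, values in fields.items() for _ in values]

print(document_order)
print(field_order)
```

Each field keeps its own values in order, but the interleaving between
fields (a, b, a) cannot be reconstructed from the fields alone; some
extra record of the original sequence would be needed.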

I would like to fix this, and there's an opportunity because the
switch to counter automata required by a fix to #112 will completely
replace the underlying content model, and the potential exists for
preserving mixed content in order.  What I don't see is an obvious
Python interface for it.  I'm already unhappy about having to use
names like "content" and "value" within the namespace of a complex or
simple type to provide access to information like this, since that
requires that XML attributes and elements with these names be renamed
in the binding to eliminate the conflict.  I'm also reluctant to have
PyXB attempt to remember and preserve the original order of content,
since that can take a lot of space, normally isn't needed, and can be
invalidated due to manipulations of the Python instance.

So the questions:

* If an XML element has mixed content, how do you want to interact with it
  in Python?

* If an XML schema does not impose an order on child elements, viz. by using
  xs:choice or xs:any instead of xs:sequence, how much of a memory or
  performance penalty are you willing to pay to preserve the original
  document order?
