A recently reported bug (https://sourceforge.net/apps/trac/pyxb/ticket/153) is:

    A mixed element with a sequence of <xs:any> moves all of the embedded text to the end:

        <v>Test <xhtml:b>something</xhtml:b> else</v>

    becomes

        <v><xhtml:b>something</xhtml:b>Test else</v>

This isn't so much a bug as a design decision. I'm opening up discussion on how to resolve this.

The issue is that the PyXB representation of a complex element is an instance of a Python object. XML attributes and elements are named, and are naturally represented as attributes on that object. Non-element content doesn't have a natural home. In the case of simple elements, the element itself holds the value, and is an instance of a (subclass of a) Python type rather than an object. If it's a complex type with simple content, there's a value() method on the object which distinguishes the content from any attributes.

For complex elements with mixed content, PyXB attempts to maintain a list of element and non-element content, in the order it was added, in the content() function (http://pyxb.sourceforge.net/api/pyxb.binding.basis.complexTypeDefinition-class.html#content). When validation is in force, though, the generated DOM representation does not necessarily maintain this order (it cannot if the order would not validate). Consequently, there's no way to preserve the position of non-element content relative to the elements, and the mixed content gets emitted in a lump at the end (if at all, as with ticket 154).

A similar issue exists with multiple-occurrence xs:any, because once PyXB has demultiplexed the element content into the correct fields in the containing instance, the original order has been lost.

I would like to fix this, and there's an opportunity: the switch to counter automata required by a fix to #112 will completely replace the underlying content model, so the potential exists for preserving mixed content in order. What I don't see is an obvious Python interface.
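
To make the order loss concrete, here is a minimal sketch that uses only the standard library's xml.dom.minidom, not PyXB itself. The helper names demultiplex and ordered_content are hypothetical and invented for this illustration; they only mimic the effect of filing child elements under their names versus keeping one ordered list, and say nothing about how PyXB is actually implemented.

```python
from xml.dom import minidom

# The example from ticket 153, with the xhtml prefix declared so it parses.
DOC = ('<v xmlns:xhtml="http://www.w3.org/1999/xhtml">'
       'Test <xhtml:b>something</xhtml:b> else</v>')


def demultiplex(element):
    """Hypothetical stand-in for a binding that stores elements by name.

    Elements are filed under their tag name and text goes into a separate
    list, so the position of text relative to elements is discarded.
    """
    fields = {}  # tag name -> list of serialized child elements
    text = []    # non-element content, detached from its position
    for node in element.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            fields.setdefault(node.tagName, []).append(node.toxml())
        elif node.nodeType == node.TEXT_NODE:
            text.append(node.data)
    return fields, text


def ordered_content(element):
    """Hypothetical alternative: one list of items in document order."""
    items = []
    for node in element.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            items.append(('element', node.toxml()))
        elif node.nodeType == node.TEXT_NODE:
            items.append(('text', node.data))
    return items


root = minidom.parseString(DOC).documentElement

# Regenerating from the demultiplexed form: elements first, text lumped last.
fields, text = demultiplex(root)
lumped = ''.join(e for group in fields.values() for e in group) + ''.join(text)

# Regenerating from the ordered list reproduces the original interleaving.
in_order = ''.join(value for _, value in ordered_content(root))

print(lumped)
print(in_order)
```

The ordered list costs one extra entry per child node, which is the kind of space overhead at issue when the order is rarely needed.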

I'm already unhappy about having to use names like "content" and "value" within the namespace of a complex or simple type to provide access to information like this, since that requires that XML attributes and elements with these names be renamed in the binding to eliminate the conflict. I'm also reluctant to have PyXB attempt to remember and preserve the original order of content, since that can take a lot of space, normally isn't needed, and can be invalidated by manipulations of the Python instance.

So the questions:

* If an XML element has mixed content, how do you want to interact with it in Python?

* If an XML schema does not impose an order on child elements, viz. by using xs:choice or xs:any instead of xs:sequence, how much of a memory or performance penalty are you willing to pay to preserve the original document order?

_______________________________________________
pyxb-users mailing list
pyxb-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pyxb-users