Hi Daniel,

thanks for the heads-up. I don't care all that much about the global dict size: 10M entries should be hard enough to reach in normal use cases. Most users only deal with a very small number of XML formats.
But I did run into issues with the buffer changes.

Daniel Veillard, 06.08.2012 09:00:
> The new buffer structure will be ABI compatible with the old ones,
> i.e. the old code as compiled will be able to work with the new one, as
> the fields with the same values are in the same place in the new
> structures. But the structure are now opaque and the few places where
> the code was using it directly will need fixing.
> What I see from the usage there are for example access to xmlOutputBuffers:
>
>   buf = xmlAllocOutputBuffer (NULL);
>   ....dump stuff to the buffer...
>   use data at buf->buffer->content, of size buf->buffer->use
>
> First okay, that was allowed by the API, but such buffers were supposed
> to be used for I/O and encoding conversion, in general accessing
> buf->buffer->content and buf->buffer->use directly was not really the
> expected way to do things. The fact that xmlOutputBuffer were not
> supposed to be used that way is the reason why there is no accessors for
> getting the output data, this is now fixed as of commit
>
>   http://git.gnome.org/browse/libxml2/commit/?id=e258adecd0e19a6cfe6afa232b89aa416368820e
>
> So where there is such use of direct access, check the LIBXML2_NEW_BUFFER
> macro and if present then
> - replace buf->buffer->content with xmlOutputBufferGetContent(buf)
> - replace buf->buffer->use with xmlOutputBufferGetSize(buf)

I tested it and found that lxml is affected by this. lxml currently takes the xmlBuffer* from either the "conv" or the "buffer" field of the output buffer and then calls xmlBufferContent() and xmlBufferLength() to get at the result. I take it that this isn't how it'll work in the future, because xmlBufferLength() returns an int and buffers are supposed to be able to grow larger than that, right?

However, xmlOutputBufferGetContent() only reads the "buffer" field, not the "conv" field. How should I use the "conv" field now? Can't the new xmlOutputBufferGetContent() do "the right thing" for me?
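To make the "conv" versus "buffer" question concrete, here is a minimal sketch of the selection logic lxml applies: read from "conv" when it exists (libxml2 only allocates it when an encoder is set), otherwise from "buffer", and report the size as a size_t rather than an int. The structs are hypothetical, simplified stand-ins for the xmlOutputBuffer fields mentioned above, not libxml2's real definitions, and output_content()/output_size() are hypothetical helpers, not libxml2 API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for libxml2 types. These are NOT the
 * real definitions; only the fields discussed in this thread are modelled. */
typedef unsigned char xmlChar;

typedef struct {
    xmlChar *content; /* start of the stored bytes */
    size_t use;       /* bytes in use, kept as size_t rather than int */
} MockBuf;

typedef struct {
    void *encoder;   /* non-NULL when an output encoding is active */
    MockBuf *buffer; /* data as written by the serialiser */
    MockBuf *conv;   /* encoded data; only present when an encoder is set */
} MockOutputBuffer;

/* Hypothetical helper: pick the buffer holding the final output,
 * i.e. "conv" if it exists, otherwise "buffer". */
static const MockBuf *result_buffer(const MockOutputBuffer *out)
{
    assert(out != NULL);
    return (out->conv != NULL) ? out->conv : out->buffer;
}

/* Hypothetical accessors in the spirit of xmlOutputBufferGetContent()
 * and xmlOutputBufferGetSize(), but aware of the "conv" field. */
static const xmlChar *output_content(const MockOutputBuffer *out)
{
    const MockBuf *b = result_buffer(out);
    return (b != NULL) ? b->content : NULL;
}

static size_t output_size(const MockOutputBuffer *out)
{
    const MockBuf *b = result_buffer(out);
    return (b != NULL) ? b->use : 0;
}
```

Whether the real xmlOutputBufferGetContent() should make this choice itself is exactly the open question here; the sketch only shows what "the right thing" would mean from the caller's side.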
Code that uses xmlBuffer directly is here:

https://github.com/lxml/lxml/blob/master/src/lxml/serializer.pxi#L31
https://github.com/lxml/lxml/blob/master/src/lxml/serializer.pxi#L123

Another issue I found: xmlDumpNotationTable() still wants an xmlBuffer instead of the xmlBuf that outbuffer.buffer now returns. Is the right fix here to include buf.h and call xmlBufBackToBuffer()?

https://github.com/lxml/lxml/blob/master/src/lxml/serializer.pxi#L293

(BTW, the reason why the serialisation code does so much manually is, IIRC, that lxml still supports a couple of libxml2 versions that lack the newer features of the serialisation/xmlSave API. It also avoids the slight changes to the serialised XML that an abrupt switch to the native libxml2 functions would cause.)

> if in some place the xmlBufferPtr was passed independently of the
> OutputBuffer, it's possible to use xmlBufGetContent(buffer) and
> xmlBufUse(buffer) to achieve the same.

I assume you meant xmlBufContent()?

It seems to me that redefining xmlBufferLength and xmlBufferContent to call the new xmlBuf functions, and using a size_t (or ssize_t?) to store the result of xmlBufLength, would do the trick.

BTW, is there a reason why there are both an xmlBufLength() and an xmlBufUse() that do the same thing? Since this is a new API that doesn't suffer from legacy junk yet, wouldn't one be enough? (And wouldn't xmlBufLength() be the perfect name?)

> I don't plan to make an official release with the changes before
> September, so there is a bit of time to get this all cleaned up, and
> possibly refine the migration strategy for the few apps affected.

There'll be a new release (3.0) of lxml quite soon, within a few weeks. It should be doable to get this fixed up by then.

Stefan
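The compile-time redefinition suggested above could look roughly like the following sketch. It is hedged throughout: the CompatBuf types and the mock_* functions are hypothetical stand-ins (not libxml2 code), and MOCK_NEW_BUFFER plays the role of LIBXML2_NEW_BUFFER. In real code the two branches would presumably call xmlBufContent()/xmlBufUse() and xmlBufferContent()/xmlBufferLength() respectively.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char xmlChar;

/* MOCK_NEW_BUFFER stands in for LIBXML2_NEW_BUFFER, which libxml2's tree.h
 * defines when the new buffer API is available. It is defined here so that
 * the "new" branch below is the one compiled. */
#define MOCK_NEW_BUFFER 1

#ifdef MOCK_NEW_BUFFER
/* Hypothetical stand-in for the opaque xmlBuf: length stored as size_t. */
typedef struct { xmlChar *content; size_t use; } CompatBuf;
static const xmlChar *mock_buf_content(const CompatBuf *b) { return b->content; }
static size_t mock_buf_use(const CompatBuf *b) { return b->use; }
#define COMPAT_BUF_CONTENT(b) mock_buf_content(b)
#define COMPAT_BUF_LENGTH(b)  mock_buf_use(b)
#else
/* Hypothetical stand-in for the legacy xmlBuffer: length stored as int. */
typedef struct { xmlChar *content; int use; } CompatBuf;
static const xmlChar *mock_buffer_content(const CompatBuf *b) { return b->content; }
static int mock_buffer_length(const CompatBuf *b) { return b->use; }
#define COMPAT_BUF_CONTENT(b) mock_buffer_content(b)
/* Widen the legacy int so call sites can always store a size_t. */
#define COMPAT_BUF_LENGTH(b)  ((size_t) mock_buffer_length(b))
#endif
```

Call sites then only ever use COMPAT_BUF_CONTENT() and COMPAT_BUF_LENGTH() and always store the length in a size_t, so the int return type of xmlBufferLength() does not constrain the new code path.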
