On Thu, Dec 20, 2012 at 01:09:03PM +0100, Alexandre Bique wrote:
> On Thu, Dec 20, 2012 at 6:30 AM, Daniel Veillard wrote:
> > On Fri, Dec 14, 2012 at 11:22:37AM +0100, Alexandre Bique wrote:
> >> Hi,
> >>
> >> I would like to know if it is possible to use xmlTextReader, but with
> >> a parseChunk interface?
> >
> >   Well the two are somehow in opposition:
> >     - the reader will internally try to get more data while parsing,
> >       assuming a synchronous input
> >     - the chunk interface assumes the parser will just stop and
> >       give back execution control to the caller once it needs
> >       more data.
> >
> >> Actually I do:
> >>
> >> // I removed the checks to simplify the code
> >> buffer = xmlAllocParserInputBuffer(XML_CHAR_ENCODING_NONE);
> >> reader = xmlNewTextReader(buffer, url);
> >>
> >> void data_received(const char *data, size_t len)
> >> {
> >>     xmlParserInputBufferPush(buffer, len, data);
> >>     while (xmlTextReaderRead(reader) == 1)
> >
> >   xmlTextReaderRead() may return 0 or an error code here
> > if there is not enough data to finish parsing.
>
> Which is alright with me. I observed that it worked well, and when you
> feed the parser more data, it continues where it stopped.

  Very surprising. It should usually raise a fatal error, and the reader
should basically stop working correctly from that point:

  http://www.w3.org/TR/REC-xml/#dt-fatal

  "Once a fatal error is detected, however, the processor MUST NOT
   continue normal processing (i.e., it MUST NOT continue to pass
   character data and information about the document's logical
   structure to the application in the normal way)"

> >>         parse_node(reader);
> >> }
> >>
> >> This works but I noticed that the last chunk may not be parsed.
> >> How can I make the reader consume all the remaining data?
> >
> >   Honestly I don't know how to solve that simply. The natural way
> > to do this would be to parse in a separate thread, create a
> > reader for custom I/O and have the I/O read routine block if there
> > is no more data to be read; the main thread would then unblock it
> > when new data is available. This requires specialized I/O routines,
> > threading and synchronization, so it is not simple.
> >
> >   The core problem is that xmlTextReaderRead() can only return
> > 1 for success, 0 if parsing is finished and -1 in case of error.
> > There is no provision in the API to say "I need more data", and
> > missing data would basically be reported as a parsing error
> > (with missing closing tags, for example).
> >
> >   The programming model of the reader is way simpler, but it assumes
> > a synchronous input.
>
> Thanks a lot for your answer.
>
> Does it sound like a good idea to extend the API to make my use case
> possible?
>
> I saw in the source code that it uses a SAX parser internally, and the
> only thing I need is to make the reader pass parseChunk(NULL, 0) to
> its internal SAX parser.
>
> I think that it is a good thing to accept asynchronous input, for
> example if you read from a socket and get EAGAIN, then you can return
> NEED_MORE_DATA, and then the user can read again later, until EOF.
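Alexandre's proposal can be modelled with a toy incremental checker. This is not libxml2 code: the `toy_*` names and the `TOY_NEED_MORE_DATA` status are hypothetical, and the checker only tracks tag nesting. It is a sketch of the catch Daniel raises in his reply: a "need more data" result only becomes meaningful when the caller separately says that input has ended, much like the `terminate` argument of libxml2's `xmlParseChunk()`. Once an error is reported it is sticky, in line with the fatal-error rule quoted above.

```c
#include <stddef.h>

/* Hypothetical status codes. libxml2's xmlTextReaderRead() only has
   1 (node read), 0 (finished) and -1 (error); NEED_MORE_DATA is the
   proposed addition. */
enum toy_status {
    TOY_ERROR = -1,
    TOY_DONE = 0,
    TOY_NEED_MORE_DATA = 2
};

/* Toy incremental checker: counts element nesting only. It does not
   handle attributes containing '>', comments, PIs, or self-closing
   tags ("<a/>" counts as an open tag), so it is NOT an XML parser. */
struct toy_parser {
    int state;     /* 0 = text, 1 = just saw '<', 2 = inside a tag */
    int closing;   /* current tag started with "</" */
    int depth;     /* currently open elements */
    int seen_root; /* at least one element has been opened */
    int failed;    /* sticky fatal-error flag */
};

static void toy_init(struct toy_parser *p)
{
    p->state = 0;
    p->closing = 0;
    p->depth = 0;
    p->seen_root = 0;
    p->failed = 0;
}

static enum toy_status toy_fail(struct toy_parser *p)
{
    p->failed = 1; /* no normal processing after a fatal error */
    return TOY_ERROR;
}

/* Feed one chunk. `terminate` plays the role of xmlParseChunk()'s
   terminate flag: nonzero means no more data will ever arrive. */
static enum toy_status toy_feed(struct toy_parser *p, const char *data,
                                size_t len, int terminate)
{
    if (p->failed)
        return TOY_ERROR;
    for (size_t i = 0; i < len; i++) {
        char c = data[i];
        if (p->state == 0) {
            if (c == '<')
                p->state = 1;
        } else if (p->state == 1) {
            if (c == '>')
                return toy_fail(p); /* "<>" */
            p->closing = (c == '/');
            p->state = 2;
        } else if (c == '>') {
            p->state = 0;
            if (p->closing) {
                if (p->depth == 0)
                    return toy_fail(p); /* close tag with nothing open */
                p->depth--;
            } else {
                p->depth++;
                p->seen_root = 1;
            }
        }
    }
    if (p->seen_root && p->depth == 0 && p->state == 0)
        return TOY_DONE;
    /* Incomplete input: the same bytes are an error only at EOF. */
    return terminate ? toy_fail(p) : TOY_NEED_MORE_DATA;
}
```

With `terminate` set to 0, feeding `"<foo>"` yields `TOY_NEED_MORE_DATA`; the same bytes with `terminate` set to 1 yield `TOY_ERROR`. Without that extra end-of-input signal the two cases look identical, which is why a bare return code cannot express the distinction.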

  There is no way in the API to distinguish "<foo>" followed by no more
data, which should lead to a fatal parsing error, from "<foo>" where it
should not error because more data remains to be parsed but isn't
available yet.
  I don't see how to extend the xmlTextReaderRead() API to distinguish
the two. Currently, returning 0 means the document parsing is finished
and there is no more data, returning 1 means there is more data, and
-1 means a fatal parsing error occurred. Most existing applications
will then exit with an error code on anything other than 0 or 1, so I
don't see how to really extend this simply.
  And the xmlParserInputBufferPtr is not a synchronization structure:
the parser just reads from it, and if you don't feed the data fast
enough you will get a parser error.

Daniel

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
                     | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
https://mail.gnome.org/mailman/listinfo/xml
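Daniel's suggested workaround (a parser thread whose I/O read routine blocks until the main thread supplies more data) needs exactly the synchronization that xmlParserInputBufferPtr lacks. Below is a minimal sketch of such a channel, assuming POSIX threads. The names `blocking_pipe`, `pipe_init`, `pipe_push`, `pipe_close`, `pipe_read` and `pipe_destroy` are hypothetical; `pipe_read` merely has the same shape as libxml2's `xmlInputReadCallback` (`int (*)(void *context, char *buffer, int len)`).

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical unbounded byte channel between the network thread
   (producer) and the parser thread (consumer). */
struct blocking_pipe {
    pthread_mutex_t lock;
    pthread_cond_t ready; /* signalled when data arrives or on close */
    char *data;
    size_t len;           /* bytes not yet consumed */
    size_t cap;
    int closed;           /* set once the producer reaches EOF */
};

static void pipe_init(struct blocking_pipe *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->ready, NULL);
    p->data = NULL;
    p->len = p->cap = 0;
    p->closed = 0;
}

/* Producer side (e.g. called from data_received()). Pushing after
   pipe_close() is not handled. Returns 0, or -1 if out of memory. */
static int pipe_push(struct blocking_pipe *p, const char *buf, size_t n)
{
    if (n == 0)
        return 0;
    pthread_mutex_lock(&p->lock);
    if (p->len + n > p->cap) {
        size_t cap = (p->len + n) * 2;
        char *grown = realloc(p->data, cap);
        if (grown == NULL) {
            pthread_mutex_unlock(&p->lock);
            return -1;
        }
        p->data = grown;
        p->cap = cap;
    }
    memcpy(p->data + p->len, buf, n);
    p->len += n;
    pthread_cond_signal(&p->ready);
    pthread_mutex_unlock(&p->lock);
    return 0;
}

/* Producer side: signal EOF; bytes already pushed can still be read. */
static void pipe_close(struct blocking_pipe *p)
{
    pthread_mutex_lock(&p->lock);
    p->closed = 1;
    pthread_cond_broadcast(&p->ready);
    pthread_mutex_unlock(&p->lock);
}

/* Consumer side, shaped like xmlInputReadCallback: blocks until data
   or EOF is available. Returns bytes copied, 0 at end of input, or -1
   for an invalid length. The memmove keeps it simple, not fast. */
static int pipe_read(void *ctx, char *out, int want)
{
    struct blocking_pipe *p = ctx;
    size_t n;

    if (want <= 0)
        return -1;
    pthread_mutex_lock(&p->lock);
    while (p->len == 0 && !p->closed)
        pthread_cond_wait(&p->ready, &p->lock);
    n = p->len < (size_t)want ? p->len : (size_t)want;
    if (n > 0) {
        memcpy(out, p->data, n);
        memmove(p->data, p->data + n, p->len - n);
        p->len -= n;
    }
    pthread_mutex_unlock(&p->lock);
    return (int)n; /* 0 only when closed and fully drained */
}

static void pipe_destroy(struct blocking_pipe *p)
{
    pthread_mutex_destroy(&p->lock);
    pthread_cond_destroy(&p->ready);
    free(p->data);
}
```

In the parser thread, something like `xmlReaderForIO(pipe_read, NULL, &bp, url, NULL, 0)` (with `struct blocking_pipe bp`) followed by the usual `xmlTextReaderRead()` loop would then see a synchronous stream, while `data_received()` calls `pipe_push()` and the EOF path calls `pipe_close()`. That wiring is an untested assumption; check the libxml2 `xmlReaderForIO()` documentation before relying on it. The design choice is simply to move the waiting into the read callback, so the reader never observes a temporarily empty buffer as end of input.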
