Re: [xml] How do I go about adding missing entities to libxml2?
On Tue, Aug 04, 2015 at 05:59:11PM +1000, Sam Saffron wrote:
> We have a bunch of missing entities (that flow through to other libs
> like nokogiri ruby gem)
>
> for example &aleph;
> https://meta.discourse.org/t/certain-unicode-entities-are-being-escaped/19898
>
> What do I need to do to get them into libxml2 ?

You get them from a DTD that is referenced and loaded by your document.
&aleph; is not an XML predefined entity, so your document has to define
what it means.

Daniel

--
Daniel Veillard      | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
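To illustrate the point about the document having to define the entity, here is a minimal sketch. It is an assumption-laden example, not libxml2 usage: it relies on Python's standard-library parser (expat) instead of libxml2, the sample documents are hypothetical, and the entity is declared in the internal DTD subset. With libxml2 and an external DTD, the parser would additionally have to be told to load the DTD (the XML_PARSE_DTDLOAD parser option), which this sketch does not cover.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares &aleph; itself,
# mapping it to U+2135 (ALEF SYMBOL) via a numeric character reference.
DOC_WITH_DTD = """<!DOCTYPE p [
  <!ENTITY aleph "&#x2135;">
]>
<p>&aleph;</p>"""

# The same content without the declaration: &aleph; is undefined.
DOC_WITHOUT_DTD = "<p>&aleph;</p>"


def text_of(xml_source):
    """Parse xml_source and return the text of its root element."""
    return ET.fromstring(xml_source).text


def is_well_formed(xml_source):
    """Return True if the parser accepts xml_source, False otherwise."""
    try:
        ET.fromstring(xml_source)
    except ET.ParseError:
        return False
    return True
```

The first document parses and the entity is replaced by the declared character; the second is rejected because nothing tells the parser what &aleph; means.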
[xml] How do I go about adding missing entities to libxml2?
We have a bunch of missing entities (that flow through to other libs like
nokogiri ruby gem)

for example &aleph;
https://meta.discourse.org/t/certain-unicode-entities-are-being-escaped/19898

What do I need to do to get them into libxml2?
Re: [xml] How do I use LATEST_LIBXML2?
On Wed, Oct 01, 2014 at 11:59:36AM -0700, ols6...@sbcglobal.net wrote:
> I'm attempting to use libxml2 on a Windows system, and I want to compile
> the source code myself. When I go to the libxml downloads page, I see
> that 2.9.1 is the latest version. If I download that version, I get a
> 5 MB file LATEST_LIBXML2. My question is, how do I extract the source
> files, documentation, etc from this file? Or is this not the correct
> file?

LATEST_LIBXML2 is a pointer to a gzipped tarball; just fetch
libxml2-2.9.1.tar.gz instead.

Daniel

--
Daniel Veillard      | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
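If the file has already been downloaded under the name LATEST_LIBXML2, it can also be unpacked directly. This is a minimal sketch, assuming the downloaded file really is the gzipped tarball itself; the function name unpack_source is hypothetical, and shutil.unpack_archive is the Python standard-library call.

```python
import shutil


def unpack_source(archive_path, dest_dir):
    """Unpack a gzipped tarball into dest_dir, whatever the file is named.

    The format is given explicitly because a name such as LATEST_LIBXML2
    has no .tar.gz extension for shutil to guess the format from.
    """
    shutil.unpack_archive(archive_path, dest_dir, format="gztar")
```

For example, unpack_source("LATEST_LIBXML2", "src") would leave the source tree under the src directory.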
[xml] How do I use LATEST_LIBXML2?
I'm attempting to use libxml2 on a Windows system, and I want to compile the source code myself. When I go to the libxml downloads page, I see that 2.9.1 is the latest version. If I download that version, I get a 5 MB file LATEST_LIBXML2. My question is, how do I extract the source files, documentation, etc from this file? Or is this not the correct file?
[xml] How do I get the encoding of an XML document?
Hi there

I'd like to find the encoding of an XML document, as detected by libxml2,
using the Python bindings. From lxml, I can get it like this:

  >>> et
  <etree._ElementTree object at 0xb7cc992c>
  >>> et.docinfo.encoding
  'windows-1252'

According to the lxml API docs, lxml gets this information from libxml2
(see http://codespeak.net/lxml/api.html#parsers ).

How do I get at it without depending on lxml? The only way I've been able
to find is using debugDumpDocumentHead, which just prints to stdout.

  >>> dh = xml.debugDumpDocumentHead(xml)
  DOCUMENT
  version=1.0
  encoding=windows-1252
  standalone=true

Regards,
--
jean . .. //\\\oo///\\
Re: [xml] How do I get the encoding of an XML document?
On Wed, Jan 03, 2007 at 11:53:55AM +0200, Jean Jordaan wrote:
> Hi there
>
> I'd like to find the encoding of an XML document, as detected by
> libxml2, using the Python bindings. From lxml, I can get it like this:
>
>   >>> et
>   <etree._ElementTree object at 0xb7cc992c>
>   >>> et.docinfo.encoding
>   'windows-1252'
>
> According to the lxml API docs, lxml gets this information from libxml2
> (see http://codespeak.net/lxml/api.html#parsers ).
>
> How do I get at it without depending on lxml? The only way I've been
> able to find is using debugDumpDocumentHead, which just prints to
> stdout.
>
>   >>> dh = xml.debugDumpDocumentHead(xml)
>   DOCUMENT
>   version=1.0
>   encoding=windows-1252
>   standalone=true

Hum, it's a string attached to the xmlDoc. It's available directly in C,
but there is no specific API to extract it. As a result, the autogenerated
bindings don't seem to have a way to extract the information.

Could you add a bugzilla asking for that functionality? The simplest is
probably to provide a custom accessor function, specifically at the Python
binding level.

Daniel

--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
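Until such an accessor exists, one possible stopgap is to read the encoding named in the XML declaration with a different parser. The sketch below is an assumption, not the libxml2 binding API: it uses Python's standard-library expat module, the function name declared_encoding is hypothetical, and it reports only the declared encoding, which is not necessarily what libxml2 would detect for a document without a declaration.

```python
import xml.parsers.expat


def declared_encoding(xml_bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # Called by expat once it has parsed the <?xml ...?> declaration.
        found["encoding"] = encoding

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return found.get("encoding")
```

Unlike debugDumpDocumentHead, this returns the value to the caller instead of printing it to stdout.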
Re: [xml] How do I get the encoding of an XML document?
Hi Daniel

> Could you add a bugzilla asking for that functionality? The simplest is
> probably to provide a custom accessor function, specifically at the
> Python binding level.

Done:
http://bugzilla.gnome.org/show_bug.cgi?id=392300

Thanks,
--
jean . .. //\\\oo///\\
Re: [xml] how do I...
On Tue, May 24, 2005 at 06:01:00PM -0400, Daniel Veillard wrote:
> By the definition of XML this is not possible. Packing multiple XML
> documents on a single stream without out-of-band markers is a frequent
> but huge design flaw. The demonstration is obvious for anybody who has
> read the first 2 pages of the XML standard:
>   http://www.w3.org/TR/REC-xml/#sec-well-formed
> First production of the XML specification:
>
>   [1] document ::= prolog element Misc*
>
> Misc* means there is no potential limit to the number of Misc items at
> the end, and finding anything else there is a fatal error.

And the definition of Misc is:

  Misc ::= Comment | PI | S

i.e. either a comment (<!-- ... -->), a processing instruction (<? ... ?>)
or white space. PI is further defined to _exclude_ <?xml ... ?>. So it
seems that it IS possible to determine where the next XML doc starts just
by looking for the next XML declaration or the next actual element.

On Tue, May 24, 2005 at 02:47:12PM -0600, Sebastian Kuzminsky wrote:
> I've got a couple of processes that trade XML messages over the network.
> The sender writes each XML message on a single newline-terminated line.
> The receiver uses select and read, and reads until it gets a '\n',

While I wouldn't necessarily call a design that tried to intuit where the
next doc starts broken, using a delimiter between docs does sound like a
better idea. As others have said, using \0 instead of \n will probably
work better, at least until you do something like switch to UTF-16
encoding.

eric
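The \0 delimiter idea can be sketched as a small receiver-side buffer. This is only an illustration under stated assumptions: the class name NulFramer is hypothetical, the messages are assumed to be in an encoding such as UTF-8 where a NUL byte never occurs inside an XML document, and, as noted above, it would break with UTF-16, where NUL bytes are part of ordinary characters.

```python
class NulFramer:
    """Split an incoming byte stream into messages terminated by a NUL byte."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add bytes read from the network; return the completed messages.

        Bytes after the last NUL stay buffered until a later call
        delivers the rest of that message.
        """
        self._buffer.extend(chunk)
        *complete, rest = bytes(self._buffer).split(b"\x00")
        self._buffer = bytearray(rest)
        return complete
```

Each returned message is one complete document, newlines included, and can be handed to the parser in a single call.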
Re: [xml] how do I...
> But a parser will not do this: a parser MUST report an error and abort
> at that point if it finds an element there; it's the spec. The parser
> must error, so something else must detect the end of the root and then
> find the next document. Basically you need to write a second,
> non-conformant parser to do that in front of the real parser.

In other words, you need a framing layer that happens *below* the XML
layer. There are lots of those kinds of things around: multipart MIME,
DIME, etc. You can probably re-use something rather than inventing your
own.

/r$

--
Rich Salz, Chief Security Architect
DataPower Technology  http://www.datapower.com
XS40 XML Security Gateway  http://www.datapower.com/products/xs40.html
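The behaviour described here is easy to observe. The following is a small sketch, assuming Python's standard-library parser (expat) as a stand-in for any conformant parser, not libxml2 itself; the helper name parses and the sample messages are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_source):
    """Return True if xml_source is accepted as ONE well-formed document."""
    try:
        ET.fromstring(xml_source)
    except ET.ParseError:
        return False
    return True


ONE_DOC = "<msg>one</msg>"
TWO_DOCS = "<msg>one</msg><msg>two</msg>"           # second root element
DOC_PLUS_MISC = "<msg>one</msg><!-- trailing -->\n"  # only Misc after the root
```

A comment or white space after the root element is accepted, but a second element is a fatal error, which is why the split between documents has to happen before the bytes reach the parser.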
[xml] how do I...
Hi folks, I'm a bit of an XML noob with a how-to question.

I've got a couple of processes that trade XML messages over the network.
The sender writes each XML message on a single newline-terminated line.
The receiver uses select and read, and reads until it gets a '\n', then
passes the line to xmlParseMemory and validates the resulting doc with
xmlValidateDtd.

This works well, but it's a little annoying to have to use '\n' as the
message separator. What I'd really like is to let the sender spread its
message out over several lines if it wants, and have the receiver detect
the end-of-message without any gross hacks.

I'm imagining a stateful, incremental parser function that I can
repeatedly call with the buffers I read from the network, and it'll
consume up to the end-of-buffer or end-of-message, whichever comes first,
and return me a doc if it finished one, or NULL if it didn't. If the
function didn't consume the whole buffer (because it found an
end-of-message before the end-of-buffer), it'll have to tell me where it
left off so I can call it again with the rest later.

I looked halfheartedly through the docs on the xmlsoft.org website but was
overwhelmed. Any clues for the clueless?

--
Sebastian Kuzminsky
Marie will know I'm headed south, so's to meet me by and by
-Townes Van Zandt
Re: [xml] how do I...
On Tue, May 24, 2005 at 02:47:12PM -0600, Sebastian Kuzminsky wrote:
> I've got a couple of processes that trade XML messages over the network.
> The sender writes each XML message on a single newline-terminated line.
> The receiver uses select and read, and reads until it gets a '\n', then
> passes the line to xmlParseMemory and validates the resulting doc with
> xmlValidateDtd.
>
> This works well, but it's a little annoying to have to use '\n' as the
> message separator. What I'd really like is to let the sender spread its
> message out over several lines if it wants, and have the receiver detect
> the end-of-message without any gross hacks.
>
> I'm imagining a stateful, incremental parser function that I can
> repeatedly call with the buffers I read from the network, and it'll
> consume up to the end-of-buffer or end-of-message, whichever comes
> first, and return me a doc if it finished one, or NULL if it didn't. If
> the function didn't consume the whole buffer (because it found an
> end-of-message before the end-of-buffer), it'll have to tell me where it
> left off so I can call it again with the rest later.
[...]
> Any clues for the clueless?

By the definition of XML this is not possible. Packing multiple XML
documents on a single stream without out-of-band markers is a frequent but
huge design flaw. The demonstration is obvious for anybody who has read
the first 2 pages of the XML standard:
  http://www.w3.org/TR/REC-xml/#sec-well-formed
First production of the XML specification:

  [1] document ::= prolog element Misc*

Misc* means there is no potential limit to the number of Misc items at the
end, and finding anything else there is a fatal error. The direct result
of this is that the parser must be told that the document is finished. And
the libxml2 API, being strictly conformant, does not offer APIs for what
you want.

I strongly suggest you redesign your network format to include markers or
document sizes in the pipe; the current state sounds broken.

Daniel

--
Daniel Veillard      | Red Hat Desktop team  http://redhat.com/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
Re: [xml] how do I...
On Tue, May 24, 2005 at 04:24:55PM -0600, Sebastian Kuzminsky wrote:
> My network format already includes an end-of-document marker which never
> appears inside the document ('\n'), so I guess I'm standards-compliant,
> if only by dumb luck. :)

Hum, the problem is that \n is perfectly legal within XML documents, and
quite common there. So it works because your kind of data doesn't require
it, but it's not a good solution. You could still pick another separator
character and, where the document itself really needs that character,
escape it as a numeric character reference, but this is still very
limited.

I assume your documents are short, in which case a stream of
[(int_i, document_i)*] where int_i == len(document_i) would be much easier
to handle, as you can directly read the right number of bytes and then
pass the complete buffer directly for single-pass parsing.

Daniel

--
Daniel Veillard      | Red Hat Desktop team  http://redhat.com/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
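The (length, document) stream suggested above might look roughly like this. It is a sketch under assumptions the message does not specify: the length is encoded as a 4-byte unsigned integer in network byte order, the stream is any object with a read(n) method, and the names frame, read_exact and read_documents are hypothetical.

```python
import struct

# Assumed wire format: 4-byte unsigned length, network byte order.
HEADER = struct.Struct("!I")


def frame(document):
    """Prefix a document (bytes) with its length."""
    return HEADER.pack(len(document)) + document


def read_exact(stream, count):
    """Read exactly count bytes, looping over short reads."""
    chunks = []
    remaining = count
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended in the middle of a frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_documents(stream):
    """Yield each document from a stream of (length, document) frames."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of stream between two frames
        if len(header) < HEADER.size:
            header += read_exact(stream, HEADER.size - len(header))
        (length,) = HEADER.unpack(header)
        yield read_exact(stream, length)
```

Each yielded buffer holds exactly one document, newlines and all, so it can be passed to the parser in a single call with no delimiter scanning.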