Re: [xml] How do I go about adding missing entities to libxml2?

2015-10-16 Thread Daniel Veillard
On Tue, Aug 04, 2015 at 05:59:11PM +1000, Sam Saffron wrote:
> We have a bunch of missing entities (that flows through to other libs
> like nokogiri ruby gem)
> 
> for example &aleph;
> https://meta.discourse.org/t/certain-unicode-entities-are-being-escaped/19898
> 
> What do I need to do to get them into libxml2 ?

  You get them from a DTD that is referenced and loaded by your document. They
are not predefined XML entities, so your document has to define what they mean.
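
As a small illustration of the point above, the sketch below declares the
entity in an internal DTD subset so that the parser can expand it. It uses
Python's standard-library ElementTree (expat-based) rather than libxml2,
purely so that it runs without extra dependencies. The sample documents, the
helper name parse_text, and the code point chosen for aleph (U+2135, as in
the HTML entity set) are assumptions of this example.

```python
import xml.etree.ElementTree as ET

# "aleph" is not one of the five predefined XML entities (lt, gt, amp, apos,
# quot), so the document declares it itself in an internal DTD subset.
WITH_DTD = (
    '<!DOCTYPE doc [\n'
    '  <!ENTITY aleph "&#x2135;">\n'
    ']>\n'
    '<doc>&aleph;</doc>'
)

# The same document without the declaration: the entity is undefined.
WITHOUT_DTD = '<doc>&aleph;</doc>'


def parse_text(xml_text):
    """Return the root element's text, or None if the document fails to parse."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None
```

With the declaration present the reference is expanded to the character;
without it the parser rejects the document as not well-formed.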

Daniel

-- 
Daniel Veillard  | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


[xml] How do I go about adding missing entities to libxml2?

2015-08-04 Thread Sam Saffron
We have a bunch of missing entities (that flows through to other libs
like nokogiri ruby gem)

for example &aleph;
https://meta.discourse.org/t/certain-unicode-entities-are-being-escaped/19898

What do I need to do to get them into libxml2 ?


Re: [xml] How do I use LATEST_LIBXML2?

2014-10-03 Thread Daniel Veillard
On Wed, Oct 01, 2014 at 11:59:36AM -0700, ols6...@sbcglobal.net wrote:
> I'm attempting to use libxml2 on a Windows system, and I want to compile the
> source code myself.
>
> When I go to the libxml downloads page, I see that 2.9.1 is the latest
> version. If I download that version, I get a 5 MB file LATEST_LIBXML2.
>
> My question is, how do I extract the source files, documentation, etc from
> this file? Or is this not the correct file?

  It's a pointer to a gzipped tarball; just fetch libxml2-2.9.1.tar.gz
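
For readers without a tar command at hand (for instance on Windows), a
gzipped tarball can also be unpacked with Python's standard tarfile module.
This is only a sketch; the helper name extract_tarball and the destination
directory are assumptions of the example, not part of the libxml2
distribution.

```python
import tarfile


def extract_tarball(archive_path, dest_dir):
    """Unpack a gzipped tarball into dest_dir and return the member names."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        names = tar.getnames()
        # Newer Python versions support extraction filters that reject unsafe
        # member paths; use the "data" filter when it is available.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)
    return names
```

A call such as extract_tarball("libxml2-2.9.1.tar.gz", "src") would then
place the sources under the src directory.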

Daniel

-- 
Daniel Veillard  | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/


[xml] How do I use LATEST_LIBXML2?

2014-10-01 Thread ols6000
I'm attempting to use libxml2 on a Windows system, and I want to 
compile the source code myself.


When I go to the libxml downloads page, I see that 2.9.1 is the 
latest version. If I download that version, I get a 5 MB file LATEST_LIBXML2.


My question is, how do I extract the source files, documentation, etc 
from this file? Or is this not the correct file?




[xml] How do I get the encoding of an XML document?

2007-01-03 Thread Jean Jordaan
Hi there

I'd like to find the encoding of an XML document, as detected by
libxml2, using the Python bindings. From lxml, I can get it like this:

>>> et
<etree._ElementTree object at 0xb7cc992c>
>>> et.docinfo.encoding
'windows-1252'

According to the lxml API docs, lxml gets this information from libxml2 (see
http://codespeak.net/lxml/api.html#parsers )

How do I get at it without depending on lxml? The only way I've been
able to find is using debugDumpDocumentHead, which just prints to
stdout.

>>> dh = xml.debugDumpDocumentHead(xml)
DOCUMENT
version=1.0
encoding=windows-1252
standalone=true

Regards,
-- 
jean  . ..  //\\\oo///\\


Re: [xml] How do I get the encoding of an XML document?

2007-01-03 Thread Daniel Veillard
On Wed, Jan 03, 2007 at 11:53:55AM +0200, Jean Jordaan wrote:
> Hi there
>
> I'd like to find the encoding of an XML document, as detected by
> libxml2, using the Python bindings. From lxml, I can get it like this:
>
> >>> et
> <etree._ElementTree object at 0xb7cc992c>
> >>> et.docinfo.encoding
> 'windows-1252'
>
> According to the lxml API docs, lxml gets this information from libxml2 (see
> http://codespeak.net/lxml/api.html#parsers )
>
> How do I get at it without depending on lxml? The only way I've been
> able to find is using debugDumpDocumentHead, which just prints to
> stdout.
>
> >>> dh = xml.debugDumpDocumentHead(xml)
> DOCUMENT
> version=1.0
> encoding=windows-1252
> standalone=true

  Hum, it's a string attached to the xmlDoc. It's available directly in C,
but there is no specific API to extract it. As a result, the autogenerated
bindings don't seem to have a way to extract the information. Could you
file a Bugzilla entry asking for that functionality? The simplest fix is
probably to provide a custom accessor function, specifically at the Python
binding level.
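
Until such an accessor exists, one possible stopgap (not a libxml2 API, and
not what the bindings do) is to read the encoding named in the XML
declaration with Python's standard-library expat module. Note that this
reports the declared encoding only, not one detected from a byte-order mark
or supplied out of band. The function name declared_encoding is an assumption
of this sketch.

```python
import xml.parsers.expat


def declared_encoding(xml_bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # Called once if the document starts with an <?xml ... ?> declaration.
        found["encoding"] = encoding

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return found.get("encoding")
```

The function parses the whole document, so it raises an expat error on input
that is not well-formed.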

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard  | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


Re: [xml] How do I get the encoding of an XML document?

2007-01-03 Thread Jean Jordaan
Hi Daniel

> Could you
> add a bugzilla asking for that functionality, the simplest is probably
> to provide a custom accessor function, specifically at the python binding
> level.

Done:
  http://bugzilla.gnome.org/show_bug.cgi?id=392300

Thanks,
-- 
jean  . ..  //\\\oo///\\


Re: [xml] how do I...

2005-05-25 Thread Eric Haszlakiewicz
On Tue, May 24, 2005 at 06:01:00PM -0400, Daniel Veillard wrote:
>   By the definition of XML this is not possible. Packing multiple
> XML documents on a single stream without out-of-band markers is a frequent
> but huge design flaw. The demonstration is obvious to anybody who has
> read the first two pages of the XML standard:
>
>    http://www.w3.org/TR/REC-xml/#sec-well-formed
>
>    First production of the XML specification:
>    [1] document ::= prolog element Misc*
>
>  Misc* means there is no limit to the number of Misc items at
> the end, and finding anything other than a Misc item there is a fatal error.

And the definition of Misc is:
Misc   ::=  Comment | PI | S

i.e. either a comment (<!-- ... -->), a processing instruction (<? ... ?>) or
white space.  That PI is further defined to _exclude_ <?xml ... ?>.

So it seems that it IS possible to determine where the next XML doc starts
just by looking for the next XML declaration or the next actual element.

> On Tue, May 24, 2005 at 02:47:12PM -0600, Sebastian Kuzminsky wrote:
> > I've got a couple of processes that trade XML messages over the network.
> > The sender writes each XML message on a single newline-terminated line.
> > The receiver uses select and read, and reads until it gets a '\n',

While I wouldn't necessarily call a design that tries to intuit where
the next doc starts broken, using a delimiter between docs does sound like
a better idea.  As others have said, using \0 instead of \n will probably
work better, at least until you do something like switch to UTF-16 encoding.
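
A rough sketch of the delimiter idea, in Python with the standard-library
ElementTree parser standing in for libxml2: the receiver buffers whatever
each read returns and splits on the NUL byte. The class name DelimitedReader
and the assumption that every document is UTF-8 (so that NUL cannot appear
inside one) belong to this example only.

```python
import xml.etree.ElementTree as ET

DELIMITER = b"\0"  # assumption: documents are UTF-8, so NUL never occurs inside one


class DelimitedReader:
    """Accumulates network chunks and returns complete delimited documents."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add bytes from one read; return the documents completed by it."""
        self._buffer.extend(chunk)
        # Everything before the last delimiter is complete; the tail is kept.
        *complete, rest = bytes(self._buffer).split(DELIMITER)
        self._buffer = bytearray(rest)
        return [ET.fromstring(doc) for doc in complete if doc.strip()]
```

Each call to feed() returns a possibly empty list of parsed documents, so a
message may span several lines and several reads without the receiver having
to guess where it ends.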

eric


Re: [xml] how do I...

2005-05-25 Thread Rich Salz

>   But a parser will not do this: a parser MUST report an error and abort
> at that point if it finds an element there; it's the spec. The parser
> must error, so something else must detect the end of the root and then
> find the next document; basically you need to rewrite a second non-conformant
> parser to do that in front of the real parser.


In other words, you need a framing layer that happens *below* the XML 
layer.  Lots of those kinds of things around, multipart MIME, DIME, etc. 
 You can probably re-use something rather than inventing your own.


/r$

--
Rich Salz, Chief Security Architect
DataPower Technology   http://www.datapower.com
XS40 XML Security Gateway   http://www.datapower.com/products/xs40.html


[xml] how do I...

2005-05-24 Thread Sebastian Kuzminsky
Hi folks, I'm a bit of an XML noob with a how-to question.


I've got a couple of processes that trade XML messages over the network.
The sender writes each XML message on a single newline-terminated line.
The receiver uses select and read, and reads until it gets a '\n',
then passes the line to xmlParseMemory and validates the resulting doc
with xmlValidateDtd.


This works well, but it's a little annoying to have to use '\n' as the
message separator.  What I'd really like is to let the sender spread
its message out over several lines if it wants, and have the receiver
detect the end-of-message without any gross hacks.


I'm imagining a stateful, incremental parser function that I can
repeatedly call with the buffers I read from the network, and it'll
consume up to the end-of-buffer or end-of-message, whichever comes
first, and return me a doc if it finished one, or NULL if it didn't.
If the function didn't consume the whole buffer (because it found an
end-of-message before the end-of-buffer), it'll have to tell me where
it left off so I can call it again with the rest later.


I looked halfheartedly through the docs on the xmlsoft.org website but
was overwhelmed.


Any clues for the clueless?


-- 
Sebastian Kuzminsky
Marie will know I'm headed south, so's to meet me by and by
-Townes Van Zandt


Re: [xml] how do I...

2005-05-24 Thread Daniel Veillard
On Tue, May 24, 2005 at 02:47:12PM -0600, Sebastian Kuzminsky wrote:
> I've got a couple of processes that trade XML messages over the network.
> The sender writes each XML message on a single newline-terminated line.
> The receiver uses select and read, and reads until it gets a '\n',
> then passes the line to xmlParseMemory and validates the resulting doc
> with xmlValidateDtd.
>
>
> This works well, but it's a little annoying to have to use '\n' as the
> message separator.  What I'd really like is to let the sender spread
> its message out over several lines if it wants, and have the receiver
> detect the end-of-message without any gross hacks.
>
>
> I'm imagining a stateful, incremental parser function that I can
> repeatedly call with the buffers I read from the network, and it'll
> consume up to the end-of-buffer or end-of-message, whichever comes
> first, and return me a doc if it finished one, or NULL if it didn't.
> If the function didn't consume the whole buffer (because it found an
> end-of-message before the end-of-buffer), it'll have to tell me where
> it left off so I can call it again with the rest later.
[...]
> Any clues for the clueless?

  By the definition of XML this is not possible. Packing multiple
XML documents on a single stream without out-of-band markers is a frequent
but huge design flaw. The demonstration is obvious to anybody who has
read the first two pages of the XML standard:

   http://www.w3.org/TR/REC-xml/#sec-well-formed

   First production of the XML specification:
   [1] document ::= prolog element Misc*

 Misc* means there is no limit to the number of Misc items at
the end, and finding anything other than a Misc item there is a fatal error.
 The direct result of this is that the parser must be told that the document
is finished. And the libxml2 API, being strictly conformant, does not offer APIs
for what you want.
 I strongly suggest you redesign your network format to include markers
or document sizes in the pipe; the current state sounds broken.

Daniel

-- 
Daniel Veillard  | Red Hat Desktop team http://redhat.com/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


Re: [xml] how do I...

2005-05-24 Thread Daniel Veillard
On Tue, May 24, 2005 at 04:24:55PM -0600, Sebastian Kuzminsky wrote:
> My network format already includes an end-of-document marker which never
> appears inside the document ('\n'), so I guess I'm standards-compliant,
> if only by dumb luck.  :)

  Hum, the problem is that \n is perfectly legal within XML documents,
and quite common there. So it works because your kind of data doesn't
require it, but it's not a good solution. You should still be able
to use another separator character and escape it as a numeric
character reference in the markup if really needed, but this is still very
limited. I assume your documents are short, in which case a stream of
  [(int_i, document_i) *]  where int_i == len(document_i)
would be much easier to handle, as you can directly read the right number
of bytes and then pass the complete buffer directly for a single-pass
parse.
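
The length-prefixed stream described above could look roughly like the
following Python sketch, with the standard-library ElementTree parser
standing in for libxml2. The 4-byte big-endian header and the function names
are assumptions of this example; the message only specifies that each
document is preceded by its length.

```python
import struct
import xml.etree.ElementTree as ET

HEADER = struct.Struct("!I")  # assumption: 4-byte big-endian length prefix


def write_message(stream, document):
    """Write one (length, document) pair to a binary stream."""
    stream.write(HEADER.pack(len(document)))
    stream.write(document)


def read_exactly(stream, count):
    """Read exactly count bytes, looping because a read may return fewer."""
    chunks = []
    while count > 0:
        chunk = stream.read(count)
        if not chunk:
            raise EOFError("stream ended in the middle of a message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def read_message(stream):
    """Return the next parsed document, or None at a clean end of stream."""
    first = stream.read(1)
    if not first:
        return None
    (length,) = HEADER.unpack(first + read_exactly(stream, HEADER.size - 1))
    # The whole document is in memory, so it is parsed in a single pass.
    return ET.fromstring(read_exactly(stream, length))
```

Because the receiver knows each document's size up front, the documents may
contain newlines or any other character, and the parser is always handed
exactly one complete document. The sketch assumes a blocking binary stream,
for example the file object returned by a socket's makefile("rb").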

Daniel

-- 
Daniel Veillard  | Red Hat Desktop team http://redhat.com/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/