[xml] Nasty DTD parsing bug (IO buffering, perhaps?)

2007-02-06 Thread Michael Day
Hi,

Here is a DTD parsing bug in libxml2 (tested with 2.6.27).

Download the following .tar.gz:

 http://www.princexml.com/download/nasty-libxml2-dtd-bug.tar.gz

Unpack it and run:

 $ xmllint --loaddtd bug.xml

You will get lots of error messages, the first one being:

 nlm/references.ent:381: parser error : Comment not terminated

However, if you look at the file, you will see that this is nonsense:
there are no unterminated comments on line 381.

Even worse, if you delete *one character* from the references.ent file 
at *any point* before line 381, then everything works fine!

This appears to be some kind of I/O buffering error, as the parser's
behaviour seems to depend on how many characters precede that point in
the file.
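One way to probe a buffering hypothesis like this is to feed identical bytes to an incremental parser in chunks of every possible size and check that the result never changes. The sketch below uses Python's standard-library ElementTree parser (built on expat), not libxml2, so it only illustrates the technique; the sample document and the helper names parse_in_chunks and chunking_is_stable are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the comment will straddle many chunk boundaries.
SAMPLE = (
    b"<?xml version='1.0'?>\n"
    b"<refs><!-- a comment that must survive any buffer split -->"
    b"<ref id='r1'>First</ref><ref id='r2'>Second</ref></refs>"
)


def parse_in_chunks(data: bytes, chunk_size: int) -> bytes:
    """Feed `data` to an incremental parser `chunk_size` bytes at a time
    and return the re-serialized tree (raises ET.ParseError on failure)."""
    parser = ET.XMLParser()
    for start in range(0, len(data), chunk_size):
        parser.feed(data[start:start + chunk_size])
    root = parser.close()
    return ET.tostring(root)


def chunking_is_stable(data: bytes) -> bool:
    """True if every chunk size from 1 to len(data) yields the same tree
    as parsing the whole input in a single chunk."""
    reference = parse_in_chunks(data, len(data))
    return all(parse_in_chunks(data, size) == reference
               for size in range(1, len(data) + 1))


print(chunking_is_stable(SAMPLE))  # True with a correct parser
```

If some chunk size raises ET.ParseError, or yields a different tree, while the single-chunk parse succeeds, the failure depends on where buffer boundaries fall, which is the kind of behaviour this report describes.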

Best regards,

Michael

-- 
Print XML with Prince!
http://www.princexml.com
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml


Re: [xml] xsltproc: weired behaviour with parsing freedesktop.org XML shared-mime-info database (bug?)

2007-02-06 Thread Daniel Leidert
On Tuesday, 06.02.2007, at 21:08 +0100, Daniel Leidert wrote:
> Hello,
> 
> I observe some really weird behaviour here. See the attached stylesheet
> and apply it to the shared-mime-info database (normally
> $datadir/mime/packages/freedesktop.org.xml). If I process my own XML
> file, with a similar (but not the same) DTD, containing an identical
> glob element, it works. Applying it to the shared-mime-info db does
> not give any output. What's the problem here? The shared-mime-info DB is
> a valid XML file. So what is happening here? Could you help explain
> it to me? Maybe I'm just too dumb or I overlooked something.

Dooh!

[..]
<mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info">
[..]

Got it. My fault. I overlooked this.
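The namespace URI quoted above is declared as the default namespace on the root of the shared-mime-info database, so every element in it, including glob, lives in that namespace, and a pattern asking for an unqualified glob matches nothing. In XSLT 1.0 the usual fix is to bind a prefix to that URI in the stylesheet (for example xmlns:m="...") and match m:glob instead of glob. The sketch below shows the same effect with Python's standard-library ElementTree rather than libxslt; the abridged sample document is hypothetical and the prefix m is arbitrary.

```python
import xml.etree.ElementTree as ET

NS = "http://www.freedesktop.org/standards/shared-mime-info"

# Hypothetical, abridged stand-in for freedesktop.org.xml.
DOC = f"""<mime-info xmlns="{NS}">
  <mime-type type="text/plain">
    <glob pattern="*.txt"/>
  </mime-type>
</mime-info>"""

root = ET.fromstring(DOC)

# Unqualified name: matches nothing, because every element is in NS.
unqualified = root.findall(".//glob")

# Qualified name: bind an arbitrary prefix ("m") to the namespace URI.
qualified = root.findall(".//m:glob", {"m": NS})

print(len(unqualified))                       # 0
print([g.get("pattern") for g in qualified])  # ['*.txt']
```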

Regards, Daniel



[xml] xsltproc: weired behaviour with parsing freedesktop.org XML shared-mime-info database (bug?)

2007-02-06 Thread Daniel Leidert
Hello,

I observe some really weird behaviour here. See the attached stylesheet
and apply it to the shared-mime-info database (normally
$datadir/mime/packages/freedesktop.org.xml). If I process my own XML
file, with a similar (but not the same) DTD, containing an identical
glob element, it works. Applying it to the shared-mime-info db does
not give any output. What's the problem here? The shared-mime-info DB is
a valid XML file. So what is happening here? Could you help explain
it to me? Maybe I'm just too dumb or I overlooked something.

Thanks and regards, Daniel


test.xsl
Description: application/xslt


Re: [xml] Python documentation - any help welcome!

2007-02-06 Thread Mike Kneller
Hi John, thanks for the comments,

>> If I had any suggestions it would be to intersperse working python code
>> examples for common operations in with the explanatory prose.

I have been doing that throughout the docs - it's quite a bit easier to read on 
the Wiki, and I would welcome any edits... ;-)

http://mikekneller.com/wiki/index.php?title=Getting_started_with_Libxml2_and_Python_-_part_1

At the moment, it's basically a "part 1 of x", so the examples are pretty
trivial, although in my experience they are useful for most simple uses of the
library. I agree with the idea of making it a sort of cookbook, though. Any
recipes are welcome here!

Cheers
Mike



Re: [xml] Python documentation - any help welcome!

2007-02-06 Thread John Dennis
Hi Mike:

I've used the libxml2 python bindings a fair bit and this is a good
start on documenting them. There is a bit of a learning curve, but I
think that has more to do with learning libxml2 than with the python
bindings. That said, it's still nice to see python-specific
documentation.

If I had any suggestions it would be to intersperse working python code
examples for common operations in with the explanatory prose. I think a
lot of folks just quickly want to know how to do basic tasks, a sort of
cookbook FAQ. e.g. 

how do I parse a doc and find all foobar elements and return a list of
them?

how do I build complex python objects by parsing an XML doc?

how can I serialize python objects into XML?

etc.

The examples can illustrate basic concepts in libxml2.
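As an illustration of the cookbook format suggested here, the sketch below answers the first two questions with Python's standard-library ElementTree, chosen only so it runs without the libxml2 bindings installed; with libxml2 itself the first recipe would typically be an XPath query such as doc.xpathEval("//foobar") on the parsed document. The element name foobar comes from the question; the sample document, the Foobar class and the find_all/to_objects helpers are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Hypothetical sample document.
SAMPLE = """<inventory>
  <foobar name="alpha" size="3"/>
  <section><foobar name="beta" size="5"/></section>
  <other name="gamma"/>
</inventory>"""


def find_all(xml_text: str, tag: str) -> List[ET.Element]:
    """Recipe 1: parse a document and return every <tag> element,
    at any depth (including the root), in document order."""
    root = ET.fromstring(xml_text)
    return list(root.iter(tag))


@dataclass
class Foobar:
    """Recipe 2: a plain Python object built from one <foobar> element."""
    name: str
    size: int


def to_objects(elements: List[ET.Element]) -> List[Foobar]:
    # Attribute values arrive as strings, so convert explicitly.
    return [Foobar(e.get("name"), int(e.get("size"))) for e in elements]


items = to_objects(find_all(SAMPLE, "foobar"))
print(items)  # [Foobar(name='alpha', size=3), Foobar(name='beta', size=5)]
```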
-- 
John Dennis <[EMAIL PROTECTED]>





Re: [xml] libxml + sax + schema validation

2007-02-06 Thread Daniel Veillard
On Tue, Feb 06, 2007 at 03:33:57PM +0100, Jovan Kostovski wrote:
> Hi,
> 
> I need to write a SAX XML parser that will
> validate the contents against an XML Schema file.
> 
> Writing a SAX parser wasn't too hard, but I have no
> clue how to implement the schema validation.
> 
> Can anyone help me?
> Links to some examples would be great.

  There is an API to push a schema validation context on top of
the SAX events; see how xmlSchemaValidateStream() is used in testSAX() in xmllint.c.
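The idea is that validation observes the same stream of SAX events as the application's own handler, so the application keeps its callbacks and gets validation alongside them. xmlSchemaValidateStream() is a C API; the sketch below only illustrates that layering pattern in Python, using the standard library's xml.sax.saxutils.XMLFilterBase. It is neither libxml2 nor XML Schema: the toy validator merely checks element names against a hypothetical allowed set, and AppHandler, ToyValidatingFilter and parse_and_validate are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class AppHandler(ContentHandler):
    """The application's own SAX handler: here it just records element names."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


class ToyValidatingFilter(XMLFilterBase):
    """Sits between the parser and the application handler: checks each
    event, then forwards it. The 'schema' is a toy set of allowed names."""

    def __init__(self, parent, allowed):
        super().__init__(parent)
        self.allowed = set(allowed)
        self.errors = []

    def startElement(self, name, attrs):
        if name not in self.allowed:
            self.errors.append(f"unexpected element <{name}>")
        super().startElement(name, attrs)  # forward to the application handler


def parse_and_validate(xml_bytes, allowed):
    app = AppHandler()
    filt = ToyValidatingFilter(xml.sax.make_parser(), allowed)
    filt.setContentHandler(app)
    filt.parse(io.BytesIO(xml_bytes))
    return app.seen, filt.errors


seen, errors = parse_and_validate(b"<order><item/><bogus/></order>", {"order", "item"})
print(seen)    # ['order', 'item', 'bogus']
print(errors)  # ['unexpected element <bogus>']
```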

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard  | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


[xml] libxml + sax + schema validation

2007-02-06 Thread Jovan Kostovski
Hi,

I need to write a SAX XML parser that will
validate the contents against an XML Schema file.

Writing a SAX parser wasn't too hard, but I have no
clue how to implement the schema validation.

Can anyone help me?
Links to some examples would be great.

BR, Jovan


Re: [xml] Python documentation - any help welcome!

2007-02-06 Thread Nic James Ferrier
Mike Kneller <[EMAIL PROTECTED]> writes:

> Hi,
>
> After struggling to get to grips with Libxml2 and Python, I figured
> that although I can't contribute much in the way of code, I can have
> a crack at getting some useful documentation up together.
>
> I have put the first part up on my Wiki, if anyone would care to
> review for accuracy - or help out where it is a bit light on
> examples?
>
> http://mikekneller.com/wiki/index.php?title=Getting_started_with_Libxml2_and_Python_-_part_1
>
> I realise that this is probably a bit n00b for most here, but I
> would like to bring together workable examples from the ground up;
> most of the other information I have read assumes a level of
> knowledge I just didn't have when I encountered the library for the
> first time.

Hey! I didn't see this till just now. I'm doing a *lot* with
libxml2/libxslt and python. I'll take a look at your doc and let you
know what I think.

-- 
Nic Ferrier
http://www.tapsellferrier.co.uk   for all your tapsell ferrier needs


[xml] Python documentation - any help welcome!

2007-02-06 Thread Mike Kneller
Hi,

After struggling to get to grips with Libxml2 and Python, I figured that 
although I can't contribute much in the way of code, I can have a crack at 
getting some useful documentation up together.

I have put the first part up on my Wiki, if anyone would care to review for 
accuracy - or help out where it is a bit light on examples?

http://mikekneller.com/wiki/index.php?title=Getting_started_with_Libxml2_and_Python_-_part_1

I realise that this is probably a bit n00b for most here, but I would like to 
bring together workable examples from the ground up; most of the other 
information I have read assumes a level of knowledge I just didn't have when I 
encountered the library for the first time.

For reference, I'll post the text here.

Cheers
Mike

=== Getting started with Libxml2 and Python - Part 1 ===

Overview

Getting to grips with Libxml2 and Python can be a frustrating experience, 
particularly as in-depth, accurate Python documentation is hard to find 
on the Web.

Many Python developers dislike the Libxml2 bindings because they are 'un-Pythonic'
and much too C-like. This, however, misses the point of Libxml2: the library is
portable, mature, extremely full-featured and *very* fast.

In the process of writing this tutorial, I hung out in the #xml channel on
irc.gnome.org and subscribed to the xml@gnome.org mailing list, and I was
given a lot of help when things weren't obvious! Although there isn't a
massive amount of activity on IRC or on the mailing list on a daily basis,
I would definitely recommend spending some time browsing the list archive,
or using Google to search it when you have questions. I have also found the
people in the Libxml2 community very helpful.

Manipulating XML using Libxml2 is fairly straightforward once you have a couple
of working examples; the problem in Python is that finding working examples
tends to be a bit of a hit-and-miss affair.

The first place to look is in the examples folder in the documentation installed
with your release (/usr/share/doc/libxml2-python-2.6.27/examples on my machine).

TODO: where are the examples on a number of distributions/platforms?

Also, take a moment to scan through libxml2.py itself. This is the Python
wrapper, and it is a good place to look if you are hunting for a particular
function. There is plenty of information in the wrapper, because all the
docstrings have been populated; you can always get information like

print(libxml2.parseFile.__doc__)

for any particular function.

Also remember that you can list the available methods for any Python object by
using the dir function. The most immediately useful objects are xmlCore, xmlNode
and xmlDoc, so

dir(libxml2.xmlCore)

is your friend when working out what functions are available to you.
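The same introspection works on any Python module. The sketch below applies it to the standard library's xml.dom.minidom, chosen only so it runs without the libxml2 bindings installed; with libxml2 you would pass libxml2.parseFile or libxml2.xmlCore instead. The public_methods helper is hypothetical, a filtered variant of dir().

```python
import xml.dom.minidom


def public_methods(obj):
    """Like dir(obj), filtered to public callables (skips _private and __dunder__ names)."""
    return [name for name in dir(obj)
            if not name.startswith("_") and callable(getattr(obj, name))]


# Docstring of a module-level function (cf. libxml2.parseFile.__doc__).
print(xml.dom.minidom.parseString.__doc__)

# Public methods available on a class (cf. dir(libxml2.xmlCore)).
print(public_methods(xml.dom.minidom.Element))
```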

I'm going to assume that you know a bit about XML, at least enough to recognise
an XML document when you see one, and hopefully enough about Python to know 
where to find the documentation!

[Installing Libxml2]

TODO: installation examples for a number of distros/platforms.

[Loading a document]

The first thing you will want to do with XML is load a document of some sort.
As a new Libxml2 user, this is where the confusion starts! It is worth remembering
that, in general, the Python bindings are automatically generated; therefore
there is an equivalent Python function for nearly every C function, which can
lead to apparently duplicated or unnecessary Python functions.

The library contains a number of different functions we can use to load an XML 
document:

parseDoc, parseFile, parseMemory, readDoc, readFd, readFile, readMemory,
recoverDoc and recoverFile

All of these functions return an xmlDoc object. Examples for using each of these
follow:

parseDoc(cur) - load an XML document from memory (a string)

doc = libxml2.parseDoc("""<?xml version="1.0"?>
<doc>Hello world!</doc>""")


parseMemory(buffer, size) - load an XML document from memory

xml = '<doc>Hello world!</doc>'
doc = libxml2.parseMemory(xml, len(xml))

This function performs exactly the same job as parseDoc from a Python 
perspective.


parseFile(filename) - load an XML document from a file

doc = libxml2.parseFile('test.xml')


readDoc(cur, URL, encoding, options) - load an XML document from memory (a string)

This version of the function allows you to specify options on a per-document
basis. The parseDoc version uses the parser defaults (in practice, the 
parser global settings, which can also be modified using global functions).

In most cases,
doc = libxml2.readDoc('<doc/>', None, None, 0)
will be equivalent to
doc = libxml2.parseDoc('<doc/>')

When using XSL, I have found it better to force entities
to be resolved before running the transform, in which case it is useful to
use the following:

doc = libxml2.readDoc( xml, None, libxm