Re: [xml] libxml2 2.9.13 download

2022-03-16 Thread Stefan Behnel
Hi, Jeffrey Walton via xml wrote on 16.03.22 at 05:45: libxml2 2.9.13 seems to be missing from ftp://xmlsoft.org/libxml2/. As mentioned in the release announcement: https://mail.gnome.org/archives/xml/2022-February/msg9.html the releases have moved to

Re: [xml] Release of libxml2 2.9.13

2022-02-23 Thread Stefan Behnel
Nick Wellnhofer wrote on 23.02.22 at 11:36: I asked on GNOME infra if it is possible to offer .tar.gz downloads, but this would require changes to the upload script. Thanks for asking. Stefan ___ xml mailing list, project page http://xmlsoft.org/

Re: [xml] Release of libxml2 2.9.13

2022-02-22 Thread Stefan Behnel
Nick Wellnhofer via xml wrote on 20.02.22 at 13:53: Version 2.9.13 of libxml2 is available at: https://download.gnome.org/sources/libxml2/2.9/ Thank you for the release, Nick! Note that starting with this release, libxml2 tarballs are published on download.gnome.org instead of

Re: [xml] Resuming maintenance

2022-01-10 Thread Stefan Behnel
Nick Wellnhofer via xml wrote on 10.01.22 at 15:20: Thanks to a donation from Google, I'm able to resume maintenance of libxml2 (and libxslt) for the remainder of 2022. I'm very happy to read this, Nick. All the best for 2022. Stefan

Re: [xml] Release of libxml2 2.9.11

2021-05-14 Thread Stefan Behnel
Stefan Behnel wrote on 13.05.21 at 23:13: > I haven't looked into them in detail yet but will do so as soon as I find > the time (probably during the next days). It's not possible that lxml is > doing something here that libxml2 doesn't expect, but we'll see. Sorry, I meant to wr

Re: [xml] Release of libxml2 2.9.11

2021-05-13 Thread Stefan Behnel
Jan Tojnar wrote on 13.05.21 at 21:44: >> I fail to build libxslt 1.1.34 against it. The "configure" script of >> libxslt has this line: > > libxml2 now behaves more correctly by rejecting invalid arguments like > `print`. This is fixed in libxslt master so it no longer passes it the > extra

Re: [xml] Release of libxml2 2.9.11

2021-05-13 Thread Stefan Behnel
Hi Daniel, Daniel Veillard via xml wrote on 13.05.21 at 15:54: > P, I am way way behind, but now that CVE-2021-3541 is out I just pushed > that long awaited release. libxml2 2.9.11 is tagged in git and a signed > tarball is available at the usual place: > >

[xml] Could we have a new release?

2021-04-14 Thread Stefan Behnel
Hi, libxml2 2.9.10 has been around for almost 18 months now. There have been lots of fixes during that time, so, may I kindly ask what's hindering a new release? Stefan

Re: [xml] Entering freeze for libxml2-2.9.10

2019-10-31 Thread Stefan Behnel
Hi, sorry to be late to the party. Let me note that the release tests fine with lxml, just with two test failures due to changed (and apparently more accurate) error texts/IDs. I'll adapt the tests in lxml. Thank you for the release, Daniel! Stefan Daniel Veillard wrote on 28.10.19 at

Re: [xml] Entering freeze for release of libxml2-2.9.9

2018-12-24 Thread Stefan Behnel
Nikolai Weibull wrote on 24.12.18 at 12:00: > Stefan Behnel, 2018-12-24 11:43: >> Nick Wellnhofer wrote on 19.12.18 at 17:02: >>> On 30/11/2018 11:41, Nikolai Weibull via xml wrote: >>>> OK, now I understand why it was working in my copy of the repository and

Re: [xml] Entering freeze for release of libxml2-2.9.9

2018-12-24 Thread Stefan Behnel
Nick Wellnhofer wrote on 19.12.18 at 17:02: > On 30/11/2018 11:41, Nikolai Weibull via xml wrote: >> OK, now I understand why it was working in my copy of the repository and >> not yours. Something went wrong when you applied the patch, Daniel, as a >> line was elided. Here’s a fix. We want

Re: [xml] Entering freeze for release of libxml2-2.9.9

2018-11-29 Thread Stefan Behnel
Daniel Veillard wrote on 29.11.18 at 21:20: > On Mon, Nov 26, 2018 at 11:48:37AM +0100, Nikolai Weibull via xml wrote: >> Stefan Behnel, 2018-11-25 15:37: >>> Nikolai Weibull wrote on 24.11.18 at 00:12: >>>> Yes, it seems that my patch for data in interleav

Re: [xml] Entering freeze for release of libxml2-2.9.9

2018-11-25 Thread Stefan Behnel
Nikolai Weibull wrote on 24.11.18 at 00:12: > Yes, it seems that my patch for data in interleaves was added and this may > be the cause of these issues. The regression tests didn’t display them, so > this is something different. Could we perhaps get a minimal test that breaks? Here is what I

Re: [xml] Entering freeze for release of libxml2-2.9.9

2018-11-25 Thread Stefan Behnel
Nikolai Weibull wrote on 24.11.18 at 00:12: > Yes, it seems that my patch for data in interleaves was added and this may > be the cause of these issues. The regression tests didn’t display them, so > this is something different. Could we perhaps get a minimal test that breaks? It's a bit

Re: [xml] Entering freeze for release of libxml2-2.9.9

2018-11-23 Thread Stefan Behnel
Hi Daniel! Daniel Veillard via xml wrote on 22.11.18 at 18:32: > I have just tagged the Release Candidate 1 in git and pushed a signed > tarball and signed rpms to the usual place: > > ftp://xmlsoft.org/libxml2/ I think something changed in the RelaxNG code. When I try to validate a

Re: [xml] Release of libxml2-2.9.5

2017-09-04 Thread Stefan Behnel
Daniel Veillard wrote on 04.09.2017 at 15:56: > It's out ! I tagged the release in git and pushed the signed tarball > and rpms to the usual place: > > ftp://xmlsoft.org/libxml2/ > > This is mostly a a security and bug fixes, most of the credit goes to Nick > who wrote or reviewed most

Re: [xml] Python 3.5 issue - SystemError: returned a result with an error set

2017-08-31 Thread Stefan Behnel
Petr Sumbera wrote on 30.08.2017 at 14:00: > anyone seen following error when running Python regression tests? This is > just with Python 3.5. Pythons 2.7 and 3.4 are ok (I haven't tested Python > 3.6). > > ## running Python regression tests > TypeError: 'NoneType' object is not callable > >

Re: [xml] Support of HTML v5 parsing

2015-06-29 Thread Stefan Behnel
Bruce Miller wrote on 28.05.2015 at 18:37: On 05/28/2015 12:29 PM, Noam Postavsky wrote: On Thu, May 28, 2015 at 12:13 PM, Frank Gross wrote: Are there any plans to support parsing of HTML V5 in libxml ? I tried function htmlCtxtReadMemory(), but it raises an error for HTML document

Re: [xml] Memory usage 32 bit vs. 64 bit Linux

2015-05-12 Thread Stefan Behnel
Daniel Veillard wrote on 12.05.2015 at 10:41: On Tue, May 12, 2015 at 10:28:34AM +0200, Robert Grasböck wrote: Hello Stefan! Memory consumption has nearly decreased by 50%, that's the good thing. But the bad thing is that the documentation says: no modification of the tree allowed

Re: [xml] Memory usage 32 bit vs. 64 bit Linux

2015-05-11 Thread Stefan Behnel
Robert Grasböck wrote on 05.05.2015 at 15:52: I have a question about memory usage of libxml2. I'm using libxml2 on two different systems, once a 32 bit linux other one a 64 bit linux. On both I run the same application which use libxml2 to parse xml files. The application opens many small

Re: [xml] [PATCH] Add methods for python3 iterator

2014-09-23 Thread Stefan Behnel
Ron Angeles wrote on 18.09.2014 at 09:14: xmlCoreDepthFirstItertor and xmlCoreBreadthFirstItertor only implement a python2-compatible iterator interface. The necessary method names (__next__) have been added. They just passthrough to the python2 method (next). --- python/libxml.py | 8

[xml] [BUG+FIX] valid.c erroneously ignores a validation error if no error callback set

2014-02-21 Thread Stefan Behnel
Hi, valid.c contains this code (lines 2636-2640):

    if ((ctxt != NULL) && (ctxt->error != NULL)) {
        xmlErrValidNode(ctxt, attr->parent, XML_DTD_ID_REDEFINED,
                        "ID %s already defined\n",
                        value, NULL, NULL);
    }

It prevents the error from
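The subject line says the error is ignored when no error callback is set. A minimal sketch of the distinction, using a hypothetical cut-down context struct (not libxml2's xmlValidCtxt and not the actual patch): the validity flag is cleared unconditionally, and only the *reporting* depends on whether a callback exists.

```c
#include <stddef.h>

/* Hypothetical, cut-down validation context; NOT libxml2's xmlValidCtxt. */
typedef void (*valid_error_fn)(void *user, const char *msg, const char *value);

struct valid_ctxt {
    void *user;
    valid_error_fn error; /* optional callback, may be NULL */
    int valid;            /* stays 1 until the first validity error */
};

/* Record the error unconditionally; only the reporting is optional. */
static void report_id_redefined(struct valid_ctxt *ctxt, const char *value)
{
    if (ctxt == NULL)
        return;
    ctxt->valid = 0;                 /* the document is invalid either way */
    if (ctxt->error != NULL)         /* printing needs a callback */
        ctxt->error(ctxt->user, "ID %s already defined\n", value);
}
```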

Re: [xml] [bug] external subset ignored by 2.9.0 when parsing in incremental mode

2012-10-20 Thread Stefan Behnel
Noam Postavsky, 11.10.2012 08:41: This patch fixes my test case as well. Rebumping this again then. Thanks for testing. Stefan

Re: [xml] [bug] external subset ignored by 2.9.0 when parsing in incremental mode

2012-10-09 Thread Stefan Behnel
[bump] See patch in original e-mail. Stefan Behnel, 28.09.2012 13:44: Hi, there is an unfortunate interaction between the progressive parsing mode and the loading of an external DTD, e.g. to inject defaulted attribute values. I see this in lxml's iterparse() implementation that started

[xml] [bug] external subset ignored by 2.9.0 when parsing in incremental mode

2012-09-28 Thread Stefan Behnel
Hi, there is an unfortunate interaction between the progressive parsing mode and the loading of an external DTD, e.g. to inject defaulted attribute values. I see this in lxml's iterparse() implementation that started failing to inject them in libxml2 2.9.0. It uses incremental push parsing. The

Re: [xml] Availability of libxml2-2.9.0 release candidate 1

2012-08-10 Thread Stefan Behnel
Daniel Veillard, 10.08.2012 07:21: BTW do you have a git commit for 2.9.0 preparation in lxml now ? I may forward this to the packager for Fedora. Hmm, I'm fixing it up only for lxml 3.0. Due to various changes in the code base, that won't apply directly to the latest 2.3.x, and I'm not sure

Re: [xml] Availability of libxml2-2.9.0 release candidate 1

2012-08-09 Thread Stefan Behnel
Daniel Veillard, 10.08.2012 04:42: Following the first rc0 snapshot from last week and after much cleanup and testing, the first release candidate for the next libxml2 release is available at the usual place: [...] As stated previously, I target a final release beginning of September,

Re: [xml] xmlXPathNodeSetSort performance

2012-08-08 Thread Stefan Behnel
Vojtech Fried, 08.08.2012 12:18: I had to do some changes to the original code to make it compile with msvc. Did you report them back to the original author? (note that github allows you to create pull requests through the web interface, you can edit single files right in place) Stefan

Re: [xml] Important: possible incompatible changes ahead for 2.9.0 !

2012-08-07 Thread Stefan Behnel
Daniel Veillard, 07.08.2012 10:16: On Mon, Aug 06, 2012 at 11:39:23PM +0200, Stefan Behnel wrote: thanks for the heads-up. I don't care all that much about the global dict size - 10M entries should be hard enough to reach for normal use cases. Most users only deal with a very small number

Re: [xml] Important: possible incompatible changes ahead for 2.9.0 !

2012-08-07 Thread Stefan Behnel
Daniel Veillard, 06.08.2012 09:00: I have put a snapshot tarball libxml2-2.9.0-rc0.tar.gz (and rpms) for people to have a try and raise issues with this change One minor issue: I think you forgot to regenerate the documentation for the above tar ball. I just noticed it because I routinely

Re: [xml] Important: possible incompatible changes ahead for 2.9.0 !

2012-08-06 Thread Stefan Behnel
Hi Daniel, thanks for the heads-up. I don't care all that much about the global dict size - 10M entries should be hard enough to reach for normal use cases. Most users only deal with a very small number of XML formats. But I did run into issues with the buffer changes. Daniel Veillard,

Re: [xml] xmlXPathNodeSetSort performance

2012-07-31 Thread Stefan Behnel
Stefan Behnel, 29.07.2012 06:55: Vojtech Fried, 26.07.2012 18:17: Third version of the timsort patch. Unfortunately, I was not able to finish it. It does not link on windows and I didn't test it in any way. But if anyone wants to try it, it is probably not far away... I moved the code to .c

Re: [xml] xmlXPathNodeSetSort performance

2012-07-28 Thread Stefan Behnel
Vojtech Fried, 26.07.2012 18:17: Third version of the timsort patch. Unfortunately, I was not able to finish it. It does not link on windows and I didn't test it in any way. But if anyone wants to try it, it is probably not far away... I moved the code to .c file and had to do some other

Re: [xml] xmlXPathNodeSetSort performance

2012-07-26 Thread Stefan Behnel
Vojtech Fried, 26.07.2012 15:45: Keeping it in header has the advantage that it remains generic and can be used from anywhere and with any type of parameters (e.g. not only for sorting xmlNodePtrs). If in .c file, there can only be one sort function. Although since the sort is used from only
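To illustrate the header-vs-.c trade-off: a header-only sort can be instantiated once per element type and comparison through a macro. This sketch uses a hypothetical DEFINE_INSERTION_SORT macro and plain insertion sort for brevity; it is not the timsort from the patch.

```c
#include <stddef.h>

/* Hypothetical macro: expands to a type-specific, stable insertion sort
 * named NAME. LESS(x, y) must be a strict "x sorts before y" test. */
#define DEFINE_INSERTION_SORT(NAME, TYPE, LESS)          \
    static void NAME(TYPE *a, size_t n)                  \
    {                                                    \
        for (size_t i = 1; i < n; i++) {                 \
            TYPE key = a[i];                             \
            size_t j = i;                                \
            while (j > 0 && LESS(key, a[j - 1])) {       \
                a[j] = a[j - 1];                         \
                j--;                                     \
            }                                            \
            a[j] = key;                                  \
        }                                                \
    }

/* One instantiation for plain ints ... */
#define INT_LESS(x, y) ((x) < (y))
DEFINE_INSERTION_SORT(sort_ints, int, INT_LESS)

/* ... and another for a struct keyed by a document-order field. */
struct item { long order; };
#define ITEM_LESS(x, y) ((x).order < (y).order)
DEFINE_INSERTION_SORT(sort_items, struct item, ITEM_LESS)
```

The cost of this style is the one mentioned in the thread: every instantiation is a separate copy of the code, whereas a .c file would hold exactly one sort function.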

Re: [xml] xmlXPathNodeSetSort performance

2012-07-26 Thread Stefan Behnel
Vojtech Fried, 26.07.2012 16:30: What I meant is that libxml currently sorts '/item[true()]', but it does not sort '/item'. I agree, it does not need to sort any of them. I agree with the flag sorted too, but it is another optimization, independent on what I am trying to do now. Ok. I'm just
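The "sorted" flag idea can be sketched as follows, assuming a hypothetical node set that stores integer document-order keys (libxml2 itself compares node positions in the tree, not integers): a cached flag, or failing that one linear scan, decides whether qsort() runs at all.

```c
#include <stdlib.h>

/* Hypothetical node set: each node is represented by a document-order key. */
struct node_set {
    long *order;
    size_t count;
    int sorted;   /* cached: 1 once the set is known to be in document order */
};

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Sort only when needed: skip entirely if the flag is set, otherwise a
 * single O(n) scan detects already-ordered input before calling qsort(). */
static void node_set_ensure_sorted(struct node_set *set)
{
    if (set->sorted)
        return;
    size_t i;
    for (i = 1; i < set->count; i++)
        if (set->order[i - 1] > set->order[i])
            break;
    if (i < set->count)
        qsort(set->order, set->count, sizeof(long), cmp_long);
    set->sorted = 1;
}
```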

Re: [xml] xmlXPathNodeSetSort performance

2012-07-25 Thread Stefan Behnel
Vojtech Fried, 25.07.2012 17:45: Second version of Timsort patch, slightly more polished. It builds on my gcc, I have fixed some warnings and merged the two headers into one. I did not move the code to .c file though, because the sort implementation uses some macro magic, i.e. the functions

[xml] open libxml2 crash bugs in lxml's bug tracker

2012-07-02 Thread Stefan Behnel
Hi, lxml's bug tracker currently holds two user code triggered crash bugs for libxml2: https://bugs.launchpad.net/lxml/+bug/1009118 - segfault with XPath expression with unknown namespace and nested function calls https://bugs.launchpad.net/lxml/+bug/502959 - segfault when parsing docbook XML

Re: [xml] Support for really large XML documents

2012-06-02 Thread Stefan Behnel
Hi, note that your top-posting makes it harder to follow the discussion and to reply to it. Vit Zikmund, 25.05.2012 13:10: Well, you are right with the buffer writing to memory and the author of the XMLSec library confirmed that he has to have the whole document there due to c14n. Also it

Re: [xml] Release candidate 1 of libxml2 2.8.0

2012-05-15 Thread Stefan Behnel
Daniel Veillard, 15.05.2012 14:48: I finally managed to go though all the patches which accumulated in Gnome bugzilla and do the various necessary cleanups to try to get a release. There is however *many* changes, especially on the portability side which I just can't test myself (including

[xml] XPath performance issues

2011-11-04 Thread Stefan Behnel
Hi, almost exactly two years ago, I brought up the topic of the surprisingly unpredictable XPath performance on this list (thread titled "confusing xpath performance characteristics", without response at the time). The problem is not the actual search, but the merging of node sets after the

Re: [xml] libxml2 messed up MonoTouch and Interface Builder

2010-11-09 Thread Stefan Behnel
James Wright, 08.11.2010 17:40: I tried to install libxml2 yesterday for a Ruby side project of mine. First I tried it with MacPorts but my MacPorts wouldn't work so I tried to download the source for libxml2 and make the install which ran with some errors but nothing that stopped the install.

Re: [xml] libxml2/libxslt: global variables considered harmful

2010-07-28 Thread Stefan Behnel
Daniel Veillard, 28.07.2010 11:53: On Wed, Jul 14, 2010 at 02:07:42PM +0200, Michael Stahl wrote: IMHO such a design would also be possible for libxml2/libxslt, but of course this would be an incompatible interface change. usually there isn't much enthusiasm for that kind of thing :)

Re: [xml] Walking tree without recursion

2010-06-25 Thread Stefan Behnel
Michael Ludwig, 23.06.2010 23:29: Oliver Kindernay wrote on 23.06.2010 at 18:39 (+0200): I am using libxml2 HTML 4.0 parser to parse HTML and XHTML web pages. I want to found specific tags (i.e <a>), so I have to walk through the tree of parsed document. And I don't want to use recursion like
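Non-recursive traversal only needs the parent, children and next pointers. A sketch with a hypothetical minimal struct that mirrors those xmlNode field names (it is not libxml2's struct; node types, text nodes and attributes are ignored):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal node with xmlNode-like navigation fields. */
struct node {
    const char *name;
    struct node *children;
    struct node *next;
    struct node *parent;
};

/* Document-order walk of the subtree under root without recursion or an
 * explicit stack: go to the first child if there is one, else to the next
 * sibling, else climb until some ancestor (below root) has a next sibling. */
static size_t count_elements(struct node *root, const char *name)
{
    size_t count = 0;
    struct node *cur = root;
    while (cur != NULL) {
        if (strcmp(cur->name, name) == 0)
            count++;
        if (cur->children != NULL) {
            cur = cur->children;
            continue;
        }
        while (cur != root && cur->next == NULL)
            cur = cur->parent;
        if (cur == root)
            break;
        cur = cur->next;
    }
    return count;
}
```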

Re: [xml] HTMLparser

2010-04-29 Thread Stefan Behnel
Sergio Monteiro Basto, 28.04.2010 20:08: who is the maintainer of HTMLparser , I had report a bug , and no one had reply . What I could do about that ? Should HTMLparser parse bad broken html or not ? IIRC, the last thing I read was that the HTML parser should basically follow HTML5 where

Re: [xml] XPath issue

2010-03-18 Thread Stefan Behnel
Joshua Kwan, 17.03.2010 18:21: I've got an interesting problem about libxml2's XPath support posted on stackoverflow: http://stackoverflow.com/questions/2459428/weird-xpath-behavior-in-libxml2 Please read about it there. There haven't been any answers, so I thought I would consult the real

Re: [xml] confusing xpath performance characteristics

2010-01-27 Thread Stefan Behnel
[bump] Any comments? Stefan Behnel, 09.11.2009 19:23: Stefan Behnel, 09.11.2009 09:57: It's the last operation, merging and sorting large sets of results, that makes this extremely slow - it takes 92% of the evaluation time in my tests (using libxml2 2.7.5). It's much faster to traverse

Re: [xml] XML validation using Schematron using LibXml2?

2009-12-16 Thread Stefan Behnel
Andrew Hartley, 16.12.2009 12:50: Is it possible yet with the latest libxml2 build to validate an XML document using a Schematron? If so can you update the LibXml2 web site to show code examples of how you go about doing this please? If this is not yet possible, do you you know when this is

Re: [xml] Line number value limit

2009-11-12 Thread Stefan Behnel
Hi, Csaba Raduly, 12.11.2009 10:29: Why is the line number in xmlNode limited to an unsigned short ? Because it's a trade-off between space and usefulness. Note that the parser reports line numbers without that limitation. Only the xmlNode struct restricts it. This is a FAQ, BTW. You can look
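As an illustration of the trade-off (a hypothetical helper, not libxml2's code): a line number kept in an unsigned short field can only saturate at USHRT_MAX, typically 65535, so every node past that line reports the same number while the parser's own counter keeps going.

```c
#include <limits.h>

/* Hypothetical: store a parser line number into a node's 16-bit field.
 * Values that do not fit saturate instead of wrapping around. */
static unsigned short store_line(unsigned long line)
{
    return line > USHRT_MAX ? (unsigned short)USHRT_MAX
                            : (unsigned short)line;
}
```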

Re: [xml] confusing xpath performance characteristics

2009-11-09 Thread Stefan Behnel
Stefan Behnel, 09.11.2009 09:57: It's the last operation, merging and sorting large sets of results, that makes this extremely slow - it takes 92% of the evaluation time in my tests (using libxml2 2.7.5). It's much faster to traverse the document in a single step, and just select single
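The cost described here comes from merging and re-sorting large result sets. When both inputs are already in document order and duplicate-free, one linear merge keeps the result sorted without a separate sort. Sketch under the assumption that each node is represented by an integer document-order key (hypothetical; libxml2 node sets hold node pointers):

```c
#include <stddef.h>

/* Merge two sorted, duplicate-free key arrays into out (capacity na + nb).
 * Nodes present in both inputs are written once. Returns the output size. */
static size_t merge_node_sets(const long *a, size_t na,
                              const long *b, size_t nb, long *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            out[k++] = a[i++];
        } else if (b[j] < a[i]) {
            out[k++] = b[j++];
        } else {            /* same node in both sets: keep one copy */
            out[k++] = a[i++];
            j++;
        }
    }
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}
```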

Re: [xml] html parsing incomplete - bug?

2009-10-13 Thread Stefan Behnel
Lydia Patrovic wrote: Note the main&amp;20090924_2 attribute value, which can be interpreted as an unterminated entity. :) Nice little Freudian copypaste quoting error. Here's the line from the real 'HTML' file: <script type="text/javascript" src="merge.php?f=main&20090924_2"></script> Note the

Re: [xml] html parsing incomplete - bug?

2009-10-13 Thread Stefan Behnel
Martin (gzlist) wrote: On 13/10/2009, Stefan Behnel stefan...@behnel.de wrote: Lydia Patrovic wrote: Note the main&amp;20090924_2 attribute value, which can be interpreted as an unterminated entity. :) Nice little Freudian copypaste quoting error. Here's the line from the real 'HTML' file

Re: [xml] html parsing incomplete - bug?

2009-10-13 Thread Stefan Behnel
Daniel Veillard wrote: On Tue, Oct 13, 2009 at 01:22:12PM +0100, Martin (gzlist) wrote: On 13/10/2009, Stefan Behnel wrote: I wonder why the parser stops parsing here, though. Is '\0' explicitly considered an invalid character in (broken) HTML, or is it really just the usual C EOS slip

[xml] parsing UCS4 in chunks fails with 2.7.4/5

2009-09-28 Thread Stefan Behnel
Hi, there seems to be a change in libxml2 2.7.4 that prevents it from parsing a Python unicode string buffer, which is UCS4-LE encoded on my system. The first call to xmlCtxtResetPush() works and parses the first chunk as expected, but subsequent calls to xmlParseChunk() then fail with an error:
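The excerpt doesn't say what caused the regression. As a general sketch only: a decoder for a fixed-width 4-byte encoding such as UCS-4LE that is fed in chunks has to carry leftover bytes whenever a chunk boundary splits a code unit. The decoder struct and function below are hypothetical, not libxml2 internals.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical incremental UCS-4LE decoder state. */
struct ucs4le_decoder {
    unsigned char pending[4];  /* bytes of an incomplete code unit */
    size_t npending;
};

/* Decode all complete code points from buf into out, whose capacity must be
 * at least (npending + len) / 4. Incomplete trailing bytes are kept for the
 * next call. Returns the number of code points written. */
static size_t ucs4le_feed(struct ucs4le_decoder *d, const unsigned char *buf,
                          size_t len, uint32_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        d->pending[d->npending++] = buf[i];
        if (d->npending == 4) {
            out[n++] = (uint32_t)d->pending[0]
                     | ((uint32_t)d->pending[1] << 8)
                     | ((uint32_t)d->pending[2] << 16)
                     | ((uint32_t)d->pending[3] << 24);
            d->npending = 0;
        }
    }
    return n;
}
```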

Re: [xml] Release of libxml2-2.7.4

2009-09-11 Thread Stefan Behnel
Daniel Veillard wrote: Better late than never, but an awful lot of pending bug got fixed. Still no major improvement except adding symbol versioning to libxml2 shared libs, which is fairy important for long term maintainance, but not worth jumping to 2.8.0 Tarball and signed rpms

Re: [xml] c14n 1.1 support (patch)

2009-08-20 Thread Stefan Behnel
Hi, Aleksey Sanin wrote: Please find attached a patch that adds support for the new version of c14n (http://www.w3.org/TR/xml-c14n11/). I am getting questions about it in the xmlsec mailing list and I finally decided to implement it. I would greatly appreciate if you can accept this patch

Re: [xml] Inserting XML Schema default attributes

2009-05-31 Thread Stefan Behnel
Hi, as a quick follow-up: injecting default attributes works when applying the schema *after* the parsing step, it does *not* work when validating inside the parser using the SAX plug. Stefan Stefan Behnel wrote: I'm trying to inject default attributes into a document from an XML Schema

Re: [xml] Possible bug, libxml segfault

2009-05-30 Thread Stefan Behnel
Hi, this looks more like a problem in lxml, so I'll answer on the lxml mailing list. Stefan Avleen Vig wrote: Background: We use libxml and libxslt in one of our applications (specifically, in Python via lxml). Recently we've seen our application dying at strange times for no apparent

[xml] parse-time validation against a user provided DTD

2009-05-09 Thread Stefan Behnel
Hi, looking through the API docs, I can't really figure out a way to stick an external DTD into the parser, so that it validates against that rather than trying to load a DTD for the DOCTYPE (or also to do DTD validation if the document does not define a DOCTYPE at all). I can see that xmllint

[xml] Inserting XML Schema default attributes

2009-05-08 Thread Stefan Behnel
Hi, I'm trying to inject default attributes into a document from an XML Schema during parsing. I set up a validation context and set the XML_SCHEMA_VAL_VC_I_CREATE option on it, which, if I understand the docs correctly, tells the validator to create defaulted/fixed attributes if they do not

Re: [xml] xmllint working with 213MB large xml files

2009-02-27 Thread Stefan Behnel
Hi, Janis Rocans wrote: Last day I used xmllint (for windows xmllint.exe: using libxml version 20703) to validate xml file against XSD, but it found a lots of errors on line 65535 (binary ). I noticed, that there's no errors, but the line counter stucked. I believe theres a

Re: [xml] lxml binary for Python 2.6+

2009-01-13 Thread Stefan Behnel
Casey Schroeder wrote: I am searching for an easy way to get lxml for v. 2.6 Python on windows. Can someone tell me if there is a comparable exe to those listed here for 2.6? http://users.skynet.be/sbi/libxml-python/ In case you really meant lxml (and not libxml2, for which this is the

Re: [xml] libxml2 and Python 2.6 on WindowsXP

2009-01-13 Thread Stefan Behnel
Bernd Blacha wrote: I want to use libxml / libxslt in Windows XP. If all you want is to use the libraries and not the Python modules that have the same API, you might be better off using lxml as it provides a much more pythonic interface. The latest stable MS-Windows builds are here:

Re: [xml] libxml2 very slow on big data dump

2008-12-16 Thread Stefan Behnel
Hi, Alexandre Macard wrote: I try dump a node from a big xml (near 7mo), and the libxml2 is very slow to respond. I tried to trace the problem and it seems to take all it's time into the function: xmlOutputBufferWriteEscape. I do not need to escape data because I use a base64 encoding. You
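A sketch of the kind of fast path this suggests, using a hypothetical helper: check once whether the text contains any character that escaping would change. Base64 text without CR line breaks never does, so it could be copied verbatim. The '\r' case is included because serializers commonly emit it as a character reference in text content.

```c
#include <stddef.h>

/* Hypothetical: return 1 if text contains a character that XML escaping of
 * character data would rewrite, 0 if it can be copied unchanged. */
static int needs_escaping(const char *text, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        switch (text[i]) {
        case '<':
        case '>':
        case '&':
        case '\r':
            return 1;
        default:
            break;
        }
    }
    return 0;
}
```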

Re: [xml] libxml2 very slow on big data dump

2008-12-16 Thread Stefan Behnel
Alexandre Macard wrote: Stefan Behnel wrote: Alexandre Macard wrote: I try dump a node from a big xml (near 7mo), and the libxml2 is very slow to respond. I tried to trace the problem and it seems to take all it's time into the function: xmlOutputBufferWriteEscape. I do not need

[xml] Fwd: [lxml-dev] lxml RelaxNG validation on hand-built documents

2008-11-07 Thread Stefan Behnel
Hi, any idea what might trigger this? The main API calls we use are:

    ctx = xmlRelaxNGNewParserCtxt(filename)
    schema = xmlRelaxNGParse(ctx)
    xmlRelaxNGFreeParserCtxt(ctx)
    ...
    // create doc
    ...
    vc = xmlRelaxNGNewValidCtxt(schema)

[xml] News from the RNC front?

2008-09-11 Thread Stefan Behnel
Hi, is there actually any news from the RelaxNG compact syntax parser front? Last I heard, it was considered for inclusion way back when libxml2 2.6 was still young. I tried to find the original patch and found several mails that mentioned it, but none that contains it. The bug tracker doesn't

Re: [xml] libxml2 2.7.1 breaks XML serialisation of HTML trees

2008-09-10 Thread Stefan Behnel
Hi, Martin (gzlist) wrote: On 08/09/2008, Stefan Behnel [EMAIL PROTECTED] wrote: there was a change in 2.7.1 (xmlsave.c, ~760) that prevents HTML documents from being serialised in XML style... ... If the current behaviour is wanted, what's the future way of achieving this *without

Re: [xml] libxml2 2.7.1 breaks XML serialisation of HTML trees

2008-09-10 Thread Stefan Behnel
Hi, Daniel Veillard wrote: On Mon, Sep 08, 2008 at 03:01:29PM +0200, Stefan Behnel wrote: I now wonder why there are two serialisation methods (xmlNodeDump* and htmlNodeDump*) that ultimately do the same thing, instead of serialising to what they are named after. Well the goal is more

[xml] libxml2 2.7.1 breaks XML serialisation of HTML trees

2008-09-08 Thread Stefan Behnel
Hi, there was a change in 2.7.1 (xmlsave.c, ~760) that prevents HTML documents from being serialised in XML style. That was actually a very convenient feature in lxml, where you could select between XML and HTML serialisation of an HTML tree based on a keyword argument. I now wonder why there

Re: [xml] Processing information in a buffer to XML-document conversion

2008-09-04 Thread Stefan Behnel
Hi, Goran Hasse wrote: When a xmlParseMemory( ... ) i called a ! xml processing tag is inserted in the document. Definitely not. I think you are mixing this up with the serialisation function you are using. Stefan

Re: [xml] Character reference encoding is slow

2008-08-31 Thread Stefan Behnel
Hi Daniel! Daniel Veillard wrote: [loads of interesting results from the analysis of a pathological case] Anyway to make a long story short, I spent a few hours today fixing the problem by adding support for a new kind of buffers avoiding most of the memmoves needed when handling an
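One generic way to avoid memmove-heavy buffering is an append-only buffer with geometric growth: each append is amortised O(1) and existing content is never shifted. A sketch with a hypothetical struct outbuf; it is not the buffer type that was added to libxml2.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer, always NUL-terminated. */
struct outbuf {
    char *data;
    size_t len, cap;
};

/* Append n bytes, doubling capacity when needed. Returns 0 or -1 on OOM. */
static int outbuf_append(struct outbuf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n + 1)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

Writing a million "&#135;" references this way costs a logarithmic number of reallocations rather than one data shift per write.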

[xml] Character reference encoding is slow

2008-08-29 Thread Stefan Behnel
Hi, we got a report on the lxml list where someone tried to parse and serialise a file that contains 8,000,000 non-ASCII character references (&#135;), as in "<text>" + "&#135;" * 800 + "</text>" Parsing this is pretty fast, so that's not the problem, but serialising this document back to a

Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2

2008-08-18 Thread Stefan Behnel
Hi, Karl Dubost wrote: Nick Kew weighed in and proposed that we should target [6]libxml which includes an HTML parser and is already supported by Apache server and many other tools. [6] http://xmlsoft.org/html/libxml-HTMLparser.html From here it would be interesting to

Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2

2008-08-08 Thread Stefan Behnel
Karl Dubost karl at w3.org writes: I have written a short document to explain the project [Cleaning the Web][1]. It describes what is html5 and what would be the benefits of implementing the html 5 parsing algorithm in libxml2 html parser. There's already an HTML5 implementation in Python

Re: [xml] Better hash function for dict.c

2008-08-06 Thread Stefan Behnel
Hi Daniel, Daniel Veillard wrote: - the second one is unfortunately not fixeable as is it comes from the key hash definitions themselves:

    -#define xmlDictComputeKey(dict, name, len) \
    -(((dict)->size == MIN_DICT_SIZE) ? \
    -

Re: [xml] Better hash function for dict.c

2008-08-06 Thread Stefan Behnel
Daniel Veillard wrote: Another option I looked at is the 'One-at-a-Time Hash' from http://burtleburtle.net/bob/hash/doobs.html , looking at the criterias and the results it looks like a good hash too, not too expensive and should work well. The page says it's pretty good when inlined, which
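For reference, the one-at-a-time hash as published on the burtleburtle.net page cited here, written in 32-bit C. How dict.c would turn it into a bucket index is not shown in the excerpt; a power-of-two table could use hash & (size - 1).

```c
#include <stddef.h>
#include <stdint.h>

/* Bob Jenkins' one-at-a-time hash over len bytes (32-bit arithmetic). */
static uint32_t one_at_a_time(const unsigned char *key, size_t len)
{
    uint32_t hash = 0;
    for (size_t i = 0; i < len; i++) {
        hash += key[i];
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    /* final avalanche */
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}
```

For example, hashing the single byte "a" yields 0xca2e9442.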

Re: [xml] Better hash function for dict.c

2008-08-06 Thread Stefan Behnel
Stefan Behnel wrote: Daniel Veillard wrote: if you have a bit of time then, maybe you can rerun your initial tests with that one, is that possible ? I can try, sure. Just send me a patch that removes the current hash function from SVN and adds the new one, and I will find a way to compare

Re: [xml] enabling zlib support in Stephane Bidoul's Python binding? (win32)

2008-07-28 Thread Stefan Behnel
Meunier, Jean-Luc wrote: On win32, I'm interested in having the zlib support in libxml2 from Python. If zlib support refers to parsing from zlib compressed XML files, lxml will let you do that. http://codespeak.net/lxml/ If you really want it enabled in a binary build of the original libxml2

Re: [xml] Problems with schema validation

2008-07-18 Thread Stefan Behnel
Hi, Robert Schweikert wrote: Hi I am trying to validate xsd files and am running into a problem. I have a negative test, i.e. a file I know is invalid, yet it passes validation. I used the example code from http://wiki.njh.eu/XML-Schema_validation_with_libxml2 and wrote a short main

Re: [xml] Parsing from a compressed string

2008-07-15 Thread Stefan Behnel
Daniel Veillard wrote: On Sun, Jul 13, 2008 at 07:44:12AM +0200, Stefan Behnel wrote: Hi, it seems that libxml2 can parse zlib compressed data from files. What would be the right way to parse compressed data from a string in memory? And, yes, I want to avoid unpacking it before I parse

[xml] Parsing from a compressed string

2008-07-12 Thread Stefan Behnel
Hi, it seems that libxml2 can parse zlib compressed data from files. What would be the right way to parse compressed data from a string in memory? And, yes, I want to avoid unpacking it before I parse it. Same question for serialisation? Is there anything like a compressing OutputBuffer?

[xml] docs and dicts in xmlSetTreeDoc()

2008-05-02 Thread Stefan Behnel
Hi, I've just fixed a long-standing problem in lxml, now I'm wondering if it isn't actually a problem in libxml2. The function xmlSetTreeDoc() in tree.c is called to update the xmlDoc* pointers of each node in a subtree when it gets appended to a new parent in a different document. The question

Re: [xml] Better hash function for dict.c

2008-04-20 Thread Stefan Behnel
Hi again (and sorry for all the noise), Stefan Behnel wrote: If an application benefits from a different hash function depends on the vocabulary it uses in its XML files. A slow but well distributing hash function performs much better for large vocabularies (or many different vocabularies
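The vocabulary effect can be shown with a toy comparison. weak_hash (first byte plus length) is hypothetical and not libxml2's old hash, and FNV-1a merely stands in for a well-distributing function. With 256 buckets, the names item1..item100 land in only three buckets under the weak hash, while FNV-1a spreads them widely.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy cheap hash: first byte plus length (hypothetical). */
static uint32_t weak_hash(const char *s)
{
    return (uint32_t)(unsigned char)s[0] + (uint32_t)strlen(s);
}

/* 32-bit FNV-1a, used only as an example of a well-distributing hash. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Count distinct buckets (out of 256) hit by the names "item1".."itemN". */
static int buckets_used(uint32_t (*hash)(const char *), int n)
{
    unsigned char used[256] = { 0 };
    char name[32];
    int distinct = 0;
    for (int i = 1; i <= n; i++) {
        snprintf(name, sizeof name, "item%d", i);
        unsigned b = hash(name) % 256u;
        if (!used[b]) {
            used[b] = 1;
            distinct++;
        }
    }
    return distinct;
}
```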

Re: [xml] Better hash function for dict.c

2008-04-19 Thread Stefan Behnel
Hi, Daniel Veillard wrote: On Thu, Apr 17, 2008 at 10:05:03AM -0400, Daniel Veillard wrote: Since you seems to be interested in the performances of the hash algorithm, I tried to drop the string comparisons on lookup when possible I have an old patch for this which I'm enclosing, but I
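One way to avoid most string comparisons on lookup is to store each entry's full hash and compare it (and the length) before calling memcmp(). A sketch of a hypothetical interning table in that spirit, not dict.c's code; FNV-1a and the fixed bucket count are assumptions. Because equal strings come back as the same pointer, callers can afterwards compare names by pointer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64   /* hypothetical fixed size; a real dict would grow */

struct entry {
    struct entry *next;
    uint32_t hash;    /* full hash, checked before any byte comparison */
    size_t len;
    char *str;
};

struct dict {
    struct entry *buckets[NBUCKETS];
};

static uint32_t hash_bytes(const char *s, size_t len)  /* FNV-1a */
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the canonical copy of s[0..len), creating it if needed. */
static const char *dict_intern(struct dict *d, const char *s, size_t len)
{
    uint32_t h = hash_bytes(s, len);
    struct entry **slot = &d->buckets[h % NBUCKETS];
    for (struct entry *e = *slot; e != NULL; e = e->next)
        if (e->hash == h && e->len == len && memcmp(e->str, s, len) == 0)
            return e->str;
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->str = malloc(len + 1);
    if (e->str == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->str, s, len);
    e->str[len] = '\0';
    e->hash = h;
    e->len = len;
    e->next = *slot;
    *slot = e;
    return e->str;
}
```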

Re: [xml] Better hash function for dict.c

2008-04-19 Thread Stefan Behnel
Hi, Daniel Veillard wrote: On Wed, Apr 16, 2008 at 10:53:04PM +0200, Stefan Behnel wrote: I would prefer to see benchmarks done with xmllint directly, to avoid side effect of more string interning than libxml2. Ok, I did some testing with xmllint. I noticed that things can easily get slower

Re: [xml] Better hash function for dict.c

2008-04-18 Thread Stefan Behnel
Hi Daniel, Daniel Veillard wrote: On Thu, Apr 17, 2008 at 10:05:03AM -0400, Daniel Veillard wrote: Since you seems to be interested in the performances of the hash algorithm, I tried to drop the string comparisons on lookup when possible I have an old patch for this which I'm enclosing,

[xml] Better hash function for dict.c

2008-04-17 Thread Stefan Behnel
Hi, long mail, bottom line being: 30% to multiple times faster parsing with a different hash function in dict.c. Keep reading. I did some profiling on lxml using callgrind. lxml configures the parser to use a global per-thread dictionary, so everything it parses ends up in one dictionary instead

Re: [xml] Better hash function for dict.c

2008-04-17 Thread Stefan Behnel
Hi, Stefan Behnel wrote: http://www.azillionmonkeys.com/qed/hash.html is quite short and readable and seems to do what I was looking for. Some more real-world numbers. I used lxml to parse - using xmlCtxtParseFile() - all .xml and .xsd files that locate found on my hard disc, some 58000 files

Re: [xml] parsing soap..

2008-03-09 Thread Stefan Behnel
Hi, Subramanian S wrote: I am not able to parse the soap messages. I am using libxml2. The simple method of node traversing is not working. How can it be done??? This might help: http://catb.org/~esr/faqs/smart-questions.html Stefan

Re: [xml] indentation after adding new nodes

2008-03-07 Thread Stefan Behnel
Hi, Senthil Nathan *top-posted*: On Thu, Mar 6, 2008 at 11:33 PM, Stefan Behnel [EMAIL PROTECTED] wrote: Senthil Nathan wrote: I tried using the xmlCopyDocNode( ) and xmlCopyNode( ). It copies the node but the indentation is not proper. There is no indentation in an XML tree, but there may

Re: [xml] indentation after adding new nodes

2008-03-07 Thread Stefan Behnel
Senthil Nathan *top-posted* again: I used xmlSaveFile(file.xml, docTree); This just dumps the xmlDocPtr docTree to file.xml without any indentation. Is there any options in libxml2 to set it properly. Care to read the manual? Go to http://xmlsoft.org/html/libxml-tree.html and look for

Re: [xml] indentation after adding new nodes

2008-03-06 Thread Stefan Behnel
Hi, Senthil Nathan wrote: I tried using the xmlCopyDocNode( ) and xmlCopyNode( ). It copies the node but the indentation is not proper. There is no indentation in an XML tree, but there may be text nodes that contain whitespace. Maybe you didn't copy those? How can we set the indentation in

[xml] XML Schema crash in W3C test suite

2008-02-28 Thread Stefan Behnel
Hi, I just ran xmllint from a vanilla libxml2 2.6.31 over the SUN part of the W3C XML Schema test suite. I get a couple of failures, but also a crash in one case, so I thought I'd send in the results. BTW, does anyone have a script to run the whole suite? For example, I have no idea how to figure

Re: [xml] Useless function calls in xmlSetProp()?

2008-02-22 Thread Stefan Behnel
Hi, Julien Charbon wrote: - With old xmlSetProp(): $ ./test-setprop-big Size: 8 Time: 000:14397 Size: 16 Time: 000:03429 Size: 32 Time: 000:03164 [...] - With new [now current] xmlSetProp(): $ ./test-setprop-big Size: 8 Time:

Re: [xml] Return value of xmlCharEncodingInputFunc

2008-02-01 Thread Stefan Behnel
Hi, Ralf Junker wrote: I am in doubt about the -1 return value of these function prototypes: * xmlCharEncodingInputFunc * xmlCharEncodingOutputFunc The documentation says that -1 means lack of space. However, in various implementations of these function prototype I see this: if

Re: [xml] ATTRIBUTE NAME validation problem

2008-01-28 Thread Stefan Behnel
Hi, murali wrote: !ATTLIST doc : CDATA #IMPLIED is a valid declaration of attribute : for element doc. But , currently LIBXML2 generates a error when it encounters this. well, that's just because ':' isn't really a well-formed attribute name. So you're actually lucky libxml2 tells you

Re: [xml] handling xpath error - libxml

2008-01-28 Thread Stefan Behnel
Hi, Senthil Nathan wrote: I would like to how to handle the xpath error gracefully when I use the libxml api, xmlXPathEvalExpression(path, xpathCtx). If I pass a invalid path string to evaluate on the xpathCtx, it throws the error as below and stops there. But I would like to handle that

Re: [xml] handling xpath error - libxml

2008-01-28 Thread Stefan Behnel
Hi, Senthil Nathan wrote: On 1/28/08, Stefan Behnel [EMAIL PROTECTED] wrote: Hi, Senthil Nathan wrote: I would like to how to handle the xpath error gracefully when I use the libxml api, xmlXPathEvalExpression(path, xpathCtx). If I pass a invalid path string to evaluate on the xpathCtx

Re: [xml] ATTRIBUTE NAME validation problem

2008-01-28 Thread Stefan Behnel
Hi, Mike Hommey wrote: But that's still true that the XML declaration spec tells : is a valid attribute name in an ATTLIST. ::: is, too. http://www.w3.org/TR/xml/#NT-Name That's puzzling. Hmm, interesting. Even the errata list this as a valid production for a Name.

Re: [xml] XHTML 1.0 to HTML 4.01

2008-01-24 Thread Stefan Behnel
Florent Guiliani wrote: I wonder if there a way, within libxml2, to convert an XML document (xmlDocPtr) that contains valid XHTML 1.0 into HTML 4.01 ? A way with only libxml2 function calls would be perfect. If you parse it in, you can use the HTML parser, which should also handle XHTML
