Re: [xml] libxml2 equivalents for expat's XML_GetCurrentByteIndex and XML_GetCurrentByteCount

2012-10-18 Thread Daniel Veillard
On Thu, Oct 18, 2012 at 06:19:31PM +0200, Graham Leggett wrote: > On 18 Oct 2012, at 6:07 PM, Daniel Veillard wrote: > > > See xmlByteConsumed() but it's more complex for us than for expat > > as we convert the initial byte stream to UTF-8 if it was in a different > > encoding. See the xmlByteCo

Re: [xml] xpath evaluation timeout

2012-10-18 Thread Zhigang Chen
Thanks, Liam. We are building a platform to which code containing xpaths is submitted by external users. Manual optimization of the xpaths is infeasible. Do you know of any tools that can automate it? - Z On Oct 18, 2012, at 7:15 PM, Liam R E Quin wrote: > On Thu, 2012-10-18 at 18:00 -0700,

Re: [xml] xpath evaluation timeout

2012-10-18 Thread Liam R E Quin
On Thu, 2012-10-18 at 18:00 -0700, Zhigang Chen wrote: > Hi > > We sometimes run into the situation where a pretty expensive xpath > (e.g. .//table//td[@class]) is run on a big document (~ 9M) and it > takes very very long. In fact we never see it finish. [resending from the right account, sorry]

[xml] xpath evaluation timeout

2012-10-18 Thread Zhigang Chen
Hi We sometimes run into the situation where a pretty expensive xpath (e.g. .//table//td[@class]) is run on a big document (~ 9M) and it takes very very long. In fact we never see it finish. I did some digging and found that it is spending a very long time in xmlXPathNodeSetMergeAndClear. So I w

Re: [xml] libxml2 equivalents for expat's XML_GetCurrentByteIndex and XML_GetCurrentByteCount

2012-10-18 Thread Graham Leggett
On 18 Oct 2012, at 6:07 PM, Daniel Veillard wrote: > See xmlByteConsumed() but it's more complex for us than for expat > as we convert the initial byte stream to UTF-8 if it was in a different > encoding. See the xmlByteConsumed() code. The docs say "This function provides the current index of

Re: [xml] libxml2 equivalents for expat's XML_GetCurrentByteIndex and XML_GetCurrentByteCount

2012-10-18 Thread Daniel Veillard
On Thu, Oct 18, 2012 at 04:35:25PM +0200, Graham Leggett wrote: > Hi all, > > I am currently tasked with replacing the expat parser within an application > with the more lenient html parser found in libxml2. > > I am using the parser to work out the location within the document of certain > ele

[xml] libxml2 equivalents for expat's XML_GetCurrentByteIndex and XML_GetCurrentByteCount

2012-10-18 Thread Graham Leggett
Hi all, I am currently tasked with replacing the expat parser within an application with the more lenient html parser found in libxml2. I am using the parser to work out the location within the document of certain elements (tags), and once I have found the element I am looking for, I need to k