On Sat, 2022-07-30 at 17:15 +, Paul Kinnucan via xml wrote:
> Hi,
>
> I need to parse XML files whose paths may contain Unicode characters,
> for example,
>
> W:\jtbug\uc£\mydoc£.xml
>
> What is the best way to do this with libxml2?
Sounds like you are using Microsoft Windows and are going
On Fri, 2021-09-17 at 12:15 +0200, Julius Hamilton via xml wrote:
> Hey,
>
> I would like to write a script which extracts article text content
> from
> webpage HTML.
You might want to look at xidel for that.
>
> I believe I should first inspect the HTML tree, i.e. the raw HTML
> returned
> by
On Wed, 2021-01-27 at 13:37 -0600, Danny Holstein via xml wrote:
>
> With modes "w+" and "w+b", I get XML_IO_FLUSH. Adding "c" to the
> modes
> doesn't change anything, "x" causes the fopen() to fail with a "*file
> exists*" error.
Sounds like you are trying to overwrite a file that already
On Tue, 2021-01-05 at 19:12 +0100, Stefan de Konink wrote:
>
>
> Yesterday I wrote a custom validator in lxml for key/keyref and
> unique
> constraints.
Could you do this instead using schematron?
It may be somewhat slower but easier to maintain.
Liam
--
Liam Quin - web slave for
On Tue, 2019-11-26 at 22:48 -0700, Eric Eberhard wrote:
> Thank you much. I am on AIX (IBM).
OK. Check the man page for malloc
man 3 malloc
and see if there are environment variables to check each allocation.
Years ago (1980s) i ended up writing wrappers around malloc and free
that took an
On Tue, 2019-11-26 at 11:47 -0700, Eric Eberhard wrote:
> Is there a C call to see how much memory one is consuming? I could
> likely put that in to try and find it.
Depends on your operating system - there are also environment variables
you can set that affect the behaviour of malloc() in
On Tue, 2019-09-17 at 09:26 -0400, Webb Scales wrote:
> Is it possible that the error message is wrong?
Like most parsers, if given faulty input the output is sometimes
unexpected.
I'm not a maintainer here, but i'd guess that a patch to detect the
case of the document having no XML elements in
On Tue, 2019-09-17 at 00:47 -0400, Webb Scales wrote:
> Would a file containing just an XML comment, e.g.,
>
>
>
>
> be an acceptable input to LibXML2?
Let's look at the specification of XML,
https://www.w3.org/TR/REC-xml/
We see in section 2.1,
[[
[Definition: A textual object is a
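The rule being quoted (a well-formed document needs a root element) can be checked with a minimal sketch. It uses Python's standard-library `xml.etree.ElementTree`, which wraps expat, rather than libxml2; that substitution, the helper name and the sample strings are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical inputs: a file holding only a comment, and one with a root.
comment_only = "<!-- nothing but a comment -->"
with_root = "<!-- a comment --><doc/>"

print(is_well_formed(comment_only))
print(is_well_formed(with_root))
```

A conforming parser is expected to reject the first input, because the comment is not followed by any element.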
On Tue, 2019-09-10 at 00:29 -0400, Webb Scales wrote:
>
> If the TextReader didn't insist upon reading beyond the root end-tag,
All XML parsers do that, as the spec requires them to check if anything
follows it and raise an error if so.
Liam
--
Liam Quin - web slave for
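To see why a parser has to look past the root end-tag, here is a small sketch. It again uses Python's `xml.etree.ElementTree` (expat underneath) as a stand-in for the libxml2 TextReader; the helper name and the inputs are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_error(text):
    """Return the parser's error message, or None if `text` is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Comments and whitespace may follow the root element...
print(parse_error("<root/>\n<!-- trailing comment -->\n"))
# ...but a second element or stray text may not.
print(parse_error("<root/><second/>"))
print(parse_error("<root/>stray text"))
```

The parser can only tell these cases apart by reading whatever comes after the root end-tag.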
On Mon, 2019-09-09 at 22:41 -0400, Webb Scales wrote:
> the
> fact remains that I don't control the text that I'm trying to parse,
> and I still need to parse it, even though it's not "well-formed".
You may need to write some form of pre-processor that fixes the
problems. As you say, that may
On Fri, 2019-07-05 at 12:18 -0700, Eric Eberhard wrote:
> Dear Ashjan,
>
> If it was me I'd do it the cheap way and not use the parser.
Make sure to handle markup in comments and CDATA sections properly, and
to process external files included with XInclude or by entities defined
in the DTD.
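The comment and CDATA pitfall is easy to reproduce. The sketch below compares a naive regular expression with a real parser on a made-up document; it uses Python's standard library rather than libxml2, and the element names are hypothetical. XInclude and DTD-defined entities are not covered here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: one real <title>, one commented out, one inside CDATA.
doc = (
    "<records>"
    "<!-- <title>commented out</title> -->"
    "<title>real</title>"
    "<note><![CDATA[<title>just text</title>]]></note>"
    "</records>"
)

# The "cheap way": pattern matching on the raw text.
naive = re.findall(r"<title>(.*?)</title>", doc)

# The parser only reports title elements that are actual markup.
parsed = [el.text for el in ET.fromstring(doc).iter("title")]

print(naive)
print(parsed)
```

The regular expression reports three titles, while the parser reports only the one that is really an element.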
On Thu, 2019-07-04 at 10:33 +0100, Ashjan Alsulaimani wrote:
>
>
> What's the best way to approach such a task and the most efficient
> way as I'm dealing with Medline database!
If your input files are a few hundred megabytes or less, start with the
XSLT identity transform and add empty
On Fri, 2018-08-17 at 14:42 +0200, André Rothe wrote:
>
> https://3v4l.org/O0iEf
Try changing
...writeln('</td>');
to
...writeln('<' + '/td>');
and see if that helps; or use a CDATA section,
to escape the markup from the HTML parser.
Although it may depend on what the
On Fri, 2018-08-10 at 02:46 +0100, James Read via xml wrote:
> I have a bunch of html files on disk and want to open them and
> extract the contents of the title tag using libxml2.
By this do you mean the title element in the head?
You can use XPath on an XML document to extract
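As a rough illustration of extracting a title with a path expression, here is a sketch using Python's `xml.etree.ElementTree`, which supports only a limited XPath subset. It assumes the input is well-formed XML; real-world HTML usually is not, and would need an HTML parser such as the one in libxml2. The sample page is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page; tag-soup HTML would not parse this way.
page = (
    "<html><head><title>Example page</title></head>"
    "<body><p>Hello</p></body></html>"
)

root = ET.fromstring(page)

# Path is relative to the root <html> element: its <head>, then <title>.
title = root.findtext("./head/title")
print(title)
```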
-1 code point for an E with grave accent, so
this is correct.
How are you printing it or inspecting the result to see \310 instead of
È?
Liam
--
Liam R. E. Quin <l...@holoweb.net>
___
xml mailing list, project page http://xmlsoft.org/
xml@g
On Tue, 27 Oct 2015 15:09:57 -0600
Alex Henrie <alexhenri...@gmail.com> wrote:
> 2015-10-26 19:31 GMT-06:00 Liam R E Quin <l...@holoweb.net>:
> > URIs with unescaped ampersands are a syntax error, however...
>
> Invalid XML is rejected with or without this patch.
On Thu, 2015-04-23 at 19:04 -0500, Andrew Pennebaker wrote:
When I try to hand xmlstarlet the same XPATH query syntax I use in
Chrome
and Firefox, xmlstarlet never seems to find any results. Is there a
document detailing the idiosyncrasies of xmlstarlet XPATH vs WebKit
XPATH,
so that I
On Mon, 2013-04-29 at 21:26 +0400, Nikita Churaev wrote:
OK, so this is really guaranteed by the standard, good. But how libxml2
uses that is still insane.
The libxml API has been... rather widely used... for more than a
decade...
Your mail described some of your expectations when coming to
On Mon, 2013-04-29 at 08:32 +0400, Nikita Churaev wrote:
It doesn't have anything to do with C standard.
It does. Take for example:
struct A {
int q;
int w;
/* end of common part */
float x;
};
struct B {
int q;
int w;
/* end of common part */
double x;
};
On Tue, 2013-04-02 at 09:06 +0200, Amandine Piguel wrote:
Hello,
I would like to know if libxml2 is able to parse HTML5 files, and if
not, if it will be supported in the future.
Note, libxml's HTML parser is really good at making sense of HTML input,
but it is not a formal HTML parser - the
On Sat, 2013-03-30 at 08:02 +0100, Martin B. wrote:
[...]
It turns out however, that the subtree where the large data resides has
to be read not in-order, but I have to collect some (small amount of)
data before the other.
Do you process the file only once, or many times, between times when
On Mon, 2013-02-04 at 17:14 +0100, Petr Sumbera wrote:
[...]
As far as I can tell Solaris guarantees only backward compatibility. And
this is not the case. Actually it's very likely that you won't be able
to execute binary (if it's not just hello world) on older system because
new binary
On Wed, 2012-11-14 at 15:11 -0600, Ramon F Herrera wrote:
This is a follow-up question. If I want to add XSD schema validation to
my code, what should I use as reference:
(a) The sample programs, since several mention validation?
(b) The xmllint utility?
xmllint; XML validation usually
On Sun, 2012-11-11 at 11:04 -0600, Ramon F Herrera wrote:
The *xmllint* tool does not seem to be XSD-aware.
xmllint --help should suggest otherwise to you, as should the manual
page.
I did find some
comments about XSD in the archives. I naively assumed that adding the
following line would
On Thu, 2012-10-18 at 19:25 -0700, Zhigang Chen wrote:
Thanks Liam
We are building a platform to which code containing XPaths is
submitted by external users. Manual optimization of XPaths is
infeasible. Do you know about any tools that can automate it?
Setting aside the security
On Thu, 2012-10-18 at 18:00 -0700, Zhigang Chen wrote:
Hi
We sometimes run into the situation where a pretty expensive xpath
(e.g. .//table//td[@class]) is run on a big document (~ 9M) and it
takes very very long. In fact we never see it finish.
[resending from the right account, sorry]
I
On Sun, 2012-09-16 at 12:50 -0500, Ramon F Herrera wrote:
Hello all:
I am glad to report that after I replaced my previous XPath code with
libxml2, my application is running faster. In fact, the qualifier
Dramatic performance gains is an *understatement*. This result is from
one of my
On Fri, 2012-08-24 at 12:21 +0800, Daniel Veillard wrote:
[...]
I suspect it's just the top of the iceberg, there are a number of other
post-compilation optimizations which can certainly be made, but with
less drastic improvements.
Mike Kay has spoken at I think XML Prague and/or Balisage about
On Thu, 2012-07-19 at 14:42 +0100, stuart shepherd wrote:
That last email went off a bit early.
I'm guessing it's something to do with having elements within the text, but
it doesn't look to be replacing anything but rather adding in the replace
value multiple times, is there a way round
On Fri, 2012-08-10 at 21:00 +0800, Daniel Veillard wrote:
- the XML schemas datatype part driven by
http://www.w3.org/TR/xmlschema-2/
. But I wonder if updating is actually right w.r.t.
XSD. Honestly I don't know. One thing would be to try and see what
it gives on regression
On Wed, 2012-05-16 at 00:16 +0200, Michael Ludwig wrote:
./testapi.exe
This test gets firewalled by the W3.ORG servers, I checked using
netstat. So I interrupted it. The source file testapi.c is huge.
Is there an easy way to instruct the program not to go to the W3.ORG
servers like with
On Fri, 2012-07-20 at 09:25 -0500, Raymond Irving wrote:
Hello,
Are there any plans to add queryselector to libxml?
Why would you want to add CSS selectors when you already have XPath
selectors, which are more powerful?
Liam
--
Liam Quin - XML Activity Lead, W3C,
On Fri, 2012-07-20 at 11:52 -0500, Raymond Irving wrote:
In my opinion it's much easier to use when working with HTML content and
CSS classes.
That might be, but libxml is an XML library. Beware that it also does
not construct a conforming HTML DOM, so your CSS selectors may not
always do what
On Fri, 2012-07-20 at 09:03 -0500, Raymond Irving wrote:
Thanks for the feedback Micheal.
I thought that the first occurrence of </script> or </style> would signal
the end of the element's content but I guess the W3C had something else in
mind.
HTML 4 (that you are using) was based on ISO
On Wed, 2012-06-06 at 14:51 +0800, Daniel Veillard wrote:
One of the reasons why that initialization of the context is not
specified in the XPath standard is due to the fact that the standard was
done with the intent to be reused (by XPointer/XLink and XSLT) and we
didn't really expect
On Wed, 2012-05-23 at 17:55 +0200, Vit Zikmund wrote:
We are using XMLSec library built on top of libxml2 to process some large
XML files, however it doesn't seem to work for files 2GB, which is
unfortunately what we need.
Not really an answer to your question, but if you process the same
On Fri, 2012-03-09 at 05:23 -0600, Brian Smith wrote:
On 3/8/12 11:56 PM, Daniel Veillard wrote:
The psvi field is used by Relax-NG and libxslt (for stylesheet
compilation), so if you don't use those with your trees it should be
safe. Daniel
Great, definitely not using libxslt... not
On Wed, 2012-02-15 at 10:04 +0100, spam.spam.spam.s...@free.fr wrote:
There is a solution using the libxslt library and applying a stylesheet to
the document...
But I am wondering if there is a way to do this job using just libxml2.
Have you any idea about this?
Use <title>My Book C</title>
On Thu, 2012-02-16 at 08:28 +0100, spam.spam.spam.s...@free.fr wrote:
[...].
Anyway, there seems to be no other solution with libxml2 only.
The spaces are part of the text of the document, so it's not likely that
a conformant XML parser will strip them for you.
You could of course remove the
On Tue, 2012-02-07 at 14:08 +0100, Nikolai Weibull wrote:
What is going wrong here? (Example files minimized to show the problem.)
It looks like there are some bugs, however...
a.dtd:
!ENTITY % a PUBLIC -//a//b//c
so the SYSTEM identifier of %a is actually a.dtd, and system
identifers
On Fri, 2012-01-06 at 11:47 -0500, Thomas Gagne wrote:
I'm unclear how I can reformat the file
tgagne@ubuntu:~/tmp$ cat a.xml
<ns:a>
</ns:a>
without getting the errors
First fix the namespace error in your document.
If that's not possible, and you want to retain the undeclared
Who here is following XML Schema 1.1? Any chance of any support for any
of the changes?
Liam
--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
On Sun, 2011-10-23 at 23:01 +0200, Joachim Zobel wrote:
Hi.
According to http://www.w3.org/TR/xmlschema-1/#Selector they should,
I think --pattern in xmllint is actually for an XPath expression; since
XPath itself gives no way to bind a prefix to a URI, you are stuck with
//*[local-name()
[...]
If the above are correct, what do you suggest to people who want to use
libxml2 to validate large XMLs with external DTD files? Re-write the input
XML file?
Pretty much yeah. It's not so bad, just a tiny DOCTYPE referring to the DTD.
In many cases you don't even need that. Write
On Sat, 2011-06-11 at 01:02 +0200, David Kubicek wrote:
The problem is that by default, libxml knows only the basic 5 XML
entities.
Why is this a problem?
XML documents must either stick to those entities or define the ones
they want to use, so you should not predefine others.
I just can't
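The difference between the five predefined entities and everything else can be shown with a short sketch. It uses Python's `xml.etree.ElementTree` (expat) in place of libxml2, and the `eacute` entity and sample text are hypothetical; the point is only that a document must declare any extra entity it uses.

```python
import xml.etree.ElementTree as ET

# The five predefined entities need no declaration.
predefined = ET.fromstring("<p>&lt; &gt; &amp; &apos; &quot;</p>").text

# Any other entity is an error unless the document declares it.
try:
    ET.fromstring("<p>caf&eacute;</p>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

# Declaring the entity in the internal DTD subset makes it usable.
declared = ET.fromstring(
    '<!DOCTYPE p [<!ENTITY eacute "&#233;">]><p>caf&eacute;</p>'
).text

print(predefined)
print(undeclared_ok)
print(declared)
```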
On Sun, 2011-06-05 at 11:19 +0800, Daniel Veillard wrote:
On Fri, Jun 03, 2011 at 05:34:03PM +0200, Tomáš Pospíšil wrote:
Hi Daniel and all hackers,
I'm a GSoC student creating a new XML index in PostgreSQL which uses
LibXML for handling XML documents. My idea for the index is to use
node
On Sat, 2011-03-05 at 14:52 +0100, Michael Ludwig wrote:
Dan Quach schrieb am 01.03.2011 um 12:05 (-0800):
Currently I am reading in the following xml document (through a ruby
wrapper)
LibXML::XML::Document.string( saml_plain)
ds:SignedInfo
ds:CanonicalizationMethod Algorithm='
On Sat, 2010-08-28 at 02:11 +0300, Face wrote:
Hello all,
I am trying to move from .Net System.Xml to libxml2
This doesn't answer your question directly, but may help --
look at the source of the xmllint program that comes with libxml2.
It can do what you want, so, it can be an example.
On Sun, 2010-08-01 at 10:37 -0400, mruhrb...@aim.com wrote:
Hello list,
I run xmllint to format my XML file: xmllint --format file.xml
The output goes to stdout, so I capture it in a file: xmllint --format
file.xml formatted.xml
Now I replace the old with the new: mv formatted.xml
On Wed, 2010-07-21 at 20:35 +0200, Michael Ludwig wrote:
[...]
* allow usage of the most weird characters on Earth in element and
attribute names (upgrade to latest Unicode standard version)
And also more than 60,000 characters that are not at all weird.
However, XML 1.0 5th edition also does
On Mon, 2010-02-08 at 10:25 +0100, Daniel Veillard wrote:
On Mon, Feb 08, 2010 at 03:08:49AM +0100, Iñaki Baz Castillo wrote:
Unfortunately in my case I'm implementing an XCAP (RFC 4825) client and
server. XCAP reuses XPath but allows XPath nodes without prefix matching an
application
On Mon, 2010-01-04 at 17:35 +0100, Andreas Wagner wrote:
param parName=parameter1 type=DOUBLE val=0.0/- these
attributes ... but i cant ...strange ... the line above works
do u have an idea what can be wrong?
Note that unprefixed attributes are not in any namespace.
The problem is
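The point about unprefixed attributes can be seen in a small sketch. It uses Python's `xml.etree.ElementTree` rather than libxml2; the namespace URI is hypothetical, and the attribute names are borrowed from the question above.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:params"  # hypothetical namespace URI

# The default namespace applies to the element, not to its attributes.
elem = ET.fromstring(
    f'<param xmlns="{NS}" parName="parameter1" type="DOUBLE" val="0.0"/>'
)

print(elem.tag)             # element name is namespace-qualified
print(sorted(elem.attrib))  # attribute names are not
```

Looking up `parName` with the namespace attached finds nothing, while the plain name works.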
On Sun, 2009-09-27 at 20:07 +0200, Dirk wrote:
uhm... any comment on this? maybe a suggestion how to make it parse at least
what it can parse? or which encoding i should use?
or a link to a solution?
You could try using iconv to convert the file to utf-8, or telling
libxml the input is in
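For the first suggestion, converting to UTF-8 before parsing, here is a minimal sketch that uses Python's built-in codecs instead of the iconv tool. The source encoding is an assumption (ISO-8859-1 is used only as an example, since the encoding named above is cut off), and the sample data and helper name are made up.

```python
import xml.etree.ElementTree as ET

ASSUMED_SOURCE_ENCODING = "iso-8859-1"  # assumption, not taken from the thread

# Hypothetical input bytes as they might sit on disk in the legacy encoding.
latin1_bytes = "<name>caf\u00e9</name>".encode(ASSUMED_SOURCE_ENCODING)


def to_utf8(data, source_encoding):
    """Re-encode raw bytes from `source_encoding` to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


utf8_bytes = to_utf8(latin1_bytes, ASSUMED_SOURCE_ENCODING)

# With no XML declaration, the parser treats the bytes as UTF-8.
name = ET.fromstring(utf8_bytes).text
print(name)
```

If the file carries an XML declaration naming the old encoding, that declaration has to be updated as well, or the parser will be told the wrong thing.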
On Fri, 2009-09-04 at 05:26 -0700, mini thomas wrote:
Yes I have used xerces. But now I need to use libxml2 and the main
road block is strict namespace validation. So waiting for Daniel's
response. Thanks
Why do you need this? Can you fix the XML? That's usually the
best approach.
Liam
On Wed, 2009-04-22 at 11:06 -0500, Michael Cronenworth wrote:
Is there a way to ignore a character contained inside of an XML tag?
Normally this is used for entities, but I have a use for an XML tag to
contain an in this example:
passworde$az_3{dpp/password
No, you need to escape it,
On Wed, 2009-04-22 at 11:21 -0500, Michael Cronenworth wrote:
[...]
Yes, that was my understanding, but the XML I have received (business)
contains bad characters inside of XML tags.
Then you didn't receive XML, you got line noise or garbage :)
Liam
--
Liam Quin - XML Activity Lead, W3C,
On Mon, 2009-02-23 at 17:36 -0500, abey...@axsone.com wrote:
[...]
We normally deal with very large XML files and avoid loading them into
memory in our application by using SAX api. However in scripts we
call xsltproc.
Am I understanding it correctly that xsltproc does in fact load entire
On Thu, 2008-09-25 at 19:06 +0200, Daniel Veillard wrote:
[...]
XQuery doesn't make that much sense when playing with a single document,
There are several implementations of XQuery that work on a single
document and that people say are useful.
my feeling is that it's more fit for a
On Tue, 2008-09-09 at 18:29 +0200, Paco Koch wrote:
hi list,
i'm trying to compile the code from:
http://en.wikipedia.org/wiki/Libxml2
(testing to see if i can code something with libxml2)
but i get following output from my compiler
(gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7) - [i have
On Thu, 2008-09-11 at 16:53 +0200, Kris Breuker wrote:
!ELEMENT xyz (instances_chop*, !-- chop_enum with an atrribute for
which instance is removed --
[...]
So I think the following should work:
!ELEMENT xyz (instances_chop*, -- chop_enum with an atrribute for which
instance is removed
On Mon, 2008-05-12 at 21:03 +0800, 职文超 wrote:
I am new to XML. And I am looking for a way to convert an xml file
depending on external DTD files to its equivalent not depending on
external DTD files.
You can change
<!DOCTYPE sock SYSTEM "sock.dtd">
<sock>Argyle</sock>
to
<!DOCTYPE sock [
<!--*
On Tue, 2008-04-22 at 15:56 +0200, Arnold Hendriks wrote:
Can I cheat? :) Given the fact that nothing should appear between
</body> and </html>, and </html> is always the last tag, it's easiest to
just ignore them and let the autoclose deal with it...
In practice I expect it's not uncommon to
On Tue, 2007-09-04 at 07:01 -0400, Daniel Veillard wrote:
On Tue, Sep 04, 2007 at 06:39:01AM -0300, Bruno Dilly wrote:
Hi people,
I'm trying to parse RSS with html entities, but I'm having the
following errors when it tries to parse the rss file:
Entity 'ntilde' not defined;
Entity
On Fri, 2007-06-15 at 16:29 +1000, Michael Day wrote:
[...]
I think that the changes required to support XML 1.1 would go so deep
into the implementation that it could actually be easier to fork libxml2
and have two separate libraries, one for XML 1.0 and one for XML 1.1.
Are you sure you're
On Thu, 2007-05-10 at 11:24 -0400, [EMAIL PROTECTED] wrote:
I’m aware that xml is case sensitive, but is there a switch to turn
that off?
Of course we can just convert the file into lowercase/uppercase before
setting libxml2 on it, but if there’s an option, I’d rather not.
In general it's
On Wed, 2007-04-25 at 16:32 -0700, Yong Chen (yongche) wrote:
Say I have an xml tree:
A
B C
D E
[..]
Now say I have xpath /A/B, it selects node B. When I return the
result, should I return B node only, or I should return B and also its
child (D
On Wed, 2007-21-02 at 15:59 -0600, David Grohmann wrote:
[...] .But is there a better way to get at that data than manually
following pointers like I was showing in the GDB prompts?
The XPath interface is higher level, if that helps.
Liam
--
Liam Quin - XML Activity Lead, W3C,
On Wed, Jan 24, 2007 at 12:36:34PM -0800, Steve Yan wrote:
I am looking for such a tool which convert the results from a SQL select
into XML format.
There are several ways to do this; some XQuery implementations can
do it natively... or you could use ODBC or JDBC or even the
Perl DBI to access
On Thu, 2007-11-01 at 14:13 -0300, pwhelan wrote:
This patch adds support for the 'eq' and 'neq' operators, in a way so that
they are just aliases for the '=' and '!=' operators.
You might do better to consider using internal entities
<!ENTITY eq '='>
<!ENTITY neq '!='>
and doing, e.g.
On Fri, 2007-05-01 at 19:26 -0500, Mike - EMAIL IGNORED wrote:
,
I see:
Comments may appear anywhere in a document
outside other markup; ...
Consider:
<MyDoc>
<!--Here is my special tag-->
<MySpecialTag>
stuff in tag
</MySpecialTag>
</MyDoc>
Now is a
On Wed, 2006-09-06 at 15:50 +0200, Marchese Stefano wrote:
My application parses some xml files using the xmlParseFile() API.
This API gives an error if the file has the following content:
<content>Asl&#x10;URP</content>
As indeed it should, character 0x10 (hexadecimal, i.e. decimal 16,
i.e. ASCII
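The rule behind this error is the `Char` production in the XML 1.0 specification, which lists the code points a document may contain, whether written literally or as a character reference. A minimal sketch of that check follows; the function name is hypothetical and this is not libxml2's own code.

```python
def is_xml10_char(cp):
    """True if code point `cp` matches the XML 1.0 `Char` production."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # excludes the other C0 controls
        or 0xE000 <= cp <= 0xFFFD      # excludes surrogates, U+FFFE, U+FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


print(is_xml10_char(0x10))  # the control character from the report
print(is_xml10_char(0x09))  # tab is one of the three allowed controls
```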
On Fri, 2006-07-28 at 08:46 +0400, Nikolay Samokhvalov wrote:
Does anybody use libxml2 to parse not entire docs but parts of them?
Ideally, it would work with items of XQuery Data Model.
libxsl supports xpath 1, not XPath 2; there's no support right now for
the XPath 2.0 and XML Query Data
On Mon, 2006-07-03 at 11:42 +0200, Buchcik, Kasimier wrote:
I think we intend to implement the missing areas. PSVI is a different
story, which we need to consider well when implementing XPath 2.0
and XSLT 2.0 in the future, since PSVI will have to be a part of those
new technologies, and we
On Mon, 2006-07-03 at 20:50 +, Frans Englich wrote:
It wouldn't surprise me if it would be a long wait for the Rec stamp, since
the W3C machinery can drag things out in this area.
It's part of the trouble with being consensus-based sometimes.
But I expect to see Recommendations this year.
On Mon, 2006-05-08 at 09:35 +0100, srivatsan s wrote:
[...]
The regex for this will be something like this
{ns:book} {
BEGIN NEW STATE TO PROCESS CHILD ELEMENTS
}
{ns:Math}{ns:Science}
{
may or may not have new states here depending upon the xsd.
}
What you are describing is a finite
On Tue, 2006-04-18 at 19:43 -0400, Alex Khesin wrote:
I am building Atom/RSS SAX2 parser using libxml, and in order to
implement http://www.atomenabled.org/developers/syndication/#text for
type=xhtml, I need to be able to completely disable entity
replacement.
You don't need to turn entity
On Fri, 2006-02-17 at 15:19 -0800, Rick Jones wrote:
I'm swimming in circles trying to get on board with catalogs, to deal with a
problem I have with netperf4 (http://www.netperf.org/svn/netperf4/trunk).
[...]
Specifically, to allow the netperf DTD (src/netperf_doc.dtd) to exist in
On Fri, 2006-01-27 at 17:53 -0800, Sean Machin wrote:
camera_test.xml:65: validity error : Content model of rule is not
determinist: ((input | (input , (operator , input)+)) , output+)
Right -- it's not deterministic (the error message has a typo I think).
You can't have a situation where
On Tue, 2005-11-08 at 00:44 +0100, Kail wrote:
I have a problem with an old SGML file.
It has many format errors; the 2 most annoying are:
1- Have more than 1 element as root child
SGML does not allow this.
//Start of file
<reuters> </reuters>
<reuters> </reuters>
As
On Wed, 2005-06-29 at 15:39 -0400, [EMAIL PROTECTED] wrote:
I cannot figure out how to use xPath without loading the entire XML into
memory.
This is an implementation-specific question, and for libxml2 the answer
is usually that you will need the whole document in memory.
If this is a problem