On Wed, Dec 05, 2007 at 02:43:04AM +0100, Roland Mainz wrote:
> Daniel Veillard wrote:
> > On Tue, Dec 04, 2007 at 06:19:53PM +0100, Roland Mainz wrote:
> > > I am currently working on SAX/xmlSAXParseFile libxml2 bindings for
> > > ksh93/kash and have a few questions about the API:
> > > - Is there a way to provide a "default encoding setting" which should be
> > > used if the document itself doesn't define a character set ?
> >
> > I don't understand the question. The XML standard defines how things
> > should be checked in the absence of information in the document. If you
> > provide that information it overrides the normal processing (and if you
> > guess wrong you get a fatal error).
> > More doc
> > http://xmlsoft.org/encoding.html
> > http://www.w3.org/TR/REC-xml/#sec-guessing
>
> Erm... the issue is a bit POSIX shell specific. A POSIX shell always
> operates on "character boundaries in the current locale". If we would
> allow something like
> $ echo "<?xml version="1.0..." | xmlsaxparse myfunctions - # then the
> input data will be in the current locale. Either we assume that only
and if it does
cat foo.xml |
you won't be able to assume anything.
> ASCII data are passed to the SAX parser, the shell script code needs to
that's just wrong.
> lookup the current locale's encoding and pass it with the XML document
> or the "xmlsaxparse" shell builtin command needs to handle the situation
> somehow (e.g. convert input data to the expected encoding).
Sorry, no answer for that. The best approach is to have the XML declaration
state the actual encoding; any guessing is just asking for trouble.
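
For reference, the "look up the current locale's encoding" step Roland
mentions can be sketched in plain POSIX C as below. The helper name
current_locale_codeset() is hypothetical and not part of libxml2 or ksh93.
Whether the result should then be handed to the parser (for example as the
encoding argument of a function such as xmlReadFd()) is exactly the point in
dispute above, so treat this as an illustration of the lookup only.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: returns the codeset name of the locale selected by
 * the environment (LANG, LC_ALL, ...), for example "UTF-8". */
const char *current_locale_codeset(void)
{
    /* Adopt the environment's locale. If that fails, the "C" locale stays
     * in effect and nl_langinfo() reports its codeset instead. */
    setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}
```

A caller could print the result with printf("%s\n", current_locale_codeset()).
Note that the returned string may be overwritten by later setlocale() or
nl_langinfo() calls, so copy it if it has to be kept.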
> > > - How can I turn-off the libxml2 feature that it resolves all entities
> > > (e.g. how can I do my own entity resolving) ?
> >
> > By default in SAX mode if you don't ask for entity replacement I
> > think libxml lets you provide it (see the entity callback). NOTE:
> > this is hairy, complex to get right in the general case, and one reason
> > I recommend not using SAX at all.
>
> What would you recommend to be used instead ?
the Reader API
http://xmlsoft.org/xmlreader.html
I have no idea what you are trying to achieve though.
> > > - How can I abort a SAX parser run from within a callback function ?
> >
> > xmlStopParser()
>
> Thanks! :-)
>
> > > - Is there a way to get |xmlSAXParseFile()| to accept stdin as input to
> > > allow its use in pipe chains ?
> >
> > "-"
>
> Somehow it seems it doesn't like the pipe input much... the libxml2 code
> sometimes prints stuff like:
> -- snip --
> I/O error : Invalid seek
> I/O error : Invalid seek
> I/O error : Invalid seek
> I/O error : Invalid seek
> -- snip --
No idea why; we have been using that for ages for various processing:
paphio:~/XML -> cat tst.xml | xmllint --noout -
paphio:~/XML ->
First time I hear of a problem there.
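
One possible, unconfirmed reading of the "Invalid seek" messages is that some
layer calls lseek() on standard input, which fails with ESPIPE when the
descriptor is a pipe. Nothing in this thread establishes that this is what
happens inside libxml2 or the ksh93 builtin. The sketch below only shows how
to check whether a descriptor is seekable; fd_is_seekable() is a hypothetical
helper name.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd supports lseek(), 0 if lseek()
 * fails with ESPIPE (pipe, FIFO or socket), and -1 on any other error
 * such as an invalid descriptor. */
int fd_is_seekable(int fd)
{
    /* Asking for the current offset does not move the file position. */
    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1)
        return 1;
    return (errno == ESPIPE) ? 0 : -1;
}
```

Calling fd_is_seekable(STDIN_FILENO) at the start of the builtin would tell
a "cat foo.xml | ..." invocation apart from a "< foo.xml" redirection, which
may help narrow down where the seek attempt comes from.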
> > > for a simple RSS browser I would have to call |xmlSAXParseFile()| to
> > > decode the RSS stream and then |xmlSAXParseFile()| a 2nd time (from
> > > within a callback) to decode the XHTML data (I've already tried but
> > > something weird is going on; somehow the '<' and '>' characters seem to
> > > "disappear" from the character data stream).
> >
> > No idea, sounds weird.
>
> Yes... even more weird, the error sometimes disappears and then comes
> back - sounds like a job for "dbx -check access" or "valgrind" ...
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
[EMAIL PROTECTED] | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/