On Wednesday, 20 July 2016 at 01:49:37 UTC, ketmar wrote:
i wrote a simple sax-style xml parser[1][2] for my own needs,
and decided to share it. it has two interfaces: `xmparse()`
function which simply calls callbacks without any validation or
encoding conversion, and `SaxyEx` class, which does some
validation, converts content to utf-8 (from anything
std.encoding supports), and calls callbacks when the given path
is triggered.
it can parse any `char` input range, or std.stdio.File. parsing
files is probably slightly faster than parsing ranges.
internally it extensively reuses the memory buffers it has
allocated, so it should not put much pressure on the GC.
you are expected to copy any data you need in callbacks (not
just slice, but .dup!).
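
a minimal sketch of why the copy matters. this does not use saxy.d at all: `FakeParser`, `feed` and `collect` are hypothetical stand-ins, assumed only to mimic a parser that hands callbacks a slice of one reused buffer.

```d
// hypothetical stand-in for a parser that reuses one internal buffer;
// this is NOT the saxy.d API, just an illustration of slice aliasing
struct FakeParser {
    char[] buf; // reused for every callback

    void feed(string[] chunks, scope void delegate(char[] text) onText) {
        foreach (chunk; chunks) {
            if (buf.length < chunk.length) buf.length = chunk.length;
            buf[0 .. chunk.length] = chunk[];  // overwrite previous contents
            onText(buf[0 .. chunk.length]);    // callback sees a slice of buf
        }
    }
}

struct Collected {
    char[][] sliced; // views into the parser's buffer
    char[][] copied; // independent copies made with .dup
}

Collected collect(string[] chunks) {
    Collected res;
    FakeParser p;
    p.feed(chunks, (char[] text) {
        res.sliced ~= text;     // stores a slice: still aliases p.buf
        res.copied ~= text.dup; // stores a copy: survives buffer reuse
    });
    return res;
}

void main() {
    auto r = collect(["abc", "xyz"]);
    // the copies keep what each callback actually saw
    assert(r.copied[0] == "abc" && r.copied[1] == "xyz");
    // the stored slice was overwritten when the buffer was reused
    assert(r.sliced[0] == "xyz");
}
```

the stored slice silently changes once the buffer is refilled, which is the failure the `.dup` advice is meant to avoid.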
so far i'm using it to parse fb2 files, and it parses an 8.5
megabyte utf-8 file (and creates internal reader structures,
including splitting text into words and some other housekeeping)
in one second on my i3 (with dmd -O, even without -inline and
-release).
it is not really documented, but i think it is "intuitive".
there are also some comments in source code; please, read
those! ;-)
p.s. it decodes standard xml entities (and probably
works right only in utf-8 files, though), understands CDATA and
comments.
enjoy, and happy hacking!
[1] http://repo.or.cz/iv.d.git/blob_plain/HEAD:/saxy.d
[2] http://repo.or.cz/iv.d.git/tree/HEAD:/saxytests
Thanks. I might actually use it. I need an XML parser and wrote a
very basic and incomplete one for my needs.