Re: simple sax-style xml parser

2016-07-29 Thread ketmar via Digitalmars-d-announce

On Friday, 29 July 2016 at 14:47:08 UTC, Chris wrote:
Thanks. I might actually use it. I need an XML parser and wrote 
a very basic and incomplete one for my needs.


great. don't forget to get the latest versions from those links. 
and feel free to report any bugs here, i'll try to fix them asap. ;-)


Re: simple sax-style xml parser

2016-07-29 Thread Chris via Digitalmars-d-announce

On Wednesday, 20 July 2016 at 01:49:37 UTC, ketmar wrote:
i wrote a simple sax-style xml parser[1][2] for my own needs, 
and decided to share it. it has two interfaces: `xmparse()` 
function which simply calls callbacks without any validation or 
encoding conversion, and `SaxyEx` class, which does some 
validation, converts content to utf-8 (from anything 
std.encoding supports), and calls callbacks when the given path 
is triggered.


it can parse any `char` input range, or std.stdio.File. parsing 
files is probably slightly faster than parsing ranges.


internally it extensively reuses the memory buffers it has 
allocated, so it should not put much pressure on the GC.


you are expected to copy any data you need in callbacks (not 
just slice, but .dup!).
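
To illustrate why a plain slice is not enough, here is a minimal sketch in plain D. It does not use saxy at all: `collect`, `buf`, `sliced` and `copied` are made-up names, and `buf` only stands in for a buffer that a parser might reuse between callbacks.

```d
import std.stdio : writeln;

// Hypothetical stand-in for a parser that reuses one buffer for every chunk.
// Fills `sliced` with plain slices of that buffer and `copied` with .idup copies.
void collect(out const(char)[][] sliced, out string[] copied)
{
    char[] buf = new char[](8);
    foreach (word; ["alpha", "beta"])
    {
        buf[0 .. word.length] = word[];          // buffer is overwritten for each chunk
        const(char)[] chunk = buf[0 .. word.length];
        sliced ~= chunk;                         // keeps pointing into buf
        copied ~= chunk.idup;                    // independent immutable copy
    }
}

void main()
{
    const(char)[][] sliced;
    string[] copied;
    collect(sliced, copied);
    assert(copied == ["alpha", "beta"]);   // copies keep their contents
    assert(sliced[0] != "alpha");          // the slice was clobbered by the next chunk
    writeln(sliced, " ", copied);
}
```

`.dup` gives a mutable copy and `.idup` an immutable one (a `string`); either detaches the data from the reused buffer.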


so far i'm using it to parse fb2 files, and it parses an 8.5 
megabyte utf-8 file (and creates internal reader structures, 
including splitting text into words and some other housekeeping) 
in one second on my i3 (with dmd -O, even without -inline and 
-release).


it is not really documented, but i think it is "intuitive". 
there are also some comments in source code; please, read 
those! ;-)


p.s. it decodes standard xml entities (&# and &#x probably 
work right only in utf-8 files, though), and understands CDATA 
and comments.



enjoy, and happy hacking!


[1] http://repo.or.cz/iv.d.git/blob_plain/HEAD:/saxy.d
[2] http://repo.or.cz/iv.d.git/tree/HEAD:/saxytests


Thanks. I might actually use it. I need an XML parser and wrote a 
very basic and incomplete one for my needs.