
Re: dxml 0.1.0 released

2018-02-11 Thread Cym13 via Digitalmars-d-announce

On Friday, 9 February 2018 at 21:15:33 UTC, Jonathan M Davis 
wrote:

[...]
Of note, dxml does not support the DTD section beyond what is 
required to parse past it

[...]
- Jonathan M Davis


Fun fact: since the most common security vulnerability associated 
with XML (XXE [1]) is based on exploiting the fact that most 
libraries parse in-line DTDs by default, this makes dxml immune 
to such attacks. Given how often this vulnerability is found in 
the wild, it sounds like a very good thing to me :D


[1]: 
https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing
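
For readers unfamiliar with in-line DTDs, here is a minimal sketch of the mechanism being described. It uses Python's standard xml.etree.ElementTree purely as a stand-in for a DTD-processing parser (it is not dxml, and the document and the entity name `who` are made up for illustration). The entity is an internal one, so nothing is fetched; an XXE attack relies on the same expansion step, but with an external (SYSTEM) entity.

```python
import xml.etree.ElementTree as ET

# A document whose in-line DTD declares an internal entity (hypothetical example).
doc = """<!DOCTYPE note [
  <!ENTITY who "defined in the DTD">
]>
<note>&who;</note>"""

# A DTD-processing parser replaces the reference with the declared text.
root = ET.fromstring(doc)
expanded = root.text
print(expanded)
```

A parser that only skips past the DOCTYPE section, as dxml is described as doing, never builds an entity table, so there is nothing for a reference like `&who;` to expand to.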

Re: dxml 0.1.0 released

2018-02-11 Thread Jacob Carlborg via Digitalmars-d-announce


On 2018-02-10 19:57, Jonathan M Davis wrote:


Kind of. I did some benchmarking to see if some code changes would improve
performance, but I haven't tried benchmarking it against any other XML
libraries.


Ok, I see.


That would take a fair bit of time and effort, and IMHO, that
would be better spent finishing the library first.

Fair enough.

--
/Jacob Carlborg

Re: dxml 0.1.0 released

2018-02-11 Thread Russel Winder via Digitalmars-d-announce

On Sun, 2018-02-11 at 03:34 -0700, Jonathan M Davis via Digitalmars-d-
announce wrote:
> 
[…]
> Given how strings work in D, parsing is something that we should easily be
> able to do faster than other languages - or at least, other languages
> typically have to write much less idiomatic code and go to a lot more effort
> to reach the speeds that we can easily reach with idiomatic D code. So, in
> general, IMHO, parsers are one of those things that we should typically be
> writing natively.

Works for me, and given you have given the project a massive kick
start, hopefully others can get stuck in and Phobos can do a swap of
what was with what is.

> That being said, if someone really wants full DTD support, I have no problem
> sending them off to deal with bindings to C/C++ libraries, since I for one
> am not willing to put in the time or effort to support that part of the XML
> spec, since it complicates things considerably while adding nothing positive
> IMHO. I'm sure that a D solution could compete excellently with a C/C++
> solution, but it's sure not worth my time and effort, and no one else has
> stepped up to implement anything along those lines.

I am no longer doing XML stuff myself, but a couple of years ago DTDs
were "dead" and everyone was using XML Schemas.

> Also, we're not about to put bindings to a C/C++ library for XML in Phobos
> (it's already been argued quite a bit that doing so with curl was a big
> mistake), so if we want to replace std.xml, that calls for writing a
> replacement in D.

True, and entirely reasonable. It is why lxml is only available via
download or far more usually via PyPI.

D having really good XML (and XSLT) support in its standard library,
and removing the crud, would be one up on what Python has done.

-- 
Russel.
===
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk



Re: dxml 0.1.0 released

2018-02-11 Thread Jonathan M Davis via Digitalmars-d-announce

On Sunday, February 11, 2018 10:11:05 Russel Winder via Digitalmars-d-
announce wrote:
> On Fri, 2018-02-09 at 13:47 -0800, H. S. Teoh via
> Digitalmars-d-announce wrote:
> > On Fri, Feb 09, 2018 at 02:15:33PM -0700, Jonathan M Davis via
> > Digitalmars-d-announce wrote:
> > > I have multiple projects that need an XML parser, and
> > > std_experimental_xml is clearly going nowhere, with the guy who
> > > wrote
> > > it having disappeared into the ether, so I decided to break down
> > > and
> > > write one. I've kind of wanted to for years, but I didn't want to
> > > spend the time on it. However, sometime last year I finally decided
> > > that I had to, and it's been what I've been working on in my free
> > > time
> > > for a while now. And it's finally reached the point when it makes
> > > sense to release it - hence this post.
> >
> > Hooray!  Finally, a glimmer of hope for XML parsing in D!
>
> I wonder why no-one has tried using DStep to create a D binding for
> libxml2 and libxslt.
>
> Whilst Python has a SAX and DOM parsing capability, well three
> different ones in the standard library, anyone doing serious XML work
> in Python uses lxml which is just a Python binding to libxml2 and
> libxslt.
>
> If Python people have given up on the XML stuff in its standard
> library and use a binding to a well known and distributed one, is this
> a good path for D?

Given how strings work in D, parsing is something that we should easily be
able to do faster than other languages - or at least, other languages
typically have to write much less idiomatic code and go to a lot more effort
to reach the speeds that we can easily reach with idiomatic D code. So, in
general, IMHO, parsers are one of those things that we should typically be
writing natively.

That being said, if someone really wants full DTD support, I have no problem
sending them off to deal with bindings to C/C++ libraries, since I for one
am not willing to put in the time or effort to support that part of the XML
spec, since it complicates things considerably while adding nothing positive
IMHO. I'm sure that a D solution could compete excellently with a C/C++
solution, but it's sure not worth my time and effort, and no one else has
stepped up to implement anything along those lines.

Also, we're not about to put bindings to a C/C++ library for XML in Phobos
(it's already been argued quite a bit that doing so with curl was a big
mistake), so if we want to replace std.xml, that calls for writing a
replacement in D.

- Jonathan M Davis

Re: dxml 0.1.0 released

2018-02-11 Thread Russel Winder via Digitalmars-d-announce

On Fri, 2018-02-09 at 13:47 -0800, H. S. Teoh via Digitalmars-d-
announce wrote:
> On Fri, Feb 09, 2018 at 02:15:33PM -0700, Jonathan M Davis via
> Digitalmars-d-announce wrote:
> > I have multiple projects that need an XML parser, and
> > std_experimental_xml is clearly going nowhere, with the guy who
> > wrote
> > it having disappeared into the ether, so I decided to break down
> > and
> > write one. I've kind of wanted to for years, but I didn't want to
> > spend the time on it. However, sometime last year I finally decided
> > that I had to, and it's been what I've been working on in my free
> > time
> > for a while now. And it's finally reached the point when it makes
> > sense to release it - hence this post.
> 
> Hooray!  Finally, a glimmer of hope for XML parsing in D!

I wonder why no-one has tried using DStep to create a D binding for
libxml2 and libxslt.

Whilst Python has a SAX and DOM parsing capability, well three
different ones in the standard library, anyone doing serious XML work
in Python uses lxml which is just a Python binding to libxml2 and
libxslt.

If Python people have given up on the XML stuff in its standard
library and use a binding to a well known and distributed one, is this
a good path for D?

-- 
Russel.
===
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk


Re: dxml 0.1.0 released

2018-02-10 Thread Jonathan M Davis via Digitalmars-d-announce

On Saturday, February 10, 2018 21:10:28 Joakim via Digitalmars-d-announce 
wrote:
> On Saturday, 10 February 2018 at 18:57:53 UTC, Jonathan M Davis
> wrote:
> > On Saturday, February 10, 2018 16:14:41 Jacob Carlborg via
> > Digitalmars-d-announce wrote:
> >> On 2018-02-09 22:15, Jonathan M Davis wrote:
> >> > [...]
> >>
> >> This is great news! Have you run any benchmarks to see how it
> >> performs?
> >
> > Kind of. I did some benchmarking to see if some code changes
> > would improve performance, but I haven't tried benchmarking it
> > against any other XML libraries. That would take a fair bit of
> > time and effort, and IMHO, that would be better spent finishing
> > the library first. Also, ldc's latest release is only up to dmd
> > 2.077.1, and dxml needs an improvement that got added to
> > byCodeUnit in 2.078.0, so any benchmarking that wants to do
> > something like compare dxml with a C/C++ parsing library while
> > taking the optimizer out of the equation isn't going to work
> > yet unless I fork byCodeUnit for dxml until we get another
> > release of ldc.
>
> ldc master uses the latest 2.078.2 frontend and stdlib, you could
> always build it yourself:
>
> https://github.com/ldc-developers/ldc/blob/master/CMakeLists.txt#L54
> https://wiki.dlang.org/Building_LDC_from_source

That's good to know. Thanks.

If I get to the point where I want to do more benchmarking before ldc does
another release, I'll build it myself, though depending on when I reach that
point and when ldc plans to do another release, it may or may not end up
being necessary.

- Jonathan M Davis

Re: dxml 0.1.0 released

2018-02-10 Thread Joakim via Digitalmars-d-announce

On Saturday, 10 February 2018 at 18:57:53 UTC, Jonathan M Davis 
wrote:
On Saturday, February 10, 2018 16:14:41 Jacob Carlborg via 
Digitalmars-d-announce wrote:

On 2018-02-09 22:15, Jonathan M Davis wrote:
> [...]
This is great news! Have you run any benchmarks to see how it 
performs?


Kind of. I did some benchmarking to see if some code changes 
would improve performance, but I haven't tried benchmarking it 
against any other XML libraries. That would take a fair bit of 
time and effort, and IMHO, that would be better spent finishing 
the library first. Also, ldc's latest release is only up to dmd 
2.077.1, and dxml needs an improvement that got added to 
byCodeUnit in 2.078.0, so any benchmarking that wants to do 
something like compare dxml with a C/C++ parsing library while 
taking the optimizer out of the equation isn't going to work 
yet unless I fork byCodeUnit for dxml until we get another 
release of ldc.


ldc master uses the latest 2.078.2 frontend and stdlib, you could 
always build it yourself:


https://github.com/ldc-developers/ldc/blob/master/CMakeLists.txt#L54
https://wiki.dlang.org/Building_LDC_from_source

Re: dxml 0.1.0 released

2018-02-10 Thread Jonathan M Davis via Digitalmars-d-announce

On Saturday, February 10, 2018 19:53:48 Jesse Phillips via Digitalmars-d-
announce wrote:
> On Friday, 9 February 2018 at 21:15:33 UTC, Jonathan M Davis
> wrote:
> > Hopefully, the documentation is clear enough, but obviously,
> > I'm not the best judge of that. So, have at it.
> >
> > Documentation: http://jmdavisprog.com/docs/dxml/0.1.0/
> > Github: https://github.com/jmdavis/dxml
> > Dub: http://code.dlang.org/packages/dxml
> >
> > - Jonathan M Davis
>
> This looks so nice.
>
> I can understand the concerns of the DTD, and it doesn't look
> like you needed to do anything special for namespaces with this
> parser.

I confess that I haven't looked into namespaces in detail, but from what I
understand about them, I don't see any reason to do anything beyond treating
them as part of the name. If the application wants to do something special
with them, then it's free to do so. Key goals of this parser were to make it
fast and simple to use for the typical use case. As much as possible, I'd
like to keep the complicated stuff out of it.

Personally, I see XML only as data just like JSON is only data, and I think
that the complications in the XML spec come from trying to treat it as more
than that.

I had originally intended to provide at least minimal DTD support but leave
most of it to some kind of helper functionality (e.g. have a helper function
which took the DTD data and then validated the rest of the XML using it).
However, as I got farther along, it became clear that that wasn't going to
work without giving up on being able to just slice the input, and I wasn't
willing to give up on that, especially when I don't see handling the DTD as
valuable for anything but dealing with overly complicated XML that is
outside of the programmer's control or to simply be able to say that I
completely implemented the XML spec.

Slicing is part of why parsers written in D should tend to be inherently
fast in comparison to those written in languages like C++, and I want to
take advantage of that. In principle, something like an XML parser should be
able to be a showcase for why D is great. Tango's was, but Phobos' hasn't
been, and I'd like for dxml to be able to be that regardless of whether it
eventually replaces std.xml or not.
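
The slicing idea can be sketched outside D as well. The following Python snippet is only an analogy, not dxml's implementation: `text_slices` is a hypothetical helper, and `memoryview` plays the role of a D slice by referring to the original buffer instead of copying it. The regex is a deliberately naive tokenizer that only copes with trivial documents.

```python
import re

# Naive pattern: character data sitting between a '>' and the next '<'.
_TEXT = re.compile(rb">([^<]+)<")

def text_slices(buf: bytes):
    """Yield zero-copy views of each text run found in buf."""
    view = memoryview(buf)
    for match in _TEXT.finditer(buf):
        start, end = match.span(1)
        yield view[start:end]  # a view into buf, not a new bytes object

doc = b"<root><a>alpha</a><b>beta</b></root>"
slices = list(text_slices(doc))
texts = [bytes(s) for s in slices]  # copying happens only here, on request
print(texts)
```

Each yielded view still refers to `doc`, so no text is duplicated until the caller explicitly asks for a copy. That is the property that would be lost if the parser had to splice DTD-defined replacement text into what it returns.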

- Jonathan M Davis

Re: dxml 0.1.0 released

2018-02-10 Thread Jesse Phillips via Digitalmars-d-announce

On Friday, 9 February 2018 at 21:15:33 UTC, Jonathan M Davis 
wrote:


Hopefully, the documentation is clear enough, but obviously, 
I'm not the best judge of that. So, have at it.


Documentation: http://jmdavisprog.com/docs/dxml/0.1.0/
Github: https://github.com/jmdavis/dxml
Dub: http://code.dlang.org/packages/dxml

- Jonathan M Davis


This looks so nice.

I can understand the concerns of the DTD, and it doesn't look 
like you needed to do anything special for namespaces with this 
parser.

Re: dxml 0.1.0 released

2018-02-10 Thread bauss via Digitalmars-d-announce

On Friday, 9 February 2018 at 21:15:33 UTC, Jonathan M Davis 
wrote:
I have multiple projects that need an XML parser, and 
std_experimental_xml is clearly going nowhere, with the guy who 
wrote it having disappeared into the ether, so I decided to 
break down and write one. I've kind of wanted to for years, but 
I didn't want to spend the time on it. However, sometime last 
year I finally decided that I had to, and it's been what I've 
been working on in my free time for a while now. And it's 
finally reached the point when it makes sense to release it - 
hence this post.


Currently, dxml contains only a range-based StAX / pull parser 
and related helper functions, but the plan is to add a DOM 
parser as well as two writers - one which is the writer 
equivalent of a StAX parser, and one which is DOM-based. 
However, in theory, the StAX parser is complete and quite 
useable as-is - though I expect that I'll be adding more helper 
functions to make it easier to use, and if you find that you're 
doing a particular operation with it frequently and that that 
operation is overly verbose, please point it out so that maybe 
a helper function can be added to improve that use case - e.g. 
I'm thinking of adding a function similar to std.getopt.getopt 
for handling attributes, because I personally find that dealing 
with those is more verbose than I'd like. Obviously, some stuff 
is just going to do better with a DOM parser, but thus far, 
I've found that a StAX parser has suited my needs quite well. I 
have no plans to add a SAX parser, since as far as I can tell, 
SAX parsers are just plain worse than StAX parsers, and the 
StAX approach is quite well-suited to ranges.
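
As a rough illustration of what "pull parser" means here, the sketch below uses Python's standard `xml.etree.ElementTree.XMLPullParser`, not dxml. The helper name `pull_events` and the sample document are assumptions made for the example. The point is that the consumer asks for the next event when it wants one, which is the same shape as iterating a range, whereas a SAX parser calls back into the consumer.

```python
import xml.etree.ElementTree as ET

def pull_events(xml_text):
    """Yield (event, tag) pairs that the caller pulls one at a time."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()  # signal end of input so all events are available
    for event, elem in parser.read_events():
        yield event, elem.tag

events = pull_events("<root><a id='1'>alpha</a><b/></root>")
first = next(events)   # the consumer decides when to advance
rest = list(events)    # ...and can drain the remainder whenever it likes
print(first, rest)
```

Because the caller drives the iteration, it can stop early, skip ahead, or hand the iterator to another function, which is harder to arrange with callback-style SAX handlers.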


Of note, dxml does not support the DTD section beyond what is 
required to parse past it, since supporting it would make it 
impossible for the parser to return slices of the original 
input beyond the case where strings are used (and it would be 
forced to allocate strings in some cases, whereas dxml does 
_very_ minimal heap allocation right now), and parsing the DTD 
section significantly increases the complexity of the parser in 
order to support something that I honestly don't think should 
ever have been part of the XML standard and is unnecessary for 
many, many XML documents. So, if you're dealing with XML 
documents that contain entity references that are declared in 
the DTD section and then used outside of the DTD section, then 
dxml will not support them, but it will work just fine if a DTD 
section is there so long as it doesn't declare any entity 
references that are then referenced in the document proper.


Hopefully, the documentation is clear enough, but obviously, 
I'm not the best judge of that. So, have at it.


Documentation: http://jmdavisprog.com/docs/dxml/0.1.0/
Github: https://github.com/jmdavis/dxml
Dub: http://code.dlang.org/packages/dxml

- Jonathan M Davis


This is going to be really useful for people like me who work 
with web services using SOAP.


Thanks for the great work.

Re: dxml 0.1.0 released

2018-02-10 Thread Jonathan M Davis via Digitalmars-d-announce

On Saturday, February 10, 2018 10:27:42 Stefan via Digitalmars-d-announce 
wrote:
> great work, Jonathan. Thank you.
> We were missing xml for a long time and did so many hacks just to
> get xml somehow parsed.

LOL. Actually, one of the helper functions in std.datetime.timezone that has
to deal with xml does it via hacks, because the XML in question was fairly
simple, and I didn't want to deal with std.xml.

If dxml does end up going through the Phobos review process and eventually
ends up in Phobos, I'll have to change that code so that it uses dxml
instead of the hacks.

- Jonathan M Davis

Re: dxml 0.1.0 released

2018-02-10 Thread Jonathan M Davis via Digitalmars-d-announce

On Saturday, February 10, 2018 12:04:48 Seb via Digitalmars-d-announce 
wrote:
> On Friday, 9 February 2018 at 21:15:33 UTC, Jonathan M Davis
> wrote:
> > I have multiple projects that need an XML parser, and
> > std_experimental_xml is clearly going nowhere, with the guy who
> > wrote it having disappeared into the ether, so I decided to
> > break down and write one. I've kind of wanted to for years, but
> > I didn't want to spend the time on it. However, sometime last
> > year I finally decided that I had to, and it's been what I've
> > been working on in my free time for a while now. And it's
> > finally reached the point when it makes sense to release it -
> > hence this post.
> >
> > [...]
>
> FWIW we recently forked the experimental.xml repo to
> dlang-community:
>
> https://github.com/dlang-community/experimental.xml
>
> So PRs etc can be merged easily.
> But yeah it's not moving anywhere atm :/

Yeah, I got some e-mails about that the other day, since I had some open
issues and PRs on it, and IIRC github was telling me that you'd migrated
some of that over, but unless someone decides that they want to take up the
torch on it, it seems pretty dead. I assume that the guy who did it simply
got too busy with school once GSoC ended and then never got back to it even
when he did have time. If he were serious about finishing it and being an
active part of the D community, he would have at least looked at some of the
PRs on the project, but he's been completely silent for quite a while now.
So, I guess he moved on. I was able to use it on one of my projects by
making some local changes and by working around some bugs, but it clearly
needs work that it's not getting.

I had some rather specific ideas about what I wanted to do with an XML
parser though and didn't want to spend the time trying to decipher what he'd
done and morph it into something more like what I wanted, so I just started
from scratch.

- Jonathan M Davis

Re: dxml 0.1.0 released

2018-02-10 Thread Jonathan M Davis via Digitalmars-d-announce

On Saturday, February 10, 2018 16:14:41 Jacob Carlborg via Digitalmars-d-
announce wrote:
> On 2018-02-09 22:15, Jonathan M Davis wrote:
> > Currently, dxml contains only a range-based StAX / pull parser and
> > related helper functions, but the plan is to add a DOM parser as well
> > as two writers - one which is the writer equivalent of a StAX parser,
> > and one which is DOM-based. However, in theory, the StAX parser is
> > complete and quite useable as-is - though I expect that I'll be adding
> > more helper functions to make it easier to use, and if you find that
> > you're doing a particular operation with it frequently and that that
> > operation is overly verbose, please point it out so that maybe a helper
> > function can be added to improve that use case - e.g.
> This is great news! Have you run any benchmarks to see how it performs?

Kind of. I did some benchmarking to see if some code changes would improve
performance, but I haven't tried benchmarking it against any other XML
libraries. That would take a fair bit of time and effort, and IMHO, that
would be better spent finishing the library first. Also, ldc's latest
release is only up to dmd 2.077.1, and dxml needs an improvement that got
added to byCodeUnit in 2.078.0, so any benchmarking that wants to do
something like compare dxml with a C/C++ parsing library while taking the
optimizer out of the equation isn't going to work yet unless I fork
byCodeUnit for dxml until we get another release of ldc.

One result of the benchmarking that I did do allowed me to simplify the code
quite a bit though. I'd originally had it be configurable whether the parser
kept track of the line number and column of the document, just the line
number, or neither on the theory that I really wanted access to the position
in the document in error messages but that it would affect performance, so
it should be configurable. However, benchmarking showed that it had
negligible impact on performance to the point that different PositionTypes
won out depending on the file and the particular run of the program,
indicating that that extra complexity was buying me nothing. There were a
fair number of static ifs to deal with that configuration option, so as soon
as I was able to measure that they didn't matter particularly, I removed
that option from the Config and all of its associated static ifs in the
parser and was able to reduce the complexity of the code a fair bit. Testing
that bit was actually the main reason that I did any benchmarking before
releasing anything, since I wanted to avoid changing the API later if I
could.

I am going to need to spend more time benchmarking code changes at some
point here though to see if I can make the parser faster, and eventually, I
will probably benchmark it against other parsing libraries. I fully expect
that it will compare favorably given that it does almost no heap allocations
and slices everything, but there's every possibility that I did something
algorithmically internally that hurts performance more than it should - e.g.
while it tries to parse everything only once, there are a few places where
it ends up taking a second pass over a piece of text, and refactoring that
is on my todo list (though most of the other potential improvements I did
benchmark were a wash, so I may find that it doesn't matter much).

I'll probably be in more of a hurry to benchmark dxml against other parsing
libraries if my dconf talk proposal on it gets accepted, since that's the
sort of thing that should probably be in such a talk.

I haven't even taken the time yet to figure out which libraries it should be
benchmarked against.

- Jonathan M Davis

Re: dxml 0.1.0 released

2018-02-10 Thread Jacob Carlborg via Digitalmars-d-announce


On 2018-02-09 22:15, Jonathan M Davis wrote:


Currently, dxml contains only a range-based StAX / pull parser and related
helper functions, but the plan is to add a DOM parser as well as two writers
- one which is the writer equivalent of a StAX parser, and one which is
DOM-based. However, in theory, the StAX parser is complete and quite useable
as-is - though I expect that I'll be adding more helper functions to make it
easier to use, and if you find that you're doing a particular operation with
it frequently and that that operation is overly verbose, please point it out
so that maybe a helper function can be added to improve that use case - e.g.


This is great news! Have you run any benchmarks to see how it performs?

--
/Jacob Carlborg

Re: dxml 0.1.0 released

2018-02-10 Thread Seb via Digitalmars-d-announce

On Friday, 9 February 2018 at 21:15:33 UTC, Jonathan M Davis 
wrote:
I have multiple projects that need an XML parser, and 
std_experimental_xml is clearly going nowhere, with the guy who 
wrote it having disappeared into the ether, so I decided to 
break down and write one. I've kind of wanted to for years, but 
I didn't want to spend the time on it. However, sometime last 
year I finally decided that I had to, and it's been what I've 
been working on in my free time for a while now. And it's 
finally reached the point when it makes sense to release it - 
hence this post.


[...]


FWIW we recently forked the experimental.xml repo to 
dlang-community:


https://github.com/dlang-community/experimental.xml

So PRs etc can be merged easily.
But yeah it's not moving anywhere atm :/

Re: dxml 0.1.0 released

2018-02-10 Thread Stefan via Digitalmars-d-announce


great work, Jonathan. Thank you.
We were missing xml for a long time and did so many hacks just to 
get xml somehow parsed.

Re: dxml 0.1.0 released

2018-02-09 Thread Jonathan M Davis via Digitalmars-d-announce

On Friday, February 09, 2018 13:47:52 H. S. Teoh via Digitalmars-d-announce 
wrote:
> As for DTDs, perhaps it might be enough to make normalize() configurable
> with some way to specify additional entities that may be defined in the
> DTD?  Once that's possible, I'd say it's Good Enough(tm), since the user
> will have the tools to build DTD support from what they're given.  Of
> course, "standard" DTD support can be added later, built on the current
> StAX parser.

As I understand it (though IMHO, the spec isn't clear enough, and I'd have
to go over it with a fine-tooth comb to make sure that I got it right), as
soon as you start dealing with entity references, you can pretty much just
drop whole sections of XML into your document, fundamentally changing the
document. So, I don't think that it's possible to deal with the entity
references after the fact. They're basically macros that have to be expanded
while you're parsing, which is part of why they're so disgusting IMHO - even
without getting into any of the document validation stuff.
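
A small illustration of the macro-like behaviour, using Python's xml.etree.ElementTree as a stand-in rather than dxml (the entity name `frag` and the document are invented for the example). The entity's replacement text contains markup, so expanding it adds an element to the tree that does not appear in the document body:

```python
import xml.etree.ElementTree as ET

# The body of <r> contains only an entity reference, but the entity's
# replacement text (declared in the in-line DTD) holds a whole element.
doc = """<!DOCTYPE r [
  <!ENTITY frag "<item>from entity</item>">
]>
<r>&frag;</r>"""

root = ET.fromstring(doc)
child_tags = [child.tag for child in root]
child_text = root[0].text
print(child_tags, child_text)
```

The structure of the parsed document depends on the DTD, which is why expanding such references after parsing, on already-sliced text, would not be equivalent.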

Though honestly, the part about the DTD section that I find truly offensive
is that the document itself is defining what constitutes valid input. Since
when does it make any sense for the _input_ for a program to tell the
program what constitutes valid input? That's for the program to decide. And
how much more complicated the parser has to be to properly deal
with the DTD makes its inclusion in the spec seem absolutely insane to me.

And none of that mess is necessary for simple, sane XML documents that are
just providing data.

I _might_ add a DTD parser later, but if I do, it will almost certainly be
its own separate parser. However, given how much of my life I would then be
wasting on something that I consider to be of essentially zero value (if not
negative value), I don't see myself doing it without someone paying me to.
IMHO, the only reason that it makes any sense to fully support the DTD
section is for those poor folks who have to deal with XML documents where
someone else decided to use those features, and they don't have any choice.
I would hope that few programmers would actually _want_ to be using those
features.

> I would support it if you proposed dxml to be added to Phobos.

I've thought about it, but I'd like to complete the writers and the DOM
parser first as well as see it get at least somewhat battle-tested. Right
now, it's just been used in a couple of my personal projects, which did
affect some of my design choices (for the better, I think), but since no one
else has done anything with it, there may be something that it needs that
I've completely missed. The API is simple enough that I _think_ that it's
good as-is and that improvements are largely a question of adding helper
functions, but the library does need more widespread use and feedback.

- Jonathan M Davis

Re: dxml 0.1.0 released

2018-02-09 Thread H. S. Teoh via Digitalmars-d-announce

On Fri, Feb 09, 2018 at 02:15:33PM -0700, Jonathan M Davis via 
Digitalmars-d-announce wrote:
> I have multiple projects that need an XML parser, and
> std_experimental_xml is clearly going nowhere, with the guy who wrote
> it having disappeared into the ether, so I decided to break down and
> write one. I've kind of wanted to for years, but I didn't want to
> spend the time on it. However, sometime last year I finally decided
> that I had to, and it's been what I've been working on in my free time
> for a while now. And it's finally reached the point when it makes
> sense to release it - hence this post.

Hooray!  Finally, a glimmer of hope for XML parsing in D!


> Currently, dxml contains only a range-based StAX / pull parser and related
> helper functions, but the plan is to add a DOM parser as well as two writers
> - one which is the writer equivalent of a StAX parser, and one which is
> DOM-based. However, in theory, the StAX parser is complete and quite useable
> as-is - though I expect that I'll be adding more helper functions to make it
> easier to use, and if you find that you're doing a particular operation with
> it frequently and that that operation is overly verbose, please point it out
> so that maybe a helper function can be added to improve that use case - e.g.
> I'm thinking of adding a function similar to std.getopt.getopt for handling
> attributes, because I personally find that dealing with those is more
> verbose than I'd like. Obviously, some stuff is just going to do better with
> a DOM parser, but thus far, I've found that a StAX parser has suited my
> needs quite well. I have no plans to add a SAX parser, since as far as I can
> tell, SAX parsers are just plain worse than StAX parsers, and the StAX
> approach is quite well-suited to ranges.
> 
> Of note, dxml does not support the DTD section beyond what is required to
> parse past it, since supporting it would make it impossible for the parser
> to return slices of the original input beyond the case where strings are
> used (and it would be forced to allocate strings in some cases, whereas dxml
> does _very_ minimal heap allocation right now), and parsing the DTD section
> significantly increases the complexity of the parser in order to support
> something that I honestly don't think should ever have been part of the XML
> standard and is unnecessary for many, many XML documents. So, if you're
> dealing with XML documents that contain entity references that are declared
> in the DTD section and then used outside of the DTD section, then dxml will
> not support them, but it will work just fine if a DTD section is there so
> long as it doesn't declare any entity references that are then referenced in
> the document proper.
> 
> Hopefully, the documentation is clear enough, but obviously, I'm not
> the best judge of that. So, have at it.
> 
> Documentation: http://jmdavisprog.com/docs/dxml/0.1.0/
> Github: https://github.com/jmdavis/dxml
> Dub: http://code.dlang.org/packages/dxml
[...]

Wonderful!  The docs are beautiful, I must say.  Good job on that.
Though a simple example of basic usage in the module header would be
very nice.

Glanced over the docs.  It's a pretty nice and clean API, and IMO,
worthy of consideration to be included into Phobos.  IMO, the lack of
SAX / DOM parsing is not a big deal, since it's not hard to build one
given StAX primitives.
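
To make that claim concrete, here is a minimal sketch of building a DOM-like tree from a stream of pull events. It is written in Python and does not use dxml; the event tuples ("start", "text", "end") and the `build_tree` helper are hypothetical stand-ins for whatever a pull parser yields.

```python
def build_tree(events):
    """Build a nested dict tree from (kind, value) pull events."""
    root = None
    stack = []  # open elements, innermost last
    for kind, value in events:
        if kind == "start":
            node = {"name": value, "text": "", "children": []}
            if stack:
                stack[-1]["children"].append(node)
            else:
                root = node
            stack.append(node)
        elif kind == "text":
            stack[-1]["text"] += value
        elif kind == "end":
            stack.pop()
    return root

# Hand-written event stream for: <root><a>alpha</a><b/></root>
sample_events = [
    ("start", "root"),
    ("start", "a"), ("text", "alpha"), ("end", "a"),
    ("start", "b"), ("end", "b"),
    ("end", "root"),
]
tree = build_tree(sample_events)
print(tree["name"], [c["name"] for c in tree["children"]])
```

A SAX-style layer could be built the same way, by calling a handler for each event instead of appending to a tree.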

Being range-based is very nice, but I'd say your choice to slice the
input, defer expensive/allocating operations to normalize() is a big
winning point.  This approach is fundamental to high performance, in the
principle of not doing any operation that isn't strictly necessary until
it's actually asked for.  If nothing else, this is a good design pattern
that I plan to st^Wcopy in my own code. :-P

As for DTDs, perhaps it might be enough to make normalize() configurable
with some way to specify additional entities that may be defined in the
DTD?  Once that's possible, I'd say it's Good Enough(tm), since the user
will have the tools to build DTD support from what they're given.  Of
course, "standard" DTD support can be added later, built on the current
StAX parser.
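
A sketch of what such a configurable step might look like, written in Python for illustration. This `normalize` is a hypothetical function, not dxml's `normalize()`, and the `extra_entities` parameter is an assumption. It handles only text-valued entities; entities whose replacement text contains markup cannot be handled by a post-processing step like this.

```python
import re

# The five entities predefined by the XML specification.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# Hex character reference, decimal character reference, or a named entity.
_REF = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_][\w.\-]*);")

def normalize(text, extra_entities=None):
    """Replace entity and character references in a single pass."""
    table = {**PREDEFINED, **(extra_entities or {})}

    def replace(match):
        ref = match.group(1)
        if ref.startswith("#x"):
            return chr(int(ref[2:], 16))
        if ref.startswith("#"):
            return chr(int(ref[1:]))
        # Unknown names are left untouched rather than guessed at.
        return table.get(ref, match.group(0))

    return _REF.sub(replace, text)

result = normalize("a &lt; b &amp;&amp; &copy; &#65;&#x42;", {"copy": "\u00a9"})
print(result)
```

Running the replacement in one pass avoids double-decoding, so `&amp;lt;` becomes `&lt;` and not `<`.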

I would support it if you proposed dxml to be added to Phobos.


T

-- 
There are 10 kinds of people in the world: those who can count in binary, and 
those who can't.
