2003-03-18T13:54:12 Michael Lazzaro:
> A perl5-native parser can be rigged up fairly easily, but it's
> *numbingly* slow compared to the C version.  I mean, 20-50 times
> slower, by my guess.

That's the nature of the beast. XML needs a lexer that knows about
more than just two or so character classes, so a trivial split isn't
enough to lex it; and it needs a structured parsing algorithm
(recursive descent, or one of the table-driven parsers; I imagine
LALR(1) would be about right).
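To make the character-class point concrete, here is a minimal C sketch
that assumes an ASCII-only simplification of XML names (real XML also
allows many non-ASCII name characters). The names cc_init, cc_of and
scan_name are hypothetical, not taken from any existing library. A
256-entry lookup table lets the lexer sort each byte into one of ten
classes with a single array index, where a split only ever asks
"separator or not?".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical character classes for a simplified, ASCII-only XML lexer.
   Real XML names also admit many non-ASCII characters; ignored here. */
enum cc {
    CC_OTHER,      /* ordinary character data */
    CC_SPACE,      /* space, tab, CR, LF */
    CC_NAMESTART,  /* letters, '_' and ':' */
    CC_NAMECHAR,   /* digits, '-' and '.': allowed inside a name, not first */
    CC_LT,         /* '<' opens a tag */
    CC_GT,         /* '>' closes a tag */
    CC_SLASH,      /* '/' in end tags and empty-element tags */
    CC_EQ,         /* '=' between attribute name and value */
    CC_QUOTE,      /* '"' or '\'' delimits attribute values */
    CC_AMP         /* '&' starts an entity reference */
};

static unsigned char cc_table[256];

/* Fill the table once; afterwards classifying a byte is one array lookup. */
void cc_init(void)
{
    for (int c = 0; c < 256; c++) {
        unsigned char k = CC_OTHER;
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') k = CC_SPACE;
        else if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                 c == '_' || c == ':') k = CC_NAMESTART;
        else if ((c >= '0' && c <= '9') || c == '-' || c == '.') k = CC_NAMECHAR;
        else if (c == '<') k = CC_LT;
        else if (c == '>') k = CC_GT;
        else if (c == '/') k = CC_SLASH;
        else if (c == '=') k = CC_EQ;
        else if (c == '"' || c == '\'') k = CC_QUOTE;
        else if (c == '&') k = CC_AMP;
        cc_table[c] = k;
    }
}

static inline enum cc cc_of(unsigned char c) { return (enum cc)cc_table[c]; }

/* Length of the name starting at s, or 0 if s does not start with one. */
size_t scan_name(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    if (cc_of((unsigned char)s[0]) != CC_NAMESTART) return 0;
    while (s[n] != '\0') {
        enum cc k = cc_of((unsigned char)s[n]);
        if (k != CC_NAMESTART && k != CC_NAMECHAR) break;
        n++;
    }
    return n;
}
```

After cc_init(), scan_name("foo-bar baz") stops at the space and
returns 7, while scan_name("9abc") returns 0 because a digit cannot
start a name.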

Neither runs efficiently when written in a high-level scripting
language. A tight, open-coded finite-state-machine lexer feeding a
well-designed, hand-coded recursive-descent parser should execute
roughly half a dozen to a dozen machine instructions per input byte.
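A rough sketch of that shape in C, assuming a toy XML subset
(elements, quoted attributes and plain text only; no comments, CDATA,
DOCTYPE, processing instructions or entity checking). Each grammar
rule becomes one function, and the only parser state is a pointer into
the input plus an element count. xml_count_elements and its helpers
are hypothetical names for illustration, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy recursive-descent checker (hypothetical sketch). */
struct parser { const char *p; int count; };

static int is_name_start(char c)
{ return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' || c == ':'; }
static int is_name_char(char c)
{ return is_name_start(c) || (c >= '0' && c <= '9') || c == '-' || c == '.'; }
static void skip_ws(struct parser *ps)
{ while (*ps->p == ' ' || *ps->p == '\t' || *ps->p == '\r' || *ps->p == '\n') ps->p++; }

/* Consume a name; return its length (0 means no name was present). */
static size_t parse_name(struct parser *ps)
{
    const char *start = ps->p;
    if (!is_name_start(*ps->p)) return 0;
    while (is_name_char(*ps->p)) ps->p++;
    return (size_t)(ps->p - start);
}

static int parse_element(struct parser *ps);

/* content := ( text | element )* ; stops at "</" or end of input. */
static int parse_content(struct parser *ps)
{
    for (;;) {
        if (*ps->p == '\0') return 0;          /* caller rejects early EOF */
        if (ps->p[0] == '<' && ps->p[1] == '/') return 0;
        if (*ps->p == '<') { if (parse_element(ps) != 0) return -1; }
        else ps->p++;                          /* character data */
    }
}

/* element := '<' name attr* ( "/>" | '>' content "</" name ws? '>' ) */
static int parse_element(struct parser *ps)
{
    if (*ps->p != '<') return -1;
    ps->p++;
    const char *name = ps->p;
    size_t len = parse_name(ps);
    if (len == 0) return -1;
    for (;;) {                                 /* attr := name '=' quoted */
        skip_ws(ps);
        if (parse_name(ps) == 0) break;
        skip_ws(ps);
        if (*ps->p++ != '=') return -1;
        skip_ws(ps);
        char q = *ps->p;
        if (q != '"' && q != '\'') return -1;
        const char *close = strchr(ps->p + 1, q);
        if (close == NULL) return -1;          /* unterminated value */
        ps->p = close + 1;
    }
    ps->count++;
    if (ps->p[0] == '/' && ps->p[1] == '>') { ps->p += 2; return 0; }
    if (*ps->p++ != '>') return -1;
    if (parse_content(ps) != 0) return -1;
    if (ps->p[0] != '<' || ps->p[1] != '/') return -1;  /* no end tag */
    ps->p += 2;
    /* End-tag name must match the start-tag name exactly. */
    if (strncmp(ps->p, name, len) != 0 || is_name_char(ps->p[len])) return -1;
    ps->p += len;
    skip_ws(ps);
    return *ps->p++ == '>' ? 0 : -1;
}

/* Number of elements in a single-root document, or -1 if malformed. */
int xml_count_elements(const char *doc)
{
    assert(doc != NULL);
    struct parser ps = { doc, 0 };
    skip_ws(&ps);
    if (parse_element(&ps) != 0) return -1;
    skip_ws(&ps);
    return *ps.p == '\0' ? ps.count : -1;
}
```

For example, xml_count_elements("<a><b x=\"1\"/>text<c>hi</c></a>")
returns 3, while mismatched nesting such as "<a><b></a></b>" or a
missing end tag as in "<a>" returns -1. Every byte is touched a small,
fixed number of times, which is where the per-byte instruction budget
comes from.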

Heck, even CSV, which is vastly more trivial to parse, deserves enough
care that it runs breathtakingly faster with Text::CSV_XS than with
Text::CSV.
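Even CSV defeats a trivial split: a quoted field may contain the
separator, and a doubled quote inside quotes stands for one literal
quote (the convention RFC 4180 describes). A minimal C sketch under
those rules, assuming a single record with no embedded newlines;
csv_split is a hypothetical helper, not how Text::CSV_XS is actually
written. It unescapes in place, which is safe because a field can only
shrink.

```c
#include <assert.h>
#include <stddef.h>

/* Split one CSV record into fields, in place (hypothetical helper).
   Quoted fields may contain commas; "" inside quotes means one '"'.
   Each field is NUL-terminated inside buf and its start stored in fields[].
   Returns the field count, or -1 on malformed input or too many fields. */
int csv_split(char *buf, char **fields, int max_fields)
{
    assert(buf != NULL && fields != NULL && max_fields > 0);
    int n = 0;
    char *r = buf;   /* read position */
    char *w = buf;   /* write position; never ahead of r */
    for (;;) {
        if (n == max_fields) return -1;
        fields[n++] = w;
        if (*r == '"') {                       /* quoted field */
            r++;
            for (;;) {
                if (*r == '\0') return -1;     /* unterminated quote */
                if (*r == '"') {
                    if (r[1] == '"') { *w++ = '"'; r += 2; }  /* escaped quote */
                    else { r++; break; }                     /* closing quote */
                } else {
                    *w++ = *r++;
                }
            }
            if (*r != ',' && *r != '\0') return -1;  /* junk after quote */
        } else {                               /* unquoted: copy up to ',' */
            while (*r != ',' && *r != '\0') *w++ = *r++;
        }
        if (*r == '\0') { *w = '\0'; return n; }
        *w++ = '\0';                           /* end this field */
        r++;                                   /* skip the comma */
    }
}
```

Splitting the record a,"b,c","say ""hi""",d this way yields four
fields: a, b,c, say "hi", and d, where a split on commas would have
produced six pieces with the quotes still attached.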

> The speed issue when importing XML-like data (which we do *very
> frequently*) is a constant sticking point for us and our clients.

Then we need a good tight lexer/parser written in C, as a library.
If the existing libraries are too fragile or inflexible, this may
mean we need to design and write a new one.

> It is therefore critically important that P6 allows easy, fast
> parsing for XML-like things, not necessarily just XML proper,
> because that's the way the business winds have been blowing.  And
> it needs to support it out-of-the-box.

Then this new library, with its glue module, will have to ship with
perl, is all. That's no biggie.

-Bennett
