2003-03-18T13:54:12 Michael Lazzaro:
> A perl5-native parser can be rigged up fairly easily, but it's
> *numbingly* slow compared to the C version. I mean, 20-50 times
> slower, by my guess.

That's the nature of the beast. XML requires a lexer that knows about more than just two or so character classes, so a trivial split isn't enough to lex it. It also requires a structured-language parsing algorithm (recursive descent, or one of the table-driven parsers; I imagine LALR(1) would be about right). These do not implement efficiently in high-level scripting languages. A tight open-coded finite-state-machine lexer with a well-designed hand-coded recursive-descent parser should execute on the rough order of a half-dozen or a dozen machine instructions per input byte. Heck, even the vastly more trivial CSV parsing deserves enough care that it runs breathtakingly faster with Text::CSV_XS than with Text::CSV.

> The speed issue when importing XML-like data (which we do *very
> frequently*) is a constant sticking point for us and our clients.

Then we need a good tight lexer/parser written in C, as a library. If the existing libraries are too fragile or inflexible, this may mean we need to design and write a new one.

> It is therefore critically important that P6 allows easy, fast
> parsing for XML-like things, not necessarily just XML proper,
> because that's the way the business winds have been blowing. And
> it needs to support it out-of-the-box.

Then this new library with glue module will have to be shipped with perl, is all. That's no biggie.

-Bennett
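
A minimal sketch in C of the hand-coded approach described above: a byte-at-a-time scanner feeding a recursive-descent parser. It is an illustration under loud assumptions, not any existing library's API. All names (`Parser`, `scan_name`, `parse_element`, `parse_content`, `count_elements`) are hypothetical, and the grammar is a tiny XML-like subset: `<name>`, `</name>`, `<name/>` and plain text only, with no attributes, comments, entities, CDATA, or whitespace inside tags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser state: a cursor into a NUL-terminated buffer. */
typedef struct {
    const char *p;   /* current input position */
    int elements;    /* number of start tags seen so far */
} Parser;

/* Character class test for tag names (ASCII subset only, an assumption). */
static int is_name_char(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_' || c == '-' || c == ':';
}

/* Scanner step: consume a name, report where it started, return its length. */
static size_t scan_name(Parser *ps, const char **start) {
    *start = ps->p;
    while (is_name_char((unsigned char)*ps->p)) ps->p++;
    return (size_t)(ps->p - *start);
}

static int parse_element(Parser *ps);

/* content := (text | element)*   Stops at "</" or at end of input. */
static int parse_content(Parser *ps) {
    for (;;) {
        while (*ps->p != '\0' && *ps->p != '<') ps->p++;   /* skip text */
        if (*ps->p == '\0' || ps->p[1] == '/') return 1;
        if (!parse_element(ps)) return 0;
    }
}

/* element := '<' NAME '/>'  |  '<' NAME '>' content '</' NAME '>' */
static int parse_element(Parser *ps) {
    const char *open, *close;
    size_t open_len, close_len;

    if (*ps->p != '<') return 0;
    ps->p++;
    open_len = scan_name(ps, &open);
    if (open_len == 0) return 0;
    ps->elements++;

    if (ps->p[0] == '/' && ps->p[1] == '>') {   /* empty element */
        ps->p += 2;
        return 1;
    }
    if (*ps->p != '>') return 0;
    ps->p++;

    if (!parse_content(ps)) return 0;

    if (ps->p[0] != '<' || ps->p[1] != '/') return 0;   /* need an end tag */
    ps->p += 2;
    close_len = scan_name(ps, &close);
    if (close_len != open_len || memcmp(open, close, open_len) != 0) return 0;
    if (*ps->p != '>') return 0;
    ps->p++;
    return 1;
}

/* Returns the number of elements if the input is exactly one well-formed
 * root element in this subset, or -1 otherwise. */
int count_elements(const char *input) {
    Parser ps;
    assert(input != NULL);
    ps.p = input;
    ps.elements = 0;
    if (!parse_element(&ps)) return -1;
    if (*ps.p != '\0') return -1;   /* trailing bytes after the root */
    return ps.elements;
}
```

Calling `count_elements` on a string holding one root element yields the element count, and mismatched or unclosed tags yield -1. Each input byte is examined a small, fixed number of times, which is the property that makes this style cheap in C; the sketch makes no claim about actual instruction counts.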