Re: XML is Too Hard for Programmers = Tim Bray
Rich Morin wrote:
> I have commented before on the fact that Perl doesn't have Power Tools (read, idioms) that are well suited for handling XML. Turns out that Tim Bray agrees.
>
> http://www.tbray.org/ongoing/When/200x/2003/03/16/XML-Prog

You may want to look at the perl-xml thread called "Tim Bray says XML is too hard for programmers" on this topic, as well as at the xml-dev thread on the same topic, especially http://lists.xml.org/archives/xml-dev/200303/msg00536.html.

Perl can have idioms for XML; they just need to be developed. However, I don't believe at all that this needs to happen at the p6l level. We can already make very cool stuff using p5, and the grammar stuff in p6 ought to make the sort of while loop Tim Bray describes quite doable as well.

--
Robin Berjon
Research Engineer, Expway http://expway.fr/
7FC0 6F5F D864 EFB8 08CE 8E74 58E6 D5DB 4889 2488
Re: XML is Too Hard for Programmers = Tim Bray
Rich Morin writes:
> I have commented before on the fact that Perl doesn't have Power Tools (read, idioms) that are well suited for handling XML. Turns out that Tim Bray agrees.

Tim Bray also says he gives up and uses regexes as a quick and dirty workaround. So maybe these power tools you keep touting aren't necessary after all.

--
A witty saying means nothing. -Voltaire
Re: XML is Too Hard for Programmers = Tim Bray
Simon Cozens wrote:
> Rich Morin writes:
>> I have commented before on the fact that Perl doesn't have Power Tools (read, idioms) that are well suited for handling XML. Turns out that Tim Bray agrees.
>
> Tim Bray also says he gives up and uses regexes as a quick and dirty workaround. So maybe these power tools you keep touting aren't necessary after all.

To be fair, he only does that on data he has full control over and that is pre-munged to match his regexen. Otherwise, you really can't do that. But that doesn't change much about the (potential absence of an) issue; see my other post.

--
Robin Berjon
Research Engineer, Expway http://expway.fr/
7FC0 6F5F D864 EFB8 08CE 8E74 58E6 D5DB 4889 2488
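A rough illustration of the point about controlled data, written in Python only for brevity. The `<item .../>` format, the `ITEM` pattern, and the `items` helper are all hypothetical, invented for this sketch; nothing here is taken from Tim Bray's article. The regex works as long as every element is written exactly the way the pattern expects, and silently returns nothing for an equivalent element written differently.

```python
import re

# Hypothetical "pre-munged" format: attributes always in the same order,
# always double-quoted, no whitespace before the closing "/>".
ITEM = re.compile(r'<item id="(\d+)" name="([^"]*)"/>')


def items(text):
    """Return (id, name) pairs for every element matching the fixed shape."""
    return [(int(i), n) for i, n in ITEM.findall(text)]


controlled = '<item id="1" name="foo"/>\n<item id="2" name="bar"/>\n'

# Same information and still well-formed XML, but the attribute order,
# quoting style, and spacing differ, so the fixed pattern no longer matches.
uncontrolled = "<item name='foo' id='1' />"

print(items(controlled))    # [(1, 'foo'), (2, 'bar')]
print(items(uncontrolled))  # []
```

The second call does not fail loudly; it just finds nothing, which is the practical risk of using this shortcut on data you do not produce yourself.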
Re: XML is Too Hard for Programmers = Tim Bray
--- Simon Cozens wrote:
> Rich Morin writes:
>> I have commented before on the fact that Perl doesn't have Power Tools (read, idioms) that are well suited for handling XML. Turns out that Tim Bray agrees.
>
> Tim Bray also says he gives up and uses regexes as a quick and dirty workaround. So maybe these power tools you keep touting aren't necessary after all.

FWIW, I've had to try to rewrite Microsoft's VCPROJ and SLN format files(*), which look a whole lot like XML. Sadly, if you change the order of independent entities in the file, Microsoft's internal parser rejects the file. This despite the fact that MS already has an XML parser DLL available for public consumption (more than one version, in fact).

To me, this says that there's no real commitment to doing XML. What there is seems to be a recognition that the XML format is regular and comprehensible to others, so writing XML-like files becomes popular.

=Austin

(*) VCPROJ and SLN files are control files for the VS.net IDE product.
Re: XML is Too Hard for Programmers = Tim Bray
Austin Hastings wrote:
> FWIW, I've had to try to rewrite Microsoft's VCPROJ and SLN format files(*), which look a whole lot like XML. Sadly, if you change the order of independent entities in the file, Microsoft's internal parser rejects the file. This despite the fact that MS already has an XML parser DLL available for public consumption (more than one version, in fact).
>
> To me, this says that there's no real commitment to doing XML. What there is seems to be a recognition that the XML format is regular and comprehensible to others, so writing XML-like files becomes popular.

Just because MS has one broken tool (surprise!) doesn't mean there's no 'commitment to doing XML'. There is much commitment, including from MS, and people very rarely use XML-like formats.

We are going OT *very* fast.

--
Robin Berjon
Research Engineer, Expway http://expway.fr/
7FC0 6F5F D864 EFB8 08CE 8E74 58E6 D5DB 4889 2488
Re: XML is Too Hard for Programmers = Tim Bray
On Tuesday, March 18, 2003, at 09:55 AM, Austin Hastings wrote:
> To me, this says that there's no real commitment to doing XML. What there is seems to be a recognition that XML format is regular and comprehensible to others, so writing XML-like files becomes popular.

Yep. Which makes things even worse. And this is pretty important stuff.

We do a *lot* of XML parsing here (Cognitivity, that is) and even more XML-like parsing. And even with Perl, it's a royal pain. There are P5 XML modules out there which tie into C-based XML libraries... those are quite fast, but fail badly if the XML isn't 100% well-formed, and are largely not extensible for XML-like situations. You'd have to rip one up and rewrite it, in C, for every iteration of "-like", which we cannot credibly do.

A perl5-native parser can be rigged up fairly easily, but it's *numbingly* slow compared to the C version. I mean, 20-50 times slower, by my guess.

The speed issue when importing XML-like data (which we do *very frequently*) is a constant sticking point for us and our clients. Damian's Parse::RecDescent has been a godsend, implementation-wise -- but it of course suffers the same nasty speed issues.

This is a big, big issue, and one that P6 needs to address well, because this is how many businesses will judge it. What I'm hoping, obviously, is that the new P6 regexes -- which will be *perfect* for writing and maintaining our umpteen quite-similar parsing rulesets -- will be fast enough to at least be in the same order of magnitude as a middling C solution. They don't have to be as fast as C, obviously, but they can't be 20x worse.

Why does this matter so much? Because it's a barn door. Even though it's so much easier to write XML-like parsers in Perl than, well, anything else, the speed issue will at some point dictate moving to a non-Perl parsing solution. At which point, the issue becomes how much of the rest of the related system to move into that other solution as well, since it is much cheaper to maintain expertise in one toolset than two. So within a company, it can lead to greater use of Perl -- or abandonment of Perl -- depending on success in this one key area. (I have seen this in action at a number of companies.)

It is therefore critically important that P6 allows easy, fast parsing for XML-like things, not necessarily just XML proper, because that's the way the business winds have been blowing. And it needs to support it out-of-the-box. Seriously, it's that important.

MikeL
Re: XML is Too Hard for Programmers = Tim Bray
2003-03-18T13:54:12 Michael Lazzaro:
> A perl5-native parser can be rigged up fairly easily, but it's *numbingly* slow compared to the C version. I mean, 20-50 times slower, by my guess.

That's the nature of the beast. XML requires a lexer that knows about more than just two or so character classes, so a trivial split isn't enough to lex it, and it requires a structured-language parsing algorithm (recursive descent, or one of the table-driven parsers; I imagine LALR(1) would be about right). These do not implement efficiently in high-level scripting languages. A tight open-coded finite-state-machine lexer with a well-designed hand-coded recursive-descent parser should execute on the rough order of a half-dozen or a dozen machine instructions per input byte. Heck, even the vastly more trivial CSV parsing deserves enough care that it runs breathtakingly faster with Text::CSV_XS than with Text::CSV.

> The speed issue when importing XML-like data (which we do *very frequently*) is a constant sticking point for us and our clients.

Then we need a good tight lexer/parser written in C, as a library. If the existing libraries are too fragile or inflexible, this may mean we need to design and write a new one.

> It is therefore critically important that P6 allows easy, fast parsing for XML-like things, not necessarily just XML proper, because that's the way the business winds have been blowing. And it needs to support it out-of-the-box.

Then this new library, with its glue module, will have to be shipped with perl, is all. That's no biggie.

-Bennett
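For readers who have not seen the combination described above, here is a minimal sketch of a finite-state-machine lexer feeding a hand-coded recursive-descent parser. It is written in Python for readability, so it shows the structure only and says nothing about the per-byte cost argued for a C implementation. It handles a deliberately tiny XML-like subset (bare element names and text; no attributes, entities, comments, or self-closing tags), and the `lex` and `parse` names are made up for this example.

```python
def lex(src):
    """Two-state machine: TEXT until '<' is seen, TAG until '>' is seen."""
    tokens, state, buf = [], "TEXT", []
    for ch in src:
        if state == "TEXT":
            if ch == "<":
                if buf:
                    tokens.append(("text", "".join(buf)))
                buf, state = [], "TAG"
            else:
                buf.append(ch)
        else:  # state == "TAG"
            if ch == ">":
                name = "".join(buf)
                if name.startswith("/"):
                    tokens.append(("close", name[1:]))
                else:
                    tokens.append(("open", name))
                buf, state = [], "TEXT"
            else:
                buf.append(ch)
    if state == "TAG":
        raise ValueError("unterminated tag")
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens


def parse(tokens):
    """Recursive descent over the grammar: element := open (text | element)* close."""
    if not tokens:
        raise ValueError("empty input")
    pos = 0

    def element():
        nonlocal pos
        kind, name = tokens[pos]
        if kind != "open":
            raise ValueError(f"expected an open tag, got {kind}")
        pos += 1
        children = []
        # Consume children until the matching close tag (or end of input).
        while pos < len(tokens) and tokens[pos][0] != "close":
            if tokens[pos][0] == "text":
                children.append(tokens[pos][1])
                pos += 1
            else:
                children.append(element())
        if pos == len(tokens) or tokens[pos][1] != name:
            raise ValueError(f"missing </{name}>")
        pos += 1
        return (name, children)

    tree = element()
    if pos != len(tokens):
        raise ValueError("trailing content after the root element")
    return tree


print(parse(lex("<a>hi<b>x</b></a>")))  # ('a', ['hi', ('b', ['x'])])
```

The lexer looks at each character exactly once and keeps only a state flag and a buffer, which is the property that makes the same design cheap when open-coded in C.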
Re: XML is Too Hard for Programmers = Tim Bray
--- Michael Lazzaro wrote:
> [...]
> It is therefore critically important that P6 allows easy, fast parsing for XML-like things, not necessarily just XML proper, because that's the way the business winds have been blowing. And it needs to support it out-of-the-box. Seriously, it's that important.

You wanna take command of P6ML? :-)

I'm pretty happy with the new rexen, so far. I'll probably be even happier once the interaction between A5 and A6 solidifies (Write, Damian, write!). And since so much other 6PAN stuff will depend on P6ML, I'm pretty sure we'll get the XML bits right.

But the recode that needs to get done to get from P6ML to FooCorp's XMLike Format (FXF) does have the opportunity to be a sales tool:

1- It's not doable. The P6 grammar for XML parsing is so buttpuckerish that only the original author can understand it, and that only for 10 minutes or so a day. This will scare people off. It's probably better to do a half-assed job than to show someone a hideous grammar as an advert for cool new power.

2- It's a big pain and not worth doing. Better to rewrite. If the grammar is comprehensible but not extensible/adaptable, then it may make for a good demo of the power of P6, but the difficulty of implementing may burn P6.

3- It's simple and easy to do and understand. Woo-hoo! How much more do I need to say?

For some Epsilon, P6 should be able to implement XML +/- Epsilon trivially. Cases in point:

-- Configuring the rules of XML.
-- Configuring the character set. (Even weird stuff, like using [tag] instead of <tag>.)
-- Error handling/recovery.
-- Commingling XML with other data.
-- Embedding other languages into XML, and vice versa.

=Austin
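As a small illustration of the "configuring the character set" case (square brackets in place of angle brackets), the sketch below builds a tokenizer from whatever delimiters it is given. It is Python rather than a P6 grammar, `make_tokenizer` is a hypothetical helper invented for this example, and it only recognizes bare tag names; it is meant to show that the delimiter can be a parameter, not how P6 rules would express it.

```python
import re


def make_tokenizer(open_delim="<", close_delim=">"):
    """Build a tokenizer for tags written with the given delimiters."""
    o, c = re.escape(open_delim), re.escape(close_delim)
    # Either a tag (optional leading slash, then a name) or a run of
    # characters that does not start an opening delimiter.
    pattern = re.compile(rf"{o}(/?)(\w+){c}|((?:(?!{o}).)+)", re.DOTALL)

    def tokenize(src):
        tokens = []
        # Note: finditer skips anything that matches neither alternative,
        # so a stray delimiter is dropped silently. A real parser would
        # need to report that as an error.
        for m in pattern.finditer(src):
            slash, name, text = m.groups()
            if name is not None:
                tokens.append(("close" if slash else "open", name))
            else:
                tokens.append(("text", text))
        return tokens

    return tokenize


angle = make_tokenizer()            # conventional <tag> syntax
square = make_tokenizer("[", "]")   # the "[tag] instead of <tag>" variant

print(angle("<a>hi</a>"))   # [('open', 'a'), ('text', 'hi'), ('close', 'a')]
print(square("[a]hi[/a]"))  # [('open', 'a'), ('text', 'hi'), ('close', 'a')]
```

Both calls yield the same token stream, so everything downstream of the tokenizer can stay unchanged when only the surface syntax differs.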
Re: XML is Too Hard for Programmers = Tim Bray
At 10:54 AM -0800 3/18/03, Michael Lazzaro wrote:
> A perl5-native parser can be rigged up fairly easily, but it's *numbingly* slow compared to the C version. I mean, 20-50 times slower, by my guess.
>
> The speed issue when importing XML-like data (which we do *very frequently*) is a constant sticking point for us and our clients. Damian's Parse::RecDescent has been a godsend, implementation-wise -- but it of course suffers the same nasty speed issues.

I don't know that it makes a difference, as this is *really* a library issue rather than a language one, but there's a basic parrot XML parser in the parrot examples directory. It's faster (by a factor of four or so, though that should improve with our I/O speedups) than the equivalent perl 5 version, of which it is a line-for-line translation. The performance numbers are old; it might be faster now.

--
Dan

--it's like this---
Dan Sugalski
even samurai have teddy bears and even teddy bears get drunk