Re: XML is Too Hard for Programmers = Tim Bray

2003-03-18 Thread Robin Berjon
Rich Morin wrote:
> I have commented before on the fact that Perl doesn't have Power Tools
> (read, idioms) that are well suited for handling XML.  Turns out that
> Tim Bray agrees.
>   http://www.tbray.org/ongoing/When/200x/2003/03/16/XML-Prog

You may want to look at the perl-xml thread called "Tim Bray says XML is too
hard for programmers" on this topic, as well as at the xml-dev thread on the
same topic, especially http://lists.xml.org/archives/xml-dev/200303/msg00536.html.

Perl can have idioms for XML; they just need to be developed. I don't believe
at all, however, that this needs to happen at the p6l level. We can already
build very cool stuff using p5, and the grammar features in p6 should certainly
make the sort of while loop Tim Bray describes doable as well.
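
A minimal sketch of that loop style, written in Python for brevity rather than
Perl, using the standard library's streaming parser
`xml.etree.ElementTree.iterparse`. It is not Bray's proposed syntax; the
function name and the elements it reacts to (headings and `.jpg` images) are
hypothetical choices for illustration.

```python
import io
import xml.etree.ElementTree as ET

def headings_and_images(stream):
    """Hypothetical example: walk an XML stream element by element, the way
    a line-oriented while loop walks a file, reacting to elements of interest."""
    headings, images = [], []
    # With the default events, iterparse reports each element when its
    # end tag has been read, so its text and children are complete.
    for _, elem in ET.iterparse(stream):
        if elem.tag in ("h1", "h2", "h3", "h4"):
            headings.append("".join(elem.itertext()))
        elif elem.tag == "img" and elem.get("src", "").lower().endswith(".jpg"):
            images.append(elem.get("src"))
    return headings, images

doc = b"""<html><body>
<h1>Title</h1><p>Intro <img src="a.JPG"/></p>
<h2>Section</h2><img src="b.png"/>
</body></html>"""
print(headings_and_images(io.BytesIO(doc)))
```

The loop body stays a flat chain of tests, much like a `while (<STDIN>)`
loop over lines; the parser underneath handles the markup itself.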

--
Robin Berjon [EMAIL PROTECTED]
Research Engineer, Expway    http://expway.fr/
7FC0 6F5F D864 EFB8 08CE  8E74 58E6 D5DB 4889 2488


Re: XML is Too Hard for Programmers = Tim Bray

2003-03-18 Thread Simon Cozens
[EMAIL PROTECTED] (Rich Morin) writes:
> I have commented before on the fact that Perl doesn't have Power Tools
> (read, idioms) that are well suited for handling XML.  Turns out that
> Tim Bray agrees.

Tim Bray also says he gives up and uses regexes as a quick-and-dirty
workaround. So maybe these power tools you keep touting aren't necessary
after all.

-- 
A witty saying means nothing.  -Voltaire


Re: XML is Too Hard for Programmers = Tim Bray

2003-03-18 Thread Robin Berjon
Simon Cozens wrote:
> [EMAIL PROTECTED] (Rich Morin) writes:
>
>> I have commented before on the fact that Perl doesn't have Power Tools
>> (read, idioms) that are well suited for handling XML.  Turns out that
>> Tim Bray agrees.
>
> Tim Bray also says he gives up and uses regexes as a quick-and-dirty
> workaround. So maybe these power tools you keep touting aren't necessary
> after all.

To be fair, he only does that on data he has full control over and has
pre-munged to match his regexen. Otherwise, you really can't do that. But that
doesn't change much about the (potential absence of an) issue; see my other post.
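
A small illustration of why the regex shortcut only holds on controlled data,
sketched in Python (the sample documents and function names are made up): the
pattern matches the one spelling it was written for, while a real parser also
accepts equivalent markup with single-quoted attributes, extra attributes and
character references.

```python
import re
import xml.etree.ElementTree as ET

# A "quick and dirty" pattern that assumes one exact spelling of the markup.
TITLE_RE = re.compile(r'<title lang="en">([^<]*)</title>')

def title_by_regex(text):
    m = TITLE_RE.search(text)
    return m.group(1) if m else None

def title_by_parser(text):
    root = ET.fromstring(text)
    for elem in root.iter("title"):
        if elem.get("lang") == "en":
            return "".join(elem.itertext())
    return None

# Pre-munged input that matches the regex's assumptions exactly.
controlled = '<doc><title lang="en">XML</title></doc>'
# The same kind of data written differently but still well-formed:
# single quotes, extra whitespace, an extra attribute, an entity reference.
uncontrolled = "<doc><title  id='t1' lang='en'>XML &amp; Perl</title></doc>"

for text in (controlled, uncontrolled):
    print(title_by_regex(text), "|", title_by_parser(text))
```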



Re: XML is Too Hard for Programmers = Tim Bray

2003-03-18 Thread Austin Hastings

--- Simon Cozens [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] (Rich Morin) writes:
>> I have commented before on the fact that Perl doesn't have Power
>> Tools (read, idioms) that are well suited for handling XML.  Turns
>> out that Tim Bray agrees.
>
> Tim Bray also says he gives up and uses regexes as a quick-and-dirty
> workaround. So maybe these power tools you keep touting aren't
> necessary after all.

FWIW, I've had to try to rewrite Microsoft's VCPROJ and SLN format
files(*), which look a whole lot like XML. Sadly, if you change the
order of independent entities in the file, Microsoft's internal parser
rejects the file. This is despite the fact that MS already has an XML
parser DLL available for public consumption (more than one version, in
fact).

To me, this says that there's no real commitment to doing XML. What
there is seems to be a recognition that XML format is regular and
comprehensible to others, so writing XML-like files becomes popular.

=Austin

(*) VCPROJ and SLN files are control files for the VS.net IDE product.


Re: XML is Too Hard for Programmers = Tim Bray

2003-03-18 Thread Robin Berjon
Austin Hastings wrote:
> FWIW, I've had to try to rewrite Microsoft's VCPROJ and SLN format
> files(*), which look a whole lot like XML. Sadly, if you change the
> order of independent entities in the file, Microsoft's internal parser
> rejects the file. This is despite the fact that MS already has an XML
> parser DLL available for public consumption (more than one version, in
> fact).
>
> To me, this says that there's no real commitment to doing XML. What
> there is seems to be a recognition that XML format is regular and
> comprehensible to others, so writing XML-like files becomes popular.

Just because MS has one broken tool (surprise!) doesn't mean there's no 
'commitment to doing XML'. There is much commitment, including from MS, and 
people very rarely use XML-like formats.

We are going OT *very* fast.



Re: XML is Too Hard for Programmers = Tim Bray

2003-03-18 Thread Michael Lazzaro
On Tuesday, March 18, 2003, at 09:55 AM, Austin Hastings wrote:
> To me, this says that there's no real commitment to doing XML. What
> there is seems to be a recognition that XML format is regular and
> comprehensible to others, so writing XML-like files becomes popular.

Yep.  Which makes things even worse.  And this is pretty important 
stuff.

We do a *lot* of XML parsing here (Cognitivity, that is) and even more 
XML-like parsing.  And even with Perl, it's a royal pain.  There are 
P5 XML modules out there which tie into C-based XML libraries... those 
are quite fast, but fail badly if the XML isn't 100% well-formed, and 
are largely not extensible for XML-like situations.  You'd have to 
rip one up and rewrite it, in C, for every iteration of -like, which 
we cannot credibly do.

A perl5-native parser can be rigged up fairly easily, but it's 
*numbingly* slow compared to the C version.  I mean, 20-50 times 
slower, by my guess.  The speed issue when importing XML-like data 
(which we do *very frequently*) is a constant sticking point for us and 
our clients.  Damian's Parse::RecDescent has been a godsend, 
implementation-wise -- but it of course suffers the same nasty speed 
issues.

This is a big, big issue, and one that P6 needs to address well, 
because this is how many businesses will judge it.  What I'm hoping, 
obviously, is that the new P6 regexes -- which will be *perfect* for 
writing and maintaining our umpteen quite-similar parsing rulesets -- 
will be fast enough to at least be in the same order of magnitude as a 
middling C solution.  They don't have to be as fast as C, obviously, 
but they can't be 20x worse.

Why does this matter so much?  Because it's a barn door.  Even though 
it's so much easier to write XML-like parsers in Perl than, well, 
anything else, the speed issue will at some point dictate moving to a 
non-Perl parsing solution.  At which point, the issue becomes how much 
of the rest of the related system to move into that other solution as 
well, since it is much cheaper to maintain expertise in one toolset 
than two.  So within a company, it can lead to greater use of Perl -- 
or abandonment of Perl -- depending on success in this one key area.  
(I have seen this in action at a number of companies.)

It is therefore critically important that P6 allows easy, fast parsing 
for XML-like things, not necessarily just XML proper, because that's 
the way the business winds have been blowing.  And it needs to support 
it out-of-the-box.  Seriously, it's that important.

MikeL



Re: XML is Too Hard for Programmers = Tim Bray

2003-03-18 Thread Bennett Todd
2003-03-18T13:54:12 Michael Lazzaro:
> A perl5-native parser can be rigged up fairly easily, but it's
> *numbingly* slow compared to the C version.  I mean, 20-50 times
> slower, by my guess.

That's the nature of the beast. XML requires a lexer that knows about
more than just two or so character classes, so a trivial split isn't
enough to lex it; and it requires a structured-language parsing
algorithm (recursive descent, or one of the table-driven parsers; I
imagine LALR(1) would be about right).

These do not run efficiently when implemented in high-level scripting
languages. A tight, open-coded finite-state-machine lexer with a
well-designed, hand-coded recursive-descent parser should execute on
the rough order of half a dozen to a dozen machine instructions per
input byte.
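
To make that structure concrete, a minimal sketch in Python of the two
pieces, for a hypothetical tiny XML subset: elements, double-quoted attributes
and character data only (no comments, CDATA, entity expansion, DTDs or
namespaces). `lex`, `parse` and `Element` are illustrative names. Being
Python, it gets nowhere near a dozen instructions per byte; it only shows the
shape a C implementation would have.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Element or str items

class ParseError(Exception):
    pass

NAME_EXTRA = "-_.:"  # name characters allowed besides letters and digits

def lex(src):
    """Hand-coded lexer: one pass over the input, switching between the
    character-data, end-tag, start-tag and attribute states."""
    tokens, i, n = [], 0, len(src)

    def read_name(i):
        start = i
        while i < n and (src[i].isalnum() or src[i] in NAME_EXTRA):
            i += 1
        if i == start:
            raise ParseError(f"expected a name at offset {start}")
        return src[start:i], i

    def skip_space(i):
        while i < n and src[i].isspace():
            i += 1
        return i

    while i < n:
        if src[i] != "<":                            # character data
            j = src.find("<", i)
            j = n if j == -1 else j
            tokens.append(("TEXT", src[i:j]))
            i = j
        elif src.startswith("</", i):                # end tag
            name, i = read_name(i + 2)
            i = skip_space(i)
            if not src.startswith(">", i):
                raise ParseError(f"expected '>' at offset {i}")
            tokens.append(("END", name, {}))
            i += 1
        else:                                        # start or empty tag
            name, i = read_name(i + 1)
            attrs = {}
            while True:
                i = skip_space(i)
                if src.startswith(">", i):
                    tokens.append(("START", name, attrs))
                    i += 1
                    break
                if src.startswith("/>", i):
                    tokens.append(("EMPTY", name, attrs))
                    i += 2
                    break
                key, i = read_name(i)                # attribute: name="value"
                if not src.startswith('="', i):
                    raise ParseError(f"expected '=' and a quote at offset {i}")
                end = src.find('"', i + 2)
                if end == -1:
                    raise ParseError("unterminated attribute value")
                attrs[key] = src[i + 2:end]
                i = end + 1
    return tokens

def parse(tokens):
    """Recursive-descent parser for: element := EMPTY | START (TEXT | element)* END"""
    pos = 0

    def element():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] not in ("START", "EMPTY"):
            raise ParseError("expected a start tag")
        kind, name, attrs = tokens[pos]
        pos += 1
        node = Element(name, attrs)
        if kind == "EMPTY":
            return node
        while pos < len(tokens) and tokens[pos][0] != "END":
            if tokens[pos][0] == "TEXT":
                node.children.append(tokens[pos][1])
                pos += 1
            else:
                node.children.append(element())    # recurse on nested element
        if pos >= len(tokens) or tokens[pos][1] != name:
            raise ParseError(f"missing </{name}>")
        pos += 1
        return node

    root = element()
    if pos != len(tokens):
        raise ParseError("content after the root element")
    return root

print(parse(lex('<a x="1"><b>hi</b><c/>tail</a>')))
```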

Heck, even the vastly more trivial CSV parsing deserves enough
care that it runs breathtakingly faster with Text::CSV_XS than with
Text::CSV.
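
The same contrast, sketched in Python rather than Perl (the sample row is made
up): a plain split on commas breaks on quoted fields, while the standard `csv`
module, whose reader CPython implements in C much as Text::CSV_XS does for
Perl, keeps track of quoting state.

```python
import csv
import io

data = 'id,name,comment\n42,"Bray, Tim","said ""XML is hard"""\n'

# Naive lexing: treat every comma as a field separator.
naive = [row.split(",") for row in data.splitlines()]

# The csv reader tracks whether it is inside a quoted field, so quoted
# commas and doubled quote characters come out right.
proper = list(csv.reader(io.StringIO(data)))

print(naive[1])
print(proper[1])
```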

> The speed issue when importing XML-like data (which we do *very
> frequently*) is a constant sticking point for us and our clients.

Then we need a good tight lexer/parser written in C, as a library.
If the existing libraries are too fragile or inflexible, this may
mean we need to design and write a new one.

> It is therefore critically important that P6 allows easy, fast
> parsing for XML-like things, not necessarily just XML proper,
> because that's the way the business winds have been blowing.  And
> it needs to support it out-of-the-box.

Then this new library, with its glue module, will just have to ship
with perl. That's no biggie.

-Bennett




Re: XML is Too Hard for Programmers = Tim Bray

2003-03-18 Thread Austin Hastings

--- Michael Lazzaro [EMAIL PROTECTED] wrote:
 
> It is therefore critically important that P6 allows easy, fast parsing
> for XML-like things, not necessarily just XML proper, because that's
> the way the business winds have been blowing.  And it needs to support
> it out-of-the-box.  Seriously, it's that important.

You wanna take command of P6ML?  :-)

I'm pretty happy with the new rexen, so far. I'll probably be even
happier once the interaction between A5 and A6 solidifies (Write,
Damian, write!). 

And since so much other 6PAN stuff will depend on P6ML, I'm pretty sure
we'll get the XML bits right.

But the recode that needs to get done to get from P6ML to FooCorp's
XMLike Format (FXF) does have the opportunity to be a sales tool:

1- It's not doable. The P6 grammar for XML parsing is so buttpuckerish
that only the original author can understand it, and that only for 10
minutes or so a day.

This will scare people off. It's probably better to do a half-assed job
than to show someone a hideous grammar as an advert for cool new
power.

2- It's a big pain and not worth doing. Better to rewrite.

If the grammar is comprehensible but not extensible or adaptable, it
may make for a good demo of the power of P6, but the difficulty of
implementing the changes may burn P6.

3- It's simple and easy to do and understand. 

Woo-hoo! How much more do I need to say?

For some Epsilon, P6 should be able to implement XML +/- Epsilon
trivially.

Cases in point:

-- Configuring the rules of XML.

-- Configuring the character set. (Even weird stuff, like using [tag]
instead of <tag>.)

-- Error handling/recovery.

-- Commingling XML with other data.

-- Embedding other languages into XML, and vice versa.
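
As a rough sketch of the "XML +/- Epsilon" idea, here is a Python tokenizer
whose tag delimiters and error policy are parameters, so [tag] markup or
lenient recovery comes from configuration rather than a rewrite.
`make_tokenizer`, its parameters and the recovery rule (a stray delimiter
becomes text) are hypothetical, and it covers only start tags, end tags and
text.

```python
import re
from typing import Iterator, Tuple

Token = Tuple[str, str]  # (kind, value); kind is "start", "end" or "text"

def make_tokenizer(open_delim: str = "<", close_delim: str = ">", recover: bool = False):
    """Hypothetical helper: build a tokenizer for XML-like markup whose tag
    delimiters are configurable, e.g. "[tag]" instead of "<tag>"."""
    o, c = re.escape(open_delim), re.escape(close_delim)
    tag_re = re.compile(o + r"(/?)([A-Za-z_][\w.-]*)" + c)

    def tokenize(text: str) -> Iterator[Token]:
        pos = 0
        while pos < len(text):
            start = text.find(open_delim, pos)
            if start == -1:
                yield ("text", text[pos:])
                return
            if start > pos:
                yield ("text", text[pos:start])
            m = tag_re.match(text, start)
            if m is None:
                if not recover:
                    raise ValueError(f"malformed tag at offset {start}")
                # Assumed recovery policy: keep the stray delimiter as text.
                yield ("text", open_delim)
                pos = start + len(open_delim)
                continue
            yield ("end" if m.group(1) else "start", m.group(2))
            pos = m.end()

    return tokenize

square = make_tokenizer("[", "]")
print(list(square("[p]hi [b]there[/b][/p]")))
```

The grammar proper would sit on top of the token stream and stay the same
whichever delimiters or error policy are chosen.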

=Austin


Re: XML is Too Hard for Programmers = Tim Bray

2003-03-18 Thread Dan Sugalski
At 10:54 AM -0800 3/18/03, Michael Lazzaro wrote:
> A perl5-native parser can be rigged up fairly easily, but it's
> *numbingly* slow compared to the C version.  I mean, 20-50 times
> slower, by my guess.  The speed issue when importing XML-like data
> (which we do *very frequently*) is a constant sticking point for us
> and our clients.  Damian's Parse::RecDescent has been a godsend,
> implementation-wise -- but it of course suffers the same nasty speed
> issues.

I don't know that it makes a difference, as this is *really* a
library issue rather than a language one, but there's a basic Parrot
XML parser in the Parrot examples directory. It's faster (by a factor of
four or so, and it should get faster still with our I/O speedups) than
the equivalent Perl 5 version, of which it's a line-for-line translation.

The performance numbers are old, it might be faster now.
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk