Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-17 Thread Paul Hodges

--- Larry Wall [EMAIL PROTECTED] wrote:
 On Fri, Apr 15, 2005 at 11:28:31AM -0500, Rod Adams wrote:
 : David Wheeler wrote:
 : 
 : But the first person to write [a...] gets what's comin' to 'em.
 : 
 : Is that nothing (since '.' lt 'a'), or everything after 'a'?
 
 Might as well make it everything after 'a' for consistency.  One
 could also view the last dot as a special version of the ordinary
 any dot, and read it a to whatever.
 
 Larry

I think that if we're looking for consistency, the default should be to
read it as 'a' and everything after it. If someone wants "a to
whatever", they should write it [a..\.], since it's a pretty odd
fringe case.






Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-17 Thread Paul Hodges

--- Larry Wall [EMAIL PROTECTED] wrote:
. . .  
 -[a..z]
 
 should be allowed/encouraged/required.  It greatly improves the
 readability in my estimation.  The only problem with requiring .. is
 that people *will* write [a-z] out of habit, and we would probably
 have to outlaw the - form for many years before everyone would get
 used to the .. form.  So maybe we allow - but warn if not
 backslashed.

In general, I think this is a great idea, but what exactly do you mean
by "warn if not backslashed"? That I'd get a warning *any* time I use an
unescaped dash in a character class? I guess I can live with that.





RE: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-17 Thread Joe Gottman


 -Original Message-
 From: Paul Hodges [mailto:[EMAIL PROTECTED]
 Sent: Sunday, April 17, 2005 1:30 PM
 To: Larry Wall; perl6-language@perl.org
 Subject: Re: should we change [^a-z] to -[a..z] instead of -[a-z]?
 
 
 --- Larry Wall [EMAIL PROTECTED] wrote:
 . . .
  -[a..z]
 
  should be allowed/encouraged/required.  It greatly improves the
  readability in my estimation.  The only problem with requiring .. is
  that people *will* write [a-z] out of habit, and we would probably
  have to outlaw the - form for many years before everyone would get
  used to the .. form.  So maybe we allow - but warn if not
  backslashed.
 
 In general, I think this is a great idea, but what exactly do you mean
 by warn if not backslashed? That I'd get a warning *any* time I use a
 dash in a character class? I guess I can live with that.

   On the other hand, you can use the canonical Perl 5 trick of making the
dash the first character in the class if you want a literal dash.

Joe Gottman.
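
The leading-dash trick can be tried out in any regex engine that keeps the
traditional bracket syntax. The sketch below uses Python's re module as a
stand-in for Perl 5 (an assumption made for illustration; the helper name
is hypothetical): a hyphen placed first in the class is taken literally.

```python
import re

# A hyphen placed first in a bracketed class is taken literally,
# so this class matches '-' plus the lowercase range a-z.
leading_dash = re.compile(r"[-a-z]+")


def only_dash_or_lowercase(text):
    """Return True if text is non-empty and made only of '-' and a-z."""
    return leading_dash.fullmatch(text) is not None
```

Under the proposal in this thread the trick would no longer be needed,
since an escaped dash would be the literal and .. would mark ranges.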





Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-15 Thread Juerd
David Wheeler skribis 2005-04-14 21:32 (-0700):
 I was going to say that that was inconsistent, but since you never need 
 to repeat a letter in a character class, well, I guess it isn't. But 
 the first person to write [a...] gets what's comin' to 'em.

Given ASCII, [\x20...] would then be everything except control
characters. Handy!

By the way, does ...5 mean -Inf..5? ;)


Juerd
-- 
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html 
http://convolution.nl/gajigu_juerd_n.html
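
The ASCII claim is easy to check. The sketch below assumes that [\x20...]
would mean "from \x20 to the last ASCII code point, 0x7F" and uses Python's
unicodedata module to list which characters in that span are control
characters (general category Cc). The variable names are illustrative only.

```python
import unicodedata

# Everything from SPACE (0x20) to the end of ASCII (0x7F), inclusive.
ascii_from_space = [chr(cp) for cp in range(0x20, 0x80)]

# Control characters that the open-ended range would still include.
controls_included = [c for c in ascii_from_space
                     if unicodedata.category(c) == "Cc"]

# Control characters that the range leaves out (everything below 0x20).
controls_excluded = [chr(cp) for cp in range(0x00, 0x20)
                     if unicodedata.category(chr(cp)) == "Cc"]
```

Under that assumption the range drops all 32 C0 controls but still contains
DEL (\x7F), so "everything except control characters" holds apart from that
one code point.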


Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-15 Thread Braňo Tichý
- Original Message - 
From: Aaron Sherman [EMAIL PROTECTED]
To: David Wheeler [EMAIL PROTECTED]
Cc: Perl6 Language List perl6-language@perl.org
Sent: Friday, April 15, 2005 2:00 PM
Subject: Re: should we change [^a-z] to -[a..z] instead of -[a-z]?


 On Thu, 2005-04-14 at 21:32 -0700, David Wheeler wrote:
  On Apr 14, 2005, at 7:06 PM, Patrick R. Michaud wrote:
 
   So, [a.z]  matches a, ., and z,
   while   [a..z] matches characters a through z inclusive.
 
  I was going to say that that was inconsistent, but since you never need
  to repeat a letter in a character class, well, I guess it isn't. But
  the first person to write [a...] gets what's comin' to 'em.

 A silly question: is there a canonical character set from which we
 extract these ranges? Are we hard-coding Unicode here, or is there some
 way for the user to specify the character set for ranges?


<delurk>
An even sillier question:
if [a.z] matches a, . and z
and [a...] matches all characters from a onward (for some definition
of 'all'),

how will the range \x21 .. \x2e be written?
[!..\.]? (i.e. with the . escaped?)
</delurk>

Braňo
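
One way to see that the escaped form is unambiguous is to tokenize the class
body before expanding it. The following is a minimal sketch, not PGE's
implementation: it assumes a backslash makes the next single character
literal, that '..' between two characters is a range, and it does not handle
\x.. escapes or the open-ended three-dot form. RANGE, tokenize, and expand
are hypothetical names.

```python
RANGE = object()  # sentinel for the '..' operator, distinct from any character


def tokenize(body):
    """Split a class body into literal characters and RANGE markers."""
    atoms = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            atoms.append(body[i + 1])  # a backslashed character is literal
            i += 2
        elif body.startswith("..", i):
            atoms.append(RANGE)
            i += 2
        else:
            atoms.append(body[i])
            i += 1
    return atoms


def expand(body):
    """Return the set of characters matched by a class body."""
    atoms = tokenize(body)
    chars = set()
    k = 0
    while k < len(atoms):
        if atoms[k] is RANGE:
            raise ValueError("'..' needs a character on both sides")
        if k + 1 < len(atoms) and atoms[k + 1] is RANGE:
            if k + 2 >= len(atoms) or atoms[k + 2] is RANGE:
                raise ValueError("'..' needs a character on both sides")
            lo, hi = ord(atoms[k]), ord(atoms[k + 2])
            chars.update(chr(cp) for cp in range(lo, hi + 1))
            k += 3
        else:
            chars.add(atoms[k])
            k += 1
    return chars
```

With this reading, the body of [!..\.] tokenizes as '!', RANGE, '.', and
expands to the 14 characters from \x21 through \x2e.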



Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-15 Thread Matthew Walton

 delurk
 even sillier question:
 if [a.z] matches a, . and z
 and [a...] matches all characters from a including (for some
 definition of 'all')

 how will be range \x21 .. \x2e written?
 [!..\.]? (i.e. . escaped?)
 /delurk

I was assuming from Larry's mail that [a...] would parse as either:

  1) a character class containing the range from 'a' to '.' (what that
     means is a bit mind-bending for a friday afternoon)
  2) a character class containing 'a' then a range from '.' to... oh, an
     error

Which way might be ambiguous, but could of course be defined in the
grammar. It hadn't occurred to me that ... for the range to infinity would
be allowed or useful here. I suppose it could just mean 'up to the end of
the available codepoints'.

I do love the idea of [a..f] type ranges though. It's just what the
three dots mean that's got me confused.



Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-15 Thread Steven Philip Schubiger
On 14 Apr, Larry Wall wrote:

: In writing some character class translation, I realized that
: 
: -[a-z]
: 
: and its ilk are rather hard to read because of the two hyphens
: that mean different things.  We can't use ![a-z] because that's a
: 0-width lookahead.  Given that we're trying to get rid of special
: exceptions, and - in character classes is weird, and we already
: use .. for ranges everywhere else, and nobody is going to put a
: repeated character into a character class, I'm wondering if
: 
: -[a..z]
: 
: should be allowed/encouraged/required.  It greatly improves the
: readability in my estimation.  The only problem with requiring .. is
: that people *will* write [a-z] out of habit, and we would probably
: have to outlaw the - form for many years before everyone would get
: used to the .. form.  So maybe we allow - but warn if not backslashed.
: 
: Larry

I think, if we bear in mind, as has been stressed previously, that
many changes to regular expressions have already been introduced and
require users to adapt accordingly, it doesn't seem unreasonable to
require a double-dot instead of a hyphen; it also fits the principle
of least surprise nicely, in my opinion.

Nevertheless, as mentioned by David, [a...] would be rather confusing,
first to people and second to the compiler; although, regardless of
whether we assume dot precedes double-dot or vice versa, I'd expect an
expansion to be enforced, perhaps accompanied by a warning.

I agree on a warning upon non-escaped hyphen.

Steven


Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-15 Thread Rafael Garcia-Suarez
Aaron Sherman wrote in perl.perl6.language :

 A silly question: is there a canonical character set from which we
 extract these ranges? Are we hard-coding Unicode here, or is there some
 way for the user to specify the character set for ranges?

Perl 5 forces [a-z] (or [i-j] for that matter) to be a range of
lowercase alphabetic characters, even on EBCDIC platforms (where it's
not).
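
The EBCDIC point can be illustrated with Python's cp037 codec (one common
EBCDIC code page, chosen here as an example; the thread does not name a
specific one). The sketch lists the byte values of a..z and the values
inside that span that are not lowercase letters.

```python
import string

# Byte value of each lowercase letter in EBCDIC code page 037.
letter_codes = [ch.encode("cp037")[0] for ch in string.ascii_lowercase]

# Every byte value between 'a' and 'z' in that code page, inclusive.
span = range(min(letter_codes), max(letter_codes) + 1)

# Values inside the span that do not encode a lowercase letter.
gaps = [code for code in span if code not in letter_codes]
```

In cp037, i is 0x89 and j is 0x91, so a naive byte range [i-j] would cover
seven non-letter values, which is why Perl 5 special-cases such ranges.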


Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-15 Thread Patrick R. Michaud
On Fri, Apr 15, 2005 at 01:01:58PM -, Rafael Garcia-Suarez wrote:
 Aaron Sherman wrote in perl.perl6.language :
 
  A silly question: is there a canonical character set from which we
  extract these ranges? Are we hard-coding Unicode here, or is there some
  way for the user to specify the character set for ranges?
 
 Perl 5 forces [a-z] (or [i-j] for that matter) to be a range of
 lowercase alphabetic characters, even on EBCDIC platforms (where it's
 not).

At the moment, PGE (the part that implements the rule engine) is
deferring such questions to Parrot, and otherwise assuming Unicode.
Plus, S02 explicitly indicates that Perl is written in Unicode
and has consistent Unicode semantics, so I think that's what we should
go with.  It's certainly the way the compiler will go, at least
initially.

Pm


Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-15 Thread Rod Adams
David Wheeler wrote:

But the first person to write [a...] gets what's comin' to 'em.

Is that nothing (since '.' lt 'a'), or everything after 'a'?

-- Rod Adams



Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-15 Thread Larry Wall
On Fri, Apr 15, 2005 at 11:28:31AM -0500, Rod Adams wrote:
: David Wheeler wrote:
: 
: But the first person to write [a...] gets what's comin' to 'em.
: 
: Is that nothing (since '.' lt 'a'), or everything after 'a'?

Might as well make it everything after 'a' for consistency.  One could
also view the last dot as a special version of the ordinary any dot,
and read it a to whatever.

Larry
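
The two readings Rod asks about can be compared directly on code points. This
is a sketch under the assumption that ranges are over Unicode code points and
that "everything after" runs to the highest one (0x10FFFF); the function
names are hypothetical.

```python
MAX_CODEPOINT = 0x10FFFF  # highest Unicode code point (assumed upper bound)


def closed_range(first, last):
    """Code points from first to last inclusive; empty if last sorts first."""
    return range(ord(first), ord(last) + 1)


def open_range(first, last_codepoint=MAX_CODEPOINT):
    """Code points from first to the end of the code space."""
    return range(ord(first), last_codepoint + 1)


# Reading 1: [a...] as the range 'a' .. '.', which is empty.
reading_a_to_dot = closed_range("a", ".")

# Reading 2: [a...] as 'a' and everything after it.
reading_a_onward = open_range("a")
```

The first reading matches nothing because '.' (0x2E) sorts before 'a' (0x61);
the second is the "everything after 'a'" interpretation chosen above.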


Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-14 Thread Darren Duncan
At 5:21 PM -0700 4/14/05, Larry Wall wrote:
In writing some character class translation, I realized that
-[a-z]
and its ilk are rather hard to read because of the two hyphens
that mean different things.  We can't use ![a-z] because that's a
0-width lookahead.  Given that we're trying to get rid of special
exceptions, and - in character classes is weird, and we already
use .. for ranges everywhere else, and nobody is going to put a
repeated character into a character class, I'm wondering if
-[a..z]
should be allowed/encouraged/required.  It greatly improves the
readability in my estimation.  The only problem with requiring .. is
that people *will* write [a-z] out of habit, and we would probably
have to outlaw the - form for many years before everyone would get
used to the .. form.  So maybe we allow - but warn if not backslashed.
Larry

I don't see why the old syntax has to be supported at all.
Lots of other regexp details are already being changed, such as the 
bounding '<>' and the removal of the leading internal '^', so people
already have to edit their regexps.  So they can replace the '-' too 
while they're at it; not very difficult.

Moreover, I often create character classes that have a literal '-' in
them, and it would be nice not to have to make that the last character
in the class for it to parse properly.

Also, the '..' is easy to learn because it is consistent with other 
parts of Perl 6.  Likewise, the consistency is another plus when 
demonstrating what is good about Perl to folk who don't use it.

-- Darren Duncan


Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-14 Thread Patrick R. Michaud
On Thu, Apr 14, 2005 at 05:21:05PM -0700, Larry Wall wrote:
 Given that we're trying to get rid of special
 exceptions, and - in character classes is weird, and we already
 use .. for ranges everywhere else, and nobody is going to put a
 repeated character into a character class, I'm wondering if
 
 -[a..z]
 
 should be allowed/encouraged/required.  It greatly improves the
 readability in my estimation.  

So, [a.z]  matches a, ., and z, 
while   [a..z] matches characters a through z inclusive.

I think that works for me.  I'll implement it that way (and yes, there
*are* updates to PGE coming very soon!).  

I guess I can't complain too loudly about .. over - for ranges
since I was the one who suggested replacing , with .. in quantifiers
(e.g., {1..3} instead of {1,3}).  Not that I'd be complaining anyway.  :-)

 The only problem with requiring .. is
 that people *will* write [a-z] out of habit, and we would probably
 have to outlaw the - form for many years before everyone would get
 used to the .. form.  So maybe we allow - but warn if not backslashed.

Just to make sure I have it right, by "allow -" you mean that
[a-z] matches "a", "-", and "z" and produces a warning
about an unescaped '-'?

Pm
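
The rules discussed in this message can be sketched as a small translator
from the proposed notation back to a traditional bracket class. This is an
illustration in Python, not PGE code, and it assumes the reading asked about
above: '..' is a range, a single '.' is literal, and an unescaped '-' is
literal but triggers a warning. The function name is hypothetical, and only
single-character backslash escapes are handled.

```python
import re
import warnings


def to_python_class(spec):
    """Translate e.g. '-[a..z]' into the Python re class '[^a-z]'."""
    negated = spec.startswith("-")
    bracketed = spec[1:] if negated else spec
    if not (bracketed.startswith("[") and bracketed.endswith("]")):
        raise ValueError("expected a class of the form [..] or -[..]")
    body = bracketed[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(re.escape(body[i + 1]))  # escaped char stays literal
            i += 2
        elif body.startswith("..", i):
            out.append("-")  # '..' becomes the traditional range hyphen
            i += 2
        else:
            if body[i] == "-":
                warnings.warn("unescaped '-' in character class; "
                              "use '..' for a range", SyntaxWarning)
            out.append(re.escape(body[i]))  # '.', '-' and others are literal
            i += 1
    return "[" + ("^" if negated else "") + "".join(out) + "]"
```

For example, to_python_class("[a.z]") gives a class matching only a, ., and
z, while to_python_class("-[a..z]") gives [^a-z].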


Re: should we change [^a-z] to -[a..z] instead of -[a-z]?

2005-04-14 Thread David Wheeler
On Apr 14, 2005, at 7:06 PM, Patrick R. Michaud wrote:

So, [a.z]  matches a, ., and z,
while   [a..z] matches characters a through z inclusive.

I was going to say that that was inconsistent, but since you never need
to repeat a letter in a character class, well, I guess it isn't. But
the first person to write [a...] gets what's comin' to 'em.

Regards,
David
--
David Wheeler
President, Kineticode, Inc.
http://www.kineticode.com/
Kineticode. Setting knowledge in motion.[sm]