Re: Suggested magic for "a" .. "b"

2010-08-01 Thread Leon Timmermans
On Sun, Aug 1, 2010 at 11:39 PM, Martin D Kealey
 wrote:
> In any case I'd much rather prefer that the behaviour be lexically scoped,
> with either adverbs or pragmata, not with the action-at-a-distance that's
> caused by tagging something as fundamental as a String.

In many cases the collation isn't known at compile-time, so adverbs
would be necessary anyway. Pragmas can make things easier in many
cases.

Leon


Re: Suggested magic for "a" .. "b"

2010-08-01 Thread Darren Duncan

Martin D Kealey wrote:

On Wed, 28 Jul 2010, Darren Duncan wrote:

I think that a general solution here is to accept that there may be more
than one valid way to sort some types, strings especially, and so
operators/routines that do sorting should be customizable in some way so
users can pick the behaviour they want.

The customization could be applied at various levels, such as using an
extra argument or trait for the operator/function that cares about
ordering,


That much I agree wholeheartedly with, but ...


or by using an extra attribute or trait for the types being sorted.


... puts us back where we started: how do we cope if the two endpoints
aren't tagged with the same attribute or trait or locale?

In any case I'd much rather prefer that the behaviour be lexically scoped,
with either adverbs or pragmata, not with the action-at-a-distance that's
caused by tagging something as fundamental as a String.


Lexical scoping *is* a good idea, and I would also imagine that users would 
frequently apply that at the file or setting level.


But making this a pragma means that the pragma would have to be a little more 
verbose than a typical pragma.


In the general format, one wouldn't just say, eg:

  collation FooNation;

... but rather it would at least be more like:

  collation Str FooNation;

... to say that you're only applying to operations involving Str types and not, 
say, Numeric types.



So then, "a" cmp "ส้" is always defined, but users can change the
definition.


I take the opposite approach; it's always undefined (read, unthrown
exception) unless the user tells us how they want it treated. That can be a
command-line switch if necessary.

To paraphrase Dante, "the road to hell is paved with Reasonable Defaults".
Or in programming terms, your reasonable default is the cause of my ugly
work-around.


That might be fair.

But if we're going to do that, then I'd like to go a step further and require 
some other operators have mandatory config arguments for users to explicitly 
state the semantics they want, but that once again a lexical pragma can declare 
this at a higher level.


I'm restating this thought in another thread, "rounding method adverbs", so 
that's the best place to follow it.


-- Darren Duncan


Re: Suggested magic for "a" .. "b"

2010-08-01 Thread Martin D Kealey
On Wed, 28 Jul 2010, Darren Duncan wrote:
> I think that a general solution here is to accept that there may be more
> than one valid way to sort some types, strings especially, and so
> operators/routines that do sorting should be customizable in some way so
> users can pick the behaviour they want.
>
> The customization could be applied at various levels, such as using an
> extra argument or trait for the operator/function that cares about
> ordering,

That much I agree wholeheartedly with, but ...

> or by using an extra attribute or trait for the types being sorted.

... puts us back where we started: how do we cope if the two endpoints
aren't tagged with the same attribute or trait or locale?

In any case I'd much rather prefer that the behaviour be lexically scoped,
with either adverbs or pragmata, not with the action-at-a-distance that's
caused by tagging something as fundamental as a String.

Yes sometimes you want the behaviour of your range to mimic the locale of
its operands, but then it should be explicit, with a trait that also
explicitly selects either the left or right operand to extract the locale
from. And probably throw an exception if they aren't in the same locale.

If you don't specify that you want locale-dependent behaviour then the
default action should be an unthrown exception unless both endpoints are
inarguably comparable, so IMHO that pretty much rules out any code-points
that are used in more than one language, save perhaps raw ASCII. And even then
you really should make an explicit choice between case-sensitive and
case-insensitive comparison.
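
A minimal sketch of that "no default" policy, in Python for illustration. `cmp_strict`, `CollationRequired` and `by_codepoint` are hypothetical names, and Python raises immediately where Perl 6 would hand back an unthrown exception:

```python
class CollationRequired(Exception):
    """Raised when two strings are compared without an explicit collation."""

def by_codepoint(s):
    # One possible explicit collation: compare codepoint by codepoint.
    return [ord(c) for c in s]

def cmp_strict(a, b, collation=None):
    """Three-way comparison that refuses to guess outside raw ASCII."""
    if collation is not None:
        ka, kb = collation(a), collation(b)
    elif a.isascii() and b.isascii():
        # The "save perhaps raw ASCII" exemption; still case-sensitive.
        ka, kb = a, b
    else:
        raise CollationRequired(f"no collation given for {a!r} cmp {b!r}")
    return (ka > kb) - (ka < kb)

ascii_result = cmp_strict("a", "b")
explicit_result = cmp_strict("a", "ส้", collation=by_codepoint)
```

Calling `cmp_strict("a", "ส้")` with no collation raises `CollationRequired`, which is the behaviour being argued for here.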

> When you want to be consistent, the behaviour of "cmp" affects all of the
> other order-sensitive operations, including any working with intervals.

Indeed, the range constructor and the cmp operator should have the same
adverbs and share lexical pragmata.

> So then, "a" cmp "ส้" is always defined, but users can change the
> definition.

I take the opposite approach; it's always undefined (read, unthrown
exception) unless the user tells us how they want it treated. That can be a
command-line switch if necessary.

To paraphrase Dante, "the road to hell is paved with Reasonable Defaults".
Or in programming terms, your reasonable default is the cause of my ugly
work-around.

-Martin


Re: Suggested magic for "a" .. "b"

2010-07-30 Thread Jon Lang
Aaron Sherman wrote:
> In the end, I'm now questioning the difference between a junction and
> a Range... which is not where I thought this would go.

Conceptually, they're closely related.  In particular, a range behaves
a lot like an any() junction.  Some differences:

1. An any() junction always has a discrete set of options in it; but a
Range could (and generally does) have a continuous set of options.

2. An any() junction can have an arbitrary set of options; a Range's
set of options is defined entirely by its endpoints.
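
The two differences can be sketched roughly as follows, in Python for illustration. `Interval` and `any_of` are hypothetical stand-ins for a Perl 6 Range and an any() junction, not their real implementations:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Membership is decided entirely by the two endpoints."""
    low: object
    high: object

    def __contains__(self, x):
        return self.low <= x <= self.high

def any_of(*options):
    """A discrete, arbitrary set of options, loosely like any()."""
    return frozenset(options)

r = Interval(1, 3)
j = any_of(1, 2, 3)

in_range = 2.5 in r      # between the endpoints, so it matches
in_junction = 2.5 in j   # not one of the listed options
```

The interval accepts 2.5 because it describes a continuous span, while the junction only knows about the three values it was given.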

-- 
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-30 Thread Aaron Sherman
On Fri, Jul 30, 2010 at 6:45 PM, Doug McNutt  wrote:
> Please pardon intrusion by a novice who is anything but object oriented.

No problem. Sometimes a fresh perspective helps to illuminate things.

Skipping ahead...

> Are you guys sure that the "..." and ".." operators in perl 6 shouldn't make 
> use of regular expression syntax while deciding just what is intended by the 
> programmer?

You kind of blew my mind, there. I tried to respond twice and each
time I determined that there was a way around what I was about to call
crazy.

In the end, I'm now questioning the difference between a junction and
a Range... which is not where I thought this would go. Good question,
though I should point out that you could never reasonably listify a
range constructed from a regex because "reversing" a regex like that
immediately runs into some awful edge cases. Still, interesting stuff.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for "a" .. "b"

2010-07-30 Thread Doug McNutt
Please pardon intrusion by a novice who is anything but object oriented.

I consider myself a long time user of perl 5. I love it and it has completely 
replaced FORTRAN as my compiler of choice. "Programming Perl" is so dog-eared 
that I may need a replacement. I joined this list when I thought the <<...>> 
operators might allow for vector operations like cross product, dot product, 
curl, grad, and divergence.  I was mistaken but was pleased that such things 
would be possible as add-ins to be created later.

I have never used the ".." operator on perl 5, mostly because I can't 
understand it.

I have actually wished for, in perl 5, an ability to create a list, really an 
unsorted set with an @theset kind of description that I could create with a 
regular expression. All ASCII strings that would match would become members of 
@theset.

@theset = /\A2N\d\d\d\d\Z/;

would create a temporary array of transistors that have "2N", once 
military, designations. That list would become an input to some other code that 
would look for datasheets. Memory intensive but easy to understand.
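
For this particular pattern the set is small enough to write out by hand. A sketch in Python, assuming ASCII digits only; note that the enumeration is written manually to match the pattern and is not derived from the regex, which is the hard part in general:

```python
import re
from itertools import product

# The pattern from the example above, restricted to ASCII digits.
PATTERN = re.compile(r"\A2N\d\d\d\d\Z", re.ASCII)

# Enumerate every "2N" + four-digit string: 10**4 candidates.
theset = ["2N" + "".join(digits) for digits in product("0123456789", repeat=4)]

# Sanity check that the hand-written enumeration agrees with the pattern.
all_match = all(PATTERN.match(s) for s in theset)
```

The list has 10,000 members, so "memory intensive but easy to understand" holds for this case; a pattern with an unbounded quantifier would have no finite list at all.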

Are you guys sure that the "..." and ".." operators in perl 6 shouldn't make 
use of regular expression syntax while deciding just what is intended by the 
programmer?

-- 
-->  The best programming tool is a soldering iron <--


Re: Suggested magic for "a" .. "b"

2010-07-30 Thread Leon Timmermans
On Thu, Jul 29, 2010 at 9:51 PM, Aaron Sherman  wrote:
> My only strongly held belief, here, is that you should not try to answer any
> of these questions for the default range operator on
> unadorned, context-less strings. For that case, you must do something that
> makes sense for all Unicode codepoints in nearly all contexts.

I find that both of limited use and the only sane possibility at the
same time :-|

Leon


Re: Suggested magic for "a" .. "b"

2010-07-30 Thread Brandon S Allbery KF8NH
On 7/29/10 08:15 , Leon Timmermans wrote:
> On Thu, Jul 29, 2010 at 3:24 AM, Darren Duncan  
> wrote:
>>  $foo ~~ $a..$b :QuuxNationality  # just affects this one test
> 
> I like that
> 
>>  $bar = 'hello' :QuuxNationality  # applies anywhere the Str value is used
> 
> What if you compare a QuuxNationality Str with a FooNationality Str?
> That should blow up. Also it can lead to action at a distance. I don't
> think that's the way to go.

It's half right;  the coding set should be part of the type.  Explicit
conversion is probably a good idea too.

-- 
brandon s. allbery [linux,solaris,freebsd,perl]  allb...@kf8nh.com
system administrator  [openafs,heimdal,too many hats]  allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university  KF8NH


Re: Suggested magic for "a" .. "b"

2010-07-29 Thread Aaron Sherman
On Wed, Jul 28, 2010 at 9:24 PM, Darren Duncan wrote:

> Jon Lang wrote:
>
>> I don't know enough about Unicode to suggest how to solve this.
>>
>>
Thankfully, I know little enough to take up the challenge ;-)


>> All I can
>> say is that my example above should never return a valid Range object
>> unless there is a way I can specify my own ordering and I use it.

Please see my suggested approach way, way back at the start of all this. Use
Unicode scripts, properties and codepoint sequences to produce a list of
codepoints. Want something more meaningful than codepoints? Great, use an
object that knows what you're asking for:

   EnglishDictWord("apple") .. EnglishDictWord("orange")

It's a very Perl way to approach a problem: provide the solution that meets
the least common denominator need (return a range object that represents
ranges based on the information we have) and then allow that same feature to
be used in cases where the user has provided sufficient context to do
something smarter.

I don't think it makes sense to extend the length of strings under
consideration by default. Obviously the above example would include
"blackberry" because you've asked it to consider English dictionary words,
but "aa" .. "zz" shouldn't contain "blackberry" because you don't have
enough data to understand what's being asked for, and thus should fall back
to treating strings as lists of codepoints (speaking of which do we define a
behavior for (1,2,3) .. (4,5,6)? Right now, we consider (1,2,7) to be in
that range, and I don't think that's a terribly useful result).



>
>> That actually says something: it says that we may want to reconsider
>> the notion that all string values can be sorted.  You're suggesting
>> the possibility that "a" cmp "ส้" is, by default, undefined.
>>
>

By default, I think it should be +1 because of the codepoint comparison. If
you then tell Perl that you want that comparison done in a Thai context,
then it's probably -1.

The golden rule of Unicode is: never pretend you have more information than
you do.



>
> I think that a general solution here is to accept that there may be more
> than one valid way to sort some types, strings especially, and so
> operators/routines that do sorting should be customizable in some way so
> users can pick the behaviour they want.
>

And I think that this brings you back to what I was saying at the top of the
thread which is that the most basic approach treats each codepoint as a
collection of information and sorts on that information first and then the
codepoint number itself. If that's not useful to you, tell Perl what you
really wanted.



> Some possible examples of customization:
>
>  $foo ~~ $a..$b :QuuxNationality  # just affects this one test
>
>  $bar = 'hello' :QuuxNationality  # applies anywhere the Str value is used
>

That's a bit too easy to read without thinking about the implications. I
bring back my original example from long ago:

"TOPIXコンポジット1500構成銘柄" which I shamelessly grabbed from a Tokyo Stock
Exchange page. That one string, used in everyday text, contains Latin
letters, Hiragana [I lied, there's no Hiragana], Katakana, Han or Kanji
ideograms and Latin digits.


Now call .succ on that sucker, I dare you, keeping in mind that there's no
one "Japanese" script in Unicode. I think the only valid starting point
without any contextual information is to essentially treat it as a sequence
of codepoints (as if it were an array of integers) and do something
marginally sane on that basis. Then you let the user provide you with hints.
Yes, it's "Japanese language" but that doesn't tell you as much as you'd
hope, since many of the rules come from the languages that Japanese is
borrowing from, here.

One answer is to break it down on script and major category property
boundaries into "TOPIX" (Latin: the name of an index), "コンポジット" (Katakana:
phonetically this is "konpozito" or "composite"), "1500" (Latin digits), and
"構成銘柄" (Kanji ideographs: constituents). Now, treat each one of those as a
separate sequence of codepoints and begin incrementing each sub-sequence in
turn. You could also apply Japanese sorting rules to the successor method,
but then you get into questions of what the Japanese sorting method is for
Latin letters... probably a solved problem, but obscure enough that I'll bet
there are edge cases that are NOT solvable just by knowing the locale
because they are finer grained (e.g. which Latin-using language does the
word come from? What source language is most appropriate for the context?
etc.)
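
The splitting step can be sketched in Python. The standard `unicodedata` module does not expose the Unicode Script property, so this uses the first word of each character's name as a crude stand-in for "script and major category"; `rough_class` and `split_runs` are hypothetical helpers, not a real segmentation algorithm:

```python
import unicodedata
from itertools import groupby

def rough_class(ch):
    """Crude proxy for script/category: first word of the character name."""
    return unicodedata.name(ch).split()[0]

def split_runs(text):
    """Split text into maximal runs of characters sharing a rough class."""
    text = unicodedata.normalize("NFC", text)
    return ["".join(run) for _, run in groupby(text, key=rough_class)]

parts = split_runs("TOPIXコンポジット1500構成銘柄")
classes = [rough_class(p[0]) for p in parts]
```

For this string the runs come out as the four pieces described above, classed as LATIN, KATAKANA, DIGIT and CJK. Each run could then be incremented as its own codepoint sequence.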

Maybe you throw an exception when you try to tell Perl that
"TOPIXコンポジット1500構成銘柄" is a Japanese string... but then Perl is rejecting
strings that are considered valid in some contexts within that language.

My only strongly held belief, here, is that you should not try to answer any
of these questions for the default range operator on
unadorned, context-less strings. For that case, you must do something that
makes sense for all 

Re: Suggested magic for "a" .. "b"

2010-07-29 Thread yary
On Thu, Jul 29, 2010 at 5:15 AM, Leon Timmermans  wrote:
> On Thu, Jul 29, 2010 at 3:24 AM, Darren Duncan  
> wrote:
>> Some possible examples of customization:
>>
>>  $foo ~~ $a..$b :QuuxNationality  # just affects this one test
>
> I like that
>
>>  $bar = 'hello' :QuuxNationality  # applies anywhere the Str value is used
>>
>
> What if you compare a QuuxNationality Str with a FooNationality Str?
> That should blow up. Also it can lead to action at a distance. I don't
> think that's the way to go.

I think it's an elegant use of encapsulation- keeping a string's
locale with the string. If you want to compare two strings with
different collations, either-
 $foo ~~ $a..$b :QuuxNationality  # override the locales for this test
or
  $foo ~~ $a..$b # Perl warns about conflict, and falls back to its default
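
A sketch of those two options in Python, for illustration only. `LocStr`, `COLLATIONS` and the locale names are hypothetical, and the two key functions are placeholders rather than real national collations:

```python
import warnings
from dataclasses import dataclass

@dataclass(frozen=True)
class LocStr:
    """A string that carries its own locale tag."""
    text: str
    locale: str

# Placeholder collations keyed by locale name (assumed, not real locales).
COLLATIONS = {"codepoint": lambda s: s, "caseless": str.casefold}
DEFAULT = "codepoint"

def compare(a, b, locale=None):
    """Three-way compare; an explicit locale overrides the operands' tags."""
    if locale is None:
        if a.locale == b.locale:
            locale = a.locale
        else:
            # Conflict: warn and fall back to the default ordering.
            warnings.warn(f"locale conflict: {a.locale} vs {b.locale}")
            locale = DEFAULT
    key = COLLATIONS[locale]
    ka, kb = key(a.text), key(b.text)
    return (ka > kb) - (ka < kb)

x = LocStr("apple", "caseless")
y = LocStr("Banana", "codepoint")
overridden = compare(x, y, locale="caseless")   # explicit override, no warning
```

Calling `compare(x, y)` without the override emits the warning and uses codepoint order instead.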

-y


Re: Suggested magic for "a" .. "b"

2010-07-29 Thread Leon Timmermans
On Thu, Jul 29, 2010 at 3:24 AM, Darren Duncan  wrote:
> Some possible examples of customization:
>
>  $foo ~~ $a..$b :QuuxNationality  # just affects this one test

I like that

>  $bar = 'hello' :QuuxNationality  # applies anywhere the Str value is used
>

What if you compare a QuuxNationality Str with a FooNationality Str?
That should blow up. Also it can lead to action at a distance. I don't
think that's the way to go.

Leon


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Jon Lang
On Wed, Jul 28, 2010 at 10:35 PM, Brandon S Allbery KF8NH
 wrote:
>  On 7/28/10 8:07 PM, Michael Zedeler wrote:
>> On 2010-07-29 01:39, Jon Lang wrote:
>>> Aaron Sherman wrote:
>>>>> In smart-match context, "a".."b" includes "aardvark".
>>>> No one has yet explained to me why that makes sense. The continued
>>>> use of
>>>> ASCII examples, of course, doesn't help. Does "a" .. "b" include
>>>> "æther"?
>>>> This is where Germans and Swedes, for example, don't agree, but
>>>> they're all
>>>> using the same Latin code blocks.
>>> This is definitely something for the Unicode crowd to look into.  But
>>> whatever solution you come up with, please make it compatible with the
>>> notion that "aardvark".."apple" can be used to match any word in the
>>> dictionary that comes between those two words.
>> The key issue here is whether there is a well defined and meaningful
>> ordering of the characters in question. We keep discussing the nice
>> examples, but how about "apple" .. "ส้ม"?
>
> I thought that was already disallowed by spec.

As a range, it ought to work; it's only when you try to generate a
list from it that you run into trouble, as the spec currently assumes
that "z".succ eqv "aa".

Anyway: whatever default algorithm we go with for resolving "cmp", I
strongly recommend that we define the default .succ so that "$x lt
$x.succ" is always true.
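
To see why the wrapping rule breaks that invariant, here is a sketch in Python. `wrapping_succ` only handles non-empty lowercase ASCII strings and is an assumption about the odometer-style behaviour implied by "z".succ eqv "aa"; `ordered_succ` is one hypothetical successor that does keep the invariant under codepoint ordering:

```python
def wrapping_succ(s):
    """Odometer-style successor for lowercase ASCII: 'az' -> 'ba', 'z' -> 'aa'."""
    chars = list(s)
    i = len(chars) - 1
    while i >= 0:
        if chars[i] != "z":
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        chars[i] = "a"      # carry into the next position to the left
        i -= 1
    return "a" + "".join(chars)

def ordered_succ(s):
    """Smallest string strictly greater than s in codepoint order."""
    return s + "\0"

wrapped = wrapping_succ("z")
invariant_holds_wrapping = "z" < wrapped
invariant_holds_ordered = "z" < ordered_succ("z")
```

With the wrapping rule "z" sorts after its own successor "aa", so a range built on `cmp` and a list built on `.succ` disagree. Appending U+0000 is not a useful successor in practice; it only shows that the invariant is satisfiable.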

-- 
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Brandon S Allbery KF8NH
 On 7/28/10 8:07 PM, Michael Zedeler wrote:
> On 2010-07-29 01:39, Jon Lang wrote:
>> Aaron Sherman wrote:
>>>> In smart-match context, "a".."b" includes "aardvark".
>>> No one has yet explained to me why that makes sense. The continued
>>> use of
>>> ASCII examples, of course, doesn't help. Does "a" .. "b" include
>>> "æther"?
>>> This is where Germans and Swedes, for example, don't agree, but
>>> they're all
>>> using the same Latin code blocks.
>> This is definitely something for the Unicode crowd to look into.  But
>> whatever solution you come up with, please make it compatible with the
>> notion that "aardvark".."apple" can be used to match any word in the
>> dictionary that comes between those two words.
> The key issue here is whether there is a well defined and meaningful
> ordering of the characters in question. We keep discussing the nice
> examples, but how about "apple" .. "ส้ม"?

I thought that was already disallowed by spec.


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Darren Duncan

Jon Lang wrote:

I don't know enough about Unicode to suggest how to solve this. All I can
say is that my example above should never return a valid Range object unless
there is a way I can specify my own ordering and I use it.


That actually says something: it says that we may want to reconsider
the notion that all string values can be sorted.  You're suggesting
the possibility that "a" cmp "ส้" is, by default, undefined.


I think that a general solution here is to accept that there may be more than 
one valid way to sort some types, strings especially, and so operators/routines 
that do sorting should be customizable in some way so users can pick the 
behaviour they want.


The customization could be applied at various levels, such as using an extra 
argument or trait for the operator/function that cares about ordering, or by 
using an extra attribute or trait for the types being sorted.


In fact, this whole issue is very close in concept to the situations where you 
need to do equality/identity tests.


With strings, identity tests can change answers depending on whether you are 
doing it on language-dependent or language-independent graphemes, and Perl 6 
encodes that abstraction level as value metadata.


When you want to be consistent, the behaviour of "cmp" affects all of the other 
order-sensitive operations, including any working with intervals.


Some possible examples of customization:

  $foo ~~ $a..$b :QuuxNationality  # just affects this one test

  $bar = 'hello' :QuuxNationality  # applies anywhere the Str value is used

Also, declaring a Str subtype or something.

Of course, after all this, we still want some reasonable default.  I suggest 
that for Str that aren't nationality-specific, the default ordering semantics 
are by whatever generic ordering Unicode defines, which might be by codepoint. 
And then for Str with nationality-specific grapheme abstractions, the default 
sorting can be whatever is the case for that nationality.  And this is how it is 
except where users define some other order.


So then, "a" cmp "ส้" is always defined, but users can change the definition.

-- Darren Duncan


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Michael Zedeler

On 2010-07-29 02:19, Jon Lang wrote:

Michael Zedeler wrote:
   

Jon Lang wrote:
 

This is definitely something for the Unicode crowd to look into.  But
whatever solution you come up with, please make it compatible with the
notion that "aardvark".."apple" can be used to match any word in the
dictionary that comes between those two words.
   

The key issue here is whether there is a well defined and meaningful
ordering of the characters in question. We keep discussing the nice
examples, but how about "apple" .. "ส้ม"?
 

All I'm saying is: don't throw out the baby with the bathwater.  Come
up with an interim solution that handles the nice examples intuitively
and the ugly examples poorly (or better, if you can manage that right
out of the gate); then revise the model to improve the handling of the
ugly examples as much as you can; but while you do so, make an effort
to keep the nice examples working.
   
I am sorry if what I write is understood as an argument against ranges 
of strings. I think I know too little about Unicode to be able to do 
anything but point at some issues I believe we'll have to deal with. The 
solution is not obvious to me.

I don't know enough about Unicode to suggest how to solve this. All I can
say is that my example above should never return a valid Range object unless
there is a way I can specify my own ordering and I use it.
 

That actually says something: it says that we may want to reconsider
the notion that all string values can be sorted.  You're suggesting
the possibility that "a" cmp "ส้" is, by default, undefined.
   

Yes, but I am sure it's due to my lack of understanding of Unicode.

Regards,

Michael.



Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Chris Fields
On Jul 28, 2010, at 1:27 PM, Mark J. Reed wrote:

> On Wednesday, July 28, 2010, Jon Lang  wrote:
>> Keep it simple, folks!  There are enough corner cases in Perl 6 as
>> things stand; we don't need to be introducing more of them if we can
>> help it.
> 
> Can I get an Amen?  Amen!
> -- 
> Mark J. Reed 

+1.  I'm agnostic ;>

chris


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Chris Fields
On Jul 28, 2010, at 1:37 PM, Mark J. Reed wrote:

> On Wed, Jul 28, 2010 at 2:30 PM, Chris Fields  wrote:
>> On Jul 28, 2010, at 1:27 PM, Mark J. Reed wrote:
>>> Can I get an Amen?  Amen!
>>> --
>>> Mark J. Reed 
>> 
>> +1.  I'm agnostic ;>
> 
> Militant?  :)  ( http://tinyurl.com/3xjgxnl )
> 
> Nothing inherently religious about "amen" (or me), but I'll accept
> "+1" as synonymous.   :)
> 
> -- 
> Mark J. Reed 

Not militant, just trying to inject a bit of humor into the zombie thread that 
won't die.

chris

Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Jon Lang
Michael Zedeler wrote:
> Jon Lang wrote:
>> This is definitely something for the Unicode crowd to look into.  But
>> whatever solution you come up with, please make it compatible with the
>> notion that "aardvark".."apple" can be used to match any word in the
>> dictionary that comes between those two words.
>
> The key issue here is whether there is a well defined and meaningful
> ordering of the characters in question. We keep discussing the nice
> examples, but how about "apple" .. "ส้ม"?

All I'm saying is: don't throw out the baby with the bathwater.  Come
up with an interim solution that handles the nice examples intuitively
and the ugly examples poorly (or better, if you can manage that right
out of the gate); then revise the model to improve the handling of the
ugly examples as much as you can; but while you do so, make an effort
to keep the nice examples working.

> I don't know enough about Unicode to suggest how to solve this. All I can
> say is that my example above should never return a valid Range object unless
> there is a way I can specify my own ordering and I use it.

That actually says something: it says that we may want to reconsider
the notion that all string values can be sorted.  You're suggesting
the possibility that "a" cmp "ส้" is, by default, undefined.

There are some significant problems that arise if you do this.

-- 
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Michael Zedeler

On 2010-07-29 01:39, Jon Lang wrote:

Aaron Sherman wrote:


In smart-match context, "a".."b" includes "aardvark".


No one has yet explained to me why that makes sense. The continued use of
ASCII examples, of course, doesn't help. Does "a" .. "b" include "æther"?
This is where Germans and Swedes, for example, don't agree, but they're all
using the same Latin code blocks.


This is definitely something for the Unicode crowd to look into.  But
whatever solution you come up with, please make it compatible with the
notion that "aardvark".."apple" can be used to match any word in the
dictionary that comes between those two words.


The key issue here is whether there is a well defined and meaningful 
ordering of the characters in question. We keep discussing the nice 
examples, but how about "apple" .. "ส้ม"?


I don't know enough about Unicode to suggest how to solve this. All I 
can say is that my example above should never return a valid Range 
object unless there is a way I can specify my own ordering and I use it.



I've never accepted that the range between two strings of identical length
should include strings of another length. That seems maximally non-intuitive
(well, I suppose you could always return the last 100 words of Hamlet as an
iterable IO object if you really wanted to confuse people), and makes string
and integer ranges far too divergent.


This is why I dislike the notion of the range operator being used to
produce lists: the question of what values you'd get by iterating from
one string value to another is _very_ different from the question of
what string values qualify as being between the two.  The more you use
infix:<..>  to produce lists, the more likely you are to conflate lists
with ranges.


I second the above. Ranges are all about comparing things. $x ~~ $a .. 
$b means "is $x between $a and $b?". The only broadly accepted 
comparison of strings is lexicographical comparison. To illustrate the 
point: wouldn't you find it odd if 2.01 wasn't in between 1.1 and 2.1? 
Really?
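
A small sketch of "between" as pure comparison, in Python, whose built-in string comparison is lexicographic by codepoint. `between` is a hypothetical helper standing in for `$x ~~ $a .. $b`:

```python
def between(x, low, high):
    """True when low <= x <= high under the type's own ordering."""
    return low <= x <= high

number_case = between(2.01, 1.1, 2.1)           # the numeric analogy
word_case = between("aardvark", "a", "b")        # longer string, still inside
length_case = between("blackberry", "aa", "zz")  # length plays no part
aether_case = between("æther", "a", "b")         # æ is U+00E6, after "b"
```

Under plain codepoint order "æther" falls outside "a" .. "b", which is exactly the kind of answer that German and Swedish readers would dispute.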


Regards,

Michael.



Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Michael Zedeler

On 2010-07-29 00:24, Dave Whipp wrote:

Aaron Sherman wrote:
On Wed, Jul 28, 2010 at 11:34 AM, Dave Whipp wrote:


To squint at this slightly, in the context that we already have 0...1e10 as
a sequence generator, perhaps the semantics of iterating a range should be
unordered -- that is,
 for 0..10 -> $x { ... }

is treated as

 for (0...10).pick(*) -> $x { ... }



As others have pointed out, this has some problems. You can't implement 0..*
that way, just for starters.


I'd say that's a point in my favor: it demonstrates that integers and 
strings have similar problems. If you pick items from an infinite set 
then every item you pick will have an infinite number of 
digits/characters.


In smart-match context, "a".."b" includes "aardvark". It follows that, 
unless you're filtering/shaping the sequence of generated items, then 
almost every element of ("a".."b").Seq starts with an infinite number of 
"a"s.


Consistent semantics would make "a".."b" very not-useful when used as 
a sequence: the user needs to say how they want to avoid the 
infinities. Similarly (0..1).Seq should most likely return Real 
numbers -- and thus (0..1).pick(*) can be approximated by 
(0..1).pick(*, :replace), which is much easier to implement.

I agree that /in theory/ coercing from Range to Sequence, the new 
Sequence should produce every possible value in the Range, unless you 
specify an increment. You could argue that 0 and 1 in (0..1).Seq are 
Ints, resulting in the expansion 0, 1, but that would leave a door open 
for very nasty surprises.


In practise, producing every possible value in a Range with 
over-countable items isn't useful and just opens the door for 
inexperienced programmers to make perl run out of memory without ever 
producing a warning, so I'd suggest that the conversion should fail 
unless an increment is specified.
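
One way to read that rule, sketched in Python. `to_sequence` is a hypothetical helper; treating integer endpoints as safe to step by 1 is an assumption of this sketch, not something settled above:

```python
from fractions import Fraction

def to_sequence(low, high, step=None):
    """Expand low..high into a list, refusing to guess a step for non-integers."""
    if step is None:
        if not (isinstance(low, int) and isinstance(high, int)):
            raise TypeError("an explicit step is required for non-integer endpoints")
        step = 1
    if step <= 0:
        raise ValueError("step must be positive")
    items = []
    x = low
    while x <= high:
        items.append(x)
        x += step
    return items

ints = to_sequence(0, 3)
quarters = to_sequence(Fraction(0), Fraction(1), Fraction(1, 4))
```

`to_sequence(0.0, 1.0)` raises instead of trying to list every value in the interval, so the program fails early rather than exhausting memory.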


The general principle would be to avoid meaningless conversions, so (1 
.. *).Seq > (1 .. *).pick should also just fail, but with finite 
endpoints, it could succeed. The question here is whether we should open 
for more parallelization at the cost of simplicity. I don't know.


So either you define some arbitrary semantics (what those should be 
is, I think, the original topic of this thread) or else you punt 
(error message). An error message has the advantage that you can 
always do something useful, later.

I second that just doing something arbitrary where no actual definition 
exists is a really bad idea. To be more specific, there should be no 
.succ or .pred methods on Rat, Str, Real, Complex and anything else that 
is over-countable. Trying to implement .succ on something like Str is 
most likely dwimmy to a very narrow set of applications, but will 
confuse everyone else.


Just to illustrate my point, if we have .succ on Str, why not have it on 
Range or Seq?


Let's just play with that idea for a second - what would a reasonable 
implementation of .succ on Range be?


(1 .. 10).succ --?--> (1 .. 11)
(1 .. 10).succ --?--> (2 .. 11)
(1 .. 10).succ --?--> (1 .. 12)
(1 .. 10).succ --?--> (10^ .. *)

Even starting a discussion about which implementation of .succ for Range 
(above), Str, Rat or Real completely misses the point: there is no 
definition of this function for those domains. It is non-existent and 
trying to do something dwimmy is just confusing.


As a sidenote, ++ and .succ should be treated as two different things 
(just like -- and .pred). ++ really means "add one" everywhere and can 
be kept as such, where .succ means "the next, smallest possible item". 
This means that we can keep ++ and -- for all numeric types.
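
The distinction shows up even with machine floats, sketched here in Python (3.9 or later for `math.nextafter`). IEEE doubles are a finite set, so unlike true Reals they do have a "next" value; `add_one` and `next_item` are hypothetical names for the two operations:

```python
import math
import sys

def add_one(x):
    """The ++ reading: always add one."""
    return x + 1

def next_item(x):
    """The .succ reading: the smallest representable value above x."""
    if isinstance(x, int):
        return x + 1
    return math.nextafter(x, math.inf)

int_same = add_one(5) == next_item(5)                  # identical for integers
float_gap = next_item(1.0) - 1.0                       # one unit in the last place
float_differs = add_one(1.0) != next_item(1.0)
```

For integers the two readings coincide, which is why they are easy to conflate; for 1.0 the next item is only `sys.float_info.epsilon` away.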


Coercing to Sequence from Range should by default use .succ on the LHS, 
whereas Seq could just use ++ semantics as often as desired. This would 
make Ranges completely consistent and provide a clear distinction 
between the two classes.

Getting back to 10..0


Yes, I agree with Jon that this should be an empty range. I don't care 
what order you pick the elements from an empty range :).

Either empty, the same as 0 .. 10 or throw an error (I like errors :).

Regards,

Michael.



Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Jon Lang
Aaron Sherman wrote:
>> In smart-match context, "a".."b" includes "aardvark".
>
>
> No one has yet explained to me why that makes sense. The continued use of
> ASCII examples, of course, doesn't help. Does "a" .. "b" include "æther"?
> This is where Germans and Swedes, for example, don't agree, but they're all
> using the same Latin code blocks.

This is definitely something for the Unicode crowd to look into.  But
whatever solution you come up with, please make it compatible with the
notion that "aardvark".."apple" can be used to match any word in the
dictionary that comes between those two words.

> I've never accepted that the range between two strings of identical length
> should include strings of another length. That seems maximally non-intuitive
> (well, I suppose you could always return the last 100 words of Hamlet as an
> iterable IO object if you really wanted to confuse people), and makes string
> and integer ranges far too divergent.

This is why I dislike the notion of the range operator being used to
produce lists: the question of what values you'd get by iterating from
one string value to another is _very_ different from the question of
what string values qualify as being between the two.  The more you use
infix:<..> to produce lists, the more likely you are to conflate lists
with ranges.
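
The distinction drawn above can be sketched in a few lines. This is Python used as neutral notation, not Perl 6; the helper names `between` and `same_length_steps` are hypothetical, plain codepoint ordering is assumed (no locale-aware collation), and the stepping rule is only one of several that have been floated in this thread.

```python
from itertools import product
from string import ascii_lowercase


def between(low, high, word):
    """Membership question: does word sort between low and high (inclusive)?"""
    return low <= word <= high


def same_length_steps(low, high):
    """Iteration question, under one assumed rule: visit every lowercase
    string with the same length as the endpoints, in codepoint order."""
    if len(low) != len(high):
        raise ValueError("this sketch only steps between equal-length strings")
    for letters in product(ascii_lowercase, repeat=len(low)):
        candidate = "".join(letters)
        if low <= candidate <= high:
            yield candidate


# "aardvark" lies between "a" and "b" when the question is membership...
in_range = between("a", "b", "aardvark")
# ...but stepping from "a" to "b" under this rule never produces it.
steps = list(same_length_steps("a", "b"))
```

The two functions answer different questions about the same pair of endpoints, which is the reason for keeping the interval and the generator apart.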

-- 
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Jon Lang
Darren Duncan wrote:
> Does "..." also come with the 4 variations of endpoint inclusion/exclusion?
>
> If not, then it should, as I'm sure many times one would want to do this,
> say:
>
>  for 0...^$n -> {...}

You can toggle the inclusion/exclusion of the ending condition by
choosing between "..." and "...^"; but the starting point is the
starting point no matter what: there is neither "^..." nor "^...^".

> In any event, I still think that the mnemonics of "..." (yadda-yadda-yadda)
> are more appropriate to a generator, where it says "produce this and so on".
>  A ".." does not have that mnemonic and looks better for an interval.

Well put.  This++.

-- 
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Aaron Sherman
On Wed, Jul 28, 2010 at 6:24 PM, Dave Whipp  wrote:

> Aaron Sherman wrote:
>
>> On Wed, Jul 28, 2010 at 11:34 AM, Dave Whipp 
>> wrote:
>>
>>> To squint at this slightly, in the context that we already have 0...1e10
>>> as a sequence generator, perhaps the semantics of iterating a range
>>> should be unordered -- that is,
>>>
>>>  for 0..10 -> $x { ... }
>>>
>>> is treated as
>>>
>>>  for (0...10).pick(*) -> $x { ... }
>>>
>> As others have pointed out, this has some problems. You can't implement
>> 0..* that way, just for starters.
>>
>
> I'd say that's a point in my favor: it demonstrates that integers and
> strings have similar problems. If you pick items from an infinite set then
> every item you pick will have an infinite number of digits/characters.
>

So, if I understand you correctly, you're happy about the fact that
iterating over an explicitly lazy range would immediately result in
failure? Sorry, not following.


>
> In smart-match context, "a".."b" includes "aardvark".


No one has yet explained to me why that makes sense. The continued use of
ASCII examples, of course, doesn't help. Does "a" .. "b" include "æther"?
This is where Germans and Swedes, for example, don't agree, but they're all
using the same Latin code blocks.

I don't think you can reasonably bring locale into this. I think it needs to
be purely a codepoint-oriented operator. If you bring locale into it, then
the argument for not including composing and modifying characters goes out
the window, and you're stuck in what I believe Dante called "the Unicode
circle." If you treat this as a codepoint-based operator then you get a very
simple result: "a".."b" is the range between the codepoint for "a" and the
codepoint for "b". "aa" .. "bb" is the range between a sequence of two
codepoints and a sequence of two other code points, which you can define in
a number of ways (we've discussed a few, here) which don't involve having to
expand the sequences to three or more codepoints.
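
As a rough illustration of the codepoint-only view (a Python sketch, not Perl 6; `codepoint_range` is a hypothetical helper, and Python's built-in string comparison is assumed to stand in for raw codepoint ordering):

```python
def codepoint_range(first, last):
    """Single-character range taken purely by codepoint, endpoints inclusive."""
    if len(first) != 1 or len(last) != 1:
        raise ValueError("this sketch handles single-character endpoints only")
    return [chr(cp) for cp in range(ord(first), ord(last) + 1)]


letters = codepoint_range("a", "e")

# "\u00e6" is the letter ae (U+00E6). By codepoint it sorts after "b",
# so under this rule the word is simply outside "a".."b", with no
# locale-dependent disagreement to resolve.
aether_included = "a" <= "\u00e6ther" <= "b"
```

Whether that answer is the desirable one is the open question; the sketch only shows that the codepoint rule gives a single, locale-independent answer.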

I've never accepted that the range between two strings of identical length
should include strings of another length. That seems maximally non-intuitive
(well, I suppose you could always return the last 100 words of Hamlet as an
iterable IO object if you really wanted to confuse people), and makes string
and integer ranges far too divergent.



>>> Then the whole question of reversibility is moot.
>>>
>> Really? I don't think it is. In fact, you've simply made the problem pop
>> up everywhere, and guaranteed that .. must behave totally unlike any other
>> iterator.
>>
>
> %hash.keys has similarly unordered semantics.


Unordered semantics and shuffled values aren't the same thing. The reason
that hash keys are unordered is that we cannot guarantee that any given
implementation will store entries in any given relation to the input. Ranges
have a well defined ordering associated with the elements that fall within
the range by virtue of the basic definition of a range (LHS <= * <= RHS).
Hashes have no ordering associated with their keys (though one can be
imposed, e.g. by sort).


> Therefore %hash.keys.reverse is, for most purposes, equivalent to
> %hash.keys.


Argh! No, that's entirely untrue. %hash.keys and %hash.keys.reverse had
better be the same elements, but reversed for all hashes which remain
unmodified between the first and second call.
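
A small Python sketch of that expectation follows. Note the assumption: Python dicts happen to preserve insertion order, which the hashes under discussion do not promise. The sketch relies only on two successive reads of an unmodified mapping agreeing with each other. The `inventory` data is made up for illustration.

```python
inventory = {"pear": 3, "apple": 7, "fig": 1}

# Two reads of the same, unmodified mapping.
keys = list(inventory.keys())
reversed_keys = list(reversed(list(inventory.keys())))

# Same elements in both, but in opposite order.
same_elements = sorted(keys) == sorted(reversed_keys)
opposite_order = reversed_keys == keys[::-1]
```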


-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs



Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Dave Whipp

Darren Duncan wrote:

Dave Whipp wrote:

Similarly (0..1).Seq should most likely return Real numbers


No it shouldn't, because the endpoints are integers.

If you want Real numbers, then say "0.0 .. 1.0" instead.

-- Darren Duncan


That would be inconsistent. $x ~~ 0..1 means 0 <= $x <= 1. The fact that 
the endpoints are integers does not imply that the range does not include 
non-integer reals.


My argument is that iterating a range could be defined to give you a 
uniform distribution of values that would smart match true against that 
range -- and that such a definition would be just as reasonable as (and 
perhaps more general than) one that says that you get an incrementing 
ordered set of integers across that range.
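
A hedged Python sketch of this reading (not Perl 6; the `Interval` class and its `sample` method are hypothetical names introduced only for illustration): membership is a pair of comparisons, and "iterating" is modeled as drawing uniformly distributed values that all satisfy the membership test.

```python
import random


class Interval:
    """Closed interval defined only by its two endpoints."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def __contains__(self, x):
        # Integer endpoints do not restrict members to integers.
        return self.low <= x <= self.high

    def sample(self, count, rng):
        """Draw count values uniformly from the interval."""
        return [rng.uniform(self.low, self.high) for _ in range(count)]


unit = Interval(0, 1)
half_in = 0.5 in unit
draws = unit.sample(5, random.Random(0))
```

Every drawn value smart-matches against the interval, which is the consistency property being argued for; whether that is a useful default for iteration is a separate matter.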


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Darren Duncan

Dave Whipp wrote:

Similarly (0..1).Seq should most likely return Real numbers


No it shouldn't, because the endpoints are integers.

If you want Real numbers, then say "0.0 .. 1.0" instead.

-- Darren Duncan


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Dave Whipp

Aaron Sherman wrote:

On Wed, Jul 28, 2010 at 11:34 AM, Dave Whipp  wrote:


To squint at this slightly, in the context that we already have 0...1e10 as
a sequence generator, perhaps the semantics of iterating a range should be
unordered -- that is,

 for 0..10 -> $x { ... }

is treated as

 for (0...10).pick(*) -> $x { ... }



As others have pointed out, this has some problems. You can't implement 0..*
that way, just for starters.


I'd say that's a point in my favor: it demonstrates that integers and 
strings have similar problems. If you pick items from an infinite set 
then every item you pick will have an infinite number of digits/characters.


In smart-match context, "a".."b" includes "aardvark". It follows that, 
unless you're filtering/shaping the sequence of generated items, 
almost every element of ("a".."b").Seq starts with an infinite number of "a"s.


Consistent semantics would make "a".."b" very not-useful when used as a 
sequence: the user needs to say how they want to avoid the infinities. 
Similarly (0..1).Seq should most likely return Real numbers -- and thus 
(0..1).pick(*) can be approximated by (0..1).pick(*, :replace), which is 
much easier to implement.


So either you define some arbitrary semantics (what those should be is, 
I think, the original topic of this thread) or else you punt (error 
message). An error message has the advantage that you can always do 
something useful, later.



Then the whole question of reversibility is moot.

Really? I don't think it is. In fact, you've simply made the problem pop up
everywhere, and guaranteed that .. must behave totally unlike any other
iterator.


%hash.keys has similarly unordered semantics. Therefore 
%hash.keys.reverse is, for most purposes, equivalent to %hash.keys. That 
is why I said the question of reversibility becomes moot if you define 
the collapse of a range to a sequence to be unordered. It also 
demonstrates precedent, so not "totally unlike any other".


Even though it was only a semi-serious proposal, I seem to find myself 
defending it. So maybe I was serious, after all. The argument for DWIM 
being ordered pretty much goes away once you tell people to use "..." 
for what they intended to mean.




Getting back to 10..0


Yes, I agree with Jon that this should be an empty range. I don't care 
what order you pick the elements from an empty range :).


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Darren Duncan

Darren Duncan wrote:

Aaron Sherman wrote:
The more I look at this, the more I think ".." and "..." are reversed. 


I would rather that ".." stay with intervals and "..." with generators.  



Another thing to consider if one is looking at huffmanization is how often the 
versions that exclude endpoints would be used, such as "^..^".


I would imagine that a sequence generator would also find this variability 
useful.

Does "..." also come with the 4 variations of endpoint inclusion/exclusion?

If not, then it should, as I'm sure many times one would want to do this, say:

  for 0...^$n -> {...}

In any event, I still think that the mnemonics of "..." (yadda-yadda-yadda) are 
more appropriate to a generator, where it says "produce this and so on".  A ".." 
does not have that mnemonic and looks better for an interval.


-- Darren Duncan


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Darren Duncan

Aaron Sherman wrote:

The more I look at this, the more I think ".." and "..." are reversed. ".."
has a very specific and narrow usage (comparing ranges) and "..." is
probably going to be the most broadly used operator in the language outside
of quotes, commas and the basic, C-derived math and logic ops. Many (most?)
loops will involve "...". Most array initializers will involve "...". Why
are we not calling that ".."? Just because we defined ".." first, and it
grandfathered its way in the door? Because it resembles the math op? These
don't seem like good reasons.


I would rather that ".." stay with intervals and "..." with generators.  The 
mnemonics make more sense that way.  Having ".." resemble the math op with the 
same meaning, intervals, is a good thing.  Besides comparing ranges, an interval 
would also often be used for a membership test, eg "$a <= $x <= $b" would 
alternately be spelled "$x ~~ $a..$b" for example.  I would imagine that the 
interval use would be more common than the generator use in some problem 
domains. -- Darren Duncan


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Leon Timmermans
On Wed, Jul 28, 2010 at 11:29 PM, Aaron Sherman  wrote:
> The more I look at this, the more I think ".." and "..." are reversed. ".."
> has a very specific and narrow usage (comparing ranges) and "..." is
> probably going to be the most broadly used operator in the language outside
> of quotes, commas and the basic, C-derived math and logic ops. Many (most?)
> loops will involve "...". Most array initializers will involve "...". Why
> are we not calling that ".."? Just because we defined ".." first, and it
> grandfathered its way in the door? Because it resembles the math op? These
> don't seem like good reasons.

I was thinking the same. Switching them seems better from a huffmanization POV.

Leon


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread yary
On Wed, Jul 28, 2010 at 2:29 PM, Aaron Sherman  wrote:
>
> The more I look at this, the more I think ".." and "..." are reversed. ".."
> has a very specific and narrow usage (comparing ranges) and "..." is
> probably going to be the most broadly used operator in the language outside
> of quotes, commas and the basic, C-derived math and logic ops.

+1

Though it being the day before Rakudo *'s first release makes me
think, "too late!"

-y


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Aaron Sherman
On Wed, Jul 28, 2010 at 11:34 AM, Dave Whipp  wrote:

> To squint at this slightly, in the context that we already have 0...1e10 as
> a sequence generator, perhaps the semantics of iterating a range should be
> unordered -- that is,
>
>  for 0..10 -> $x { ... }
>
> is treated as
>
>  for (0...10).pick(*) -> $x { ... }
>

As others have pointed out, this has some problems. You can't implement 0..*
that way, just for starters.


> Then the whole question of reversibility is moot.


Really? I don't think it is. In fact, you've simply made the problem pop up
everywhere, and guaranteed that .. must behave totally unlike any other
iterator.

Getting back to 10..0...

The complexity of implementation argument doesn't really hold for me, as:

   (a..b).list = a>b ?? a,*.pred ... b !! a,*.succ ... b

Is pretty darned simple and does not require that b implement anything more
than it does under the current implementation. a, on the other hand, now has
to (optionally, since throwing an exception is the alternative) implement
one more method.
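
A rough Python model of that one-liner, for readers who want to trace it (this is not Perl 6; `range_to_list` and its `succ`/`pred` parameters are hypothetical, and the stopping test "have we passed b" is an assumption of the sketch):

```python
def range_to_list(a, b, succ=lambda x: x + 1, pred=lambda x: x - 1):
    """Expand a..b to a list, walking downward with pred when a > b
    and upward with succ otherwise. Only a is ever stepped; b is
    only compared against."""
    descending = a > b
    step = pred if descending else succ
    items = []
    current = a
    while (current >= b) if descending else (current <= b):
        items.append(current)
        current = step(current)
    return items


ascending = range_to_list(0, 3)    # [0, 1, 2, 3]
descending = range_to_list(3, 0)   # [3, 2, 1, 0]
```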

The more I look at this, the more I think ".." and "..." are reversed. ".."
has a very specific and narrow usage (comparing ranges) and "..." is
probably going to be the most broadly used operator in the language outside
of quotes, commas and the basic, C-derived math and logic ops. Many (most?)
loops will involve "...". Most array initializers will involve "...". Why
are we not calling that ".."? Just because we defined ".." first, and it
grandfathered its way in the door? Because it resembles the math op? These
don't seem like good reasons.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Dave Whipp

Moritz Lenz wrote:


I fear what Perl 6 needs is not to broaden the range of discussion even
further, but to narrow it down to the essential points. Personal opinion
only.


OK, as a completely serious proposal, the semantics of "for 0..10 { ... 
}" should be for the compiler to complain "sorry, that's a perl5ism: in 
perl6, please use a C<...> or explicit coercion of the range to a sequence".



(BTW, I thought a bit more about my previous suggestion: there is 
precedent in that %hash.keys is unordered -- so it's not entirely 
obvious that a default range coercion should be ordered)


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Moritz Lenz
Dave Whipp wrote:
> Moritz Lenz wrote:
>> Dave Whipp wrote:
>>>    for 0..10 -> $x { ... }
>>> is treated as
>>>    for (0...10).pick(*) -> $x { ... }
>> 
>> Sorry, I have to ask. Are you serious? Really?
> 
> Ah, to reply, or not to reply, to rhetorical sarcasm ... In this case, I 
> think I will:

No sarcasm involved, just curiosity.

> Was my specific proposal entirely serious: only in that it was an 
> attempt to broaden the box for the discussion of semantics of coercion 
> ranges.

I fear what Perl 6 needs is not to broaden the range of discussion even
further, but to narrow it down to the essential points. Personal opinion
only.

> Why do we assume that ranges iterate in .succ order -- or even that they 
> iterate as integers (and are finite). Why not iterate as a top-down 
> breadth-first generation of a Cantor set?

That's easy: Principle of least surprise.

Cheers.
Moritz


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Dave Whipp

Moritz Lenz wrote:

Dave Whipp wrote:

   for 0..10 -> $x { ... }
is treated as
   for (0...10).pick(*) -> $x { ... }


Sorry, I have to ask. Are you serious? Really?


Ah, to reply, or not to reply, to rhetorical sarcasm ... In this case, I 
think I will:


Was my specific proposal entirely serious: only in that it was an 
attempt to broaden the box for the discussion of semantics of coercion 
ranges. One of the banes of my life is to undo the sequential mindset 
that so many programmers have. I like to point out that 
"sequentialization is an optimization to make programs run faster on 
Von-Neumann architectures". Often, it's premature. Most of the time it 
doesn't matter (compilers, and even HW, can extract ILP), but every now 
and again it results in an unfortunate barrier in solution-space.


Why do we assume that ranges iterate in .succ order -- or even that they 
iterate as integers (and are finite)? Why not iterate as a top-down 
breadth-first generation of a Cantor set? etc. Does the language need to 
choose a default, or is it better to require the programmer to state how 
they want to coerce the range to the seq? Ten years from now, we'll keep 
needing to refer questions to the .. vs ... FAQ.


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Mark J. Reed
On Wed, Jul 28, 2010 at 2:30 PM, Chris Fields  wrote:
> On Jul 28, 2010, at 1:27 PM, Mark J. Reed wrote:
>> Can I get an Amen?  Amen!
>> --
>> Mark J. Reed 
>
> +1.  I'm agnostic ;>

Militant?  :)  ( http://tinyurl.com/3xjgxnl )

Nothing inherently religious about "amen" (or me), but I'll accept
"+1" as synonymous.   :)

-- 
Mark J. Reed 


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Mark J. Reed
On Wednesday, July 28, 2010, Jon Lang  wrote:
> Keep it simple, folks!  There are enough corner cases in Perl 6 as
> things stand; we don't need to be introducing more of them if we can
> help it.

Can I get an Amen?  Amen!


-- 
Mark J. Reed 


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Jon Lang
TSa wrote:
> Swapping the endpoints could mean swapping inside test to outside
> test. The only thing that is needed is to swap from && to ||:
>
>   $a .. $b   # means  $a <= $_ && $_ <= $b  if $a < $b
>   $b .. $a   # means  $b <= $_ || $_ <= $a  if $a < $b

This is the same sort of discontinuity of meaning that was causing
problems with Perl 5's use of negative indices to count backward from
the end of a list; there's a reason why Perl 6 now uses the [*-$a]
notation for that sort of thing.

Consider a code snippet where the programmer is given two values: one
is a minimum value which must be reached; the other is a maximum value
which must not be exceeded.  In this example, the programmer does not
know what the values are; for all he knows, the minimum threshold
exceeds the maximum.  As things stand, it's trivial to test whether or
not your sample value is viable: if "$x ~~ $min .. $max", then you're
golden: it doesn't matter what "$min cmp $max" is.  With your change,
I'd have to replace the above with something along the lines of:
  "if $min <= $max && $x ~~ $min .. $max { ... }" - because if $min >
$max, the algorithm will accept values that are well below the minimum
as well as values that are well above the maximum.
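
The difference is easy to check with concrete numbers. The following Python sketch (not Perl 6; both function names are hypothetical) models the current membership test next to the proposed "swap && for ||" rule:

```python
def in_range_current(first, second, x):
    """Current rule: first <= x <= second, empty when first > second."""
    return first <= x and x <= second


def in_range_swapped(first, second, x):
    """Proposed rule: when the endpoints are reversed, test the outside."""
    if first <= second:
        return first <= x and x <= second
    return first <= x or x <= second


# A minimum of 10 and a maximum of 5: no value can satisfy both limits.
minimum, maximum = 10, 5
current = [in_range_current(minimum, maximum, x) for x in (3, 7, 20)]
swapped = [in_range_swapped(minimum, maximum, x) for x in (3, 7, 20)]
```

Under the current rule nothing matches; under the proposed rule 3 (below the minimum) and 20 (above the maximum) are both accepted, so the caller needs the extra `$min <= $max` guard.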

Keep it simple, folks!  There are enough corner cases in Perl 6 as
things stand; we don't need to be introducing more of them if we can
help it.

-- 
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread yary
> Swapping the endpoints could mean swapping inside test to outside
> test. The only thing that is needed is to swap from && to ||:
>
> $a .. $b # means $a <= $_ && $_ <= $b if $a < $b
> $b .. $a # means $b <= $_ || $_ <= $a if $a < $b

I think that's what "not", "!" are for!


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread TSa (Thomas Sandlaß)
On Wednesday, 28. July 2010 05:12:52 Michael Zedeler wrote:
> Writing ($a .. $b).reverse doesn't make any sense if the result were a
> new Range, since Ranges should then only be used for inclusion tests (so
> swapping endpoints doesn't have any meaningful interpretation), but
> applying .reverse could result in a coercion to Sequence.

Swapping the endpoints could mean swapping inside test to outside
test. The only thing that is needed is to swap from && to ||:

   $a .. $b   # means  $a <= $_ && $_ <= $b  if $a < $b
   $b .. $a   # means  $b <= $_ || $_ <= $a  if $a < $b

Regards TSa.
-- 
"The unavoidable price of reliability is simplicity" -- C.A.R. Hoare
"Simplicity does not precede complexity, but follows it." -- A.J. Perlis
1 + 2 + 3 + 4 + ... = -1/12  -- Srinivasa Ramanujan


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Moritz Lenz
yary wrote:
> though would a parallel batch of an anonymous block be more naturally written 
> as
> all(0...10) -> $x { ... } # Spawn 11 threads

No,

hyper for 0..10 -> $x { ... } # spawn as many threads
                              # as the compiler thinks are reasonable

I think one (already specced) syntax for the same thing is enough,
especially considering that hyper operators also do the same job.

Cheers,
Moritz


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread yary
On Wed, Jul 28, 2010 at 8:34 AM, Dave Whipp  wrote:
> To squint at this slightly, in the context that we already have 0...1e10 as
> a sequence generator, perhaps the semantics of iterating a range should be
> unordered -- that is,
>
>  for 0..10 -> $x { ... }
>
> is treated as
>
>  for (0...10).pick(*) -> $x { ... }

Makes me think about parallel operations.

for 0...10 -> $x { ... } # 0 through 10 in order
for 0..10 -> $x { ... } # Spawn 11 threads, $x=0 through 10 concurrently
for 10..0 -> $x { ... } # A no-op
for 10...0 -> $x { ... } # 10 down to 0 in order

though would a parallel batch of an anonymous block be more naturally written as
all(0...10) -> $x { ... } # Spawn 11 threads

-y


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Moritz Lenz
Dave Whipp wrote:
> To squint at this slightly, in the context that we already have 0...1e10 
> as a sequence generator, perhaps the semantics of iterating a range 
> should be unordered -- that is,
> 
>    for 0..10 -> $x { ... }
> 
> is treated as
> 
>    for (0...10).pick(*) -> $x { ... }

Sorry, I have to ask. Are you serious? Really?

Cheers,
Moritz


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Jon Lang
Dave Whipp wrote:
> To squint at this slightly, in the context that we already have 0...1e10 as
> a sequence generator, perhaps the semantics of iterating a range should be
> unordered -- that is,
>
>  for 0..10 -> $x { ... }
>
> is treated as
>
>  for (0...10).pick(*) -> $x { ... }
>
> Then the whole question of reversibility is moot.

No thanks; I'd prefer it if $a..$b have analogous meanings in item and
list contexts.  As things stand, 10..1 means, in item context,
"numbers that are greater or equal to ten and less than or equal to
one", which is equivalent to "nothing"; in list context, it means "an
empty list". This makes sense to me; having it provide a list
containing the numbers 1 through 10 creates a conflict between the two
contexts regardless of how they're arranged.

As I see it, C< $a..$b > in list context is a useful shorthand for C<
$a, *.succ ... $b >.  You only get into trouble when you start trying
to have infix:<..> do more than that in list context.
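
To make the "analogous meanings" point concrete, here is a minimal Python model (not Perl 6; `contains` and `as_list` are hypothetical names) in which the item-context test and the list-context expansion are derived from the same two comparisons:

```python
def contains(a, b, x):
    """Item context: is x at least a and at most b?"""
    return a <= x <= b


def as_list(a, b):
    """List context: start at a, take successors while still <= b."""
    items = []
    current = a
    while current <= b:
        items.append(current)
        current += 1
    return items


# With reversed endpoints both views agree: nothing matches, nothing is listed.
nothing_matches = not any(contains(10, 1, x) for x in range(-20, 21))
empty_list = as_list(10, 1)
```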

If anything needs to be done with respect to infix:<..>, it lies in
changing the community perception of the operator.  The only reason
why we're having this debate at all is that in Perl 5, the .. operator
was used to generate lists; so programmers coming from Perl 5 start
with the expectation that that's what it's for in Perl 6, too.  That
expectation needs to be corrected as quickly as can be managed, not
catered to.  But that's not a matter of language design; it's a matter
to be addressed by whoever's going to be writing the Perl 6 tutorials.

-- 
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Dave Whipp

Michael Zedeler wrote:

This is exactly why I keep writing posts about Ranges being defunct as 
they have been specified now. If we accept the premise that Ranges are 
supposed to define a kind of linear membership specification between two 
starting points (as in math), it doesn't make sense that the LHS has an 
additional constraint (having to provide a .succ method). All we should 
require is that both endpoints support comparison (that they share a 
common type with comparison, at least).


To squint at this slightly, in the context that we already have 0...1e10 
as a sequence generator, perhaps the semantics of iterating a range 
should be unordered -- that is,


  for 0..10 -> $x { ... }

is treated as

  for (0...10).pick(*) -> $x { ... }

Then the whole question of reversibility is moot. Plus, there would then 
be a useful distinction for serialization of C<..> vs C<...> (perhaps we 
should even parallelize). When you have two very similar operators it's 
often good to maximize the semantic distance between them so that people 
don't get into the lazy habit of using them without thinking.


Re: Suggested magic for "a" .. "b"

2010-07-28 Thread Darren Duncan

Michael Zedeler wrote:
This is exactly why I keep writing posts about Ranges being defunct as 
they have been specified now. If we accept the premise that Ranges are 
supposed to define a kind of linear membership specification between two 
starting points (as in math), it doesn't make sense that the LHS has an 
additional constraint (having to provide a .succ method). All we should 
require is that both endpoints support comparison (that they share a 
common type with comparison, at least).


Yes, I agree 100%.  All that should be required to construct a range 
"$foo..$bar" is that the endpoints are comparable, meaning "$foo cmp $bar" 
works.  Having a .pred or .succ for $foo|$bar should not be required to define a 
range but only to use that range as a generator. -- Darren Duncan


Re: Suggested magic for "a" .. "b"

2010-07-27 Thread Michael Zedeler

On 2010-07-28 06:54, Martin D Kealey wrote:

On Wed, 28 Jul 2010, Michael Zedeler wrote:
   

Writing for ($a .. $b).reverse ->  $c { ...} may then blow up because it
turns out that $b doesn't have a .succ method when coercing to sequence
(where the LHS must have an initial value), just like
 for $a .. $b ->  $c { ... }
should be able to blow up because the LHS of a Range shouldn't have to
support .succ.
 

Presumably you'd only throw that exception if, as well, $b doesn't support .pred ?
   
Yes. It should be .pred. So ($a .. $b).reverse is only possible if 
$b.pred is defined and $a.gt is defined (and taking an object that has 
the type of $b.pred). If the coercion to Sequence is taking place first, 
we'll have to live with two additional constraints ($b.lt and $a.succ), 
but I guess it would be easy to overload .reverse and get rid of those.


Regards,

Michael.





Re: Suggested magic for "a" .. "b"

2010-07-27 Thread Michael Zedeler

On 2010-07-27 23:50, Aaron Sherman wrote:

PS: On a really abstract note, requiring that ($a .. $b).reverse be lazy
will put new constraints on the right hand side parameter. Previously, it
didn't have to have a value of its own, it just had to be comparable to
other values. for example:

   for $a .. $b ->  $c { ... }

In that, we don't include the RHS in the output range explicitly. Instead,
we increment $a (via .succ) until it's >= $b. If $a were 1 and $b were an
object that "does Int" but just implements the comparison features, and has
no fixed numeric value, then it should still work (e.g. it could be random).
Now that's not possible because we need to use the RHS as the starting point
when .reverse is invoked.

This is exactly why I keep writing posts about Ranges being defunct as 
they have been specified now. If we accept the premise that Ranges are 
supposed to define a kind of linear membership specification between two 
starting points (as in math), it doesn't make sense that the LHS has an 
additional constraint (having to provide a .succ method). All we should 
require is that both endpoints support comparison (that they share a 
common type with comparison, at least).


To provide expansion to lists, such as for $a .. $b -> $c { ... }, we 
should use type coercion semantics, coercing from Range to Sequence and 
throw an error if the LHS doesn't support .succ.
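
A minimal Python sketch of that split, under stated assumptions: this is not Perl 6, the `Range` and `Counter` classes and the `to_sequence` method are hypothetical, and "supports .succ" is modeled as "has a `succ` method". Membership needs only comparison; the coercion is where the extra requirement, and the error, lives.

```python
class Range:
    """Interval that needs nothing from its endpoints except comparison."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def __contains__(self, x):
        return self.low <= x <= self.high

    def to_sequence(self):
        """Coerce to a sequence; fails up front if low cannot step."""
        if not hasattr(self.low, "succ"):
            raise TypeError("left endpoint has no succ(); cannot coerce")

        def generate():
            current = self.low
            while current <= self.high:
                yield current
                current = current.succ()

        return generate()


class Counter:
    """Toy endpoint type that supports both comparison and succ."""

    def __init__(self, value):
        self.value = value

    def __le__(self, other):
        return self.value <= other.value

    def succ(self):
        return Counter(self.value + 1)


# Floats compare but have no succ: fine as an interval, not as a sequence.
reals = Range(0.0, 1.0)
counted = [c.value for c in Range(Counter(1), Counter(3)).to_sequence()]
```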


Writing ($a .. $b).reverse doesn't make any sense if the result were a 
new Range, since Ranges should then only be used for inclusion tests (so 
swapping endpoints doesn't have any meaningful interpretation), but 
applying .reverse could result in a coercion to Sequence.


Writing for ($a .. $b).reverse -> $c { ...} may then blow up because it 
turns out that $b doesn't have a .succ method when coercing to sequence 
(where the LHS must have an initial value), just like for $a .. $b -> $c 
{ ... } should be able to blow up because the LHS of a Range shouldn't 
have to support .succ.


Regards,

Michael.



Re: Suggested magic for "a" .. "b"

2010-07-27 Thread Jon Lang
Aaron Sherman wrote:
> As a special case, perhaps you can treat ranges as special and not as simple
> iterators. To be honest, I wasn't thinking about the possibility of such
> special cases, but about iterators in general. You can't generically reverse
> lazy constructs without running afoul of the halting problem, which I invite
> you to solve at your leisure ;-)

A really obvious example occurs when the RHS is a Whatever:

   (1..*).reverse;

.reverse magic isn't going to be generically applicable to all lazy
lists; but it can be applicable to all lazy lists that have predefined
start points, end points, and bidirectional iterators, and on all lazy
lists that have random-access iterators and some way of locating the
tail.  Sometimes you can guess what the endpoint and backward-iterator
should be from the start point and the forward-iterator, just as the
infix:<...> operator is able to guess what the forward-iterator should
be from the first one, two, or three items in the list.

This is especially a problem with regard to lists generated using the
series operator, as it's possible to define a custom forward-iterator
for it (but not, AFAICT, a custom reverse-iterator).  In comparison,
the simplicity of the range operator's list generation algorithm
almost guarantees that as long as you know for certain what or where
the last item is, you can lazily generate the list from its tail.  But
only almost:

   (1..3.5);         # list context: 1, 2, 3
   (1..3.5).reverse; # list context: 3.5, 2.5, 1.5 - assuming the list
                     # is generated from the tail.
   (1..3.5).reverse; # list context: 3, 2, 1 - but only if you generate
                     # it from the head first, and then reverse it.

Again, the proper tool for list generation is the series operator,
because it can do everything that the range operator can do in terms
of list generation, and more.

   1 ... 3.5  # same as 1, 2, 3
   3.5 ... 1  # same as 3.5, 2.5, 1.5 - and obviously so.

With this in mind, I see no reason to allow any magic on .reverse when
dealing with the range operator (or the series operator, for that
matter): as far as it's concerned, it's dealing with a list that lacks
a reverse-iterator, and so it will _always_ generate the list from its
head to its tail before attempting to reverse it.  Maybe at some later
point, after we get Perl 6.0 out the door, we can look into revising
the series operator to permit more powerful iterators so as to allow
.reverse and the like to bring more dwimmy magic to bear.
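
The 1..3.5 ambiguity can be reproduced with a small Python sketch. The helper names from_head and from_tail are hypothetical, and a fixed step of 1 is assumed, as in the examples above.

```python
def from_head(lo, hi):
    """Start at lo and step up by 1 while the value does not exceed hi."""
    value = lo
    while value <= hi:
        yield value
        value += 1


def from_tail(lo, hi):
    """Start at hi and step down by 1 while the value does not drop below lo."""
    value = hi
    while value >= lo:
        yield value
        value -= 1


# Generate from the head, then reverse the finished list.
head_then_reverse = list(reversed(list(from_head(1, 3.5))))
# Generate lazily from the tail instead.
tail_first = list(from_tail(1, 3.5))

print(head_then_reverse)  # [3, 2, 1]
print(tail_first)         # [3.5, 2.5, 1.5]
```

The two results differ because the upper endpoint is not itself one of the generated elements, so a tail-first walk visits a different set of values.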

-- 
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-27 Thread Aaron Sherman
Sorry I haven't responded for so long... much going on in my world.

On Mon, Jul 26, 2010 at 11:35 AM, Nicholas Clark  wrote:

> On Tue, Jul 20, 2010 at 07:31:14PM -0400, Aaron Sherman wrote:
>
> > 2) We deny that a range whose LHS is "larger" than its RHS makes sense, but
> > we also don't provide an easy way to construct such ranges lazily otherwise.
> > This would be annoying only, but then we have declared that ranges are the
> > right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...}
> > which is not lazy (blows up your machine) and feels awfully clunky next to
> > for 1e10..1 -> $i {...} which would not blow up your machine, or even make
> > it break a sweat, if it worked)
>
> There is no reason why for (1..1e10).reverse -> $i {...} should *not* be lazy.
>
As a special case, perhaps you can treat ranges as special and not as simple
iterators. To be honest, I wasn't thinking about the possibility of such
special cases, but about iterators in general. You can't generically reverse
lazy constructs without running afoul of the halting problem, which I invite
you to solve at your leisure ;-)

For example, let's just tie it to integer factorization to make it really
obvious:

 # Generator for ranges of sequential, composite integers
 sub composites(Int $start) { gather do { for $start .. * -> $i {
   last if isprime($i);
   take $i;
 } } }
 for composites(10116471302318).reverse -> $i { say $i }

The first value should be 10116471302380, but computing that without
iterating through the list from start to finish would require knowing that
none of the integers between 10116471302318 and 10116471302380, inclusive,
are prime. Of course, the same problem exists for any iterator where the end
condition or steps can't be easily pre-computed, but this makes it more
obvious than most.

That means that Range.reverse has to do something special that iterators in
general can't be relied on to do. Does that introduce problems? Not big
ones. I can definitely see people who are used to "for ($a .. $b).reverse ->
..." getting confused when "for @blah.reverse -> ..." blows up their
machine, but avoiding that confusion might not be practical.
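
For comparison, Python draws the same line, and a short sketch shows it. The generator below mirrors the composites example with a much smaller starting value; is_prime is a naive trial-division helper written for this sketch, not something from the thread.

```python
def is_prime(n):
    """Naive trial division; adequate for the small values used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def composites(start):
    """Lazily yield consecutive integers from start until the first prime."""
    n = start
    while not is_prime(n):
        yield n
        n += 1


# A generic lazy generator cannot be reversed without running it to the end:
# Python refuses outright instead of silently materializing it.
try:
    reversed(composites(24))
    generic_reverse_is_lazy = True
except TypeError:
    generic_reverse_is_lazy = False

# The only general option is to build the whole list first.
materialized = list(reversed(list(composites(24))))

# A range knows both endpoints and its step, so its reverse starts at once.
first_of_big_reverse = next(reversed(range(1, 10**10 + 1)))

print(generic_reverse_is_lazy, materialized, first_of_big_reverse)
```

The range case is the special treatment discussed above: it works because the last element can be computed from the endpoints, not because lazy reversal is possible in general.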

PS: On a really abstract note, requiring that ($a .. $b).reverse be lazy
will put new constraints on the right hand side parameter. Previously, it
didn't have to have a value of its own, it just had to be comparable to
other values. For example:

  for $a .. $b -> $c { ... }

In that, we don't include the RHS in the output range explicitly. Instead,
we increment $a (via .succ) until it's >= $b. If $a were 1 and $b were an
object that "does Int" but just implements the comparison features, and has
no fixed numeric value, then it should still work (e.g. it could be random).
Now that's not possible because we need to use the RHS as the starting point
when .reverse is invoked.

I have no idea if that matters, but it's important to be aware of when and
where we constrain the interface rather than discovering it later.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for "a" .. "b"

2010-07-26 Thread Nicholas Clark
On Tue, Jul 20, 2010 at 07:31:14PM -0400, Aaron Sherman wrote:

> 2) We deny that a range whose LHS is "larger" than its RHS makes sense, but
> we also don't provide an easy way to construct such ranges lazily otherwise.
> This would be annoying only, but then we have declared that ranges are the
> right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...}
> which is not lazy (blows up your machine) and feels awfully clunky next to
> for 1e10..1 -> $i {...} which would not blow up your machine, or even make
> it break a sweat, if it worked)

There is no reason why for (1..1e10).reverse -> $i {...} should *not* be lazy.

After all, Perl 5 now implements

@b = reverse sort @a

by directly sorting in reverse. Note how it's now an ex-reverse:

$ perl -MO=Concise -e '@b = reverse sort @a'
c  <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v ->3
b <2> aassign[t6] vKS ->c
-<1> ex-list lK ->8
3   <0> pushmark s ->4
-   <1> ex-reverse lK/1 ->-
4  <0> pushmark s ->5
7  <@> sort lK/REV ->8
- <0> ex-pushmark s ->5
6 <1> rv2av[t4] lK/1 ->7
5<#> gv[*a] s ->6
-<1> ex-list lK ->b
8   <0> pushmark s ->9
a   <1> rv2av[t2] lKRM*/1 ->b
9  <#> gv[*b] s ->a
-e syntax OK

Likewise

foreach (reverse @a) {...}

is implemented as a reverse iterator on the array, rather than a temporary
list:

$ perl -MO=Concise -e 'foreach(reverse @a) {}'
d  <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 2 -e:1) v ->3
c <2> leaveloop vK/2 ->d
7<{> enteriter(next->9 last->c redo->8) lKS/REVERSED ->a
-   <0> ex-pushmark s ->3
-   <1> ex-list lKM ->6
3  <0> pushmark s ->4
-  <1> ex-reverse lKM/1 ->6
- <0> ex-pushmark s ->4
5 <1> rv2av[t2] sKR/1 ->6
4<#> gv[*a] s ->5
6   <#> gv[*_] s ->7
-<1> null vK/1 ->c
b   <|> and(other->8) vK/1 ->c
a  <0> iter s/REVERSED ->b
-  <@> lineseq vK ->-
8 <0> stub v ->9
9 <0> unstack v ->a
-e syntax OK



If it's part of the specification that (1..1e10).reverse is to be implemented
lazily, I'd (personally) consider that an easy enough way to construct a lazy
range.


This doesn't answer any of your other questions about what ranges of
character strings should mean. I don't really have an opinion, other than
it needs to be simple enough to be teachable.

Nicholas Clark


Re: Suggested magic for "a" .. "b"

2010-07-21 Thread Mark J. Reed
On Wed, Jul 21, 2010 at 3:55 PM, Darren Duncan  wrote:
> Larry Wall wrote:
>>
>> On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote:
>> : In particular, consider that pi ~~ 0..4 is true,
>> :  because pi is within the range; but pi ~~ 0...4 is false, because pi
>> : is not one of the generated elements.
>>
>> Small point here, it's not because pi is fractional: 3 ~~ 0...4 is
>> also false because 3 !eqv (0,1,2,3,4).  There is no implicit any()
>> on a smartmatch list pattern as there is in Perl 5.  In Perl 6 the
>> pattern 0..4 may only match a list with the same 5 elements in the
>> same order.
>
> For some reason I thought smart match in Perl 6, when presented with some
> collection on the right-hand side, would test if the value on the left-hand
> side was contained in the collection.

That was my thought as well.

> Similarly, since a range represents a set of all values between 2 endpoints,
> I might have thought this would be reasonable:
>
>  3 ~~ 1..5  # TRUE

AIUI, that is indeed correct.  Ranges smartmatch by testing for
inclusion in the range.  But collections don't smartmatch by testing
for inclusion in the collection.  Which was probably the subject of a
thread I missed somewhere...

For series, I think the canonical solution is to use any().
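
A rough Python model of the three behaviours discussed here. This is only an analogy, not Perl 6 semantics: in_range stands in for smartmatching against a Range, whole-list equality stands in for smartmatching against a series, and Python's any() stands in for the any() junction.

```python
import math


def in_range(value, lo, hi):
    """Range-style match: is the value between the endpoints?"""
    return lo <= value <= hi


series = [0, 1, 2, 3, 4]  # the elements generated by 0 ... 4

range_match_pi = in_range(math.pi, 0, 4)         # pi lies inside the interval
series_match_3 = (3 == series)                   # a value is not equal to a whole list
any_match_3 = any(3 == item for item in series)  # explicit any() tests membership
any_match_pi = any(math.pi == item for item in series)

print(range_match_pi, series_match_3, any_match_3, any_match_pi)
```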

-- 
Mark J. Reed 


Re: Suggested magic for "a" .. "b"

2010-07-21 Thread Darren Duncan

Larry Wall wrote:

> On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote:
> : In particular, consider that pi ~~ 0..4 is true,
> :  because pi is within the range; but pi ~~ 0...4 is false, because pi
> : is not one of the generated elements.
>
> Small point here, it's not because pi is fractional: 3 ~~ 0...4 is
> also false because 3 !eqv (0,1,2,3,4).  There is no implicit any()
> on a smartmatch list pattern as there is in Perl 5.  In Perl 6 the
> pattern 0..4 may only match a list with the same 5 elements in the
> same order.


For some reason I thought smart match in Perl 6, when presented with some 
collection on the right-hand side, would test if the value on the left-hand side 
was contained in the collection.


So, for example:

  my @ary = (1,4,3,2,9);
  my $test = 3;
  $test ~~ @ary;  # TRUE

Similarly, since a range represents a set of all values between 2 endpoints, I 
might have thought this would be reasonable:


  3 ~~ 1..5  # TRUE

So if that doesn't work, then what is the canonical way to ask if a value is in 
a range?


Would any of these be reasonable?

  3 ~~ any(1..5)

  3 in 1..5

  3 ∈ 1..5  # Unicode alternative

-- Darren Duncan


Re: Suggested magic for "a" .. "b"

2010-07-21 Thread Aaron Sherman
On Wed, Jul 21, 2010 at 9:46 AM, Aaron Crane  wrote:

>
> > I think that "Ā" .. "Ē" should produce ĀĂĄĆĈĊČĎĐĒ
>
> If that's in the hope of producing a more "intuitive" result, then why
> not ĀB̄C̄D̄Ē?
>
> That's only partly serious.  I'm acutely aware that choosing a baroque
> set of rules makes life harder for both implementers and users (and,
> in particular, risks ending up with an operator that has no practical
> non-trivial use cases).
>

Well... actually, I got to thinking (which is not my natural state) and I
think we need two approaches. I don't know if they're two operators, a
pragma or what, but there are definitely two things people want:


   - "x".succ_uni yields "x".ord incremented until the resulting codepoint
   "agrees" with "x". By agrees, I mean that it shares the same script and
   general category properties (major/minor). This is an important tool because
   it's universal.
   - "x".succ_loc yields the next character after "x" in the current locale.
   What convinced me that this is a peer to the above was when I thought about
   Japanese, where only a subset of the CJK ideographs are valid Japanese. You
   really need an index and collation for these that is outside of the basic
   Unicode properties.
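
A minimal Python sketch of the first approach ("x".succ_uni). Python's standard unicodedata module exposes the General Category but not the Script property, so the first word of the character name is used here as a stand-in for the script; that shortcut is an assumption of this sketch, not something proposed in the thread. The locale-aware variant is not sketched because it needs collation data outside the standard library.

```python
import unicodedata


def script_hint(ch):
    """First word of the character name, a rough stand-in for the Script property."""
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0]


def succ_uni(ch):
    """Next codepoint that agrees with ch in general category and script hint."""
    category = unicodedata.category(ch)
    script = script_hint(ch)
    for cp in range(ord(ch) + 1, 0x110000):
        candidate = chr(cp)
        if (unicodedata.category(candidate) == category
                and script_hint(candidate) == script):
            return candidate
    raise ValueError("no successor with matching properties")


print(succ_uni("A"))       # plain next codepoint
print(succ_uni("\u0102"))  # skips the lowercase letter that sits in between
print(succ_uni("\u03a1"))  # skips the unassigned codepoint U+03A2
```

This sketch does not implement the ASCII special case from the spec; it only shows the property-matching step.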


So yes, if there's a locale in which ĀB̄C̄D̄Ē is the correct ordering, then
I do think that there should be some "Ā" .. "Ē" equivalent that yields the
above in that context. But, I'm not convinced it should be the default.


> I note also that this A-macron and E-macron are in NFC.  I think that,
> certainly by default, the difference between NFC and NFD should be
> hidden from users.  That implies that, however "Ā" .. "Ē" behaves, the
> NFD version should behave identically; and that "B̄" .. "F̄" should
> behave in the most equivalent way possible.
>

As I've said previously, I'm only discussing single "characters" which I'm
defining as single codepoints which are neither combining nor modifying. If
you like, we can have the conversation about what you do when you encounter
combining and modifying codepoints, and I do think I agree with you largely,
but I'd like to hold that for now. It's just too much of a rat-hole at this
point.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for "a" .. "b"

2010-07-21 Thread Aaron Crane
Aaron Sherman  wrote:
> There's just an undefined codepoint smack in the middle of the Greek
> uppercase letters (U+03A2). I'm sure the Unicode specs have a rationale for
> that somewhere, but my guess is that there's some thousand-year-old debate
> about the Greek alphabet behind it.

It becomes clearer if you also look at the corresponding lower-case characters:

U+03A1 Greek capital letter rho
U+03A2 (none)
U+03A3 Greek capital letter sigma

U+03C1 Greek small letter rho
U+03C2 Greek small letter final sigma
U+03C3 Greek small letter sigma

Greek words written in lower-case that end in a sigma use a special
glyph for that sigma; and Unicode allocates a codepoint to it for
roundtripping to legacy character sets.  There isn't a corresponding
upper-case final sigma.  Unicode leaves the gap in the upper-case
Greek range for neatness, effectively: adding 0x20 to the numeric
value of an upper-case character yields the corresponding lower-case
version.
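
The layout described above can be checked with Python's standard unicodedata module. The helper name lower_partner is made up for this sketch.

```python
import unicodedata

RHO, GAP, SIGMA = "\u03a1", "\u03a2", "\u03a3"


def lower_partner(ch):
    """Codepoint 0x20 above ch, which is the lowercase form in this block."""
    return chr(ord(ch) + 0x20)


# U+03A2 is unassigned, so its general category is Cn.
print(unicodedata.category(GAP))
# Adding 0x20 maps each capital to its lowercase letter...
print(unicodedata.name(lower_partner(RHO)))
print(unicodedata.name(lower_partner(SIGMA)))
# ...and maps the gap to the final sigma, which has no capital of its own.
print(unicodedata.name(lower_partner(GAP)))
```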

> I think that "Ā" .. "Ē" should produce ĀĂĄĆĈĊČĎĐĒ

If that's in the hope of producing a more "intuitive" result, then why
not ĀB̄C̄D̄Ē?

That's only partly serious.  I'm acutely aware that choosing a baroque
set of rules makes life harder for both implementers and users (and,
in particular, risks ending up with an operator that has no practical
non-trivial use cases).

I note also that this A-macron and E-macron are in NFC.  I think that,
certainly by default, the difference between NFC and NFD should be
hidden from users.  That implies that, however "Ā" .. "Ē" behaves, the
NFD version should behave identically; and that "B̄" .. "F̄" should
behave in the most equivalent way possible.

-- 
Aaron Crane ** http://aaroncrane.co.uk/


Re: Suggested magic for "a" .. "b"

2010-07-21 Thread Aaron Sherman
On Wed, Jul 21, 2010 at 1:28 AM, Aaron Sherman  wrote:

>
> For reference, this is the relevant section of the spec:
>
> Character positions are incremented within their natural range for any
> Unicode range that is deemed to represent the digits 0..9 or that is deemed
> to be a complete cyclical alphabet for (one case of) a (Unicode) script.
> Only scripts that represent their alphabet in codepoints that form a cycle
> independent of other alphabets may be so used. (This specification defers to
> the users of such a script for determining the proper cycle of letters.) We
> arbitrarily define the ASCII alphabet not to intersect with other scripts
> that make use of characters in that range, but alphabets that intersperse
> ASCII letters are not allowed.
>
>
> I'm not sure that all of that tracks with the Unicode standard's use of
> some of the terms, but based on what we've discussed, perhaps we could get
> more specific there:
>
> Character positions are incremented within their Unicode Script, but only
> in keeping with their General Category property. Thus C<"A"++> yields C<"B">
> which is the next codepoint, but C<"Ă"++> yields C<"Ą"> even though "ą"
> falls between the two, when incrementing codepoints. Should this prove
> problematic for any specific Unicode Script which requires special handling
> (e.g. because a "letter" really isn't used as a letter at all), such special
> handling may be applied, but the above is the general rule.
>
>
Oh, so close! I realized that I broke the original spec, here. We need to
add back in:

There are two special cases: the ASCII-compatible lower-case letters (a-z)
and the ASCII-compatible upper-case letters (A-Z). For historical reasons,
these, by default, will not increment past the end of their ranges into the
higher-codepoint Latin characters.


Note: we might want a pragma for that as well. I'd suggest that perhaps it
should be a locale-specific feature? So, if you set your locale to fr, then
you include in those ranges all of the Latin characters used in French.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for "a" .. "b"

2010-07-21 Thread Larry Wall
On Wed, Jul 21, 2010 at 09:23:11AM -0400, Mark J. Reed wrote:
: Strike the "counter to current Rakudo behavior" bit; Rakudo is
: behaving as specified in this instance.  I must have been
: hallucinating.

Well, except that we both neglected precedence.   Since ... is looser
than ~~, it must be written 3 ~~ (0...4).  :-)

Larry


Re: Suggested magic for "a" .. "b"

2010-07-21 Thread Mark J. Reed
Strike the "counter to current Rakudo behavior" bit; Rakudo is
behaving as specified in this instance.  I must have been
hallucinating.

On Wed, Jul 21, 2010 at 7:33 AM, Mark J. Reed  wrote:
> Ok, I find that surprising (and counter to current Rakudo behavior),
> but thanks for the correction, and sorry about the misinformation.
>
> On Wednesday, July 21, 2010, Larry Wall  wrote:
>> On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote:
>> : In particular, consider that pi ~~ 0..4 is true,
>> :  because pi is within the range; but pi ~~ 0...4 is false, because pi
>> : is not one of the generated elements.
>>
>> Small point here, it's not because pi is fractional: 3 ~~ 0...4 is
>> also false because 3 !eqv (0,1,2,3,4).  There is no implicit any()
>> on a smartmatch list pattern as there is in Perl 5.  In Perl 6 the
>> pattern 0..4 may only match a list with the same 5 elements in the
>> same order.
>>
>> Larry
>>
>
> --
> Mark J. Reed 
>



-- 
Mark J. Reed 


Re: Suggested magic for "a" .. "b"

2010-07-21 Thread Mark J. Reed
Ok, I find that surprising (and counter to current Rakudo behavior),
but thanks for the correction, and sorry about the misinformation.

On Wednesday, July 21, 2010, Larry Wall  wrote:
> On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote:
> : In particular, consider that pi ~~ 0..4 is true,
> :  because pi is within the range; but pi ~~ 0...4 is false, because pi
> : is not one of the generated elements.
>
> Small point here, it's not because pi is fractional: 3 ~~ 0...4 is
> also false because 3 !eqv (0,1,2,3,4).  There is no implicit any()
> on a smartmatch list pattern as there is in Perl 5.  In Perl 6 the
> pattern 0..4 may only match a list with the same 5 elements in the
> same order.
>
> Larry
>

-- 
Mark J. Reed 


Re: Suggested magic for "a" .. "b"

2010-07-21 Thread Jon Lang
Smylers wrote:
> Jon Lang writes:
>> Approaching this with the notion firmly in mind that infix:<..> is
>> supposed to be used for matching ranges while infix:<...> should be
>> used to generate series:
>>
>> With series, we want C< $LHS ... $RHS > to generate a list of items
>> starting with $LHS and ending with $RHS.  If $RHS > $LHS, we want it
>> to increment one step at a time; if $RHS < $LHS, we want it to
>> decrement one step at a time.
>
> Do we?

Yes, we do.

> I'm used to generating lists and iterating over them (in Perl 5)
> with things like:
>
>  for (1 .. $max)
>
> where the intention is that if $max is zero, the loop doesn't execute at
> all. Having the equivalent Perl 6 list generation operator, C<...>,
> start counting backwards could be confusing.
>
> Especially if Perl 6 also has a range operator, C<..>, which would Do
> The Right Thing for me in this situation, and where the Perl 6 operator
> that Does The Right Thing is spelt the same as the Perl 5 operator that
> I'm used to; that muddles the distinction you make above about matching
> ranges versus generating lists.

It does muddy the difference, which is why my own gut instinct would
have been to do away with infix:<..>'s ability to generate lists.
Fortunately, I'm not in charge here, and wiser heads than mine have
decreed that infix:<..>, when used in list context, will indeed
generate a list in a manner that closely resembles Perl 5's range
operator: start with the LHS, then increment until you equal or exceed
the RHS - and if you start out exceeding the RHS, you've got yourself
an empty list.

You can do the same thing with the infix:<...> operator, too; but
doing so will be bulkier (albeit much more intuitive).  For example,
the preferred Perl 6 approach to what you described would be:

for 1, 2 ... $x

The two-element list on the left of the series operator invokes a bit
of magic that tells it that the algorithm for generating the next step
in the series is to invoke the increment operator.  This is all
described in S03 in considerable detail; I suggest rereading the
section there concerning the series operator before passing judgment
on it.
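
A small Python model of the three behaviours described in this message. It follows the description given here rather than the text of S03, and it assumes that the seed values of a series are themselves subject to the limit; the function names range_list and series are hypothetical.

```python
def range_list(lo, hi):
    """Model of `lo .. hi` in list context: count up from lo, never down."""
    out = []
    value = lo
    while value <= hi:
        out.append(value)
        value += 1
    return out


def series(first, last, second=None):
    """Model of `first ... last`, or `first, second ... last` if second is given."""
    if second is None:
        step = 1 if last >= first else -1  # direction inferred from the endpoints
    else:
        step = second - first              # direction fixed by the first two items
    if step == 0:
        raise ValueError("step must be non-zero")
    out = []
    value = first
    while (value <= last) if step > 0 else (value >= last):
        out.append(value)
        value += step
    return out


print(range_list(1, 0))        # empty: the loop body never runs
print(series(1, 0))            # counts down
print(series(1, 0, second=2))  # explicit upward step: empty again
print(series(1, 3, second=2))  # explicit upward step with a reachable limit
```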

--
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-21 Thread Larry Wall
On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote:
: In particular, consider that pi ~~ 0..4 is true,
:  because pi is within the range; but pi ~~ 0...4 is false, because pi
: is not one of the generated elements.

Small point here, it's not because pi is fractional: 3 ~~ 0...4 is
also false because 3 !eqv (0,1,2,3,4).  There is no implicit any()
on a smartmatch list pattern as there is in Perl 5.  In Perl 6 the
pattern 0..4 may only match a list with the same 5 elements in the
same order.

Larry


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Smylers
Jon Lang writes:

> Approaching this with the notion firmly in mind that infix:<..> is
> supposed to be used for matching ranges while infix:<...> should be
> used to generate series:
> 
> With series, we want C< $LHS ... $RHS > to generate a list of items
> starting with $LHS and ending with $RHS.  If $RHS > $LHS, we want it
> to increment one step at a time; if $RHS < $LHS, we want it to
> decrement one step at a time.

Do we? I'm used to generating lists and iterating over them (in Perl 5)
with things like:

  for (1 .. $max)

where the intention is that if $max is zero, the loop doesn't execute at
all. Having the equivalent Perl 6 list generation operator, C<...>,
start counting backwards could be confusing.

Especially if Perl 6 also has a range operator, C<..>, which would Do
The Right Thing for me in this situation, and where the Perl 6 operator
that Does The Right Thing is spelt the same as the Perl 5 operator that
I'm used to; that muddles the distinction you make above about matching
ranges versus generating lists.

Smylers
-- 
http://twitter.com/Smylers2


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Darren Duncan

Darren Duncan wrote:
> specific, the generic "eqv" operator, or "before" etc would have to be


Correction, I meant to say "cmp", not "eqv", here. -- Darren Duncan


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Darren Duncan

Aaron Sherman wrote:

> 2) The spec doesn't put this information anywhere near the definition of the
> range operator. Perhaps we can make a note? This was a source of confusion
> for me.


My impression is that a "Range" primarily defines an "interval" in terms of 2
endpoint values such that it defines a possibly infinite set of values between
those endpoints.


For example, 'aa'..'bb' is an infinite sized set that includes every possible 
character string that starts with the letter 'a', plus every one that starts 
with the string 'ba'.  And so, asking $anysuchstring ~~ 'aa'..'bb' is TRUE.


(Note that for ".." to work, its 2 arguments would need to be of the same type, 
so that we know which set of rules to follow.  Or to be specific, the generic 
"eqv" operator, or "before" etc would have to be defined that takes both of the 
".." arguments as its arguments.  Although this might be fuzzed a bit if the 
spec defines somewhere about automatic casting.  For example, if someone said 
'foo'..42 then I would expect that to fail.)


A "Range" can also be used in a limited fashion to generate a finite list of 
values, but that is not its primary purpose, and the "..." operator does that 
job much better.



> 3) It seems that there are two competing multi-character approaches and both
> seem somewhat valid. Should we use a pragma to toggle behavior between A and
> B:
>
>  A: "aa" .. "bb" contains "az"
>  B: "aa" .. "bb" contains ONLY "aa", "ab", "ba" and "bb"


I would find A to be the only reasonable answer.

If you want B's semantics then use "..." instead; ".." should not be overloaded 
for that.


If there were to be any similar pragma, then it should control matters like 
"collation", or what nationality/etc-specific subtype of Str the 'aa' and 'bb' 
are blessed into on definition, so that their collation/sorting/etc rules can be 
applied when figuring out if a particular $foo~~$bar..$baz is TRUE or not.
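
The two interpretations can be put side by side in a short Python sketch. It assumes plain codepoint ordering for strings (the collation question raised above is left open), and the function names contains_a and expand_b are hypothetical.

```python
from itertools import product


def contains_a(lo, hi, candidate):
    """Interpretation A: the range is an interval under string comparison."""
    return lo <= candidate <= hi


def expand_b(lo, hi):
    """Interpretation B: vary each character position independently."""
    columns = [
        [chr(cp) for cp in range(ord(a), ord(b) + 1)]
        for a, b in zip(lo, hi)
    ]
    return ["".join(chars) for chars in product(*columns)]


print(contains_a("aa", "bb", "az"))   # inside the interval
print(contains_a("aa", "bb", "baz"))  # longer strings can be inside too
print(contains_a("aa", "bb", "bc"))   # past the upper endpoint
print(expand_b("aa", "bb"))           # exactly four strings
```

Under codepoint ordering the string "a" on its own sorts before "aa", so it falls outside the interval even though it starts with the letter 'a'.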


-- Darren Duncan


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Aaron Sherman
OK, there's a lot here and my head is swimming, so let me re-consolidate and
re-state (BTW: thanks Jon, you've really helped me understand, here).

1) The spec is somewhat vague, but the proposal that I made for single
characters is not an unreasonable interpretation of what's there. Thus, we
could adopt the script/major cat/minor cat triplet as the core tool that
.succ will use for single, non-combining, non-modifying, valid characters?

2) The spec doesn't put this information anywhere near the definition of the
range operator. Perhaps we can make a note? This was a source of confusion
for me.

3) It seems that there are two competing multi-character approaches and both
seem somewhat valid. Should we use a pragma to toggle behavior between A and
B:

 A: "aa" .. "bb" contains "az"
 B: "aa" .. "bb" contains ONLY "aa", "ab", "ba" and "bb"

4) About the ranges I gave as examples, you asked:

"Which codepoint is invalid, and why?"

There's just an undefined codepoint smack in the middle of the Greek
uppercase letters (U+03A2). I'm sure the Unicode specs have a rationale for
that somewhere, but my guess is that there's some thousand-year-old debate
about the Greek alphabet behind it.

"In both of these cases, what do you think it should produce?"

I actually gave that answer a bit later on. I think that "Ā" .. "Ē" should
produce ĀĂĄĆĈĊČĎĐĒ and オ .. ヺ should produce
オカガキギクグケゲコゴサザシジスズセゼソゾタダチヂツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモヤユヨラリルレロワヰヱヲンヴヷヸヹヺ
which are all of the Katakana syllabic characters.

"I also have to wonder how or if "0" ... "z" ought to be resolved.  If
you're thinking in terms of the alphabet or digits, this is
nonsensical"

Well, since you agreed with my statement about the properties checking, it
would be 0 through 9 and then a through z because 0 through 9 are Latin
numbers, matching the LHS's properties and a through z are lowercase Latin
letters, matching the RHS's properties.

For reference, this is the relevant section of the spec:

Character positions are incremented within their natural range for any
Unicode range that is deemed to represent the digits 0..9 or that is deemed
to be a complete cyclical alphabet for (one case of) a (Unicode) script.
Only scripts that represent their alphabet in codepoints that form a cycle
independent of other alphabets may be so used. (This specification defers to
the users of such a script for determining the proper cycle of letters.) We
arbitrarily define the ASCII alphabet not to intersect with other scripts
that make use of characters in that range, but alphabets that intersperse
ASCII letters are not allowed.


I'm not sure that all of that tracks with the Unicode standard's use of some
of the terms, but based on what we've discussed, perhaps we could get more
specific there:

Character positions are incremented within their Unicode Script, but only in
keeping with their General Category property. Thus C<"A"++> yields C<"B">
which is the next codepoint, but C<"Ă"++> yields C<"Ą"> even though "ą"
falls between the two, when incrementing codepoints. Should this prove
problematic for any specific Unicode Script which requires special handling
(e.g. because a "letter" really isn't used as a letter at all), such special
handling may be applied, but the above is the general rule.


and then in the section on ranges:

As discussed previously, incrementing a character (which is to say, invoking
C<.succ>) seeks the next codepoint with the same Unicode Script and General
Category properties (major and minor category to be specific). For ranges,
succession is the same if .min and .max have the same properties, but if
they do not, then all codepoints are considered which are greater than
C<.min> and smaller than C<.max> and which agree with either the properties
of C<.min> or the properties of C<.max>.
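
A minimal Python sketch of the proposed range rule, assuming the endpoints themselves are included. Python's standard unicodedata module has no Script property, so the first word of the character name is used as a stand-in; that shortcut and the function names properties and char_range are assumptions of this sketch.

```python
import unicodedata


def properties(ch):
    """(script hint, general category) for one character.

    The script hint is the first word of the character name, which only
    approximates the real Unicode Script property.
    """
    name = unicodedata.name(ch, "")
    return (name.split(" ", 1)[0], unicodedata.category(ch))


def char_range(lo, hi):
    """Codepoints from lo to hi that agree with the properties of either endpoint."""
    wanted = {properties(lo), properties(hi)}
    return "".join(
        chr(cp)
        for cp in range(ord(lo), ord(hi) + 1)
        if properties(chr(cp)) in wanted
    )


print(char_range("0", "z"))            # digits, then lowercase Latin letters
print(char_range("\u0100", "\u0112"))  # capital letters only
```

Because both endpoints contribute a property set, the "0" to "z" case picks up the digits and the lowercase letters while skipping the punctuation and the capitals in between.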


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Mark J. Reed
On Wed, Jul 21, 2010 at 12:04 AM, Jon Lang  wrote:
> Mark J. Reed wrote:
>> Perhaps the syllabic kana could be the "integer" analogs, and what you
>> get when you iterate over the range using ..., while the modifier kana
>> would not be generated by the series  ア ... ヴ but would be considered
>> in the range  ア .. ヴ?  I wouldn't object to such script-specific
>> behavior, though perhaps it doesn't belong in core.
>
> As I understand it, it wouldn't need to be script-specific behavior;
> just behavior that's aware of Unicode properties.

That wouldn't help in this case.  For example, U+30A1 KATAKANA LETTER
SMALL A - the small "modifier" variety of letter under discussion -
is not a modifier in the Unicode sense.  It has exactly the same
properties as U+30A2 KATAKANA LETTER A, an actual syllable:

30A1;KATAKANA LETTER SMALL A;Lo;0;L;N;
30A2;KATAKANA LETTER A;Lo;0;L;N;

So without script-specific special-case code, there's no way to
distinguish them.  As Aaron said, they're treated like lowercase, but
that's not an accurate representation of how they're used in actual
text, or of the common idea of what constitutes the set of kana.
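
This is easy to confirm with Python's standard unicodedata module. The sketch also shows one possible script-specific special case: filtering on the word SMALL in the character name. That filter is an illustration written for this sketch, not something the thread settles on, and the function names are hypothetical.

```python
import unicodedata

SMALL_A, A = "\u30a1", "\u30a2"

# Both characters report the same general category.
print(unicodedata.category(SMALL_A), unicodedata.category(A))
print(unicodedata.name(SMALL_A))
print(unicodedata.name(A))


def is_small_kana(ch):
    """Name-based special case: small kana carry the word SMALL in their name."""
    return "SMALL" in unicodedata.name(ch, "").split()


def syllabic_kana(lo, hi):
    """Letters from lo to hi with the small variants filtered out."""
    return "".join(
        chr(cp)
        for cp in range(ord(lo), ord(hi) + 1)
        if unicodedata.category(chr(cp)) == "Lo" and not is_small_kana(chr(cp))
    )


print(syllabic_kana("\u30a1", "\u30aa"))
```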

-- 
Mark J. Reed 


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Jon Lang
Mark J. Reed wrote:
> Perhaps the syllabic kana could be the "integer" analogs, and what you
> get when you iterate over the range using ..., while the modifier kana
> would not be generated by the series  ア ... ヴ but would be considered
> in the range  ア .. ヴ?  I wouldn't object to such script-specific
> behavior, though perhaps it doesn't belong in core.

As I understand it, it wouldn't need to be script-specific behavior;
just behavior that's aware of Unicode properties.  That particular
issue doesn't come up with the English alphabet because there aren't
any modifier codepoints embedded in the middle of the standard
alphabet.  And if there were, I'd hope that they'd be filtered out
from the series generation by default.

And I'd hope that there would be a way to turn the default filtering
off when I don't want it.

-- 
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Mark J. Reed
On Tue, Jul 20, 2010 at 11:28 PM, Aaron Sherman  wrote:
> So, what's the intention of the range operator, then?

... is a generator that lazily enumerates a series.  .. is a
constructor for a Range object.  They're two different things, with
different behaviors.  In particular, consider that pi ~~ 0..4 is true,
 because pi is within the range; but pi ~~ 0...4 is false, because pi
is not one of the generated elements.

> I guess you could write:
>
>  ア, イ, ウ, エ, オ, カ ... ヂ,ツ ...モ,ヤ, ユ, ヨ ... ロ, ワ ... ヴ (add quotes to taste)
>
> But that seems quite a bit more painful than:

Perhaps the syllabic kana could be the "integer" analogs, and what you
get when you iterate over the range using ..., while the modifier kana
would not be generated by the series  ア ... ヴ but would be considered
in the range  ア .. ヴ?  I wouldn't object to such script-specific
behavior, though perhaps it doesn't belong in core.

-- 
Mark J. Reed 


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Jon Lang
Aaron Sherman wrote:
> So, what's the intention of the range operator, then? Is it just there to
> offer backward compatibility with Perl 5? Is it a vestige that should be
> removed so that we can Huffman ... down to ..?
>
> I'm not trying to be difficult, here, I just never knew that ... could
> operate on a single item as LHS, and if it can, then .. seems to be obsolete
> and holding some prime operator real estate.

On the contrary: it is not a vestige, it is not obsolete, and it's
making good use of the prime operator real estate that it's holding.
It's just not doing what it did in Perl 5.

I strongly recommend that you reread S03 to find out exactly what each
of these operators does these days.

>> The questions definitely look different that way: for example,
>> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz is easily and
>> clearly expressed as
>>
>>    'A' ... 'Z', 'a' ... 'z'     # don't think this works in Rakudo yet  :(
>>
>
> I still contend that this is so frequently desirable that it should have a
> simpler form, but it's still going to have problems.
>
> One example: for expressing "Katakana letters" (I use "letters" in the
> Unicode sense, here) it's still dicey. There are things interspersed in the
> Unicode sequence for Katakana that aren't the same thing at all. Unicode
> calls them lowercase, but that's not quite right. They're smaller versions
> of Katakana characters which are used more as punctuation or accents than as
> syllabic glyphs the way the rest of Katakana is.
>
> I guess you could write:
>
>  ア, イ, ウ, エ, オ, カ ... ヂ,ツ ...モ,ヤ, ユ, ヨ ... ロ, ワ ... ヴ (add quotes to taste)
>
> But that seems quite a bit more painful than:
>
>  ア .. ヴ (or ... if you prefer)
>
> Similar problems exist for many scripts (including some of Latin, we're just
> used to the parts that are odd), though I think it's possible that Katakana
> may be the worst because of the mis-use of Ll to indicate a letter when the
> truth of the matter is far more complicated.

Some of this might be addressed by filtering the list as you go -
though I don't remember the method for doing so.  Something like
.grep, I think, with a regex in it that only accepts letters:

('ア' ... 'ヴ').grep(/<alpha>/)

...or something to that effect.

Still, it's possible that we might need something that's more flexible
than that.

-- 
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Jon Lang
Approaching this with the notion firmly in mind that infix:<..> is
supposed to be used for matching ranges while infix:<...> should be
used to generate series:

Aaron Sherman wrote:
> Walk with me a bit, and let's explore the concept of intuitive character
> ranges? This was my suggestion, which seems pretty basic to me:
>
> "x .. y", for all strings x and y, which are composed of a single, valid
> codepoint which is neither combining nor modifying, yields the range of all
> valid, non-combining/modifying codepoints between x and y, inclusive which
> share the Unicode script, general category major property and general
> category minor property of either x or y (lack of a minor property is a
> valid value).

This is indeed true for both range-matching and series-generation as
the spec is currently written.

> In general we have four problems with current specification and
> implementation on the Perl 6 and Perl 5 sides:
>
> 1) Perl 5 and Rakudo have a fundamental difference of opinion about what
> some ranges produce ("A" .. "z", "X" .. "T", etc) and yet we've never really
> articulated why we want that.
>
> 2) We deny that a range whose LHS is "larger" than its RHS makes sense, but
> we also don't provide an easy way to construct such ranges lazily otherwise.
> This would be annoying only, but then we have declared that ranges are the
> right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...}
> which is not lazy (blows up your machine) and feels awfully clunky next to
> for 1e10..1 -> $i {...} which would not blow up your machine, or even make
> it break a sweat, if it worked)

With ranges, we want C<< when $LHS .. $RHS >> to always mean C<< if
$LHS <= $_ <= $RHS >>.  If $RHS < $LHS, then the range being specified
is not valid.  In this context, it makes perfect sense to me why it
doesn't generate anything.

With series, we want C< $LHS ... $RHS > to generate a list of items
starting with $LHS and ending with $RHS.  If $RHS > $LHS, we want it
to increment one step at a time; if $RHS < $LHS, we want it to
decrement one step at a time.

So: 1) we want different behavior from the Range operator in Perl 6
vs. Perl 5 because we have completely re-envisioned the range
operator.  What we have replaced it with is fundamentally more
flexible, though not necessarily perfect.
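
A small Python sketch of the two behaviors as I understand them, using single characters. The function names are hypothetical, and comparing by codepoint is an assumption of the sketch rather than something the spec promises:

```python
def char_range_contains(low, high, candidate):
    """Range-style matching: low <= candidate <= high.

    If high sorts before low, nothing can satisfy the test.
    """
    return low <= candidate <= high

def char_series(first, last):
    """Series-style generation: step one codepoint at a time from
    first to last, counting down when last sorts before first."""
    step = 1 if ord(last) >= ord(first) else -1
    for cp in range(ord(first), ord(last) + step, step):
        yield chr(cp)

print(list(char_series("A", "C")))
print(list(char_series("C", "A")))
print(any(char_range_contains("C", "A", c) for c in "ABC"))
```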

> 3) We've never had a clear-cut goal in allowing string ranges (as opposed to
> character ranges, which Perl 5 and 6 both muddy a bit), so "intuitive"
> becomes sketchy at best past the first grapheme, and even muddier when only
> considering codepoints (thus that wing of my proposal and current behavior
> are on much shakier ground, except in so far as it asserts that we might
> want to think about it more).

I think that one notion that we're dealing with here is the idea that
C<< $X < $X.succ >> for all strings.  This seems to be a rather
intuitive assumption to make; but it is apparently not an assumption
that Stringy.succ makes.  As I understand it, "Z".succ eqv "AA".  What
benefit do we gain from this behavior?  Is it the idea that eventually
this will iterate over every possible combination of capital letters?
If so, why is that a desirable goal?


My own gut instinct would be to define the string iterator such that
it increments the final letter in the string until it gets to "Z";
then it resets that character to "A" and increments the next character
by one:

"ABE", "ABF", "ABG" ... "ABZ", "ACA", "ACB" ... "ZZZ"

This pattern ensures that for any two strings in the series, the first
one will be less than its successor.  It does not ensure that every
possible string between "ABE" and "ZZZ" will be represented; far from
it.  But then, 1...9 doesn't produce every number between 1 and 9; it
only produces integers.  Taken to an extreme: pi falls between 1 and
9; but no one in his right mind expects us to come up with a general
sequencing of numbers that increments from 1 to 9 with a guarantee
that it will hit pi before reaching 9.

Mind you, I know that the above is full of holes.  In particular, it
works well when you limit yourself to strings composed of capital
letters; do anything fancier than that, and it falls on its face.
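
For what it's worth, here is a Python sketch of the two successor rules, restricted to strings of capital letters. `growing_succ` imitates the "Z" to "AA" behavior and `fixed_width_succ` is the odometer-style alternative; both names, and the choice to return None when the fixed-width form runs out, are made up for the illustration:

```python
def _bump(s):
    """Increment a string of capital letters like an odometer.

    Returns (new_string, overflowed); on overflow every position has
    wrapped around to "A".
    """
    chars = list(s)
    for i in range(len(chars) - 1, -1, -1):
        if chars[i] != "Z":
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars), False
        chars[i] = "A"
    return "".join(chars), True

def growing_succ(s):
    """Successor that grows the string on overflow ("Z" becomes "AA")."""
    bumped, overflowed = _bump(s)
    return "A" + bumped if overflowed else bumped

def fixed_width_succ(s):
    """Successor that keeps the length fixed; None once "ZZ...Z" is passed."""
    bumped, overflowed = _bump(s)
    return None if overflowed else bumped

print(growing_succ("Z"), "Z" < growing_succ("Z"))
print(fixed_width_succ("ABZ"), "ABZ" < fixed_width_succ("ABZ"))
```

With plain lexicographic comparison, the growing form breaks the "less than its successor" property at the overflow point, while the fixed-width form keeps it up to the last string of that length.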

> 4) Many ranges involving single characters on LHS and RHS result in null
> or infinite output, which is deeply non-intuitive to me, and I expect many
> others.

Again, the distinction between range-matching and series-generation
comes to the rescue.

> Solve those (and I tried in my suggestion) and I think you will be able to
> apply intuition to character ranges, but only in so far as a human being is
> likely to be able to intuit anything related to Unicode.

Of the points that you raise, #1, 2, and 4 are neatly solved already.
I'm unsure as to #3; so I'd recommend focusing some scrutiny on it.

> The current behaviour of the range operator is (if I recall correctly):
>> 1) if both sides are single characters, make a range by incrementing
>> codepoints
>>
>
> Sadly, you can't do that reasonabl

Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Aaron Sherman
Side note: you could get around some of the problems, below, but in order to
do so, you would have to exhaustively express all of Unicode using the Str
builtin module's RANGES constant. In fact, as it is now, it defines ASCII
lowercase, but doesn't define Latin lowercase. Presumably because doing so
would be a massive pain. Again, I'll point out that using script and
properties is much easier.

On Tue, Jul 20, 2010 at 10:35 PM, Solomon Foster  wrote:

>
> Sorry, didn't mean to imply the series operator was perfect.  (Though
> it is surprisingly awesome in  general, IMO.)  Just that the right
> questions would be about the series operator rather than Ranges.
>

So, what's the intention of the range operator, then? Is it just there to
offer backward compatibility with Perl 5? Is it a vestige that should be
removed so that we can Huffman ... down to ..?

I'm not trying to be difficult, here, I just never knew that ... could
operate on a single item as LHS, and if it can, then .. seems to be obsolete
and holding some prime operator real estate.


>
> The questions definitely look different that way: for example,
> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz is easily and
> clearly expressed as
>
>    'A' ... 'Z', 'a' ... 'z'     # don't think this works in Rakudo yet  :(
>

I still contend that this is so frequently desirable that it should have a
simpler form, but it's still going to have problems.

One example: for expressing "Katakana letters" (I use "letters" in the
Unicode sense, here) it's still dicey. There are things interspersed in the
Unicode sequence for Katakana that aren't the same thing at all. Unicode
calls them lowercase, but that's not quite right. They're smaller versions
of Katakana characters which are used more as punctuation or accents than as
syllabic glyphs the way the rest of Katakana is.

I guess you could write:

  ア, イ, ウ, エ, オ, カ ... ヂ,ツ ...モ,ヤ, ユ, ヨ ... ロ, ワ ... ヴ (add quotes to taste)

But that seems quite a bit more painful than:

 ア .. ヴ (or ... if you prefer)

Similar problems exist for many scripts (including some of Latin, we're just
used to the parts that are odd), though I think it's possible that Katakana
may be the worst because of the mis-use of Ll to indicate a letter when the
truth of the matter is far more complicated.



> That suggests to me that the current behavior of 'A' ... 'z' is pretty
> reasonable.
>

You still have to decide to make at least some allowances for invalid
codepoints and I think you should avoid ever generating a combining or
modifying codepoint in such a sequence (e.g. "Ѻ" ... "Ҋ" in Cyrillic which
contains several combining characters for currency and counting as well as
one undefined codepoint).

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Solomon Foster
On Tue, Jul 20, 2010 at 10:00 PM, Jon Lang  wrote:
> Solomon Foster wrote:
>> Ranges haven't been intended to be the "right way" to construct basic
>> loops for some time now.  That's what the "..." series operator is
>> for.
>>
>>    for 1e10 ... 1 -> $i {
>>         # whatever
>>    }
>>
>> is lazy by the spec, and in fact is lazy and fully functional in
>> Rakudo.  (Errr... okay, actually it just seg faulted after hitting
>> 968746 in the countdown.  But that's a Rakudo bug unrelated to
>> this, I'm pretty sure.)
>
> You took the words out of my mouth.
>
>> All the magic that one wants for handling loop indices -- going
>> backwards, skipping numbers, geometric series, and more -- is present
>> in the series operator.  Range is not supposed to do any of that stuff
>> other than the most basic forward sequence.
>
> Here, though, I'm not so sure: I'd like to see how many of Aaron's
> issues remain unresolved once he reframes them in terms of the series
> operator.

Sorry, didn't mean to imply the series operator was perfect.  (Though
it is surprisingly awesome in  general, IMO.)  Just that the right
questions would be about the series operator rather than Ranges.

The questions definitely look different that way: for example,
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz is easily and
clearly expressed as

'A' ... 'Z', 'a' ... 'z' # don't think this works in Rakudo yet  :(

That suggests to me that the current behavior of 'A' ... 'z' is pretty
reasonable.

-- 
Solomon Foster: colo...@gmail.com
HarmonyWare, Inc: http://www.harmonyware.com


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Jon Lang
Solomon Foster wrote:
> Ranges haven't been intended to be the "right way" to construct basic
> loops for some time now.  That's what the "..." series operator is
> for.
>
>    for 1e10 ... 1 -> $i {
>         # whatever
>    }
>
> is lazy by the spec, and in fact is lazy and fully functional in
> Rakudo.  (Errr... okay, actually it just seg faulted after hitting
> 968746 in the countdown.  But that's a Rakudo bug unrelated to
> this, I'm pretty sure.)

You took the words out of my mouth.

> All the magic that one wants for handling loop indices -- going
> backwards, skipping numbers, geometric series, and more -- is present
> in the series operator.  Range is not supposed to do any of that stuff
> other than the most basic forward sequence.

Here, though, I'm not so sure: I'd like to see how many of Aaron's
issues remain unresolved once he reframes them in terms of the series
operator.

-- 
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Solomon Foster
On Tue, Jul 20, 2010 at 7:31 PM, Aaron Sherman  wrote:
> 2) We deny that a range whose LHS is "larger" than its RHS makes sense, but
> we also don't provide an easy way to construct such ranges lazily otherwise.
> This would be annoying only, but then we have declared that ranges are the
> right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...}
> which is not lazy (blows up your machine) and feels awfully clunky next to
> for 1e10..1 -> $i {...} which would not blow up your machine, or even make
> it break a sweat, if it worked)

Ranges haven't been intended to be the "right way" to construct basic
loops for some time now.  That's what the "..." series operator is
for.

for 1e10 ... 1 -> $i {
 # whatever
}

is lazy by the spec, and in fact is lazy and fully functional in
Rakudo.  (Errr... okay, actually it just seg faulted after hitting
968746 in the countdown.  But that's a Rakudo bug unrelated to
this, I'm pretty sure.)

All the magic that one wants for handling loop indices -- going
backwards, skipping numbers, geometric series, and more -- is present
in the series operator.  Range is not supposed to do any of that stuff
other than the most basic forward sequence.
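
As an illustration of the idea (in Python, not Perl 6), a lazy generator can count down from a huge number, skip values, or follow a geometric progression without ever building the whole list. The `series` function and its `next_value` hook are hypothetical names for this sketch, not the series operator's actual interface:

```python
from itertools import islice

def series(start, end, next_value=None):
    """Lazily yield values from start toward end, inclusive.

    next_value is a hypothetical hook computing the following element;
    by default the series steps by 1 in the direction of end.
    """
    ascending = end >= start
    if next_value is None:
        next_value = (lambda v: v + 1) if ascending else (lambda v: v - 1)
    current = start
    while (current <= end) if ascending else (current >= end):
        yield current
        current = next_value(current)

# Only the first three elements are ever computed here.
print(list(islice(series(10**10, 1), 3)))
# Skipping numbers and a geometric series, via the hook.
print(list(series(1, 10, lambda v: v + 3)))
print(list(series(1, 100, lambda v: v * 2)))
```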

-- 
Solomon Foster: colo...@gmail.com
HarmonyWare, Inc: http://www.harmonyware.com


Re: Suggested magic for "a" .. "b"

2010-07-20 Thread Aaron Sherman
This is a long reply, but I read it over a few times, and I don't see any
fat to trim. This isn't really a simple issue for which intuition is going
to be a sufficient guide, though I agree fully that it needs to be high on
or at the top of the list.

On Sun, Jul 18, 2010 at 6:26 AM, Moritz Lenz  wrote:

> In general, stuffing more complex behaviour into something that feels
> unintuitive is rarely (if ever) a good solution.


Walk with me a bit, and let's explore the concept of intuitive character
ranges? This was my suggestion, which seems pretty basic to me:

"x .. y", for all strings x and y, which are composed of a single, valid
codepoint which is neither combining nor modifying, yields the range of all
valid, non-combining/modifying codepoints between x and y, inclusive which
share the Unicode script, general category major property and general
category minor property of either x or y (lack of a minor property is a
valid value).


In general we have four problems with current specification and
implementation on the Perl 6 and Perl 5 sides:

1) Perl 5 and Rakudo have a fundamental difference of opinion about what
some ranges produce ("A" .. "z", "X" .. "T", etc) and yet we've never really
articulated why we want that.

2) We deny that a range whose LHS is "larger" than its RHS makes sense, but
we also don't provide an easy way to construct such ranges lazily otherwise.
This would be annoying only, but then we have declared that ranges are the
right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...}
which is not lazy (blows up your machine) and feels awfully clunky next to
for 1e10..1 -> $i {...} which would not blow up your machine, or even make
it break a sweat, if it worked)

3) We've never had a clear-cut goal in allowing string ranges (as opposed to
character ranges, which Perl 5 and 6 both muddy a bit), so "intuitive"
becomes sketchy at best past the first grapheme, and even muddier when only
considering codepoints (thus that wing of my proposal and current behavior
are on much shakier ground, except in so far as it asserts that we might
want to think about it more).

4) Many ranges involving single characters on LHS and RHS result in null
or infinite output, which is deeply non-intuitive to me, and I expect many
others.

Solve those (and I tried in my suggestion) and I think you will be able to
apply intuition to character ranges, but only in so far as a human being is
likely to be able to intuit anything related to Unicode.


The current behaviour of the range operator is (if I recall correctly):
>
> 1) if both sides are single characters, make a range by incrementing
> codepoints
>

Sadly, you can't do that reasonably. Here are some examples of why, using
only Latin and Greek as examples (not the most convoluted Unicode sections
to be sure):


   - "Α" (capital Greek alpha, not Latin A) .. "Ω" would result in a range
   that contains an invalid codepoint (rakudo: drops the invalid codepoint,
   which you may have meant to imply, but I'm being pedantic because I want to
   come to a specification, not just a sense of the right solution)
   - "Ā" .. "Ē" would be "ĀāĂăĄąĆćĈĉĊċČčĎďĐđĒ" which is really not what
   you're likely to expect! (rakudo: Ā, infinitely repeating, which is an even
   larger problem for Katakana, where "オ" .. "ヺ" seems a very intuitive way to
   say "all Katakana non-cased letters" but fails because the range contains
   both cased and uncased; Perl 5 just prints "オ", and I think it also sneers
   at you)
   - "A" .. "z" comes out really odd because it contains punctuation (mind
   you, your suggestion is saner than Rakudo's current behavior on "A" .. "z"
   which is an infinite progression of capital-letter-only sequences of 1 or
   more characters! Intuitive, it's not.)


My point was that, if you want simple and intuitive out of Unicode, you're
kind of screwed. The closest you can get is to build your range using
properties and script. The way I suggested doing that was the simplest I
could think of. Speak up if you have a simpler one.

For most simple ranges, our results will be identical (e.g "A" .. "Q").

For the above examples, I would end up producing:

1: Alpha through Omega greek capital letters
2: ĀĂĄĆĈĊČĎĐĒ
(and オカガキギクグケゲコゴサザシジスズセゼソゾタダチヂツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモヤユヨラリルレロワヰヱヲンヴヷヸヹヺ
for the Katakana)
3: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

That seems pretty darned intuitive to me. Mind you, "A" .. "ž" is still ugly
as sin in terms of ordering once you listify, and I can't reasonably fix
that without re-defining Unicode or having a really, really convoluted and
special-case rule, but without getting convoluted, even that ugly example
does something useful and, I dare say, intuitive for testing membership.
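
A runnable approximation of the rule, sketched in Python for the simple case where x does not sort after y. Python's standard library has no Script property, so `script_guess` below is a crude stand-in (the first word of the character name) and purely an assumption of this sketch; `proposed_range` is likewise an invented name:

```python
import unicodedata

def script_guess(ch):
    """Crude stand-in for the Unicode Script property: the first word
    of the character's name (empty for unnamed or unassigned codepoints)."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else ""

def proposed_range(x, y):
    """Codepoints from x to y that share (script, general category)
    with either endpoint, skipping unassigned and combining ones."""
    wanted = {(script_guess(c), unicodedata.category(c)) for c in (x, y)}
    result = []
    for cp in range(ord(x), ord(y) + 1):
        ch = chr(cp)
        category = unicodedata.category(ch)
        if category == "Cn" or category.startswith("M"):
            continue
        if (script_guess(ch), category) in wanted:
            result.append(ch)
    return result

print("".join(proposed_range("A", "z")))
print("".join(proposed_range("Ā", "Ē")))
print(len(proposed_range("Α", "Ω")))
```

One caveat with this approximation: a category test alone cannot separate the small Katakana from the full-size ones if both carry the same general category, so that case would need an extra rule.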

Here's the pseudo-code for my suggestion:

  class SingleCharAlphaRange {
 has $.start;
 has $.end;
 # Verify that this is a single character string which is valid
 # and non-combining/non-modifying and repre

Re: Suggested magic for "a" .. "b"

2010-07-18 Thread Moritz Lenz
Ruud H.G. van Tol wrote:
> Aaron Sherman wrote:
> 
>> Having established this range for each correspondingly indexed letter, the
>> range for multi-character strings is defined by a left-significant counting
>> sequence. For example:
>> 
>> "Ab" .. "Be"
>> 
>> defines the ranges:
>> 
>> <A B> and <b c d e>
>> 
>> This results in a counting sequence (with the most significant character on
>> the left) as follows:
>> <Ab Ac Ad Ae Bb Bc Bd Be>
>> 
> 
> glob can do that:
> 
> perl5.8.5 -wle 'print for <{A,B}{c,d,e}>'

Or Perl 6, for that matter :-)

> .say for <A B> X~ ('a' .. 'e')
Aa
Ab
Ac
Ad
Ae
Ba
Bb
Bc
Bd
Be

In general, stuffing more complex behaviour into something that feels
unintuitive is rarely (if ever) a good solution.

The current behaviour of the range operator is (if I recall correctly):

1) if both sides are single characters, make a range by incrementing
codepoints
2) otherwise, call .succ on the LHS. Stop before the generated values
exceed the RHS.

I'm not convinced it should be any more complicated than that. Remember
that with the series operator you can easily define your own
incrementation rules, and meta operators (like the cross meta operator
demonstrated above) it's also easy to combine different series and lists.
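
For readers who don't have a Perl 6 at hand, the cross-and-concatenate idea can be sketched in Python with `itertools.product`. The helper names `cross_concat` and `char_list` are invented for this illustration:

```python
from itertools import product

def char_list(first, last):
    """All characters from first to last by codepoint, inclusive."""
    return [chr(cp) for cp in range(ord(first), ord(last) + 1)]

def cross_concat(left, right):
    """Rough analogue of a cross with concatenation: every left item
    joined to every right item, left side varying slowest."""
    return [a + b for a, b in product(left, right)]

for item in cross_concat(["A", "B"], char_list("a", "e")):
    print(item)
```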

Cheers,
Moritz


Re: Suggested magic for "a" .. "b"

2010-07-17 Thread Ruud H.G. van Tol

Aaron Sherman wrote:


Having established this range for each correspondingly indexed letter, the
range for multi-character strings is defined by a left-significant counting
sequence. For example:

"Ab" .. "Be"

defines the ranges:

<A B> and <b c d e>

This results in a counting sequence (with the most significant character on
the left) as follows:

<Ab Ac Ad Ae Bb Bc Bd Be>


glob can do that:

perl5.8.5 -wle 'print for <{A,B}{c,d,e}>'
Ac
Ad
Ae
Bc
Bd
Be



Currently, Rakudo produces this:

"Ab", "Ac", "Ad", "Ae", "Af", "Ag", "Ah", "Ai", "Aj", "Ak", "Al", "Am",
"An", "Ao", "Ap", "Aq", "Ar", "As", "At", "Au", "Av", "Aw", "Ax", "Ay",
"Az", "Ba", "Bb", "Bc", "Bd", "Be"

which I don't think is terribly useful.


Good enough for me. For your variant, just override the .. for 'smarter' 
behavior?


--
Ruud



Re: Suggested magic for "a" .. "b"

2010-07-16 Thread Aaron Sherman
On Fri, Jul 16, 2010 at 3:49 PM, Carl Mäsak  wrote:

> Aaron (>):
> > [...]
> >
> > Many useful results from this suggested change:
> >
> > "C" .. "A" = <C B A> (Rakudo: <>)
>
> Regardless of the other traits of your proposed semantics, I think
> permitting reversed ranges such as the one above would be a mistake.
>

Why are you calling that a "reversed range"? It's not reversed, it's a range
like any other. The ordering of the terminator elements is only interesting
if you start pulling elements out. As a range, ordering isn't really
significant.


> Rakudo gives the empty list for ranges whose lhs exceeds (fsvo
> "exceeds") its rhs, because that's the way ranges work in Perl. The
> reason ranges work that way in Perl (in my understanding) is that it's
> the less surprising behavior when the endpoints are determined at
> runtime.
>

In Perl 5, if that's what you mean, "C" .. "A" produces the letters from C
to Z. I have no rational explanation for why, but I suggest we avoid
emulating this behavior in Perl 6.


> For explicitly specifying a reverse list of characters, there's still
> `reverse "A" .. "C"`, which is not only a straightforward idiom and
> huffmanized about right, but also good documentation for the reader.
>

reverse("A" .. "C") is not the same as "C" .. "A". Observe:

$ ./perl6 -e 'say reverse("A" .. "C").perl'
["C", "B", "A"]
$ ./perl6 -e 'say ("A" .. "C").perl'
"A".."C"

In order for reverse to work lazily, it would have to add a wrapper to the
iterator that asked for its last element first, and it's not clear to me
that one CAN ask for an iterator's last element without unrolling it. For
single characters, that's not TOO bad, but for strings.elems > 1 you could
blow out your RAM on even fairly trivial strings.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for "a" .. "b"

2010-07-16 Thread Aaron Sherman
On Fri, Jul 16, 2010 at 9:40 PM, Michael Zedeler  wrote:

>
> What started it all, was the intention to extend the operator, making it
> possible to evaluate it in list context. Doing so has opened Pandora's box,
> because many (most? all?) solutions are inconsistent with the rule of least
> surprise.
>

I don't think there's any coherent expectation, and therefore no potential
to avoid surprise. Returning comic books might be more of a surprise, but as
long as you're returning a string which appears to be "in the range"
expressed, then I don't see surprise as the problem.


>
> For instance, when considering strings, writing up an expression like
>
> 'goat' ~~ 'cow' .. 'zebra'
>
> This makes sense in most cases, because goat is lexicographically between
> cow and zebra.


This presumes that we're treating a string as a "number" in base x (where x,
I guess, would be the number of code points which share ... what, any of the
general category properties of the components of the input strings?).

That begins to get horrendously messy very, very fast:

 say "1aB" .. "aB1"



> I'd suggest that if you want to evaluate a Range in list context, you may
> have to provide a hint to the Range generator, telling it how to generate
> subsequent values. Your suggestion that the expansion of 'Ab' ..  'Be'
> should yield <Ab Ac Ad Ae Bb Bc Bd Be> is just an example of a different
> generator (you could call it a different implementation of ++ on Str types).
> It does look useful, but by realizing that it probably is, we have two
> candidates for how Ranges should evaluate in list context.
>

I think the solution here is to evaluate what's practical in the general
case. Your examples are, I think, misleading because they involve English
words and we naturally leap to "sure, that one's in the dictionary between
the other two." However, let me pose this dictionary lookup for you:

 "cliché" ~~ "aphorism" .. "truth"

Now, you see where this is going? What happens when we throw in some
punctuation?

 "father-in-law" ~~ "dad" .. "stranger"

The problem is that you have a complex heuristic in mind for determining
membership, and a very simple operator for expressing the set. Worse, I
haven't even gotten into dealing with Unicode where it's entirely reasonable
to write "TOPIXコンポジット1500構成銘柄" which I shamelessly grabbed from a Tokyo
Stock Exchange page. That one string, used in everyday text, contains Latin
letters, Katakana, Han (Kanji) ideograms and Latin digits.

Meanwhile, back to ".." ... the range operator. The most useful application
that I can think of for strings of length > 1 is for generating unique
strings such as for mktemp.

Beyond that, its application is actually quite limited, because the rules
for any other sort of string that might make sense to a human are absurdly
complex.

As such, I think it suffices to say that, for the most part, ".." makes
sense for single-character strings, and to expand from there, rather than
trying to introduce anything more complex.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for "a" .. "b"

2010-07-16 Thread Aaron Sherman
On Fri, Jul 16, 2010 at 1:14 PM, yary  wrote:

> There is one case where Rakudo's current output makes more sense than
> your proposal, and that's when the sequence is analogous to a range of
> numbers in another base, and you don't want to start at the equivalent
> of '' or end up at the equivalent of ''.


If you want a range of numbers, you should be using numbers. Perl should
absolutely not try to guess that you want codepoints to appear in your
result set which were not either expressed in the input or fall between the
range of any two corresponding input codepoints.


> But that's a less
> usual case and there's a workaround. Using your method & example, "Ab"
> .. "Az", "Ba" .. "Be" would reproduce what Rakudo does now.
>

Quite true.


>
> In general, I like it. Though it does mean that the sequence generated
> incrementing "Ab" repeatedly will diverge from "Ab" .. "Be" after 4
> iterations.
>

Also true, and I think that's a correct thing.
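
A quick way to see where the two sequences part ways, sketched in Python. `positional_range` models the per-position proposal, `succ` is a simplified letters-only successor standing in for repeated incrementing, and `succ_range` assumes the end string is reachable from the start; all three names are invented for the illustration:

```python
from itertools import product

def positional_range(start, end):
    """Per-position character ranges, leftmost position most significant."""
    columns = [
        [chr(cp) for cp in range(ord(lo), ord(hi) + 1)]
        for lo, hi in zip(start, end)
    ]
    return ["".join(chars) for chars in product(*columns)]

def succ(s):
    """Simplified successor for non-empty strings of ASCII letters:
    z wraps to a and Z wraps to A, carrying to the left."""
    chars = list(s)
    for i in range(len(chars) - 1, -1, -1):
        c = chars[i]
        if c not in "zZ":
            chars[i] = chr(ord(c) + 1)
            return "".join(chars)
        chars[i] = "a" if c == "z" else "A"
    return chars[0] + "".join(chars)

def succ_range(start, end):
    """Repeatedly increment start until end is produced (assumed reachable)."""
    out = [start]
    while out[-1] != end:
        out.append(succ(out[-1]))
    return out

by_position = positional_range("Ab", "Be")
by_increment = succ_range("Ab", "Be")
print(by_position)
print(len(by_increment), by_increment[:6])
```

The first four elements agree; the fifth is where incrementing carries on through the rest of the alphabet while the per-position form moves to the next leading letter.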

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for "a" .. "b"

2010-07-16 Thread Jon Lang
Aaron Sherman wrote:
> Oh bother, I wrote this up last night, but forgot to send it. Here y'all
> go:
>
> I've been testing ".." recently, and it seems, in Rakudo, to behave like
> Perl 5. That is, the magic auto-increment for "a" .. "z" works very
> wonkily,
> given any range that isn't within some very strict definitions (identical
> Unicode general category, increasing, etc.) So the following:
>
> "A" .. "z"
>
> produces very odd results.

Bear in mind that ".." is no longer supposed to be used to generate
lists; for that, you should use "...".  That said, that doesn't
address the issues you're raising; it merely spreads them out over two
operators (".." when doing pattern matching, and "..." when doing list
generation).

Your restrictions and algorithms are a good start, IMHO; and at some
point when I have the time, energy, and know-how, I'll read through
them in detail and comment on them.  In the meantime, though, let me
point out a fairly obvious point: sometimes, I want my pattern
matching and list generation to be case-sensitive; other times, I
don't.  More generally, whatever algorithm you decide on should be
subject to tweaking by the user to more accurately reflect his
desires.  So perhaps ".." and "..." should have an adverb that lets
you switch case sensitivity on (if the default is "off") or off (if
the default is "on").  And if you do this, there should be function
forms of ".." and "..." for those of us who have trouble working with
the rules for applying adverbs to operators.  Likewise with other
situations where there might be more than one way to approach things.

-- 
Jonathan "Dataweaver" Lang


Re: Suggested magic for "a" .. "b"

2010-07-16 Thread Michael Zedeler

On 2010-07-16 18:40, Aaron Sherman wrote:

Oh bother, I wrote this up last night, but forgot to send it. Here y'all go:

I've been testing ".." recently, and it seems, in Rakudo, to behave like
Perl 5. That is, the magic auto-increment for "a" .. "z" works very wonkily,
given any range that isn't within some very strict definitions (identical
Unicode general category, increasing, etc.) So the following:

"A" .. "z"

produces very odd results.

I'd like to suggest that we re-define this operator on strings as follows:

[cut]

"Ab" .. "Be"

defines the ranges:

<A B> and <b c d e>

This results in a counting sequence (with the most significant character on
the left) as follows:

<Ab Ac Ad Ae Bb Bc Bd Be>

Currently, Rakudo produces this:

"Ab", "Ac", "Ad", "Ae", "Af", "Ag", "Ah", "Ai", "Aj", "Ak", "Al", "Am",
"An", "Ao", "Ap", "Aq", "Ar", "As", "At", "Au", "Av", "Aw", "Ax", "Ay",
"Az", "Ba", "Bb", "Bc", "Bd", "Be"

which I don't think is terribly useful.
   
I have been discussing the Range operator before on this list, and since 
it often becomes the topic of discussion, something must be wrong with it.


What started it all, was the intention to extend the operator, making it
possible to evaluate it in list context. Doing so has opened Pandora's
box, because many (most? all?) solutions are inconsistent with the rule
of least surprise.


For instance, when considering strings, writing up an expression like

'goat' ~~ 'cow' .. 'zebra'

This makes sense in most cases, because goat is lexicographically 
between cow and zebra. So we have a nice ordering of strings that even 
extends to strings of any length (note that the three words used in my 
example are 3, 4 and 5 letters). As you can see, we even have a Range 
operator in there, so everything should be fine. What breaks everything 
is that we expect the Range operator to be able to generate all values 
between the two provided endpoints. Everything goes downhill from there.


With regard to strings, lexicographical ordering is the only prevailing
ordering we provide the developer with (apart from length, which doesn't
provide the strict ordering that is needed). So anyone using the Range
operator would assume that when lexicographical ordering is used for 
Range membership test, it is also used for generation of its members, 
naturally leading to the infinite sequence


cow
cowa
cowaa
cowaaa
...
cowb
cowba
cowbaa
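
To see why this sequence never gets past the "cowa..." strings, here is a rough Python sketch (not Perl 6, and not anything Rakudo does). It assumes an alphabet whose smallest letter is "a"; under that assumption the immediate lexicographic successor of any string is the same string with "a" appended. The name lexicographic_walk is made up for the illustration.

```python
from itertools import islice

def lexicographic_walk(start, smallest="a"):
    """Yield start, then each immediate lexicographic successor.

    Assumption: `smallest` is the lowest letter of the alphabet, so the
    next string after s in lexicographic order is s + smallest.
    """
    current = start
    while True:
        yield current
        current += smallest

first_four = list(islice(lexicographic_walk("cow"), 4))
print(first_four)

# "cowb" sorts after every string the walk produces, so it is never reached.
first_thousand = list(islice(lexicographic_walk("cow"), 1000))
print("cowb" in first_thousand)
```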

For some reason (even though Perl 6 supports infinite lists) we are
currently using a completely new construct: the domain of strings
limited to the length of the longest operand. This is counterintuitive
since


'cowbell' ~~ 'cow' .. 'zebra'

but

'cow' .. 'zebra'

does not produce 'cowbell' in list context.
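
The mismatch can be modelled in a few lines of Python. This is only a sketch of the two rules as described in this message, not of Rakudo's implementation: is_member uses plain lexicographic comparison, and can_be_generated (a hypothetical name) adds the "no longer than the longest operand" restriction.

```python
def is_member(word, lo, hi):
    """Membership test: plain lexicographic comparison of strings."""
    return lo <= word <= hi

def can_be_generated(word, lo, hi):
    """Model of list-context generation as described above: only strings
    no longer than the longest endpoint are ever produced (assumption)."""
    return is_member(word, lo, hi) and len(word) <= max(len(lo), len(hi))

for word in ("goat", "cowbell"):
    print(word, is_member(word, "cow", "zebra"), can_be_generated(word, "cow", "zebra"))
```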

The same story applies to other types that come with a natural ordering but
have an uncountable domain. Although the solutions differ, the main
problem is the same: they all behave counterintuitively.


5.0001 ~~ 1.1 .. 10.1

but

1.1 .. 10.1

does not (and really shouldn't!) produce 5.0001 in list context.

I'd suggest that if you want to evaluate a Range in list context, you
may have to provide a hint to the Range generator, telling it how to
generate subsequent values. Your suggestion that the expansion of 'Ab'
.. 'Be' should yield <Ab Ac Ad Ae Bb Bc Bd Be> is just an example of a
different generator (you could call it a different implementation of ++
on Str types). It does look useful, but by realizing that it probably
is, we have two candidates for how Ranges should evaluate in list context.


The same applies to Numeric types.

My suggestion is to eliminate the succ method on Rat, Complex, Real and
Str and point people in the direction of the series operator if they
need to generate sequences of things that are uncountable.
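
As a rough illustration of "give the generator a hint", here is a Python sketch in which the caller must supply the successor function explicitly. The name series and its signature are invented for this example; it is not the Perl 6 series operator, only a model of the idea that generation needs a step while membership does not.

```python
from decimal import Decimal
from itertools import takewhile

def series(start, step_fn, upto):
    """Generate start, step_fn(start), ... while the value is <= upto.

    The caller chooses step_fn; there is no built-in successor.
    Assumes step_fn is strictly increasing, otherwise this never ends.
    """
    def values():
        x = start
        while True:
            yield x
            x = step_fn(x)
    return list(takewhile(lambda x: x <= upto, values()))

by_one = series(Decimal("1.1"), lambda x: x + 1, Decimal("10.1"))
by_half = series(Decimal("1.1"), lambda x: x + Decimal("0.5"), Decimal("3"))
letters = series("a", lambda s: chr(ord(s) + 1), "e")
print(by_one, by_half, letters)
```

With the step made explicit, it is no longer surprising that 5.0001 lies within the endpoints yet is absent from by_one.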


Regards,

Michael.



Re: Suggested magic for "a" .. "b"

2010-07-16 Thread Carl Mäsak
Aaron (>):
> [...]
>
> Many useful results from this suggested change:
>
> "C" .. "A" = <C B A> (Rakudo: <>)

Regardless of the other traits of your proposed semantics, I think
permitting reversed ranges such as the one above would be a mistake.

Rakudo gives the empty list for ranges whose lhs exceeds (fsvo
"exceeds") its rhs, because that's the way ranges work in Perl. The
reason ranges work that way in Perl (in my understanding) is that it's
the less surprising behavior when the endpoints are determined at
runtime.

For explicitly specifying a reverse list of characters, there's still
`reverse "A" .. "C"`, which is not only a straightforward idiom and
huffmanized about right, but also good documentation for the reader.
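
A small Python sketch of the same point, assuming single-character endpoints and a made-up helper char_range: a range whose left end exceeds its right end is simply empty, which is harmless when the endpoints only become known at runtime, and the reversed list is requested explicitly.

```python
def char_range(lo, hi):
    """Ascending range of single characters; empty when lo sorts after hi."""
    return [chr(cp) for cp in range(ord(lo), ord(hi) + 1)]

# Endpoints arriving at runtime in the "wrong" order give an empty list
# instead of silently counting downwards.
first, last = "C", "A"
print(char_range(first, last))

# Asking for the reversed list explicitly documents the intent.
print(list(reversed(char_range("A", "C"))))
```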

// Carl


Re: Suggested magic for "a" .. "b"

2010-07-16 Thread yary
On Fri, Jul 16, 2010 at 9:40 AM, Aaron Sherman  wrote:
> For example:
>
> "Ab" .. "Be"
>
> defines the ranges:
>
> <A B> and <b c d e>
>
> This results in a counting sequence (with the most significant character on
> the left) as follows:
>
> <Ab Ac Ad Ae Bb Bc Bd Be>
>
> Currently, Rakudo produces this:
>
> "Ab", "Ac", "Ad", "Ae", "Af", "Ag", "Ah", "Ai", "Aj", "Ak", "Al", "Am",
> "An", "Ao", "Ap", "Aq", "Ar", "As", "At", "Au", "Av", "Aw", "Ax", "Ay",
> "Az", "Ba", "Bb", "Bc", "Bd", "Be"

There is one case where Rakudo's current output makes more sense than
your proposal, and that's when the sequence is analogous to a range of
numbers in another base, and you don't want to start at the equivalent
of '' or end up at the equivalent of ''. But that's a less
usual case and there's a workaround. Using your method & example, "Ab"
.. "Az", "Ba" .. "Be" would reproduce what Rakudo does now.

In general, I like it. Though it does mean that the sequence generated
incrementing "Ab" repeatedly will diverge from "Ab" .. "Be" after 4
iterations.
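
Both observations can be checked with a Python model. Everything here is a simplification for ASCII letters only: succ imitates a Perl-style string increment with carry, and counting imitates the proposed per-position counting sequence for increasing endpoints of equal length. Neither function is Rakudo's code.

```python
from itertools import product

def succ(s):
    """Simplified string increment: letters wrap z->a / Z->A with carry."""
    chars = list(s)
    i = len(chars) - 1
    while i >= 0:
        if chars[i] == "z":
            chars[i] = "a"
        elif chars[i] == "Z":
            chars[i] = "A"
        else:
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        i -= 1
    return chars[0] + "".join(chars)  # every position wrapped: grow by one

def by_increment(start, stop):
    """Repeatedly increment start until stop is hit (assumes it is reachable)."""
    out = [start]
    while out[-1] != stop:
        out.append(succ(out[-1]))
    return out

def counting(lhs, rhs):
    """Proposed semantics, increasing case: one range per character index,
    combined with the leftmost index as the most significant."""
    columns = [[chr(cp) for cp in range(ord(a), ord(b) + 1)]
               for a, b in zip(lhs, rhs)]
    return ["".join(t) for t in product(*columns)]

incremented = by_increment("Ab", "Be")
proposed = counting("Ab", "Be")
first_difference = next(i for i, (x, y) in enumerate(zip(incremented, proposed))
                        if x != y)
workaround = counting("Ab", "Az") + counting("Ba", "Be")
print(first_difference, incremented[first_difference], proposed[first_difference])
print(workaround == incremented)
```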

-y


Suggested magic for "a" .. "b"

2010-07-16 Thread Aaron Sherman
Oh bother, I wrote this up last night, but forgot to send it. Here y'all go:

I've been testing ".." recently, and it seems, in Rakudo, to behave like
Perl 5. That is, the magic auto-increment for "a" .. "z" works very wonkily,
given any range that isn't within some very strict definitions (identical
Unicode general category, increasing, etc.) So the following:

"A" .. "z"

produces very odd results.

I'd like to suggest that we re-define this operator on strings as follows:

RESTRICTIONS:

First off, if either argument contains combining, modifying, undefined,
reserved or other codepoints which either cannot be treated as a single,
independent "character" or whose Unicode properties are not firmly
established in the Unicode specification, then an exception is immediately
raised. This must be done in order to assure that each character index can
be compared to each corresponding character index without the typical
Unicode ambiguities. Ligatures and other decomposable sequences are treated
by their codepoint in the current encoding, only.

Treatment of strings whose encodings differ should be possible, as all
comparisons are performed on codepoints.

If either argument is zero length, an exception is raised.

If either argument is *, then it is assumed to stand for the largest
(RHS) or smallest (LHS) codepoint with the same Unicode general properties
as the opposite side (for each character index, if the other value is a
string of length > 1).

ALGORITHM:

If both arguments are strings of non-zero length, ".." will first determine
which is the shorter. This length is the "significant length". Any
characters after this length in the longer sequence are ignored (return
value might be an unthrown exception in this case?)

For all remaining characters, each character is considered with respect to
its correspondingly indexed character in the other string, and the following
algorithm is applied to determine the range that they represent (the LHS
character is referred to as "A" below, and the RHS as "B").

The binary Unicode general category properties of A and B are considered
from the set of major category classes:

L, M, N, P, S, Z, C

Thus the Lu property or Pe property would be considered. The total range
consists of all codepoints lying between the lower of the two codepoints and
the higher of the two, inclusive, which share the major and minor Unicode
general category property of either A or B (if there is no minor subclass,
then codepoints without a minor subclass are considered with respect to that
endpoint). The ordering is determined by the ordering of A and B.

The range is then restricted to codepoints which share the same script as A
or B.

Thus, Latin "a" and Greek lowercase pi would define a range which included
all lower-case letters from the Latin and Greek scripts that fell between
their codepoints.
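
The per-character step can be sketched in Python with the standard unicodedata module. This is a partial model only: it filters on the two-letter general category of either endpoint and orders the result by the direction of A to B, but it omits the script restriction, because the Python standard library does not expose the Unicode Script property. The name char_range is made up for the illustration.

```python
import unicodedata

def char_range(a, b):
    """Codepoints between a and b (inclusive) whose general category matches
    that of a or b, ordered from a towards b. Script filtering is omitted."""
    wanted = {unicodedata.category(a), unicodedata.category(b)}
    lo, hi = sorted((ord(a), ord(b)))
    chars = [chr(cp) for cp in range(lo, hi + 1)
             if unicodedata.category(chr(cp)) in wanted]
    return chars if ord(a) <= ord(b) else chars[::-1]

print(char_range("(", "}"))                 # bracket-like punctuation only
print(char_range("C", "A"))                 # decreasing endpoints
greek = char_range("\u0391", "\u03a9")      # GREEK CAPITAL ALPHA .. OMEGA
print(len(greek), "\u03a2" in greek)        # unassigned U+03A2 is skipped
```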

Having established this range for each correspondingly indexed letter, the
range for multi-character strings is defined by a left-significant counting
sequence. For example:

"Ab" .. "Be"

defines the ranges:

<A B> and <b c d e>

This results in a counting sequence (with the most significant character on
the left) as follows:

<Ab Ac Ad Ae Bb Bc Bd Be>

Currently, Rakudo produces this:

"Ab", "Ac", "Ad", "Ae", "Af", "Ag", "Ah", "Ai", "Aj", "Ak", "Al", "Am",
"An", "Ao", "Ap", "Aq", "Ar", "As", "At", "Au", "Av", "Aw", "Ax", "Ay",
"Az", "Ba", "Bb", "Bc", "Bd", "Be"

which I don't think is terribly useful.
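
For concreteness, the counting sequence described above can be sketched in Python as a Cartesian product of the per-index ranges. This is a simplified model: simple_range uses raw codepoint order in either direction and skips the Unicode category and script filtering, and the longer string is truncated to the "significant length". Both function names are invented for this sketch.

```python
from itertools import product

def simple_range(a, b):
    """Codepoints from a to b inclusive, ascending or descending.
    Simplification: no general-category or script filtering."""
    step = 1 if ord(a) <= ord(b) else -1
    return [chr(cp) for cp in range(ord(a), ord(b) + step, step)]

def counting_sequence(lhs, rhs):
    """Left-significant counting sequence over the per-index ranges."""
    if not lhs or not rhs:
        raise ValueError("zero-length argument")  # per the RESTRICTIONS above
    n = min(len(lhs), len(rhs))                   # the "significant length"
    columns = [simple_range(lhs[i], rhs[i]) for i in range(n)]
    # product() varies the rightmost column fastest, so the leftmost
    # character is the most significant one.
    return ["".join(chars) for chars in product(*columns)]

ab_be = counting_sequence("Ab", "Be")
fruit = counting_sequence("apple", "orange")
print(ab_be)
print(len(fruit), fruit[0], fruit[-1])
```

Under these assumptions "Ab" .. "Be" gives Ab Ac Ad Ae Bb Bc Bd Be, eight strings rather than the thirty produced by repeated increment.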

Many useful results from this suggested change:

"C" .. "A" = <C B A> (Rakudo: <>)

"(" .. "}" = <( ) [ ] { }> (because open-paren is Ps and close-brace is Pe,
therefore all Ps and Pe codepoints in the range are included).

"Α" .. "Ω" = <Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω> (notice that
codepoint U+03A2 is gracefully skipped, as it is undefined and thus has no
properties).

"apple" .. "orange" = the counting sequence defined by the ranges "a" ..
"o", "p" .. "r", "p" .. "a", "l" .. "n", "e" .. "g" (notice that the string
"orang" will be part of the result set, but "orange" will not.)

In addition:

One alternative to truncation of strings of differing lengths is to extend
the sequence. For example, if we ask for "a" .. "bc", then we might produce
. Where the extension is the original range plus the same range
where each element has the extended string elements concatenated. This might
even be iterated for every additional codepoint in the longer string. For
example: "a" .. "bcd" = 

"..." could have similar semantics. In the case of A, B ... C, for length 1
strings, the range A .. B is simply projected forward until x ge C (if
A..B is increasing, le otherwise). C's properties probably should not be
considered at all. In the case of length > 1 strings each character index is
projected forward independently until any one character index ge the
corresponding index in the terminator, and there is no "counting":

"AAA", "BCD" ... "GGG" = 

If any index in the sequence does not increment (e.g. "AA", "AB" ... "ZZ")
then there is an implic