Re: multi-character ranges

Jon Lang Wed, 21 Jul 2010 15:47:30 -0700

Aaron Sherman wrote:
> Darren Duncan wrote:
>>> 3) It seems that there are two competing multi-character approaches and both
>>> seem somewhat valid. Should we use a pragma to toggle behavior between A
>>> and B:
>>>
>>>  A: "aa" .. "bb" contains "az"
>>>  B: "aa" .. "bb" contains ONLY "aa", "ab", "ba" and "bb"
>>>
>>
>> I would find A to be the only reasonable answer.
>
> [Before I respond, let's agree that, below, I'm going to say things like
> "generates" when talking about "..". What I'm describing is the idea that a
> value exists in the range given, not that a range is actually a list.]
>
> I would find B to be the only reasonable answer, but enough people seem to
> think the other way that I understand there's a valid need to be able to get
> both behaviors.


FWIW, the reasoning behind A is that it's very much like looking up a
word in a dictionary.  Is "az" greater than, less than, or equal to
"aa"?  Greater than.  Is "az" greater than, equal to, or less than
"bb"?  Less than.  Since it is greater than "aa" and less than "bb",
it is between "aa" and "bb".  This is what infix:<..> tests for.
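To make the difference between A and B concrete, here is a rough Python model. It is a sketch only, not Perl 6 semantics. `between_lexicographic` stands in for A and simply asks whether a string sorts between the endpoints. Python compares strings code point by code point, so this works like a dictionary lookup for same-case letters. `per_position_values` is a hypothetical helper for B. It lets each character position run independently from lo[i] to hi[i], and it assumes both endpoints have the same length.

```python
from itertools import product

def between_lexicographic(s, lo, hi):
    """Interpretation A: s is in lo..hi if it sorts between the endpoints."""
    return lo <= s <= hi

def per_position_values(lo, hi):
    """Interpretation B (hypothetical helper): each character position
    ranges independently; assumes equal-length endpoints."""
    if len(lo) != len(hi):
        raise ValueError("endpoints must have the same length")
    # One column of candidate characters per position, e.g. ['a', 'b'].
    columns = [
        [chr(c) for c in range(ord(a), ord(b) + 1)]
        for a, b in zip(lo, hi)
    ]
    # Cartesian product of the columns, joined back into strings.
    return ["".join(chars) for chars in product(*columns)]

print(between_lexicographic("az", "aa", "bb"))  # True
print("az" in per_position_values("aa", "bb"))   # False
print(per_position_values("aa", "bb"))           # ['aa', 'ab', 'ba', 'bb']
```

Under A, "az" is in range because "aa" le "az" le "bb". Under B, only the four per-position combinations exist.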

>> If you want B's semantics then use "..." instead; ".." should not be
>> overloaded for that.
>>
>
> I wasn't really distinguishing between ".." and "..." as I'm pretty sure
> they should have the same behavior, here. The case where I'm not sure they
> should have the same behavior is "apple" .. "orange". Frankly, I think that
> there's no right solution there. There's the one I proposed in my original
> message (treat each character index as a distinct sequence and then
> increment in a base defined by all of the sequences), but even I don't like
> that. To generate all possible strings of length 5+ that sort between those
> two is another suggestion, but then what do you expect "father-in-law" ..
> "orange" to do? Punctuation throws a whole new dimension in there, and I'm
> immediately lost. When you go to my Japanese example from many messages ago,
> which I got from a fairly typical Web site and contained 2 Scripts with 4
> different General Categories, I begin to need pharmaceuticals.

What you're asking about now isn't the range or series operators; it's
the comparison operators: before, after, gt, lt, ge, le, leg, and so
on.  When comparing two strings, establishing an order between them is
generally straightforward as long as both are composed of letters from
the same alphabet and with the same case; but once you start mixing
cases, introducing non-alphabetical characters such as spaces or
punctuation, and/or introducing characters from other alphabets, the
common-sense meaning of order becomes messy.

Traditionally, this has been addressed by falling back on a comparison
of the characters' ordinals: 0x0041 comes before 0x0042, and so on.
This produces counterintuitive situations such as "d" gt "E", because
the capital letters A-Z (0x0041-0x005A) all come earlier in the Unicode
sequence than the lower-case letters a-z (0x0061-0x007A).  OTOH, it's
robust: if all that you want is a way to ensure that strings can always
be sorted, this will do the job.  It won't always be an _intuitive_
ordering; but there will always be an ordering.
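The ordinal fallback is easy to observe in Python, whose default string comparison also goes by code point. This is only an illustration of the general technique, not of any particular Perl 6 implementation. The case-folded sort is a swapped-in contrast that shows one "more intuitive" ordering. It is not something proposed in this thread.

```python
def codepoint_order(words):
    """Sort strings by comparing code points, the traditional fallback."""
    return sorted(words)  # Python str comparison is by Unicode code point

words = ["d", "E", "orange", "father-in-law", "Apple"]

print(ord("E"), ord("d"))               # 69 100, so "E" sorts before "d"
print(codepoint_order(words))           # ['Apple', 'E', 'd', 'father-in-law', 'orange']
print(sorted(words, key=str.casefold))  # ['Apple', 'd', 'E', 'father-in-law', 'orange']
```

Both orderings are total, so any list can be sorted either way. Only the first one is the robust, ordinal-based comparison described above.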

> I don't see any value in having different rules for what .. and ... generate
> in these cases, however. (frankly, I'm still on the fence about ... for
> single endpoints, which I think should just devolve to .. (... with a list
> for LHS is another animal, of course))

The only area where infix:<..> and infix:<...> overlap is when you're
talking about list generation; when using them for matching purposes,
C< $x ~~ 1..3 > is equivalent to C< $x >= 1 && $x <= 3 > (that is,
it's a single value that falls somewhere between the two endpoints),
while C< $x ~~ 1...3 > is equivalent to C< $x ~~ (1, 2, 3) > (that is,
$x is a three-element list that contains the values 1, 2, and 3 in
that order) - two very different things.  There simply is not enough
similarity between the two operators for one to degenerate to the
other in anything more than a few edge-cases.
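The two kinds of matching can be sketched in Python as follows. `matches_range` and `matches_series` are hypothetical names that model the semantics described in this message, with integer endpoints for the series case. They are not how any Perl 6 implementation actually works.

```python
def matches_range(x, lo, hi):
    """Model of `$x ~~ lo..hi`: one value somewhere between the endpoints."""
    return lo <= x <= hi

def matches_series(x, lo, hi):
    """Model of `$x ~~ lo...hi` (integer endpoints assumed): x must be the
    whole generated list, in order."""
    expected = list(range(lo, hi + 1))
    return isinstance(x, (list, tuple)) and list(x) == expected

print(matches_range(2, 1, 3))           # True
print(matches_range(2.5, 1, 3))         # True: any value in between counts
print(matches_series(2, 1, 3))          # False: a single value is not the list
print(matches_series([1, 2, 3], 1, 3))  # True
```

The range test accepts infinitely many scalars. The series test accepts exactly one list.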

-- 
Jonathan "Dataweaver" Lang
