Re: Suggested magic for "a" .. "b"
On Sun, Aug 1, 2010 at 11:39 PM, Martin D Kealey wrote:
> In any case I'd much rather prefer that the behaviour be lexically scoped,
> with either adverbs or pragmata, not with the action-at-a-distance that's
> caused by tagging something as fundamental as a String.

In many cases the collation isn't known at compile time, so adverbs would be necessary anyway. Pragmas can make things easier in many cases.

Leon
Re: Suggested magic for "a" .. "b"
Martin D Kealey wrote:
> On Wed, 28 Jul 2010, Darren Duncan wrote:
>> I think that a general solution here is to accept that there may be more
>> than one valid way to sort some types, strings especially, and so
>> operators/routines that do sorting should be customizable in some way so
>> users can pick the behaviour they want.
>>
>> The customization could be applied at various levels, such as using an
>> extra argument or trait for the operator/function that cares about
>> ordering,
>
> That much I agree wholeheartedly with, but ...
>
>> or by using an extra attribute or trait for the types being sorted.
>
> ... puts us back where we started: how do we cope if the two endpoints
> aren't tagged with the same attribute or trait or locale?
>
> In any case I'd much rather prefer that the behaviour be lexically scoped,
> with either adverbs or pragmata, not with the action-at-a-distance that's
> caused by tagging something as fundamental as a String.

Lexical scoping *is* a good idea, and I would also imagine that users would frequently apply that at the file or setting level.

But making this a pragma means that the pragma would have to be a little more verbose than a typical pragma. In the general format, one wouldn't just say, e.g.:

    collation FooNation;

... but rather it would at least be more like:

    collation Str FooNation;

... to say that you're only applying it to operations involving Str types and not, say, Numeric types.

>> So then, "a" cmp "ส้" is always defined, but users can change the
>> definition.
>
> I take the opposite approach; it's always undefined (read, unthrown
> exception) unless the user tells us how they want it treated. That can be a
> command-line switch if necessary.
>
> To paraphrase Dante, "the road to hell is paved with Reasonable Defaults".
> Or in programming terms, your reasonable default is the cause of my ugly
> work-around.

That might be fair. But if we're going to do that, then I'd like to go a step further and require that some other operators have mandatory config arguments, so that users explicitly state the semantics they want, though once again a lexical pragma could declare this at a higher level.

I'm restating this thought in another thread, "rounding method adverbs", so that's the best place to follow it.

-- Darren Duncan
Re: Suggested magic for "a" .. "b"
On Wed, 28 Jul 2010, Darren Duncan wrote:
> I think that a general solution here is to accept that there may be more
> than one valid way to sort some types, strings especially, and so
> operators/routines that do sorting should be customizable in some way so
> users can pick the behaviour they want.
>
> The customization could be applied at various levels, such as using an
> extra argument or trait for the operator/function that cares about
> ordering,

That much I agree wholeheartedly with, but ...

> or by using an extra attribute or trait for the types being sorted.

... puts us back where we started: how do we cope if the two endpoints aren't tagged with the same attribute or trait or locale?

In any case I'd much rather prefer that the behaviour be lexically scoped, with either adverbs or pragmata, not with the action-at-a-distance that's caused by tagging something as fundamental as a String.

Yes, sometimes you want the behaviour of your range to mimic the locale of its operands, but then it should be explicit, with a trait that also explicitly selects either the left or right operand to extract the locale from. And probably throw an exception if they aren't in the same locale.

If you don't specify that you want locale-dependent behaviour then the default action should be an unthrown exception unless both endpoints are inarguably comparable, so IMHO that pretty much rules out any code-points that are used in more than one language, save perhaps raw ASCII. And even then you really should make an explicit choice between case-sensitive and case-insensitive comparison.

> When you want to be consistent, the behaviour of "cmp" affects all of the
> other order-sensitive operations, including any working with intervals.

Indeed, the range constructor and the cmp operator should have the same adverbs and share lexical pragmata.

> So then, "a" cmp "ส้" is always defined, but users can change the
> definition.

I take the opposite approach; it's always undefined (read, unthrown exception) unless the user tells us how they want it treated. That can be a command-line switch if necessary.

To paraphrase Dante, "the road to hell is paved with Reasonable Defaults". Or in programming terms, your reasonable default is the cause of my ugly work-around.

-Martin
Re: Suggested magic for "a" .. "b"
Aaron Sherman wrote:
> In the end, I'm now questioning the difference between a junction and
> a Range... which is not where I thought this would go.

Conceptually, they're closely related. In particular, a range behaves a lot like an any() junction. Some differences:

1. An any() junction always has a discrete set of options in it; but a Range could (and generally does) have a continuous set of options.

2. An any() junction can have an arbitrary set of options; a Range's set of options is defined entirely by its endpoints.

-- Jonathan "Dataweaver" Lang
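The two differences listed above can be sketched in a few lines. This is only an illustration in Python rather than Perl 6, and `matches_any` and `matches_range` are hypothetical helper names, not anything from the spec: the junction-style test needs an explicit list of options, while the range-style test needs only two endpoints and accepts values that nobody ever listed.

```python
from fractions import Fraction


def matches_any(x, options):
    """any()-style test: x must equal one of a discrete set of options."""
    return any(x == option for option in options)


def matches_range(x, low, high):
    """Range-style test: x only has to lie between the two endpoints."""
    return low <= x <= high


junction = [1, 2, 3]
half_way = Fraction(3, 2)

# 2 is one of the listed options; 3/2 is not.
print(matches_any(2, junction), matches_any(half_way, junction))
# Both values lie between the endpoints 1 and 3.
print(matches_range(2, 1, 3), matches_range(half_way, 1, 3))
```

The range never enumerates its members, which is why it can describe a continuous set.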
Re: Suggested magic for "a" .. "b"
On Fri, Jul 30, 2010 at 6:45 PM, Doug McNutt wrote:
> Please pardon intrusion by a novice who is anything but object oriented.

No problem. Sometimes a fresh perspective helps to illuminate things.

Skipping ahead...

> Are you guys sure that the "..." and ".." operators in perl 6 shouldn't make
> use of regular expression syntax while deciding just what is intended by the
> programmer?

You kind of blew my mind, there. I tried to respond twice and each time I determined that there was a way around what I was about to call crazy. In the end, I'm now questioning the difference between a junction and a Range... which is not where I thought this would go.

Good question, though I should point out that you could never reasonably listify a range constructed from a regex because "reversing" a regex like that immediately runs into some awful edge cases.

Still, interesting stuff.

-- Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
Re: Suggested magic for "a" .. "b"
Please pardon intrusion by a novice who is anything but object oriented.

I consider myself a long time user of perl 5. I love it and it has completely replaced FORTRAN as my compiler of choice. "Programming Perl" is so dog-eared that I may need a replacement.

I joined this list when I thought the <<...>> operators might allow for vector operations like cross product, dot product, curl, grad, and divergence. I was mistaken but was pleased that such things would be possible as add-ins to be created later.

I have never used the ".." operator on perl 5, mostly because I can't understand it.

I have actually wished for, in perl 5, an ability to create a list, really an unsorted set with an @theset kind of description, that I could create with a regular expression. All ASCII strings that would match would become members of @theset.

    @theset = /\A2N\d\d\d\d\Z/;

would create a temporary array of transistors that have "2N", once military, designations. That list would become an input to some other code that would look for datasheets. Memory intensive but easy to understand.

Are you guys sure that the "..." and ".." operators in perl 6 shouldn't make use of regular expression syntax while deciding just what is intended by the programmer?

--
--> The best programming tool is a soldering iron <--
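For this one fixed-width pattern the set can be built by hand, which may make the request more concrete. The following is a minimal Python sketch, not Perl, and it does not reverse a regular expression in general: the function `transistor_designations` is a hypothetical name, and it simply expands "2N" plus four ASCII digits, then uses the pattern only to double-check the result.

```python
import itertools
import re

# The pattern from the message, used here only as a check on the output.
PATTERN = re.compile(r"\A2N\d\d\d\d\Z")


def transistor_designations():
    """Yield every string made of "2N" followed by four ASCII digits."""
    for digits in itertools.product("0123456789", repeat=4):
        yield "2N" + "".join(digits)


the_set = list(transistor_designations())

# Every generated string matches the original pattern.
assert all(PATTERN.match(name) for name in the_set)
print(len(the_set), the_set[0], the_set[-1])
```

This works because the pattern has a fixed length and a small alphabet at each position; a pattern containing `*` or `.` would describe an unbounded or enormous set.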
Re: Suggested magic for "a" .. "b"
On Thu, Jul 29, 2010 at 9:51 PM, Aaron Sherman wrote:
> My only strongly held belief, here, is that you should not try to answer any
> of these questions for the default range operator on
> unadorned, context-less strings. For that case, you must do something that
> makes sense for all Unicode codepoints in nearly all contexts.

I find that both of limited use and the only sane possibility at the same time :-|

Leon
Re: Suggested magic for "a" .. "b"
On 7/29/10 08:15 , Leon Timmermans wrote:
> On Thu, Jul 29, 2010 at 3:24 AM, Darren Duncan wrote:
>> $foo ~~ $a..$b :QuuxNationality # just affects this one test
>
> I like that
>
>> $bar = 'hello' :QuuxNationality # applies anywhere the Str value is used
>
> What if you compare a QuuxNationality Str with a FooNationality Str?
> That should blow up. Also it can lead to action at a distance. I don't
> think that's the way to go.

It's half right; the coding set should be part of the type. Explicit conversion is probably a good idea too.

--
brandon s. allbery [linux,solaris,freebsd,perl] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
Re: Suggested magic for "a" .. "b"
On Wed, Jul 28, 2010 at 9:24 PM, Darren Duncan wrote:
> Jon Lang wrote:
>>> I don't know enough about Unicode to suggest how to solve this.

Thankfully, I know little enough to take up the challenge ;-)

>>> All I can say is that my example above should never return a valid Range
>>> object unless there is a way I can specify my own ordering and I use it.

Please see my suggested approach way, way back at the start of all this. Use Unicode scripts, properties and codepoint sequences to produce a list of codepoints. Want something more meaningful than codepoints? Great, use an object that knows what you're asking for:

    EnglishDictWord("apple") .. EnglishDictWord("orange")

It's a very Perl way to approach a problem: provide the solution that meets the least common denominator need (return a range object that represents ranges based on the information we have) and then allow that same feature to be used in cases where the user has provided sufficient context to do something smarter.

I don't think it makes sense to extend the length of strings under consideration by default. Obviously the above example would include "blackberry" because you've asked it to consider English dictionary words, but "aa" .. "zz" shouldn't contain "blackberry" because you don't have enough data to understand what's being asked for, and thus should fall back to treating strings as lists of codepoints (speaking of which, do we define a behavior for (1,2,3) .. (4,5,6)? Right now, we consider (1,2,7) to be in that range, and I don't think that's a terribly useful result).

>> That actually says something: it says that we may want to reconsider
>> the notion that all string values can be sorted. You're suggesting
>> the possibility that "a" cmp "ส้" is, by default, undefined.

By default, I think it should be +1 because of the codepoint comparison. If you then tell Perl that you want that comparison done in a Thai context, then it's probably -1.
The golden rule of Unicode is: never pretend you have more information than you do.

> I think that a general solution here is to accept that there may be more
> than one valid way to sort some types, strings especially, and so
> operators/routines that do sorting should be customizable in some way so
> users can pick the behaviour they want.

And I think that this brings you back to what I was saying at the top of the thread, which is that the most basic approach treats each codepoint as a collection of information and sorts on that information first and then the codepoint number itself. If that's not useful to you, tell Perl what you really wanted.

> Some possible examples of customization:
>
> $foo ~~ $a..$b :QuuxNationality # just affects this one test
>
> $bar = 'hello' :QuuxNationality # applies anywhere the Str value is used

That's a bit too easy to read without thinking about the implications. I bring back my original example from long ago:

    "TOPIXコンポジット1500構成銘柄"

which I shamelessly grabbed from a Tokyo Stock Exchange page. That one string, used in everyday text, contains Latin letters, Hiragana [I lied, there's no Hiragana], Katakana, Han or Kanji ideograms and Latin digits.

Now call .succ on that sucker, I dare you, keeping in mind that there's no one "Japanese" script in Unicode.

I think the only valid starting point without any contextual information is to essentially treat it as a sequence of codepoints (as if it were an array of integers) and do something marginally sane on that basis. Then you let the user provide you with hints. Yes, it's "Japanese language" but that doesn't tell you as much as you'd hope, since many of the rules come from the languages that Japanese is borrowing from, here.
One answer is to break it down on script and major category property boundaries into "TOPIX" (Latin: the name of an index), "コンポジット" (Katakana: phonetically this is "konpozito" or "composite"), "1500" (Latin digits), and "構成銘柄" (Kanji ideographs: constituents). Now, treat each one of those as a separate sequence of codepoints and begin incrementing each sub-sequence in turn.

You could also apply Japanese sorting rules to the successor method, but then you get into questions of what the Japanese sorting method is for Latin letters... probably a solved problem, but obscure enough that I'll bet there are edge cases that are NOT solvable just by knowing the locale, because they are finer grained (e.g. which Latin-using language does the word come from? What source language is most appropriate for the context? etc.)

Maybe you throw an exception when you try to tell Perl that "TOPIXコンポジット1500構成銘柄" is a Japanese string... but then Perl is rejecting strings that are considered valid in some contexts within that language.

My only strongly held belief, here, is that you should not try to answer any of these questions for the default range operator on unadorned, context-less strings. For that case, you must do something that makes sense for all Unicode codepoints in nearly all contexts.
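The split on script boundaries described in this message can be approximated with a short sketch. It is written in Python rather than Perl 6, and it rests on a loud assumption: Python's standard library does not expose the Unicode Script property, so the hypothetical helper `script_runs` groups characters by the first word of their Unicode character name instead. That stand-in happens to separate this particular string cleanly; it is not a general script detector.

```python
import itertools
import unicodedata


def script_runs(text):
    """Split text into runs whose Unicode character names share a first word.

    Assumption: the first word of the character name ("LATIN", "KATAKANA",
    "DIGIT", "CJK") is used as a rough stand-in for the Script property.
    """
    # Normalize first so that voiced kana are single precomposed codepoints.
    normalized = unicodedata.normalize("NFC", text)

    def label(ch):
        return unicodedata.name(ch).split()[0]

    return [
        (key, "".join(group))
        for key, group in itertools.groupby(normalized, key=label)
    ]


runs = script_runs("TOPIXコンポジット1500構成銘柄")
for key, run in runs:
    print(key, run)
```

Each run could then be treated as its own sequence of codepoints, which is the starting point the message argues for when no other context is available.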
Re: Suggested magic for "a" .. "b"
On Thu, Jul 29, 2010 at 5:15 AM, Leon Timmermans wrote:
> On Thu, Jul 29, 2010 at 3:24 AM, Darren Duncan wrote:
>> Some possible examples of customization:
>>
>> $foo ~~ $a..$b :QuuxNationality # just affects this one test
>
> I like that
>
>> $bar = 'hello' :QuuxNationality # applies anywhere the Str value is used
>
> What if you compare a QuuxNationality Str with a FooNationality Str?
> That should blow up. Also it can lead to action at a distance. I don't
> think that's the way to go.

I think it's an elegant use of encapsulation: keeping a string's locale with the string. If you want to compare two strings with different collations, either

    $foo ~~ $a..$b :QuuxNationality # override the locales for this test

or

    $foo ~~ $a..$b # Perl warns about conflict, and falls back to its default

-y
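The two options in this message (override per test, or warn and fall back) can be modelled roughly as follows. This is a Python sketch under stated assumptions, not Perl 6 semantics: `TaggedStr`, `COLLATIONS`, and `in_range` are hypothetical names, and the two collations are toy key functions standing in for real locale rules.

```python
import warnings
from dataclasses import dataclass

# Hypothetical collations: each maps a string to a sort key.
COLLATIONS = {
    "codepoint": lambda s: [ord(c) for c in s],
    "caseless": str.casefold,
}


@dataclass(frozen=True)
class TaggedStr:
    """A string that carries the name of its collation with it."""
    text: str
    collation: str = "codepoint"


def in_range(x, low, high, collation=None):
    """Test low <= x <= high; an explicit collation overrides the tags."""
    if collation is None:
        tags = {x.collation, low.collation, high.collation}
        if len(tags) == 1:
            collation = tags.pop()
        else:
            # Conflicting tags: warn and fall back to the default ordering.
            warnings.warn("conflicting collations; using codepoint order")
            collation = "codepoint"
    key = COLLATIONS[collation]
    return key(low.text) <= key(x.text) <= key(high.text)


low = TaggedStr("a", "caseless")
high = TaggedStr("c", "caseless")
probe = TaggedStr("B", "codepoint")

# Explicit override for this one test, like the adverb form above.
print(in_range(probe, low, high, collation="caseless"))
```

Calling `in_range(probe, low, high)` without the override emits a warning and compares by codepoint, so the answer changes depending on which path is taken; that difference is the "action at a distance" concern raised earlier in the thread.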
Re: Suggested magic for "a" .. "b"
On Thu, Jul 29, 2010 at 3:24 AM, Darren Duncan wrote:
> Some possible examples of customization:
>
> $foo ~~ $a..$b :QuuxNationality # just affects this one test

I like that

> $bar = 'hello' :QuuxNationality # applies anywhere the Str value is used

What if you compare a QuuxNationality Str with a FooNationality Str? That should blow up. Also it can lead to action at a distance. I don't think that's the way to go.

Leon
Re: Suggested magic for "a" .. "b"
On Wed, Jul 28, 2010 at 10:35 PM, Brandon S Allbery KF8NH wrote:
> On 7/28/10 8:07 PM, Michael Zedeler wrote:
>> On 2010-07-29 01:39, Jon Lang wrote:
>>> Aaron Sherman wrote:
>>>>> In smart-match context, "a".."b" includes "aardvark".
>>>> No one has yet explained to me why that makes sense. The continued use of
>>>> ASCII examples, of course, doesn't help. Does "a" .. "b" include "æther"?
>>>> This is where Germans and Swedes, for example, don't agree, but they're
>>>> all using the same Latin code blocks.
>>> This is definitely something for the Unicode crowd to look into. But
>>> whatever solution you come up with, please make it compatible with the
>>> notion that "aardvark".."apple" can be used to match any word in the
>>> dictionary that comes between those two words.
>> The key issue here is whether there is a well defined and meaningful
>> ordering of the characters in question. We keep discussing the nice
>> examples, but how about "apple" .. "ส้ม"?
>
> I thought that was already disallowed by spec.

As a range, it ought to work; it's only when you try to generate a list from it that you run into trouble, as the spec currently assumes that "z".succ eqv "aa".

Anyway: whatever default algorithm we go with for resolving "cmp", I strongly recommend that we define the default .succ so that "$x lt $x.succ" is always true.

-- Jonathan "Dataweaver" Lang
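The tension between "z".succ eqv "aa" and the requested invariant is easy to check with a small model. The sketch below is Python, not Perl 6; `magic_succ` is a hypothetical, deliberately simplified version of the Perl-style string increment that handles only lowercase ASCII letters, and `length_first_key` is my own illustrative assumption about one ordering that would restore the invariant, not something proposed in the thread.

```python
def magic_succ(s):
    """Simplified Perl-style increment for non-empty lowercase ASCII strings."""
    chars = list(s)
    i = len(chars) - 1
    while i >= 0:
        if chars[i] != "z":
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        chars[i] = "a"  # wrap this position and carry to the left
        i -= 1
    return "a" + "".join(chars)  # every position wrapped, so the string grows


def length_first_key(s):
    """Order shorter strings before longer ones, then compare codepoints."""
    return (len(s), s)


nxt = magic_succ("z")
# Under plain lexicographic comparison the successor sorts *before* "z".
print(nxt, "z" < nxt)
# Under a length-first ordering the invariant x < succ(x) holds again.
print(length_first_key("z") < length_first_key(nxt))
```

So with this kind of carry-and-grow successor, either the successor or the comparison has to change for "$x lt $x.succ" to hold on every input.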
Re: Suggested magic for "a" .. "b"
On 7/28/10 8:07 PM, Michael Zedeler wrote:
> On 2010-07-29 01:39, Jon Lang wrote:
>> Aaron Sherman wrote:
>>>> In smart-match context, "a".."b" includes "aardvark".
>>> No one has yet explained to me why that makes sense. The continued use of
>>> ASCII examples, of course, doesn't help. Does "a" .. "b" include "æther"?
>>> This is where Germans and Swedes, for example, don't agree, but they're
>>> all using the same Latin code blocks.
>> This is definitely something for the Unicode crowd to look into. But
>> whatever solution you come up with, please make it compatible with the
>> notion that "aardvark".."apple" can be used to match any word in the
>> dictionary that comes between those two words.
> The key issue here is whether there is a well defined and meaningful
> ordering of the characters in question. We keep discussing the nice
> examples, but how about "apple" .. "ส้ม"?

I thought that was already disallowed by spec.
Re: Suggested magic for "a" .. "b"
Jon Lang wrote:
>> I don't know enough about Unicode to suggest how to solve this. All I can
>> say is that my example above should never return a valid Range object
>> unless there is a way I can specify my own ordering and I use it.
>
> That actually says something: it says that we may want to reconsider
> the notion that all string values can be sorted. You're suggesting
> the possibility that "a" cmp "ส้" is, by default, undefined.

I think that a general solution here is to accept that there may be more than one valid way to sort some types, strings especially, and so operators/routines that do sorting should be customizable in some way so users can pick the behaviour they want.

The customization could be applied at various levels, such as using an extra argument or trait for the operator/function that cares about ordering, or by using an extra attribute or trait for the types being sorted.

In fact, this whole issue is very close in concept to the situations where you need to do equality/identity tests.

With strings, identity tests can change answers depending on whether you are doing it on language-dependent or language-independent graphemes, and Perl 6 encodes that abstraction level as value metadata.

When you want to be consistent, the behaviour of "cmp" affects all of the other order-sensitive operations, including any working with intervals.

Some possible examples of customization:

    $foo ~~ $a..$b :QuuxNationality # just affects this one test

    $bar = 'hello' :QuuxNationality # applies anywhere the Str value is used

Also, declaring a Str subtype or something.

Of course, after all this, we still want some reasonable default.

I suggest that for Str that aren't nationality-specific, the default ordering semantics are by whatever generic ordering Unicode defines, which might be by codepoint. And then for Str with nationality-specific grapheme abstractions, the default sorting can be whatever is the case for that nationality.

And this is how it is except where users define some other order.

So then, "a" cmp "ส้" is always defined, but users can change the definition.

-- Darren Duncan
Re: Suggested magic for "a" .. "b"
On 2010-07-29 02:19, Jon Lang wrote:
> Michael Zedeler wrote:
>> Jon Lang wrote:
>>> This is definitely something for the Unicode crowd to look into. But
>>> whatever solution you come up with, please make it compatible with the
>>> notion that "aardvark".."apple" can be used to match any word in the
>>> dictionary that comes between those two words.
>> The key issue here is whether there is a well defined and meaningful
>> ordering of the characters in question. We keep discussing the nice
>> examples, but how about "apple" .. "ส้ม"?
> All I'm saying is: don't throw out the baby with the bathwater. Come
> up with an interim solution that handles the nice examples intuitively
> and the ugly examples poorly (or better, if you can manage that right
> out of the gate); then revise the model to improve the handling of the
> ugly examples as much as you can; but while you do so, make an effort
> to keep the nice examples working.

I am sorry if what I write is understood as an argument against ranges of strings. I think I know too little about Unicode to be able to do anything but point at some issues I believe we'll have to deal with. The solution is not obvious to me.

>> I don't know enough about Unicode to suggest how to solve this. All I can
>> say is that my example above should never return a valid Range object
>> unless there is a way I can specify my own ordering and I use it.
> That actually says something: it says that we may want to reconsider
> the notion that all string values can be sorted. You're suggesting
> the possibility that "a" cmp "ส้" is, by default, undefined.

Yes, but I am sure it's due to my lack of understanding of Unicode.

Regards,

Michael.
Re: Suggested magic for "a" .. "b"
On Jul 28, 2010, at 1:27 PM, Mark J. Reed wrote:
> On Wednesday, July 28, 2010, Jon Lang wrote:
>> Keep it simple, folks! There are enough corner cases in Perl 6 as
>> things stand; we don't need to be introducing more of them if we can
>> help it.
>
> Can I get an Amen? Amen!
> --
> Mark J. Reed

+1. I'm agnostic ;>

chris
Re: Suggested magic for "a" .. "b"
On Jul 28, 2010, at 1:37 PM, Mark J. Reed wrote:
> On Wed, Jul 28, 2010 at 2:30 PM, Chris Fields wrote:
>> On Jul 28, 2010, at 1:27 PM, Mark J. Reed wrote:
>>> Can I get an Amen? Amen!
>>> --
>>> Mark J. Reed
>>
>> +1. I'm agnostic ;>
>
> Militant? :) ( http://tinyurl.com/3xjgxnl )
>
> Nothing inherently religious about "amen" (or me), but I'll accept
> "+1" as synonymous. :)
>
> --
> Mark J. Reed

Not militant, just trying to inject a bit of humor into the zombie thread that won't die.

chris
Re: Suggested magic for "a" .. "b"
Michael Zedeler wrote:
> Jon Lang wrote:
>> This is definitely something for the Unicode crowd to look into. But
>> whatever solution you come up with, please make it compatible with the
>> notion that "aardvark".."apple" can be used to match any word in the
>> dictionary that comes between those two words.
>
> The key issue here is whether there is a well defined and meaningful
> ordering of the characters in question. We keep discussing the nice
> examples, but how about "apple" .. "ส้ม"?

All I'm saying is: don't throw out the baby with the bathwater. Come up with an interim solution that handles the nice examples intuitively and the ugly examples poorly (or better, if you can manage that right out of the gate); then revise the model to improve the handling of the ugly examples as much as you can; but while you do so, make an effort to keep the nice examples working.

> I don't know enough about Unicode to suggest how to solve this. All I can
> say is that my example above should never return a valid Range object unless
> there is a way I can specify my own ordering and I use it.

That actually says something: it says that we may want to reconsider the notion that all string values can be sorted. You're suggesting the possibility that "a" cmp "ส้" is, by default, undefined.

There are some significant problems that arise if you do this.

-- Jonathan "Dataweaver" Lang
Re: Suggested magic for "a" .. "b"
On 2010-07-29 01:39, Jon Lang wrote:
> Aaron Sherman wrote:
>>> In smart-match context, "a".."b" includes "aardvark".
>> No one has yet explained to me why that makes sense. The continued use of
>> ASCII examples, of course, doesn't help. Does "a" .. "b" include "æther"?
>> This is where Germans and Swedes, for example, don't agree, but they're all
>> using the same Latin code blocks.
> This is definitely something for the Unicode crowd to look into. But
> whatever solution you come up with, please make it compatible with the
> notion that "aardvark".."apple" can be used to match any word in the
> dictionary that comes between those two words.

The key issue here is whether there is a well defined and meaningful ordering of the characters in question. We keep discussing the nice examples, but how about "apple" .. "ส้ม"?

I don't know enough about Unicode to suggest how to solve this. All I can say is that my example above should never return a valid Range object unless there is a way I can specify my own ordering and I use it.

>> I've never accepted that the range between two strings of identical length
>> should include strings of another length. That seems maximally non-intuitive
>> (well, I suppose you could always return the last 100 words of Hamlet as an
>> iterable IO object if you really wanted to confuse people), and makes string
>> and integer ranges far too divergent.
> This is why I dislike the notion of the range operator being used to
> produce lists: the question of what values you'd get by iterating from
> one string value to another is _very_ different from the question of
> what string values qualify as being between the two. The more you use
> infix:<..> to produce lists, the more likely you are to conflate lists
> with ranges.

I second the above. Ranges are all about comparing things. $x ~~ $a .. $b means "is $x between $a and $b?". The only broadly accepted comparison of strings is lexicographical comparison. To illustrate the point: wouldn't you find it odd if 2.01 wasn't in between 1.1 and 2.1? Really?

Regards,

Michael.
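The "range as comparison" reading in this message is easy to state in code. The sketch below uses Python, whose built-in string comparison is lexicographic by codepoint; `between` is a hypothetical helper name, and the example only models membership, not iteration.

```python
def between(x, low, high):
    """Membership test for an inclusive range: is low <= x <= high?"""
    return low <= x <= high


# The numeric case from the message: 2.01 lies between 1.1 and 2.1.
print(between(2.01, 1.1, 2.1))

# The string cases from the thread, under lexicographic comparison.
print(between("aardvark", "a", "b"))
print(between("blackberry", "aa", "zz"))
```

Under this reading the length of the candidate string plays no role, just as the number of digits plays no role in the numeric case. Whether that is the right default for strings is exactly what the thread disagrees about.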
Re: Suggested magic for "a" .. "b"
On 2010-07-29 00:24, Dave Whipp wrote:
> Aaron Sherman wrote:
>> On Wed, Jul 28, 2010 at 11:34 AM, Dave Whipp wrote:
>>> To squint at this slightly, in the context that we already have 0...1e10
>>> as a sequence generator, perhaps the semantics of iterating a range
>>> should be unordered -- that is,
>>>
>>>     for 0..10 -> $x { ... }
>>>
>>> is treated as
>>>
>>>     for (0...10).pick(*) -> $x { ... }
>>
>> As others have pointed out, this has some problems. You can't implement
>> 0..* that way, just for starters.
>
> I'd say that's a point in my favor: it demonstrates that integers and
> strings have similar problems. If you pick items from an infinite set then
> every item you pick will have an infinite number of digits/characters.
>
> In smart-match context, "a".."b" includes "aardvark". It follows that,
> unless you're filtering/shaping the sequence of generated items, then
> almost every element ("a".."b").Seq starts with an infinite number of "a"s.
>
> Consistent semantics would make "a".."b" very not-useful when used as a
> sequence: the user needs to say how they want to avoid the infinities.
> Similarly (0..1).Seq should most likely return Real numbers -- and thus
> (0..1).pick(*) can be approximated by (0..1).pick(*, :replace), which is
> much easier to implement.

I agree that /in theory/, when coercing from Range to Sequence, the new Sequence should produce every possible value in the Range, unless you specify an increment. You could argue that 0 and 1 in (0..1).Seq are Ints, resulting in the expansion 0, 1, but that would leave a door open for very nasty surprises.

In practise, producing every possible value in a Range with over-countable items isn't useful and just opens the door for inexperienced programmers to make perl run out of memory without ever producing a warning, so I'd suggest that the conversion should fail unless an increment is specified.

The general principle would be to avoid meaningless conversions, so (1 .. *).Seq > (1 .. *).pick should also just fail, but with finite endpoints, it could succeed. The question here is whether we should open up for more parallelization at the cost of simplicity. I don't know.

> So either you define some arbitrary semantics (what those should be is, I
> think, the original topic of this thread) or else you punt (error message).
> An error message has the advantage that you can always do something useful,
> later.

I second that just doing something arbitrary where no actual definition exists is a really bad idea.

To be more specific, there should be no .succ or .pred methods on Rat, Str, Real, Complex and anything else that is over-countable. Trying to implement .succ on something like Str is most likely dwimmy to a very narrow set of applications, but will confuse everyone else.

Just to illustrate my point, if we have .succ on Str, why not have it on Range or Seq?

Let's just play with that idea for a second - what would a reasonable implementation of .succ on Range be?

    (1 .. 10).succ --?--> (1 .. 11)
    (1 .. 10).succ --?--> (2 .. 11)
    (1 .. 10).succ --?--> (1 .. 12)
    (1 .. 10).succ --?--> (10^ .. *)

Even starting a discussion about which implementation of .succ to use for Range (above), Str, Rat or Real completely misses the point: there is no definition of this function for those domains. It is non-existent and trying to do something dwimmy is just confusing.

As a sidenote, ++ and .succ should be treated as two different things (just like -- and .pred). ++ really means "add one" everywhere and can be kept as such, where .succ means "the next, smallest possible item". This means that we can keep ++ and -- for all numeric types.

Coercing to Sequence from Range should by default use .succ on the LHS, whereas Seq could just use ++ semantics as often as desired. This would make Ranges completely consistent and provide a clear distinction between the two classes.

> Getting back to 10..0
>
> Yes, I agree with Jon that this should be an empty range. I don't care what
> order you pick the elements from an empty range :).

Either empty, the same as 0 .. 10 or throw an error (I like errors :).

Regards,

Michael.
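The rule suggested in this message (refuse to turn a range into a sequence unless the type has an obvious successor or the caller supplies an increment) can be sketched as follows. This is a Python illustration, not Perl 6; `range_to_seq` is a hypothetical function, it treats only integer endpoints as having a default step, and it takes the "empty" option for a reversed range such as 10..0.

```python
from fractions import Fraction


def range_to_seq(low, high, step=None):
    """Expand an inclusive range into a list, demanding a step when unclear."""
    if step is None:
        if isinstance(low, int) and isinstance(high, int):
            step = 1  # integers have an unambiguous successor
        else:
            raise TypeError("no default increment for these endpoints")
    if step <= 0:
        raise ValueError("step must be positive")
    items = []
    current = low
    while current <= high:
        items.append(current)
        current += step
    return items


print(range_to_seq(0, 3))                                    # default step of 1
print(range_to_seq(Fraction(0), Fraction(1), Fraction(1, 2)))  # explicit step
print(range_to_seq(10, 0))                                   # reversed: empty
```

Calling `range_to_seq(0.0, 1.0)` raises `TypeError`, which corresponds to failing the conversion instead of guessing; an infinite upper endpoint is not modelled here at all.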
Re: Suggested magic for "a" .. "b"
Aaron Sherman wrote:
>> In smart-match context, "a".."b" includes "aardvark".
>
> No one has yet explained to me why that makes sense. The continued use of
> ASCII examples, of course, doesn't help. Does "a" .. "b" include "æther"?
> This is where Germans and Swedes, for example, don't agree, but they're all
> using the same Latin code blocks.

This is definitely something for the Unicode crowd to look into. But whatever solution you come up with, please make it compatible with the notion that "aardvark".."apple" can be used to match any word in the dictionary that comes between those two words.

> I've never accepted that the range between two strings of identical length
> should include strings of another length. That seems maximally non-intuitive
> (well, I suppose you could always return the last 100 words of Hamlet as an
> iterable IO object if you really wanted to confuse people), and makes string
> and integer ranges far too divergent.

This is why I dislike the notion of the range operator being used to produce lists: the question of what values you'd get by iterating from one string value to another is _very_ different from the question of what string values qualify as being between the two. The more you use infix:<..> to produce lists, the more likely you are to conflate lists with ranges.

-- Jonathan "Dataweaver" Lang
Re: Suggested magic for "a" .. "b"
Darren Duncan wrote:
> Does "..." also come with the 4 variations of endpoint inclusion/exclusion?
>
> If not, then it should, as I'm sure many times one would want to do this,
> say:
>
>     for 0...^$n -> {...}

You can toggle the inclusion/exclusion of the ending condition by choosing between "..." and "...^"; but the starting point is the starting point no matter what: there is neither "^..." nor "^...^".

> In any event, I still think that the mnemonics of "..." (yadda-yadda-yadda)
> are more appropriate to a generator, where it says "produce this and so on".
> A ".." does not have that mnemonic and looks better for an interval.

Well put. This++.

-- Jonathan "Dataweaver" Lang
Re: Suggested magic for "a" .. "b"
On Wed, Jul 28, 2010 at 6:24 PM, Dave Whipp wrote: > Aaron Sherman wrote: >> On Wed, Jul 28, 2010 at 11:34 AM, Dave Whipp wrote: >>> To squint at this slightly, in the context that we already have 0...1e10 as a sequence generator, perhaps the semantics of iterating a range should be unordered -- that is, >>> for 0..10 -> $x { ... } >>> is treated as >>> for (0...10).pick(*) -> $x { ... } >> As others have pointed out, this has some problems. You can't implement 0..* that way, just for starters. > I'd say that's a point in my favor: it demonstrates the integers and strings have similar problems. If you pick items from an infinite set then every item you pick will have an infinite number of digits/characters. So, if I understand you correctly, you're happy about the fact that iterating over an explicitly lazy range would immediately result in failure? Sorry, not following. > In smart-match context, "a".."b" includes "aardvark". No one has yet explained to me why that makes sense. The continued use of ASCII examples, of course, doesn't help. Does "a" .. "b" include "æther"? This is where Germans and Swedes, for example, don't agree, but they're all using the same Latin code blocks. I don't think you can reasonably bring locale into this. I think it needs to be purely a codepoint-oriented operator. If you bring locale into it, then the argument for not including composing and modifying characters goes out the window, and you're stuck in what I believe Dante called "the Unicode circle." If you treat this as a codepoint-based operator then you get a very simple result: "a".."b" is the range between the codepoint for "a" and the codepoint for "b". "aa" .. "bb" is the range between a sequence of two codepoints and a sequence of two other code points, which you can define in a number of ways (we've discussed a few, here) which don't involve having to expand the sequences to three or more codepoints.
I've never accepted that the range between two strings of identical length should include strings of another length. That seems maximally non-intuitive (well, I suppose you could always return the last 100 words of Hamlet as an iterable IO object if you really wanted to confuse people), and makes string and integer ranges far too divergent. > Then the whole question of reversibility is moot. >>> >> Really? I don't think it is. In fact, you've simply made the problem pop >> up >> everywhere, and guaranteed that .. must behave totally unlike any other >> iterator. >> > > %hash.keys has similarly unordered semantics. Unordered semantics and shuffled values aren't the same thing. The reason that hash keys are unordered is that we cannot guarantee that any given implementation will store entries in any given relation to the input. Ranges have a well defined ordering associated with the elements that fall within the range by virtue of the basic definition of a range (LHS <= * <= RHS). Hashes have no ordering associated with their keys (though one can be imposed, e.g. by sort). Therefore %hash.keys.reverse is, for most purposes, equivalent to > %hash.keys. Argh! No, that's entirely untrue. %hash.keys and %hash.keys.reverse had better be the same elements, but reversed for all hashes which remain unmodified between the first and second call. -- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs
Re: Suggested magic for "a" .. "b"
Darren Duncan wrote: Dave Whipp wrote: Similarly (0..1).Seq should most likely return Real numbers No it shouldn't, because the endpoints are integers. If you want Real numbers, then say "0.0 .. 1.0" instead. -- Darren Duncan That would be inconsistent. $x ~~ 0..1 means 0 <= $x <= 1. The fact that the endpoints are integers does not imply that the range does not include non-integer reals. My argument is that iterating a range could be defined to give you a uniform distribution of values that would smart match true against that range -- and that such a definition would be just as reasonable as (and perhaps more general than) one that says that you get an incrementing ordered set of integers across that range.
Re: Suggested magic for "a" .. "b"
Dave Whipp wrote: Similarly (0..1).Seq should most likely return Real numbers No it shouldn't, because the endpoints are integers. If you want Real numbers, then say "0.0 .. 1.0" instead. -- Darren Duncan
Re: Suggested magic for "a" .. "b"
Aaron Sherman wrote: On Wed, Jul 28, 2010 at 11:34 AM, Dave Whipp wrote: To squint at this slightly, in the context that we already have 0...1e10 as a sequence generator, perhaps the semantics of iterating a range should be unordered -- that is, for 0..10 -> $x { ... } is treated as for (0...10).pick(*) -> $x { ... } As others have pointed out, this has some problems. You can't implement 0..* that way, just for starters. I'd say that's a point in my favor: it demonstrates the integers and strings have similar problems. If you pick items from an infinite set then every item you pick will have an infinite number of digits/characters. In smart-match context, "a".."b" includes "aardvark". It follows that, unless you're filtering/shaping the sequence of generated items, almost every element of ("a".."b").Seq starts with an infinite number of "a"s. Consistent semantics would make "a".."b" very not-useful when used as a sequence: the user needs to say how they want to avoid the infinities. Similarly (0..1).Seq should most likely return Real numbers -- and thus (0..1).pick(*) can be approximated by (0..1).pick(*, :replace), which is much easier to implement. So either you define some arbitrary semantics (what those should be is, I think, the original topic of this thread) or else you punt (error message). An error message has the advantage that you can always do something useful, later. Then the whole question of reversibility is moot. Really? I don't think it is. In fact, you've simply made the problem pop up everywhere, and guaranteed that .. must behave totally unlike any other iterator. %hash.keys has similarly unordered semantics. Therefore %hash.keys.reverse is, for most purposes, equivalent to %hash.keys. That is why I said the question of reversibility becomes moot if you define the collapse of a range to a sequence to be unordered. It also demonstrates precedent, so not "totally unlike any other".
Even though it was only a semi-serious proposal, I seem to find myself defending it. So maybe I was serious, after all. That argument for DWIM being ordered pretty much goes away once you tell people to use "..." for what they intended to mean. Getting back to 10..0 Yes, I agree with Jon that this should be an empty range. I don't care what order you pick the elements from an empty range :).
Re: Suggested magic for "a" .. "b"
Darren Duncan wrote: Aaron Sherman wrote: The more I look at this, the more I think ".." and "..." are reversed. I would rather that ".." stay with intervals and "..." with generators. Another thing to consider if one is looking at huffmanization is how often the versions that exclude endpoints would be used, such as "^..^". I would imagine that a sequence generator would also have this variability useful. Does "..." also come with the 4 variations of endpoint inclusion/exclusion? If not, then it should, as I'm sure many times one would want to do this, say: for 0...^$n -> {...} In any event, I still think that the mnemonics of "..." (yadda-yadda-yadda) are more appropriate to a generator, where it says "produce this and so on". A ".." does not have that mnemonic and looks better for an interval. -- Darren Duncan
Re: Suggested magic for "a" .. "b"
Aaron Sherman wrote: The more I look at this, the more I think ".." and "..." are reversed. ".." has a very specific and narrow usage (comparing ranges) and "..." is probably going to be the most broadly used operator in the language outside of quotes, commas and the basic, C-derived math and logic ops. Many (most?) loops will involve "...". Most array initializers will involve "...". Why are we not calling that ".."? Just because we defined ".." first, and it grandfathered its way in the door? Because it resembles the math op? These don't seem like good reasons. I would rather that ".." stay with intervals and "..." with generators. The mnemonics make more sense that way. Having ".." resemble the math op with the same meaning, intervals, is a good thing. Besides comparing ranges, an interval would also often be used for a membership test, eg "$a <= $x <= $b" would alternately be spelled "$x ~~ $a..$b" for example. I would imagine that the interval use would be more common than the generator use in some problem domains. -- Darren Duncan
Re: Suggested magic for "a" .. "b"
On Wed, Jul 28, 2010 at 11:29 PM, Aaron Sherman wrote: > The more I look at this, the more I think ".." and "..." are reversed. ".." > has a very specific and narrow usage (comparing ranges) and "..." is > probably going to be the most broadly used operator in the language outside > of quotes, commas and the basic, C-derived math and logic ops. Many (most?) > loops will involve "...". Most array initializers will involve "...". Why > are we not calling that ".."? Just because we defined ".." first, and it > grandfathered its way in the door? Because it resembles the math op? These > don't seem like good reasons. I was thinking the same. Switching them seems better from a huffmanization POV. Leon
Re: Suggested magic for "a" .. "b"
On Wed, Jul 28, 2010 at 2:29 PM, Aaron Sherman wrote: > > The more I look at this, the more I think ".." and "..." are reversed. ".." > has a very specific and narrow usage (comparing ranges) and "..." is > probably going to be the most broadly used operator in the language outside > of quotes, commas and the basic, C-derived math and logic ops. +1 Though it being the day before Rakudo *'s first release makes me think, "too late!" -y
Re: Suggested magic for "a" .. "b"
On Wed, Jul 28, 2010 at 11:34 AM, Dave Whipp wrote: > To squint at this slightly, in the context that we already have 0...1e10 as a sequence generator, perhaps the semantics of iterating a range should be unordered -- that is, > for 0..10 -> $x { ... } > is treated as > for (0...10).pick(*) -> $x { ... } As others have pointed out, this has some problems. You can't implement 0..* that way, just for starters. > Then the whole question of reversibility is moot. Really? I don't think it is. In fact, you've simply made the problem pop up everywhere, and guaranteed that .. must behave totally unlike any other iterator.

Getting back to 10..0... The complexity of implementation argument doesn't really hold for me, as:

    (a..b).list = a>b ?? a,*.pred ... b !! a,*.succ ... b

is pretty darned simple and does not require that b implement anything more than it does under the current implementation. a, on the other hand, now has to (optionally, since throwing an exception is the alternative) implement one more method.

The more I look at this, the more I think ".." and "..." are reversed. ".." has a very specific and narrow usage (comparing ranges) and "..." is probably going to be the most broadly used operator in the language outside of quotes, commas and the basic, C-derived math and logic ops. Many (most?) loops will involve "...". Most array initializers will involve "...". Why are we not calling that ".."? Just because we defined ".." first, and it grandfathered its way in the door? Because it resembles the math op? These don't seem like good reasons. -- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs
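The one-line rule Aaron gives for (a..b).list can be modelled in a few lines of Python. This is only a sketch for integers, where .succ is taken to be +1 and .pred to be -1; `range_list` is a name invented here, and nothing in it reflects an actual Perl 6 implementation.

```python
def range_list(a, b):
    """Model of: a > b ?? a, *.pred ... b !! a, *.succ ... b (integers only)."""
    if a > b:
        step = -1                      # walk down with .pred
        done = lambda x: x < b
    else:
        step = 1                       # walk up with .succ
        done = lambda x: x > b
    out = []
    x = a
    while not done(x):
        out.append(x)
        x += step
    return out

print(range_list(0, 3))    # ascending case
print(range_list(10, 7))   # descending case, which 10..7 would not give today
```

Only the left endpoint is ever stepped, which matches the point that b needs nothing beyond comparison while a needs one extra method.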
Re: Suggested magic for "a" .. "b"
Moritz Lenz wrote: I fear what Perl 6 needs is not to broaden the range of discussion even further, but to narrow it down to the essential points. Personal opinion only. OK, as a completely serious proposal, the semantics of "for 0..10 { ... }" should be for the compiler to complain "sorry, that's a perl5ism: in perl6, please use a C<...> or explicit coercion of the range to a sequence". (BTW, I thought a bit more about my previous suggestion: there is precedent in that %hash.keys is unordered -- so it's not entirely obvious that a default range coercion should be ordered)
Re: Suggested magic for "a" .. "b"
Dave Whipp wrote: > Moritz Lenz wrote: >> Dave Whipp wrote: >>>for 0..10 -> $x { ... } >>> is treated as >>>for (0...10).pick(*) -> $x { ... } >> >> Sorry, I have to ask. Are you serious? Really? > > Ah, to reply, or not to reply, to rhetorical sarcasm ... In this case, I > think I will: No sarcasm involved, just curiosity. > Was my specific proposal entirely serious: only in that it was an > attempt to broaden the box for the discussion of semantics of coercion > ranges. I fear what Perl 6 needs is not to broaden the range of discussion even further, but to narrow it down to the essential points. Personal opinion only. > Why do we assume that ranges iterate in .succ order -- or even that they > iterate as integers (and are finite). Why not iterate as a top-down > breadth-first generation of a Cantor set? That's easy: Principle of least surprise. Cheers. Moritz
Re: Suggested magic for "a" .. "b"
Moritz Lenz wrote: Dave Whipp wrote: for 0..10 -> $x { ... } is treated as for (0...10).pick(*) -> $x { ... } Sorry, I have to ask. Are you serious? Really? Ah, to reply, or not to reply, to rhetorical sarcasm ... In this case, I think I will: Was my specific proposal entirely serious: only in that it was an attempt to broaden the box for the discussion of semantics of coercion ranges. One of the banes of my life is to undo the sequential mindset that so many programmers have. I like to point out that "sequentialization is an optimization to make programs run faster on von Neumann architectures". Often, it's premature. Most of the time it doesn't matter (compilers, and even HW, can extract ILP), but every now and again it results in an unfortunate barrier in solution-space. Why do we assume that ranges iterate in .succ order -- or even that they iterate as integers (and are finite)? Why not iterate as a top-down breadth-first generation of a Cantor set? etc. Does the language need to choose a default, or is it better to require the programmer to state how they want to coerce the range to the seq? Ten years from now, we'll keep needing to refer questions to the .. vs ... FAQ.
Re: Suggested magic for "a" .. "b"
On Wed, Jul 28, 2010 at 2:30 PM, Chris Fields wrote: > On Jul 28, 2010, at 1:27 PM, Mark J. Reed wrote: >> Can I get an Amen? Amen! >> -- >> Mark J. Reed > > +1. I'm agnostic ;> Militant? :) ( http://tinyurl.com/3xjgxnl ) Nothing inherently religious about "amen" (or me), but I'll accept "+1" as synonymous. :) -- Mark J. Reed
Re: Suggested magic for "a" .. "b"
On Wednesday, July 28, 2010, Jon Lang wrote: > Keep it simple, folks! There are enough corner cases in Perl 6 as > things stand; we don't need to be introducing more of them if we can > help it. Can I get an Amen? Amen! -- Mark J. Reed
Re: Suggested magic for "a" .. "b"
TSa wrote: > Swapping the endpoints could mean swapping inside test to outside > test. The only thing that is needed is to swap from && to ||: > > $a .. $b # means $a <= $_ && $_ <= $b if $a < $b > $b .. $a # means $b <= $_ || $_ <= $a if $a < $b This is the same sort of discontinuity of meaning that was causing problems with Perl 5's use of negative indices to count backward from the end of a list; there's a reason why Perl 6 now uses the [*-$a] notation for that sort of thing. Consider a code snippet where the programmer is given two values: one is a minimum value which must be reached; the other is a maximum value which must not be exceeded. In this example, the programmer does not know what the values are; for all he knows, the minimum threshold exceeds the maximum. As things stand, it's trivial to test whether or not your sample value is viable: if "$x ~~ $min .. $max", then you're golden: it doesn't matter what "$min cmp $max" is. With your change, I'd have to replace the above with something along the lines of: "if $min <= $max && $x ~~ $min .. $max { ... }" - because if $min > $max, the algorithm will accept values that are well below the minimum as well as values that are well above the maximum. Keep it simple, folks! There are enough corner cases in Perl 6 as things stand; we don't need to be introducing more of them if we can help it. -- Jonathan "Dataweaver" Lang
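Jonathan's min/max objection can be checked with a small Python model. The two functions below are hypothetical stand-ins: `accepts_current` models the existing rule (a reversed range matches nothing), and `accepts_swapped` models TSa's suggestion that reversed endpoints turn the inside test into an outside test.

```python
def accepts_current(lo, hi, x):
    """Existing rule: lo <= x <= hi, so reversed endpoints match nothing."""
    return lo <= x <= hi

def accepts_swapped(lo, hi, x):
    """TSa's proposal: reversed endpoints mean an outside test (|| instead of &&)."""
    if lo <= hi:
        return lo <= x <= hi
    return lo <= x or x <= hi

# A caller hands us a minimum of 10 and a maximum of 1 (an unsatisfiable pair).
minimum, maximum = 10, 1
samples = [0, 5, 50]
print([accepts_current(minimum, maximum, x) for x in samples])
print([accepts_swapped(minimum, maximum, x) for x in samples])
```

Under the proposal, 0 (below the minimum) and 50 (above the maximum) are both accepted, so the caller would have to add the explicit `$min <= $max` guard described above.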
Re: Suggested magic for "a" .. "b"
> Swapping the endpoints could mean swapping inside test to outside > test. The only thing that is needed is to swap from && to ||: > > $a .. $b # means $a <= $_ && $_ <= $b if $a < $b > $b .. $a # means $b <= $_ || $_ <= $a if $a < $b I think that's what "not", "!" are for!
Re: Suggested magic for "a" .. "b"
On Wednesday, 28. July 2010 05:12:52 Michael Zedeler wrote: > Writing ($a .. $b).reverse doesn't make any sense if the result were a new Range, since Ranges should then only be used for inclusion tests (so swapping endpoints doesn't have any meaningful interpretation), but applying .reverse could result in a coercion to Sequence.

Swapping the endpoints could mean swapping inside test to outside test. The only thing that is needed is to swap from && to ||:

    $a .. $b # means $a <= $_ && $_ <= $b if $a < $b
    $b .. $a # means $b <= $_ || $_ <= $a if $a < $b

Regards TSa.
--
"The unavoidable price of reliability is simplicity" -- C.A.R. Hoare
"Simplicity does not precede complexity, but follows it." -- A.J. Perlis
1 + 2 + 3 + 4 + ... = -1/12 -- Srinivasa Ramanujan
Re: Suggested magic for "a" .. "b"
yary wrote: > though would a parallel batch of an anonymous block be more naturally written as > all(0...10) -> $x { ... } # Spawn 11 threads

No,

    hyper for 0..10 -> $x { ... } # spawn as many threads
                                  # as the compiler thinks are reasonable

I think one (already specced) syntax for the same thing is enough, especially considering that hyper operators also do the same job. Cheers, Moritz
Re: Suggested magic for "a" .. "b"
On Wed, Jul 28, 2010 at 8:34 AM, Dave Whipp wrote: > To squint at this slightly, in the context that we already have 0...1e10 as a sequence generator, perhaps the semantics of iterating a range should be unordered -- that is, > for 0..10 -> $x { ... } > is treated as > for (0...10).pick(*) -> $x { ... }

Makes me think about parallel operations.

    for 0...10 -> $x { ... } # 0 through 10 in order
    for 0..10 -> $x { ... } # Spawn 11 threads, $x=0 through 10 concurrently
    for 10..0 -> $x { ... } # A no-op
    for 10...0 -> $x { ... } # 10 down to 0 in order

though would a parallel batch of an anonymous block be more naturally written as

    all(0...10) -> $x { ... } # Spawn 11 threads

-y
Re: Suggested magic for "a" .. "b"
Dave Whipp wrote: > To squint at this slightly, in the context that we already have 0...1e10 > as a sequence generator, perhaps the semantics of iterating a range > should be unordered -- that is, > >for 0..10 -> $x { ... } > > is treated as > >for (0...10).pick(*) -> $x { ... } Sorry, I have to ask. Are you serious? Really? Cheers, Moritz
Re: Suggested magic for "a" .. "b"
Dave Whipp wrote: > To squint at this slightly, in the context that we already have 0...1e10 as > a sequence generator, perhaps the semantics of iterating a range should be > unordered -- that is, > > for 0..10 -> $x { ... } > > is treated as > > for (0...10).pick(*) -> $x { ... } > > Then the whole question of reversibility is moot. No thanks; I'd prefer it if $a..$b have analogous meanings in item and list contexts. As things stand, 10..1 means, in item context, "numbers that are greater or equal to ten and less than or equal to one", which is equivalent to "nothing"; in list context, it means "an empty list". This makes sense to me; having it provide a list containing the numbers 1 through 10 creates a conflict between the two contexts regardless of how they're arranged. As I see it, C< $a..$b > in list context is a useful shorthand for C< $a, *.succ ... $b >. You only get into trouble when you start trying to have infix:<..> do more than that in list context. If anything needs to be done with respect to infix:<..>, it lies in changing the community perception of the operator. The only reason why we're having this debate at all is that in Perl 5, the .. operator was used to generate lists; so programmers coming from Perl 5 start with the expectation that that's what it's for in Perl 6, too. That expectation needs to be corrected as quickly as can be managed, not catered to. But that's not a matter of language design; it's a matter to be addressed by whoever's going to be writing the Perl 6 tutorials. -- Jonathan "Dataweaver" Lang
Re: Suggested magic for "a" .. "b"
Michael Zedeler wrote: This is exactly why I keep writing posts about Ranges being defunct as they have been specified now. If we accept the premise that Ranges are supposed to define a kind of linear membership specification between two starting points (as in math), it doesn't make sense that the LHS has an additional constraint (having to provide a .succ method). All we should require is that both endpoints support comparison (that they share a common type with comparison, at least). To squint at this slightly, in the context that we already have 0...1e10 as a sequence generator, perhaps the semantics of iterating a range should be unordered -- that is, for 0..10 -> $x { ... } is treated as for (0...10).pick(*) -> $x { ... } Then the whole question of reversibility is moot. Plus, there would then be a useful distinction for serialization of C<..> vs C<...>. (perhaps we should even parallelize) When you have two very similar operators it's often good to maximize the semantic distance between them so that people don't get into the lazy habit of using them without thinking.
Re: Suggested magic for "a" .. "b"
Michael Zedeler wrote: This is exactly why I keep writing posts about Ranges being defunct as they have been specified now. If we accept the premise that Ranges are supposed to define a kind of linear membership specification between two starting points (as in math), it doesn't make sense that the LHS has an additional constraint (having to provide a .succ method). All we should require is that both endpoints support comparison (that they share a common type with comparison, at least). Yes, I agree 100%. All that should be required to construct a range "$foo..$bar" is that the endpoints are comparable, meaning "$foo cmp $bar" works. Having a .pred or .succ for $foo|$bar should not be required to define a range but only to use that range as a generator. -- Darren Duncan
Re: Suggested magic for "a" .. "b"
On 2010-07-28 06:54, Martin D Kealey wrote: On Wed, 28 Jul 2010, Michael Zedeler wrote: Writing for ($a .. $b).reverse -> $c { ...} may then blow up because it turns out that $b doesn't have a .succ method when coercing to sequence (where the LHS must have an initial value), just like for $a .. $b -> $c { ... } should be able to blow up because the LHS of a Range shouldn't have to support .succ. Presumably you'd only throw that exception if, as well, $b doesn't support .pred ? Yes. It should be .pred. So ($a .. $b).reverse is only possible if $b.pred is defined and $a.gt is defined (and taking an object that has the type of $b.pred). If the coercion to Sequence is taking place first, we'll have to live with two additional constraints ($b.lt and $a.succ), but I guess it would be easy to overload .reverse and get rid of those. Regards, Michael.
Re: Suggested magic for "a" .. "b"
On 2010-07-27 23:50, Aaron Sherman wrote: PS: On a really abstract note, requiring that ($a .. $b).reverse be lazy will put new constraints on the right hand side parameter. Previously, it didn't have to have a value of its own, it just had to be comparable to other values. For example: for $a .. $b -> $c { ... } In that, we don't include the RHS in the output range explicitly. Instead, we increment $a (via .succ) until it's >= $b. If $a were 1 and $b were an object that "does Int" but just implements the comparison features, and has no fixed numeric value, then it should still work (e.g. it could be random). Now that's not possible because we need to use the RHS as the starting point when .reverse is invoked. This is exactly why I keep writing posts about Ranges being defunct as they have been specified now. If we accept the premise that Ranges are supposed to define a kind of linear membership specification between two starting points (as in math), it doesn't make sense that the LHS has an additional constraint (having to provide a .succ method). All we should require is that both endpoints support comparison (that they share a common type with comparison, at least). To provide expansion to lists, such as for $a .. $b -> $c { ... }, we should use type coercion semantics, coercing from Range to Sequence and throwing an error if the LHS doesn't support .succ. Writing ($a .. $b).reverse doesn't make any sense if the result were a new Range, since Ranges should then only be used for inclusion tests (so swapping endpoints doesn't have any meaningful interpretation), but applying .reverse could result in a coercion to Sequence. Writing for ($a .. $b).reverse -> $c { ...} may then blow up because it turns out that $b doesn't have a .succ method when coercing to sequence (where the LHS must have an initial value), just like for $a .. $b -> $c { ... } should be able to blow up because the LHS of a Range shouldn't have to support .succ. Regards, Michael.
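A rough Python model of the split Michael proposes: membership needs only comparison, while coercion to a sequence additionally needs .succ on the left endpoint and raises an error otherwise. `Range`, `Step` and `to_sequence` are hypothetical names for this sketch and are not Perl 6 API.

```python
from functools import total_ordering

@total_ordering
class Step:
    """Hypothetical endpoint type that supports both comparison and succ()."""
    def __init__(self, n):
        self.n = n
    def __eq__(self, other):
        return self.n == other.n
    def __lt__(self, other):
        return self.n < other.n
    def succ(self):
        return Step(self.n + 1)

class Range:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __contains__(self, x):
        # Inclusion test: the endpoints only have to be comparable.
        return self.lo <= x <= self.hi

    def to_sequence(self):
        # Coercion: fail up front if the left endpoint cannot be stepped.
        if not hasattr(self.lo, "succ"):
            raise TypeError("left endpoint has no succ; cannot coerce to a sequence")
        return self._walk()

    def _walk(self):
        x = self.lo
        while x <= self.hi:
            yield x
            x = x.succ()

print(2.0 in Range(1.5, 2.5))                                # floats: membership works
print([s.n for s in Range(Step(1), Step(4)).to_sequence()])  # steppable endpoints
```

Calling `Range(1.5, 2.5).to_sequence()` raises `TypeError`, which is the "blow up on coercion" behaviour argued for above.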
Re: Suggested magic for "a" .. "b"
Aaron Sherman wrote: > As a special case, perhaps you can treat ranges as special and not as simple iterators. To be honest, I wasn't thinking about the possibility of such special cases, but about iterators in general. You can't generically reverse lazy constructs without running afoul of the halting problem, which I invite you to solve at your leisure ;-)

A really obvious example occurs when the RHS is a Whatever:

    (1..*).reverse;

.reverse magic isn't going to be generically applicable to all lazy lists; but it can be applicable to all lazy lists that have predefined start points, end points, and bidirectional iterators, and on all lazy lists that have random-access iterators and some way of locating the tail. Sometimes you can guess what the endpoint and backward-iterator should be from the start point and the forward-iterator, just as the infix:<...> operator is able to guess what the forward-iterator should be from the first one, two, or three items in the list. This is especially a problem with regard to lists generated using the series operator, as it's possible to define a custom forward-iterator for it (but not, AFAICT, a custom reverse-iterator). In comparison, the simplicity of the range operator's list generation algorithm almost guarantees that as long as you know for certain what or where the last item is, you can lazily generate the list from its tail. But only almost:

    (1..3.5);          # list context: 1, 2, 3
    (1..3.5).reverse;  # list context: 3.5, 2.5, 1.5 - assuming list is generated from tail.
    (1..3.5).reverse;  # list context: 3, 2, 1 - but only if you generate it from the head first, and then reverse it.

Again, the proper tool for list generation is the series operator, because it can do everything that the range operator can do in terms of list generation, and more.

    1 ... 3.5  # same as 1, 2, 3
    3.5 ... 1  # same as 3.5, 2.5, 1.5 - and obviously so.
With this in mind, I see no reason to allow any magic on .reverse when dealing with the range operator (or the series operator, for that matter): as far as it's concerned, it's dealing with a list that lacks a reverse-iterator, and so it will _always_ generate the list from its head to its tail before attempting to reverse it. Maybe at some later point, after we get Perl 6.0 out the door, we can look into revising the series operator to permit more powerful iterators so as to allow .reverse and the like to bring more dwimmy magic to bear. -- Jonathan "Dataweaver" Lang
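The 1..3.5 discrepancy is easy to reproduce in a Python sketch. The two helpers are invented for this example and assume a step of exactly 1 in either direction.

```python
def from_head(lo, hi):
    """Start at lo and step up by 1 while still inside the range."""
    out, x = [], lo
    while x <= hi:
        out.append(x)
        x += 1
    return out

def from_tail(lo, hi):
    """Start at hi and step down by 1 while still inside the range."""
    out, x = [], hi
    while x >= lo:
        out.append(x)
        x -= 1
    return out

print(list(reversed(from_head(1, 3.5))))  # generate from the head, then reverse
print(from_tail(1, 3.5))                  # generate lazily from the tail
```

The two results contain different elements, so a "lazy reverse" that starts from the right endpoint is not equivalent to reversing the forward list unless the endpoints are a whole number of steps apart.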
Re: Suggested magic for "a" .. "b"
Sorry I haven't responded for so long... much going on in my world. On Mon, Jul 26, 2010 at 11:35 AM, Nicholas Clark wrote: > On Tue, Jul 20, 2010 at 07:31:14PM -0400, Aaron Sherman wrote: >> 2) We deny that a range whose LHS is "larger" than its RHS makes sense, but we also don't provide an easy way to construct such ranges lazily otherwise. This would be annoying only, but then we have declared that ranges are the right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...} which is not lazy (blows up your machine) and feels awfully clunky next to for 1e10..1 -> $i {...} which would not blow up your machine, or even make it break a sweat, if it worked) > There is no reason why for (1..1e10).reverse -> $i {...} should *not* be lazy.

As a special case, perhaps you can treat ranges as special and not as simple iterators. To be honest, I wasn't thinking about the possibility of such special cases, but about iterators in general. You can't generically reverse lazy constructs without running afoul of the halting problem, which I invite you to solve at your leisure ;-) For example, let's just tie it to integer factorization to make it really obvious:

    # Generator for ranges of sequential, composite integers
    sub composites(Int $start) {
        gather do {
            for $start .. * -> $i {
                last if isprime($i);
                take $i;
            }
        }
    }

    for composites(10116471302318).reverse -> $i { say $i }

The first value should be 10116471302380, but computing that without iterating through the list from start to finish would require knowing that none of the integers between 10116471302318 and 10116471302380, inclusive, are prime. Of course, the same problem exists for any iterator where the end condition or steps can't be easily pre-computed, but this makes it more obvious than most. That means that Range.reverse has to do something special that iterators in general can't be relied on to do. Does that introduce problems? Not big ones.
I can definitely see people who are used to "for ($a .. $b).reverse -> ..." getting confused when "for @blah.reverse -> ..." blows up their machine, but avoiding that confusion might not be practical. PS: On a really abstract note, requiring that ($a .. $b).reverse be lazy will put new constraints on the right hand side parameter. Previously, it didn't have to have a value of its own, it just had to be comparable to other values. For example: for $a .. $b -> $c { ... } In that, we don't include the RHS in the output range explicitly. Instead, we increment $a (via .succ) until it's >= $b. If $a were 1 and $b were an object that "does Int" but just implements the comparison features, and has no fixed numeric value, then it should still work (e.g. it could be random). Now that's not possible because we need to use the RHS as the starting point when .reverse is invoked. I have no idea if that matters, but it's important to be aware of when and where we constrain the interface rather than discovering it later. -- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs
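A small Python sketch (not Perl 6) of the distinction drawn in this message. The helpers `is_prime` and `composites` are stand-ins written for this illustration, and a small start value replaces the 14-digit one. A range that stores both endpoints can be reversed by arithmetic alone, while a generator whose end depends on a computation cannot be reversed without running it forwards first.

```python
from itertools import count, takewhile


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def composites(start: int):
    """Lazily yield consecutive composites from start, stopping at the first prime."""
    return takewhile(lambda n: not is_prime(n), count(start))


# A range stores both endpoints, so its reverse is computed on demand.
big = range(1, 10**10 + 1)
first_from_end = next(reversed(big))

# The generator has no known end point, so Python refuses to reverse it.
try:
    reversed(composites(24))
    generator_reversible = True
except TypeError:
    generator_reversible = False

# The only general option is to walk it forwards and buffer every element.
buffered = list(composites(24))[::-1]
```

Here `first_from_end` is available immediately, whereas `buffered` exists only after the whole run 24..28 has been generated. That is the same asymmetry a lazy Range.reverse would rely on.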
Re: Suggested magic for "a" .. "b"
On Tue, Jul 20, 2010 at 07:31:14PM -0400, Aaron Sherman wrote: > 2) We deny that a range whose LHS is "larger" than its RHS makes sense, but > we also don't provide an easy way to construct such ranges lazily otherwise. > This would be annoying only, but then we have declared that ranges are the > right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...} > which is not lazy (blows up your machine) and feels awfully clunky next to > for 1e10..1 -> $i {...} which would not blow up your machine, or even make > it break a sweat, if it worked) There is no reason why for (1..1e10).reverse -> $i {...} should *not* be lazy. After all, Perl 5 now implements @b = reverse sort @a by directly sorting in reverse. Note how it's now an ex-reverse: $ perl -MO=Concise -e '@b = reverse sort @a' c <@> leave[1 ref] vKP/REFC ->(end) 1 <0> enter ->2 2 <;> nextstate(main 1 -e:1) v ->3 b <2> aassign[t6] vKS ->c -<1> ex-list lK ->8 3 <0> pushmark s ->4 - <1> ex-reverse lK/1 ->- 4 <0> pushmark s ->5 7 <@> sort lK/REV ->8 - <0> ex-pushmark s ->5 6 <1> rv2av[t4] lK/1 ->7 5<#> gv[*a] s ->6 -<1> ex-list lK ->b 8 <0> pushmark s ->9 a <1> rv2av[t2] lKRM*/1 ->b 9 <#> gv[*b] s ->a -e syntax OK Likewise foreach (reverse @a) {...} is implemented as a reverse iterator on the array, rather than a temporary list: $ perl -MO=Concise -e 'foreach(reverse @a) {}' d <@> leave[1 ref] vKP/REFC ->(end) 1 <0> enter ->2 2 <;> nextstate(main 2 -e:1) v ->3 c <2> leaveloop vK/2 ->d 7<{> enteriter(next->9 last->c redo->8) lKS/REVERSED ->a - <0> ex-pushmark s ->3 - <1> ex-list lKM ->6 3 <0> pushmark s ->4 - <1> ex-reverse lKM/1 ->6 - <0> ex-pushmark s ->4 5 <1> rv2av[t2] sKR/1 ->6 4<#> gv[*a] s ->5 6 <#> gv[*_] s ->7 -<1> null vK/1 ->c b <|> and(other->8) vK/1 ->c a <0> iter s/REVERSED ->b - <@> lineseq vK ->- 8 <0> stub v ->9 9 <0> unstack v ->a -e syntax OK If it's part of the specification that (1..1e10).reverse is to be implemented lazily, I'd (personally) consider that an easy enough way to 
construct a lazy range. This doesn't answer any of your other questions about what ranges of character strings should mean. I don't really have an opinion, other than it needs to be simple enough to be teachable. Nicholas Clark
Re: Suggested magic for "a" .. "b"
On Wed, Jul 21, 2010 at 3:55 PM, Darren Duncan wrote: > Larry Wall wrote: >> >> On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote: >> : In particular, consider that pi ~~ 0..4 is true, >> : because pi is within the range; but pi ~~ 0...4 is false, because pi >> : is not one of the generated elements. >> >> Small point here, it's not because pi is fractional: 3 ~~ 0...4 is >> also false because 3 !eqv (0,1,2,3,4). There is no implicit any() >> on a smartmatch list pattern as there is in Perl 5. In Perl 6 the >> pattern 0..4 may only match a list with the same 5 elements in the >> same order. > > For some reason I thought smart match in Perl 6, when presented with some > collection on the right-hand side, would test if the value on the left-hand > side was contained in the collection. That was my thought as well. > Similarly, since a range represents a set of all values between 2 endpoints, > I might have thought this would be reasonable: > > 3 ~~ 1..5 # TRUE AIUI, that is indeed correct. Ranges smartmatch by testing for inclusion in the range. But collections don't smartmatch by testing for inclusion in the collection. Which was probably the subject of a thread I missed somewhere... For series, I think the canonical solution is to use any(). -- Mark J. Reed
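The three kinds of match discussed here can be modelled roughly in Python. This is only an analogy: Python has no smartmatch, so each case is spelled out by hand, and the variable names are invented for the illustration.

```python
import math

# "pi ~~ 0..4": a range match is an interval test against the endpoints.
in_interval = 0 <= math.pi <= 4

# "3 ~~ (0...4)": the series is a list, and the match compares against
# the list as a whole rather than against its elements.
series = [0, 1, 2, 3, 4]
matches_whole_list = (3 == series)

# "3 ~~ any(0...4)": element membership has to be requested explicitly.
matches_any_element = any(3 == item for item in series)
pi_matches_any = any(math.pi == item for item in series)
```

Only the interval test accepts pi. The explicit any() accepts 3 but not pi, and the whole-list comparison accepts neither.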
Re: Suggested magic for "a" .. "b"
Larry Wall wrote: On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote: : In particular, consider that pi ~~ 0..4 is true, : because pi is within the range; but pi ~~ 0...4 is false, because pi : is not one of the generated elements. Small point here, it's not because pi is fractional: 3 ~~ 0...4 is also false because 3 !eqv (0,1,2,3,4). There is no implicit any() on a smartmatch list pattern as there is in Perl 5. In Perl 6 the pattern 0..4 may only match a list with the same 5 elements in the same order. For some reason I thought smart match in Perl 6, when presented with some collection on the right-hand side, would test if the value on the left-hand side was contained in the collection. So, for example: my @ary = (1,4,3,2,9); my $test = 3; $test ~~ @ary; # TRUE Similarly, since a range represents a set of all values between 2 endpoints, I might have thought this would be reasonable: 3 ~~ 1..5 # TRUE So if that doesn't work, then what is the canonical way to ask if a value is in a range? Would any of these be reasonable? 3 ~~ any(1..5) 3 in 1..5 3 ∈ 1..5 # Unicode alternative -- Darren Duncan
Re: Suggested magic for "a" .. "b"
On Wed, Jul 21, 2010 at 9:46 AM, Aaron Crane wrote: > > > I think that "Ā" .. "Ē" should ĀĂĄĆĈĊČĎĐĒ > > If that's in the hope of producing a more "intuitive" result, then why > not ĀB̄C̄D̄Ē? > > That's only partly serious. I'm acutely aware that choosing a baroque > set of rules makes life harder for both implementers and users (and, > in particular, risks ending up with an operator that has no practical > non-trivial use cases). > Well... actually, I got to thinking (which is not my natural state) and I think we need two approaches. I don't know if they're two operators, a pragma or what, but there are definitely two things people want: - "x".succ_uni yields "x".ord incremented until the resulting codepoint "agrees" with "x". By agrees, I mean that it shares the same script and general category properties (major/minor). This is an important tool because it's universal. - "x".succ_loc yields the next character after "x" in the current locale. What convinced me that this is a peer to the above was when I thought about Japanese, where only a subset of the CJK ideographs are valid Japanese. You really need an index and collation for these that is outside of the basic Unicode properties. So yes, if there's a locale in which ĀB̄C̄D̄Ē is the correct ordering, then I do think that there should be some "Ā" .. "Ē" equivalent that yields the above in that context. But, I'm not convinced it should be the default. > I note also that this A-macron and E-macron are in NFC. I think that, > certainly by default, the difference between NFC and NFD should be > hidden from users. That implies that, however "Ā" .. "Ē" behaves, the > NFD version should behave identically; and that "B̄" .. F̄ should > behave in the most equivalent way possible. > As I've said previously, I'm only discussing single "characters" which I'm defining as single codepoints which are neither combining nor modifying. 
If you like, we can have the conversation about what you do when you encounter combining and modifying codepoints, and I do think I agree with you largely, but I'd like to hold that for now. It's just too much of a rat-hole at this point. -- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs
Re: Suggested magic for "a" .. "b"
Aaron Sherman wrote: > There's just an undefined codepoint smack in the middle of the Greek > uppercase letters (U+03A2). I'm sure the Unicode specs have a rationale for > that somewhere, but my guess is that there's some thousand-year-old debate > about the Greek alphabet behind it. It becomes clearer if you also look at the corresponding lower-case characters: U+03A1 Greek capital letter rho U+03A2 (none) U+03A3 Greek capital letter sigma U+03C1 Greek small letter rho U+03C2 Greek small letter final sigma U+03C3 Greek small letter sigma Greek words written in lower-case that end in a sigma use a special glyph for that sigma; and Unicode allocates a codepoint to it for roundtripping to legacy character sets. There isn't a corresponding upper-case final sigma. Unicode leaves the gap in the upper-case Greek range for neatness, effectively: adding 0x20 to the numeric value of an upper-case character yields the corresponding lower-case version. > I think that "Ā" .. "Ē" should ĀĂĄĆĈĊČĎĐĒ If that's in the hope of producing a more "intuitive" result, then why not ĀB̄C̄D̄Ē? That's only partly serious. I'm acutely aware that choosing a baroque set of rules makes life harder for both implementers and users (and, in particular, risks ending up with an operator that has no practical non-trivial use cases). I note also that this A-macron and E-macron are in NFC. I think that, certainly by default, the difference between NFC and NFD should be hidden from users. That implies that, however "Ā" .. "Ē" behaves, the NFD version should behave identically; and that "B̄" .. F̄ should behave in the most equivalent way possible. -- Aaron Crane ** http://aaroncrane.co.uk/
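Both observations in this message can be checked against the Unicode character database. The sketch below uses Python's standard `unicodedata` module; the constant and variable names are chosen for this illustration only.

```python
import unicodedata

RHO, GAP, SIGMA = 0x03A1, 0x03A2, 0x03A3

# U+03A2 is unassigned, so its general category is "Cn".
gap_category = unicodedata.category(chr(GAP))

# Adding 0x20 to each upper-case slot lands on the lower-case row,
# where the gap lines up with the final sigma.
lower_row = [unicodedata.name(chr(cp + 0x20)) for cp in (RHO, GAP, SIGMA)]

# NFC vs NFD: A-macron has a precomposed form, B-macron does not.
a_macron_nfc = "\u0100"
a_macron_nfd = unicodedata.normalize("NFD", a_macron_nfc)
b_macron = unicodedata.normalize("NFC", "B\u0304")
```

Since `b_macron` stays two codepoints even after NFC, a range operator limited to single codepoints cannot treat "Ā" and "B̄" alike without looking at graphemes.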
Re: Suggested magic for "a" .. "b"
On Wed, Jul 21, 2010 at 1:28 AM, Aaron Sherman wrote: > > For reference, this is the relevant section of the spec: > > Character positions are incremented within their natural range for any > Unicode range that is deemed to represent the digits 0..9 or that is deemed > to be a complete cyclical alphabet for (one case of) a (Unicode) script. > Only scripts that represent their alphabet in codepoints that form a cycle > independent of other alphabets may be so used. (This specification defers to > the users of such a script for determining the proper cycle of letters.) We > arbitrarily define the ASCII alphabet not to intersect with other scripts > that make use of characters in that range, but alphabets that intersperse > ASCII letters are not allowed. > > > I'm not sure that all of that tracks with the Unicode standard's use of > some of the terms, but based on what we've discussed, perhaps we could get > more specific there: > > Character positions are incremented within their Unicode Script, but only > in keeping with their General Category property. Thus C<"A"++> yields C<"B"> > which is the next codepoint, but C<"Ă"++> yields C<"Ą"> even though "ą" > falls between the two, when incrementing codepoints. Should this prove > problematic for any specific Unicode Script which requires special handling > (e.g. because a "letter" really isn't used as a letter at all), such special > handling may be applied, but the above is the general rule. > > Oh, so close! I realized that I broke the original spec, here. We need to add back in: There are two special cases: the ASCII-compatible lower-case letters (a-z) and the ASCII-compatible upper-case letters (A-Z). For historical reasons, these, by default, will not increment past the end of their ranges into the higher-codepoint Latin characters. Note: we might want a pragma for that as well. I'd suggest that perhaps it should be a locale-specific feature? 
So, if you set your locale to fr, then you include in those ranges all of the Latin characters used in French. -- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs
Re: Suggested magic for "a" .. "b"
On Wed, Jul 21, 2010 at 09:23:11AM -0400, Mark J. Reed wrote: : Strike the "counter to current Rakudo behavior" bit; Rakudo is : behaving as specified in this instance. I must have been : hallucinating. Well, except that we both neglected precedence. Since ... is looser than ~~, it must be written 3 ~~ (0...4). :-) Larry
Re: Suggested magic for "a" .. "b"
Strike the "counter to current Rakudo behavior" bit; Rakudo is behaving as specified in this instance. I must have been hallucinating. On Wed, Jul 21, 2010 at 7:33 AM, Mark J. Reed wrote: > Ok, I find that surprising (and counter to current Rakudo behavior), > but thanks for the correction, and sorry about the misinformation. > > On Wednesday, July 21, 2010, Larry Wall wrote: >> On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote: >> : In particular, consider that pi ~~ 0..4 is true, >> : because pi is within the range; but pi ~~ 0...4 is false, because pi >> : is not one of the generated elements. >> >> Small point here, it's not because pi is fractional: 3 ~~ 0...4 is >> also false because 3 !eqv (0,1,2,3,4). There is no implicit any() >> on a smartmatch list pattern as there is in Perl 5. In Perl 6 the >> pattern 0..4 may only match a list with the same 5 elements in the >> same order. >> >> Larry >> > > -- > Mark J. Reed > -- Mark J. Reed
Re: Suggested magic for "a" .. "b"
Ok, I find that surprising (and counter to current Rakudo behavior), but thanks for the correction, and sorry about the misinformation. On Wednesday, July 21, 2010, Larry Wall wrote: > On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote: > : In particular, consider that pi ~~ 0..4 is true, > : because pi is within the range; but pi ~~ 0...4 is false, because pi > : is not one of the generated elements. > > Small point here, it's not because pi is fractional: 3 ~~ 0...4 is > also false because 3 !eqv (0,1,2,3,4). There is no implicit any() > on a smartmatch list pattern as there is in Perl 5. In Perl 6 the > pattern 0..4 may only match a list with the same 5 elements in the > same order. > > Larry > -- Mark J. Reed
Re: Suggested magic for "a" .. "b"
Smylers wrote: > Jon Lang writes: >> Approaching this with the notion firmly in mind that infix:<..> is >> supposed to be used for matching ranges while infix:<...> should be >> used to generate series: >> >> With series, we want C< $LHS ... $RHS > to generate a list of items >> starting with $LHS and ending with $RHS. If $RHS > $LHS, we want it >> to increment one step at a time; if $RHS < $LHS, we want it to >> decrement one step at a time. > > Do we? Yes, we do. > I'm used to generating lists and iterating over them (in Perl 5) > with things like like: > > for (1 .. $max) > > where the intention is that if $max is zero, the loop doesn't execute at > all. Having the equivalent Perl 6 list generation operator, C<...>, > start counting backwards could be confusing. > > Especially if Perl 6 also has a range operator, C<..>, which would Do > The Right Thing for me in this situation, and where the Perl 6 operator > that Does The Right Thing is spelt the same as the Perl 5 operator that > I'm used to; that muddles the distinction you make above about matching > ranges versus generating lists. It does muddy the difference, which is why my own gut instinct would have been to do away with infix:<..>'s ability to generate lists. Fortunately, I'm not in charge here, and wiser heads than mine have decreed that infix:<..>, when used in list context, will indeed generate a list in a manner that closely resembles Perl 5's range operator: start with the LHS, then increment until you equal or exceed the RHS - and if you start out exceeding the RHS, you've got yourself an empty list. You can do the same thing with the infix:<...> operator, too; but doing so will be bulkier (albeit much more intuitive). For example, the preferred Perl 6 approach to what you described would be: for 1, 2 ... $x The two-element list on the left of the series operator invokes a bit of magic that tells it that the algorithm for generating the next step in the series is to invoke the increment operator. 
This is all described in S03 in considerable detail; I suggest rereading the section there concerning the series operator before passing judgment on it. -- Jonathan "Dataweaver" Lang
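The two behaviours contrasted in this message can be sketched in Python for the integer case. The function names `range_list` and `series` are invented for this illustration, and the sketch follows the description given in the message rather than the text of S03.

```python
def range_list(lo: int, hi: int) -> list:
    """The '..' operator in list context: count up from lo, empty if lo exceeds hi."""
    return list(range(lo, hi + 1))


def series(lo: int, hi: int) -> list:
    """The 'lo ... hi' series as described: step towards hi in either direction."""
    step = 1 if hi >= lo else -1
    return list(range(lo, hi + step, step))


loop_runs_for_zero = range_list(1, 0)
series_for_zero = series(1, 0)
```

With a maximum of zero, `range_list(1, 0)` is empty and the loop body never runs, while `series(1, 0)` counts down through 1 and 0. That is the surprise Smylers raises.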
Re: Suggested magic for "a" .. "b"
On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote: : In particular, consider that pi ~~ 0..4 is true, : because pi is within the range; but pi ~~ 0...4 is false, because pi : is not one of the generated elements. Small point here, it's not because pi is fractional: 3 ~~ 0...4 is also false because 3 !eqv (0,1,2,3,4). There is no implicit any() on a smartmatch list pattern as there is in Perl 5. In Perl 6 the pattern 0..4 may only match a list with the same 5 elements in the same order. Larry
Re: Suggested magic for "a" .. "b"
Jon Lang writes: > Approaching this with the notion firmly in mind that infix:<..> is > supposed to be used for matching ranges while infix:<...> should be > used to generate series: > > With series, we want C< $LHS ... $RHS > to generate a list of items > starting with $LHS and ending with $RHS. If $RHS > $LHS, we want it > to increment one step at a time; if $RHS < $LHS, we want it to > decrement one step at a time. Do we? I'm used to generating lists and iterating over them (in Perl 5) with things like: for (1 .. $max) where the intention is that if $max is zero, the loop doesn't execute at all. Having the equivalent Perl 6 list generation operator, C<...>, start counting backwards could be confusing. Especially if Perl 6 also has a range operator, C<..>, which would Do The Right Thing for me in this situation, and where the Perl 6 operator that Does The Right Thing is spelt the same as the Perl 5 operator that I'm used to; that muddles the distinction you make above about matching ranges versus generating lists. Smylers -- http://twitter.com/Smylers2
Re: Suggested magic for "a" .. "b"
Darren Duncan wrote: specific, the generic "eqv" operator, or "before" etc would have to be Correction, I meant to say "cmp", not "eqv", here. -- Darren Duncan
Re: Suggested magic for "a" .. "b"
Aaron Sherman wrote: 2) The spec doesn't put this information anywhere near the definition of the range operator. Perhaps we can make a note? This was a source of confusion for me. My impression is that a "Range" primarily defines an "interval" in terms of 2 endpoint values such that it defines a possibly infinite set values between those endpoints. For example, 'aa'..'bb' is an infinite sized set that includes every possible character string that starts with the letter 'a', plus every one that starts with the string 'ba'. And so, asking $anysuchstring ~~ 'aa'..'bb' is TRUE. (Note that for ".." to work, its 2 arguments would need to be of the same type, so that we know which set of rules to follow. Or to be specific, the generic "eqv" operator, or "before" etc would have to be defined that takes both of the ".." arguments as its arguments. Although this might be fuzzed a bit if the spec defines somewhere about automatic casting. For example, if someone said 'foo'..42 then I would expect that to fail.) A "Range" can also be used in a limited fashion to generate a finite list of values, but that is not its primary purpose, and the "..." operator does that job much better. 3) It seems that there are two competing multi-character approaches and both seem somewhat valid. Should we use a pragma to toggle behavior between A and B: A: "aa" .. "bb" contains "az" B: "aa" .. "bb" contains ONLY "aa", "ab", "ba" and "bb" I would find A to be the only reasonable answer. If you want B's semantics then use "..." instead; ".." should not be overloaded for that. If there were to be any similar pragma, then it should control matters like "collation", or what nationality/etc-specific subtype of Str the 'aa' and 'bb' are blessed into on definition, so that their collation/sorting/etc rules can be applied when figuring out if a particular $foo~~$bar..$baz is TRUE or not. -- Darren Duncan
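Readings A and B can be compared side by side in a short Python sketch. It assumes plain codepoint-order string comparison, which is what Python's `<=` does on `str`, and the names `in_interval` and `per_position` are invented for this illustration.

```python
from itertools import product

lo, hi = "aa", "bb"


def in_interval(s: str) -> bool:
    """Reading A: the range is an interval under string comparison."""
    return lo <= s <= hi


# Reading B: each character position runs independently from its
# letter in lo to its letter in hi.
positions = [
    [chr(cp) for cp in range(ord(a), ord(b) + 1)]
    for a, b in zip(lo, hi)
]
per_position = ["".join(p) for p in product(*positions)]
```

Under A, "az" is inside the range because it sorts between the endpoints. Under B it is absent, because only the four per-position combinations are ever produced.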
Re: Suggested magic for "a" .. "b"
OK, there's a lot here and my head is swimming, so let me re-consolidate and re-state (BTW: thanks Jon, you've really helped me understand, here). 1) The spec is somewhat vague, but the proposal that I made for single characters is not an unreasonable interpretation of what's there. Thus, we could adopt the script/major cat/minor cat triplet as the core tool that .succ will use for single, non-combining, non-modifying, valid characters? 2) The spec doesn't put this information anywhere near the definition of the range operator. Perhaps we can make a note? This was a source of confusion for me. 3) It seems that there are two competing multi-character approaches and both seem somewhat valid. Should we use a pragma to toggle behavior between A and B: A: "aa" .. "bb" contains "az" B: "aa" .. "bb" contains ONLY "aa", "ab", "ba" and "bb" 4) About the ranges I gave as examples, you asked: "Which codepoint is invalid, and why?" There's just an undefined codepoint smack in the middle of the Greek uppercase letters (U+03A2). I'm sure the Unicode specs have a rationale for that somewhere, but my guess is that there's some thousand-year-old debate about the Greek alphabet behind it. "In both of these cases, what do you think it should produce?" I actually gave that answer a bit later on. I think that "Ā" .. "Ē" should produce ĀĂĄĆĈĊČĎĐĒ and オ .. ヺ should produce オカガキギクグケゲコゴサザシジスズセゼソゾタダチヂツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモヤユヨラリルレロワヰヱヲンヴヷヸヹヺ which are all of the Katakana syllabic characters. "I also have to wonder how or if "0" ... "z" ought to be resolved. If you're thinking in terms of the alphabet or digits, this is nonsensical" Well, since you agreed with my statement about the properties checking, it would be 0 through 9 and then a through z because 0 through 9 are Latin numbers, matching the LHS's properties and a through z are lowercase Latin letters, matching the RHS's properties. 
For reference, this is the relevant section of the spec: Character positions are incremented within their natural range for any Unicode range that is deemed to represent the digits 0..9 or that is deemed to be a complete cyclical alphabet for (one case of) a (Unicode) script. Only scripts that represent their alphabet in codepoints that form a cycle independent of other alphabets may be so used. (This specification defers to the users of such a script for determining the proper cycle of letters.) We arbitrarily define the ASCII alphabet not to intersect with other scripts that make use of characters in that range, but alphabets that intersperse ASCII letters are not allowed. I'm not sure that all of that tracks with the Unicode standard's use of some of the terms, but based on what we've discussed, perhaps we could get more specific there: Character positions are incremented within their Unicode Script, but only in keeping with their General Category property. Thus C<"A"++> yields C<"B"> which is the next codepoint, but C<"Ă"++> yields C<"Ą"> even though "ą" falls between the two, when incrementing codepoints. Should this prove problematic for any specific Unicode Script which requires special handling (e.g. because a "letter" really isn't used as a letter at all), such special handling may be applied, but the above is the general rule. and then in the section on ranges: As discussed previously, incrementing a character (which is to say, invoking C<.succ>) seeks the next codepoint with the same Unicode Script and General Category properties (major and minor category to be specific). For ranges, succession is the same if .min and .max have the same properties, but if they do not, then all codepoints are considered which are greater than C<.min> and smaller than C<.max> and which agree with either the properties of C<.min> or the properties of C<.max>
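The proposed rule can be tried out with a rough Python sketch. One loud assumption: Python's standard `unicodedata` module does not expose the Script property, so `props` guesses the script from the first word of the character name. That is a crude stand-in adequate only for these examples. Both `props` and `proposed_range` are names invented here.

```python
import unicodedata


def props(ch: str) -> tuple:
    """(script guess, general category); the script guess is an approximation."""
    name = unicodedata.name(ch, "")
    script_guess = name.split(" ")[0]
    return (script_guess, unicodedata.category(ch))


def proposed_range(lo: str, hi: str) -> str:
    """Codepoints from lo to hi that share the properties of either endpoint."""
    wanted = {props(lo), props(hi)}
    return "".join(
        chr(cp)
        for cp in range(ord(lo), ord(hi) + 1)
        if props(chr(cp)) in wanted
    )


latin_caps = proposed_range("Ā", "Ē")
digits_then_lower = proposed_range("0", "z")
greek_caps = proposed_range("Α", "Ω")
```

Under these assumptions `latin_caps` skips the interleaved lower-case letters, `digits_then_lower` keeps 0 through 9 and a through z while dropping punctuation and upper case, and `greek_caps` steps over the unassigned U+03A2.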
Re: Suggested magic for "a" .. "b"
On Wed, Jul 21, 2010 at 12:04 AM, Jon Lang wrote: > Mark J. Reed wrote: >> Perhaps the syllabic kana could be the "integer" analogs, and what you >> get when you iterate over the range using ..., while the modifier kana >> would not be generated by the series ア ... ヴ but would be considered >> in the range ア .. ヴ? I wouldn't object to such script-specific >> behavior, though perhaps it doesn't belong in core. > > As I understand it, it wouldn't need to be script-specific behavior; > just behavior that's aware of Unicode properties. That wouldn't help in this case. For example, U+30A1 KATAKANA LETTER SMALL A - the small "modifier" variety of letter under discussion - is not a modifier in the Unicode sense. It has exactly the same properties as U+30A2 KATAKANA LETTER A, an actual syllable: 30A1;KATAKANA LETTER SMALL A;Lo;0;L;N; 30A2;KATAKANA LETTER A;Lo;0;L;N; So without script-specific special-case code, there's no way to distinguish them. As Aaron said, they're treated like lowercase, but that's not an accurate representation of how they're used in actual text, or of the common idea of what constitutes the set of kana. -- Mark J. Reed
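The point about identical properties can be confirmed with Python's standard `unicodedata` module. The helper `is_small_kana` is a hypothetical script-specific rule invented for this sketch. It keys on the character name, which is exactly the kind of special-casing the message says would be needed.

```python
import unicodedata

small_a, full_a = "\u30a1", "\u30a2"

# Both characters are general category Lo with bidi class L.
categories = (unicodedata.category(small_a), unicodedata.category(full_a))
bidi = (unicodedata.bidirectional(small_a), unicodedata.bidirectional(full_a))
names = (unicodedata.name(small_a), unicodedata.name(full_a))


def is_small_kana(ch: str) -> bool:
    """Hypothetical script-specific test based on the character name."""
    return "KATAKANA LETTER SMALL" in unicodedata.name(ch, "")
```

The property values give no way to separate the two characters. Only the name-based check does, and it would not carry over to other scripts.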
Re: Suggested magic for "a" .. "b"
Mark J. Reed wrote: > Perhaps the syllabic kana could be the "integer" analogs, and what you > get when you iterate over the range using ..., while the modifier kana > would not be generated by the series ア ... ヴ but would be considered > in the range ア .. ヴ? I wouldn't object to such script-specific > behavior, though perhaps it doesn't belong in core. As I understand it, it wouldn't need to be script-specific behavior; just behavior that's aware of Unicode properties. That particular issue doesn't come up with the English alphabet because there aren't any modifier codepoints embedded in the middle of the standard alphabet. And if there were, I'd hope that they'd be filtered out from the series generation by default. And I'd hope that there would be a way to turn the default filtering off when I don't want it. -- Jonathan "Dataweaver" Lang
Re: Suggested magic for "a" .. "b"
On Tue, Jul 20, 2010 at 11:28 PM, Aaron Sherman wrote: > So, what's the intention of the range operator, then? ... is a generator that lazily enumerates a series. .. is a constructor for a Range object. They're two different things, with different behaviors. In particular, consider that pi ~~ 0..4 is true, because pi is within the range; but pi ~~ 0...4 is false, because pi is not one of the generated elements. > I guess you could write: > > ア, イ, ウ, エ, オ, カ ... ヂ,ツ ...モ,ヤ, ユ, ヨ ... ロ, ワ ... ヴ (add quotes to taste) > > But that seems quite a bit more painful than: Perhaps the syllabic kana could be the "integer" analogs, and what you get when you iterate over the range using ..., while the modifier kana would not be generated by the series ア ... ヴ but would be considered in the range ア .. ヴ? I wouldn't object to such script-specific behavior, though perhaps it doesn't belong in core. -- Mark J. Reed
Re: Suggested magic for "a" .. "b"
Aaron Sherman wrote: > So, what's the intention of the range operator, then? Is it just there to > offer backward compatibility with Perl 5? Is it a vestige that should be > removed so that we can Huffman ... down to ..? > > I'm not trying to be difficult, here, I just never knew that ... could > operate on a single item as LHS, and if it can, then .. seems to be obsolete > and holding some prime operator real estate. On the contrary: it is not a vestige, it is not obsolete, and it's making good use of the prime operator real estate that it's holding. It's just not doing what it did in Perl 5. I strongly recommend that you reread S03 to find out exactly what each of these operators does these days. >> The questions definitely look different that way: for example, >> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz is easily and >> clearly expressed as >> >> 'A' ... 'Z', 'a' ... 'z' # don't think this works in Rakudo yet :( >> > > I still contend that this is so frequently desirable that it should have a > simpler form, but it's still going to have problems. > > One example: for expressing "Katakana letters" (I use "letters" in the > Unicode sense, here) it's still dicey. There are things interspersed in the > Unicode sequence for Katakana that aren't the same thing at all. Unicode > calls them lowercase, but that's not quite right. They're smaller versions > of Katakana characters which are used more as punctuation or accents than as > syllabic glyphs the way the rest of Katakana is. > > I guess you could write: > > ア, イ, ウ, エ, オ, カ ... ヂ,ツ ...モ,ヤ, ユ, ヨ ... ロ, ワ ... ヴ (add quotes to taste) > > But that seems quite a bit more painful than: > > ア .. ヴ (or ... if you prefer) > > Similar problems exist for many scripts (including some of Latin, we're just > used to the parts that are odd), though I think it's possible that Katakana > may be the worst because of the mis-use of Ll to indicate a letter when the > truth of the matter is far more complicated. 
Some of this might be addressed by filtering the list as you go - though I don't remember the method for doing so. Something like .grep, I think, with a regex in it that only accepts letters: (ア ... ヴ).«grep(/<:alpha:>/) ...or something to that effect. Still, it's possible that we might need something that's more flexible than that. -- Jonathan "Dataweaver" Lang
Re: Suggested magic for "a" .. "b"
Approaching this with the notion firmly in mind that infix:<..> is supposed to be used for matching ranges while infix:<...> should be used to generate series: Aaron Sherman wrote: > Walk with me a bit, and let's explore the concept of intuitive character > ranges? This was my suggestion, which seems pretty basic to me: > > "x .. y", for all strings x and y, which are composed of a single, valid > codepoint which is neither combining nor modifying, yields the range of all > valid, non-combining/modifying codepoints between x and y, inclusive which > share the Unicode script, general category major property and general > category minor property of either x or y (lack of a minor property is a > valid value). This is indeed true for both range-matching and series-generation as the spec is currently written. > In general we have four problems with current specification and > implementation on the Perl 6 and Perl 5 sides: > > 1) Perl 5 and Rakudo have a fundamental difference of opinion about what > some ranges produce ("A" .. "z", "X" .. "T", etc) and yet we've never really > articulated why we want that. > > 2) We deny that a range whose LHS is "larger" than its RHS makes sense, but > we also don't provide an easy way to construct such ranges lazily otherwise. > This would be annoying only, but then we have declared that ranges are the > right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...} > which is not lazy (blows up your machine) and feels awfully clunky next to > for 1e10..1 -> $i {...} which would not blow up your machine, or even make > it break a sweat, if it worked) With ranges, we want C< when $LHS .. $RHS" > to always mean C<< if $LHS <= $_ <= $RHS >>. If $RHS < $LHS, then the range being specified is not valid. In this context, it makes perfect sense to me why it doesn't generate anything. With series, we want C< $LHS ... $RHS > to generate a list of items starting with $LHS and ending with $RHS. 
If $RHS > $LHS, we want it to increment one step at a time; if $RHS < $LHS, we want it to decrement one step at a time. So: 1) we want different behavior from the Range operator in Perl 6 vs. Perl 5 because we have completely re-envisioned the range operator. What we have replaced it with is fundamentally more flexible, though not necessarily perfect. > 3) We've never had a clear-cut goal in allowing string ranges (as opposed to > character ranges, which Perl 5 and 6 both muddy a bit), so "intuitive" > becomes sketchy at best past the first grapheme, and ever muddier when only > considering codepoints (thus that wing of my proposal and current behavior > are on much shakier ground, except in so far as it asserts that we might > want to think about it more). I think that one notion that we're dealing with here is the idea that C<< $X < $X.succ >> for all strings. This seems to be a rather intuitive assumption to make; but it is apparently not an assumption that Stringy.succ makes. As I understand it, "Z".succ eqv "AA". What benefit do we gain from this behavior? Is it the idea that eventually this will iterate over every possible combination of capital letters? If so, why is that a desirable goal? My own gut instinct would be to define the string iterator such that it increments the final letter in the string until it gets to "Z"; then it resets that character to "A" and increments the next character by one: "ABE", "ABF", "ABG" ... "ABZ", "ACA", "ACB" ... "ZZZ" This pattern ensures that for any two strings in the series, the first one will be less than its successor. It does not ensure that every possible string between "ABE" and "ZZZ" will be represented; far from it. But then, 1...9 doesn't produce every number between 1 and 9; it only produces integers. 
Taken to an extreme: pi falls between 1 and 9; but no one in his right mind expects us to come up with a general sequencing of numbers that increments from 1 to 9 with a guarantee that it will hit pi before reaching 9. Mind you, I know that the above is full of holes. In particular, it works well when you limit yourself to strings composed of capital letters; do anything fancier than that, and it falls on its face. > 4) Many ranges involving single characters on LHS and RHS result in null > or infinite output, which is deeply non-intuitive to me, and I expect many > others. Again, the distinction between range-matching and series-generation comes to the rescue. > Solve those (and I tried in my suggestion) and I think you will be able to > apply intuition to character ranges, but only in so far as a human being is > likely to be able to intuit anything related to Unicode. Of the points that you raise, #1, 2, and 4 are neatly solved already. I'm unsure as to #3; so I'd recommend focusing some scrutiny on it. > The current behaviour of the range operator is (if I recall correctly): >> 1) if both sides are single characters, make a range by incrementing >> codepoints >> > > Sadly, you can't do that reasonabl
Re: Suggested magic for "a" .. "b"
Side note: you could get around some of the problems, below, but in order to do so, you would have to exhaustively express all of Unicode using the Str builtin module's RANGES constant. In fact, as it is now, it defines ASCII lowercase, but doesn't define Latin lowercase. Presumably because doing so would be a massive pain. Again, I'll point out that using script and properties is much easier On Tue, Jul 20, 2010 at 10:35 PM, Solomon Foster wrote: > > Sorry, didn't mean to imply the series operator was perfect. (Though > it is surprisingly awesome in general, IMO.) Just that the right > questions would be about the series operator rather than Ranges. > So, what's the intention of the range operator, then? Is it just there to offer backward compatibility with Perl 5? Is it a vestige that should be removed so that we can Huffman ... down to ..? I'm not trying to be difficult, here, I just never knew that ... could operate on a single item as LHS, and if it can, then .. seems to be obsolete and holding some prime operator real estate. > > The questions definitely look different that way: for example, > ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz is easily and > clearly expressed as > >'A' ... 'Z', 'a' ... 'z' # don't think this works in Rakudo yet :( > I still contend that this is so frequently desirable that it should have a simpler form, but it's still going to have problems. One example: for expressing "Katakana letters" (I use "letters" in the Unicode sense, here) it's still dicey. There are things interspersed in the Unicode sequence for Katakana that aren't the same thing at all. Unicode calls them lowercase, but that's not quite right. They're smaller versions of Katakana characters which are used more as punctuation or accents than as syllabic glyphs the way the rest of Katakana is. I guess you could write: ア, イ, ウ, エ, オ, カ ... ヂ,ツ ...モ,ヤ, ユ, ヨ ... ロ, ワ ... ヴ (add quotes to taste) But that seems quite a bit more painful than: ア .. ヴ (or ... 
if you prefer) Similar problems exist for many scripts (including some of Latin, we're just used to the parts that are odd), though I think it's possible that Katakana may be the worst because of the mis-use of Ll to indicate a letter when the truth of the matter is far more complicated. > That suggests to me that the current behavior of 'A' ... 'z' is pretty > reasonable. > You still have to decide to make at least some allowances for invalid codepoints and I think you should avoid ever generating a combining or modifying codepoint in such a sequence (e.g. "Ѻ" ... "Ҋ" in Cyrillic which contains several combining characters for currency and counting as well as one undefined codepoint). -- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs
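The property-based filtering argued for in this message can be sketched roughly as follows. This is a minimal Python model, not Rakudo code: `category_range` is a hypothetical name, it filters on the Unicode general category only, and it leaves out the script restriction because Python's standard `unicodedata` module does not expose the Script property.

```python
import unicodedata


def category_range(lhs, rhs):
    """Codepoints between lhs and rhs (inclusive) whose general category
    matches the category of either endpoint.

    Assumption: only the general category is checked; the script
    restriction from the proposal is left out of this sketch.
    """
    if len(lhs) != 1 or len(rhs) != 1:
        raise ValueError("both endpoints must be single codepoints")
    wanted = {unicodedata.category(lhs), unicodedata.category(rhs)}
    lo, hi = sorted((ord(lhs), ord(rhs)))
    chars = [chr(cp) for cp in range(lo, hi + 1)
             if unicodedata.category(chr(cp)) in wanted]
    # The order of the endpoints decides the direction of the result.
    return chars if ord(lhs) <= ord(rhs) else chars[::-1]


print("".join(category_range("(", "}")))
print("".join(category_range("A", "z")))
```

Because unassigned codepoints report category Cn and combining marks report Mn or Me, they drop out of the result whenever the endpoints themselves are ordinary letters or punctuation.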
Re: Suggested magic for "a" .. "b"
On Tue, Jul 20, 2010 at 10:00 PM, Jon Lang wrote: > Solomon Foster wrote: >> Ranges haven't been intended to be the "right way" to construct basic >> loops for some time now. That's what the "..." series operator is >> for. >> >> for 1e10 ... 1 -> $i { >> # whatever >> } >> >> is lazy by the spec, and in fact is lazy and fully functional in >> Rakudo. (Errr... okay, actually it just seg faulted after hitting >> 968746 in the countdown. But that's a Rakudo bug unrelated to >> this, I'm pretty sure.) > > You took the words out of my mouth. > >> All the magic that one wants for handling loop indices -- going >> backwards, skipping numbers, geometric series, and more -- is present >> in the series operator. Range is not supposed to do any of that stuff >> other than the most basic forward sequence. > > Here, though, I'm not so sure: I'd like to see how many of Aaron's > issues remain unresolved once he reframes them in terms of the series > operator. Sorry, didn't mean to imply the series operator was perfect. (Though it is surprisingly awesome in general, IMO.) Just that the right questions would be about the series operator rather than Ranges. The questions definitely look different that way: for example, ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz is easily and clearly expressed as 'A' ... 'Z', 'a' ... 'z' # don't think this works in Rakudo yet :( That suggests to me that the current behavior of 'A' ... 'z' is pretty reasonable. -- Solomon Foster: colo...@gmail.com HarmonyWare, Inc: http://www.harmonyware.com
Re: Suggested magic for "a" .. "b"
Solomon Foster wrote: > Ranges haven't been intended to be the "right way" to construct basic > loops for some time now. That's what the "..." series operator is > for. > > for 1e10 ... 1 -> $i { > # whatever > } > > is lazy by the spec, and in fact is lazy and fully functional in > Rakudo. (Errr... okay, actually it just seg faulted after hitting > 968746 in the countdown. But that's a Rakudo bug unrelated to > this, I'm pretty sure.) You took the words out of my mouth. > All the magic that one wants for handling loop indices -- going > backwards, skipping numbers, geometric series, and more -- is present > in the series operator. Range is not supposed to do any of that stuff > other than the most basic forward sequence. Here, though, I'm not so sure: I'd like to see how many of Aaron's issues remain unresolved once he reframes them in terms of the series operator. -- Jonathan "Dataweaver" Lang
Re: Suggested magic for "a" .. "b"
On Tue, Jul 20, 2010 at 7:31 PM, Aaron Sherman wrote: > 2) We deny that a range whose LHS is "larger" than its RHS makes sense, but > we also don't provide an easy way to construct such ranges lazily otherwise. > This would be annoying only, but then we have declared that ranges are the > right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...} > which is not lazy (blows up your machine) and feels awfully clunky next to > for 1e10..1 -> $i {...} which would not blow up your machine, or even make > it break a sweat, if it worked) Ranges haven't been intended to be the "right way" to construct basic loops for some time now. That's what the "..." series operator is for. for 1e10 ... 1 -> $i { # whatever } is lazy by the spec, and in fact is lazy and fully functional in Rakudo. (Errr... okay, actually it just seg faulted after hitting 968746 in the countdown. But that's a Rakudo bug unrelated to this, I'm pretty sure.) All the magic that one wants for handling loop indices -- going backwards, skipping numbers, geometric series, and more -- is present in the series operator. Range is not supposed to do any of that stuff other than the most basic forward sequence. -- Solomon Foster: colo...@gmail.com HarmonyWare, Inc: http://www.harmonyware.com
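As a rough illustration of the laziness point, here is a Python generator that steps from one endpoint toward the other without building the list first. The name `series` is hypothetical, and the sketch only models the integer case described above, not the full series operator.

```python
from itertools import islice


def series(start, end):
    """Lazily yield integers from start to end inclusive, stepping toward end."""
    step = 1 if end >= start else -1
    value = start
    while (value <= end) if step > 0 else (value >= end):
        yield value
        value += step


# Only the three requested values are ever computed.
print(list(islice(series(10**10, 1), 3)))
```

Nothing is materialized up front, so a countdown from 1e10 costs no more memory than a countdown from 10.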
Re: Suggested magic for "a" .. "b"
This is a long reply, but I read it over a few times, and I don't see any fat to trim. This isn't really a simple issue for which intuition is going to be a sufficient guide, though I agree fully that it needs to be high on or at the top of the list. On Sun, Jul 18, 2010 at 6:26 AM, Moritz Lenz wrote: > In general, stuffing more complex behaviour into something that feels > unintuitive is rarely (if ever) a good solution. Walk with me a bit, and let's explore the concept of intuitive character ranges? This was my suggestion, which seems pretty basic to me: "x .. y", for all strings x and y, which are composed of a single, valid codepoint which is neither combining nor modifying, yields the range of all valid, non-combining/modifying codepoints between x and y, inclusive which share the Unicode script, general category major property and general category minor property of either x or y (lack of a minor property is a valid value). In general we have four problems with current specification and implementation on the Perl 6 and Perl 5 sides: 1) Perl 5 and Rakudo have a fundamental difference of opinion about what some ranges produce ("A" .. "z", "X" .. "T", etc) and yet we've never really articulated why we want that. 2) We deny that a range whose LHS is "larger" than its RHS makes sense, but we also don't provide an easy way to construct such ranges lazily otherwise. This would be annoying only, but then we have declared that ranges are the right way to construct basic loops (e.g. 
for (1..1e10).reverse -> $i {...} which is not lazy (blows up your machine) and feels awfully clunky next to for 1e10..1 -> $i {...} which would not blow up your machine, or even make it break a sweat, if it worked) 3) We've never had a clear-cut goal in allowing string ranges (as opposed to character ranges, which Perl 5 and 6 both muddy a bit), so "intuitive" becomes sketchy at best past the first grapheme, and ever muddier when only considering codepoints (thus that wing of my proposal and current behavior are on much shakier ground, except in so far as it asserts that we might want to think about it more). 4) Many ranges involving single characters on LHS and RHS result in null or infinite output, which is deeply non-intuitive to me, and I expect many others. Solve those (and I tried in my suggestion) and I think you will be able to apply intuition to character ranges, but only in so far as a human being is likely to be able to intuit anything related to Unicode. The current behaviour of the range operator is (if I recall correctly): > > 1) if both sides are single characters, make a range by incrementing > codepoints > Sadly, you can't do that reasonably. Here are some examples of why, using only Latin and Greek as examples (not the most convoluted Unicode sections to be sure): - "Α" (capital Greek alpha, not Latin A) .. "Ω" would result in a range that contains an invalid codepoint (rakudo: drops the invalid codepoint, which you may have meant to imply, but I'm being pedantic because I want to come to a specification, not just a sense of the right solution) - "Ā" .. "Ē" would be "ĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒ" which is really not what you're likely to expect! (rakudo: Ā, infinitely repeating, which is an even larger problem for Katakana, where "オ" .. "ヺ" seems a very intuitive way to say "all Katakana non-cased letters" but fails because the range contains both cased and uncased; Perl 5 just prints "オ", and I think it also sneers at you) - "A" .. 
"z" comes out really odd because it contains punctuation (mind you, your suggestion is saner than Rakudo's current behavior on "A" .. "z" which is an infinite progression of capital-letter-only sequences of 1 or more characters! Intuitive, it's not.) My point was that, if you want simple and intuitive out of Unicode, you're kind of screwed. The closest you can get is to build your range using properties and script. The way I suggested doing that was the simplest I could think of. Speak up if you have a simpler one. For most simple ranges, our results will be identical (e.g "A" .. "Q"). For the above examples, I would end up producing: 1: Alpha through Omega greek capital letters 2: ĀĂĄĆĈĊČĎĐĒ (and オカガキギクグケゲコゴサザシジスズセゼソゾタダチヂツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモヤユヨラリルレロワヰヱヲンヴヷヸヹヺ for the Katakana) 3: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz That seems pretty darned intuitive to me. Mind you, "A" .. "ž" is still ugly as sin in terms of ordering once you listify, and I can't reasonably fix that without re-defining Unicode or having a really, really convoluted and special-case rule, but without getting convoluted, even that ugly example does something useful and, I dare say, intuitive for testing membership. Here's the pseudo-code for my suggestion: class SingleCharAlphaRange { has $.start; has $.end; # Verify that this is a single character string which is valid # and non-combining/non-modifying and repre
Re: Suggested magic for "a" .. "b"
Ruud H.G. van Tol wrote: > Aaron Sherman wrote: > >> Having established this range for each correspondingly indexed letter, the >> range for multi-character strings is defined by a left-significant counting >> sequence. For example: >> >> "Ab" .. "Be" >> >> defines the ranges: >> >> and >> >> This results in a counting sequence (with the most significant character on >> the left) as follows: >> >> > > glob can do that: > > perl5.8.5 -wle 'print for <{A,B}{c,d,e}>' Or Perl 6, for that matter :-) > .say for <A B> X~ ('a' .. 'e') Aa Ab Ac Ad Ae Ba Bb Bc Bd Be In general, stuffing more complex behaviour into something that feels unintuitive is rarely (if ever) a good solution. The current behaviour of the range operator is (if I recall correctly): 1) if both sides are single characters, make a range by incrementing codepoints 2) otherwise, call .succ on the LHS. Stop before the generated values exceed the RHS. I'm not convinced it should be any more complicated than that. Remember that with the series operator you can easily define your own incrementation rules, and with meta operators (like the cross meta operator demonstrated above) it's also easy to combine different series and lists. Cheers, Moritz
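The two rules recalled above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not Rakudo's implementation: `succ` handles ASCII letters and digits only, in the style of Perl's magic string increment, and "exceeds the RHS" is read as plain codepoint-wise string comparison. Both function names are hypothetical.

```python
def succ(s):
    """Increment an ASCII alphanumeric string with carry,
    e.g. 'Az' -> 'Ba' and 'Z' -> 'AA' (assumed Perl-style behaviour)."""
    if not s:
        raise ValueError("cannot increment an empty string")
    chars = list(s)
    i = len(chars) - 1
    while i >= 0:
        c = chars[i]
        if c == "z":
            chars[i] = "a"
        elif c == "Z":
            chars[i] = "A"
        elif c == "9":
            chars[i] = "0"
        else:
            chars[i] = chr(ord(c) + 1)
            return "".join(chars)
        i -= 1
    # Every position wrapped around, so the string grows on the left.
    first = s[0]
    prefix = "1" if first == "9" else ("a" if first == "z" else "A")
    return prefix + "".join(chars)


def string_range(lhs, rhs):
    """Lazily generate the range described by rules 1 and 2."""
    if len(lhs) == 1 and len(rhs) == 1:
        # Rule 1: single characters step through codepoints.
        for cp in range(ord(lhs), ord(rhs) + 1):
            yield chr(cp)
        return
    # Rule 2: repeated succ, stopping before a value exceeds the RHS.
    value = lhs
    while value <= rhs:
        yield value
        value = succ(value)


print(list(string_range("Ab", "Be")))
```

Under these assumptions "Ab" .. "Be" gives the 30-element list quoted elsewhere in the thread. Some endpoint pairs never terminate under rule 2, which is why the sketch is a generator rather than a function returning a list.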
Re: Suggested magic for "a" .. "b"
Aaron Sherman wrote: Having established this range for each correspondingly indexed letter, the range for multi-character strings is defined by a left-significant counting sequence. For example: "Ab" .. "Be" defines the ranges: and This results in a counting sequence (with the most significant character on the left) as follows: glob can do that: perl5.8.5 -wle 'print for <{A,B}{c,d,e}>' Ac Ad Ae Bc Bd Be Currently, Rakudo produces this: "Ab", "Ac", "Ad", "Ae", "Af", "Ag", "Ah", "Ai", "Aj", "Ak", "Al", "Am", "An", "Ao", "Ap", "Aq", "Ar", "As", "At", "Au", "Av", "Aw", "Ax", "Ay", "Az", "Ba", "Bb", "Bc", "Bd", "Be" which I don't think is terribly useful. Good enough for me. For your variant, just override the .. for 'smarter' behavior? -- Ruud
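For comparison, the same brace-expansion idea can be written as a Cartesian product. The Python sketch below uses `itertools.product`; `cross` is a hypothetical helper name, and each argument lists the alternatives for one character position.

```python
from itertools import product


def cross(*alternatives):
    """Join one choice from each position, leftmost position most significant."""
    return ["".join(combo) for combo in product(*alternatives)]


print(cross("AB", "cde"))
```

The rightmost position varies fastest, which matches the order glob prints.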
Re: Suggested magic for "a" .. "b"
On Fri, Jul 16, 2010 at 3:49 PM, Carl Mäsak wrote: > Aaron (>): > > [...] > > > > Many useful results from this suggested change: > > > > "C" .. "A" = (Rakudo: <>) > > Regardless of the other traits of your proposed semantics, I think > permitting reversed ranges such as the one above would be a mistake. > Why are you calling that a "reversed range"? It's not reversed; it's a range like any other. The ordering of the terminator elements is only interesting if you start pulling elements out. As a range, ordering isn't really significant. > Rakudo gives the empty list for ranges whose lhs exceeds (fsvo > "exceeds") its rhs, because that's the way ranges work in Perl. The > reason ranges work that way in Perl (in my understanding) is that it's > the less surprising behavior when the endpoints are determined at > runtime. > In Perl 5, if that's what you mean, "C" .. "A" produces the letters from C to Z. I have no rational explanation for why, but I suggest we avoid emulating this behavior in Perl 6. > For explicitly specifying a reverse list of characters, there's still > `reverse "A" .. "C"`, which is not only a straightforward idiom and > huffmanized about right, but also good documentation for the reader. > reverse("A" .. "C") is not the same as "C" .. "A". Observe: $ ./perl6 -e 'say reverse("A" .. "C").perl' ["C", "B", "A"] $ ./perl6 -e 'say ("A" .. "C").perl' "A".."C" In order for reverse to work lazily, it would have to add a wrapper to the iterator that asked for its last element first, and it's not clear to me that one CAN ask for an iterator's last element without unrolling it. For single characters, that's not TOO bad, but for strings.elems > 1 you could blow out your RAM on even fairly trivial strings. -- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs
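The difference between reversing something that knows its endpoints and reversing an opaque iterator can be seen in Python, used here purely as an analogy for the concern raised above. Nothing in this sketch describes how Rakudo implements reverse.

```python
from itertools import islice

big = range(1, 10**10 + 1)

# A range object knows both endpoints and its step, so reversing it
# needs no unrolling: only the requested elements are produced.
first_three = list(islice(reversed(big), 3))


def generated():
    """An opaque iterator: its last element is unknown until it is exhausted."""
    yield from "ABC"


try:
    reversed(generated())
    reversible = True
except TypeError:
    # Such an iterator would have to be materialized before reversing.
    reversible = False

print(first_three, reversible)
```

A range of multi-character strings built from a successor function behaves like the second case unless the range type can compute predecessors from its right-hand endpoint.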
Re: Suggested magic for "a" .. "b"
On Fri, Jul 16, 2010 at 9:40 PM, Michael Zedeler wrote: > > What started it all was the intention to extend the operator, making it > possible to evaluate it in list context. Doing so has opened Pandora's box, > because many (most? all?) solutions are inconsistent with the rule of least > surprise. > I don't think there's any coherent expectation, and therefore no potential to avoid surprise. Returning comic books might be more of a surprise, but as long as you're returning a string which appears to be "in the range" expressed, then I don't see surprise as the problem. > > For instance, when considering strings, writing up an expression like > > 'goat' ~~ 'cow' .. 'zebra' > > This makes sense in most cases, because goat is lexicographically between > cow and zebra. This presumes that we're treating a string as a "number" in base x (where x, I guess, would be the number of code points which share ... what, any of the general category properties of the components of the input strings?). That begins to get horrendously messy very, very fast: say "1aB" .. "aB1" > I'd suggest that if you want to evaluate a Range in list context, you may > have to provide a hint to the Range generator, telling it how to generate > subsequent values. Your suggestion that the expansion of 'Ab' .. 'Be' > should yield is just an example of a different > generator (you could call it a different implementation of ++ on Str types). > It does look useful, but by realizing that it probably is, we have two > candidates for how Ranges should evaluate in list context. > I think the solution here is to evaluate what's practical in the general case. Your examples are, I think, misleading because they involve English words and we naturally leap to "sure, that one's in the dictionary between the other two." However, let me pose this dictionary lookup for you: "cliché" ~~ "aphorism" .. "truth" Now, you see where this is going? What happens when we throw in some punctuation? "father-in-law" ~~ "dad" .. 
"stranger" The problem is that you have a complex heuristic in mind for determining membership, and a very simple operator for expressing the set. Worse, I haven't even gotten into dealing with Unicode, where it's entirely reasonable to write "TOPIXコンポジット1500構成銘柄", which I shamelessly grabbed from a Tokyo Stock Exchange page. That one string, used in everyday text, contains Latin letters, Katakana, Han (Kanji) ideograms and Latin digits. Meanwhile, back to ".." ... the range operator. The most useful application that I can think of for strings of length > 1 is for generating unique strings such as for mktemp. Beyond that, its application is actually quite limited, because the rules for any other sort of string that might make sense to a human are absurdly complex. As such, I think it suffices to say that, for the most part, ".." makes sense for single-character strings, and to expand from there, rather than trying to introduce anything more complex. -- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs
Re: Suggested magic for "a" .. "b"
On Fri, Jul 16, 2010 at 1:14 PM, yary wrote: > There is one case where Rakudo's current output makes more sense than > your proposal, and that's when the sequence is analogous to a range of > numbers in another base, and you don't want to start at the equivalent > of '' or end up at the equivalent of ''. If you want a range of numbers, you should be using numbers. Perl should absolutely not try to guess that you want codepoints to appear in your result set which were neither expressed in the input nor fall between two corresponding input codepoints. > But that's a less > usual case and there's a workaround. Using your method & example, "Ab" > .. "Az", "Ba" .. "Be" would reproduce what Rakudo does now. > Quite true. > > In general, I like it. Though it does mean that the sequence generated > incrementing "Ab" repeatedly will diverge from "Ab" .. "Be" after 4 > iterations. > Also true, and I think that's a correct thing. -- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs
Re: Suggested magic for "a" .. "b"
Aaron Sherman wrote: > Oh bother, I wrote this up last night, but forgot to send it. Here y'all > go: > > I've been testing ".." recently, and it seems, in Rakudo, to behave like > Perl 5. That is, the magic auto-increment for "a" .. "z" works very > wonkily, > given any range that isn't within some very strict definitions (identical > Unicode general category, increasing, etc.) So the following: > > "A" .. "z" > > produces very odd results. Bear in mind that ".." is no longer supposed to be used to generate lists; for that, you should use "...". That said, that doesn't address the issues you're raising; it merely spreads them out over two operators (".." when doing pattern matching, and "..." when doing list generation). Your restrictions and algorithms are a good start, IMHO; and at some point when I have the time, energy, and know-how, I'll read through them in detail and comment on them. In the meantime, though, let me point out a fairly obvious point: sometimes, I want my pattern matching and list generation to be case-sensitive; other times, I don't. More generally, whatever algorithm you decide on should be subject to tweaking by the user to more accurately reflect his desires. So perhaps ".." and "..." should have an adverb that lets you switch case sensitivity on (if the default is "off") or off (if the default is "on"). And if you do this, there should be function forms of ".." and "..." for those of us who have trouble working with the rules for applying adverbs to operators. Likewise with other situations where there might be more than one way to approach things. -- Jonathan "Dataweaver" Lang
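A function form with a case-sensitivity switch, as suggested above, might look like the following Python sketch. The name `in_range` and the `ignore_case` keyword are hypothetical stand-ins for whatever adverb the operators would eventually take, and membership is decided by plain codepoint comparison.

```python
def in_range(value, lo, hi, ignore_case=False):
    """Range membership test with an optional case-insensitive mode."""
    if ignore_case:
        # casefold() is Python's aggressive lowercasing, used for caseless matching.
        value, lo, hi = (s.casefold() for s in (value, lo, hi))
    return lo <= value <= hi


print(in_range("b", "A", "Z"))
print(in_range("b", "A", "Z", ignore_case=True))
```

Keeping the switch as an explicit argument puts the choice at the call site, which is in the spirit of the lexically scoped tweaking asked for here.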
Re: Suggested magic for "a" .. "b"
On 2010-07-16 18:40, Aaron Sherman wrote: Oh bother, I wrote this up last night, but forgot to send it. Here y'all go: I've been testing ".." recently, and it seems, in Rakudo, to behave like Perl 5. That is, the magic auto-increment for "a" .. "z" works very wonkily, given any range that isn't within some very strict definitions (identical Unicode general category, increasing, etc.) So the following: "A" .. "z" produces very odd results. I'd like to suggest that we re-define this operator on strings as follows: [cut] "Ab" .. "Be" defines the ranges: and This results in a counting sequence (with the most significant character on the left) as follows: Currently, Rakudo produces this: "Ab", "Ac", "Ad", "Ae", "Af", "Ag", "Ah", "Ai", "Aj", "Ak", "Al", "Am", "An", "Ao", "Ap", "Aq", "Ar", "As", "At", "Au", "Av", "Aw", "Ax", "Ay", "Az", "Ba", "Bb", "Bc", "Bd", "Be" which I don't think is terribly useful. I have discussed the Range operator before on this list, and since it often becomes the topic of discussion, something must be wrong with it. What started it all was the intention to extend the operator, making it possible to evaluate it in list context. Doing so has opened Pandora's box, because many (most? all?) solutions are inconsistent with the rule of least surprise. For instance, when considering strings, take an expression like 'goat' ~~ 'cow' .. 'zebra'. This makes sense in most cases, because goat is lexicographically between cow and zebra. So we have a nice ordering of strings that even extends to strings of any length (note that the three words used in my example are 3, 4 and 5 letters). As you can see, we even have a Range operator in there, so everything should be fine. What breaks everything is that we expect the Range operator to be able to generate all values between the two provided endpoints. Everything goes downhill from there. 
With regard to strings, lexicographical ordering is the only prevailing ordering we provide the developer with (apart from length, which doesn't provide the strict ordering that is needed). So anyone using the Range operator would assume that when lexicographical ordering is used for the Range membership test, it is also used for generation of its members, naturally leading to the infinite sequence cow cowa cowaa cowaaa ... cowb cowba cowbaa. For some reason (even though Perl6 supports infinite lists) we are currently using a completely new construct: the domain of strings limited to the length of the longest operand. This is counterintuitive since 'cowbell' ~~ 'cow' .. 'zebra' but 'cow' .. 'zebra' does not produce 'cowbell' in list context. The same story applies to other types that come with a natural ordering, but have an over countable domain. Although the solutions differ, the main problem is the same: they all behave counterintuitively. 5.0001 ~~ 1.1 .. 10.1 but 1.1 .. 10.1 does not (and really shouldn't!) produce 5.0001 in list context. I'd suggest that if you want to evaluate a Range in list context, you may have to provide a hint to the Range generator, telling it how to generate subsequent values. Your suggestion that the expansion of 'Ab' .. 'Be' should yield is just an example of a different generator (you could call it a different implementation of ++ on Str types). It does look useful, but by realizing that it probably is, we have two candidates for how Ranges should evaluate in list context. The same applies to Numeric types. My suggestion is to eliminate the succ method on Rat, Complex, Real and Str and point people in the direction of the series operator if they need to generate sequences of things that are over countable. Regards, Michael.
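The mismatch between membership and generation described in this message is easy to reproduce. The Python sketch below uses exact fractions to avoid floating-point noise; `in_range` and `step_by_one` are hypothetical names, and stepping by 1 is an assumed generator rule standing in for whatever succ does.

```python
from fractions import Fraction


def in_range(value, lo, hi):
    """Membership: is value between the endpoints under the natural ordering?"""
    return lo <= value <= hi


def step_by_one(lo, hi):
    """Generation: start at lo and add 1 until hi is passed."""
    value = lo
    while value <= hi:
        yield value
        value += 1


lo, hi, x = Fraction("1.1"), Fraction("10.1"), Fraction("5.0001")
members = list(step_by_one(lo, hi))

print(in_range(x, lo, hi), x in members)
print(in_range("cowbell", "cow", "zebra"))
```

The first line shows a value that passes the membership test yet never appears among the generated members, which is the gap a generator hint would have to address.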
Re: Suggested magic for "a" .. "b"
Aaron (>): > [...] > > Many useful results from this suggested change: > > "C" .. "A" = (Rakudo: <>) Regardless of the other traits of your proposed semantics, I think permitting reversed ranges such as the one above would be a mistake. Rakudo gives the empty list for ranges whose lhs exceeds (fsvo "exceeds") its rhs, because that's the way ranges work in Perl. The reason ranges work that way in Perl (in my understanding) is that it's the less surprising behavior when the endpoints are determined at runtime. For explicitly specifying a reverse list of characters, there's still `reverse "A" .. "C"`, which is not only a straightforward idiom and huffmanized about right, but also good documentation for the reader. // Carl
Re: Suggested magic for "a" .. "b"
On Fri, Jul 16, 2010 at 9:40 AM, Aaron Sherman wrote: > For example: > > "Ab" .. "Be" > > defines the ranges: > > and > > This results in a counting sequence (with the most significant character on > the left) as follows: > > > > Currently, Rakudo produces this: > > "Ab", "Ac", "Ad", "Ae", "Af", "Ag", "Ah", "Ai", "Aj", "Ak", "Al", "Am", > "An", "Ao", "Ap", "Aq", "Ar", "As", "At", "Au", "Av", "Aw", "Ax", "Ay", > "Az", "Ba", "Bb", "Bc", "Bd", "Be" There is one case where Rakudo's current output makes more sense than your proposal, and that's when the sequence is analogous to a range of numbers in another base, and you don't want to start at the equivalent of '' or end up at the equivalent of ''. But that's a less usual case and there's a workaround. Using your method & example, "Ab" .. "Az", "Ba" .. "Be" would reproduce what Rakudo does now. In general, I like it. Though it does mean that the sequence generated incrementing "Ab" repeatedly will diverge from "Ab" .. "Be" after 4 iterations. -y
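The workaround and the divergence mentioned here can be checked with a small Python model of the proposed counting sequence. This is a simplified sketch: `counting_range` is a hypothetical name, and each character position is expanded by plain codepoint order with none of the category or script filtering from the proposal.

```python
from itertools import product


def counting_range(lhs, rhs):
    """Left-significant counting sequence over per-position character ranges."""
    n = min(len(lhs), len(rhs))  # the "significant length"
    columns = []
    for a, b in zip(lhs[:n], rhs[:n]):
        lo, hi = sorted((ord(a), ord(b)))
        column = [chr(cp) for cp in range(lo, hi + 1)]
        # Each position runs in the direction given by its endpoints.
        columns.append(column if a <= b else column[::-1])
    return ["".join(combo) for combo in product(*columns)]


proposal = counting_range("Ab", "Be")
workaround = counting_range("Ab", "Az") + counting_range("Ba", "Be")

print(proposal)
print(len(workaround), workaround[:6])
```

Under these assumptions the two lists agree for the first four elements and part ways at the fifth, and the concatenated workaround has the same 30 elements as the Rakudo output quoted above.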
Suggested magic for "a" .. "b"
Oh bother, I wrote this up last night, but forgot to send it. Here y'all go: I've been testing ".." recently, and it seems, in Rakudo, to behave like Perl 5. That is, the magic auto-increment for "a" .. "z" works very wonkily, given any range that isn't within some very strict definitions (identical Unicode general category, increasing, etc.) So the following: "A" .. "z" produces very odd results. I'd like to suggest that we re-define this operator on strings as follows: RESTRICTIONS: First off, if either argument contains combining, modifying, undefined, reserved or other codepoints which either cannot be treated as a single, independent "character" or whose Unicode properties are not firmly established in the Unicode specification, then an exception is immediately raised. This must be done in order to assure that each character index can be compared to each corresponding character index without the typical Unicode ambiguities. Ligatures and other decomposable sequences are treated by their codepoint in the current encoding, only. Treatment of strings whose encodings differ should be possible, as all comparisons are performed on codepoints. If either argument is zero length, an exception is raised. If either one argument is *, then it is assumed to stand for the largest (RHS) or smallest (LHS) codepoint with the same Unicode general properties as the opposite side (for each character index, if the other value is a string of length > 1). ALGORITHM: If both arguments are strings of non-zero length, ".." will first determine which is the shorter. This length is the "significant length". Any characters after this length in the longer sequence are ignored (return value might be an unthrown exception in this case?) 
For all remaining characters, each character is considered with respect to its correspondingly indexed character in the other string, and the following algorithm is applied to determine the range that they represent (the LHS character is referred to as "A" below, and the RHS as "B"). The binary Unicode general category properties of A and B are considered from the set of major category classes: L, M, N, P, S, Z, C. Thus the Lu property or Pe property would be considered. The total range consists of all codepoints lying between the lower of the two codepoints and the higher of the two, inclusive, which share the major and minor Unicode general category property of either A or B (if there is no minor subclass, then codepoints without a minor subclass are considered with respect to that endpoint). The ordering is determined by the ordering of A and B. The range is then restricted to codepoints which share the same script as A or B. Thus, Latin "a" and Greek lowercase pi would define a range which included all lower-case letters from the Latin and Greek scripts that fell between their codepoints. Having established this range for each correspondingly indexed letter, the range for multi-character strings is defined by a left-significant counting sequence. For example: "Ab" .. "Be" defines the ranges: <A B> and <b c d e> This results in a counting sequence (with the most significant character on the left) as follows: <Ab Ac Ad Ae Bb Bc Bd Be> Currently, Rakudo produces this: "Ab", "Ac", "Ad", "Ae", "Af", "Ag", "Ah", "Ai", "Aj", "Ak", "Al", "Am", "An", "Ao", "Ap", "Aq", "Ar", "As", "At", "Au", "Av", "Aw", "Ax", "Ay", "Az", "Ba", "Bb", "Bc", "Bd", "Be" which I don't think is terribly useful. Many useful results from this suggested change: "C" .. "A" = <C B A> (Rakudo: <>) "(" .. "}" = <( ) [ ] { }> (because open-paren is Ps and close-brace is Pe, therefore all Ps and Pe codepoints in the range are included). "Α" .. 
"Ω" = <Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω> (notice that codepoint U+03A2 is gracefully skipped, as it is undefined and thus has no properties). "apple" .. "orange" = the counting sequence defined by the ranges "a" .. "o", "p" .. "r", "p" .. "a", "l" .. "n", "e" .. "g" (notice that the string "orang" will be part of the result set, but "orange" will not.) In addition: One alternative to truncation of strings of differing lengths is to extend the sequence. For example, if we ask for "a" .. "bc", then we might produce . Where the extension is the original range plus the same range where each element has the extended string elements concatenated. This might even be iterated for every additional codepoint in the longer string. For example: "a" .. "bcd" = "..." could have similar semantics. In the case of A, B ... C, for length 1 strings, the range A .. B is simply projected forward to until x ge C (if A..B is increasing, le otherwise). C's properties probably should not be considered at all. In the case of length > 1 strings each character index is projected forward independently until any one character index ge the corresponding index in the terminator, and there is no "counting": "AAA", "BCD" ... "GGG" = If any index in the sequence does not increment (e.g. "AA", "AB" ... "ZZ") then there is an implic