Re: How are unrecognized options to built-in pod block types treated?
On Wed, Aug 4, 2010 at 10:05 PM, Damian Conway <dam...@conway.org> wrote:

> Darren suggested:
>> Use namespaces.
>
> The upper/lower/mixed approach *is* a namespace approach.

It's a very C-like approach, but yes, it's certainly a crude sort of namespace. Perl already has a more robust and modern namespace system, however. Using it would seem wise.

>> Explicit versioning is your friend.
>> Can I get some support for this?
>
> Not from me. ;-)
>
> I think it's a dreadful prospect to allow people to write documentation that they will have to rewrite when the Pod spec gets updated.

I would hope... really, desperately hope that the POD spec changing would be the least of anyone's worries. If you're writing documentation, it's a foregone conclusion that it has to be maintained, just like any other part of your software. If the POD spec is adding new config options at a rate that isn't several orders of magnitude less than the frequency with which your code changes, then either you're documenting the Magna Carta or we have a problem with our documentation system.

If the latter is the case, then the right solution is to provide new documentation features via modules and allow the user to select which new features they desire, automatically resolving the problem, since old docs simply won't pull in newer features.

This could go both ways, as well. "use v6" might get you the default first-pressing documentation features of Perl 6.0.0, while "use v6.1" might get you the default features of 6.1. Then you could mix it up:

    use v6;
    use Docs::SectionImage;

> Or, alternatively, to require all Pod parsers to be infinitely backwards compatible across all versions. :-(

If you never want documentation to break, then that's your only option. Someday we're going to decide to make an incompatible change to Perl's documentation system, and we'll have a very good reason to do so, I'd imagine. The right thing to do will be to make sure that we roll it out carefully and with all due warning.
-- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs
Re: How are unrecognized options to built-in pod block types treated?
Darren (>):
> Read what I said again. I was proposing that the namespace comprised of names matching a pattern like this:
>
>     /^ <[A..Z]>+ | <[a..z]>+ $/

    /^ [<[A..Z]>+ | <[a..z]>+] $/

// Carl
pattern alternation (was Re: How are ...)
Carl Mäsak wrote:
> Darren (>):
>> Read what I said again. I was proposing that the namespace comprised of names matching a pattern like this:
>>
>>     /^ <[A..Z]>+ | <[a..z]>+ $/
>
>     /^ [<[A..Z]>+ | <[a..z]>+] $/

Are the square brackets necessary when the pattern doesn't contain anything other than the alternatives? I would have thought them optional in the case I mentioned. Rather, they would just be necessary in a case like this:

    /^ foo [<[A..Z]>+ | <[a..z]>+] bar $/

-- Darren Duncan
declaring versions (was Re: How ...)
Damian Conway wrote:
> Darren suggested:
>> Use namespaces.
>
> The upper/lower/mixed approach *is* a namespace approach.

Yes it is. But I thought that prefix-namespaces would scale better. Especially if the documentation system got complicated enough to involve modules, possibly those by different sources, as some others have suggested.

That said, I'm inclined to think that the likely complexity of the documentation system over time should grow by fewer orders of magnitude than code in general, and so I grant that some ideas can look like over-engineering.

>> Explicit versioning is your friend.
>> Can I get some support for this?
>
> Not from me. ;-)
>
> I think it's a dreadful prospect to allow people to write documentation that they will have to rewrite when the Pod spec gets updated.
>
> Or, alternatively, to require all Pod parsers to be infinitely backwards compatible across all versions. :-(

One main purpose of declaring the intended interpretation context of a work is so that developers of interpreters have a lot more freedom to *not* be backwards-compatible. Each version is effectively a separate language in some ways.

Because a work declares its language version, one should be able to take the work anywhere and it would be completely unambiguous as to how to interpret it, no matter how old it is and how much the state of the art has evolved. If the meaning of a keyword changes in a spec, we know without a doubt which meaning the user intended.

As for backwards compatibility, this is actually less onerous to implement with my proposal than otherwise.

For one thing, if developers want to make an incompatible change, they can release it right away, without a long deprecation or changeover cycle, and in the typical case old works will continue to be interpreted correctly.

For another thing, assuming in the typical case that any time a language evolves, it still provides the means to accomplish anything it was previously capable of, then each implementation needs no backwards-compatibility internally, but just the state of the art. Backwards compatibility can be achieved with version-specific shims over top of this single core, which translate works written to an older spec to their equivalent in the new one. Because versions are explicitly declared, it is trivial to dispatch to the correct interpreter or pseudo-interpreter.

Yet another thing, parsers don't have to be infinitely backwards compatible; they can deprecate support for particular older versions as they choose to, when necessary and reasonable.

So, explicit versioning is actually very good for *future-proofing*.

I believe there are various precedents for this. In the Perl 5 world, for example, see autodie (optional) or perl5i (mandatory).

    use autodie qw(:1.994);
    use perl5i::2;

I have also done this from day one in my Muldis D language, and I have no regrets for doing so.

-- Darren Duncan
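The shim idea above can be sketched roughly as follows. This is only an illustration in Python, not Pod or Perl 6: the "=version" declaration, the keyword renames, and every function name here are invented for the example. It assumes each spec version has a small translator to the next one, and that the core only understands the current version.

```python
CURRENT = 3  # hypothetical: the only version the core interpreter understands

def v1_to_v2(lines):
    # invented change: version 2 renamed the "=head" keyword to "=heading"
    return ["=heading " + l[len("=head "):] if l.startswith("=head ") else l
            for l in lines]

def v2_to_v3(lines):
    # invented change: version 3 renamed the "=item" keyword to "=entry"
    return ["=entry " + l[len("=item "):] if l.startswith("=item ") else l
            for l in lines]

# One shim per supported older version; deleting an entry deprecates that version.
SHIMS = {1: v1_to_v2, 2: v2_to_v3}

def declared_version(lines):
    """Read the explicit version declaration from the first line."""
    if not lines or not lines[0].startswith("=version "):
        raise ValueError("document does not declare a version")
    return int(lines[0].split()[1])

def to_current(text):
    """Translate a document step by step to the current spec version."""
    lines = text.splitlines()
    version = declared_version(lines)
    if version > CURRENT:
        raise ValueError(f"version {version} is newer than this parser")
    body = lines[1:]
    while version < CURRENT:
        if version not in SHIMS:
            raise ValueError(f"version {version} is no longer supported")
        body = SHIMS[version](body)
        version += 1
    return [f"=version {CURRENT}"] + body

upgraded = to_current("=version 1\n=head Title\n=item one")
print("\n".join(upgraded))
```

Because the dispatch key is the declared version, the core never needs to guess which meaning of a keyword was intended; an undeclared or unsupported version fails loudly instead.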
Re: declaring versions (was Re: How ...)
Darren Duncan wrote:
> For another thing, assuming in the typical case that any time a language evolves, it still provides the means to accomplish anything it was previously capable of, then each implementation needs no backwards-compatibility internally, but just the state of the art. Backwards compatibility can be achieved with version-specific shims over top of this single core, which translate works written to an older spec to their equivalent in the new one. Because versions are explicitly declared, it is trivial to dispatch to the correct interpreter or pseudo-interpreter.

As an addendum to this thought ...

If a system is also capable of generating a source work from a parsed version that is effectively the same as the original, it should also be possible for a user to request a source translation from some older understood spec version to a newer/current one.

So they can be assisted in keeping their sources up to date without having to manually keep updating them, in general. Then when support for older formats is deprecated and removed, by that time their source will have been updated so it is still interpretable without manual updates.

Of course, supporting this is optional, but it's useful. Like a Perl 5 to Perl 6 translator, but on a much finer and easier-to-do scale.

-- Darren Duncan
Re: pattern alternation (was Re: How are ...)
On Thu, Aug 05, 2010 at 12:29:38AM -0700, Darren Duncan wrote:
> Carl Mäsak wrote:
>> Darren (>):
>>> Read what I said again. I was proposing that the namespace comprised of names matching a pattern like this:
>>>
>>>     /^ <[A..Z]>+ | <[a..z]>+ $/
>>
>>     /^ [<[A..Z]>+ | <[a..z]>+] $/
>
> Are the square brackets necessary when the pattern doesn't contain anything other than the alternatives?

In this case yes -- the original pattern without the square brackets would act like:

    / [^ <[A..Z]>+] | [<[a..z]>+ $] /

In other words, the original pattern says "starting with uppercase" or "ending with lowercase".

Pm
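The same precedence trap exists in most regex dialects, so it can be tried out outside Perl 6. Here is a minimal sketch using Python's re module as an analogue (Python syntax, not Perl 6 syntax; the sample strings are made up for illustration). Alternation binds more loosely than the anchors, so the ungrouped pattern means "starts with uppercase" or "ends with lowercase".

```python
import re

# Ungrouped: parsed as (^[A-Z]+) | ([a-z]+$)
unbracketed = re.compile(r"^[A-Z]+|[a-z]+$")
# Grouped: both anchors apply to the whole alternation
bracketed = re.compile(r"^(?:[A-Z]+|[a-z]+)$")

def matches(pattern, text):
    """True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None

samples = ["ABC", "abc", "ABCdef", "123abc", "Abc1"]
results = {s: (matches(unbracketed, s), matches(bracketed, s)) for s in samples}

for text, (loose, strict) in results.items():
    print(f"{text!r}: unbracketed={loose} bracketed={strict}")
```

Only the all-uppercase and all-lowercase samples satisfy the grouped pattern; the ungrouped one accepts all five.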
Re: pattern alternation (was Re: How are ...)
Darren (>>>>), Carl (>>>), Darren (>>), Patrick (>):
>>>> Read what I said again. I was proposing that the namespace comprised of names matching a pattern like this:
>>>>
>>>>     /^ <[A..Z]>+ | <[a..z]>+ $/
>>>
>>>     /^ [<[A..Z]>+ | <[a..z]>+] $/
>>
>> Are the square brackets necessary when the pattern doesn't contain anything other than the alternatives?
>
> In this case yes -- the original pattern without the square brackets would act like:
>
>     / [^ <[A..Z]>+] | [<[a..z]>+ $] /
>
> In other words, the original pattern says "starting with uppercase" or "ending with lowercase".

I see this particular thinko a lot, though. Maybe some Perl 6 lint tool or another will detect when you have a regex containing ^ at its start, $ at the end, | somewhere in the middle, and no [] to disambiguate.

// Carl
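A lint check along those lines could look roughly like this. It is a heuristic sketch written against Python regex syntax, where grouping uses parentheses rather than Perl 6's square brackets; the function name is made up, and corner cases such as a literal "]" at the start of a character class or an escaped trailing "$" are deliberately not handled.

```python
def has_ambiguous_anchors(pattern: str) -> bool:
    """Flag patterns of the form ^...|...$ with the | outside any group."""
    if not (pattern.startswith("^") and pattern.endswith("$")):
        return False
    depth = 0          # nesting depth of ( ... ) groups
    in_class = False   # inside a [ ... ] character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2     # skip the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            return True  # top-level alternation between the anchors
        i += 1
    return False

for candidate in [r"^[A-Z]+|[a-z]+$", r"^(?:[A-Z]+|[a-z]+)$", r"^[a|b]+$"]:
    print(candidate, has_ambiguous_anchors(candidate))
```

A warning from such a check would not prove the pattern is wrong, since "starts with X or ends with Y" is occasionally what the author means.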
Re: pattern alternation (was Re: How are ...)
On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak <cma...@gmail.com> wrote:
> Darren (>>>>), Carl (>>>), Darren (>>), Patrick (>):
>> In this case yes -- the original pattern without the square brackets would act like:
>>
>>     / [^ <[A..Z]>+] | [<[a..z]>+ $] /
>>
>> In other words, the original pattern says "starting with uppercase" or "ending with lowercase".
>
> I see this particular thinko a lot, though. Maybe some Perl 6 lint tool or another will detect when you have a regex containing ^ at its start, $ at the end, | somewhere in the middle, and no [] to disambiguate.

You know, this problem would go away, almost entirely, if we had a :f[ull] adverb for regex matching that imposed ^[...]$ around the entire match. Then your code becomes:

    m:f/<[A..Z]>+|<[a..z]>+/

For grins, :f[ull]l[ine] could use ^^ and $$.

I suspect :full would almost always be associated with TOP, in fact. Boy am I tired of typing ^ and $ in TOP ;-)

-- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs
Re: pattern alternation (was Re: How are ...)
Aaron Sherman wrote:
> You know, this problem would go away, almost entirely, if we had a :f[ull] adverb for regex matching that imposed ^[...]$ around the entire match. Then your code becomes:
>
>     m:f/<[A..Z]>+|<[a..z]>+/
>
> For grins, :f[ull]l[ine] could use ^^ and $$.
>
> I suspect :full would almost always be associated with TOP, in fact. Boy am I tired of typing ^ and $ in TOP ;-)

The regex counterpart of C<say $x> vs. C<print "$x\n">. Yes, this would indeed solve a lot of problems. It also reflects a tendency in some regular expression engines out there to automatically impose full string matching (i.e., an implicit ^ at the start and $ at the end).

That said: for mnemonic purposes, I'd be inclined to have :f do /^[$pattern]$/, while :ff does /^^[$pattern]$$/.

-- Jonathan "Dataweaver" Lang
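Python's re module is one example of an engine offering this kind of whole-string matching as a separate entry point. The sketch below is an analogue only, not the proposed :f or :ff adverbs: re.fullmatch plays the role of the implicit ^[...]$, and a grouped pattern under re.MULTILINE plays the role of the line-by-line variant. The helper names are invented.

```python
import re

ALTERNATION = r"[A-Z]+|[a-z]+"   # no anchors and no grouping written by hand

def is_full_match(text: str) -> bool:
    # fullmatch requires the entire string to match one of the alternatives
    return re.fullmatch(ALTERNATION, text) is not None

def full_lines(text: str) -> list[str]:
    # line-by-line counterpart: anchors wrap the grouped pattern per line
    per_line = re.compile(rf"^(?:{ALTERNATION})$", re.MULTILINE)
    return per_line.findall(text)

print(is_full_match("abc"), is_full_match("ABCdef"))
print(full_lines("ABC\nabc\nAbc\n"))
```

The point of the design is that the pattern author never writes the anchors, so there is no opportunity to get their precedence wrong.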
Re: pattern alternation (was Re: How are ...)
On 2010-08-05, at 8:27 am, Aaron Sherman wrote:
> On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak <cma...@gmail.com> wrote:
>> I see this particular thinko a lot, though. Maybe some Perl 6 lint tool or another will detect when you have a regex containing ^ at its start, $ at the end, | somewhere in the middle, and no [] to disambiguate.

I think conceptually the beginning and the end of a string feels like a bracketing construct (only without symmetrical symbols). At least that seems to be my instinct. Well, it doesn't in / ^foo | ^bar | ^qux /, but in something like /^ foo|bar $/, the context immediately implies a higher precedence for ^ and $. Maybe something like // foo|bar // could work as a bracketing version?

> You know, this problem would go away, almost entirely, if we had a :f[ull] adverb for regex matching that imposed ^[...]$ around the entire match.

I was thinking of that too.

> I suspect :full would almost always be associated with TOP, in fact. Boy am I tired of typing ^ and $ in TOP ;-)

Does it make sense for ^[...]$ to be assumed in TOP by default? (Though not necessary if there's a shortcut like //...//.)

-David
Re: pattern alternation (was Re: How are ...)
On Thu, Aug 05, 2010 at 10:27:50AM -0400, Aaron Sherman wrote:
> On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak <cma...@gmail.com> wrote:
>> I see this particular thinko a lot, though. Maybe some Perl 6 lint tool or another will detect when you have a regex containing ^ at its start, $ at the end, | somewhere in the middle, and no [] to disambiguate.
>
> You know, this problem would go away, almost entirely, if we had a :f[ull] adverb for regex matching that imposed ^[...]$ around the entire match. Then your code becomes:
>
>     m:f/<[A..Z]>+|<[a..z]>+/

There's a version of this already. Matching against an explicit 'regex', 'token', or 'rule' automatically anchors it on both ends. Thus:

    $string ~~ regex { <[A..Z]>+ | <[a..z]>+ }

is equivalent to

    $string ~~ regex { ^ [ <[A..Z]>+ | <[a..z]>+ ] $ }

Pm
Re: pattern alternation (was Re: How are ...)
On Thu, Aug 5, 2010 at 11:09 AM, Patrick R. Michaud <pmich...@pobox.com> wrote:
> On Thu, Aug 05, 2010 at 10:27:50AM -0400, Aaron Sherman wrote:
>> On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak <cma...@gmail.com> wrote:
>>> I see this particular thinko a lot, though. Maybe some Perl 6 lint tool or another will detect when you have a regex containing ^ at its start, $ at the end, | somewhere in the middle, and no [] to disambiguate.
>>
>> You know, this problem would go away, almost entirely, if we had a :f[ull] adverb for regex matching that imposed ^[...]$ around the entire match. Then your code becomes:
>>
>>     m:f/<[A..Z]>+|<[a..z]>+/
>
> There's a version of this already. Matching against an explicit 'regex', 'token', or 'rule' automatically anchors it on both ends. Thus:
>
>     $string ~~ regex { <[A..Z]>+ | <[a..z]>+ }
>
> is equivalent to
>
>     $string ~~ regex { ^ [ <[A..Z]>+ | <[a..z]>+ ] $ }

While that's a nifty special case (I'm sure it will surprise me someday, and I'll spend a half hour debugging before I remember this mail), it doesn't help in the general case (see my example grammar, below).

After doing some more thinking and comparing this to other languages (Python, for example, has "match", which matches only at the start of a string), it seems to me that there is a sort of out-of-band need to have a more general solution at match time.

Here's my second pass suggestion:

    m:r  / m:rooted      -- Match is rooted on both ends (^...$)
    m:rs / m:rootedstart -- Match is rooted at the start of string (^, a la Python re.match)
    m:re / m:rootedend   -- Match is rooted at the end of string ($)
    m:rn / m:rootednone  -- Match is not rooted (default)
    m:o  / m:oneline     -- Modify :r and friends to use ^^/$$

Here's one way I can see that being routinely used:

    # Simplistic shell scripts
    rule TOP :r {<stmt>*}             # Match the whole script
    rule stmt :r :o { <cmd> <arg>* }  # One statement per line

The other way to go about that would be with parameterized adverbs. I'm not sure how comfy people are with those, but they're in the spec.

So this:

    m:r / m:rooted -- Match is rooted (default is ^...$)
      Parameters:
        :s / :start   -- Match is rooted only at start (^)
        :e / :end     -- Match is rooted only at end ($)
                         [note: :s :e should produce a warning]
        :n / :none    -- Match is not rooted (null modifier)
                         [note: combining :n with :s or :e should warn]
        :o / :oneline -- Use ^^ and $$ instead of ^ and $
                         [note: combining :o with :n should warn?]

So our statement matching grammar becomes:

    rule TOP :r {<stmt>*}
    rule stmt :r(:o) { <cmd> <arg>* }

The clown nose is just a side benefit ;-)

Seriously, though, I prefer :r(:o) because :r:o looks like it should be the opposite of :rw (there is no :ro, as far as I know).

PS: I see no reason that any of this is needed for 6.0.0

-- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs
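To make the four proposed rootings concrete, here is a rough Python emulation. It is a sketch of the behaviour described above, not an implementation of the proposed adverbs: the helper name "rooted" and its "mode" and "oneline" parameters are invented, and the pattern is wrapped in a group so the anchors cannot be captured by an alternation.

```python
import re

def rooted(pattern: str, text: str, mode: str = "both", oneline: bool = False):
    """Search text with anchors added around the grouped pattern.

    mode: "both" (like :r), "start" (:rs), "end" (:re) or "none" (:rn).
    oneline: anchor at line boundaries instead of string boundaries (like :o).
    """
    if mode not in ("both", "start", "end", "none"):
        raise ValueError(f"unknown mode: {mode}")
    # \A and \Z anchor to the string; ^ and $ with MULTILINE anchor to lines
    start, end = ("^", "$") if oneline else (r"\A", r"\Z")
    prefix = start if mode in ("both", "start") else ""
    suffix = end if mode in ("both", "end") else ""
    flags = re.MULTILINE if oneline else 0
    return re.search(f"{prefix}(?:{pattern}){suffix}", text, flags)

print(bool(rooted("GET ", "GET /index.html", mode="start")))
print(bool(rooted(r"\.txt", "notes.txt.bak", mode="end")))
print(bool(rooted("[a-z]+", "abc\ndef")))
print(bool(rooted("[a-z]+", "abc\ndef", oneline=True)))
```

Passing the rooting as an argument at match time, rather than writing it into the pattern, is the "out-of-band" aspect of the proposal.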
Re: pattern alternation (was Re: How are ...)
On Thu, Aug 5, 2010 at 12:28 PM, Aaron Sherman <a...@ajs.com> wrote:
> While that's a nifty special case (I'm sure it will surprise me someday, and I'll spend a half hour debugging before I remember this mail), it doesn't help in the general case (see my example grammar, below).

In the general case, no. In the case of your grammar, and all grammars, it does help. All regex routines, when called standalone, are anchored to the beginning and end of the string. So, having ^ and $ at the beginning and end of your TOP is a no-op unless some other rule calls it as a subrule.

S05 says:

    In general, the anchoring of any subrule call is controlled by its calling context. When a regex, token, or rule method is called as a subrule, the front is anchored to the current position (as with :p), while the end is not anchored, since the calling context will likely wish to continue parsing. However, when such a method is smartmatched directly, it is automatically anchored on both ends to the beginning and end of the string.

and that

    The basic rule of thumb is that the keyword-defined methods never do implicit .*?-like scanning, while the m// and s// quotelike forms do such scanning in the absence of explicit anchoring.

Given that Grammar.parse is specified to create a new Grammar object and directly match its TOP (or the value of the :rule adverb) method, without any specification that it does implicit .*?-like scanning, I think that Grammar.parse should always anchor.

This doesn't appear to work quite properly in Rakudo currently. It anchors to the beginning but not to the end. I'm about to check if there's a rakudobug for this already, and submit it if not.

> After doing some more thinking and comparing this to other languages (Python, for example, has "match", which matches only at the start of a string), it seems to me that there is a sort of out-of-band need to have a more general solution at match time.
>
> Here's my second pass suggestion:
>
>     m:r  / m:rooted      -- Match is rooted on both ends (^...$)
>     m:rs / m:rootedstart -- Match is rooted at the start of string (^, a la Python re.match)
>     m:re / m:rootedend   -- Match is rooted at the end of string ($)
>     m:rn / m:rootednone  -- Match is not rooted (default)
>     m:o  / m:oneline     -- Modify :r and friends to use ^^/$$
>
> Here's one way I can see that being routinely used:
>
>     # Simplistic shell scripts
>     rule TOP :r {<stmt>*}             # Match the whole script
>     rule stmt :r :o { <cmd> <arg>* }  # One statement per line

:oneline or similar might be useful. I'm not sure about :rootedend and :rootedstart. :rooted is useful only in one situation: when implicitly matching against the topic. You could do

    m:r/ foo /;

to match against the topic, but

    regex { foo };

would not do what you want (I think). I don't know if doing an anchored match against the topic is really important enough to justify an adverb just so you don't have to do

    $_ ~~ regex { foo }.

> The other way to go about that would be with parameterized adverbs. I'm not sure how comfy people are with those, but they're in the spec.
>
> So this:
>
>     m:r / m:rooted -- Match is rooted (default is ^...$)
>       Parameters:
>         :s / :start   -- Match is rooted only at start (^)
>         :e / :end     -- Match is rooted only at end ($)
>                          [note: :s :e should produce a warning]
>         :n / :none    -- Match is not rooted (null modifier)
>                          [note: combining :n with :s or :e should warn]
>         :o / :oneline -- Use ^^ and $$ instead of ^ and $
>                          [note: combining :o with :n should warn?]
>
> So our statement matching grammar becomes:
>
>     rule TOP :r {<stmt>*}
>     rule stmt :r(:o) { <cmd> <arg>* }
>
> The clown nose is just a side benefit ;-)
>
> Seriously, though, I prefer :r(:o) because :r:o looks like it should be the opposite of :rw (there is no :ro, as far as I know).
>
> PS: I see no reason that any of this is needed for 6.0.0
>
> -- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs

-- Tyler Curtis
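The distinction drawn in the S05 passage quoted above (front anchored at the current position for subrule calls, both ends anchored for a direct match) has a loose counterpart in Python's compiled-pattern methods. The sketch below is an analogy only, with invented names: Pattern.match(text, pos) anchors at a position and leaves the end open, much like a subrule call, while Pattern.fullmatch anchors both ends.

```python
import re

WORD = re.compile(r"[a-z]+")
SPACE = re.compile(r"\s+")

def parse_words(text: str):
    """Apply WORD like a subrule: anchored at the current position each time."""
    pos, words = 0, []
    while pos < len(text):
        m = WORD.match(text, pos)   # front anchored at pos, end left open
        if m is None:
            return None             # no scanning ahead to find a later match
        words.append(m.group())
        pos = m.end()               # the caller continues parsing from here
        gap = SPACE.match(text, pos)
        if gap:
            pos = gap.end()
    return words

print(parse_words("foo bar baz"))
print(parse_words("foo 42 bar"))
print(WORD.fullmatch("foo bar") is not None)
```

The same pattern object thus behaves differently depending on how the caller invokes it, which is the calling-context idea the quoted text describes.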
Re: pattern alternation (was Re: How are ...)
David Green wrote:
> On 2010-08-05, at 8:27 am, Aaron Sherman wrote:
>> On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak <cma...@gmail.com> wrote:
>>> I see this particular thinko a lot, though. Maybe some Perl 6 lint tool or another will detect when you have a regex containing ^ at its start, $ at the end, | somewhere in the middle, and no [] to disambiguate.
>
> I think conceptually the beginning and the end of a string feels like a bracketing construct (only without symmetrical symbols). At least that seems to be my instinct. Well, it doesn't in / ^foo | ^bar | ^qux /, but in something like /^ foo|bar $/, the context immediately implies a higher precedence for ^ and $. Maybe something like // foo|bar // could work as a bracketing version?

Personally, I had always considered the ^ and $ to be the lowest precedence things in a pattern. But I can understand the flexibility one gains from that not being so, having seen David's example here, which it had never occurred to me before was possible.

-- Darren Duncan
Re: pattern alternation (was Re: How are ...)
On Thu, Aug 5, 2010 at 2:43 PM, Tyler Curtis <ekir...@gmail.com> wrote:
> On Thu, Aug 5, 2010 at 12:28 PM, Aaron Sherman <a...@ajs.com> wrote:
>> While that's a nifty special case (I'm sure it will surprise me someday, and I'll spend a half hour debugging before I remember this mail), it doesn't help in the general case (see my example grammar, below).
>
> In the general case, no. In the case of your grammar, and all grammars, it does help. All regex routines, when called standalone, are anchored to the beginning and end of the string. So, having ^ and $ at the beginning and end of your TOP is a no-op unless some other rule calls it as a subrule.

There's something deeply disturbing to me in that... but I can't fully express what it is. It just feels like I'm going to end up debugging mountains of code, written by people who didn't understand that that was the case.

Several times over the past few weeks, I've mentioned something on this list only to find that, buried somewhere deep in a synopsis, there was a special case I was unaware of. The sheer volume of silent special cases in Perl 6 appears to be dwarfing that of Perl 5, but perhaps that's just because I know Perl 5 far better than I know Perl 6.

Mind you, I'm not complaining, so much as working out how I feel out loud. Am I the only one who feels this way at this point?

> :oneline or similar might be useful. I'm not sure about :rootedend and :rootedstart.

Are you saying that you can't think of examples of where you want to root a regex only to the start or end, or that you just don't think you need an adverb to do it?

If the former, then I submit the 1536 examples of matching only at the end of strings in my local Perl library (mostly for matching whitespace or filename extensions, it looks like) and the 3199 examples of matching only at the start, which include headers of all types (RFC2822 and friends, HTTP, CPAN configs, etc.), whitespace, command sequence matching (e.g. /^GET /) and so on.

If the latter, then I guess you and I just have a different take, here, and that's fine. I respect your opinion, but in this case, I happen to disagree.

PS: You can also search through any typical Python install for "\.match", which will yield quite a lot of additional examples. I don't know Ruby or Java very well, or I'd go looking for examples there too.

-- Aaron Sherman Email or GTalk: a...@ajs.com http://www.ajs.com/~ajs