pluralization idea that keeps bugging me
Last night I got a message entitled: "yum: 1 Updates Available". Of course, that's probably just a Python programmer giving up on doing the right thing, but we see this sort of bletcherousness all the time.

After a recent exchange on PerlMonks about join, I've been thinking about the problem of pluralization in interpolated strings, where we get things like:

    say "Received $m message{ 1==$m ?? '' !! 's' }."

My first thought is that this is such a common idiom that we ought to have some syntactic sugar for it:

    say "Received $m message\s."

which reads nicely enough since the usual case is plural. Basically, \s would be smart enough to magically know somehow whether the last interpolation was 1 or not. It would be particularly nice when the interpolation is a closure:

    say "Received {calculate_number_of_messages()} message\s."

That would cover most of the cases for English speakers using regular nouns, but I wonder whether there's some kind of generalization that would help for cases like:

    say "There was/were $o ox/oxen"

But that doesn't work since / isn't a metacharacter. Using an adverb seems like overkill, if we can piggyback on an existing metachar. Maybe something like

    say "There was\swere $o ox\soxen"

where if anything alphabetic follows the \s it is the alternative plural. But note that the first \s there would have to be looking forward rather than backward to do the verb, which constrains the possible mechanisms, and makes it problematic to use \s multiple times:

    say "There was\swere $o ox\soxen and $g goat\s."

though that could be made clearer with explicit concatenation:

    say "There was\swere $o ox\soxen" ~ " and $g goat\s."
    say "There was\swere $o ox\soxen", " and $g goat\s."

Or maybe instead of using \ we should use a sigil:

    say "There $was|were $o $ox|oxen"

except, of course, that $ is already taken. Seems tacky to use up a real variable name like:

    say "There $X<was|were> $o $X<ox|oxen>"

I suppose one could make a case for Num vars having a .<> method though:

    say "There $o<was|were> $o $o<ox|oxen>"

That nicely resolves the ambiguity of

    say "There $o<was|were> $o $o<ox|oxen> and $g goat$g<s>"

but doesn't really help when you really need it, which is when you interpolate something hairy:

    say "There $j.k.l.m.o<was|were> $j.k.l.m.o $j.k.l.m.o<ox|oxen> and $j.k.l.m.g goat$j.k.l.m.g<s>"

It's even less helpful when you interpolate a closure since there's no variable name to refer to (unless you assign one, but then we're losing much of our syntactic sugary wonderfulness).

So maybe we should just make \s dwim and leave it at that. Two dwimminesses, really. The first dwim finds the associated interpolation, either the first interpolation of a variable or closure before the \s, or if there is none, the first one after. Call that interpolated value $X for the moment. (It doesn't really have to have a real variable name, but the important thing is not to evaluate the expression multiple times since it might have side effects (including the side effect of being inefficient to compute).)

The second dwim looks at the alphabeticality of the next character (defined Unicodically, of course) to decide if there is one argument or two:

    "foo\s"     means  $X == 1 ?? 'foo' !! 'foos'
    "foo\sbar"  means  $X == 1 ?? 'foo' !! 'bar'

Internally, you end up multiply dispatching to something like pluralize($X,'foo') or pluralize($X,'foo','bar'). (Arguably we could make pluralize interpolate the $X as well, but that only works for noun agreement, not verb agreement.)

I think that probably handles most of the Indo-European cases, and anything more complicated can revert to explicit code. (Or go through a localization dictionary...)

Any other cute ideas?

Larry
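
For readers who want to poke at the two dwims, here is a minimal sketch in Python (used only because the \s syntax above is hypothetical and not runnable as Perl 6). It assumes `{name}` placeholders in place of `$name` interpolation, reads "the first interpolation before the \s" as the nearest preceding one, and takes already-computed values so nothing is evaluated twice. The names `TOKEN`, `pluralize` and `expand` are made up for this illustration.

```python
import re

# Either a {name} placeholder, or a word followed by a literal \s and an
# optional alphabetic alternative plural (letters are matched Unicode-wide).
TOKEN = re.compile(
    r"\{(?P<var>\w+)\}|(?P<sing>[^\W\d_]+)\\s(?P<plur>[^\W\d_]*)"
)


def pluralize(count, singular, plural=None):
    """One- or two-argument form: the default plural is singular + 's'."""
    if plural is None:
        plural = singular + "s"
    return singular if count == 1 else plural


def expand(template, values):
    """Expand {name} placeholders and \\s markers in template."""
    matches = list(TOKEN.finditer(template))
    var_slots = [i for i, m in enumerate(matches) if m.group("var")]
    out, last = [], 0
    for i, m in enumerate(matches):
        out.append(template[last:m.start()])
        last = m.end()
        if m.group("var"):
            out.append(str(values[m.group("var")]))
            continue
        # First dwim: nearest interpolation before the marker, else the
        # first one after it.
        before = [j for j in var_slots if j < i]
        after = [j for j in var_slots if j > i]
        if not before and not after:
            raise ValueError("no interpolation to agree with")
        controller = matches[before[-1] if before else after[0]]
        count = values[controller.group("var")]
        # Second dwim: letters right after \s are the alternative plural.
        out.append(pluralize(count, m.group("sing"), m.group("plur") or None))
    out.append(template[last:])
    return "".join(out)


print(expand(r"Received {m} message\s.", {"m": 1}))
print(expand(r"There was\swere {o} ox\soxen and {g} goat\s.", {"o": 1, "g": 2}))
```

With the nearest-preceding reading, the two-variable sentence resolves without explicit concatenation; a different reading of "first" would need a different lookup.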
Re: pluralization idea that keeps bugging me
On 26/01/2008, Larry Wall wrote:
> After a recent exchange on PerlMonks about join, I've been thinking about the problem of pluralization in interpolated strings, where we get things like:
>
>     say "Received $m message{ 1==$m ?? '' !! 's' }."
> ...
> Any other cute ideas?

No matter what you do, it will remain too English-centric. It might work for Catalan, too. But it will remain totally useless for Arabic or Chinese.

In any case, I don't understand why this should be in the core language at all.

--
Amir Elisha Aharoni

English - http://aharoni.wordpress.com
Hebrew - http://haharoni.wordpress.com

"We're living in pieces, I want to live in peace." - T. Moore
Re: pluralization idea that keeps bugging me
Larry Wall wrote:
> Any other cute ideas?

If you have '\s', you'll also want '\S':

    "$n cat\s fight\S" # 1 cat fights; 2 cats fight

I'm not fond of the 'ox\soxen' idea; but I could get behind something like '\s<ox oxen>' or 'ox\s<en>'.

    '\s<a b>' would mean 'a is singular; b is plural'
    '\s<a>' would be short for '\s< a>'
    '\s' would be short for '\s< s>'
    '\S<a b>' would reverse this.

Sometimes, you won't want the pluralization variable in the string itself, or you won't know which one to use. You could use an adverb for this:

    :s<$n>"the cat\s \s<is are> fighting."

and/or find a way to tag a variable in the string:

    "$owner's \s=$count cat\s"

'\s=$count' means "set plurality based on $count, and display $count normally".

--
Jonathan "Dataweaver" Lang
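
The \s and \S pairing can be tried out with a small Python sketch. The helper names `s` and `S` are hypothetical stand-ins for the proposed escapes, with the "a is singular; b is plural" pair passed as ordinary arguments and \S simply swapping the two roles.

```python
def s(count, singular="", plural="s"):
    # Stand-in for \s: 'singular' is used when count is 1, else 'plural'.
    # With no arguments it is the bare \s: nothing for one, "s" otherwise.
    return singular if count == 1 else plural


def S(count, singular="", plural="s"):
    # Stand-in for \S: the same pair with the roles reversed, which is what
    # an English verb needs ("1 cat fights", "2 cats fight").
    return plural if count == 1 else singular


def report(n):
    return f"{n} cat{s(n)} fight{S(n)}"


for n in (1, 2):
    print(report(n))

# Irregular endings and whole-word alternatives use the two-argument form.
print(f"2 ox{s(2, '', 'en')} {s(2, 'is', 'are')} grazing")
```

Passing the count explicitly sidesteps the question of which interpolation an escape refers to, at the cost of the brevity the escapes were meant to buy.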
Re: pluralization idea that keeps bugging me
Jonathan makes an excellent point about s and S. In fact, there's probably a little language out there for this.

I don't think it needs to be in the core, though. But you could put in some kind of hook mechanism, so that detecting the presence of \s or whatever caused the string to be treated specially. Perhaps it gets a different, possibly more sophisticated, type? A type that is only in-core in a limited (English-only?) implementation, but which admins can install at whim.

=Austin
Re: pluralization idea that keeps bugging me
Jonathan Lang wrote:
> I'm not fond of the 'ox\soxen' idea; but I could get behind something like '\s<ox oxen>' or 'ox\s<en>'.

    $n ox\s en $n\sone multiple no cat\s s fight\s s s

;)

--
Affijn, Ruud

"Gewoon is een tijger."
Re: pluralization idea that keeps bugging me
Amir E. Aharoni wrote:
> On 26/01/2008, Larry Wall wrote:
>> After a recent exchange on PerlMonks about join, I've been thinking about the problem of pluralization in interpolated strings, where we get things like:
>>
>>     say "Received $m message{ 1==$m ?? '' !! 's' }."
>> ...
>> Any other cute ideas?
>
> No matter what you do, it will remain too English-centric. It might work for Catalan, too. But it will remain totally useless for Arabic or Chinese.
>
> In any case, I don't understand why this should be in the core language at all.

I second that. A few more thoughts:

1. In Hungarian, for example, you don't need this at all: the noun stays singular after the numeral.

2. AFAIK in some languages the split is not "1 or more", but "1, 2 or more".

3. Often what you need is not "1 or more" but "none, 1 or more": "No new messages" - "You have 1 new message" - "You have 3 new messages". Or more likely "<b>No new messages.</b>" - "<a href="/read.asp">You have 1 new message.</a>" ... etc.

4. I work a lot with multilingual websites. I learned long ago that it's never "{{you_have}} [% messages %] {{messages}}". You have to be *very* lucky just to make this work in two languages. Instead, it's "{{number_of_new_messages}}: [% messages %]". That pretty much works everywhere.

So not in the core, probably. There are too many exceptions. A module would be cool, though :) String::Plural::English, or whatnot.

- Fagzal
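
Points 3 and 4 can be put side by side in a short Python sketch. The function names and the English strings are illustrative assumptions only; in a real multilingual site each whole sentence or label would come from a translation catalogue.

```python
def inbox_sentence(count):
    # Point 3: three whole sentences for none / one / many, rather than a
    # single template with a pluralized noun spliced in.
    if count == 0:
        return "No new messages."
    if count == 1:
        return "You have 1 new message."
    return f"You have {count} new messages."


def inbox_label(count, label="Number of new messages"):
    # Point 4: "label: number" never asks the label to agree with the count,
    # so only the label itself needs translating.
    return f"{label}: {count}"


for n in (0, 1, 3):
    print(inbox_sentence(n), "|", inbox_label(n))
```

The label form reads less naturally in English, which is the trade-off point 4 accepts in exchange for working across languages.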
Re: pluralization idea that keeps bugging me
To me this sounds like

    use Lingua::EN::Pluralize::DSL;

which would overload your grammar locally to parse strings this way.

However, for i18n reasons this should not be in the core. It might make sense to ship a slightly modernized Locale::Maketext with Perl 6 so that it can be used in the compiler itself, but unless a fully open-ended system like L::MT is included I think having anything at all might be damaging, because this will encourage people to use the partial solution that is already built in instead of the complete one on the CPAN (c.f. many core modules).

--
Yuval Kogman
http://nothingmuch.woobling.org 0xEBD27418
Re: pluralization idea that keeps bugging me
On Saturday 26 January 2008 08:58:43 Larry Wall wrote:
> That would cover most of the cases for English speakers using regular nouns, but I wonder whether there's some kind of generalization that would help for cases like:
>
>     say "There was/were $o ox/oxen"

That makes me wish for a subjunctive/optative mood marker. I'm not sure why.

In-language localization and internationalization hooks do seem awfully useful, but English-only pluralization rules just might not cut it. Nearly pain-free l10n and i18n *is* kind of a killer feature though.

-- c
Re: pluralization idea that keeps bugging me
At 8:58 AM -0800 1/26/08, Larry Wall wrote:
> My first thought is that this is such a common idiom that we ought to have some syntactic sugar for it:
>
>     say "Received $m message\s."

I don't think that a feature like this should be in the core language; it is too complicated, and it is an open-ended problem besides.

A better use of this discussion is perhaps to determine whether any more basic core features would need updating so that a separate extension module could more easily provide the feature being discussed.

-- Darren Duncan
Re: pluralization idea that keeps bugging me
On Sat, Jan 26, 2008 at 08:58:43AM -0800, Larry Wall wrote:
> After a recent exchange on PerlMonks about join, I've been thinking about the problem of pluralization in interpolated strings, where we get things like:
>
>     say "Received $m message{ 1==$m ?? '' !! 's' }."
>
> My first thought is that this is such a common idiom that we ought to have some syntactic sugar for it:
>
>     say "Received $m message\s."
> [...]
> Any other cute ideas?

FWIW, this sounds to me a lot like a special quoting operator or adverbial form.

    say qq:pluralized "Received $m message\s."

Pm
Re: pluralization idea that keeps bugging me
On 2008-01-26 Larry Wall wrote:
> Last night I got a message entitled: "yum: 1 Updates Available".
> [snip a lot]
> I think that probably handles most of the Indo-European cases, and anything more complicated can revert to explicit code. (Or go through a localization dictionary...)

Please don't put this in the language. The problem is harder than it seems (there are European languages that pluralize differently on $X % 10, IIRC; 0 is singular or plural depending on the language, etc etc). Look at the documentation of GNU gettext, or the translation guidelines for KDE, to get the whole mess.

We already have Locale::Maketext. To get the whole magical interpolation, we'd just have to define a suitable quoting construct, right?

I know Perl is not minimal, but sometimes I feel that it will end up being maximal... and the more you put in the core, the less flexibility you get in the long term.

--
Dakkar - Mobilis in mobile
GPG public key fingerprint = A071 E618 DD2C 5901 9574 6FE2 40EA 9883 7519 3F88
key id = 0x75193F88

printk("%s: Boo!\n", dev->name);
    linux-2.6.19/drivers/net/depca.c
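
The "0 is singular or plural depending on the language" point can be made concrete with two of the plural rules listed in the GNU gettext documentation: English uses `n != 1` and French uses `n > 1`. The sketch below is plain Python with a hypothetical table layout; it does not call gettext itself.

```python
# Each rule maps a count to an index into that language's tuple of forms.
PLURAL_RULES = {
    "en": lambda n: 0 if n == 1 else 1,  # gettext: plural = n != 1
    "fr": lambda n: 0 if n <= 1 else 1,  # gettext: plural = n > 1
}

# "message" happens to be spelled the same in both languages, which makes
# the difference in the rule easy to see.
FORMS = {
    "en": ("message", "messages"),
    "fr": ("message", "messages"),
}


def count_phrase(lang, n):
    form = FORMS[lang][PLURAL_RULES[lang](n)]
    return f"{n} {form}"


for lang in ("en", "fr"):
    print(lang, [count_phrase(lang, n) for n in (0, 1, 2)])
```

A hard-wired `$X == 1` test gets English right and French wrong at zero, which is the kind of trap that pushes this logic into a per-language table.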
Re: pluralization idea that keeps bugging me
It's only English-centric if the idea is fixed to plurals, because it's only for plurals that English words are mutated by grammar rules. In other languages, words are mutated by other factors, such as the gender of the word, the case, and the number.

The problem can be quite difficult, say in Russian. Suppose you want to say something like "Respected <customer name>" and interpolate the customer name from a database. In English, it's a doddle. But in Russian, all adjectives (e.g. 'respected') have both male and female forms, so the gender of the customer has to be determined in order to interpolate correctly.

And for plurals, some languages have different words for single, double and many forms. In Russian, the noun after the number has one form for 1 (nominative singular), another form (genitive singular) for numbers 2 to 4, and then a third form (genitive plural) for 5 and above. So, a simple plural hook is insufficient.

Then take Welsh: its words mutate with prefixes as well as suffixes, depending on context.

Whilst it would be nice for there to be a neat syntax for such things (thus avoiding English-centricity), the complexities of all languages might be too burdensome for core perl6.
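
To show why a two-way hook falls short for Russian, here is a Python sketch of the three-form selection. It uses the rule listed for Russian in the GNU gettext documentation, which extends the 1 / 2 to 4 / 5-and-above description above to larger numbers by looking at the last digit, with 11 to 14 as exceptions. The function name and the choice of noun (сообщение, "message") are assumptions for illustration.

```python
def russian_plural_index(n):
    # gettext rule for Russian:
    #   n%10==1 && n%100!=11 ? 0
    #   : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
    if n % 10 == 1 and n % 100 != 11:
        return 0  # nominative singular: 1, 21, 31, ...
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1  # genitive singular: 2-4, 22-24, ...
    return 2      # genitive plural: 0, 5-20, 25-30, ...


FORMS = ("сообщение", "сообщения", "сообщений")


def count_messages(n):
    return f"{n} {FORMS[russian_plural_index(n)]}"


for n in (1, 3, 5, 11, 21, 22):
    print(count_messages(n))
```

Note that 21 takes the same form as 1 while 11 does not, so the choice cannot be reduced to a single comparison against 1.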
Re: pluralization idea that keeps bugging me
Gianni Ceccarelli wrote:
> Please don't put this in the language. The problem is harder than it seems (there are European languages that pluralize differently on $X % 10, IIRC; 0 is singular or plural depending on the language, etc etc).
> -snip-
> I know Perl is not minimal, but sometimes I feel that it will end up being maximal... and the more you put in the core, the less flexibility you get in the long term.

This _does_ appear to be something more suitable for a Locale:: module. I just wonder if there are enough hooks in the core to allow for an appropriately brief syntax to be introduced in a module: can one roll one's own string interpolations as things stand? E.g., is there a way to add meaning to backslashed characters in a string that would normally lack meaning? Do we have the tools to build "$m tool\s"?

--
Jonathan "Dataweaver" Lang
Re: pluralization idea that keeps bugging me
On Sat, Jan 26, 2008 at 08:58:43AM -0800, Larry Wall wrote:
> Last night I got a message entitled: "yum: 1 Updates Available". Of course, that's probably just a Python programmer giving up on doing the right thing, but we see this sort of bletcherousness all the time.
>
> Any other cute ideas?

It's worth reading the perldoc for Locale::Maketext and Locale::Maketext::TPJ13. Sean Burke did some truly excellent work explaining a lot of the pitfalls here.

Sean built us the only solution I've yet seen that gets pluralization reasonably ok in languages with non-English-like pluralization rules without making me want to just give up and write "Updates found: 1" ;)

-j
Re: pluralization idea that keeps bugging me
Yuval Kogman wrote:
> You can subclass the grammar and change everything. Theoretically that's a yes =)

Right. One last question: is this (i.e., extending a string's grammar) a "keep simple things simple" thing, or a "keep difficult things doable" thing?

--
Jonathan "Dataweaver" Lang
Re: pluralization idea that keeps bugging me
On Sat, Jan 26, 2008 at 18:43:50 -0800, Jonathan Lang wrote:
> Right. One last question: is this (i.e., extending a string's grammar) a "keep simple things simple" thing, or a "keep difficult things doable" thing?

I'm going to guess somewhere in between. It should be about the same level of complexity as Filter::Simple, except with much finer control and more correctness.

I'm not the best person to answer this though.

--
Yuval Kogman
http://nothingmuch.woobling.org 0xEBD27418
Re: pluralization idea that keeps bugging me
On Sat, Jan 26, 2008 at 18:12:17 -0800, Jonathan Lang wrote:
> This _does_ appear to be something more suitable for a Locale:: module. I just wonder if there are enough hooks in the core to allow for an appropriately brief syntax to be introduced in a module: can one roll one's own string interpolations as things stand? E.g., is there a way to add meaning to backslashed characters in a string that would normally lack meaning?

You can subclass the grammar and change everything. Theoretically that's a yes =)

--
Yuval Kogman
http://nothingmuch.woobling.org 0xEBD27418