pluralization idea that keeps bugging me

2008-01-26 Thread Larry Wall
Last night I got a message entitled: "yum: 1 Updates Available."
Of course, that's probably just a Python programmer giving up on doing
the right thing, but we see this sort of bletcherousness all the time.

After a recent exchange on PerlMonks about join, I've been thinking
about the problem of pluralization in interpolated strings, where we
get things like:

say "Received $m message{ 1==$m ?? '' !! 's' }."

My first thought is that this is such a common idiom that we ought
to have some syntactic sugar for it:

say "Received $m message\s."

which reads nicely enough since the usual case is plural.
Basically, \s would be smart enough to magically know somehow whether
the last interpolation was 1 or not.  It would be particularly nice when
the interpolation is a closure:

say "Received {calculate_number_of_messages()} message\s."

That would cover most of the cases for English speakers using regular
nouns, but I wonder whether there's some kind of generalization that
would help for cases like:

say "There was/were $o ox/oxen"

But that doesn't work since / isn't a metacharacter.  Using an adverb
seems like overkill, if we can piggyback on an existing metachar.

Maybe something like

say "There was\swere $o ox\soxen"

where if anything alphabetic follows the \s it is the alternative
plural.  But note that the first \s there would have to be looking
forward rather than backward to do the verb, which constrains the
possible mechanisms, and makes it problematic to use \s multiple times:

say "There was\swere $o ox\soxen and $g goat\s."

though that could be made clearer with explicit concatenation:

say "There was\swere $o ox\soxen " ~ "and $g goat\s."
say "There was\swere $o ox\soxen ", "and $g goat\s."

Or maybe instead of using \ we should use a sigil:

say "There $<was|were> $o $<ox|oxen>"

except, of course, that $<> is already taken.  Seems tacky to
use up a real variable name like:

say "There $X<was|were> $o $X<ox|oxen>"

I suppose one could make a case for Num vars having a .<> method though:

say "There $o<was|were> $o $o<ox|oxen>"

That nicely resolves the ambiguity of

say "There $o<was|were> $o $o<ox|oxen> and $g goat$g<s>"

but doesn't really help when you really need it, which is when you
interpolate something hairy:

say "There $j.k.l.m.o<was|were> $j.k.l.m.o $j.k.l.m.o<ox|oxen> and $j.k.l.m.g goat$j.k.l.m.g<s>"

It's even less helpful when you interpolate a closure since there's
no variable name to refer to (unless you assign one, but then we're
losing much of our syntactic sugary wonderfulness).  So maybe we should
just make \s dwim and leave it at that.  Two dwimminesses, really.
The first dwim finds the associated interpolation, either the first
interpolation of a variable or closure before the \s, or if there
is none, the first one after.  Call that interpolated value $X for
the moment.  (It doesn't really have to have a real variable name,
but the important thing is not to evaluate the expression multiple
times since it might have side effects (including the side effect of
being inefficient to compute).)

The second dwim looks at the alphabeticality of the next character
(defined Unicodically, of course) to decide if there is one argument or two:

foo\s      means   $X == 1 ?? 'foo' !! 'foos'
foo\sbar   means   $X == 1 ?? 'foo' !! 'bar'

Internally, you end up multiply dispatching to something like
pluralize($X,'foo') or pluralize($X,'foo','bar').  (Arguably we
could make pluralize interpolate the $X as well, but that only
works for noun agreement, not verb agreement.)
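
As a rough illustration of the one-form and two-form calls described above, here is a minimal sketch. It is written in Python rather than Perl 6, so an optional argument stands in for multiple dispatch. The name pluralize comes from the post; the signature, the "append s" default and the report helper are assumptions made only for this example.

```python
from typing import Optional


def pluralize(count: int, singular: str, plural: Optional[str] = None) -> str:
    """Pick the singular or plural form for count (English-style rule only)."""
    if plural is None:
        # Short form, as in foo\s: the plural is the singular plus 's'.
        plural = singular + "s"
    # Long form, as in foo\sbar: the caller supplies both alternatives.
    return singular if count == 1 else plural


def report(oxen: int) -> str:
    # Hypothetical helper: the count is computed once by the caller and
    # reused for both the verb and the noun, so it is never re-evaluated.
    verb = pluralize(oxen, "was", "were")
    noun = pluralize(oxen, "ox", "oxen")
    return f"There {verb} {oxen} {noun}"
```

Passing the count in explicitly sidesteps the first dwim (finding the associated interpolation), which is the part that would need support from the string parser.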

I think that probably handles most of the Indo-European cases, and
anything more complicated can revert to explicit code.  (Or go through
a localization dictionary...)

Any other cute ideas?  

Larry


Re: pluralization idea that keeps bugging me

2008-01-26 Thread Amir E. Aharoni
On 26/01/2008, Larry Wall wrote:
> After a recent exchange on PerlMonks about join, I've been thinking
> about the problem of pluralization in interpolated strings, where we
> get things like:
>
> say "Received $m message{ 1==$m ?? '' !! 's' }."
>
> ...
>
> Any other cute ideas?

No matter what you do it will remain too English-centric. It might
work for Catalan, too. But it will remain totally useless for Arabic
or Chinese.

In any case, I don't understand why this should be in the core language at all.

-- 
Amir Elisha Aharoni

English -  http://aharoni.wordpress.com
Hebrew  - http://haharoni.wordpress.com

We're living in pieces,
 I want to live in peace. - T. Moore


Re: pluralization idea that keeps bugging me

2008-01-26 Thread Jonathan Lang
Larry Wall wrote:
> Any other cute ideas?

If you have '\s', you'll also want '\S':

"$n cat\s fight\S" # 1 cat fights; 2 cats fight

I'm not fond of the 'ox\soxen' idea; but I could get behind something
like '\sox oxen' or 'ox\sen'.

'\sa b' would mean 'a is singular; b is plural'
'\sa' would be short for '\s a'
'\s' would be short for '\s s'
'\Sa b' would reverse this.

Sometimes, you won't want the pluralization variable in the string
itself, or you won't know which one to use.  You could use an adverb
for this:

:s<$n>"the cat\s \sis are fighting."

and/or find a way to tag a variable in the string:

"$owner's \s=$count cat\s"

'\s=$count' means set plurality based on $count, and display $count normally.
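
To make the \s / \S pairing concrete, here is a minimal sketch in Python (not Perl 6). It covers only the bare markers, not the '\sa b' forms, and it takes the count as an explicit argument instead of discovering it in the string. The render function and the {n} placeholder are assumptions for illustration only.

```python
def render(template: str, count: int) -> str:
    """Expand the {n}, backslash-s and backslash-S markers for count."""
    singular = count == 1
    # Backslash-s marks a noun: it adds 's' only when the count is not 1.
    text = template.replace("\\s", "" if singular else "s")
    # Backslash-S marks a verb: it adds 's' only when the count is 1.
    text = text.replace("\\S", "s" if singular else "")
    # {n} is a hypothetical placeholder for the interpolated count.
    return text.replace("{n}", str(count))
```

With this sketch, render(r"{n} cat\s fight\S", n) follows the "1 cat fights; 2 cats fight" pattern from the example above.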

-- 
Jonathan Dataweaver Lang


Re: pluralization idea that keeps bugging me

2008-01-26 Thread Austin Hastings
Jonathan makes an excellent point about s and S. In fact, there's 
probably a little language out there for this.


I don't think it needs to be in the core, though. But you could put in 
some kind of hook mechanism, so that detecting the presence of \s or 
whatever caused the string to be treated specially. Perhaps it gets a 
different, possibly more sophisticated, type? A type that is only 
in-core in a limited (English-only?) implementation, but which admins 
can install at whim.


=Austin




Re: pluralization idea that keeps bugging me

2008-01-26 Thread Dr.Ruud
Jonathan Lang wrote:

> I'm not fond of the 'ox\soxen' idea; but I could get behind something
> like '\sox oxen' or 'ox\sen'.

   $n ox\s en

   $n\sone multiple no cat\s s  fight\s s s

;)

-- 
Affijn, Ruud

Gewoon is een tijger.


Re: pluralization idea that keeps bugging me

2008-01-26 Thread Fagyal Csongor

Amir E. Aharoni wrote:

> On 26/01/2008, Larry Wall wrote:
>> After a recent exchange on PerlMonks about join, I've been thinking
>> about the problem of pluralization in interpolated strings, where we
>> get things like:
>>
>> say "Received $m message{ 1==$m ?? '' !! 's' }."
>>
>> ...
>>
>> Any other cute ideas?
>
> No matter what you do it will remain too English-centric. It might
> work for Catalan, too. But it will remain totally useless for Arabic
> or Chinese.
>
> In any case, i don't understand why should this be in the core language at all.

I second that.

A few more thoughts:

1. For example in Hungarian, you don't need this at all: the noun stays 
singular after the numeral.


2. AFAIK in some languages it's not "1 or more", but "1, 2 or more".

3. Often what you need is not "1 or more", but "none, 1 or more": "No 
new messages" - "You have 1 new message" - "You have 3 new messages". Or 
more likely "<b>No new messages.</b>" - "<a href="/read.asp">You have 1 
new message.</a>" ... etc.


4. I work a lot with multilingual websites. I learned long ago that 
it's never "{{you_have}} [% messages %] {{messages}}". You have to be 
*very* lucky just to make this work in two languages. Instead, it's 
"{{number_of_new_messages}}: [% messages %]". That pretty much works 
everywhere.
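
Points 3 and 4 can be sketched side by side. This is a minimal Python illustration, not a recommendation for any particular templating system; both function names and the English strings are assumptions for the example.

```python
def new_messages_text(count: int) -> str:
    """Three-way English message: none, exactly one, or several."""
    if count == 0:
        return "No new messages"
    if count == 1:
        return "You have 1 new message"
    return f"You have {count} new messages"


def new_messages_label(label: str, count: int) -> str:
    """Label-plus-number layout: nothing in the label has to agree with count."""
    return f"{label}: {count}"
```

The first function needs one branch per grammatical case and has to be rewritten for each language; the second only needs a translated label.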



So not in the core, probably. There are too many exceptions. A module 
would be cool, though :) String::Plural::English, or whatnot.




- Fagzal




Re: pluralization idea that keeps bugging me

2008-01-26 Thread Yuval Kogman
To me this sounds like

use Lingua::EN::Pluralize::DSL;

which would overload your grammar locally to parse strings this way.

However, due to i18n reasons this should not be in the core.

It might make sense to ship a slightly modernized Locale::Maketext
with Perl 6 so that it can be used in the compiler itself, but
unless a fully open ended system like L::MT is included I think
having anything at all might be damaging, because this will
encourage people to use the partial solution that is already built
in instead of the complete one on the CPAN (c.f. many core modules).

-- 
  Yuval Kogman [EMAIL PROTECTED]
http://nothingmuch.woobling.org  0xEBD27418



Re: pluralization idea that keeps bugging me

2008-01-26 Thread chromatic
On Saturday 26 January 2008 08:58:43 Larry Wall wrote:

> That would cover most of the cases for English speakers using regular
> nouns, but I wonder whether there's some kind of generalization that
> would help for cases like:
>
>     say "There was/were $o ox/oxen"

That makes me wish for a subjunctive/optative mood marker.  I'm not sure why.

In-language localization and internationalization hooks do seem awfully 
useful, but English-only pluralization rules just might not cut it.

Nearly pain-free l10n and i18n *is* kind of a killer feature though.

-- c


Re: pluralization idea that keeps bugging me

2008-01-26 Thread Darren Duncan

At 8:58 AM -0800 1/26/08, Larry Wall wrote:

> My first thought is that this is such a common idiom that we ought
> to have some syntactic sugar for it:
>
> say "Received $m message\s."


I don't think that a feature like this should be in the core 
language; it is too complicated as well as an open-ended problem.


A better use of this discussion is perhaps to determine whether any 
more basic core features would need updating in order to support a 
separate extension module to more easily provide the feature that was 
being discussed.


-- Darren Duncan


Re: pluralization idea that keeps bugging me

2008-01-26 Thread Patrick R. Michaud
On Sat, Jan 26, 2008 at 08:58:43AM -0800, Larry Wall wrote:
> After a recent exchange on PerlMonks about join, I've been thinking
> about the problem of pluralization in interpolated strings, where we
> get things like:
>
> say "Received $m message{ 1==$m ?? '' !! 's' }."
>
> My first thought is that this is such a common idiom that we ought
> to have some syntactic sugar for it:
>
> say "Received $m message\s."
>
> [...]
>
> Any other cute ideas?

FWIW, this sounds to me a lot like a special quoting operator or
adverbial form.

say qq:pluralized "Received $m message\s."

Pm


Re: pluralization idea that keeps bugging me

2008-01-26 Thread Gianni Ceccarelli
On 2008-01-26 Larry Wall wrote:
> Last night I got a message entitled: "yum: 1 Updates Available."
> [snip a lot]
> I think that probably handles most of the Indo-European cases, and
> anything more complicated can revert to explicit code.  (Or go though
> a localization dictionary...)

Please don't put this in the language. The problem is harder than it
seems (there are European languages that pluralize differently on $X %
10, IIRC; 0 is singular or plural depending on the language, etc etc).

Look at the documentation of GNU gettext, or the translation
guidelines for KDE, to get the whole mess.

We already have Locale::Maketext. To get the whole magical
interpolation, we'd just have to define a suitable quoting construct,
right?
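
For readers who have not seen the gettext approach, its plural lookup takes a singular string, a plural string and the count, and the message catalog for the current language decides which form applies. Below is a minimal sketch using Python's standard gettext module (a stand-in here, not Locale::Maketext). NullTranslations is the catalog-less fallback, so this only shows the call shape with the built-in English-style default; the message strings and function name are assumptions.

```python
import gettext

# NullTranslations has no catalog: its ngettext falls back to returning
# the first string when n == 1 and the second string otherwise.
catalog = gettext.NullTranslations()


def updates_available(count: int) -> str:
    """Return a count-aware message via the gettext plural lookup."""
    template = catalog.ngettext(
        "%d update available", "%d updates available", count
    )
    return template % count
```

With a real catalog loaded, the same call could return any of several plural forms, chosen by the rule that the catalog declares for its language.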

I know Perl is not minimal, but sometimes I feel that it will end up
being maximal... and the more you put in the core, the less
flexibility you get in the long term.

-- 
Dakkar - Mobilis in mobile
GPG public key fingerprint = A071 E618 DD2C 5901 9574
 6FE2 40EA 9883 7519 3F88
key id = 0x75193F88

printk("%s: Boo!\n", dev->name);
linux-2.6.19/drivers/net/depca.c




Re: pluralization idea that keeps bugging me

2008-01-26 Thread Richard Hainsworth
It's only English-centric if the idea is fixed to plurals, because it's 
only for plurals that English words are mutated by grammar rules.


In other languages, words are mutated by other factors, such as the 
gender of the word, the case, and the number.


The problem can be quite difficult, say in Russian. Suppose you want to 
say something like "Respected <customer name>" and interpolate the customer 
name from a database. In English, it's a doddle. But in Russian, all 
adjectives (e.g. 'respected') have both male and female forms, so the 
gender of the customer has to be determined in order to interpolate correctly.


And for plurals, some languages have different words for single, double 
and many forms. In Russian, the noun after the number has one form for 1 
(nominative singular), another form (genitive singular) for numbers 2 to 4, 
and then a third form (genitive plural) for 5 and above. So, a simple plural 
hook is insufficient.


Then take Welsh: its words mutate with prefixes as well as suffixes, 
depending on context.


Whilst it would be nice for there to be a neat syntax for such things 
(thus avoiding English-centricity), the complexities of all languages 
might be too burdensome for core perl6.



  


Re: pluralization idea that keeps bugging me

2008-01-26 Thread Jonathan Lang
Gianni Ceccarelli wrote:
> Please don't put this in the language. The problem is harder than it
> seems (there are European languages that pluralize differently on $X %
> 10, IIRC; 0 is singular or plural depending on the language, etc etc).

-snip-

> I know Perl is not minimal, but sometimes I feel that it will end up
> being maximal... and the more you put in the core, the less
> flexibility you get in the long term.

This _does_ appear to be something more suitable for a Locale::
module.  I just wonder if there are enough hooks in the core to allow
for an appropriately brief syntax to be introduced in a module: can
one roll one's own string interpolations as things stand?  E.g., is
there a way to add meaning to backslashed characters in a string that
would normally lack meaning?

Do we have the tools to build "$m tool\s"?

-- 
Jonathan Dataweaver Lang


Re: pluralization idea that keeps bugging me

2008-01-26 Thread jesse



On Sat, Jan 26, 2008 at 08:58:43AM -0800, Larry Wall wrote:
> Last night I got a message entitled: "yum: 1 Updates Available."
> Of course, that's probably just a Python programmer giving up on doing
> the right thing, but we see this sort of bletcherousness all the time.
>
> Any other cute ideas?
 

It's worth reading the perldoc for Locale::Maketext and
Locale::Maketext::TPJ13.  Sean Burke did some truly excellent work
explaining a lot of the pitfalls here. Sean built us the only solution I've
yet seen that gets pluralization reasonably ok in languages with
non-English-like pluralization rules without making me want to just give
up and write "Updates found: 1" ;)

-j


Re: pluralization idea that keeps bugging me

2008-01-26 Thread Jonathan Lang
Yuval Kogman wrote:
> You can subclass the grammar and change everything.
>
> Theoretically that's a yes =)

Right.  One last question: is this (i.e., extending a string's
grammar) a "keep simple things simple" thing, or a "keep difficult
things doable" thing?

-- 
Jonathan Dataweaver Lang


Re: pluralization idea that keeps bugging me

2008-01-26 Thread Yuval Kogman
On Sat, Jan 26, 2008 at 18:43:50 -0800, Jonathan Lang wrote:

> Right.  One last question: is this (i.e., extending a string's
> grammar) a "keep simple things simple" thing, or a "keep difficult
> things doable" thing?

I'm going to guess somewhere in between.

It should be about the same level of complexity as Filter::Simple,
except with much finer control and more correctness.

I'm not the best person to answer this though.

-- 
  Yuval Kogman [EMAIL PROTECTED]
http://nothingmuch.woobling.org  0xEBD27418





Re: pluralization idea that keeps bugging me

2008-01-26 Thread Yuval Kogman
On Sat, Jan 26, 2008 at 18:12:17 -0800, Jonathan Lang wrote:

> This _does_ appear to be something more suitable for a Locale::
> module.  I just wonder if there are enough hooks in the core to allow
> for an appropriately brief syntax to be introduced in a module: can
> one roll one's own string interpolations as things stand?  E.g., is
> there a way to add meaning to backslashed characters in a string that
> would normally lack meaning?

You can subclass the grammar and change everything.

Theoretically that's a yes =)

-- 
  Yuval Kogman [EMAIL PROTECTED]
http://nothingmuch.woobling.org  0xEBD27418


