Re: Possible contrib.humanize addition

2010-01-06 Thread SmileyChris


On Jan 5, 9:24 pm, harrym  wrote:
> I'm working on a templatetag that determines whether to use 'a' or 'an'
> in front of English words. My particular use case for this is in a
> tumblelog app I'm developing - many different types of entry may be
> added (link, html, quote, etc), and I'm linking to the 'Add a[n]
>  entry' pages by iterating over the different types. Would this
> be considered a useful addition to contrib.humanize?
>
> The two main reasons against it I see are that firstly, it only works
> for English words, so would be of little use to developers using
> foreign languages, and secondly, it perhaps wouldn't be as widely used
> as the other filters in there.
>
Here's a snippet I wrote a while back you may want to check out too:
www.djangosnippets.org/snippets/1519/
-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-develop...@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.




Re: Possible contrib.humanize addition

2010-01-06 Thread Chuck Harmston
More of an academic question, as it likely isn't a feasible solution for
Django, but might a soundex solve this problem? Best I can tell, rules for
articles, without exception, are based on the pronunciation of the following
word.

Of course, phonology can be regional, subjective, and unpredictable. "Wind"
(the flow of gases) and "wind" (circular weaving) are identical to a
template tag but have different vowel sounds. The "a" sound in "bag" is
pronounced much differently in northern Minnesota (where it's bay-g) than
it is in Baltimore.
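
To make the pronunciation idea concrete, here is a minimal Python sketch
that picks the article from the first sound of a word rather than its
first letter. The pronunciation table and the names PRONUNCIATIONS,
VOWEL_PHONEMES and article_by_sound are hand-written assumptions for
illustration, not data from a real lexicon, and a single entry per
spelling cannot separate homographs such as the two "wind"s.

```python
# Hand-written, illustrative pronunciations in an ARPAbet-like notation.
# This table is an assumption for the example, not data from a real lexicon.
PRONUNCIATIONS = {
    "honest": ["AA", "N", "AH", "S", "T"],
    "history": ["HH", "IH", "S", "T", "ER", "IY"],
    "hour": ["AW", "ER"],
    "university": ["Y", "UW", "N", "AH", "V", "ER", "S", "AH", "T", "IY"],
    "umbrella": ["AH", "M", "B", "R", "EH", "L", "AH"],
}

# Phoneme symbols treated as vowel sounds in this sketch.
VOWEL_PHONEMES = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def article_by_sound(word, pronunciations=PRONUNCIATIONS):
    """Return 'a' or 'an' from the first phoneme, or None if the word is unknown."""
    phonemes = pronunciations.get(word.lower())
    if not phonemes:
        return None
    return "an" if phonemes[0] in VOWEL_PHONEMES else "a"
```

Words missing from the table return None, so a caller would still need
a spelling-based fallback, and regional variation would mean one table
per accent.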

This feels unsolvable.


On Wed, Jan 6, 2010 at 9:56 AM, Hanne Moa  wrote:

> 2010/1/6 sago :
> >> If you present some research to
> >> demonstrate how this tag could/would work for non-English languages,
> >> it would be a lot more compelling.
> >
> > That's not going to work, in any meaningful sense. That peculiarity of
> > the article is highly English-specific. The generalization would
> > surely be something like
> >
> > {% if /some-regex/.matches(word) %}{{ form1 }} {{ word }}{% else %}
> > {{ form2 }} {{ word }}{% endif %}
>
> Disclaimer: I have a master's degree in Computational Linguistics. This
> is a simplified account of "last year of bachelor" stuff:
>
> Human language cannot (mathematically proven) be modelled by a mere
> regexp, as human language is not only context-free (needing a full
> parser) but context-sensitive (needing parsers we don't really have
> yet). Nice, yes?
>
> It cannot go in humanize but it could go in localflavor for English.
> It would need a stemmer and a replaceable wordlist
> though, as what words get "an" and what get "a" not only depends on
> country but also on specific publishing styles - and all of this has a
> tendency to change over time.
>
>
> HM



Re: Possible contrib.humanize addition

2010-01-06 Thread Hanne Moa
2010/1/6 sago :
>> If you present some research to
>> demonstrate how this tag could/would work for non-English languages,
>> it would be a lot more compelling.
>
> That's not going to work, in any meaningful sense. That peculiarity of
> the article is highly English-specific. The generalization would
> surely be something like
>
> {% if /some-regex/.matches(word) %}{{ form1 }} {{ word }}{% else %}
> {{ form2 }} {{ word }}{% endif %}

Disclaimer: I have a master's degree in Computational Linguistics. This
is a simplified account of "last year of bachelor" stuff:

Human language cannot (mathematically proven) be modelled by a mere
regexp, as human language is not only context-free (needing a full
parser) but context-sensitive (needing parsers we don't really have
yet). Nice, yes?

It cannot go in humanize but it could go in localflavor for English.
It would need a stemmer and a replaceable wordlist
though, as what words get "an" and what get "a" not only depends on
country but also on specific publishing styles - and all of this has a
tendency to change over time.
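
As a rough illustration of the replaceable wordlist, the Python sketch
below keeps one exception list per variety of English and falls back to
spelling only when a word is not listed. The names EXCEPTIONS and
article_for, the variety codes and the list entries are all assumptions
made up for this example, and the stemmer is left out entirely.

```python
# Hypothetical per-variety exception lists. The entries are illustrative
# assumptions; a real list would be maintained and replaced per style guide.
EXCEPTIONS = {
    "en-us": {"herb": "an", "hour": "an", "honest": "an", "university": "a"},
    "en-gb": {"herb": "a", "hour": "an", "honest": "an", "university": "a"},
}

VOWEL_LETTERS = tuple("aeiou")


def article_for(word, variety="en-us", exceptions=EXCEPTIONS):
    """Use the variety's wordlist if the word is listed, else fall back to spelling."""
    key = word.lower()
    listed = exceptions.get(variety, {})
    if key in listed:
        return listed[key]
    return "an" if key.startswith(VOWEL_LETTERS) else "a"
```

Passing a different exceptions mapping is how a publisher's house style
(say, "an historical") could be swapped in without touching the code.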


HM




Re: Possible contrib.humanize addition

2010-01-06 Thread harrym
The code I've got so far works pretty well - I've tested it on some
medium-sized corpora and the only times the expected result
differed from the actual result were when the corpus was wrong. The
code works by first checking a few specific rules for numbers and
acronyms, then checking against a few exceptional cases (word
prefixes), then checking whether the word starts with a vowel. Most of
the rules came from some Perl code I found a while ago; I just ported
them over to Python.
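
The three stages described above could look roughly like the sketch
below. This is not the code from this post or the Perl rules it was
ported from: the function name a_or_an and every rule list here are
assumptions chosen to keep the example short.

```python
import re

# Letters whose spoken names start with a vowel sound ("an F", "an M").
VOWEL_SOUND_LETTERS = set("AEFHILMNORSX")

# Illustrative prefix exceptions only; real lists would be much longer,
# and "uni" wrongly catches words such as "unimportant".
AN_PREFIXES = ("hour", "honest", "honor", "honour", "heir")
A_PREFIXES = ("uni", "use", "eu")


def a_or_an(word):
    """Guess the indefinite article for an English word (sketch only)."""
    if not word:
        return "a"

    # 1a. Numbers: "an 8", "an 11", "an 18", but "a 1", "a 100".
    match = re.match(r"[0-9]+", word)
    if match:
        digits = match.group()
        return "an" if digits[0] == "8" or digits in ("11", "18") else "a"

    # 1b. All-caps acronyms are assumed to be spelled out letter by letter.
    if len(word) > 1 and word.isupper() and word.isalpha():
        return "an" if word[0] in VOWEL_SOUND_LETTERS else "a"

    lower = word.lower()

    # 2. Exceptional prefixes where spelling and sound disagree.
    if lower.startswith(AN_PREFIXES):
        return "an"
    if lower.startswith(A_PREFIXES):
        return "a"

    # 3. Default: a leading vowel letter takes "an".
    return "an" if lower[0] in "aeiou" else "a"
```

For example, a_or_an("link") gives "a" and a_or_an("HTML") gives "an",
while the lowercase "html" falls through to stage 3 and gives "a".
Acronyms that are read as words are also misjudged by stage 1b. In a
Django project a function like this could be registered as a template
filter on a django.template.Library instance.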

But I agree that this would be far too difficult (if not impossible) to
make multi-lingual so is perhaps not appropriate for inclusion in
Django.

Harry

On Jan 6, 2:17 pm, sago  wrote:
> > Hmm, can it handle the following?
>
> >  an honest man
> >  a history book
> >  an historical book (debatable)
>
> It can't, the rules for the indefinite article around 'h' are complex
> and depend on the etymology of the word used. To add complexity the
> lexicographic rules are often different to the rules for speech, and
> UK rules differ from US rules (and possibly Oz too, but I don't
> know).
>
> > If you present some research to
> > demonstrate how this tag could/would work for non-English languages,
> > it would be a lot more compelling.
>
> That's not going to work, in any meaningful sense. That peculiarity of
> the article is highly English-specific. The generalization would
> surely be something like
>
> {% if /some-regex/.matches(word) %}{{ form1 }} {{ word }}{% else %}
> {{ form2 }} {{ word }}{% endif %}
>
> where the regex is language and context dependent. There are various
> regex replacement filters/tags out in the djangosphere. Could you use
> one of them?
>
> > (That's NT Koine Greek, it might be different/simpler/more complicated
> > in modern Greek).
>
> What is it about Django and NT scholars - have you come across James
> Tauber (of Pinax fame?)
>
> Ian.




Re: Possible contrib.humanize addition

2010-01-06 Thread James Bennett
On Wed, Jan 6, 2010 at 8:17 AM, sago  wrote:
> What is it about Django and NT scholars - have you come across James
> Tauber (of Pinax fame?)

There are at least three Django committers who can list one or another
ancient Greek dialect among their studies. Not sure why that is, but
it does make for fun conversation over drinks.


-- 
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."




Re: Possible contrib.humanize addition

2010-01-06 Thread sago
> Hmm, can it handle the following?
>
>  an honest man
>  a history book
>  an historical book (debatable)

It can't, the rules for the indefinite article around 'h' are complex
and depend on the etymology of the word used. To add complexity the
lexicographic rules are often different to the rules for speech, and
UK rules differ from US rules (and possibly Oz too, but I don't
know).

> If you present some research to
> demonstrate how this tag could/would work for non-English languages,
> it would be a lot more compelling.

That's not going to work, in any meaningful sense. That peculiarity of
the article is highly English-specific. The generalization would
surely be something like

{% if /some-regex/.matches(word) %}{{ form1 }} {{ word }}{% else %}
{{ form2 }} {{ word }}{% endif %}

where the regex is language and context dependent. There are various
regex replacement filters/tags out in the djangosphere. Could you use
one of them?
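
Outside the template language, that generalization might be sketched in
Python as below. The RULES table, the with_form name and the single
naive English pattern are assumptions for illustration; they only show
the shape of "pick form1 or form2 by regex", not a correct rule set.

```python
import re

# Hypothetical per-language rules: (pattern, form if it matches, form otherwise).
# The English pattern is deliberately naive and looks at spelling only.
RULES = {
    "en": (re.compile(r"^[aeiou]", re.IGNORECASE), "an", "a"),
}


def with_form(word, lang="en", rules=RULES):
    """Prefix word with form1 if the language's pattern matches, else form2."""
    pattern, form1, form2 = rules[lang]
    form = form1 if pattern.search(word) else form2
    return f"{form} {word}"
```

With these rules with_form("entry") gives "an entry", but
with_form("honest man") gives "a honest man", which is the 'h' problem
again.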

> (That's NT Koine Greek, it might be different/simpler/more complicated
> in modern Greek).
What is it about Django and NT scholars - have you come across James
Tauber (of Pinax fame?)

Ian.




Re: Possible contrib.humanize addition

2010-01-06 Thread Luke Plant
On Tuesday 05 January 2010 21:24:13 harrym wrote:
> I'm working on a templatetag that determines whether to use 'a' or
>  'an' in front of English words. My particular use case for this is
>  in a tumblelog app I'm developing - many different types of entry
>  may be added (link, html, quote, etc), and I'm linking to the 'Add
>  a[n]  entry' pages by iterating over the different types.
>  Would this be considered a useful addition to contrib.humanize?

Hmm, can it handle the following?

 an honest man
 a history book
 an historical book (debatable)

My gut instinct is that it's not possible to work this out 
programmatically.  When it comes to other languages, I imagine it's 
going to be even harder (if it's possible to get harder than 
'impossible'), because you have things like gender and case to worry 
about, which certainly cannot be worked out by an algorithm.

To give some examples, in French, the choice is between 'un' and 
'une', depending on whether the word is masculine or feminine.  In 
Greek, the choice is between εἱς, ἑνα, ἑνος, ἑνι, μια, μιαν, 
μιας, μια, ἑν, ἑν, ἑνος, ἑνι, depending on whether the word is 
masculine, feminine or neuter, and in nominative, accusative, genitive 
or dative case. Although in many cases you would probably omit the 
article altogether - the above words often mean "one" rather than "a".
(That's NT Koine Greek, it might be different/simpler/more complicated 
in modern Greek).

I imagine there are plenty of languages where this gets even worse, 
violating almost every assumption you don't even know you are making 
(like whether the article comes before or after or in the middle, or 
exists at all, etc. etc.)

To summarise: if I were you, I would give up now.

Luke

-- 
"Mediocrity: It takes a lot less time, and most people don't 
realise until it's too late." (despair.com)

Luke Plant || http://lukeplant.me.uk/




Re: Possible contrib.humanize addition

2010-01-05 Thread harrym
Thanks for your reply - I'll look into how this would work with
other languages and get back to you if it looks like it could be
done easily.

Regards,
Harry

On Jan 6, 3:45 am, Russell Keith-Magee  wrote:
> On Wed, Jan 6, 2010 at 5:24 AM, harrym  wrote:
> > I'm working on a templatetag that determines whether to use 'a' or 'an'
> > in front of English words. My particular use case for this is in a
> > tumblelog app I'm developing - many different types of entry may be
> > added (link, html, quote, etc), and I'm linking to the 'Add a[n]
> >  entry' pages by iterating over the different types. Would this
> > be considered a useful addition to contrib.humanize?
>
> > The two main reasons against it I see are that firstly, it only works
> > for English words, so would be of little use to developers using
> > foreign languages, and secondly, it perhaps wouldn't be as widely used
> > as the other filters in there.
>
> It sounds like a potentially interesting addition to contrib.humanize,
> but you have hit both of the objections that I would raise.
>
> The foreign language limitation is particularly important - if we're
> going to introduce a tag like this, then it should be able to be used
> for languages other than English. If you present some research to
> demonstrate how this tag could/would work for non-English languages,
> it would be a lot more compelling.
>
> Yours,
> Russ Magee %-)




Re: Possible contrib.humanize addition

2010-01-05 Thread Russell Keith-Magee
On Wed, Jan 6, 2010 at 5:24 AM, harrym  wrote:
> I'm working on a templatetag that determines whether to use 'a' or 'an'
> in front of English words. My particular use case for this is in a
> tumblelog app I'm developing - many different types of entry may be
> added (link, html, quote, etc), and I'm linking to the 'Add a[n]
>  entry' pages by iterating over the different types. Would this
> be considered a useful addition to contrib.humanize?
>
> The two main reasons against it I see are that firstly, it only works
> for English words, so would be of little use to developers using
> foreign languages, and secondly, it perhaps wouldn't be as widely used
> as the other filters in there.

It sounds like a potentially interesting addition to contrib.humanize,
but you have hit both of the objections that I would raise.

The foreign language limitation is particularly important - if we're
going to introduce a tag like this, then it should be able to be used
for languages other than English. If you present some research to
demonstrate how this tag could/would work for non-English languages,
it would be a lot more compelling.

Yours,
Russ Magee %-)