Re: Possible contrib.humanize addition
On Jan 5, 9:24 pm, harrym wrote:
> I'm working on a templatetag that determines whether to use 'a' or 'an' in front of English words. My particular use case for this is in a tumblelog app I'm developing - many different types of entry may be added (link, html, quote, etc), and I'm linking to the 'Add a[n] entry' pages by iterating over the different types. Would this be considered a useful addition to contrib.humanize?
>
> The two main reasons against it I see are that firstly, it only works for English words, so would be of little use to developers using foreign languages, and secondly, it perhaps wouldn't be as widely used as the other filters in there.

Here's a snippet I wrote a while back that you may want to check out too: www.djangosnippets.org/snippets/1519/

-- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: Possible contrib.humanize addition
More of an academic question, as it likely isn't a feasible solution for Django, but might a Soundex algorithm solve this problem? As best I can tell, the rules for articles are, without exception, based on the pronunciation of the following word. Of course, phonology can be regional, subjective, and unpredictable. "Wind" (the flow of gases) and "wind" (circular weaving) are identical to a template tag but have different vowel sounds. The "a" sound in "bag" is pronounced much differently in northern Minnesota (where it's "bay-g") than in Baltimore. This feels unsolvable.

On Wed, Jan 6, 2010 at 9:56 AM, Hanne Moa wrote:
> 2010/1/6 sago:
> >> If you present some research to demonstrate how this tag could/would work for non-English languages, it would be a lot more compelling.
> >
> > That's not going to work, in any meaningful sense. That peculiarity of the article is highly English-specific. The generalization would surely be something like
> >
> > {% if /some-regex/.matches(word) %}{{ form1 }} {{ word }}{% else %} {{ form2 }} {{ word }}{% endif %}
>
> Disclaimer: I have a master's degree in Computational Linguistics. This is a simplified account of "last year of bachelor"-stuff:
>
> Human language cannot (mathematically proven) be modelled by a mere regexp, as human language is not only context-free (needing a full parser) but context-sensitive (needing parsers we don't really have yet). Nice, yes?
>
> It cannot go in humanize, but it could go in localflavor for English. It would need a stemmer and a replaceable wordlist, though, as which words get "an" and which get "a" depends not only on country but also on specific publishing styles - and all of this has a tendency to change over time.
>
> HM
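A Soundex code probably would not help here, and a short implementation suggests why. American Soundex keeps a word's first *letter* exactly as spelled and drops vowels entirely, encoding only the following consonants. So nothing in the code records whether an initial "h" is silent, and nothing captures the vowel sounds mentioned above. The sketch follows the commonly described American Soundex rules. The `soundex` function and `CODES` table are written here for illustration and are not from the thread or from Django.

```python
# American Soundex digit classes; vowels, h, w and y have no code.
CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code for word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    # The first letter is kept as written, whatever it actually sounds like.
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        # Skip uncoded letters and repeats of the previous code.
        if code and code != prev:
            result += code
        # Vowels separate equal codes; h and w do not.
        if c not in "hw":
            prev = code
    # Pad with zeros and truncate to four characters.
    return (result + "000")[:4]


print(soundex("honest"))   # H523
print(soundex("history"))  # H236
```

Both words begin with H. The remaining digits describe later consonants, which have no bearing on whether the word takes "an" or "a".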
Re: Possible contrib.humanize addition
2010/1/6 sago:
>> If you present some research to demonstrate how this tag could/would work for non-English languages, it would be a lot more compelling.
>
> That's not going to work, in any meaningful sense. That peculiarity of the article is highly English-specific. The generalization would surely be something like
>
> {% if /some-regex/.matches(word) %}{{ form1 }} {{ word }}{% else %} {{ form2 }} {{ word }}{% endif %}

Disclaimer: I have a master's degree in Computational Linguistics. This is a simplified account of "last year of bachelor"-stuff:

Human language cannot (mathematically proven) be modelled by a mere regexp, as human language is not only context-free (needing a full parser) but context-sensitive (needing parsers we don't really have yet). Nice, yes?

It cannot go in humanize, but it could go in localflavor for English. It would need a stemmer and a replaceable wordlist, though, as which words get "an" and which get "a" depends not only on country but also on specific publishing styles - and all of this has a tendency to change over time.

HM
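The "replaceable wordlist" idea might look something like the minimal sketch below. Here the caller passes in a style's word list, and swapping the list changes the output for the same word. The `MODERN` and `TRADITIONAL` lists and the `article` helper are hypothetical. The "traditional" entries ("an historical", "an hotel") illustrate older British usage and are not taken from any particular style guide. Matching on a stem prefix is a crude stand-in for the real stemmer Hanne mentions.

```python
# Hypothetical word lists keyed by stem; swap the list to change house style.
# Prefix matching on a stem is a crude stand-in for a real stemmer.
MODERN = {"honest": "an", "hour": "an", "heir": "an"}
# Illustrative older usage: 'an historical', 'an hotel'.
TRADITIONAL = {**MODERN, "historic": "an", "hotel": "an"}


def article(word, wordlist):
    """Pick 'a' or 'an' using a replaceable stem list, else the first letter."""
    lower = word.lower()
    for stem, form in wordlist.items():
        if lower.startswith(stem):
            return form
    # No listed stem matched: fall back to a plain vowel-letter check.
    return "an" if lower and lower[0] in "aeiou" else "a"


print(article("historical", MODERN))       # a
print(article("historical", TRADITIONAL))  # an
print(article("honestly", MODERN))         # an
```

"history" matches no listed stem, so it gets "a" under either list, while "historical" changes with the style.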
Re: Possible contrib.humanize addition
The code I've got so far works pretty well - I've tested it on some medium-sized corpora, and the only times the expected result differed from the actual result were when the corpus was wrong. The code works by first checking a few specific rules for numbers and acronyms, then checking against a few exceptional cases (word prefixes), and finally checking whether the word starts with a vowel. Most of the rules came from some Perl code I found a while ago - I just ported them over to Python. But I agree that this would be far too difficult (or impossible) to make multilingual, so it is perhaps not appropriate for inclusion in Django.

Harry

On Jan 6, 2:17 pm, sago wrote:
> > Hmm, can it handle the following?
> >
> > an honest man
> > a history book
> > an historical book (debatable)
>
> It can't; the rules for the indefinite article around 'h' are complex and depend on the etymology of the word used. To add complexity, the lexicographic rules are often different to the rules for speech, and UK rules differ from US rules (and possibly Oz too, but I don't know).
>
> > If you present some research to demonstrate how this tag could/would work for non-English languages, it would be a lot more compelling.
>
> That's not going to work, in any meaningful sense. That peculiarity of the article is highly English-specific. The generalization would surely be something like
>
> {% if /some-regex/.matches(word) %}{{ form1 }} {{ word }}{% else %} {{ form2 }} {{ word }}{% endif %}
>
> where the regex is language- and context-dependent. There are various regex replacement filters/tags out in the djangosphere. Could you use one of them?
>
> > (That's NT Koine Greek, it might be different/simpler/more complicated in modern Greek).
>
> What is it about Django and NT scholars - have you come across James Tauber (of Pinax fame)?
>
> Ian.
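Harry's description sets out an order: numbers and acronyms first, then exceptional prefixes, then a plain vowel check. That order could be sketched as follows. The thread does not include his code or identify the Perl source, so `indefinite_article` and its rule tables are hypothetical and far from complete. For example, treating every all-caps word as spelled out gives "an HTML" but would also give "an NASA", and the number rule misses readings such as "eighteen hundred".

```python
import re

# Illustrative exception tables (assumptions, far from complete).
# Prefixes where the 'h' is silent, so the word takes 'an'.
AN_PREFIXES = ("hour", "honest", "honor", "honour", "heir")
# Prefixes spelled with a vowel but starting with a 'y' or 'w' sound, so 'a'.
A_PREFIXES = ("eu", "ewe", "once", "unique", "unit", "univers", "use", "usual")
# Capital letters whose spoken names start with a vowel sound (eff, aitch, ell, ...).
VOWEL_SOUND_LETTERS = set("AEFHILMNORSX")


def indefinite_article(word: str) -> str:
    """Return 'a' or 'an' for an English word using ordered heuristic rules."""
    if not word:
        return "a"
    # 1. Numbers: 'an 8', 'an 80', 'an 11', 'an 18000', otherwise 'a'.
    number = re.match(r"[0-9]+", word)
    if number:
        digits = number.group()
        if digits.startswith("8"):
            return "an"
        # 11 and 18 are read as 'eleven'/'eighteen' when the digit count is 2, 5, 8, ...
        if digits[:2] in ("11", "18") and len(digits) % 3 == 2:
            return "an"
        return "a"
    # 2. Acronyms: all-caps words are assumed to be read letter by letter.
    if len(word) > 1 and word.isupper():
        return "an" if word[0] in VOWEL_SOUND_LETTERS else "a"
    lower = word.lower()
    # 3. Exceptional prefixes override the spelling-based check.
    if lower.startswith(AN_PREFIXES):
        return "an"
    if lower.startswith(A_PREFIXES):
        return "a"
    # 4. Fallback: does the word start with a vowel letter?
    return "an" if lower[0] in "aeiou" else "a"
```

In a Django project, a function like this could be registered as a template filter with `register = template.Library()` and the `@register.filter` decorator. The listing could then read `Add {{ entry_type|indefinite_article }} {{ entry_type }} entry`, where `entry_type` is a hypothetical context variable. Note that a lowercase "html" entry type would come out as "a html", because only all-caps words are treated as acronyms.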
Re: Possible contrib.humanize addition
On Wed, Jan 6, 2010 at 8:17 AM, sago wrote:
> What is it about Django and NT scholars - have you come across James Tauber (of Pinax fame)?

There are at least three Django committers who can list one or another ancient Greek dialect among their studies. Not sure why that is, but it does make for fun conversation over drinks.

--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."
Re: Possible contrib.humanize addition
> Hmm, can it handle the following?
>
> an honest man
> a history book
> an historical book (debatable)

It can't; the rules for the indefinite article around 'h' are complex and depend on the etymology of the word used. To add complexity, the lexicographic rules are often different to the rules for speech, and UK rules differ from US rules (and possibly Oz too, but I don't know).

> If you present some research to demonstrate how this tag could/would work for non-English languages, it would be a lot more compelling.

That's not going to work, in any meaningful sense. That peculiarity of the article is highly English-specific. The generalization would surely be something like

{% if /some-regex/.matches(word) %}{{ form1 }} {{ word }}{% else %} {{ form2 }} {{ word }}{% endif %}

where the regex is language- and context-dependent. There are various regex replacement filters/tags out in the djangosphere. Could you use one of them?

> (That's NT Koine Greek, it might be different/simpler/more complicated in modern Greek).

What is it about Django and NT scholars - have you come across James Tauber (of Pinax fame)?

Ian.
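sago's pseudo-template is not valid Django template syntax. Read as plain Python, it amounts to roughly the function below, where the caller supplies the regex and both forms for a given language and context. The name `choose_form` and the `VOWEL_START` pattern are assumptions for illustration. The "honest" case shows why a naive pattern is not enough on its own.

```python
import re


def choose_form(word: str, pattern: str, form1: str, form2: str) -> str:
    """Prefix word with form1 if pattern matches at its start, else form2."""
    article = form1 if re.match(pattern, word) else form2
    return f"{article} {word}"


# A naive English pattern: a vowel letter at the start, case-insensitive.
VOWEL_START = r"(?i)[aeiou]"

print(choose_form("apple", VOWEL_START, "an", "a"))   # an apple
print(choose_form("book", VOWEL_START, "an", "a"))    # a book
print(choose_form("honest", VOWEL_START, "an", "a"))  # a honest (wrong)
```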
Re: Possible contrib.humanize addition
On Tuesday 05 January 2010 21:24:13 harrym wrote:
> I'm working on a templatetag that determines whether to use 'a' or 'an' in front of English words. My particular use case for this is in a tumblelog app I'm developing - many different types of entry may be added (link, html, quote, etc), and I'm linking to the 'Add a[n] entry' pages by iterating over the different types.
> Would this be considered a useful addition to contrib.humanize?

Hmm, can it handle the following?

an honest man
a history book
an historical book (debatable)

My gut instinct is that it's not possible to work this out programmatically. When it comes to other languages, I imagine it's going to be even harder (if it's possible to get harder than 'impossible'), because you have things like gender and case to worry about, which certainly cannot be worked out by an algorithm.

To give some examples, in French, the choice is between 'un' and 'une', depending on whether the word is masculine or feminine. In Greek, the choice is between εἱς, ἑνα, ἑνος, ἑνι, μια, μιαν, μιας, μια, ἑν, ἑν, ἑνος, ἑνι, depending on whether the word is masculine, feminine or neuter, and in nominative, accusative, genitive or dative case. Although in many cases you would probably omit the article altogether - the above words often mean "one" rather than "a". (That's NT Koine Greek, it might be different/simpler/more complicated in modern Greek).

I imagine there are plenty of languages where this gets even worse, violating almost every assumption you don't even know you are making (like whether the article comes before or after or in the middle, or exists at all, etc. etc.)

To summarise: if I were you, I would give up now.

Luke

--
"Mediocrity: It takes a lot less time, and most people don't realise until it's too late." (despair.com)

Luke Plant || http://lukeplant.me.uk/
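The French case can be made concrete. The article follows grammatical gender, so it has to come from data stored with each noun rather than from its spelling. In this sketch the small lexicon and the `french_indefinite` helper are hypothetical. "image" and "texte" share an ending but take different articles, which is the point: no spelling rule could tell them apart.

```python
# Hypothetical lexicon: gender must be recorded per noun; spelling does not reveal it.
FRENCH_GENDER = {"lien": "m", "citation": "f", "image": "f", "texte": "m"}


def french_indefinite(noun: str, lexicon: dict = FRENCH_GENDER) -> str:
    """Return 'un <noun>' or 'une <noun>' using a gender lookup."""
    try:
        gender = lexicon[noun]
    except KeyError:
        raise ValueError(f"no gender recorded for {noun!r}") from None
    return f"{'un' if gender == 'm' else 'une'} {noun}"


print(french_indefinite("image"))  # une image
print(french_indefinite("texte"))  # un texte
```

A Greek version would need case as well as gender for every use, which is information a template filter given only the word does not have.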
Re: Possible contrib.humanize addition
Thanks for your reply - I'll have a look into how this would work with other languages and get back to you if it looks like it would work easily.

Regards,
Harry

On Jan 6, 3:45 am, Russell Keith-Magee wrote:
> On Wed, Jan 6, 2010 at 5:24 AM, harrym wrote:
> > I'm working on a templatetag that determines whether to use 'a' or 'an' in front of English words. My particular use case for this is in a tumblelog app I'm developing - many different types of entry may be added (link, html, quote, etc), and I'm linking to the 'Add a[n] entry' pages by iterating over the different types. Would this be considered a useful addition to contrib.humanize?
> >
> > The two main reasons against it I see are that firstly, it only works for English words, so would be of little use to developers using foreign languages, and secondly, it perhaps wouldn't be as widely used as the other filters in there.
>
> It sounds like a potentially interesting addition to contrib.humanize, but you have hit both of the objections that I would raise.
>
> The foreign language limitation is particularly important - if we're going to introduce a tag like this, then it should be able to be used for languages other than English. If you present some research to demonstrate how this tag could/would work for non-English languages, it would be a lot more compelling.
>
> Yours,
> Russ Magee %-)
Re: Possible contrib.humanize addition
On Wed, Jan 6, 2010 at 5:24 AM, harrym wrote:
> I'm working on a templatetag that determines whether to use 'a' or 'an' in front of English words. My particular use case for this is in a tumblelog app I'm developing - many different types of entry may be added (link, html, quote, etc), and I'm linking to the 'Add a[n] entry' pages by iterating over the different types. Would this be considered a useful addition to contrib.humanize?
>
> The two main reasons against it I see are that firstly, it only works for English words, so would be of little use to developers using foreign languages, and secondly, it perhaps wouldn't be as widely used as the other filters in there.

It sounds like a potentially interesting addition to contrib.humanize, but you have hit both of the objections that I would raise.

The foreign language limitation is particularly important - if we're going to introduce a tag like this, then it should be able to be used for languages other than English. If you present some research to demonstrate how this tag could/would work for non-English languages, it would be a lot more compelling.

Yours,
Russ Magee %-)