[Mt-list] [OT] Terminology relative to NLP for (African) pi-languages and pi-pairs, towards more oral systems

2007-01-15 Thread Christian Boitet

Hi, 15/1/07

At 18:47 -0400 25/08/06, Don Osborn wrote:

Thanks to all who have responded in this thread. I will follow up offline.

Re the question of terminology, and minority languages in particular, here
are a few quick thoughts (with apologies for taking this off on a tangent):

1. I hadn't thought of minority being offensive, but I guess we need to be
attentive to such matters. The main problem with the term I saw was its
imprecision. There was not long ago a project to compile information on
minority languages. To the surprise of a few people asked about it,
including me, Hausa was one of them (next to Swahili it supposedly has the
highest speakership of all African languages). But when we discussed it
further, the criteria indeed seemed to admit it: In Hausaland across much
of Niger and Nigeria it is the main language, but Hausaphones are minorities
elsewhere and it is spoken as a trade language by some people further away.
However, by extension, then, just about every other language in Africa is
minority as well. What capped it was discovering that Chinese also
qualified as a minority language - which it is in fact in many countries,
though we wouldn't think to call it, or Spanish or English, etc. As Francis
puts it, situational minority languages. But that just shows how dependent
the term is on context.


yes!


2. So people grope for an appropriate term. For more widely spoken
languages, LWC for language of wider communication emerged at some point
(rather like lingua francas, but let's not try to sort out the difference
between those two here). And at the other extreme there are endangered
languages about which, although definitions can vary, there is a generally
accepted sense of what it means (though even on that I've read references to
Igbo, a language spoken by somewhere on the order of 20 million people
described as endangered - but let's not delve into the issues there
either). But in between those two what do you say? Small languages as
shorthand for less widely spoken languages are more appropriately spoken
of as the latter - but that's too cumbersome. In Europe there was the term
lesser-used languages but with uncertain implications - less people speak
them or those that do use them less or both? Local languages is one that
I've tried to avoid lately because it seems to me to be used in a way that
reduces the languages' status, and is applied only in some parts of the world
(and what of local when you have, say, Wolof-speaking merchants in New
York and Paris, for instance?). In Francophone countries the term langue
partenaire has been coined, but that raises questions of what kind of
partnership, and who's partner with whom and why and so on.

3. A lot depends of course on context. Under-resourced languages is very
descriptive for ICT contexts and even some traditional technologies (e.g.,
no textbooks in so many less-widely-spoken-languages for the better part of
the past century - now that's under-resourced). But maybe not in demographic
or sociolinguistic contexts. Just for an example, Fula definitely is under
resourced in the technical and monetary sense, but definitely not
linguistically (e.g., its lexicon is staggering - there's a large dictionary
of the roots alone). Less commonly taught languages (LCTLs) is purely an
academic reference. Pi-language is a new one on me but seems to be mainly
a technical reference (pi=poorly informatisées or what?).


- pi-language (pi for poorly equipped or having poor [NLP] resources) is a
  term coined by Vincent Berment in his PhD on the computerization of groups
  of pi-languages, 2004, applications to Lao, Khmer, Bengali, Thai...


- under-resourced language was introduced in 
2005 by a native English colleague I can't 
remember the name of at the moment, to translate 
the title and description of a workshop dedicated 
to NLP for pi-languages at TALN-05.




4. I ran into this problem personally when I wanted a way to refer to a very
wide class of languages not counting the LWCs as LWCs, and came up with an
acronym that I think covers the intended field and is in itself
constructively ambiguous: MINEL - where M is maternal (which is every
language, but here the emphasis is on this role as opposed to the 2nd
language role) or minority (sorry!); I is indigenous (which also can mean
anything, but here meant in the sense of languages of indigenous peoples);
N is national which is an appellation more common in Francophone countries
especially in Africa and is *not* the same as official; E is endangered,
or ethnic which one will hear with regard to languages in some parts of
the world (funny that a language might be referred to as ethnic and not
indigenous or vice versa, but the criteria for the distinction are arguable);
and L could be less-widely-spoken or even local or, well, language.

That about runs the gamut, from what I have. Hope all have a good weekend
(some of you are in the midst of it and others just starting, and some of us

RE: [Mt-list] [OT] Terminology relative to NLP for (African) pi-languages and pi-pairs, towards more oral systems

2007-01-15 Thread Harold Somers
 
 - under-resourced language was introduced in
 2005 by a native English colleague I can't remember the name 
 of at the moment, to translate the title and description of a 
 workshop dedicated to NLP for pi-languages at TALN-05.

My feeling is that this term was in use before 2005. For example, Steve
Bird, writing in February 2004, gives here
(http://itre.cis.upenn.edu/~myl/languagelog/archives/000481.html) a nice
list of alternative terms including under-resourced.

I think I coined the term non-indigenous minority language (NIML) in
1997, which enjoyed a brief period of currency, to cover the idea of a
language spoken by a local minority (e.g. Urdu in the UK), but to get
away from the predominant association at that time between the term
minority language, in Europe at least, and languages like Welsh,
Breton and so on.

In the USA, the term low-density language seemed to be prevalent: I
first came across it at the 1998 AMTA conference, in the title of Jones
and Havrilla's paper. In their paper they gloss low density as
languages of low diffusion, world minority languages, i.e. languages
for which major online resources are typically not available. As I
always liked to point out at that time, one of the world's top 3
languages (in terms of numbers of speakers), namely Hindi-Urdu, was
certainly a low-density language in this sense, but not a world minority
language; but at least the question of resources was foregrounded by
this term. I used the term in a 2001 paper. 

The term lesser-studied languages (sic) was current in 2000, when it
was used in the title of a NATO Advanced Study symposium organized by
Kemal Oflazer. This list has already discussed (in Feb 2005) the
appropriateness or otherwise of lesser as opposed to less, so let's
not rerun that one (but as a pernickety native speaker I might point out
that a less-spoken language is spoken by fewer, not less, people!)

 Last bit of thought: we should be more precise about the term 
 language.
 For example, Chinese is not 1 language, but several (no 
 oral intercomprehension), with their
 dialects: Mandarin, Cantonese, Wu (Shanghainese is a dialect 
 of Wu), Fujien, etc. Or Arabic, for that matter. And that is 

Mandarin and Cantonese are mutually non-comprehensible languages, not
dialects. To describe them as dialects of Chinese would be like
describing French and German as dialects of European. 

But it is fairly difficult to be very precise about the term language.
As was famously stated, possibly by Max Weinreich, A language is a
dialect with an army and navy (though see
http://en.wikipedia.org/wiki/Language_is_a_dialect_with_an_army_and_navy
). Although he was referring to Yiddish, Flemish~Dutch is the example
that always springs to my mind.
