RE: [Mt-list] Cheeseburgery hamburgers & the problem of computerised translations

2009-02-05 Thread Don Osborn
Thanks Mikel. I see our comment got a positive comment too.

Some of this is just a continual need for education. I hope that FT steps up 
and does a solid article about where the technology really is today and where it 
seems to be going.

It has been a while since we've communicated about Apertium etc. In the context 
of Africa I think we're moving towards being able to develop such tools for 
African languages. At the moment efforts are relatively minor. I think I 
related my idea that South Africa would be a great place to develop MT for 
closely related languages - they have the resources, the policy commitment in 
principle to linguistic diversity (not a small matter), and two sets of 
official languages that are closely related. With the need to produce some 
documents in the various official languages, the ability to facilitate 
translation from say Zulu into Xhosa (Nguni languages) or Sotho into Tswana 
could be quite important. However, most of the talent to work on such is 
otherwise occupied with locales, terminology, fonts, keyboards etc in the 
African Network for Localisation (ANLoc) project. In the longer term it will 
get attention...

All the best.


 -Original Message-
 From: Mikel L. Forcada
 Sent: Sunday, February 01, 2009 2:08 AM
 Cc: Don Osborn
 Subject: Re: [Mt-list] Cheeseburgery hamburgers  the problem of
 On Saturday 31 January 2009 15:53:06 Don Osborn wrote:
  FYI, this item on a Financial Times blog may be of interest - another
  article on how inadequate MT is. I posted a comment; others may want
 I did. Thanks for pointing out.
 Mikel Forcada
  Cheeseburgery hamburgers and the problem of computerised translations
  January 26, 2009 by Tony Barber
 r oblem-of-computerised-translations/
 Mikel L. Forcada

Mt-list mailing list

[Mt-list] Cheeseburgery hamburgers & the problem of computerised translations

2009-01-31 Thread Don Osborn
FYI, this item on a Financial Times blog may be of interest - another
article on how inadequate MT is. I posted a comment; others may want to


Cheeseburgery hamburgers and the problem of computerised translations

January 26, 2009 by Tony Barber



[Mt-list] Review of MT articles in MultiLingual

2008-06-30 Thread Don Osborn
I just posted a review of sorts of the several articles on MT in the
April-May edition of MultiLingual at
. Nothing too significant in that - and indeed I realize how much I have to
learn on the topic - but I thought it was worth calling attention to the
topics addressed. Are we at a paradigm shift or watershed point in the
practical use of MT and perceptions of it in the business world (if not yet
the public at large)?


Comments, corrections, etc. are welcome.


Don Osborn



[Mt-list] Dvorak on MT Computing's Final Frontiers

2008-02-27 Thread Don Osborn
Although it's not a technical article, a column in the March edition of PC
Magazine by John Dvorak may be of interest. Among other things he deplores
the state of MT. See:,2704,2256955,00.asp 


FWIW, I put up some comments at in which I
consider some broader issues.


Don Osborn


[Mt-list] Finnish firm develops machine translation technology

2007-05-20 Thread Don Osborn
FYI, just saw this item from Engineering News:

This may not be news to many/most(?) of you. Whenever I see an article in
any field that claims a new generation technology my first question (with
no disrespect intended) is how does it really compare to other new
generation (or old) technologies?

Don Osborn

Hitech Briefs - Karel Smrcka
Finnish firm develops machine translation technology
Published: 18 May 07 - 0:00

Sunda Systems, of Finland, has developed a new-generation
machine-translation technology that can be used to develop efficient machine
translators for any pair of languages you care to name. The Sunda MT
Workbench provides a set of tools for building high-quality machine
translators for even small languages that have been bypassed so far by major

Sunda Systems applies a number of key technologies to guarantee the
suitability of its Sunda MT Workbench for a wide variety of languages and
ensure that it can be used for any pair of languages, and contains all the
tools needed for building a machine translator from scratch.

Dependency theory, for example, is employed to handle sentence structure, as
structures that can be quite different on the surface in different languages
can actually be quite close when projected on this abstract level.

The company has also pioneered the principle of parallel translation. Among
the most important tangible theoretical and practical benefits of this
approach is efficiency as a common processor can translate thousands of
sentences in a second. This principle also keeps linguistic and
computational issues strictly separate, and enables linguists to concentrate
on linguistic issues and see the effects of a linguistic change on the
system in only a few seconds.

The Sunda MT Workbench also includes tools for quality control and teamwork.

A high-quality English-Finnish translator, built using the Sunda MT
Workbench, is already in wide use and has yielded good results.

To minimise the reworking needed for different languages, the Sunda approach
is based on the principle of late commitment, which means that the
processing of source-language sentences is conditioned to a specific target
language only when it is imperative - not before.

This means that a major portion of the source-language processing developed
for one pair of languages can be reused in a translator for another target

In addition to good translation quality, Sunda Systems has also prioritised
the need for its technology and the applications built around it to be
efficient, user-friendly, adaptable and robust.

Sunda's core translation engine can be embedded easily in external systems
using standard programming interfaces and Internet protocols, for example,
and runs seamlessly under most commonly used operating systems.

The engine also has built-in support for common file formats, such as RTM
and HTML - and documents written in these formats retain their formats in

Language-independent end-user applications are already available to
translate home pages in a Web browser, translate formatted documents, and
translate general text content in desktop applications.



[Mt-list] RE: [OT] Terminology relative to NLP for (African) pi-languages and pi-pairs, towards more oral systems

2007-01-16 Thread Don Osborn
Merci Christian, replies to the last part of your message below...

 Last bit of thought: we should be more precise about the term
 For example, Chinese is not 1 language, but several (no oral
 intercomprehension), with their
 dialects: Mandarin, Cantonese, Wu (Shanghainese is a dialect of Wu),
 Fujien, etc. Or Arabic, for that matter. And that is quite important
 for NLP.
 For instance, a morphological analyzer for Literal or Standard Arabic
 is almost useless for Iraqi. The US now have more resources for Iraqi
 than for Standard Arabic.

In the broad sense it may not be possible to have a consistent definition of
language that is applicable to all uses. Ethnologue has a consistent
approach, though very clearly a splitter one. That may be more useful in
some areas (perhaps MT?) than others (like software localization). There is
not AFAIK an index of languageness, that is how independent a language
is, or whether it exists as a variant very close to one or more others,
whether it or another is a standard for a wider range of uses than the other
closely-related tongues, etc. (Incidentally a low languageness index, if
there were such a thing, might indicate potential use of certain kinds of MT
[shallow transfer models, as far as I understand the term] among the
related languages.)

We get into some interesting and complex areas with situations like Chinese
and Arabic as you describe - what do you call a language that is pretty much
the same written, but different languages spoken? Or when there is
coexistence of related standard/written form and colloquial/spoken forms?

 Concerning African languages, I was told the situation is the same,
 with many dialects and sometimes different writing systems
 (missionaries from various countries and confessions created them
 almost independently). For NLP systems to be really useful, then, they
 must be tuned to these variants. Also, we should more and more take
 into account that, although all are technically written, their use is
 mostly oral, and their speakers rarely write or read in them.
 Then, unity provided by a common script is somewhat destroyed: systems
 have to become increasingly directly oral.

I'm actually (re)writing something that touches on these issues. Writing
systems are sometimes multiple for the same tongue, but nowadays that might
be due to differences in country language policies (as pertain to
orthographies within their borders - borders that very often split language
communities); there are also as you mention sometimes legacies of divergent
missionary approaches (an example are the orthographies for Twi Ashanti, Twi
Akuapem, and Fanti in Ghana). The situation, though, is often dynamic and
changing, which is both good and bad news: good in that standardized or
unified forms benefit wider use, but bad for NLP or localization when they
are in flux due to the transition not being complete or completely adopted.

Your mention of possible directly oral approaches is very much on target.
I do, however, see this as a family of technologies including audio-based
applications, speech-text transformation software, and of course
computer translation programs (MT, translation memory). The sum object would
be to make the transition among languages and forms of expression more
seamless. I've had discussions where the notion of written + neo-oral
culture in Africa has been mentioned. That's talking big and vague, but as far
as I see from the technology anyway, there is a lot that can be done in that

The bottom line is how can all these wonderful things ICT can do be made to
accommodate situations where there are many languages, often with oral
traditions, easy codeswitching by speakers, still low literacy/pluriliteracy
rates, and more access to cellphones than computers.

 With that trend, an important question has become how to adapt/reuse
 resources and tools from 1 rho- or mu- language to a variant which
 is still very much pi-!

This is true. Actually I have been looking at this system and others to help
categorize our working list of priority languages (and language
groups/clusters) at (some
experiments offline). The idea being larger strategies for languages in
which one could divide such a priority list into areas for attention and

I also hope that in the case of languages in Africa it will be possible to
develop some novel approaches in developing applications, not only adapting
and reusing what is created elsewhere.

Don Osborn
PanAfrican Localisation project

 Best regards,
 Christian Boitet
 (Pr. Université Joseph Fourier)   Tel: +33 (0)4 76 51 43 55/48 17
 GETA, CLIPS, IMAG-campus, BP53 Fax: +33 (0)4 76 44 66 75/51 44
 385, rue de la Bibliothèque   Mel: [EMAIL PROTECTED]
 38041 Grenoble Cedex 9, France

[Mt-list] Translation industry has vast potential in India

2006-10-14 Thread Don Osborn
This item from the Telugu Portal at may be of
interest. I read it thinking of the potential for machine translation to
facilitate work and various kinds of communication across diverse languages
in various regions of the developing world. (I was alerted to this article
by RSS from Kwintessential Cross Cultural News: )

Don Osborn
PanAfrican Localisation project

Nation: Translation industry has vast potential in India: Pitroda 
Posted by admin on 2006/10/11 10:00:12

New Delhi, Oct 11 (IANS) The translation industry has the potential to
generate more than 500,000 jobs in India, and necessary recommendations
would be made to exploit the potential, said Knowledge Commission Chairman
Sam Pitroda Wednesday.

"We are working towards strengthening the translation industry by opening
state-run training institutions and then open it for the private sector,"
Pitroda said at a discussion organised by the Confederation of Indian
Industry (CII) here.

"The translation industry in India has been neglected so far. India is a
diverse country and we don't understand each other's culture or languages.
Why can't a Bengali work be translated into a Gujarati work?" he queried.

"That's the only way knowledge can be truly imparted."

Pitroda said the entire education system in India needed a complete
overhauling - right from government-run schools to institutions of higher
education - since education was becoming a privilege for the few who could
afford it.

He added that the Knowledge Commission - set up by Prime Minister Manmohan
Singh in 2005 - has given a set of 10 recommendations in this regard to the
government and another set of 10 suggestions would be made in a couple of

"Our recommendations cover areas like increasing the number of universities
to 1,500 from 350 in the next few years. We have also given recommendations
on libraries, affirmative action, language, translations, literacy and
programmes," said Pitroda.

He hoped the recommendations would trigger wide debates in society, and
said: "I want criticism to arise because that is how there will be change in
people's mindsets, which is very important for the country to develop."

Pitroda - who led India's telecommunications revolution of the 1980s and
headed the Technology Mission that covered areas like drinking water and
edible oils - said the government had accepted the commission's paper on

The Chicago-based technocrat-entrepreneur - who is also part of a UN
committee to help push technology across the globe in the 21st century -
said India had a long way to go before it could call itself a superpower.


[OT] Terminology (RE: [Mt-list] NLP for (African) pi-languages, not minority languages)

2006-08-25 Thread Don Osborn
Thanks to all who have responded in this thread. I will follow up offline.

Re the question of terminology, and minority languages in particular, here
are a few quick thoughts (with apologies for taking this off on a tangent):

1. I hadn't thought of minority being offensive, but I guess we need to be
attentive to such matters. The main problem with the term I saw was its
imprecision. There was not long ago a project to compile information on
minority languages. To the surprise of a few people asked about it,
including me, Hausa was one of them (next to Swahili it supposedly has the
highest speakership of all African languages). But when we discussed it
further, the criteria indeed seemed to admit it: In Hausaland across much
of Niger and Nigeria it is the main language, but Hausaphones are minorities
elsewhere and it is spoken as a trade language by some people further away.
However, by extension, then, just about every other language in Africa is
minority as well. What capped it was discovering that Chinese also
qualified as a minority language - which it is in fact in many countries,
though we wouldn't think to call it that, nor Spanish or English, etc. As Francis
puts it, situational minority languages. But that just shows how dependent
the term is on context.

2. So people grope for an appropriate term. For more widely spoken
languages, LWC for language of wider communication emerged at some point
(rather like lingua francas, but let's not try to sort out the difference
between those two here). And at the other extreme there are endangered
languages about which, although definitions can vary, there is a generally
accepted sense of what it means (though even on that I've read references to
Igbo, a language spoken by somewhere on the order of 20 million people,
described as endangered - but let's not delve into the issues there
either). But in between those two what do you say? Small languages as
shorthand for less widely spoken languages are more appropriately spoken
of as the latter - but that's too cumbersome. In Europe there was the term
lesser-used languages but with uncertain implications - fewer people speak
them, or those that do use them less, or both? Local languages is one that
I've tried to avoid lately because it seems to me to be used in a way that
reduces the languages' status, and is applied only in some parts of the world
(and what of local when you have, say, Wolof-speaking merchants in New
York and Paris, for instance?). In Francophone countries the term langue
partenaire has been coined, but that raises questions of what kind of
partnership, and who's partner with whom and why and so on.

3. A lot depends of course on context. Under-resourced languages is very
descriptive for ICT contexts and even some traditional technologies (e.g.,
no textbooks in so many less-widely-spoken-languages for the better part of
the past century - now that's under-resourced). But maybe not in demographic
or sociolinguistic contexts. Just for an example, Fula definitely is
under-resourced in the technical and monetary sense, but definitely not
linguistically (e.g., its lexicon is staggering - there's a large dictionary
of the roots alone). Less commonly taught languages (LCTLs) is purely an
academic reference. Pi-language is a new one on me but seems to be mainly
a technical reference (pi=poorly informatisées or what?).

4. I ran into this problem personally when I wanted a way to refer to a very
wide class of languages not counting the LWCs as LWCs, and came up with an
acronym that I think covers the intended field and is in itself
constructively ambiguous: MINEL - where M is maternal (which is every
language, but here the emphasis is on this role as opposed to the 2nd
language role) or minority (sorry!); I is indigenous (which also can mean
anything, but here meant in the sense of languages of indigenous peoples);
N is national which is an appellation more common in Francophone countries
especially in Africa and is *not* the same as official; E is endangered,
or ethnic which one will hear with regard to languages in some parts of
the world (funny that a language might be referred to as ethnic and not
indigenous or vice versa, but the criteria for the distinction are arguable);
and L could be less-widely-spoken or even local or, well, language.

That about runs the gamut, from what I have. Hope all have a good weekend
(some of you are in the midst of it and others just starting, and some of us
will work through it either way!).

Don Osborn


[Mt-list] Computers & translation in Africa / involving African languages

2006-08-24 Thread Don Osborn
I have updated a very modest presentation of some info relevant to MT in
Africa at . There is not much there, so I
would like to request information/recommendations for other links relating
to MT in Africa and in African languages wherever. (The page also needs some
reworking, but I'm mainly concerned now with content.)


Don Osborn
PanAfrican Localisation project


RE: [Mt-list] Computers & translation in Africa / involving African languages

2006-08-24 Thread Don Osborn
Thanks. It does, but info on Arabic could easily overwhelm a page like that.
Is there a (meta-)page(s) with links to MT projects on Arabic? More ideal to
add such a link. 

-Original Message-
From: Robert Frederking [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 24, 2006 1:04 PM
To: Don Osborn
Subject: Re: [Mt-list] Computers & translation in Africa / involving African

If Arabic counts, there's much work in the US on Arabic these days.

Don Osborn wrote:
 I have updated a very modest presentation of some info relevant to MT 
 in Africa at . There is not much there, 
 so I would like to request information/recommendations for other links 
 relating to MT in Africa and in African languages wherever. (The page 
 also needs some reworking, but I'm mainly concerned now with content.)


 Don Osborn
 PanAfrican Localisation project


[Mt-list] World's first translation software for cell phone launched in Xiamen

2006-07-30 Thread Don Osborn

FYI, in case you haven't already seen this item elsewhere...

World's first translation software for cell phone launched in Xiamen

It has now become a reality that you hear Chinese when the other people speak to
you in English on a mobile phone. This has been achieved by a translation
software developed by Xiamen Talentedsoft Co. Ltd. (Talentedsoft).

With storage of 0.1 million words, the world's first translation software for
cell phone can translate a common sentence from Chinese to English or from
English to Chinese in less than 0.5 second and can also read out the words
after the translation is

Talentedsoft is a company engaged in developing machine translation technology
and voice recognition technology and is co-founded by two professors from
Institute of Artificial Intelligence of Xiamen University and a returned
Chinese intellectual from overseas with a PhD.

By People's Daily Online. Updated: 18:05, July 28, 2006