Hi Andy,

Thanks for your post to the list.

A few comments:

> I think this is my main concern: SMT is very well (and deservedly so)
> established nowadays as the main way to do MT. Unless you're an MT
> person, you'd think that it was the _only_ way to do MT, as here.

I would say:

1. Can SMT currently be deployed across the full spectrum of real-user 
needs, which represent billions of dollars / euros of communication needs 
today?

2. How many languages is SMT currently valid for?

Well, read my article "What about SMT?" in the IJLD (available on my website 
http://www.geocities.com/jeffallenpubs/) and that should shed some light on the 
topic.

>     1. Can papers on EBMT succeed in getting published (especially in
>        non-expert, i.e. MT-specific, conferences) without making direct
>        comparisons to SMT?

You bet. Look at all the panel talks and user implementation case studies 
concerning Translation Memory (TM) systems that have been presented at 
conferences like ASLIB, LocalizationWorld, LREC, MT confs, and others over the 
past 20 years. There were tons of booths at Localization World Bonn last week 
(probably 30-50). About 350 participants and lots of user groups present.  Many 
TM players were there.  Very few commercial MT companies present, but many 
folks interested in MT in general.   So where is the end-user tendency for MT-
type systems?

SMT is just one type of system. EBMT is a different type. They use different 
methodologies and correspond to different types of needs, at least for the 
moment.  My panel talk at Localization World Bonn provided an outline of the 
types of MT systems and showed how the different features in various commercial 
translation software products correspond to varying types of translation 
approaches and needs.  And I clearly stated that if you purchase a system that 
doesn't match your need, then you don't have the right to complain about it.  

ALLEN, Jeff. 2004. Inbound vs. Outbound Translation. Presentation in 
the "Localization for Customer Support" panel.  LocalizationWorld, Bonn, 
Germany, 29 June - 1 July 2004.
(not available yet online but will be soon).

As for your paper, if you did not say in your submission that you were 
comparing EBMT to SMT, then I see no reason why your submission should be 
rejected for not doing so. I also review a lot of technical conference papers 
as well as language technology articles for MultiLingual.  If something is 
incorrect or invalid to some point, I'll definitely make comments on it, and 
usually back it up with references.  But the "only" way anyone can claim that 
any method is "the" best approach is to prove it from "market-driven" survey 
work among users.  And from the many surveys I have conducted and published 
over the years in the field (again see my website under the Language Resources 
section), SMT is not the one that the majority of end-users have been 
implementing.  This does not downgrade the value of SMT, but rather makes us 
look at it from the point of view of what it is good for, and what it is not yet 
good for.


>        "there is no discussion to how [our approach] would compare
>         with more established techniques such as word-alignment using
>         statistical models. Showing that [our approach] is comparable
>         (or better) than the traditional way of acquiring
>         phrase-alignment [SMT, references excluded here] would make
>         this paper just great".

Read my following recent article (short, 1-2 pages) that reminds us how people 
are usually biased in what they say. Always take a few steps back and look at 
the big picture:

ALLEN, Jeffrey. March 2004. Thinking about machine translation: several 
questions to ask yourself when you read an article about MT technologies. In 
special supplement of Multilingual Computing and Technology magazine, Number 
62, March 2004.
See my website under the MT and MT postediting page (thematic category) or 
under the Multilingual Computing and Technology (publication channel category).


>     3. Has EBMT as a paradigm been 'muscled out' by the more dominant
>        SMT approach?

Who says that SMT is dominant?  

Despite the fact that SMT might be the real cool thing to be doing research 
on (and yes I have done work on it too several years ago at CMU and was part of 
a thesis committee on applying SMT to a KBMT implementation), let's take a step 
back with the real end-user perspective.  

Which engine types are being implemented today as products in real-world 
contexts and are effectively financially meeting the billions of dollars / 
euros of the translation and localization needs in the global market (see all 
the survey results from IDC, Allied Business, Forrester, et al)?

And which of these types of systems are realistically paying all of the 
salaries of the hundreds of MT researchers and implementers across the world 
today?

   * I really only know of 2 commercial companies working at implementing SMT.
   * I know of 2 companies doing KBMT commercial systems, and a few industrial 
projects implementing KBMT custom systems.
   * There are tons of commercial companies doing RBMT systems.
   * Many MT companies are implementing EBMT-like plug-in modules into 
the RBMT systems.
   * There are many Translation Memory (TM) companies whose EBMT-
like tools are what thousands of human translators use on a daily basis for the 
overall translation and localization industry.
   * There are several TM tools that now have MT-approach features.

So, which of these systems types are dominating the global market today?

Then I look back at my own career over the past 10 years and analyze which 
systems types have really actually put food on my family's table:

SMT for a small part of 2 years
KBMT for 2 years
MEMT for 2 years
RBMT for several years
EBMT types for several years


I'm not at all trying to slam SMT, but I want to put into perspective what 
we mean by the "mainstream" and "dominant" approaches.  All human translators I 
know use TM systems, and even TM is not always a productive solution for them.  
It takes a lot of evangelizing to convince the professional translator 
community to use RBMT and KBMT systems.  See the discussion thread section 
on my web site on what I have done over the years to do so with professional 
translators on the LANTRA-L list.  

Yet, where is the majority of money being invested in products, and being spent 
by user groups and institutions?

SMT?
RBMT?
EBMT?
etc


And how pure are the different systems?

Let's recall that Bob Frederking wrote a good post to the MT-List about a 
year ago with regard to the definitions of different types of systems.  I 
really liked his description of these different systems and think he is right 
on with the analogy he provided.
It is an explanation that deserves being reread.


I haven't answered all your questions, and have come back with more questions 
for all of us, yet I myself would have a hard time saying that SMT is "the 
dominant" approach today for real-world communication and translation needs.  

SMT has its place and is providing a lot of interesting results for academic 
research, industrial research, government research, and now some 
product/service offerings. It is more valid for some language directions and 
less for others.  It is a very valuable component when combining it with other 
MT approaches.  Yet calling it the "mainstream" approach honestly seems a bit 
ignorant to me given all that I've shown above. My web site provides lots of 
references to more info and details.

Sorry for my long reply, but I hope it makes us all think a bit about what you 
(Andy) are saying to the community.

Many thanks again for your request for comments. I'll be offline without e-mail 
for a week.

Jeff


Quoting Andy Way <[EMAIL PROTECTED]>:
> I'm going to try very hard not to make this sound like a rant. Rather, I 
> hope the following (probably long-winded) observations may seed an 
> interesting debate as to where we are these days w.r.t. corpus-based MT, 
> and MT in general.
> 
> As many of you know, I submit to and review for many NLP and 
> (especially) MT conferences. In my experience (and I trust this is 
> relatively uncontroversial), the vast majority of MT papers that one 
> sees nowadays are corpus-based. Now, even though I work mostly in the 
> area of EBMT, I think it is again uncontroversial to state that most of 
> the corpus-based MT papers one sees are not EBMT, but rather SMT. Herein
> lies the point I would like to make.
> 
> We submitted a paper recently to a conference (I won't say which, but it 
> wasn't an MT conference per se) which was turned down. The paper 
> received 3 reviews. One comment received was:
> 
>        "the paper ... completely ignores the current mainstream empirical
>        approach to machine translation: phrase-based or template-based
>        statistical machine translation".
> 
> This was true - it did. One useful comment was for us to compare our
> approach with an SMT approach - we're trying this out as we speak, but
> I wonder whether any SMT paper would be asked to compare its findings
> with those of an EBMT approach? Is it the case nowadays that a paper
> on (the admittedly considerably less mainstream) EBMT cannot stand on
> its own merit?
> 
> Nevertheless, EBMT has been chunking sentences into phrases from the
> word go. SMT has recently caught on to this idea, and results have
> improved quite dramatically. Despite this, one other comment was:
> 
>        "there is no discussion to how [our approach] would compare
>         with more established techniques such as word-alignment using
>         statistical models. Showing that [our approach] is comparable
>         (or better) than the traditional way of acquiring
>         phrase-alignment [SMT, references excluded here] would make
>         this paper just great".
> 
> I think this is my main concern: SMT is very well (and deservedly so)
> established nowadays as the main way to do MT. Unless you're an MT
> person, you'd think that it was the _only_ way to do MT, as here.
> 
> We've all received rejections before, and dodgy reviews too. I too 
> reject papers, and I'm sure I've given the odd dodgy review too! I hope
> I'm making it clear that's not my main concern here. Rather, I have
> these questions:
> 
>     1. Can papers on EBMT succeed in getting published (especially in
>        non-expert, i.e. MT-specific, conferences) without making direct
>        comparisons to SMT?
> 
>     2. Can anyone envisage a situation where an SMT paper was asked to
>        compare its results against an MT model?
> 
>     3. Has EBMT as a paradigm been 'muscled out' by the more dominant
>        SMT approach?
> 
>     4. Instead of signalling the 'bright new dawn' for EBMT, will the
>        volume of [Carl & Way, 2003] instead come to be seen as the
>        epitaph for this approach?
> 
> OK, maybe I'm being a bit OTT here, but you get the point. Anyone care
> to indulge me here?
> 
> Cheers,
> Andy.
> 
> 
> _______________________________________________
> MT-List mailing list
> [EMAIL PROTECTED]
> http://www.computing.dcu.ie/mailman/listinfo/mt-list
> 


