I want an update processor that runs Translation Party.

http://translationparty.com/

http://downloadsquad.switched.com/2009/08/14/translation-party-achieves-hilarious-results-using-google-transl/

----- Original Message -----
| From: "SUJIT PAL" <sujit....@comcast.net>
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 10, 2012 2:51:37 PM
| Subject: Re: Query foreign language "synonyms" / words of equivalent meaning?
| 
| Hi,
| 
| We are using Google Translate to do something like what you
| (onlinespending) want to do, so maybe this will help.
| 
| During indexing, we store the searchable fields from documents into
| fields suffixed _en, _fr, _es, etc. So assuming we capture title and
| body from each document, the fields are (title_en, body_en),
| (title_fr, body_fr), etc., each with its own analyzer chain. These
| documents come from a controlled source (i.e. not the web), so we know
| the language they are authored in.
| 
| During searching, a custom component intercepts the client language
| and the query. The query is sent to Google Translate for language
| detection. Most of the docs in the corpus are in English, so if the
| detected language is either English or the client language, we call
| Google Translate again to translate the query into the other (English
| or client) language. Another custom component constructs an OR query
| across the two languages: one clause is aimed at the _en field set
| and the other at the _xx (client language) field set.
| 
| -sujit
| 
| On Oct 9, 2012, at 11:24 PM, Bernd Fehling wrote:
| 
| > 
| > As far as I know, there is no built-in functionality for language
| > translation.
| > I would suggest writing one, but there are many pitfalls.
| > If you want to translate from one language to another, you might
| > have to know the "starting" language. Otherwise you get problems
| > with translation.
| > 
| > Not (german) -> distress (english), affliction (english)
| > 
| > - you might have words in one language that are stopwords in
| >   another language (e.g. "not")
| > - you don't have a one-to-one mapping, it's more like "1 to n+x"
| >   toilette (french) -> bathroom, rest room / restroom, powder room
| > 
| > These are just two points that come to mind, but there are tons
| > of pitfalls.
| > 
| > We use a multilingual thesaurus as a synonym dictionary.
| > http://en.wikipedia.org/wiki/Eurovoc
| > It holds translations in 22 official languages of the European
| > Union.
| > 
| > So a search for "europäischer währungsfonds" also gives results
| > with "european monetary fund", "fonds monétaire européen", ...
| > 
| > Regards
| > Bernd
| > 
| > 
| > 
| > On 10.10.2012 04:54, onlinespend...@gmail.com wrote:
| >> Hi,
| >> 
| >> English is going to be the predominant language used in my
| >> documents, but there may be a smattering of words in other
| >> languages (such as Spanish or French). What I'd like is to
| >> initiate a query for something like "bathroom", for example, and
| >> for Solr to return documents that contain not only "bathroom" but
| >> also "baño" (Spanish). The same goes when searching for "baño":
| >> I'd like Solr to return documents that contain either "bathroom"
| >> or "baño".
| >> 
| >> One possibility is to pre-translate all indexed documents to a
| >> common language, in this case English. And if someone were to
| >> search using a foreign word, I'd need to translate that to English
| >> before issuing a query to Solr. This appears to be problematic,
| >> since I'd have to know whether the indexed words and the query are
| >> even in a foreign language, which is not trivial.
| >> 
| >> Another possibility is to pre-build a list of foreign word
| >> synonyms. So baño would be listed as a synonym for bathroom. But
| >> I'd need to include other languages (such as toilette in French)
| >> and other words. This requires that I know in advance all possible
| >> words I'd need to include foreign language versions of (not to
| >> mention needing to know which languages to include). This isn't
| >> trivial either.
| >> 
| >> I'm assuming there's no built-in functionality that supports
| >> foreign language translation on the fly, so what do people
| >> propose?
| >> 
| >> Thanks!
| >> 
| > 
| > --
| > *************************************************************
| > Bernd Fehling                Universitätsbibliothek Bielefeld
| > Dipl.-Inform. (FH)            LibTec - Bibliothekstechnologie
| > Universitätsstr. 25                     und Wissensmanagement
| > 33615 Bielefeld
| > Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de
| > 
| > BASE - Bielefeld Academic Search Engine - www.base-search.net
| > *************************************************************
| 
| 
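For anyone wanting to try the query-time approach Sujit describes above, here is a rough sketch of the "OR query across two field sets" step. It is not his code. The function names, the `translate` callable (a stand-in for whatever translation service you call), and the fallback for a query detected in some third language are all assumptions of mine; the thread does not say what happens in that case.

```python
from typing import Callable

SEARCH_FIELDS = ("title", "body")  # the two fields mentioned in the thread


def escape_phrase(text: str) -> str:
    """Escape backslashes and double quotes for use inside a quoted phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def field_set_clause(query: str, lang: str) -> str:
    """Build e.g. (title_en:"bathroom" OR body_en:"bathroom")."""
    phrase = escape_phrase(query)
    parts = [f'{field}_{lang}:"{phrase}"' for field in SEARCH_FIELDS]
    return "(" + " OR ".join(parts) + ")"


def build_bilingual_query(
    query: str,
    detected_lang: str,
    client_lang: str,
    translate: Callable[[str, str, str], str],  # hypothetical: (text, src, dst)
    base_lang: str = "en",
) -> str:
    """OR the query against the base-language and client-language field sets."""
    if client_lang == base_lang or detected_lang not in (base_lang, client_lang):
        # Assumption: with nothing sensible to translate into, search only
        # the base-language field set with the query as typed.
        return field_set_clause(query, base_lang)
    other_lang = client_lang if detected_lang == base_lang else base_lang
    translated = translate(query, detected_lang, other_lang)
    return (
        field_set_clause(query, detected_lang)
        + " OR "
        + field_set_clause(translated, other_lang)
    )


def fake_translate(text: str, source: str, target: str) -> str:
    """Toy lookup table standing in for a real translation call."""
    table = {("bathroom", "en", "es"): "baño", ("baño", "es", "en"): "bathroom"}
    return table.get((text, source, target), text)


if __name__ == "__main__":
    print(build_bilingual_query("bathroom", "en", "es", fake_translate))
```

The translator is passed in as a plain callable so the query construction can be tested without hitting a remote service. Language detection is left to the caller for the same reason.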

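Bernd's thesaurus-as-synonym-dictionary idea can be sketched in a similar way. The snippet below flattens multilingual concepts into comma-separated lines of equivalent terms, which is the layout Solr's synonyms file uses. The `Concept` shape (language code mapped to a list of labels) is a hypothetical stand-in, not Eurovoc's actual data format, and skipping labels that contain a comma is a simplification of mine.

```python
from typing import Dict, Iterable, List

# Hypothetical shape for one thesaurus concept: language code -> labels.
Concept = Dict[str, List[str]]


def collect_terms(concept: Concept) -> List[str]:
    """Gather the distinct labels of one concept, ordered by language code."""
    seen = set()
    terms = []
    for lang in sorted(concept):
        for label in concept[lang]:
            term = label.strip()
            # Simplification: skip empty labels and labels containing a
            # comma, since the comma separates terms on a synonym line.
            if not term or "," in term or term in seen:
                continue
            seen.add(term)
            terms.append(term)
    return terms


def synonym_lines(concepts: Iterable[Concept]) -> List[str]:
    """Emit one comma-separated line per concept with at least two terms."""
    lines = []
    for concept in concepts:
        terms = collect_terms(concept)
        if len(terms) >= 2:
            lines.append(", ".join(terms))
    return lines


# Example concepts taken from terms mentioned in this thread.
EXAMPLE_CONCEPTS: List[Concept] = [
    {
        "de": ["europäischer währungsfonds"],
        "en": ["european monetary fund"],
        "fr": ["fonds monétaire européen"],
    },
    {"en": ["bathroom", "restroom"], "es": ["baño"], "fr": ["toilette"]},
]

if __name__ == "__main__":
    for line in synonym_lines(EXAMPLE_CONCEPTS):
        print(line)
```

Multi-word entries and ambiguous words such as "Not" are exactly where the pitfalls Bernd lists show up, so treat the generated file as a starting point to review rather than something to load blindly.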