I want an update processor that runs Translation Party. http://translationparty.com/
http://downloadsquad.switched.com/2009/08/14/translation-party-achieves-hilarious-results-using-google-transl/

----- Original Message -----
| From: "SUJIT PAL" <sujit....@comcast.net>
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 10, 2012 2:51:37 PM
| Subject: Re: Query foreign language "synonyms" / words of equivalent meaning?
|
| Hi,
|
| We are using Google Translate to do something like what you
| (onlinespending) want to do, so maybe it will help.
|
| During indexing, we store the searchable fields from documents in
| fields suffixed _en, _fr, _es, etc. So assuming we capture title and
| body from each document, the fields are (title_en, body_en),
| (title_fr, body_fr), etc., each with its own analyzer chain. These
| documents come from a controlled source (i.e. not the web), so we
| know the language they are authored in.
|
| During searching, a custom component intercepts the client language
| and the query. The query is sent to Google Translate for language
| detection. Most of the documents in the corpus are in English, so
| if the detected language is either English or the client language,
| we call Google Translate again to translate the query into the
| other language (English or the client language). Another custom
| component constructs an OR query over the two languages: one clause
| targets the _en field set and the other targets the _xx (client
| language) field set.
|
| -sujit
|
| On Oct 9, 2012, at 11:24 PM, Bernd Fehling wrote:
|
| >
| > As far as I know, there is no built-in functionality for language
| > translation.
| > I would suggest writing one, but there are many, many pitfalls.
| > If you want to translate from one language to another, you might
| > have to know the "starting" language. Otherwise you get problems
| > with translation.
| >
| > Not (German) -> distress (English), affliction (English)
| >
| > - you might have words in one language that are stopwords in
| >   another language ("not")
| > - you don't have a one-to-one mapping; it's more like "1 to n+x":
| >   toilette (French) -> bathroom, rest room / restroom, powder room
| >
| > These are just two points that come to mind, but there are tons
| > of pitfalls.
| >
| > We use a multilingual thesaurus as a synonym dictionary.
| > http://en.wikipedia.org/wiki/Eurovoc
| > It holds translations in 22 official languages of the European
| > Union.
| >
| > So a search for "europäischer währungsfonds" also returns results
| > with "european monetary fund", "fonds monétaire européen", ...
| >
| > Regards
| > Bernd
| >
| > On 10.10.2012 04:54, onlinespend...@gmail.com wrote:
| >> Hi,
| >>
| >> English is going to be the predominant language used in my
| >> documents, but there may be a smattering of words in other
| >> languages (such as Spanish or French). What I'd like is to
| >> initiate a query for something like "bathroom", for example, and
| >> for Solr to return documents that contain not only "bathroom" but
| >> also "baño" (Spanish). The same goes when searching for "baño":
| >> I'd like Solr to return documents that contain either "bathroom"
| >> or "baño".
| >>
| >> One possibility is to pre-translate all indexed documents into a
| >> common language, in this case English. And if someone were to
| >> search using a foreign word, I'd need to translate it into English
| >> before issuing a query to Solr. This appears to be problematic,
| >> since I'd have to know whether the indexed words and the query are
| >> even in a foreign language, which is not trivial.
| >>
| >> Another possibility is to pre-build a list of foreign-word
| >> synonyms. So baño would be listed as a synonym for bathroom.
| >> But I'd need to include other languages (such as toilette in
| >> French) and other words. This requires that I know in advance
| >> every word I'd need foreign-language versions of (not to mention
| >> needing to know which languages to include). This isn't trivial
| >> either.
| >>
| >> I'm assuming there's no built-in functionality that supports
| >> foreign-language translation on the fly, so what do people
| >> propose?
| >>
| >> Thanks!
| >>
| >
| > --
| > *************************************************************
| > Bernd Fehling                Universitätsbibliothek Bielefeld
| > Dipl.-Inform. (FH)           LibTec - Bibliothekstechnologie
| > Universitätsstr. 25          und Wissensmanagement
| > 33615 Bielefeld
| > Tel. +49 521 106-4060        bernd.fehling(at)uni-bielefeld.de
| >
| > BASE - Bielefeld Academic Search Engine - www.base-search.net
| > *************************************************************
|
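Sujit's search-time flow (detect the query language, translate the query into the other of English / client language, then OR a clause over the _en field set with a clause over the _xx field set) can be sketched as a plain query-string builder. This is a minimal sketch under stated assumptions, not his custom component: the Translator interface is hypothetical and stands in for the Google Translate calls, the base field names title and body come from his example, and the English-only fallback when the detected language is neither English nor the client language is our own assumption, since the message doesn't say what happens then.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class BilingualQueryBuilder {

    /** Hypothetical stand-in for a translation service such as Google Translate. */
    interface Translator {
        String detectLanguage(String text);

        String translate(String text, String fromLang, String toLang);
    }

    // Base names of the searchable fields; each language has its own copy (title_en, body_en, title_fr, ...).
    static final List<String> BASE_FIELDS = List.of("title", "body");

    // Characters with special meaning in Lucene query syntax (the set Lucene's classic QueryParser.escape() handles).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Backslash-escapes query-syntax characters so user text is treated as plain terms. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds "(title_xx:(q) OR body_xx:(q))" for one language's field set. */
    static String fieldSetClause(String query, String lang) {
        StringJoiner clause = new StringJoiner(" OR ", "(", ")");
        for (String field : BASE_FIELDS) {
            clause.add(field + "_" + lang + ":(" + escape(query) + ")");
        }
        return clause.toString();
    }

    /** ORs a clause on the English field set with a clause on the client-language field set. */
    static String buildQuery(String query, String clientLang, Translator translator) {
        String detected = translator.detectLanguage(query);
        if (clientLang.equals("en") || !(detected.equals("en") || detected.equals(clientLang))) {
            // Assumption: the thread doesn't cover this case, so we search the English fields only.
            return fieldSetClause(query, "en");
        }
        String enQuery = detected.equals("en") ? query : translator.translate(query, clientLang, "en");
        String clientQuery = detected.equals("en") ? translator.translate(query, "en", clientLang) : query;
        return fieldSetClause(enQuery, "en") + " OR " + fieldSetClause(clientQuery, clientLang);
    }

    public static void main(String[] args) {
        // Toy one-word dictionaries standing in for a real translation service (demo only).
        Map<String, String> frToEn = Map.of("toilette", "bathroom");
        Map<String, String> enToFr = Map.of("bathroom", "toilette");
        Translator toy = new Translator() {
            public String detectLanguage(String text) {
                return frToEn.containsKey(text) ? "fr" : "en";
            }

            public String translate(String text, String fromLang, String toLang) {
                Map<String, String> dict = fromLang.equals("fr") ? frToEn : enToFr;
                return dict.getOrDefault(text, text);
            }
        };
        // prints (title_en:(bathroom) OR body_en:(bathroom)) OR (title_fr:(toilette) OR body_fr:(toilette))
        System.out.println(buildQuery("bathroom", "fr", toy));
    }
}
```

Building the query as a string keeps the sketch independent of any Solr version; inside a real search component you would more likely build the equivalent Lucene query objects directly.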
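Bernd's "multilingual thesaurus as synonym dictionary" approach boils down to turning each thesaurus concept into one line of equivalent terms for Solr's synonym filter (synonyms.txt, read by SynonymFilterFactory). A rough sketch, assuming the thesaurus has already been loaded into a map from a concept id to its labels in several languages; the map, the concept ids, and the method name are hypothetical, and no particular Eurovoc file format is assumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class ThesaurusToSynonyms {

    /**
     * Converts a multilingual thesaurus (concept id -> labels in several languages)
     * into comma-separated lines of equivalent terms, the synonyms.txt line format.
     */
    static List<String> toSynonymLines(Map<String, List<String>> thesaurus) {
        List<String> lines = new ArrayList<>();
        for (List<String> labels : thesaurus.values()) {
            // Normalize and de-duplicate while keeping the original label order.
            Set<String> terms = new LinkedHashSet<>();
            for (String label : labels) {
                String term = label.trim().toLowerCase(Locale.ROOT);
                if (!term.isEmpty()) {
                    terms.add(term);
                }
            }
            // A concept with a single distinct label has nothing to be equivalent to.
            if (terms.size() > 1) {
                lines.add(String.join(", ", terms));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical hand-written entry based on Bernd's "toilette" example; not real Eurovoc data.
        Map<String, List<String>> thesaurus = new LinkedHashMap<>();
        thesaurus.put("concept-bathroom",
                List.of("bathroom", "Toilette", "rest room", "restroom", "powder room"));
        // prints: bathroom, toilette, rest room, restroom, powder room
        toSynonymLines(thesaurus).forEach(System.out::println);
    }
}
```

Comma-separated lines make every term equivalent to every other one, which can over-expand the "1 to n+x" cases Bernd mentions; synonyms.txt also accepts one-way mappings such as `toilette => bathroom, restroom` when that is closer to what you want.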