Claudio,

Ah, through multilingual indexing/search work (with 
http://www.sematext.com/products/multilingual-indexer/index.html ) I learned 
that cross-language search often doesn't really make sense, unless the search 
involves "universal terms" (e.g. Fiat, BMW, Mercedes, Olivetti, Tomi de Paola, 
Alberto Tomba...).  If the search involves language-specific terms, then 
searching in the "foreign" language doesn't work so well and doesn't make a 
ton of sense.  Imagine a search for "ciao ragazzi".  I have no idea what the 
Italian stemmer does with that, but say it turns it into "cia raga" (it 
doesn't, but just imagine).  If the same was done with the Italian docs at 
index time, you will find the matching docs.  But what happens if "ciao 
ragazzi" was analyzed by some German analyzer at index time?  Different tokens 
will be created and indexed, so a "ciao ragazzi" search won't work.  And which 
Analyzer would you use to analyze that query anyway?  Italian or German?
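
To make the mismatch concrete, here is a toy Python sketch.  Everything in it 
is an assumption for illustration: the suffix rules are invented so that the 
"Italian" one produces the imaginary "cia raga" from above, the names 
(make_analyzer, build_index, search) are hypothetical, and none of it reflects 
what the real Snowball stemmers or Lucene analyzers do.

```python
def make_analyzer(suffixes):
    """Return a toy analyzer: lowercase, split on whitespace, strip one suffix."""
    def analyze(text):
        tokens = []
        for word in text.lower().split():
            for suffix in suffixes:
                # Strip the first matching suffix, keeping at least 3 characters.
                if word.endswith(suffix) and len(word) > len(suffix) + 2:
                    word = word[: -len(suffix)]
                    break
            tokens.append(word)
        return tokens
    return analyze


# Invented rules, NOT the real Italian/German stemmers.
italian = make_analyzer(("zzi", "o"))
german = make_analyzer(("en", "i"))


def build_index(docs, analyzer):
    """Build a tiny inverted index: token -> set of doc ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyzer(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query, analyzer):
    """Return ids of docs containing every analyzed query token."""
    postings = [index.get(token, set()) for token in analyzer(query)]
    return set.intersection(*postings) if postings else set()


docs = {1: "Ciao ragazzi"}
italian_index = build_index(docs, italian)
german_index = build_index(docs, german)

same_analyzer = search(italian_index, "ciao ragazzi", italian)
query_mismatch = search(italian_index, "ciao ragazzi", german)
index_mismatch = search(german_index, "ciao ragazzi", italian)
print(same_analyzer, query_mismatch, index_mismatch)
```

With these made-up rules only the first search finds the doc.  As soon as the 
index-time and query-time analyzers differ, the tokens no longer line up, 
which is the whole problem in a nutshell.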

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Claudio Martella <claudio.marte...@tis.bz.it>
> To: solr-user@lucene.apache.org
> Sent: Thu, February 11, 2010 3:21:32 AM
> Subject: Re: dismax and multi-language corpus
> 
> I'll try removing the '-'. I do need to search it now. The other option
> would be to ask the user which language to query. But in my region we
> use Italian and German in equal measure, so it would end up
> querying both languages all the time. Or did you mean a more performant
> way of querying both languages all the time? :)
> 
> 
> Otis Gospodnetic wrote:
> > Claudio - fields with '-' in them can be problematic.
> >
> > Side comment: do you really want to search across all languages at once?  
> > If not, maybe 3 different dismax configs would make your searches better.
> >
> >  Otis
> >
> >
> >
> > ----- Original Message ----
> >  
> >> From: Claudio Martella 
> >> To: solr-user@lucene.apache.org
> >> Sent: Wed, February 10, 2010 3:15:40 PM
> >> Subject: dismax and multi-language corpus
> >>
> >> Hello list,
> >>
> >> I have a corpus with 3 languages, so I set up a text content field (with
> >> no stemming) and 3 text-[en|it|de] fields with specific Snowball stemmers.
> >> I copyField the text to my language-aware fields. So, I set up this dismax
> >> searchHandler:
> >>
> >>
> >>
> >>   dismax
> >>   title^1.2 content-en^0.8 content-it^0.8
> >> content-de^0.8
> >>   title^1.2 content-en^0.8 content-it^0.8
> >> content-de^0.8
> >>   title^1.2 content-en^0.8 content-it^0.8
> >> content-de^0.8
> >>   0.1
> >>
> >>
> >>
> >>
> >> but i get this error:
> >>
> >> HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Expected
> >> ',' at position 7 in 'content-en'
> >>
> >> type Status report
> >>
> >> message org.apache.lucene.queryParser.ParseException: Expected ',' at
> >> position 7 in 'content-en'
> >>
> >> description The request sent by the client was syntactically incorrect
> >> (org.apache.lucene.queryParser.ParseException: Expected ',' at position
> >> 7 in 'content-en').
> >>
> >> Any idea?
> >>
> >> TIA
> >>
> >> Claudio
> >>
> >> -- 
> >> Claudio Martella
> >> Digital Technologies
> >> Unit Research & Development - Analyst
> >>
> >> TIS innovation park
> >> Via Siemens 19 | Siemensstr. 19
> >> 39100 Bolzano | 39100 Bozen
> >> Tel. +39 0471 068 123
> >> Fax  +39 0471 068 129
> >> claudio.marte...@tis.bz.it http://www.tis.bz.it
> >>
> >> Short information regarding use of personal data. According to Section 13
> >> of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that
> >> we process your personal data in order to fulfil contractual and fiscal
> >> obligations and also to send you information regarding our services and
> >> events. Your personal data are processed with and without electronic means
> >> and by respecting data subjects' rights, fundamental freedoms and dignity,
> >> particularly with regard to confidentiality, personal identity and the
> >> right to personal data protection. At any time and without formalities you
> >> can write an e-mail to priv...@tis.bz.it in order to object the processing
> >> of your personal data for the purpose of sending advertising materials and
> >> also to exercise the right to access personal data and other rights
> >> referred to in Section 7 of Decree 196/2003. The data controller is TIS
> >> Techno Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find
> >> the complete information on the web site www.tis.bz.it.
> >>    
> >
> >
> >  
> 
> 
> -- 
> Claudio Martella
> Digital Technologies
> Unit Research & Development - Analyst
> 
> TIS innovation park
> Via Siemens 19 | Siemensstr. 19
> 39100 Bolzano | 39100 Bozen
> Tel. +39 0471 068 123
> Fax  +39 0471 068 129
> claudio.marte...@tis.bz.it http://www.tis.bz.it
> 
