Hi Ilia,

When writing *Solr in Action*, I implemented a feature which can do what
you're asking (allow multiple, dynamic analyzers to be used in a single
text field). This would allow you to use the same field and dynamically
change the analyzers (for example, you could do language-identification on
documents and stem only for the identified languages). It also supports more
than one Analyzer per field (i.e. if you have single documents or queries
containing multiple languages).
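
As a rough illustration of the idea only (this is not the code from
*Solr in Action* and not a Solr API), the sketch below picks the analyzers
for one field at call time and stacks their outputs at each token position.
The toy stemmers, the ANALYZERS table and the analyze() function are
hypothetical names invented for this example; a real setup would use
Lucene analyzers instead.

```python
import re

# Hypothetical toy stemmers; stand-ins for real language-specific analyzers.
def stem_english(token):
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def stem_spanish(token):
    for suffix in ("ando", "iendo", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

# Assumed lookup table: language code -> analysis function.
ANALYZERS = {"en": stem_english, "es": stem_spanish}

def analyze(text, languages):
    """Return, for each token position, the sorted terms produced by the
    analyzers of the requested languages (the raw token if none apply)."""
    tokens = re.findall(r"\w+", text.lower())
    stemmers = [ANALYZERS[lang] for lang in languages if lang in ANALYZERS]
    positions = []
    for token in tokens:
        # Several languages may yield several terms for the same position.
        terms = {stem(token) for stem in stemmers} or {token}
        positions.append(sorted(terms))
    return positions

if __name__ == "__main__":
    print(analyze("searching documents", ["en"]))
    print(analyze("hablando", ["en", "es"]))
```

The language list could come from a language-identification step at index
time or from the user at query time; either way the field itself stays the
same and only the set of analyzers changes per document or query.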

This seems to be a feature request which comes up regularly, so I just
submitted a new feature request on JIRA to add this feature to Solr and
track the progress:
https://issues.apache.org/jira/browse/SOLR-6492

I included a comment showing how to use the functionality currently
described in *Solr in Action*, but I plan to make it easier to use over the
next 2 months before calling it done. I'm going to be talking about
multilingual search in November at Lucene/Solr Revolution, so I'd ideally
like to finish before then so I can demonstrate it there.

Thanks,

-Trey Grainger
Director of Engineering, Search & Analytics @ CareerBuilder


On Mon, Sep 8, 2014 at 3:31 PM, Jorge Luis Betancourt Gonzalez <
jlbetanco...@uci.cu> wrote:

> In one of his talks, Trey Grainger (author of Solr in Action) touches on
> how CareerBuilder deals with multilingual content using payloads. It's a
> little more work, but I think it would pay off.
>
> On Sep 8, 2014, at 7:58 AM, Jack Krupansky <j...@basetechnology.com>
> wrote:
>
> > You also need to take a stance as to whether you wish to auto-detect the
> > language at query time vs. have a UI selection of language vs. attempt to
> > perform the same query for each available language and then "determine"
> > which has the best "relevancy". The latter two options are very sensitive
> > to short queries. Keep in mind that auto-detection for indexing full
> > documents is a different problem than auto-detection for very short
> > queries.
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: Ilia Sretenskii
> > Sent: Sunday, September 7, 2014 10:33 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: How to implement multilingual word components fields schema?
> >
> > Thank you for the replies, guys!
> >
> > Using a field-per-language approach for multilingual content is the last
> > thing I would try since my actual task is to implement search
> > functionality which would offer roughly the same capabilities for
> > every known world language.
> > The closest references are those popular web search engines, they seem to
> > serve worldwide users with their different languages and even
> > cross-language queries as well.
> > Thus, a field-per-language approach would be a sure waste of storage
> > resources due to the high number of duplicates, since there are over 200
> > known languages.
> > I really would like to keep a single field for cross-language searchable
> > text content, without splitting it into specific language fields or
> > specific language cores.
> >
> > So my current choice will be to stay with just the ICUTokenizer and
> > ICUFoldingFilter as they are, without any language-specific
> > stemmers/lemmatizers for now.
> >
> > Probably I will put stop word filters and stemmers for the most popular
> > languages into the same searchable text field to give it a try and see
> > if they work correctly in a stack.
> > Does stacking language-specific filters work correctly in one field?
> >
> > Further development will most likely involve some advanced custom
> > analyzers like the "SimplePolyGlotStemmingTokenFilter" to utilize the
> > ICU generated ScriptAttribute.
> > http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/100236
> > https://github.com/whateverdood/cross-lingual-search/blob/master/src/main/java/org/apache/lucene/sandbox/analysis/polyglot/SimplePolyGlotStemmingTokenFilter.java
> >
> > So I would like to know more about those "academic papers on this issue
> > of how best to deal with mixed language/mixed script queries and
> > documents".
> > Tom, could you please share them?
>
