Hi Henri. Thanks for your reply. I've just looked at the patch you referred to, but doing this I would lose the out-of-the-box Solr installation... I would have to create my own Solr application responsible for creating the multiple cores, and I would have to change my indexing process into something able to send content to a specific core.
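Just so I'm sure I follow, I imagine each per-language schema.xml would contain something like this (a rough sketch only; the text_en and body names are my invention, and the factory class names should be checked against the Solr version in use):

```xml
<!-- Sketch of one language's schema.xml; a Chinese index would keep the
     same field names but swap in a CJK-aware analyzer chain. -->
<types>
  <fieldType name="text_en" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <field name="body" type="text_en" indexed="true" stored="true"/>
  <field name="language" type="string" indexed="true" stored="true"/>
</fields>
```

Since the field names stay the same in every index, the client could query any of them identically.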
Can't I keep the same index, using one single core and the same field names, with content processed by language-specific components selected by a field/parameter? I will try to sketch what I'm thinking; please forgive me if I'm not using the correct terms, but I'm not an IR expert. Thinking of it as a workflow:

Indexing:
  The multilanguage indexer receives some documents.
  For each document, it checks the "language" field:
    if language = "English" then process using the EnglishIndexer
    else if language = "Chinese" then process using the ChineseIndexer
    else if ...

Querying:
  The multilanguage request handler receives a request.
  If the language parameter = "English" then process using the English request handler
  else if language = "Chinese" then process using the Chinese request handler
  else if ...

I can see that in the schema field definitions we have some language-dependent parameters... That could be a problem, as I would like to have the same fields for all requests... Sorry to bother you, but before I split all my data this way I would like to be sure that it's the best approach for me.

Regards,
Daniel


On 8/6/07 15:15, "Henrib" <[EMAIL PROTECTED]> wrote:

>
> Hi Daniel,
> If it is functionally 'ok' to search in only one lang at a time, you could
> try having one index per lang. Each per-lang index would have one schema
> where you would describe field types (the lang part coming through
> stemming/snowball analyzers, per-lang stopwords & al) and the same field
> name could be used in each of them.
> You could either deploy that solution through multiple web-apps (one per
> lang) or try the patch for issue Solr-215.
> Regards,
> Henri
>
>
> Daniel Alheiros wrote:
>>
>> Hi,
>>
>> I'm just starting to use Solr and so far it has been a very interesting
>> learning process. I wasn't a Lucene user, so I'm learning a lot about
>> both.
>>
>> My problem is:
>> I have to index and search content in several languages.
>>
>> My scenario is a bit different from others I've already read about in this
>> forum, as my client is the same for searching any language, and that could be
>> accomplished using a field to define the language.
>>
>> My questions are more focused on how to keep the benefits of all the
>> protwords, stopwords and synonyms in a multilanguage situation...
>>
>> Should I create new Analyzers that can deal with the "language" field of
>> the document? What do you recommend?
>>
>> Regards,
>> Daniel
>>
>>
>> http://www.bbc.co.uk/
>> This e-mail (and any attachments) is confidential and may contain personal
>> views which are not the views of the BBC unless specifically stated.
>> If you have received it in error, please delete it from your system.
>> Do not use, copy or disclose the information in any way nor act in
>> reliance on it and notify the sender immediately.
>> Please note that the BBC monitors e-mails sent or received.
>> Further communication will signify your consent to this.
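[For clarity, the dispatch-by-language workflow Daniel describes above could be sketched as follows. The EnglishIndexer/ChineseIndexer names are the placeholders from his message, not real Solr classes; in Solr the equivalent choice is made by the analyzer configured for the field type in schema.xml.]

```python
# Sketch of routing each document to a per-language processor based on
# its "language" field, as described in the workflow above.

LANGUAGE_HANDLERS = {
    "English": "EnglishIndexer",  # placeholder names from the mail
    "Chinese": "ChineseIndexer",
}

def route(doc):
    """Return the per-language processor for a document's "language" field."""
    lang = doc.get("language")
    try:
        return LANGUAGE_HANDLERS[lang]
    except KeyError:
        raise ValueError("no indexer configured for language %r" % lang)

print(route({"language": "English", "body": "some text"}))  # EnglishIndexer
```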