Hi Henri. Thanks for your reply. I've just looked at the patch you referred to, but doing this I would lose the out-of-the-box Solr installation... I would have to create my own Solr application responsible for creating the multiple cores, and I would have to change my indexing process into something able to send content to a specific core.
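Just so I'm sure I follow, I imagine each per-language schema.xml would contain something like this (a rough sketch only; the text_en and body names are my invention, and the factory class names should be checked against the Solr version in use):

```xml
<!-- Sketch of one language's schema.xml; a Chinese index would keep the
     same field names but swap in a CJK-aware analyzer chain. -->
<types>
  <fieldType name="text_en" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <field name="body" type="text_en" indexed="true" stored="true"/>
  <field name="language" type="string" indexed="true" stored="true"/>
</fields>
```

Since the field names stay the same in every index, the client could query any of them identically.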
Can't I keep the same index, using one single core and the same field names, with content processed by language-specific components selected by a field/parameter? I will try to sketch what I'm thinking; please forgive me if I'm not using the correct terms, but I'm not an IR expert. Thinking of it as a workflow:

Indexing:
  The multilanguage indexer receives some documents.
  For each document, it checks the "language" field:
    if language = "English" then process using the EnglishIndexer
    else if language = "Chinese" then process using the ChineseIndexer
    else if ...

Querying:
  The multilanguage request handler receives a request.
  If the language parameter = "English" then process using the English request handler
  else if language = "Chinese" then process using the Chinese request handler
  else if ...

I can see that in the schema field definitions we have some language-dependent parameters... That could be a problem, as I would like to have the same fields for all requests... Sorry to bother you, but before I split all my data this way I would like to be sure that it's the best approach for me.

Regards,
Daniel


On 8/6/07 15:15, "Henrib" <[EMAIL PROTECTED]> wrote:

>
> Hi Daniel,
> If it is functionally 'ok' to search in only one lang at a time, you could
> try having one index per lang. Each per-lang index would have one schema
> where you would describe field types (the lang part coming through
> stemming/snowball analyzers, per-lang stopwords & al) and the same field
> name could be used in each of them.
> You could either deploy that solution through multiple web-apps (one per
> lang) or try the patch for issue Solr-215.
> Regards,
> Henri
>
>
> Daniel Alheiros wrote:
>>
>> Hi,
>>
>> I'm just starting to use Solr and so far it has been a very interesting
>> learning process. I wasn't a Lucene user, so I'm learning a lot about
>> both.
>>
>> My problem is:
>> I have to index and search content in several languages.
>>
>> My scenario is a bit different from others I've already read about in this
>> forum, as my client is the same for searching any language, and that could be
>> accomplished using a field to define the language.
>>
>> My questions are more focused on how to keep the benefits of all the
>> protwords, stopwords and synonyms in a multilanguage situation...
>>
>> Should I create new Analyzers that can deal with the "language" field of
>> the document? What do you recommend?
>>
>> Regards,
>> Daniel
>>
>>
>> http://www.bbc.co.uk/
>> This e-mail (and any attachments) is confidential and may contain personal
>> views which are not the views of the BBC unless specifically stated.
>> If you have received it in error, please delete it from your system.
>> Do not use, copy or disclose the information in any way nor act in
>> reliance on it and notify the sender immediately.
>> Please note that the BBC monitors e-mails sent or received.
>> Further communication will signify your consent to this.
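[For clarity, the dispatch-by-language workflow Daniel describes above could be sketched as follows. The EnglishIndexer/ChineseIndexer names are the placeholders from his message, not real Solr classes; in Solr the equivalent choice is made by the analyzer configured for the field type in schema.xml.]

```python
# Sketch of routing each document to a per-language processor based on
# its "language" field, as described in the workflow above.

LANGUAGE_HANDLERS = {
    "English": "EnglishIndexer",  # placeholder names from the mail
    "Chinese": "ChineseIndexer",
}

def route(doc):
    """Return the per-language processor for a document's "language" field."""
    lang = doc.get("language")
    try:
        return LANGUAGE_HANDLERS[lang]
    except KeyError:
        raise ValueError("no indexer configured for language %r" % lang)

print(route({"language": "English", "body": "some text"}))  # EnglishIndexer
```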