Thank you for your reply.

Yes, I realize that running a query against the whole content would come with
these problems, but what I'm trying to say is that I will always narrow by
the language (from my users' point of view). I would like to know if it is
possible (and appropriate) to have all my content in one index for
administrative reasons (batch indexing, queries based on ID or Date,
centralized maintenance).
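To illustrate the single-index idea: every user search carries a language filter, while administrative queries by ID or date simply omit it. The sketch below is in Python and only builds Solr request URLs without sending them. The endpoint and the field names `language`, `id` and `date` are hypothetical placeholders, not taken from any real schema.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and path to the real setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_search_url(user_query, language, rows=10):
    """Build a user-facing search URL restricted to one language.

    Assumes each document has an indexed field named "language"
    (hypothetical name) holding a code such as "en" or "pt".
    """
    params = [
        ("q", user_query),
        # fq narrows the result set without contributing to relevance scores
        ("fq", "language:" + language),
        ("rows", str(rows)),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


def build_admin_url(doc_id=None, date_range=None):
    """Build a language-agnostic query for maintenance (by ID or by date)."""
    if doc_id is not None:
        q = "id:" + doc_id
    elif date_range is not None:
        start, end = date_range
        q = "date:[%s TO %s]" % (start, end)
    else:
        q = "*:*"  # match every document in the index
    return SOLR_SELECT + "?" + urlencode([("q", q)])
```

Putting the restriction in `fq` rather than in `q` keeps it out of scoring, and Solr can cache each language filter separately, so a shared index should not make per-language searches noticeably more expensive.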

My index will be around 4 GB initially, and in 5 years' time it may
reach 8 GB.

What do you think about it?

Regards,
Daniel


On 7/6/07 18:56, "Walter Underwood" <[EMAIL PROTECTED]> wrote:

> I'm not sure what sort of "field" you mean for defining the
> language.
> 
> If you plan to use a single search UI regardless of language,
> we used to do this in Ultraseek, but it doesn't really work.
> Queries are too short for reliable language ID (is "die" in
> German, English, or Latin?), and language-specific processing
> can be pretty different.
> 
> We ran into surface words that collided in different languages.
> As I remember, "mobile" is a plural noun in Dutch but a verb in
> English.
> 
> Finally, Solr linguistic support is OK for English, but not as
> good for more heavily-inflected languages. For German, you
> really need to decompose compound words, something not available
> in Solr.
> 
> The only semi-successful cross-language search seems to be with
> n-gram indexing. That usually produces a larger index and somewhat
> slower performance (because of the number of terms), but at least
> it works.
> 
> wunder
> 
> On 6/7/07 10:47 AM, "Daniel Alheiros" <[EMAIL PROTECTED]> wrote:
> 
>> I have to index and search content in several languages.
>> 
>> My scenario is a bit different from others I've already read about in this
>> forum: the same client is used to search every language, and this could be
>> accomplished using a field to define the language.
> 
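Walter's point about n-gram indexing can be illustrated with a minimal character-trigram tokenizer. This is a plain Python sketch of the general technique, not Solr's own n-gram tokenizer; the function name `char_ngrams` and the `_` boundary marker are illustrative choices. It shows why the index grows (one word yields many terms) and why a compound such as "Donaudampfschiff" still shares terms with "Dampf" even without decompounding.

```python
def char_ngrams(text, n=3):
    """Return overlapping character n-grams for each whitespace-separated word."""
    grams = []
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries so prefixes/suffixes differ
        if len(padded) <= n:
            grams.append(padded)
        else:
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


# A short word becomes several index terms...
print(char_ngrams("die"))  # ['_di', 'die', 'ie_']

# ...and a German compound overlaps with one of its parts.
shared = set(char_ngrams("dampf")) & set(char_ngrams("Donaudampfschiff"))
print(sorted(shared))  # ['amp', 'dam', 'mpf']
```

The trade-off matches the one described above: a 16-letter word produces 16 trigrams instead of a single term, which enlarges the index and slows queries, but matching no longer depends on language-specific stemming.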


http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.
                                        
