Re: multilingual content and indexing

Lukas Kahwe Smith Tue, 12 Jul 2016 03:24:07 -0700

> On 12 Jul 2016, at 12:15, Michael Marth <[email protected]> wrote:
> 
> Hi Lukas,
> 
> I am not entirely sure what you want to achieve (or what exactly you mean 
> by “dealing with multi language content”), but trying to answer a bit:
> 
> Let’s say you have distinct content trees for different languages, e.g.
> /content/en
> /content/jp
> etc.
> 
> You can choose to index all these trees in one (Lucene) index for full text 
> search and filter the results in your query, i.e. put the burden on the query 
> engine.
> This is a simple setup which leads to a large index (although I personally 
> have not seen this to be a problem).


For example, if you index multilingual content under the same field while doing 
monolingual searches, you tend to get suboptimal relevance ranking, since the 
word distribution of one language skews the term statistics used for another.
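
To make that concrete, here is a rough sketch of the effect. It assumes a
textbook smoothed IDF (1 + ln(N / (df + 1))) and whitespace tokenization, which
is not necessarily what the Lucene version inside Oak computes; the sample
documents are made up for illustration. "die" is rare in the English documents
but appears in every German one, so mixing the two lowers its weight for an
English-only search.

```python
import math


def idf(term, docs):
    """Textbook smoothed inverse document frequency (an assumption, see above)."""
    # Document frequency: number of documents containing the term.
    df = sum(1 for doc in docs if term in doc.split())
    return 1.0 + math.log(len(docs) / (df + 1))


# Made-up sample content, one string per document.
english_docs = [
    "the cast must die in act three",
    "a quiet day at the harbour",
    "the harbour cast a long shadow",
    "rain all day",
]
german_docs = [
    "die katze ist hier",
    "die sonne scheint",
    "die stadt ist alt",
    "die bruecke ist neu",
]

# Weight of "die" when English has its own index vs. a shared one.
idf_english_only = idf("die", english_docs)
idf_shared_index = idf("die", english_docs + german_docs)

print("english-only index:", round(idf_english_only, 3))
print("shared index:      ", round(idf_shared_index, 3))
```

In the shared index the term looks common and contributes less to the score,
even though within English content it is a fairly distinctive word.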

> Alternatively, you can create different index definitions for each subtree 
> (see [1]), e.g. using the “includedPaths” property. This would lead to 
> smaller indexes, with the downside that you would have to create an index 
> definition if you add a new language tree.
> This approach has the additional benefit that you can define 
> language-specific Lucene analyzers for each subtree, so that e.g. in the 
> example above the Japanese index would have its own analyzer.

OK, so it's possible to tweak this with the standard indexer in Oak without 
having to switch to an external indexer like Solr just for this. Good to hear.
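
If I understand the suggestion correctly, the per-language definitions would
look roughly like the following. This is only a sketch that models the index
nodes as plain Python dicts. "includedPaths" comes from your mail; the other
property names (type, async, compatVersion, analyzers/default/class) are how I
remember the Oak Lucene index documentation and should be checked against it.
The index names and the choice of analyzer classes are my own assumptions, and
I have not verified that these analyzers are available in a given Oak setup.

```python
def lucene_index_definition(language_root, analyzer_class):
    """Return a nested dict mirroring one Lucene index definition node."""
    return {
        "jcr:primaryType": "oak:QueryIndexDefinition",
        "type": "lucene",
        "async": "async",
        "compatVersion": 2,
        # Restrict the index to a single language subtree.
        "includedPaths": [language_root],
        # Language-specific analyzer for this subtree.
        "analyzers": {"default": {"class": analyzer_class}},
    }


# Hypothetical mapping of language trees to Lucene analyzer classes.
LANGUAGE_TREES = {
    "/content/en": "org.apache.lucene.analysis.en.EnglishAnalyzer",
    "/content/jp": "org.apache.lucene.analysis.ja.JapaneseAnalyzer",
}

# One definition per tree, keyed by an illustrative index name such as
# "lucene-jp". Adding a language means adding one entry to LANGUAGE_TREES.
definitions = {
    "lucene-" + root.rsplit("/", 1)[-1]: lucene_index_definition(root, analyzer)
    for root, analyzer in LANGUAGE_TREES.items()
}

for name, definition in definitions.items():
    print(name, definition["includedPaths"], definition["analyzers"]["default"]["class"])
```

Generating the definitions from one mapping would at least keep the cost of
adding a new language tree down to a single extra entry.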

regards,
Lukas Kahwe Smith
[email protected]
