Tommaso Teofili created OAK-4348:
------------------------------------

             Summary: Cross language search via SMT
                 Key: OAK-4348
                 URL: https://issues.apache.org/jira/browse/OAK-4348
             Project: Jackrabbit Oak
          Issue Type: New Feature
          Components: query
            Reporter: Tommaso Teofili
            Assignee: Tommaso Teofili
             Fix For: 1.6


It would be interesting to investigate using statistical machine translation (SMT)
toolkits (such as Apache Joshua) to enable cross-language search, so that a
query can optionally be expanded to also search over translated terms.
Example:
- enable Spanish-to-English translation
- perform a full-text search for "hola"
- the query engine looks up translations of "hola"
- the SMT system returns "hello"
- the query engine adds an additional (UNION) clause for the translated term
- the query executed by Oak becomes "hello OR hola"
- results for both the English and the Spanish term are returned
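
The flow above can be sketched as a small query-expansion step. This is a minimal illustration under stated assumptions, not Oak code: the translator is a hypothetical dictionary-backed stub standing in for an SMT toolkit such as Apache Joshua, `expand_full_text_term` is a hypothetical helper name, and a real integration would rewrite Oak's parsed query rather than plain strings.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> translation or None.
Translator = Callable[[str, str, str], Optional[str]]


def dictionary_translator(table: Dict[Tuple[str, str], Dict[str, str]]) -> Translator:
    """Return a stub translator backed by a (source, target) -> {term: translation} table.

    Stands in for an SMT toolkit such as Apache Joshua; purely illustrative.
    """
    def translate(text: str, source: str, target: str) -> Optional[str]:
        return table.get((source, target), {}).get(text.lower())
    return translate


def expand_full_text_term(term: str, translator: Translator,
                          source: str, target: str) -> str:
    """Build an OR (union) expression over the translated term and the original term."""
    translated = translator(term, source, target)
    # Only add a clause when a distinct translation was found.
    if translated and translated.lower() != term.lower():
        return f"{translated} OR {term}"
    return term


translator = dictionary_translator({("es", "en"): {"hola": "hello"}})
print(expand_full_text_term("hola", translator, "es", "en"))   # hello OR hola
print(expand_full_text_term("adios", translator, "es", "en"))  # adios
```

Placing the translated term first only mirrors the "hello OR hola" example; for a boolean OR the order does not change which documents match.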

Of course, this should be configurable.
Note that the integration could also be done via Apache Tika, which provides a
Translator API.
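
One way to keep both the translation backend (Joshua, Tika's Translator, or something else) and the feature itself configurable is to code against a small interface loosely modeled on Tika's `org.apache.tika.language.translate.Translator`, which exposes methods such as `translate(text, sourceLanguage, targetLanguage)` and `isAvailable()`. The Python sketch below is hypothetical: `TranslatorBackend`, `CrossLanguageConfig`, `StaticBackend` and `translations_for` are illustrative names, not Oak or Tika APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


class TranslatorBackend(Protocol):
    """Hypothetical backend interface, loosely modeled on Tika's Translator."""

    def translate(self, text: str, source_language: str, target_language: str) -> str: ...

    def is_available(self) -> bool: ...


@dataclass
class CrossLanguageConfig:
    """Hypothetical feature switches; not an existing Oak configuration."""
    enabled: bool = False
    # (source, target) language pairs to translate query terms across.
    language_pairs: List[Tuple[str, str]] = field(default_factory=list)


class StaticBackend:
    """Toy backend with canned translations; stands in for Joshua or Tika."""

    def __init__(self, table: Dict[Tuple[str, str, str], str]) -> None:
        self.table = table

    def translate(self, text: str, source_language: str, target_language: str) -> str:
        # Fall back to the input text when no translation is known.
        return self.table.get((source_language, target_language, text), text)

    def is_available(self) -> bool:
        return True


def translations_for(term: str, backend: TranslatorBackend,
                     config: CrossLanguageConfig) -> List[str]:
    """Return the extra terms to OR into the query, honoring the configuration."""
    if not config.enabled or not backend.is_available():
        return []
    extra: List[str] = []
    for source, target in config.language_pairs:
        translated = backend.translate(term, source, target)
        # Skip untranslated terms and duplicates across language pairs.
        if translated != term and translated not in extra:
            extra.append(translated)
    return extra


backend = StaticBackend({("es", "en", "hola"): "hello"})
enabled = CrossLanguageConfig(enabled=True, language_pairs=[("es", "en")])
print(translations_for("hola", backend, enabled))                # ['hello']
print(translations_for("hola", backend, CrossLanguageConfig()))  # []
```

With the feature disabled (the default here), no extra clauses are produced, so existing queries behave exactly as before.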



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
