Tommaso Teofili created OAK-4348:
------------------------------------
Summary: Cross language search via SMT
Key: OAK-4348
URL: https://issues.apache.org/jira/browse/OAK-4348
Project: Jackrabbit Oak
Issue Type: New Feature
Components: query
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Fix For: 1.6
It would be interesting to investigate using statistical machine translation
(SMT) toolkits, such as Apache Joshua, to enable cross-language search, so that
a query can be expanded to also search over translated terms.
Example:
- enable Spanish to English translation
- perform a full text search for "hola"
- the query engine looks up translations of "hola"
- SMT returns "hello"
- the query engine adds an additional (UNION) clause for the translated term
- the query performed by Oak becomes "hello OR hola"
- results for both the English and the Spanish terms are returned
This should of course be configurable.
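The steps above can be sketched roughly as follows. This is only an
illustration of the expansion idea, not Oak code: toy_translate is a
hypothetical stand-in for an SMT backend (a fixed lookup table), and
CrossLanguageConfig, expand_query and max_translations are names made up for
this sketch. It builds the expanded query as a plain string, whereas a real
implementation would add the clause inside the query engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for an SMT backend: a fixed Spanish -> English table.
# A real integration would ask a translation toolkit instead.
TOY_ES_EN: Dict[str, List[str]] = {"hola": ["hello"], "mundo": ["world"]}


def toy_translate(term: str) -> List[str]:
    """Return candidate translations for a term (empty list if unknown)."""
    return TOY_ES_EN.get(term.lower(), [])


@dataclass
class CrossLanguageConfig:
    """Assumed configuration knobs; the issue only says it must be configurable."""
    enabled: bool = True
    max_translations: int = 1  # translations kept per term


def expand_query(text: str,
                 translate: Callable[[str], List[str]],
                 config: CrossLanguageConfig) -> str:
    """Expand each full text term with its translations, joined by OR."""
    if not config.enabled:
        return text
    terms = text.split()
    clauses = []
    for term in terms:
        # Drop translations identical to the original term.
        translations = [t for t in translate(term) if t != term]
        options = translations[: config.max_translations] + [term]
        clause = " OR ".join(options)
        # Parenthesize only when several terms are combined.
        if len(terms) > 1 and len(options) > 1:
            clause = f"({clause})"
        clauses.append(clause)
    return " ".join(clauses)


if __name__ == "__main__":
    print(expand_query("hola", toy_translate, CrossLanguageConfig()))
```

With the feature disabled in the configuration, or when no translation is
found, the query text is passed through unchanged.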
Note that the integration could also be done via Apache Tika, which provides a
Translator API.