[ 
https://issues.apache.org/jira/browse/SOLR-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212885#comment-15212885
 ] 

Lewis John McGibbney edited comment on SOLR-8714 at 3/26/16 8:36 AM:
---------------------------------------------------------------------

Hi [~teofili], I started a patch which I thought was sound. The blocker right 
now is SOLR-8716.
If we can do the upgrade on Tika, then this issue (with Joshua, for example, 
backing statistical machine translation via the [language 
packs|http://joshua-decoder.org/language-packs/] we've been generating) is 
IMHO a game changer for the way that Web crawlers harvest data and make it 
available, useful and ultimately meaningful to us all. Getting Solr to do 
statistical machine translation at indexing time would be excellent for the 
open source Apache Solr, even though others are of course already doing it. 


> Implement translation contrib package for LanguageTranslationUpdateProcessor's
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-8714
>                 URL: https://issues.apache.org/jira/browse/SOLR-8714
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Lewis John McGibbney
>             Fix For: master
>
>
> A while back over in Tika we implemented the 
> [Translator|https://github.com/apache/tika/blob/master/tika-core/src/main/java/org/apache/tika/language/translate/Translator.java]
>  interface. There are now a number of 
> [implementations|https://github.com/apache/tika/tree/master/tika-translate/src/main/java/org/apache/tika/language/translate].
>  
> This issue will provide a translation contrib package offering a 
> LanguageTranslationUpdateProcessor.
> The new processor will probably utilize the existing [Solr Language 
> Identifier|https://github.com/apache/lucene-solr/tree/master/solr/contrib/langid]
>  and would enable a document to be translated based upon a user-defined 
> mapping. LanguageTranslationUpdateProcessors should be pluggable and 
> would be placed in an UpdateChain in the same way as 
> [LanguageIdentifierUpdateProcessor|https://github.com/apache/lucene-solr/blob/master/solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java]s.
> It is my intent to also provide a wiki page which can be referenced and 
> maintained in conjunction with the code. 
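
To make the description above a little more concrete, here is a rough, 
self-contained sketch of the mapping-driven translation step. It is not the 
patch and does not use the Solr or Tika APIs: the nested Translator interface 
is a simplified stand-in that only loosely resembles Tika's, a plain Map 
stands in for a Solr document, and the field names (language_s, text) and the 
"_<target language>" suffix for the output field are assumptions made for 
illustration only.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class TranslationSketch {

    /** Hypothetical stand-in for a translator backend; NOT Tika's real interface. */
    interface Translator {
        String translate(String text, String sourceLanguage, String targetLanguage);
    }

    /**
     * Returns a copy of the document with a translated field added, if the
     * detected language has an entry in the user-defined mapping.
     * The output field name (textField + "_" + target) is an assumption.
     */
    static Map<String, Object> process(Map<String, Object> doc,
                                       String textField,
                                       String langField,
                                       Map<String, String> langMap,
                                       Translator translator) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        Object lang = doc.get(langField);
        Object text = doc.get(textField);
        if (lang == null || text == null) {
            return out; // nothing to translate
        }
        String target = langMap.get(lang.toString());
        if (target == null) {
            return out; // no mapping configured for this source language
        }
        out.put(textField + "_" + target,
                translator.translate(text.toString(), lang.toString(), target));
        return out;
    }

    public static void main(String[] args) {
        // Fake translator that only tags the text; a real one would call a backend.
        Translator fake = (text, src, tgt) -> "[" + src + "->" + tgt + "] " + text;

        // User-defined mapping: translate Spanish documents to English.
        Map<String, String> langMap = new HashMap<>();
        langMap.put("es", "en");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("language_s", "es"); // assumed to be set earlier by language identification
        doc.put("text", "hola mundo");

        System.out.println(process(doc, "text", "language_s", langMap, fake));
    }
}
```

In a real update chain the language field would be filled in by the language 
identifier earlier in the chain, and the translation step would run after it.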
