about language extraction for zip documents

Eyeris Rodríguez Rueda Fri, 29 May 2015 11:38:52 -0700

Hi community,
I'm using Nutch 1.9 and Solr 4.10.
I use Nutch to parse zip documents, but the language field is empty in Solr 
for all of these documents, and this is a problem for me.
The ParseZip plugin uses Tika to detect the MIME type and to extract the content 
of the files, but the language is missing.
I was thinking that if the package contains 3 documents, the language could be a 
multivalued field containing all the languages of the documents inside.
What do you think about this idea?
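
To make the idea concrete, here is a minimal sketch of collecting one language value per document inside a zip and returning the distinct set, which is what a multivalued field would hold. It is written in Python only for illustration and is not Nutch or ParseZip code. The names `detect_language`, `languages_in_zip` and `make_zip` are hypothetical, and the stopword-based detector is a toy placeholder for a real detector such as the one Tika provides.

```python
import io
import zipfile

# Toy stand-in for a real language detector (assumption: in practice this
# would be replaced by a proper detector; the word lists are illustrative).
STOPWORDS = {
    "en": {"the", "and", "is", "of"},
    "es": {"el", "la", "los", "y", "es", "de"},
}


def detect_language(text):
    """Return the language code with the most stopword hits, or None."""
    words = text.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def languages_in_zip(zip_bytes, detect=detect_language):
    """Detect a language for every file in the archive and return the
    distinct codes, sorted, as the values of a multivalued field."""
    found = set()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no text
            text = archive.read(info).decode("utf-8", errors="replace")
            lang = detect(text)
            if lang:
                found.add(lang)
    return sorted(found)


def make_zip(files):
    """Build an in-memory zip from a {name: text} mapping (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buf.getvalue()


package = make_zip({
    "a.txt": "the cat is on the roof and the dog",
    "b.txt": "el gato es de la casa y los perros",
    "c.txt": "the end of the story",
})
print(languages_in_zip(package))  # ['en', 'es']
```

With three documents in the package, two in English and one in Spanish, the field would hold two values. On the Solr side this presumably requires the language field to be declared as multivalued in the schema.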
