+1, agreed. This would be a welcome addition.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Lewis John Mcgibbney <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Sunday, May 31, 2015 at 11:49 AM
To: "[email protected]" <[email protected]>
Subject: Re: about language extraction for zip documents

>Hi,
>
>On Sun, May 31, 2015 at 12:30 AM, <[email protected]> wrote:
>
>> Hi community.
>> I'm using Nutch 1.9 and Solr 4.10.
>> I use Nutch to parse zip documents, but the language field is empty in
>> Solr for all of these documents, and this is a problem for me.
>> The ParseZip plugin uses Tika to detect the MIME type and to extract the
>> content of the files, but the language is missing.
>> I was thinking that if the package has 3 documents, the language could
>> be a multivalued field and contain all the languages of the documents
>> inside.
>> What do you think about this topic?
>
>Please open a Jira issue and, if possible, attach a patch for the
>functionality. I think it would be a nice addition to the parse-zip
>plugin, and to me it makes good sense.
>Thanks
>Lewis
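
To make the multivalued idea above a bit more concrete, here is a rough sketch of what it could look like. This is not the parse-zip plugin's code and not a patch: the class name ZipLanguages and the detector function are hypothetical, the entries are assumed to be plain UTF-8 text (the real plugin would get the text from Tika), and the actual language detection is left to whatever detector is passed in. It only shows collecting one language per entry into a set of distinct values that could feed a multivalued field.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: not part of Nutch or Tika. */
public class ZipLanguages {

    /**
     * Returns the distinct languages found across all file entries of a zip,
     * in first-seen order. The detector maps extracted text to a language
     * code (assumption: it returns null or "" when it cannot decide).
     */
    public static Set<String> detect(InputStream zipStream,
                                     Function<String, String> detector)
            throws IOException {
        Set<String> languages = new LinkedHashSet<>();
        try (ZipInputStream zin = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no text
                }
                // Read the current entry fully; read() returns -1 at entry end.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zin.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                // Assumption: entries are plain UTF-8 text.
                String text = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                String lang = detector.apply(text);
                if (lang != null && !lang.isEmpty()) {
                    languages.add(lang); // the set drops duplicates
                }
            }
        }
        return languages;
    }
}
```

With a package of three documents where two are in the same language, the set would hold two values, and each value could then be added to the language field as a separate entry.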

