On 16 Dec 2014 at 11:48:44, Marius Dumitru Florea wrote:
> On Tue, Dec 16, 2014 at 2:11 AM, Arnold, Garth wrote:
> > Hello Marius - thank you for the detailed reply. My goal is (2) - to find
> > all documents with a .7z attachment, where those attachments include
> > file(s) containing "foo". If I read your email correctly, Tika 1.6 (5) is
> > root cause for my failure to search successfully for text within the files
> > contained in a .7z attachment. I am successful with my search when using a
> > .zip file as the attachment - so we will instruct wiki users to avoid .7z
> > attachments.
>
> Yes, at least until we upgrade to Tika 1.7.

FTR we’re now using Tika 1.7 in the latest versions of XWiki.

Thanks
-Vincent

> Thanks,
> Marius
>
> >
> > Garth
> >
> >> -----Original Message-----
> >> Message: 2
> >> Date: Thu, 11 Dec 2014 08:42:20 +0200
> >> From: Marius Dumitru Florea
> >> To: XWiki Users
> >> Subject: Re: [xwiki-users] XWiki search/Solr support for additional
> >> filetypes
> >> Content-Type: text/plain; charset=UTF-8
> >>
> >> It depends what you mean by "search attachments that are 7-Zip .7z
> >> archives":
> >>
> >> (1) Give me all the documents that have an attachment of mime type
> >> application/x-7z-compressed
> >> (2) Give me all the documents that have a 7-Zip archive attached that
> >> includes a file that contains the word "foo"
> >>
> >> If you use Solr, the default search engine for XWiki 6.2.4, then the
> >> code that is responsible for indexing the attachments is
> >> AttachmentSolrMetadataExtractor [1]. This is a component so it can be
> >> overridden as per [2]. The current implementation uses Tika [3] to:
> >>
> >> (1) detect the mime type of the attachment
> >> (2) extract indexable content from the attachment (whatever its mime
> >> type may be)
> >>
> >> For (1) Tika supports detecting the 7-Zip mime type since version 1.2
> >> [4].
> >> For (2) judging by [5] it seems Tika also supports reading 7-ZIP
> >> archives but there were some issues in 1.6 that have been fixed in
> >> 1.7. We are currently using Tika 1.6 in XWiki. We should probably
> >> upgrade.
> >>
> >> Hope this helps,
> >> Marius
> >>
> >> [1] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AttachmentSolrMetadataExtractor.java
> >> [2] http://extensions.xwiki.org/xwiki/bin/view/Extension/Component+Module#HOverrides
> >> [3] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AbstractSolrMetadataExtractor.java#L458
> >> [4] https://issues.apache.org/jira/browse/TIKA-940
> >> [5] https://issues.apache.org/jira/browse/TIKA-1411
> >>
> >> On Wed, Dec 10, 2014 at 9:20 PM, Arnold, Garth wrote:
> >> > Hello - is it possible to enable searching of additional filetypes
> >> > within XWiki 6.2.4? Specifically I would like to be able to search
> >> > attachments that are 7-Zip .7z archives. It looks to me as though the
> >> > underlying library (Commons Compress) supports this filetype, but I am
> >> > a new XWiki user and non-java programmer so I may be assuming too much.
> >> >
> >> > Thanks in advance for your thoughts on this -
> >> >
> >> > Garth Arnold

_______________________________________________
users mailing list
http://lists.xwiki.org/mailman/listinfo/users
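
For readers who want to see what goal (2) in the quoted reply amounts to, here is a minimal standalone sketch. It is not how XWiki or Tika index attachments; it only illustrates "find the documents whose archive attachment includes a file containing foo". It assumes .zip archives, because Python's standard library has no 7z reader, and the helper names (`archive_contains`, `make_zip`) and the document names are hypothetical.

```python
import io
import zipfile


def archive_contains(data: bytes, needle: str) -> bool:
    """Return True if any file inside the zip archive contains `needle`."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Decode leniently: attachments may hold non-UTF-8 or binary files.
            text = archive.read(info).decode("utf-8", errors="ignore")
            if needle in text:
                return True
    return False


def make_zip(files: dict) -> bytes:
    """Build an in-memory zip archive from a {file name: text} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


# Hypothetical attachments, keyed by the document they are attached to.
attachments = {
    "Main.WebHome": make_zip({"notes.txt": "nothing here"}),
    "Sandbox.Test": make_zip({"readme.txt": "contains foo inside"}),
}

matches = [doc for doc, data in attachments.items() if archive_contains(data, "foo")]
print(matches)
```

In XWiki itself the equivalent extraction is done by Tika at indexing time, so the search only finds "foo" if the Tika version in use can read the archive format.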
