Apache Tika version 1.15 now handles XLSB files. The behavior described below (a document is created but its content is essentially empty) is expected when a file type is identified but there is no parser to handle that file type.
A little late to the game, I admit... :)

Cheers,

Tim

From: Roland Everaert <reveatw...@gmail.com>
Subject: Re: XLSB files not indexed
Date: Mon, 21 Oct 2013 07:59:20 GMT

Hi Otis,

In our case, no exception is raised by Tika or Solr. A Lucene document is created, but the content field contains only a few whitespace characters, as with ODF files.

Roland.

On Sat, Oct 19, 2013 at 3:54 AM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:

> Hi Roland,
>
> It looks like:
> Tika - yes
> Solr - no?
>
> Based on http://search-lucene.com/?q=xlsb
>
> ODF != XLSB though, I think...
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
> On Fri, Oct 18, 2013 at 7:36 AM, Roland Everaert <reveatw...@gmail.com> wrote:
> > Hi,
> >
> > Can someone tell me if Tika is supposed to extract data from XLSB files
> > (the new MS Office format in binary form)?
> >
> > If so, then it seems that Solr is not able to index them, just as it is
> > not able to index ODF files (a JIRA is already open for ODF:
> > https://issues.apache.org/jira/browse/SOLR-4809)
> >
> > Can someone confirm the problem, or tell me what to do to make Solr work
> > with XLSB files?
> >
> > Regards,
> >
> > Roland.
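
For anyone who finds this thread while debugging the same symptom, here is a minimal sketch (Python, standard library only, not part of Tika or Solr) for checking two things before looking at the parser setup: whether a file really is an XLSB package, and whether the extracted text is effectively empty. It assumes an XLSB file is a ZIP container holding an `xl/workbook.bin` part, whereas an XLSX file holds `xl/workbook.xml`. The helper names `looks_like_xlsb` and `is_blank` are made up for illustration.

```python
import io
import zipfile


def looks_like_xlsb(data: bytes) -> bool:
    """Heuristic check (hypothetical helper): is this an XLSB package?

    Assumption: XLSB is a ZIP container whose workbook part is stored
    as 'xl/workbook.bin' (XLSX uses 'xl/workbook.xml' instead).
    """
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return "xl/workbook.bin" in zf.namelist()


def is_blank(content) -> bool:
    """True if an extracted content field is missing or whitespace only."""
    return content is None or content.strip() == ""
```

If `looks_like_xlsb` returns true for the file and `is_blank` returns true for the indexed content field, one likely candidate is that the Tika version in use predates 1.15 and has no parser for the format.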