Doğacan Güney-3 wrote:
> 
> On Wed, Aug 11, 2010 at 18:23, webdev1977 <webdev1...@gmail.com> wrote:
> 
>>
>> I am using Tika... should I not be?  The problem is that this shared drive
>> has such a diverse set of documents that I was trying to include as many
>> document types as possible.  There are some really, really old Office
>> documents that can't be opened by the newer versions of Office.  I was
>> having problems in Nutch 1.0 with parsing them.  Hmm... maybe I should
>> turn off Tika?
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Have-yet-to-complete-a-very-large-filesystem-crawl-tp1076547p1089160.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
> 
> Can you check this issue?
> 
> https://issues.apache.org/jira/browse/NUTCH-356
> 
> Maybe it can help.
> 
> 
> -- 
> Doğacan Güney
> 
> 


Thanks for the suggestion, I am trying it out as we speak!

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Have-yet-to-complete-a-very-large-filesystem-crawl-tp1076547p1090655.html
Sent from the Nutch - User mailing list archive at Nabble.com.
