On Wed, Aug 11, 2010 at 18:23, webdev1977 <[email protected]> wrote:

>
> I am using Tika... should I not be?  The problem is that this shared drive
> has such a diverse set of documents that I was trying to include as many
> document types as possible.  There are some really, really old Office documents
> that can't be opened by the newer versions of Office, and I was having problems
> parsing them in Nutch 1.0.  Hmm... maybe I should turn off Tika?
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Have-yet-to-complete-a-very-large-filesystem-crawl-tp1076547p1089160.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

Can you check this issue?

https://issues.apache.org/jira/browse/NUTCH-356

Maybe it can help.


-- 
Doğacan Güney
