On Wed, Aug 11, 2010 at 18:23, webdev1977 <[email protected]> wrote:
>
> I am using tika... should I not be? The problem is that this shared drive
> has such a diverse set of documents, I was trying to include as many
> document types as possible. There are some really, really old office documents
> that can't be opened by the newer versions of office. I was having problems
> in nutch 1.0 with parsing them. hmm.. maybe I should turn off tika?
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Have-yet-to-complete-a-very-large-filesystem-crawl-tp1076547p1089160.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

Can you check this issue?

https://issues.apache.org/jira/browse/NUTCH-356

Maybe it can help.

--
Doğacan Güney

