Doğacan Güney-3 wrote:
>
> On Wed, Aug 11, 2010 at 18:23, webdev1977 <webdev1...@gmail.com> wrote:
>
>> I am using Tika... should I not be? The problem is that this shared drive
>> has such a diverse set of documents that I was trying to include as many
>> document types as possible. There are some really, really old Office
>> documents that can't be opened by the newer versions of Office. I was having
>> problems parsing them in Nutch 1.0. Hmm... maybe I should turn off Tika?
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Have-yet-to-complete-a-very-large-filesystem-crawl-tp1076547p1089160.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>
> Can you check this issue?
>
> https://issues.apache.org/jira/browse/NUTCH-356
>
> Maybe it can help.
>
> --
> Doğacan Güney
Thanks for the suggestion; I am trying it out as we speak!