Hi, when I try to index an Excel file I get the following error:

Error parsing: http://dev.torrez.us/public/2006/pundit/java/src/plugin/parse-msexcel/sample/test.xls: failed(2,0): Can't be handled as Microsoft document. java.lang.ArrayIndexOutOfBoundsException: No cell at position col1, row 0.
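The stack trace suggests the parser is asking for a cell at a (row, column) position where the sheet simply has none. This is a toy sketch of that failure mode, not the actual parse-msexcel code; the dict-based `sheet` only stands in for a sparse spreadsheet:

```python
# Toy model of a sparse spreadsheet: only cells that actually exist
# are stored, like an .xls whose row 0 has no cell in column 1.
sheet = {(0, 0): "title"}  # (row, col) -> value; position (0, 1) is missing


def cell_unsafe(sheet, row, col):
    # Mirrors a parser that assumes every position holds a cell:
    # raises KeyError here, analogous to the ArrayIndexOutOfBoundsException.
    return sheet[(row, col)]


def cell_safe(sheet, row, col):
    # Defensive version: treat a missing cell as empty text.
    return sheet.get((row, col), "")


print(cell_safe(sheet, 0, 1))  # -> "" instead of an exception
```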
Help, please.

Susam Pal wrote:
>
> So the fetch was successful and the document has been indexed. It
> stopped at depth 1 because there were no more URLs to fetch. This is
> not a failure of any sort; it just means the job completed early. If
> there were more URLs to fetch, the crawl would have continued at depth 1.
>
> Crawling begins at depth 0, when the fetcher fetches all the URLs
> listed in the seed file. Then, in the generate phase, more URLs are
> selected from the pages fetched at depth 0. At depth 1, the fetcher
> fetches those generated URLs that haven't been fetched yet. This
> process continues until the fetcher has run as many times as
> specified by the -depth argument, or there are no more URLs to fetch,
> whichever happens first.
>
> I would strongly recommend that you go through the Nutch tutorial
> once. It is available at
> http://lucene.apache.org/nutch/tutorial8.html and it will help you
> understand Nutch better.
>
> Regards,
> Susam Pal
>
> On Nov 16, 2007 4:59 PM, crazy <[EMAIL PROTECTED]> wrote:
>>
>> I changed my seed URLs file to this:
>> http://www.frlii.org/IMG/doc/catalogue_a_portail_27-09-2004.doc
>>
>> and I get this as the result:
>> fetching http://www.frlii.org/IMG/doc/catalogue_a_portail_27-09-2004.doc
>> 16 nov. 2007 11:18:55 org.apache.tika.mime.MimeUtils load
>> INFO: Loading [tika-mimetypes.xml]
>> Fetcher: done
>> CrawlDb update: starting
>> CrawlDb update: db: crawl/crawldb
>> CrawlDb update: segments: [crawl/segments/20071116111851]
>> CrawlDb update: additions allowed: true
>> CrawlDb update: URL normalizing: true
>> CrawlDb update: URL filtering: true
>> CrawlDb update: Merging segment data into db.
>> CrawlDb update: done
>> Generator: Selecting best-scoring urls due for fetch.
>> Generator: starting
>> Generator: segment: crawl/segments/20071116111859
>> Generator: filtering: false
>> Generator: topN: 2147483647
>> Generator: jobtracker is 'local', generating exactly one partition.
>> Generator: 0 records selected for fetching, exiting ...
>> Stopping at depth=1 - no more URLs to fetch.
>>
>> What can I do now? I feel that we are near the goal.
>>
>> Thanks!
>>

--
View this message in context: http://www.nabble.com/indexing-word-file-tf4819567.html#a13829977
Sent from the Nutch - User mailing list archive at Nabble.com.
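The fetch/generate cycle Susam Pal describes (depth 0 fetches the seeds, each later depth fetches the URLs generated from the previous round, and the loop stops after -depth rounds or when nothing is left) can be sketched as a toy loop. This is plain Python, not Nutch's actual implementation; `link_graph` is a made-up stand-in for the outlinks extracted from each fetched page:

```python
def crawl(seeds, link_graph, depth):
    """Toy model of the Nutch fetch/generate loop.

    seeds: starting URLs; link_graph: url -> outlinks found on that page;
    depth: maximum number of fetch rounds (the -depth argument).
    """
    fetched = set()
    frontier = set(seeds)
    for d in range(depth):
        to_fetch = frontier - fetched        # generate: unfetched URLs only
        if not to_fetch:
            print(f"Stopping at depth={d} - no more URLs to fetch.")
            break
        fetched |= to_fetch                  # fetch round d
        frontier = {out for url in to_fetch
                    for out in link_graph.get(url, [])}
    return fetched


# A single seed .doc with no outlinks stops at depth 1, as in the log above:
crawl(["http://example.org/some.doc"], {}, depth=3)
```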
