Hi,

Now I want to index an Excel file, but I have this problem:

Error parsing:
http://dev.torrez.us/public/2006/pundit/java/src/plugin/parse-msexcel/sample/test.xls:
failed(2,0): Can't be handled as Microsoft document.
java.lang.ArrayIndexOutOfBoundsException: No cell at position col1, row 0.
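
To check whether the spreadsheet itself is the culprit, I could download the
file and read cell (0,0) directly with Apache POI, outside of Nutch. This is
only a sketch (the class name and the local file name are placeholders, and
it assumes the POI 3.0-era HSSF API that the parse-msexcel plugin uses):

import java.io.FileInputStream;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class XlsCheck {
    public static void main(String[] args) throws Exception {
        // Open the workbook through POIFS, the same route the HSSF parser takes.
        POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("test.xls"));
        HSSFWorkbook wb = new HSSFWorkbook(fs);
        HSSFSheet sheet = wb.getSheetAt(0);
        // getRow(0) returns null when row 0 holds no cells at all, which
        // would match the "No cell at position col1, row 0" message above.
        HSSFRow row = sheet.getRow(0);
        HSSFCell cell = (row == null) ? null : row.getCell((short) 0);
        System.out.println("cell(0,0) = " + cell);
    }
}

If this prints null, the file simply starts with an empty cell, and the
parser is tripping over the file's layout rather than over my configuration.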
My plugin.includes property looks like this (shown here on one line, since a
line break inside the value can break the regex):

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|htm|js|pdf|msword|mspowerpoint|msexcel)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>

I don't know where the problem is.
Please help.

Susam Pal wrote:
> 
> So, the fetch was successful and the document has been indexed. It
> stopped at depth 1 because there were no more URLs to fetch. This is
> not a failure; it just means the job finished before reaching the
> maximum depth. If there had been more URLs to fetch, the crawl would
> have continued at depth 1.
> 
> Crawling begins at depth 0, when all the URLs mentioned in the seed
> files are fetched. Then, in the generate phase, more URLs are
> generated from the pages fetched at depth 0. At depth 1, the fetcher
> fetches those generated URLs which haven't been fetched yet. This
> process continues until the fetcher has run as many times as the
> -depth argument specifies, or there are no more URLs to fetch,
> whichever happens first.
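> 
> For example, with the crawl command from the tutorial (the paths and
> numbers here are only illustrative):
> 
>   bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> 
> This runs the generate/fetch/update cycle at most three times, but it
> stops earlier if a cycle generates no new URLs to fetch, which is
> exactly what happened in your case.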
> 
> I would strongly recommend that you go through the Nutch tutorial
> once. It is available at
> http://lucene.apache.org/nutch/tutorial8.html and it will help you
> understand Nutch better.
> 
> Regards,
> Susam Pal
> 
> On Nov 16, 2007 4:59 PM, crazy <[EMAIL PROTECTED]> wrote:
>>
>> I changed my seed URLs file to this:
>> http://www.frlii.org/IMG/doc/catalogue_a_portail_27-09-2004.doc
>>
>> and I get this as a result:
>> fetching http://www.frlii.org/IMG/doc/catalogue_a_portail_27-09-2004.doc
>> 16 nov. 2007 11:18:55 org.apache.tika.mime.MimeUtils load
>> INFO: Loading [tika-mimetypes.xml]
>> Fetcher: done
>> CrawlDb update: starting
>> CrawlDb update: db: crawl/crawldb
>> CrawlDb update: segments: [crawl/segments/20071116111851]
>> CrawlDb update: additions allowed: true
>> CrawlDb update: URL normalizing: true
>> CrawlDb update: URL filtering: true
>> CrawlDb update: Merging segment data into db.
>> CrawlDb update: done
>> Generator: Selecting best-scoring urls due for fetch.
>> Generator: starting
>> Generator: segment: crawl/segments/20071116111859
>> Generator: filtering: false
>> Generator: topN: 2147483647
>> Generator: jobtracker is 'local', generating exactly one partition.
>> Generator: 0 records selected for fetching, exiting ...
>> Stopping at depth=1 - no more URLs to fetch.
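>>
>> To double-check that the .doc really made it into the crawl database, I
>> suppose I could run the readdb tool against the path shown in the output
>> above:
>>
>>   bin/nutch readdb crawl/crawldb -stats
>>
>> If it reports one entry with status fetched, the fetch side is fine and
>> only the indexing remains to be verified.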
>>
>> What can I do now? I feel we are close to the goal.
>>
>> Thanks!
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/indexing-word-file-tf4819567.html#a13796482
Sent from the Nutch - User mailing list archive at Nabble.com.
