Hi,

I'm trying to use the index-more plugin (together with query-more) because I 
would like to get the size of each indexed page, but I am running into the 
following problem:

Indexer: starting
Indexer: linkdb: crawl/linkdb
Indexer: adding segment: crawl/segments/20080417185602
IFD [Thread-7]: setInfoStream [EMAIL PROTECTED]
IW 0 [Thread-7]: setInfoStream: 
dir=org.apache.lucene.store.FSDirectory@/tmp/hadoop-hilkiah/mapred/local/index/_41064764
 autoCommit=true [EMAIL PROTECTED] [EMAIL PROTECTED] ramBufferSizeMB=16.0 
maxBuffereDocs=50 maxBuffereDeleteTerms=-1 maxFieldLength=10000 index=
Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:894)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:311)
        at org.apache.nutch.indexer.Indexer.run(Indexer.java:333)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:316)


This is what my plugin.includes looks like:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-suffix|parse-(text|html|js)|index-(basic|anchor|more)|query-(basic|site|url|more)|summary-lucene|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
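
As a sanity check that the value above really selects index-more and query-more, here is a minimal sketch. It assumes that plugin.includes is treated as a regular expression that has to match the whole plugin id; that is my understanding of the plugin loader, not something I have confirmed in the source. The helper name is_included and the sample plugin ids are only for illustration.

```python
import re

# Value copied from the plugin.includes property above.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-suffix|parse-(text|html|js)|"
    "index-(basic|anchor|more)|query-(basic|site|url|more)|"
    "summary-lucene|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def is_included(plugin_id, includes=PLUGIN_INCLUDES):
    """Return True if the whole plugin id matches the includes pattern.

    Assumption: the pattern must match the entire id, not just a prefix.
    """
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    # Hypothetical sample ids; parse-pdf is not in the pattern above.
    for plugin_id in ("index-basic", "index-more", "query-more", "parse-pdf"):
        print(plugin_id, is_included(plugin_id))
```

If the assumption holds, the pattern itself does include index-more, so the failure is probably not caused by a typo in this property.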


Note that if I disable index-more (and leave query-more active), the crawl 
completes. Please advise.

Regards,
 
Hilkiah G. Lavinier MEng (Hons), ACGI 
6 Winston Lane, 
Goodwill, 
Roseau, Dominica

Mbl: (767) 275 3382
Hm : (767) 440 3924
Fax: (767) 440 4991
VoIP USA: (646) 432 4487


Email: [EMAIL PROTECTED]
Email: [EMAIL PROTECTED]
IM: Yahoo hilkiah / MSN [EMAIL PROTECTED]
IM: ICQ #8978201  / AOL hilkiah21





      