Which method in Crawl.java triggers the invocation of plugins?
On Nov 3, 2011, at 5:30 AM, Markus Jelsma <[email protected]> wrote:

> remove *parse* in the segment and you're good to go.
>
> On Thursday 03 November 2011 13:16:40 Ashish Mehrotra wrote:
>> Hi All,
>>
>> I am trying to parse already crawled segments using the method --
>> ParseSegment.parse(seg);
>>
>> seg is the Path to the existing segment.
>> This internally fires a new job and the error thrown is --
>>
>> Exception in thread "main" java.io.IOException: Segment already parsed!
>>     at org.apache.nutch.parse.ParseOutputFormat.checkOutputSpecs(ParseOutputFormat.java:80)
>>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:772)
>>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>>     at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:156)
>>
>> What I am trying to do here is parse the already fetched data to test my
>> HTML Parse Filter. Looks like the above method of ParseSegment gets called
>> in the normal workflow of crawl, fetch, parse ...
>>
>> What I have done is modified the org.apache.nutch.crawl.Crawl.run() to
>> call only ParseSegment and commented the injector, generator and fetcher
>> parts. I am calling ParseSegment.parse(segment) in the run() method. I am
>> passing the segment name in the command line.
>>
>> Should I be calling some other method to test my HTML parser filter plugin
>> without crawling again?
>>
>> Any pointers should be helpful.
>>
>> Thanks,
>> Ashish
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
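
For reference, the "remove *parse* in the segment" step could be sketched roughly as below. This is only a sketch under a few assumptions: the segment sits on the local filesystem (for HDFS the Hadoop FileSystem API would be needed instead), and the parse output lives in the usual Nutch 1.x subdirectories crawl_parse, parse_data and parse_text. The class name SegmentParseCleaner and its methods are made up for illustration and are not part of Nutch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch: removes the parse output of a
// local segment so that the parse step can be run on it again.
public class SegmentParseCleaner {

    // Assumed names of the parse output directories inside a Nutch 1.x segment.
    static final List<String> PARSE_DIRS =
            Arrays.asList("crawl_parse", "parse_data", "parse_text");

    // Deletes every parse directory that exists and returns the names removed.
    static List<String> removeParseOutput(Path segment) throws IOException {
        List<String> removed = new ArrayList<>();
        for (String name : PARSE_DIRS) {
            Path dir = segment.resolve(name);
            if (Files.isDirectory(dir)) {
                deleteRecursively(dir);
                removed.add(name);
            }
        }
        return removed;
    }

    // Deletes children before parents by walking the tree in reverse order.
    static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SegmentParseCleaner <segment-dir>");
            return;
        }
        List<String> removed = removeParseOutput(Paths.get(args[0]));
        System.out.println("Removed: " + removed);
    }
}
```

The fetched data (content, crawl_fetch, crawl_generate) is left untouched, so after this the segment should be accepted by ParseSegment.parse(seg) again without re-crawling.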

