The abort itself does not look wrong; Nutch always aborts hung threads at the end of a fetch cycle.

Do you use the one-stop crawl command, or do you run the steps individually? In the latter case
you can see much more easily where it fails.
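For reference, here is a minimal sketch of the step-by-step loop for a Nutch 1.x
install (the crawl/ directory layout and iteration count are assumptions; adapt
them to your setup). Checking the output of each command shows where the crawl
stalls:

```shell
# Step-by-step crawl loop instead of the one-stop "bin/nutch crawl".
bin/nutch inject crawl/crawldb urls                 # seed the crawldb with your URL list
for i in 1 2 3; do                                  # one iteration per crawl "level"
  bin/nutch generate crawl/crawldb crawl/segments   # create a new fetch list
  SEGMENT=$(ls -d crawl/segments/* | tail -1)       # pick the newest segment
  bin/nutch fetch "$SEGMENT"                        # fetch the pages
  bin/nutch parse "$SEGMENT"                        # parse (runs your parser plugins)
  bin/nutch updatedb crawl/crawldb "$SEGMENT"       # fold results back into the crawldb
done
bin/nutch readdb crawl/crawldb -stats               # check how many URLs were actually fetched
```

If the generate step produces only one URL per iteration, the problem is in the
crawldb update or your URL filters rather than in the fetcher.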

We don't get attachments in this mailing list.

2011/6/3 Brian Griffey <[email protected]>

>  Hi all,
>
> I recently downloaded Nutch onto my local machine.  I wrote a few plugins
> for it and successfully crawled a few sites to make sure that my parsers and
> indexers worked well.  I then moved the Nutch installation onto our
> pre-existing Hadoop cluster by copying the needed libs, confs, and the
> build/plugins dir onto every machine in the cluster.  I also adjusted
> nutch-site.xml to point to the hard-coded path where the plugins sit.
> Nutch runs without errors, but it never gets past a few pages: it seems to
> grab only one page per level, and it fetches that same page on every pass.
> I have included the interesting files and system logs in the attachment for
> easy viewing.  Anyone have any ideas on why it's not going forward?  It also
> seems to abort threads; any ideas?
>
> 2011-06-03 16:20:51,559 WARN org.apache.nutch.parse.ParserFactory: 
> ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser mapped to 
> contentType application/xhtml+xml via parse-plugins.xml, but its plugin.xml 
> file does not claim to support contentType: application/xhtml+xml
> 2011-06-03 16:20:51,629 INFO org.apache.nutch.fetcher.Fetcher: 
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=19
> 2011-06-03 16:20:51,629 WARN org.apache.nutch.fetcher.Fetcher: Aborting with 
> 10 hung threads.
>
>
> --
> Brian Griffey
> ShopSavvy Android and Big Data Developer
> 650-352-1429
>
>


-- 
-MilleBii-
