Hi Dennis,

Thanks for the reply. I can't avoid regex matching, because some of the hostname patterns I need can't be matched with prefix or suffix filters alone. However, I will try it your way with simpler regexes, just to test your theory.
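One way to check the regex-stall theory outside a crawl is to time each candidate pattern against a sample of hostnames. The sketch below is only an illustration in Python (Nutch itself runs on Java, so timings will differ). The function name, the patterns, and the hostnames are all made up for the example and are not taken from any real Nutch filter file.

```python
import re
import time


def slowest_match_seconds(patterns, hosts):
    """Map each pattern to the longest time a single search took over `hosts`."""
    worst = {}
    for pattern in patterns:
        compiled = re.compile(pattern)
        slowest = 0.0
        for host in hosts:
            start = time.perf_counter()
            compiled.search(host)
            slowest = max(slowest, time.perf_counter() - start)
        worst[pattern] = slowest
    return worst


if __name__ == "__main__":
    # Hypothetical patterns: the first nests one quantifier inside another,
    # which lets a backtracking engine retry many splits of the same text.
    patterns = [
        r"^([a-z0-9-]+)+\.example\.com$",  # nested quantifier
        r"^[a-z0-9-]+\.example\.com$",     # same intent, no nesting
    ]
    # Hypothetical inputs; the second cannot match and forces backtracking.
    hosts = ["www.example.com", "a" * 18 + "!"]
    for pattern, seconds in slowest_match_seconds(patterns, hosts).items():
        print(f"{seconds:.6f}s  {pattern}")
```

If the nested pattern is far slower on the non-matching input than its flattened equivalent, that would be consistent with the filter stalling on awkward URLs, though it does not prove that this is what kills the reduce task.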
Regards,
-vishal.

-----Original Message-----
From: Dennis Kubes [mailto:[EMAIL PROTECTED]
Sent: Friday, September 08, 2006 11:30 PM
To: [email protected]
Subject: Re: Reduce Error during fetch

You may be running into problems with regex stalls on filtering. Try removing the regex filter from the nutch-site.xml plugin.includes property. I was having similar problems before switching to just use prefix and suffix filters as below. I attached my prefix and suffix url filter files that go in conf. I am only indexing http files so you may need to modify these. Hope this helps.

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-(suffix|prefix)|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins.
  </description>
</property>

Dennis

Vishal Shah wrote:
> Hi,
>
> I've been trying to get the nutch fetcher to work since a couple of
> days, but it always hangs on one of the reduce processes, and the job is
> aborted. I am using numFetchers=24 during generate, 24 map tasks and 6
> reduce tasks during fetch on a 3 machine cluster. The task that failed
> was tried atleast 3 times, before the job was aborted.
>
> I looked into the logs on one of the machines with the failed tasks,
> and I see these errors:
>
> 1) 2006-09-08 18:04:03,294 INFO mapred.TaskTracker - task_0003_r_000004_3: Task failed to report status for 608 seconds. Killing.
>
> 2)
> java.lang.IllegalStateException
>     at org.mortbay.jetty.servlet.ServletHttpResponse.getWriter(ServletHttpResponse.java:561)
>     at org.apache.jasper.runtime.JspWriterImpl.initOut(JspWriterImpl.java:122)
>     at org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:115)
>     at org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:190)
>     at org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:115)
>     at org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:75)
>     at org.apache.hadoop.mapred.getMapOutput_jsp._jspService(getMapOutput_jsp.java:100)
>     at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
>     at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>     at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
>     at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
>     at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
>     at org.mortbay.http.HttpServer.service(HttpServer.java:954)
>     at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
>     at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
>     at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
>     at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
>     at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
>     at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
>
> Any idea where the problem is, and how to rectify it?
>
> Regards,
>
> -vishal.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
