This looks like a problem with name server lookups: in both stack traces below, the fetcher threads are inside java.net.InetAddress name resolution. Perhaps your name server is slow. While Nutch is running, you could try querying the name server directly to see whether it responds quickly.
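One rough way to check this from Java is to time the same resolver call that shows up in the stack traces (InetAddress name resolution). The following is only a sketch: the class name DnsTimer and the helper timeLookupMillis are made up for illustration and are not part of Nutch. Keep in mind that the JVM caches successful lookups, so only the first lookup of each host name in a given run reflects the name server's real latency.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Nutch: times one DNS lookup per host name.
class DnsTimer {

    /** Resolves the host once; returns elapsed milliseconds, or -1 if resolution failed. */
    static long timeLookupMillis(String host) {
        long start = System.nanoTime();
        try {
            // Same entry point the fetcher threads reach via InetAddress.getByName().
            InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            return -1;
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        // Pass host names taken from your fetchlist on the command line.
        for (String host : args) {
            long ms = timeLookupMillis(host);
            System.out.println(host + "\t" + (ms < 0 ? "lookup failed" : ms + " ms"));
        }
    }
}
```

Run it while the fetcher is slow, with a handful of host names it has not resolved yet. If individual lookups take seconds rather than milliseconds, the name server (or the resolver configuration) is the likely bottleneck.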
You could probably further examine this problem by running Nutch with `strace', if you are running this on UNIX.

Otis

--- Nutch Crawler <[EMAIL PROTECTED]> wrote:

> > When it slows, can you get a stack dump by sending SIGQUIT?  My hunch is
>
> Here is a sample of the SIGQUIT results, the fetchers are in one of
> these two states:
>
> "fetcher12" prio=1 tid=0x082f8448 nid=0x18cb runnable [0x6de56000..0x6de56600]
>         at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>         at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:838)
>         at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1176)
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1126)
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1098)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1061)
>         at java.net.InetAddress.getByName(InetAddress.java:958)
>         at java.net.InetSocketAddress.<init>(InetSocketAddress.java:124)
>         at net.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:94)
>         at net.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:53)
>         at net.nutch.protocol.http.RobotRulesParser.isAllowed(RobotRulesParser.java:364)
>         at net.nutch.protocol.http.Http.getContent(Http.java:145)
>         at net.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:106)
>
> "fetcher13" prio=1 tid=0x082f96b0 nid=0x18cc in Object.wait() [0x6ddd5000..0x6ddd5480]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x763a7238> (a java.util.HashMap)
>         at java.lang.Object.wait(Object.java:474)
>         at java.net.InetAddress.checkLookupTable(InetAddress.java:1226)
>         - locked <0x763a7238> (a java.util.HashMap)
>         at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1165)
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1126)
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1098)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1061)
>         at java.net.InetAddress.getByName(InetAddress.java:958)
>         at java.net.InetSocketAddress.<init>(InetSocketAddress.java:124)
>         at net.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:94)
>         at net.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:53)
>         at net.nutch.protocol.http.RobotRulesParser.isAllowed(RobotRulesParser.java:364)
>         at net.nutch.protocol.http.Http.getContent(Http.java:145)
>         at net.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:106)
>
> > Are the pdf and word parsers enabled, as they are by default?
> > These have been known to hang before.
> ...
> > So one workaround would be to disable these, using something like:
> > <property>
> >   <name>plugin.excludes</name>
> >   <value>(protocol-(?!http).*)|(parse-(?!html).*)</value>
> > </property>
>
> I have turned off all parsers except html parsers by setting this
> property.
>
> > Another workaround would be to use the whole-web method, and break your
> > fetchlists into smaller chunks that take less than 12 hours to fetch,
> > using, e.g., the -numFetchers parameter when generating fetchlists.  But
> > this is substantially more complicated if you're currently using the
> > crawl command.
>
> I am using the whole web method.
>
> Intel(R) Pentium(R) 4 CPU 2.80GHz
> WhiteBox Linux 2.4.21-20.EL
> java version "1.5.0"
> nutch-2004-11-12.tar.gz
>
> Let me know if you need any other info.
> Any suggestions would be helpful.
> Thanks,
> Ralph
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: InterSystems CACHE
> FREE OODBMS DOWNLOAD - A multidimensional database that combines
> robust object and relational technologies, making it a perfect match
> for Java, C++,COM, XML, ODBC and JDBC.
> www.intersystems.com/match8
> _______________________________________________
> Nutch-developers mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/nutch-developers
