> When it slows, can you get a stack dump by sending SIGQUIT? My hunch is
Here is a sample of the SIGQUIT output. The fetchers are in one of
these two states:
"fetcher12" prio=1 tid=0x082f8448 nid=0x18cb runnable [0x6de56000..0x6de56600]
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:838)
at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1176)
at java.net.InetAddress.getAllByName0(InetAddress.java:1126)
at java.net.InetAddress.getAllByName0(InetAddress.java:1098)
at java.net.InetAddress.getAllByName(InetAddress.java:1061)
at java.net.InetAddress.getByName(InetAddress.java:958)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:124)
at net.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:94)
at net.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:53)
at net.nutch.protocol.http.RobotRulesParser.isAllowed(RobotRulesParser.java:364)
at net.nutch.protocol.http.Http.getContent(Http.java:145)
at net.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:106)

"fetcher13" prio=1 tid=0x082f96b0 nid=0x18cc in Object.wait() [0x6ddd5000..0x6ddd5480]
at java.lang.Object.wait(Native Method)
- waiting on <0x763a7238> (a java.util.HashMap)
at java.lang.Object.wait(Object.java:474)
at java.net.InetAddress.checkLookupTable(InetAddress.java:1226)
- locked <0x763a7238> (a java.util.HashMap)
at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1165)
at java.net.InetAddress.getAllByName0(InetAddress.java:1126)
at java.net.InetAddress.getAllByName0(InetAddress.java:1098)
at java.net.InetAddress.getAllByName(InetAddress.java:1061)
at java.net.InetAddress.getByName(InetAddress.java:958)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:124)
at net.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:94)
at net.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:53)
at net.nutch.protocol.http.RobotRulesParser.isAllowed(RobotRulesParser.java:364)
at net.nutch.protocol.http.Http.getContent(Http.java:145)
at net.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:106)
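The two states above can also be counted from inside the JVM instead of reading SIGQUIT dumps by hand. The sketch below uses Thread.getAllStackTraces() (available since Java 1.5) and sorts threads whose names start with "fetcher" into "resolving" (inside the native lookupAllHostAddr call, like fetcher12) and "waiting" (blocked in InetAddress.checkLookupTable, like fetcher13). The class name DnsThreadCensus is hypothetical, and the frame names it checks are JDK-internal ones copied from the dump above; other JDK versions may use different internal methods.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: classify fetcher threads by where they sit in DNS resolution.
// The class/method names checked below are JDK 1.5 internals taken from the
// thread dump; they are not a public API and may differ in other JDKs.
final class DnsThreadCensus {

    static String classify(StackTraceElement[] frames) {
        for (StackTraceElement f : frames) {
            String cls = f.getClassName();
            String method = f.getMethodName();
            if ((cls.equals("java.net.Inet4AddressImpl") || cls.equals("java.net.Inet6AddressImpl"))
                    && method.equals("lookupAllHostAddr")) {
                return "resolving"; // inside the native resolver call (state of fetcher12)
            }
            if (cls.equals("java.net.InetAddress") && method.equals("checkLookupTable")) {
                return "waiting"; // blocked on InetAddress's lookup table (state of fetcher13)
            }
        }
        return "other";
    }

    // Counts threads per state, considering only threads whose name starts with namePrefix.
    static Map<String, Integer> census(Map<Thread, StackTraceElement[]> dump, String namePrefix) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
            if (!e.getKey().getName().startsWith(namePrefix)) {
                continue;
            }
            String state = classify(e.getValue());
            Integer n = counts.get(state);
            counts.put(state, n == null ? 1 : n + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "fetcher" matches the thread names seen in the dump (fetcher12, fetcher13, ...).
        System.out.println(census(Thread.getAllStackTraces(), "fetcher"));
    }
}
```

Called periodically from a watchdog thread in the fetcher JVM, this gives a running count of how many fetchers are resolving versus queued behind in-progress lookups, which is the same information the SIGQUIT samples show.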
> Are the pdf and word parsers enabled, as they are by default?
> These have been known to hang before.
...
> So one workaround would be to disable these, using something like:
> <property>
> <name>plugin.excludes</name>
> <value>(protocol-(?!http).*)|(parse-(?!html).*)</value>
> </property>
I have turned off all parsers except the HTML parser by setting this property.
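The exclude pattern relies on negative lookahead: protocol-(?!http).* matches any protocol plugin whose ID does not continue with "http", and parse-(?!html).* does the same for parsers. A minimal sketch for checking which plugin IDs it excludes follows, assuming Nutch tests the whole plugin ID against the pattern (as Matcher.matches() does). The class name and the list of plugin IDs are illustrative only.

```java
import java.util.regex.Pattern;

// Hypothetical check of the plugin.excludes value quoted above.
// Assumption: a plugin is excluded when the pattern matches its entire ID.
final class PluginExcludesCheck {

    static final Pattern EXCLUDES =
            Pattern.compile("(protocol-(?!http).*)|(parse-(?!html).*)");

    static boolean isExcluded(String pluginId) {
        return EXCLUDES.matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Example plugin IDs, chosen for illustration.
        String[] ids = {"protocol-http", "protocol-ftp", "protocol-file",
                        "parse-html", "parse-pdf", "parse-msword", "index-basic"};
        for (String id : ids) {
            System.out.println(id + (isExcluded(id) ? "  excluded" : "  kept"));
        }
    }
}
```

Only protocol-http, parse-html and plugins outside the protocol-/parse- families are kept. Because the lookahead only checks a prefix, any ID beginning with "protocol-http" or "parse-html" would also be kept.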
> Another workaround would be to use the whole-web method, and break your
> fetchlists into smaller chunks that take less than 12 hours to fetch,
> using, e.g., the -numFetchers parameter when generating fetchlists. But
> this is substantially more complicated if you're currently using the
> crawl command.
I am using the whole-web method.
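Choosing a value for -numFetchers under the "less than 12 hours per fetchlist" advice is simple arithmetic: divide the total number of pages by the number one fetchlist can handle in 12 hours, and round up. The sketch below assumes a steady, separately measured fetch rate and roughly equal-sized fetchlists; the helper name chooseNumFetchers is hypothetical and not part of Nutch.

```java
// Hypothetical sizing helper for the -numFetchers value.
// Assumes a sustained fetch rate (pages/second) measured on your own crawl
// and that the generated fetchlists are roughly equal in size.
final class FetchlistSizing {

    static int chooseNumFetchers(long totalPages, double pagesPerSecond, double maxHours) {
        if (pagesPerSecond <= 0 || maxHours <= 0) {
            throw new IllegalArgumentException("rate and maxHours must be positive");
        }
        if (totalPages <= 0) {
            return 1;
        }
        // Pages one fetchlist can complete within the time limit.
        double pagesPerList = pagesPerSecond * maxHours * 3600.0;
        return (int) Math.ceil(totalPages / pagesPerList);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 2,000,000 pages at 20 pages/s, 12-hour limit.
        // One fetchlist holds 20 * 12 * 3600 = 864,000 pages, so 3 fetchlists are needed.
        System.out.println(chooseNumFetchers(2000000L, 20.0, 12.0));
    }
}
```

If fetching slows down partway through, as described in this thread, the measured rate overstates real throughput, so it is safer to plug in the rate observed during the slow phase.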
My setup:
CPU:   Intel(R) Pentium(R) 4 CPU 2.80GHz
OS:    WhiteBox Linux 2.4.21-20.EL
Java:  java version "1.5.0"
Nutch: nutch-2004-11-12.tar.gz
Let me know if you need any other info.
Any suggestions would be helpful.
Thanks,
Ralph
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers