Any idea what may be causing my URLs to disappear? I am losing 3 out of 5 URLs when I use 4 nodes in the cluster. Everything is fine when I use 1 node.
Here is a simple test that gives me different stats depending on whether I use 1 node or multiple nodes:

Background: running Nutch on 4 nodes; the nodes are Debian and Gentoo, a mix of 64-bit and 32-bit systems.
Goal: test whether indexing 5 URLs yields the same results on 1 node as on 4 nodes.
Code: latest from trunk as of Nov 23.
Number of URLs injected in both tests: 5.

TEST1: inject a file with 5 URLs, with all slave nodes except one commented out.
RESULT1: as expected.

  bin/nutch readdb crawltest/crawldb -stats
  CrawlDb statistics start: crawltest/crawldb
  Statistics for CrawlDb: crawltest/crawldb
  TOTAL urls: 5
  retry 0: 2
  retry 2: 3
  min score: 1.0
  avg score: 1.0
  max score: 1.0
  status 1 (db_unfetched): 3
  status 4 (db_redir_temp): 2
  CrawlDb statistics: done

TEST2: inject a file with 5 URLs, with all 4 slave nodes enabled.
RESULT2: 3 URLs missing.

  /nutch/search$ bin/nutch readdb crawltest/crawldb -stats
  CrawlDb statistics start: crawltest/crawldb
  Statistics for CrawlDb: crawltest/crawldb
  TOTAL urls: 2
  retry 0: 1
  min score: 1.0
  avg score: 1.0
  max score: 1.0
  status 1 (db_unfetched): 1
  CrawlDb statistics: done

ERROR logs

On 32-bit systems I get:

  2007-11-23 11:30:36,479 DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
  2007-11-23 11:30:36,480 DEBUG util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: /nutch/search/lib/native/Linux-i386-32/libhadoop.so: /lib/tls/i686/cmov/libc.so.6: version `GLIBC_2.4' not found (required by /nutch/search/lib/native/Linux-i386-32/libhadoop.so)
  2007-11-23 11:30:36,480 DEBUG util.NativeCodeLoader - java.library.path=/nutch/search/bin/../lib/native/Linux-i386-32
  2007-11-23 11:30:36,480 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

On 64-bit systems I get:

  2007-11-23 10:49:35,751 DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
  2007-11-23 10:49:35,751 DEBUG util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
  2007-11-23 10:49:35,751 DEBUG util.NativeCodeLoader - java.library.path=/nutch/search/bin/../lib/native/Linux-amd64-64
  2007-11-23 10:49:35,751 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Any idea what may be causing my URLs to disappear?

--
View this message in context: http://www.nabble.com/using-trunk%2C-urls-disappearing-when-using-4-nodes-tf4863548.html#a13918054
Sent from the Nutch - User mailing list archive at Nabble.com.
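To compare the two `readdb -stats` dumps from TEST1 and TEST2 mechanically (for example when repeating the test across more runs), a minimal Python sketch could pull out the integer counters and report which ones differ. The helpers `parse_stats` and `diff_stats` are hypothetical, and the parser assumes each counter sits on its own line in the `name: value` form seen in the output; the float score lines are ignored.

```python
import re

# Matches the integer counters printed by `bin/nutch readdb <crawldb> -stats`,
# e.g. "TOTAL urls: 5", "retry 2: 3", "status 4 (db_redir_temp): 2".
# Float lines such as "min score: 1.0" are deliberately not matched.
STAT_LINE = re.compile(r"^(TOTAL urls|retry \d+|status \d+ \([a-z_]+\)):\s*(\d+)$")


def parse_stats(text):
    """Hypothetical helper: map each integer counter name to its value."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def diff_stats(single, multi):
    """Hypothetical helper: {counter: (single_value, multi_value)} for counters that differ.

    A counter missing from one run is treated as 0.
    """
    keys = sorted(set(single) | set(multi))
    return {
        key: (single.get(key, 0), multi.get(key, 0))
        for key in keys
        if single.get(key, 0) != multi.get(key, 0)
    }


if __name__ == "__main__":
    # Counter lines copied from the TEST1 (1 node) and TEST2 (4 nodes) output.
    one_node = """TOTAL urls: 5
retry 0: 2
retry 2: 3
status 1 (db_unfetched): 3
status 4 (db_redir_temp): 2"""
    four_nodes = """TOTAL urls: 2
retry 0: 1
status 1 (db_unfetched): 1"""
    for counter, (a, b) in diff_stats(parse_stats(one_node), parse_stats(four_nodes)).items():
        print(f"{counter}: 1 node={a}, 4 nodes={b}")
```

Run on the two results, the diff makes the pattern explicit: the multi-node crawldb has no `retry 2` or `status 4 (db_redir_temp)` entries at all, and keeps only 1 of the 3 `db_unfetched` URLs.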
