Any idea what may be causing my urls to disappear? I am losing 3 out of 5
urls when I use 4 nodes in the cluster. Everything is fine when I use 1 node
in the cluster.

Here is a simple test that is giving me different stats if I am using 1 or
multiple nodes: 

Background: running nutch on 4 nodes. The nodes are a mix of Debian and
Gentoo, 64 and 32 bit systems.
Goal: test if indexing 5 urls yields the same results when using 1 node
versus using 4 nodes
Code: latest from trunk as of Nov 23
Number of urls injected in both tests: 5


TEST1: Inject file with 5 urls, comment out all slave nodes except one
RESULT1: as expected
bin/nutch readdb crawltest/crawldb -stats
CrawlDb statistics start: crawltest/crawldb
Statistics for CrawlDb: crawltest/crawldb
TOTAL urls:     5
retry 0:        2
retry 2:        3
min score:      1.0
avg score:      1.0
max score:      1.0
status 1 (db_unfetched):        3
status 4 (db_redir_temp):       2
CrawlDb statistics: done

TEST2: Inject file with 5 urls, all 4 slave nodes enabled
RESULT2: missing 3 urls
/nutch/search$ bin/nutch readdb crawltest/crawldb -stats
CrawlDb statistics start: crawltest/crawldb
Statistics for CrawlDb: crawltest/crawldb
TOTAL urls:     2
retry 0:        1
min score:      1.0
avg score:      1.0
max score:      1.0
status 1 (db_unfetched):        1
CrawlDb statistics: done
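
To make the difference between the two runs easier to see, the two stats
dumps can be compared line by line. This is only a rough sketch: the
function names parse_stats and diff_stats are my own and are not part of
Nutch, and the sample text is simply the numeric lines copied from the two
results above. A label missing from one run is treated as 0.

```python
import re

# Numeric lines copied from the two `readdb -stats` outputs above.
SINGLE_NODE = """\
TOTAL urls:     5
retry 0:        2
retry 2:        3
min score:      1.0
avg score:      1.0
max score:      1.0
status 1 (db_unfetched):        3
status 4 (db_redir_temp):       2
"""

MULTI_NODE = """\
TOTAL urls:     2
retry 0:        1
min score:      1.0
avg score:      1.0
max score:      1.0
status 1 (db_unfetched):        1
"""

# Matches "label:   number"; lines such as "CrawlDb statistics: done" are skipped.
LINE = re.compile(r"^(.+?):\s+(\d+(?:\.\d+)?)\s*$")


def parse_stats(text):
    """Return {label: value} for every 'label: number' line in the text."""
    stats = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


def diff_stats(first, second):
    """Return {label: (first_value, second_value)} where the two runs differ."""
    diffs = {}
    for label in sorted(set(first) | set(second)):
        a = first.get(label, 0.0)   # a label absent from a run counts as 0
        b = second.get(label, 0.0)
        if a != b:
            diffs[label] = (a, b)
    return diffs


if __name__ == "__main__":
    changed = diff_stats(parse_stats(SINGLE_NODE), parse_stats(MULTI_NODE))
    for label, (one_node, four_nodes) in changed.items():
        print(f"{label}: 1 node = {one_node:g}, 4 nodes = {four_nodes:g}")
```

Running it on the output of both tests lists only the counters that changed,
which keeps the unchanged score lines out of the way.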



ERROR logs
On 32 bit systems I get:

2007-11-23 11:30:36,479 DEBUG util.NativeCodeLoader - Trying to load the
custom-built native-hadoop library...
2007-11-23 11:30:36,480 DEBUG util.NativeCodeLoader - Failed to load
native-hadoop with error: java.lang.UnsatisfiedLinkError:
/nutch/search/lib/native/Linux-i386-32/libhadoop.so:
/lib/tls/i686/cmov/libc.so.6: version `GLIBC_2.4' not found (required by
/nutch/search/lib/native/Linux-i386-32/libhadoop.so)
2007-11-23 11:30:36,480 DEBUG util.NativeCodeLoader -
java.library.path=/nutch/search/bin/../lib/native/Linux-i386-32
2007-11-23 11:30:36,480 WARN  util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable

On 64 bit systems I get:
2007-11-23 10:49:35,751 DEBUG util.NativeCodeLoader - Trying to load the
custom-built native-hadoop library...
2007-11-23 10:49:35,751 DEBUG util.NativeCodeLoader - Failed to load
native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in
java.library.path
2007-11-23 10:49:35,751 DEBUG util.NativeCodeLoader -
java.library.path=/nutch/search/bin/../lib/native/Linux-amd64-64
2007-11-23 10:49:35,751 WARN  util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable


Any idea what may be causing my urls to disappear?
-- 
View this message in context: 
http://www.nabble.com/using-trunk%2C-urls-disappearing-when-using-4-nodes-tf4863548.html#a13918054
Sent from the Nutch - User mailing list archive at Nabble.com.
