Hello All, 

I am getting the following error in my hadoop.log (see below). It seems to 
happen every time I run any of the Nutch command-line tools :(

<!--

2009-11-25 11:42:49,299 INFO  crawl.Injector - Injector: done
2009-11-25 11:42:49,302 DEBUG hdfs.DFSClient - leasechec...@dfsclient[clientname=dfsclient_-822770266, ugi=nutch,nutch]: java.lang.Throwable: for testing
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.toString(DFSClient.java:992)
        at java.lang.String.valueOf(String.java:2827)
        at java.lang.StringBuilder.append(StringBuilder.java:115)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:981)
        at java.lang.Thread.run(Thread.java:619)
 is interrupted.
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:978)
        at java.lang.Thread.run(Thread.java:619)

--> 

Does anyone know what is causing this?

Cheers, 

Mischa 

On 25 Nov 2009, at 09:15, Andrzej Bialecki wrote:

> BELLINI ADAM wrote:
>> hi,
>> my two URLs point to the same page!
> 
> Please, no need to shout ...
> 
> If the MD5 signatures are different, then the binary content of these pages 
> is different, period.
> 
> Use the readseg -dump utility to retrieve the page content from the segment, 
> extract just the two pages from the dump, and run the Unix diff utility on them.
> 
>> can you please tell me more about TextProfileSignature? How should I
>> use it?
> 
> Configure this type of signature in your nutch-site.xml - please see the 
> nutch-default.xml for instructions. Please note that you will have to 
> re-parse segments and update the db in order to update the signatures.
> 
> 
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
> ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
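
For reference, the signature setting described above would probably look something like the fragment below in nutch-site.xml. This is only a sketch: I am assuming the property is db.signature.class and that the class lives in org.apache.nutch.crawl, so please check both against nutch-default.xml before relying on it.

```xml
<!-- Sketch of an override for conf/nutch-site.xml.
     Assumption: property name and class name match nutch-default.xml. -->
<property>
  <name>db.signature.class</name>
  <value>org.apache.nutch.crawl.TextProfileSignature</value>
</property>
```

As noted in the quoted reply, existing segments would still need to be re-parsed and the db updated before the new signatures take effect.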

___________________________________
Mischa Tuffield
Email: mischa.tuffi...@garlik.com
Homepage - http://mmt.me.uk/
Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
+44(0)20 8973 2465  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
