Hello All,

I am getting the following error in my hadoop.log (see below). It seems to happen every time I run any of the Nutch command-line tools :(
<!--
2009-11-25 11:42:49,299 INFO crawl.Injector - Injector: done
2009-11-25 11:42:49,302 DEBUG hdfs.DFSClient - leasechec...@dfsclient[clientname=dfsclient_-822770266, ugi=nutch,nutch]: java.lang.Throwable: for testing
    at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.toString(DFSClient.java:992)
    at java.lang.String.valueOf(String.java:2827)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:981)
    at java.lang.Thread.run(Thread.java:619)
 is interrupted.
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:978)
    at java.lang.Thread.run(Thread.java:619)
-->

Does anyone know what problem I am having?

Cheers,

Mischa

On 25 Nov 2009, at 09:15, Andrzej Bialecki wrote:

> BELLINI ADAM wrote:
>> hi,
>> my two urls points to the same page !
>
> Please, no need to shout ...
>
> If the MD5 signatures are different, then the binary content of these pages
> is different, period.
>
> Use readseg -dump utility to retrieve the page content from the segment,
> extract just the two pages from the dump, and run a unix diff utility.
>
>> can you tell m eplz more about TextProfileSignature ? how should i
>> use it
>
> Configure this type of signature in your nutch-site.xml - please see the
> nutch-default.xml for instructions. Please note that you will have to
> re-parse segments and update the db in order to update the signatures.
>
>
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>

___________________________________
Mischa Tuffield
Email: mischa.tuffi...@garlik.com
Homepage - http://mmt.me.uk/
Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
+44(0)20 8973 2465 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD