Hello again,

Following my previous post (below), I have noticed that I get the following IOException every time I attempt to use Nutch.
<!--
2009-11-25 12:19:18,760 DEBUG conf.Configuration - java.io.IOException: config()
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:176)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:164)
    at org.apache.hadoop.hdfs.protocol.FSConstants.<clinit>(FSConstants.java:51)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2757)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2703)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
-->

Any pointers would be great. Is there a way for me to validate my conf options before I deploy Nutch?

Regards,

Mischa

On 25 Nov 2009, at 11:45, Mischa Tuffield wrote:

> Hello All,
>
> I am getting the following error in my hadoop.log (see below). It seems to
> happen everytime I run any of the nutch command line tools :(
>
> <!--
>
> 2009-11-25 11:42:49,299 INFO crawl.Injector - Injector: done
> 2009-11-25 11:42:49,302 DEBUG hdfs.DFSClient - leasechec...@dfsclient[clientname=dfsclient_-822770266, ugi=nutch,nutch]: java.lang.Throwable: for testing
>     at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.toString(DFSClient.java:992)
>     at java.lang.String.valueOf(String.java:2827)
>     at java.lang.StringBuilder.append(StringBuilder.java:115)
>     at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:981)
>     at java.lang.Thread.run(Thread.java:619)
> is interrupted.
> java.lang.InterruptedException: sleep interrupted
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:978)
>     at java.lang.Thread.run(Thread.java:619)
>
> -->
>
> Does anyone know what problem I am having ?
>
> Cheers,
>
> Mischa
>
> On 25 Nov 2009, at 09:15, Andrzej Bialecki wrote:
>
>> BELLINI ADAM wrote:
>>> hi,
>>> my two urls points to the same page !
>>
>> Please, no need to shout ...
>>
>> If the MD5 signatures are different, then the binary content of these pages
>> is different, period.
>>
>> Use readseg -dump utility to retrieve the page content from the segment,
>> extract just the two pages from the dump, and run a unix diff utility.
>>
>>> can you tell m eplz more about TextProfileSignature ? how should i
>>> use it
>>
>> Configure this type of signature in your nutch-site.xml - please see the
>> nutch-default.xml for instructions. Please note that you will have to
>> re-parse segments and update the db in order to update the signatures.
>>
>>
>>
>> --
>> Best regards,
>> Andrzej Bialecki <><
>> ___. ___ ___ ___ _ _ __________________________________
>> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
>> ___|||__|| \| || | Embedded Unix, System Integration
>> http://www.sigram.com Contact: info at sigram dot com
>>
>
> ___________________________________
> Mischa Tuffield
> Email: mischa.tuffi...@garlik.com
> Homepage - http://mmt.me.uk/
> Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
> +44(0)20 8973 2465 http://www.garlik.com/
> Registered in England and Wales 535 7233 VAT # 849 0517 11
> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
>
___________________________________
Mischa Tuffield
Email: mischa.tuffi...@garlik.com
Homepage - http://mmt.me.uk/
Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
+44(0)20 8973 2465 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
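
For reference, the check suggested in the quoted reply (different MD5 signatures mean different bytes, so diff the two pages to see where) could be sketched roughly as below. This is only an illustration and does not call Nutch: the function names md5_hex and compare_pages and the sample page contents are made up for the example, and in practice the two byte strings would be the page contents extracted by hand from the readseg dump.

```python
import difflib
import hashlib
from typing import List


def md5_hex(content: bytes) -> str:
    """Return the MD5 digest of the raw page bytes as a hex string."""
    return hashlib.md5(content).hexdigest()


def compare_pages(page_a: bytes, page_b: bytes) -> List[str]:
    """Return a unified diff of two pages (an empty list means identical text)."""
    lines_a = page_a.decode("utf-8", errors="replace").splitlines()
    lines_b = page_b.decode("utf-8", errors="replace").splitlines()
    return list(
        difflib.unified_diff(
            lines_a, lines_b, fromfile="page_a", tofile="page_b", lineterm=""
        )
    )


if __name__ == "__main__":
    # Hypothetical sample data: the same page served twice, differing only
    # in a timestamp comment. Real input would come from the segment dump.
    page_a = b"<html>\n<body>hello</body>\n<!-- served 12:00:01 -->\n</html>\n"
    page_b = b"<html>\n<body>hello</body>\n<!-- served 12:00:07 -->\n</html>\n"

    print("same MD5:", md5_hex(page_a) == md5_hex(page_b))
    for line in compare_pages(page_a, page_b):
        print(line)
```

A small dynamic fragment such as a timestamp is enough to change the MD5 of an otherwise identical page, which is the kind of difference the diff would make visible.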