Hello Again, 

Following my previous post below, I have noticed that I get the following 
IOException every time I attempt to use Nutch. 

<!--
2009-11-25 12:19:18,760 DEBUG conf.Configuration - java.io.IOException: config()
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:176)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:164)
        at org.apache.hadoop.hdfs.protocol.FSConstants.<clinit>(FSConstants.java:51)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2757)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2703)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

-->

Any pointers would be great. Is there a way for me to validate my conf 
options before I deploy Nutch?

Regards, 

Mischa

On 25 Nov 2009, at 11:45, Mischa Tuffield wrote:

> Hello All, 
> 
> I am getting the following error in my hadoop.log (see below). It seems to 
> happen every time I run any of the Nutch command-line tools :(
> 
> <!--
> 
> 2009-11-25 11:42:49,299 INFO  crawl.Injector - Injector: done
> 2009-11-25 11:42:49,302 DEBUG hdfs.DFSClient - 
> leasechec...@dfsclient[clientname=dfsclient_-822770266, ugi=nutch,nutch]: 
> java.lang.Throwable: for testing
>       at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.toString(DFSClient.java:992)
>       at java.lang.String.valueOf(String.java:2827)
>       at java.lang.StringBuilder.append(StringBuilder.java:115)
>       at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:981)
>       at java.lang.Thread.run(Thread.java:619)
> is interrupted.
> java.lang.InterruptedException: sleep interrupted
>       at java.lang.Thread.sleep(Native Method)
>       at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:978)
>       at java.lang.Thread.run(Thread.java:619)
> 
> --> 
> 
> Does anyone know what problem I am having?
> 
> Cheers, 
> 
> Mischa 
> 
> On 25 Nov 2009, at 09:15, Andrzej Bialecki wrote:
> 
>> BELLINI ADAM wrote:
>>> hi,
>>> my two URLs point to the same page!
>> 
>> Please, no need to shout ...
>> 
>> If the MD5 signatures are different, then the binary content of these pages 
>> is different, period.
>> 
>> Use the readseg -dump utility to retrieve the page content from the segment, 
>> extract just the two pages from the dump, and run the Unix diff utility on them.
>> 
>>> can you please tell me more about TextProfileSignature? How should I
>>> use it?
>> 
>> Configure this type of signature in your nutch-site.xml - please see the 
>> nutch-default.xml for instructions. Please note that you will have to 
>> re-parse segments and update the db in order to update the signatures.
>> 
>> 
>> 
>> -- 
>> Best regards,
>> Andrzej Bialecki     <><
>> ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>> 
> 
> ___________________________________
> Mischa Tuffield
> Email: mischa.tuffi...@garlik.com
> Homepage - http://mmt.me.uk/
> Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
> +44(0)20 8973 2465  http://www.garlik.com/
> Registered in England and Wales 535 7233 VAT # 849 0517 11
> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
> 
