Re: Nutch and Hadoop not working proper

2009-06-24 Thread Andrzej Bialecki
MilleBii wrote: Help! I have been stuck for 3 days, unable to start any Nutch job. HDFS works fine, i.e. I can put and look at files. When I start a Nutch crawl, I get the following error: Job initialization failed: java.lang.IllegalArgumentException: Pathname

How to run Nutch on a 2G memory tasknode

2009-06-24 Thread SunGod
An error occurred in the reduce phase of the crawldb job (TestDB/crawldb). The error message is: java.lang.OutOfMemoryError: Java heap space. My command: bin/nutch crawl url -dir TestDB -depth 4 -threads 3 (a single fetchlist is around 20). My memory setting in hadoop-env.sh: export HADOOP_HEAPSIZE=800

Re: Nutch and Hadoop not working proper

2009-06-24 Thread MilleBii
Yes, I'm using both relative paths and Cygwin under Windows, so /d: is not introduced by me, but by either Nutch or Hadoop. Regarding the Cygwin path you are right... that is actually where I lost quite some time. OK, I will try absolute paths and let you know. -MilleBii- 2009/6/24 Andrzej Bialecki

recrawling

2009-06-24 Thread Neeti Gupta
We have built a crawler that visits various sites, and I want it to recrawl sites as soon as they are updated. Can anyone tell me how I can find out when a site has been updated and it is time to crawl it again?

Re: recrawling

2009-06-24 Thread Otis Gospodnetic
Neeti, I don't think there is a way to know when a regular web site has been updated. You can issue GET or HEAD requests and look at the Last-Modified date, but this is not 100% reliable. You can fetch and compare content, but that's not 100% reliable either. If you are indexing blogs,
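The Last-Modified check mentioned above can be sketched in shell. This is only an illustration under stated assumptions: the function names last_modified and needs_recrawl are hypothetical, the stored value is assumed to come from the previous crawl, and a missing header is treated as "recrawl" because the server gives no information either way. As noted above, the header is not fully reliable, so treat the result as a hint.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nutch.

# Read raw HTTP response headers on stdin and print the Last-Modified value
# (empty output if the header is absent). Header names are case-insensitive.
last_modified() {
  tr -d '\r' | awk 'tolower($0) ~ /^last-modified:/ {
    sub(/^[^:]*:[ \t]*/, "")   # drop the header name and leading whitespace
    print
    exit
  }'
}

# Succeed (exit status 0) when a page should be fetched again:
#   $1 = Last-Modified value stored at the previous crawl
#   $2 = Last-Modified value seen now (may be empty)
needs_recrawl() {
  stored="$1"
  current="$2"
  [ -z "$current" ] || [ "$current" != "$stored" ]
}
```

A possible use, assuming curl is installed and with a placeholder URL: `current=$(curl -sI "http://example.com/" | last_modified)`, followed by `needs_recrawl "$stored" "$current" && echo "recrawl"`.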

Re: Nutch and Hadoop not working proper

2009-06-24 Thread MilleBii
Actually, I tried it and it fails, but this is what I found: bin/hadoop-config.sh does the conversion from relative to absolute path:

    this=$0
    while [ -h $this ]; do
      ls=`ls -ld $this`
      link=`expr $ls : '.*-> \(.*\)$'`
      if expr $link : '.*/.*' > /dev/null; then
        this=$link
      else
        this=`dirname
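For readers following the thread, the loop quoted above follows a chain of symlinks until it reaches a real file. Below is a self-contained sketch of that idea, not the actual hadoop-config.sh code: the function name resolve_script is hypothetical, variables are quoted so that paths with spaces survive, and the final step that turns the result into an absolute path is an assumption about what "conversion from relative to absolute" means here.

```shell
#!/bin/sh
# Hypothetical helper: follow symlinks from $1 to the real file and print
# its absolute path. Assumes file names do not themselves contain " -> ".
resolve_script() {
  this="$1"
  while [ -h "$this" ]; do
    # "ls -ld" prints "... name -> target" for a symlink; keep the target.
    ls=`ls -ld "$this"`
    link=`expr "$ls" : '.*-> \(.*\)$'`
    case "$link" in
      /*) this="$link" ;;                      # absolute target
      *)  this=`dirname "$this"`/"$link" ;;    # relative to the link's directory
    esac
  done
  # Make the directory part absolute and physical (no symlinks left).
  dir=`cd "\`dirname "$this"\`" && pwd -P`
  echo "$dir/`basename "$this"`"
}
```

Calling `resolve_script "$0"` at the top of a script would print the real location of that script. Under Cygwin the result is still a Unix-style path, which matters for the path discussion in this thread.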

Re: Nutch and Hadoop not working proper

2009-06-24 Thread MilleBii
What I have also discovered:
+ hadoop (script) works with Unix-like paths and works fine on Windows
+ nutch (script) works with Windows paths
Could it be that there is some incompatibility because one works with Unix-like paths and the other does not?
2009/6/24 MilleBii mille...@gmail.com Actually

Re: Nutch and Hadoop not working proper

2009-06-24 Thread Andrzej Bialecki
MilleBii wrote: What's also i have discovered + hadoop (script) works with unix like paths and works fine on windows + nutch (script) works with Windows paths

bin/nutch works with Windows paths? I think this could happen only by accident - both scripts work with Cygwin paths. On the other