As the simplest possible test, I decided to run a crawl on a single
machine, convinced it would work.
I synced to head (rev 351843), and the only change I made was to the
values of these two properties in nutch-default.xml:

<property>
  <name>fs.default.name</name>
  <value>localhost:50000</value>
  <description>The name of the default file system.  Either the
  literal string "local" or a host:port for NDFS.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:50020</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

Btw, I noticed that the property "mapred.job.tracker.info.port" is
defined twice in the file (that looks like a bug to me, but maybe I'm
missing something).
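
For anyone who wants to check for such duplicates without reading the whole file, here is a minimal Python sketch. It assumes only that the config is well-formed XML containing <property> elements, each with a <name> child; the function name and the sample values are made up for illustration and are not copied from nutch-default.xml.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_properties(xml_text):
    """Return the sorted names of <property> entries defined more than once."""
    root = ET.fromstring(xml_text)
    # Collect every <name> found under a <property>, wherever it is nested.
    names = [
        (prop.findtext("name") or "").strip()
        for prop in root.iter("property")
    ]
    counts = Counter(name for name in names if name)
    return sorted(name for name, n in counts.items() if n > 1)


# Illustrative input only: the values are invented for this example.
SAMPLE = """<nutch-conf>
  <property><name>mapred.job.tracker</name><value>localhost:50020</value></property>
  <property><name>mapred.job.tracker.info.port</name><value>1</value></property>
  <property><name>mapred.job.tracker.info.port</name><value>2</value></property>
</nutch-conf>"""

if __name__ == "__main__":
    print(duplicate_properties(SAMPLE))
```

To run it against a real file, pass the contents of conf/nutch-default.xml to duplicate_properties() instead of SAMPLE.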

Here is every step I executed, in order, along with its output:

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start namenode
starting namenode, logging to /home/florent/nutch-mapred/nutch-florent-namenode-florent-dev.log
051202 172440 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172441 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172441 Server listener on port 50000: starting
051202 172441 Server handler 0 on 50000: starting
051202 172441 Server handler 1 on 50000: starting
051202 172441 Server handler 2 on 50000: starting
051202 172441 Server handler 3 on 50000: starting
051202 172441 Server handler 4 on 50000: starting
051202 172441 Server handler 5 on 50000: starting
051202 172441 Server handler 6 on 50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start jobtracker
starting jobtracker, logging to /home/florent/nutch-mapred/nutch-florent-jobtracker-florent-dev.log
051202 172501 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172501 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172501 Client connection to 127.0.0.1:50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start datanode
starting datanode, logging to /home/florent/nutch-mapred/nutch-florent-datanode-florent-dev.log
051202 172518 10 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172518 10 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start tasktracker
starting tasktracker, logging to /home/florent/nutch-mapred/nutch-florent-tasktracker-florent-dev.log
051202 172522 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172522 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172522 Server listener on port 50050: starting
051202 172522 Server handler 0 on 50050: starting
051202 172522 Server handler 1 on 50050: starting
051202 172522 Server listener on port 50040: starting
051202 172522 Server handler 0 on 50040: starting
051202 172522 Server handler 1 on 50040: starting
051202 172523 Client connection to 127.0.0.1:50020: starting
051202 172523 Client connection to 127.0.0.1:50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ echo "http://www.osnews.com" >> urls.txt
[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch ndfs -mkdir urls
051202 172558 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172558 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172558 No FS indicated, using default:localhost:50000
051202 172558 Client connection to 127.0.0.1:50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch ndfs -copyFromLocal urls.txt urls/urls.txt
051202 172612 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172612 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172612 No FS indicated, using default:localhost:50000
051202 172612 Client connection to 127.0.0.1:50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch inject crawldb urls
051202 172655 Injector: starting
051202 172655 Injector: crawlDb: crawldb
051202 172655 Injector: urlDir: urls
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172655 Injector: Converting injected urls to crawl db entries.
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172655 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172655 Client connection to 127.0.0.1:50020: starting
051202 172655 Client connection to 127.0.0.1:50000: starting
051202 172656 Running job: job_vz968q
051202 172657  map 0%
051202 172703  map 50%
051202 172705  map 100%
051202 172708  reduce 100%
051202 172708 Job complete: job_vz968q
051202 172708 Injector: Merging injected urls into crawl db.
051202 172708 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172708 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172708 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172709 Running job: job_d11zdg
051202 172710  map 0%
051202 172714  map 100%
051202 172717  reduce 100%
051202 172717 Job complete: job_d11zdg
051202 172718 Injector: done

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch generate crawldb segments -topN 10000000
051202 172751 topN: 10000000
051202 172751 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172752 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172752 Generator: starting
051202 172752 Generator: segment: segments/20051202172752
051202 172752 Generator: Selecting most-linked urls due for fetch.
051202 172752 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172752 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172752 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172752 Client connection to 127.0.0.1:50020: starting
051202 172752 Client connection to 127.0.0.1:50000: starting
051202 172753 Running job: job_99dgv7
051202 172754  map 0%
051202 172756  map 100%
051202 172759  reduce 100%
051202 172759 Job complete: job_99dgv7
051202 172800 Generator: Partitioning selected urls by host, for politeness.
051202 172800 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172800 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172800 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172801 Running job: job_3dhill
051202 172802  map 0%
051202 172805  map 100%
051202 172808  reduce 50%
051202 172812  reduce 100%
051202 172812 Job complete: job_3dhill
051202 172812 Generator: done.

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch ndfs -ls segments
051202 172835 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172835 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172835 No FS indicated, using default:localhost:50000
051202 172835 Client connection to 127.0.0.1:50000: starting
Found 1 items
/user/florent/segments/20051202172752   <dir>

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch fetch segments/20051202172752
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172909 Fetcher: starting
051202 172909 Fetcher: segment: segments/20051202172752
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172909 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172909 Client connection to 127.0.0.1:50020: starting
051202 172909 Client connection to 127.0.0.1:50000: starting
051202 172910 Running job: job_76bwnm
051202 172911  map 0%
051202 172914  map 50%
051202 172918  map 100%
051202 172923  reduce 75%
051202 172926  reduce 100%
051202 172926 Job complete: job_76bwnm
051202 172926 Fetcher: done

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch invertlinks linkdb segments/2005120217275
051202 173000 LinkDb: starting
051202 173000 LinkDb: linkdb: linkdb
051202 173000 LinkDb: segments: segments/2005120217275
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 173001 Client connection to 127.0.0.1:50020: starting
051202 173001 Client connection to 127.0.0.1:50000: starting
Exception in thread "main" java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /tmp/nutch/mapred/local/jobTracker/job_jxygzp.xml , nutch-site.xml
        at org.apache.nutch.ipc.Client.call(Client.java:294)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        at $Proxy0.submitJob(Unknown Source)
        at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:131)
        at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:192)

This is exactly the same error as before.
I guess some required setting is missing from my configuration, but I
didn't find anything relevant in the docs and tutorials I looked at.
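
One thing that can be ruled out mechanically is a mismatch between the path given on the command line and what NDFS actually holds. Below is a minimal Python sketch of such a check. It assumes the "ndfs -ls" output format shown above (a "Found N items" line followed by one path per line); the function names are hypothetical and not part of Nutch.

```python
def listed_paths(ls_output):
    """Extract the paths from 'ndfs -ls' style output (lines starting with '/')."""
    paths = []
    for line in ls_output.splitlines():
        line = line.strip()
        if line.startswith("/"):
            # The path is the first whitespace-separated field; '<dir>' follows it.
            paths.append(line.split()[0])
    return paths


def segment_is_listed(segment_arg, ls_output):
    """True if the last component of segment_arg matches a listed entry."""
    wanted = segment_arg.rstrip("/").split("/")[-1]
    return any(
        path.rstrip("/").split("/")[-1] == wanted
        for path in listed_paths(ls_output)
    )


# Listing copied from the 'bin/nutch ndfs -ls segments' step above.
LS_OUTPUT = """Found 1 items
/user/florent/segments/20051202172752   <dir>
"""

if __name__ == "__main__":
    for arg in ("segments/20051202172752", "segments/2005120217275"):
        print(arg, segment_is_listed(arg, LS_OUTPUT))
```

With the listing above, the argument passed to fetch (segments/20051202172752) is found, while the one echoed by LinkDb (segments/2005120217275) is not. Whether that alone explains the exception is not something this check can tell.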

--Flo


Florent Gluck wrote:

>I tried again with absolute paths, but it didn't make any difference. 
>All the local directories that are accessed by the java process are
>within the user home directory, so write access is not an issue.
>As a test, I also reverted my nutch-site.xml to contain only the
>following, so that it would use the default directory locations
>(/tmp/nutch/...):
>
><property>
>  <name>fs.default.name</name>
>  <value>mapred01:10000</value>
>  <description>The name of the default file system.  Either the
>  literal string "local" or a host:port for NDFS.</description>
></property>
>
><property>
>  <name>mapred.job.tracker</name>
>  <value>mapred01:11000</value>
>  <description>The host and port that the MapReduce job tracker runs
>  at.  If "local", then jobs are run in-process as a single map
>  and reduce task.
>  </description>
></property>
>
>Unfortunately, it didn't make any difference either; I still get the
>exact same error.
>What I'm doing is very simple. I'm following what's explained here:
>http://wiki.media-style.com/display/nutchDocu/setup+a+map+reduce+multi+box+system
>The only difference is that I'm using a .slaves file and running
>start-all.sh to avoid having to log on to the slave machine and start
>the daemons manually.
>
>--Flo
>
>Stefan Groschupf wrote:
>
>  
>
>>Sounds strange. I had a similar problem, but it was related to
>>different user names on different boxes.
>>Please try to use an absolute path, something like
>>bin/nutch fetch /Users/yourUser/segments/30000004344
>>Also check that the users that run the java processes have write
>>access to the local folders.
>>:-?
>>
>>Stefan
>>    
>>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
