Sorry, my mistake. I have changed to 0.1.1 now.
Results:
bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
060425 113831 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060425 113831 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060425 113832 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060425 113832 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060425 113832 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060425 113832 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060425 113832 Client connection to 127.0.0.1:50000: starting
060425 113832 crawl started in: crawled
060425 113832 rootUrlDir = 2
060425 113832 threads = 10
060425 113832 depth = 5
060425 113833 Injector: starting
060425 113833 Injector: crawlDb: crawled/crawldb
060425 113833 Injector: urlDir: 2
060425 113833 Injector: Converting injected urls to crawl db entries.
060425 113833 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060425 113833 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060425 113833 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060425 113833 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060425 113833 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060425 113834 Client connection to 127.0.0.1:50020: starting
060425 113834 Client connection to 127.0.0.1:50000: starting
060425 113834 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060425 113834 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060425 113838 Running job: job_23a6ra
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
bash-3.00$
Doing it step by step gives the same result, just with another job that failed.
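
For reference, "step by step" here means running the individual tools instead of the one-shot crawl command; a rough sketch (paths as in the quoted mails below, exact arguments per each tool's usage message) is:

bash-3.00$ bin/nutch inject crawl/crawldb urls
bash-3.00$ bin/nutch generate crawl/crawldb crawl/segments

and already the first job fails in the same way.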
> --- Original Message ---
> From: "Zaheed Haque" <[EMAIL PROTECTED]>
> To: [email protected]
> Subject: Re: java.io.IOException: No input directories specified in
> Date: Tue, 25 Apr 2006 11:34:10 +0200
>
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
>
> Somehow you are still using hadoop-0.1 and not 0.1.1. I am not sure if
> this update will solve your problem, but it might. With the config I
> sent you I could crawl, index and search, so there must be something
> else... I am not sure.
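>
> A quick way to double-check which Hadoop jar is actually being picked up
> (assuming the standard nutch-nightly layout) is:
>
> ls lib/hadoop-*.jar
>
> The jar named in the "parsing jar:file:..." lines of the crawl output
> should match it.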
>
> Cheers
> Zaheed
>
> On 4/25/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > Seems to be a bit better, doesn't it?
> >
> > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > 060425 110124 parsing
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > 060425 110124 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > 060425 110124 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > 060425 110124 parsing
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > 060425 110125 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > 060425 110125 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060425 110125 Client connection to 127.0.0.1:50000: starting
> > 060425 110125 crawl started in: crawled
> > 060425 110125 rootUrlDir = 2
> > 060425 110125 threads = 10
> > 060425 110125 depth = 5
> > 060425 110126 Injector: starting
> > 060425 110126 Injector: crawlDb: crawled/crawldb
> > 060425 110126 Injector: urlDir: 2
> > 060425 110126 Injector: Converting injected urls to crawl db entries.
> > 060425 110126 parsing
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > 060425 110126 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > 060425 110126 parsing
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > 060425 110126 parsing
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060425 110127 Client connection to 127.0.0.1:50020: starting
> > 060425 110127 Client connection to 127.0.0.1:50000: starting
> > 060425 110127 parsing
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
> > at org.apache.hadoop.mapred.$Proxy1.getJobProfile(Unknown Source)
> > at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:60)
> > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:263)
> > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
> > at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > Caused by: java.io.IOException: timed out waiting for response
> > at org.apache.hadoop.ipc.Client.call(Client.java:303)
> > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
> > ... 6 more
> >
> >
> > The local IP is the same,
> > but I don't know exactly how to handle the ports.
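> >
> > (One way to check would be to see whether anything is listening on the
> > configured ports at all before starting the crawl, e.g.
> >
> > netstat -an | grep 50000
> >
> > for the namenode port, and the same for 50020.)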
> >
> > Going step by step (generate, index, ...) caused the same error when running
> > bin/nutch generate crawl/crawldb crawl/segments
> >
> > > --- Original Message ---
> > > From: "Zaheed Haque" <[EMAIL PROTECTED]>
> > > To: [email protected]
> > > Subject: Re: java.io.IOException: No input directories specified in
> > > Date: Mon, 24 Apr 2006 13:39:10 +0200
> > >
> > > Try the following in your hadoop-site.xml; please change and adjust it
> > > based on your IP address. The following configuration assumes that
> > > you have one server and that you are using it as a namenode as well as a
> > > datanode. Note this is NOT the point of running Hadoopified Nutch!
> > > It is rather for testing...
> > >
> > > --------------------
> > >
> > > <?xml version="1.0"?>
> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > >
> > > <configuration>
> > >
> > > <!-- file system properties -->
> > >
> > > <property>
> > > <name>fs.default.name</name>
> > > <value>127.0.0.1:50000</value>
> > > <description>The name of the default file system. Either the
> > > literal string "local" or a host:port for DFS.</description>
> > > </property>
> > >
> > > <property>
> > > <name>dfs.datanode.port</name>
> > > <value>50010</value>
> > > <description>The port number that the dfs datanode server uses as a
> > > starting
> > > point to look for a free port to listen on.
> > > </description>
> > > </property>
> > >
> > > <property>
> > > <name>dfs.name.dir</name>
> > > <value>/tmp/hadoop/dfs/name</value>
> > > <description>Determines where on the local filesystem the DFS name
> node
> > > should store the name table.</description>
> > > </property>
> > >
> > > <property>
> > > <name>dfs.data.dir</name>
> > > <value>/tmp/hadoop/dfs/data</value>
> > > <description>Determines where on the local filesystem an DFS data
> node
> > > should store its blocks. If this is a comma- or space-delimited
> > > list of directories, then data will be stored in all named
> > > directories, typically on different devices.</description>
> > > </property>
> > >
> > > <property>
> > > <name>dfs.replication</name>
> > > <value>1</value>
> > > <description>How many copies we try to have at all times. The actual
> > > number of replications is at max the number of datanodes in the
> > > cluster.</description>
> > > </property>
> > > <!-- map/reduce properties -->
> > >
> > > <property>
> > > <name>mapred.job.tracker</name>
> > > <value>127.0.0.1:50020</value>
> > > <description>The host and port that the MapReduce job tracker runs
> > > at. If "local", then jobs are run in-process as a single map
> > > and reduce task.
> > > </description>
> > > </property>
> > >
> > > <property>
> > > <name>mapred.job.tracker.info.port</name>
> > > <value>50030</value>
> > > <description>The port that the MapReduce job tracker info webserver
> runs
> > > at.
> > > </description>
> > > </property>
> > >
> > > <property>
> > > <name>mapred.task.tracker.output.port</name>
> > > <value>50040</value>
> > > <description>The port number that the MapReduce task tracker output
> > > server uses as a starting point to look for
> > > a free port to listen on.
> > > </description>
> > > </property>
> > >
> > > <property>
> > > <name>mapred.task.tracker.report.port</name>
> > > <value>50050</value>
> > > <description>The port number that the MapReduce task tracker report
> > > server uses as a starting
> > > point to look for a free port to listen on.
> > > </description>
> > > </property>
> > >
> > > <property>
> > > <name>mapred.local.dir</name>
> > > <value>/tmp/hadoop/mapred/local</value>
> > > <description>The local directory where MapReduce stores intermediate
> > > data files. May be a space- or comma- separated list of
> > > directories on different devices in order to spread disk i/o.
> > > </description>
> > > </property>
> > >
> > > <property>
> > > <name>mapred.system.dir</name>
> > > <value>/tmp/hadoop/mapred/system</value>
> > > <description>The shared directory where MapReduce stores control
> files.
> > > </description>
> > > </property>
> > >
> > > <property>
> > > <name>mapred.temp.dir</name>
> > > <value>/tmp/hadoop/mapred/temp</value>
> > > <description>A shared directory for temporary files.
> > > </description>
> > > </property>
> > >
> > > <property>
> > > <name>mapred.reduce.tasks</name>
> > > <value>1</value>
> > > <description>The default number of reduce tasks per job. Typically
> set
> > > to a prime close to the number of available hosts. Ignored when
> > > mapred.job.tracker is "local".
> > > </description>
> > > </property>
> > >
> > > <property>
> > > <name>mapred.tasktracker.tasks.maximum</name>
> > > <value>2</value>
> > > <description>The maximum number of tasks that will be run
> > > simultaneously by a task tracker.
> > > </description>
> > > </property>
> > >
> > > </configuration>
> > >
> > > ------
> > >
> > > Then execute the following commands:
> > > - Initialize the HDFS:
> > > bin/hadoop namenode -format
> > > - Start the namenode/datanode:
> > > bin/hadoop-daemon.sh start namenode
> > > bin/hadoop-daemon.sh start datanode
> > > - Let's do some checking...
> > > bin/hadoop dfs -ls
> > >
> > > This should return 0 items!! So let's try to add a file to the DFS:
> > >
> > > bin/hadoop dfs -put xyz.html xyz.html
> > >
> > > Try
> > >
> > > bin/hadoop dfs -ls
> > >
> > > You should see one item:
> > > Found 1 items
> > > /user/root/xyz.html 21433
> > >
> > > bin/hadoop-daemon.sh start jobtracker
> > > bin/hadoop-daemon.sh start tasktracker
> > >
> > > Now you can start off with inject, generate, etc.
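> > >
> > > Roughly, the whole sequence looks like this (just a sketch; check each
> > > tool's usage message for the exact arguments, and replace <segment> with
> > > the directory that generate creates under crawl/segments):
> > >
> > > bin/nutch inject crawl/crawldb seeds
> > > bin/nutch generate crawl/crawldb crawl/segments
> > > bin/nutch fetch crawl/segments/<segment>
> > > bin/nutch updatedb crawl/crawldb crawl/segments/<segment>
> > > bin/nutch invertlinks crawl/linkdb crawl/segments/<segment>
> > > bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/<segment>
> > > bin/nutch dedup crawl/indexes
> > > bin/nutch merge crawl/index crawl/indexes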
> > >
> > > Hope this time it works for you..
> > >
> > > Cheers
> > >
> > >
> > > On 4/24/06, Zaheed Haque <[EMAIL PROTECTED]> wrote:
> > > > On 4/24/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > > > > I forgot to have a look at the log files:
> > > > > namenode:
> > > > > 060424 121444 parsing
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121444 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > Exception in thread "main" java.lang.RuntimeException: Not a host:port
> > > > > pair: local
> > > > > at org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> > > > > at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> > > > > at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
> > > > >
> > > > >
> > > > > datanode
> > > > > 060424 121448 10 parsing
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121448 10 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060424 121448 10 Can't start DataNode in non-directory: /tmp/hadoop/dfs/data
> > > > >
> > > > > jobtracker
> > > > > 060424 121455 parsing
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121455 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060424 121455 parsing
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121456 parsing
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060424 121456 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > > > mapred.job.tracker: local
> > > > > at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> > > > > at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> > > > > at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> > > > >
> > > > >
> > > > > tasktracker
> > > > > 060424 121502 parsing
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121503 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > > > mapred.job.tracker: local
> > > > > at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> > > > > at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> > > > >
> > > > >
> > > > > What can be the problem?
> > > > > > --- Original Message ---
> > > > > > From: "Peter Swoboda" <[EMAIL PROTECTED]>
> > > > > > To: [email protected]
> > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > > > > >
> > > > > > Got the latest nutch-nightly build,
> > > > > > including hadoop-0.1.1.jar.
> > > > > > Copied the content of hadoop-default.xml into hadoop-site.xml.
> > > > > > Started namenode, datanode, jobtracker, tasktracker.
> > > > > > Ran
> > > > > > bin/hadoop dfs -put seeds seeds
> > > > > >
> > > > > > result:
> > > > > >
> > > > > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > > > > starting namenode, logging to
> > > > > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > > > > >
> > > > > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > > > > starting datanode, logging to
> > > > > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > > > > >
> > > > > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > > > > starting jobtracker, logging to
> > > > > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > > > > >
> > > > > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > > > > starting tasktracker, logging to
> > > > > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > 060424 121512 parsing
> > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121512 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060424 121513 No FS indicated, using default:local
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > 060424 121543 parsing
> > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121543 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060424 121544 No FS indicated, using default:local
> > > > > > Found 18 items
> > > > > > /home/../nutch-nightly/docs <dir>
> > > > > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > > > > /home/../nutch-nightly/webapps <dir>
> > > > > > /home/../nutch-nightly/CHANGES.txt 17709
> > > > > > /home/../nutch-nightly/build.xml 21433
> > > > > > /home/../nutch-nightly/LICENSE.txt 615
> > > > > > /home/../nutch-nightly/test.log 3447
> > > > > > /home/../nutch-nightly/conf <dir>
> > > > > > /home/../nutch-nightly/default.properties 3043
> > > > > > /home/../nutch-nightly/plugins <dir>
> > > > > > /home/../nutch-nightly/lib <dir>
> > > > > > /home/../nutch-nightly/bin <dir>
> > > > > > /home/../nutch-nightly/logs <dir>
> > > > > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > > > > /home/../nutch-nightly/src <dir>
> > > > > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > > > > /home/../nutch-nightly/seeds <dir>
> > > > > > /home/../nutch-nightly/README.txt 403
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > 060424 121603 parsing
> > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121603 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060424 121603 No FS indicated, using default:local
> > > > > > Found 2 items
> > > > > > /home/../nutch-nightly/seeds/urls.txt~ 0
> > > > > > /home/../nutch-nightly/seeds/urls.txt 26
> > > > > >
> > > > > > so far so good, but:
> > > > > >
> > > > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > 060424 121613 parsing
> > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121613 parsing
> > > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > 060424 121613 parsing
> > > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > 060424 121613 parsing
> > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060424 121613 parsing
> > > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > 060424 121613 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060424 121614 crawl started in: crawled
> > > > > > 060424 121614 rootUrlDir = 2
> > > > > > 060424 121614 threads = 10
> > > > > > 060424 121614 depth = 5
> > > > > > Exception in thread "main" java.io.IOException: No valid local
> > > > > > directories in property: mapred.local.dir
> > > > > > at org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > > > > > at org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > > > > > at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > > > > bash-3.00$
> > > > > >
> > > > > > I really don't know what to do.
> > > > > > In hadoop-site.xml it is:
> > > > > > ..
> > > > > > <property>
> > > > > > <name>mapred.local.dir</name>
> > > > > > <value>/tmp/hadoop/mapred/local</value>
> > > > > > <description>The local directory where MapReduce stores
> > > intermediate
> > > > > > data files. May be a space- or comma- separated list of
> > > > > > directories on different devices in order to spread disk i/o.
> > > > > > </description>
> > > > > > </property>
> > > > > > ..
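> > > > > >
> > > > > > (A guess on my side: maybe the directory simply does not exist yet;
> > > > > > creating it by hand, e.g.
> > > > > >
> > > > > > mkdir -p /tmp/hadoop/mapred/local
> > > > > >
> > > > > > might be worth a try.)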
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________
> > > > > > Is your hadoop-site.xml empty, I mean it doesn't contain any
> > > > > > configuration, correct? So what you need to do is add your
> > > > > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > > > > another directory for inspection (copy, not move). Unzip the
> > > > > > hadoop-0.1.1.jar file and you will see a hadoop-default.xml file
> > > > > > there. Use that as a template to edit your hadoop-site.xml under
> > > > > > conf. Once you have edited it, you should start your 'namenode' and
> > > > > > 'datanode'. I am guessing you are using nutch in a distributed way,
> > > > > > because you don't need to use hadoop if you are just running on one
> > > > > > machine in local mode!!
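> > > > > >
> > > > > > For example (just a sketch, assuming the jar sits under lib/):
> > > > > >
> > > > > > mkdir /tmp/hadoop-inspect
> > > > > > cp lib/hadoop-0.1.1.jar /tmp/hadoop-inspect/
> > > > > > cd /tmp/hadoop-inspect && unzip hadoop-0.1.1.jar hadoop-default.xml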
> > > > > >
> > > > > > Anyway you need to do the following to start the datanode and
> > > > > > namenode:
> > > > > >
> > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > bin/hadoop-daemon.sh start datanode
> > > > > >
> > > > > > then you need to start the jobtracker and tasktracker before you
> > > > > > start crawling:
> > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > >
> > > > > > then you start your bin/hadoop dfs -put seeds seeds
> > > > > >
> > > > > > On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > > > > > > OK, changed to the latest nightly build.
> > > > > > > hadoop-0.1.1.jar is there,
> > > > > > > hadoop-site.xml as well.
> > > > > > > Now trying
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > >
> > > > > > > 060421 125154 parsing
> > > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-
> > > > > > > 0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 125155 parsing
> > > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-sit
> e.xml
> > > > > > > 060421 125155 No FS indicated, using default:local
> > > > > > >
> > > > > > > and
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > >
> > > > > > > 060421 125217 parsing
> > > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-
> > > > > > > 0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 125217 parsing
> > > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-sit
> e.xml
> > > > > > > 060421 125217 No FS indicated, using default:local
> > > > > > > Found 16 items
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/docs <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war
> 15541036
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/webapps <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt 17709
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt 615
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/conf <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/default.properties
> > > 3043
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/plugins <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/lib <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/bin <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/src <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job
> 18537096
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/seeds <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/README.txt 403
> > > > > > >
> > > > > > > also:
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > >
> > > > > > > 060421 133004 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 133004 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060421 133004 No FS indicated, using default:local
> > > > > > > Found 2 items
> > > > > > > /home/../nutch-nightly/seeds/urls.txt~ 0
> > > > > > > /home/../nutch-nightly/seeds/urls.txt 26
> > > > > > > bash-3.00$
> > > > > > >
> > > > > > > but:
> > > > > > >
> > > > > > > bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > >
> > > > > > > 060421 131722 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 131723 parsing
> > > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060421 131723 parsing
> > > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060421 131723 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131723 parsing
> > > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060421 131723 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060421 131723 crawl started in: crawled
> > > > > > > 060421 131723 rootUrlDir = 2
> > > > > > > 060421 131723 threads = 10
> > > > > > > 060421 131723 depth = 5
> > > > > > > 060421 131724 Injector: starting
> > > > > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > > > > 060421 131724 Injector: urlDir: 2
> > > > > > > 060421 131724 Injector: Converting injected urls to crawl db
> > > entries.
> > > > > > > 060421 131724 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 131724 parsing
> > > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060421 131724 parsing
> > > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060421 131724 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131724 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131725 parsing
> > > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060421 131725 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060421 131725 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 131726 parsing
> > > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060421 131726 parsing
> > > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060421 131726 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131726 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131726 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131726 parsing
> > > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060421 131727 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060421 131727 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 131727 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131727 parsing
> > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > > > > 060421 131727 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060421 131727 job_6jn7j8
> > > > > > > java.io.IOException: No input directories specified in:
> > > > > > > Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal: hadoop-site.xml
> > > > > > > at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
> > > > > > > at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
> > > > > > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > > > > 060421 131728 Running job: job_6jn7j8
> > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > > at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > > > > > at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > bash-3.00$
> > > > > > >
> > > > > > > Can anyone help?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > --- Original Message ---
> > > > > > > > From: "Zaheed Haque" <[EMAIL PROTECTED]>
> > > > > > > > To: [email protected]
> > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > Date: Fri, 21 Apr 2006 13:18:37 +0200
> > > > > > > >
> > > > > > > > Also I have noticed that you are using hadoop-0.1; there was a bug
> > > > > > > > in 0.1, so you should be using 0.1.1. Under your lib catalog you
> > > > > > > > should have the following file:
> > > > > > > >
> > > > > > > > hadoop-0.1.1.jar
> > > > > > > >
> > > > > > > > If that's the case, please download the latest nightly build.
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 4/21/06, Zaheed Haque <[EMAIL PROTECTED]> wrote:
> > > > > > > > > Do you have a file called "hadoop-site.xml" under your conf
> > > > > > > > > directory?
> > > > > > > > > The content of the file is like the following:
> > > > > > > > >
> > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > <?xml-stylesheet type="text/xsl"
> href="configuration.xsl"?>
> > > > > > > > >
> > > > > > > > > <!-- Put site-specific property overrides in this file.
> -->
> > > > > > > > >
> > > > > > > > > <configuration>
> > > > > > > > >
> > > > > > > > > </configuration>
> > > > > > > > >
> > > > > > > > > or is it missing... if it's missing, please create a file under
> > > > > > > > > the conf catalog with the name hadoop-site.xml and then try the
> > > > > > > > > hadoop dfs -ls again. You should see something, like a listing
> > > > > > > > > from your local file system.
> > > > > > > > >
> > > > > > > > > On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]>
> wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > From: "Zaheed Haque" <[EMAIL PROTECTED]>
> > > > > > > > > > > To: [email protected]
> > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > > > > > Datum: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > > > > >
> > > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > > >
> > > > > > > > > > > Can you see your "seeds" directory?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > 060421 122421 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > > > > >
> > > > > > > > > I think the hadoop-site is missing, because we should be seeing
> > > > > > > > > a message like this here...
> > > > > > > > >
> > > > > > > > > 060421 131014 parsing
> > > > > > > > >
> > > file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > > > > >
> > > > > > > > > > 060421 122421 No FS indicated, using default:local
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > >
> > > > > > > > > > 060421 122425 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > > > > > >
> > > > > > > > > > 060421 122426 No FS indicated, using default:local
> > > > > > > > > >
> > > > > > > > > > Found 0 items
> > > > > > > > > >
> > > > > > > > > > bash-3.00$
> > > > > > > > > >
> > > > > > > > > > As you can see, I can't.
> > > > > > > > > > What's going wrong?
> > > > > > > > > >
> > > > > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > > > > >
> > > > > > > > > > > Can you see your text file with URLS?
> > > > > > > > > > >
> > > > > > > > > > > Furthermore, bin/nutch crawl is a one-shot crawl/index
> > > > > > > > > > > command. I strongly recommend you take the long route of
> > > > > > > > > > >
> > > > > > > > > > > inject, generate, fetch, updatedb, invertlinks, index, dedup
> > > > > > > > > > > and merge. You can try the above commands just by typing
> > > > > > > > > > > bin/nutch inject
> > > > > > > > > > > etc.
> > > > > > > > > > > If you just try the inject command without any parameters, it
> > > > > > > > > > > will tell you how to use it.
> > > > > > > > > > >
> > > > > > > > > > > Hope this helps.
> > > > > > > > > > > On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]>
> > > wrote:
> > > > > > > > > > > > hi
> > > > > > > > > > > >
> > > > > > > > > > > > I've changed from nutch 0.7 to 0.8 and
> > > > > > > > > > > > done the following steps:
> > > > > > > > > > > > created an urls.txt in a dir named seeds
> > > > > > > > > > > >
> > > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > >
> > > > > > > > > > > > 060317 121440 parsing
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > 060317 121441 No FS indicated, using default:local
> > > > > > > > > > > >
> > > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > > > > > > > > > > but in crawl.log:
> > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > 0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > > > 060419 124302 parsing /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > > > > > 060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > java.io.IOException: No input directories specified in:
> > > > > > > > > > > > Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal: hadoop-site.xml
> > > > > > > > > > > > at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
> > > > > > > > > > > > at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
> > > > > > > > > > > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > > > > > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > > > > > > at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > > > > > at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > >
> > > > > > > > > > > > Any ideas?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> >
>