Hello,
I am able to successfully crawl sites with Nutch using Hadoop-1.2.1.

But when using Hadoop-2.7.1, the inject phase fails with the error I
mentioned in the first mail of this thread.

Can you suggest what I'm doing wrong?
Thanks in advance.

With regards,
Imtiaz Shakil Siddique
On Sep 15, 2015 1:56 AM, "Markus Jelsma" <[email protected]> wrote:

> Hello, we are running Nutch 1.10 on Hadoop 2.7 for quite some time now.
> But since Hadoop has kept its binary compatibility, even older Nutch
> releases should work just as well.
>
> Markus
>
> -----Original message-----
> > From:Imtiaz Shakil Siddique <[email protected]>
> > Sent: Monday 14th September 2015 18:08
> > To: [email protected]
> > Subject: Re: Compatible Hadoop version with Nutch 1.10
> >
> > Hi,
> >
> > Nutch 1.10 is already released. Are you referring to a newer Nutch
> > release (Nutch 1.11)?
> >
> > Thanks for the help Sir.
> > Imtiaz Shakil Siddique
> > On Sep 14, 2015 8:57 PM, "Sebastian Nagel" <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > Nutch 1.10 is supposed to run with Hadoop 1.2.0.
> > > Nutch 1.11 (to be released soon) will run with 2.4.0,
> > > and probably also with newer Hadoop versions.
> > >
> > > If you need Nutch with a recent Hadoop version
> > > right now, you could build it yourself from trunk.
> > >
> > > Cheers,
> > > Sebastian
> > >
> > > 2015-09-11 16:14 GMT+02:00 Imtiaz Shakil Siddique <[email protected]>:
> > >
> > > > Hi,
> > > >
> > > > I was trying to test Nutch 1.10 with Hadoop-2.7.1, but during the
> > > > inject phase I ran into some errors.
> > > >
> > > > > I was executing $NUTCH_HOME/runtime/deploy/bin/crawl -i
> > > > > /home/nutch/urls /home/nutch/crawl/ 1
> > > >
> > > > 15/09/10 19:41:17 ERROR crawl.Injector: Injector:
> > > > > java.lang.IllegalArgumentException: Wrong FS:
> > > > > hdfs://localhost:9000/user/root/inject-temp-875522145, expected: file:///
> > > > >     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:646)
> > > > >     at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
> > > > >     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:601)
> > > > >     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:819)
> > > > >     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:596)
> > > > >     at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1437)
> > > > >     at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:506)
> > > > >     at org.apache.nutch.crawl.CrawlDb.install(CrawlDb.java:168)
> > > > >     at org.apache.nutch.crawl.Injector.inject(Injector.java:356)
> > > > >     at org.apache.nutch.crawl.Injector.run(Injector.java:379)
> > > > >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > > > >     at org.apache.nutch.crawl.Injector.main(Injector.java:369)
> > > > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > >     at java.lang.reflect.Method.invoke(Method.java:497)
> > > > >     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> > > > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> > > >
> > > >
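The "Wrong FS ... expected: file:///" message comes from Hadoop's
FileSystem.checkPath: a FileSystem instance rejects any path whose scheme
does not match its own, so here the job ended up with the local filesystem
as its default and was then handed an hdfs:// path. A minimal Python sketch
of that check (an analogy for illustration, not Hadoop's actual code; the
function name and exact behavior are assumptions):

```python
from urllib.parse import urlparse

def check_path(path, fs_scheme="file"):
    # Analogy to Hadoop's FileSystem.checkPath: a filesystem instance
    # accepts only paths with its own scheme (or no scheme at all).
    scheme = urlparse(path).scheme
    if scheme and scheme != fs_scheme:
        raise ValueError(f"Wrong FS: {path}, expected: {fs_scheme}:///")

check_path("/user/root/inject-temp")  # no scheme: accepted by the local FS
try:
    check_path("hdfs://localhost:9000/user/root/inject-temp")
except ValueError as e:
    print(e)  # prints: Wrong FS: hdfs://localhost:9000/user/root/inject-temp, expected: file:///
```

In other words, the stack trace suggests the CrawlDb rename step resolved
its paths against the local filesystem rather than HDFS.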
> > > > My Hadoop-2.7.1 configuration files are given below.
> > > > -------- core-site.xml --------
> > > > <property>
> > > >   <name>fs.default.name</name>
> > > >   <value>hdfs://localhost:9000</value>
> > > > </property>
> > > > <property>
> > > >   <name>hadoop.tmp.dir</name>
> > > >   <value>/home/nutch/hadoopData/hadoopTmpDir</value>
> > > > </property>
> > > >
> > > > -------- hdfs-site.xml --------
> > > > <property>
> > > >   <name>dfs.namenode.name.dir</name>
> > > >   <value>/home/nutch/hadoopData/nameNodeData</value>
> > > > </property>
> > > > <property>
> > > >   <name>dfs.datanode.data.dir</name>
> > > >   <value>/home/nutch/hadoopData/dataNodeData</value>
> > > > </property>
> > > > <property>
> > > >   <name>dfs.replication</name>
> > > >   <value>1</value>
> > > > </property>
> > > >
> > > > -------- mapred-site.xml --------
> > > > <property>
> > > >   <name>mapred.job.tracker</name>
> > > >   <value>localhost:9001</value>
> > > > </property>
> > > > <property>
> > > >   <name>mapred.system.dir</name>
> > > >   <value>/home/nutch/hadoopData/mapredJobTrackerData</value>
> > > > </property>
> > > > <property>
> > > >   <name>mapred.local.dir</name>
> > > >   <value>/home/nutch/hadoopData/mapredTaskTrackerData</value>
> > > > </property>
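One thing worth noting about these files: fs.default.name and
mapred.job.tracker are Hadoop 1.x property names. On Hadoop 2.x the
canonical names are fs.defaultFS, and MapReduce normally runs on YARN via
mapreduce.framework.name rather than a JobTracker. A sketch of the
2.x-style equivalents, with the values copied from the configuration above
(whether this alone resolves the inject failure is an assumption, not
something the thread confirms):

```xml
<!-- core-site.xml: fs.defaultFS replaces the deprecated fs.default.name -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>

<!-- mapred-site.xml: run MapReduce on YARN instead of the 1.x JobTracker -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```

The deprecated names are still accepted by Hadoop 2.7, but a job that does
not pick up this configuration at all (for example, when HADOOP_CONF_DIR is
not set for the process launching the job) falls back to the local
filesystem, which is consistent with the "expected: file:///" error above.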
> > > >
> > > > But the same command works successfully when I use Hadoop-1.2.1.
> > > > What is the preferred version of Hadoop to use with Apache
> > > > Nutch 1.10?
> > > >
> > > >
> > > > Thank you so much.
> > > > Imtiaz Shakil Siddique
> > > >
> > >
> >
>
