I have fixed this problem, seemingly just by clearing my tmp folder, but now
whenever I try to updatedb I get:
CrawlDb update: starting
CrawlDb update: db: db
CrawlDb update: segment: segments/20071121062359
CrawlDb update: Merging segment data into db.
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:62)
        at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:116)
Just "Job failed!", with no useful error message.
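
For anyone hitting the same thing: my suspicion is that the earlier disk-full
fetch, or the tmp clearing, left the segment incomplete, and updatedb reads the
crawl_fetch and crawl_parse parts of a segment. A quick throwaway sketch to
check (plain Java; CheckSegment is just an illustrative name, and the directory
list assumes the usual Nutch 0.8 segment layout):

import java.io.File;

// Throwaway check: updatedb reads crawl_fetch and crawl_parse from the
// segment; if fetch or parse died earlier (e.g. from the disk-full error
// later in this thread), those directories may be missing or empty.
public class CheckSegment {
    public static void main(String[] args) {
        File segment = new File("segments/20071121062359");
        String[] parts = {"crawl_generate", "crawl_fetch", "content",
                          "crawl_parse", "parse_data", "parse_text"};
        for (String part : parts) {
            File dir = new File(segment, part);
            System.out.println(part + ": "
                    + (dir.isDirectory() ? "present" : "MISSING"));
        }
    }
}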
On Nov 21, 2007 8:25 AM, Josh Attenberg <[EMAIL PROTECTED]> wrote:
> It appears Nutch is still looking for
> /tmp/hadoop/mapred/system/submit_xiq66r/job.jar.
> I moved things around when I probably shouldn't have: I copied this
> folder (/tmp/hadoop/) elsewhere, so there must still be a variable to set. I
> get this error during inject:
>
> Injector: Converting injected urls to crawl db entries.
> Exception in thread "main" java.io.FileNotFoundException:
> /tmp/hadoop/mapred/system/submit_xiq66r/job.jar (No such file or directory)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
>         at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.<init>(LocalFileSystem.java:133)
>         at org.apache.hadoop.fs.LocalFileSystem.createRaw(LocalFileSystem.java:172)
>         at org.apache.hadoop.fs.LocalFileSystem.createRaw(LocalFileSystem.java:180)
>         at org.apache.hadoop.fs.FSDataOutputStream$Summer.<init>(FSDataOutputStream.java:56)
>         at org.apache.hadoop.fs.FSDataOutputStream$Summer.<init>(FSDataOutputStream.java:45)
>         at org.apache.hadoop.fs.FSDataOutputStream.<init>(FSDataOutputStream.java:146)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:270)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:177)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:74)
>         at org.apache.hadoop.fs.LocalFileSystem.copyFromLocalFile(LocalFileSystem.java:311)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:254)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
>         at org.apache.nutch.crawl.Injector.main(Injector.java:164)
>
>
> On Nov 21, 2007 12:09 AM, Susam Pal <[EMAIL PROTECTED]> wrote:
>
> > I haven't asked you to move the files to a new location. I don't know
> > if moving the files works. My solution was for a fresh crawl. If the
> > partition which contains /tmp doesn't have enough space, you can point
> > Nutch to a different temporary directory by adding this property to
> > your 'conf/nutch-site.xml' and doing a new crawl.
> >
> > <property>
> > <name>hadoop.tmp.dir</name>
> > <value>/opt/tmp</value>
> > <description>Base for Nutch Temporary Directories</description>
> > </property>
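> >
> > To confirm that the new value is actually being picked up, a small
> > sketch like this prints the effective setting (PrintTmpDir is just an
> > illustrative name; it assumes Nutch 0.8's NutchConfiguration and the
> > Nutch jars on the classpath):
> >
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.nutch.util.NutchConfiguration;
> >
> > // Print the hadoop.tmp.dir that Nutch resolves from its config
> > // files, to verify that conf/nutch-site.xml is being read.
> > public class PrintTmpDir {
> >     public static void main(String[] args) {
> >         Configuration conf = NutchConfiguration.create();
> >         System.out.println("hadoop.tmp.dir = "
> >                 + conf.get("hadoop.tmp.dir"));
> >     }
> > }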
> >
> > Please note that /opt/tmp is only an example. Change it to whatever is
> > required on your system. Please post the relevant portions of the
> > error logs too when an error occurs.
> >
> > Regards,
> > Susam Pal
> >
> > On Nov 21, 2007 10:28 AM, Josh Attenberg <[EMAIL PROTECTED]> wrote:
> > > I did as you said and moved the files to a new directory on a big
> > > drive, but now I have some additional errors. Are there any other
> > > pointers I need to update?
> > >
> > >
> > > On Nov 20, 2007 11:33 PM, Susam Pal <[EMAIL PROTECTED]> wrote:
> > >
> > > > Is /tmp present in a partition that doesn't have enough space? Does it
> > > > have enough space left when this error occurs? Nutch often needs GBs
> > > > of space for /tmp. If there isn't enough space on the partition having
> > > > /tmp, then you can add the following property in
> > > > 'conf/hadoop-site.xml' to make it use a different directory for
> > > > writing the temporary files.
> > > >
> > > > <property>
> > > > <name>hadoop.tmp.dir</name>
> > > > <value>/opt/tmp</value>
> > > > <description>Base for Nutch Temporary Directories</description>
> > > > </property>
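> > > >
> > > > To check programmatically how much space is actually free there, a
> > > > minimal sketch like this works (CheckTmpSpace is just an
> > > > illustrative name; getUsableSpace needs Java 6, and /tmp below is
> > > > only the default location):
> > > >
> > > > import java.io.File;
> > > >
> > > > // Report free space on the partition holding the temporary
> > > > // directory; a crawl can need several GB here.
> > > > public class CheckTmpSpace {
> > > >     public static void main(String[] args) {
> > > >         File tmp = new File("/tmp");
> > > >         long freeGb = tmp.getUsableSpace() / (1024L * 1024 * 1024);
> > > >         System.out.println("Free under " + tmp + ": " + freeGb + " GB");
> > > >     }
> > > > }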
> > > >
> > > > Regards,
> > > > Susam Pal
> > > >
> > > > On Nov 21, 2007 8:54 AM, Josh Attenberg <[EMAIL PROTECTED]> wrote:
> > > > > I had this error when fetching with Nutch 0.8.1. There is ~450GB
> > > > > left on the disk holding the crawl db and segments folder. Are
> > > > > there any other settings I need to make? I know there isn't much
> > > > > space in my home directory, if it was trying to write there, but
> > > > > there is at least 500M. What are the possible culprits/fixes?
> > > > >
> > > > >
> > > > > Exception in thread "main" org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
> > > > >         at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSystem.java:150)
> > > > >         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:112)
> > > > >         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> > > > >         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> > > > >         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
> > > > >         at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
> > > > >         at org.apache.hadoop.fs.FSDataOutputStream$Summer.close(FSDataOutputStream.java:96)
> > > > >         at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
> > > > >         at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
> > > > >         at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
> > > > >         at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:154)
> > > > >         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:74)
> > > > >         at org.apache.hadoop.fs.LocalFileSystem.copyFromLocalFile(LocalFileSystem.java:311)
> > > > >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:254)
> > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
> > > > >         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:443)
> > > > >         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:477)
> > > > > Caused by: java.io.IOException: No space left on device
> > > > >         at java.io.FileOutputStream.writeBytes(Native Method)
> > > > >         at java.io.FileOutputStream.write(FileOutputStream.java:260)
> > > > >         at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSystem.java:148)
> > > > >         ... 16 more
> > > > >