Re: File not found error

2014-06-24 Thread John Lafitte
Okay, I got it working again.  Not sure exactly what happened, but fsck
didn't help.  I noticed the stack trace pointed at a native method, so I
moved the native binaries out of the /lib folder.  Lo and behold, the
next time I ran it, Hadoop fell back to the Java libraries and reported
the filename it was having trouble with:
/tmp/hadoop-root/mapred/staging/root850517656/.staging.  Given that, I
just moved the /tmp/hadoop-root directory aside, and it started working
again.  Permissions looked fine, so the directory may simply have been
corrupt.
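
For anyone who hits the same thing, here is roughly what the manual fix
amounted to, sketched in Java (I actually just used mv from a shell;
the lib and tmp paths below are stand-ins for my setup, so adjust them
to your install):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StagingReset {
    public static void main(String[] args) throws IOException {
        // Hypothetical paths; point these at your own Nutch lib folder
        // and your hadoop.tmp.dir.
        Path nativeLib = Paths.get("/opt/nutch/lib/native");
        Path parked = Paths.get("/opt/nutch/lib/native.disabled");
        Path staging = Paths.get("/tmp/hadoop-root");
        Path backup = Paths.get("/tmp/hadoop-root.bak");

        // 1. Park the native binaries so Hadoop falls back to its
        //    pure-Java code paths, which include the offending
        //    filename in the error message.
        if (Files.exists(nativeLib)) {
            Files.move(nativeLib, parked);
        }

        // 2. Move the (possibly corrupt) local staging tree aside;
        //    Hadoop recreates it on the next job submission.
        if (Files.exists(staging)) {
            Files.move(staging, backup);
        }
    }
}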

Thanks for the help!


Re: File not found error

2014-06-24 Thread John Lafitte
Well, I'm just using Nutch in local mode, no HDFS (as far as I know).
Right now I'm trying to determine whether there is a filesystem issue.
It's not really clear which file is not found.  I have about 10
different configs; this is just one of them, and they all have the urls
folder.  The script worked for quite a while before this started
happening on its own.  That's why I suspect a filesystem error.
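
For what it's worth, the kind of check I have in mind is something like
this hypothetical snippet, which just exercises the same mkdirs + chmod
sequence that the stack trace shows Hadoop's local filesystem doing
under /tmp:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class TmpProbe {
    public static void main(String[] args) throws IOException {
        // Create a directory and chmod it to 700, mirroring what
        // JobSubmissionFiles does for the .staging directory.
        Path dir = Paths.get("/tmp/hadoop-root/mapred/staging/probe");
        Files.createDirectories(dir);
        Files.setPosixFilePermissions(dir,
                PosixFilePermissions.fromString("rwx------"));
        System.out.println("mkdirs + chmod succeeded for " + dir);
    }
}

If either call throws, that points at the filesystem rather than at
Nutch.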




Re: File not found error

2014-06-24 Thread kaveh minooie

You might want to check to see if

> Injector: urlDir: di/urls

still exists in your HDFS.
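
Something along these lines would confirm it (an untested sketch; note
that FileSystem.get() resolves against whatever default filesystem is
configured, so in local mode it checks the local disk rather than
HDFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckUrlDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml etc.
        FileSystem fs = FileSystem.get(conf);     // default FS from config
        Path urlDir = new Path("di/urls");        // relative paths resolve
                                                  // against the working/home dir
        System.out.println(urlDir + " exists: " + fs.exists(urlDir));
    }
}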



--
Kaveh Minooie


File not found error

2014-06-24 Thread John Lafitte
Using Nutch 1.7

Out of the blue, all of my crawl jobs started failing a few days ago.  I
checked the user logs: nobody logged into the server, and there were no
reboots or any other obvious issues.  There is plenty of disk space.
Here is the error I'm getting; any help is appreciated:

Injector: starting at 2014-06-24 07:26:54
Injector: crawlDb: di/crawl/crawldb
Injector: urlDir: di/urls
Injector: Converting injected urls to crawl db entries.
Injector: ENOENT: No such file or directory
    at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
    at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:701)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:656)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
    at org.apache.nutch.crawl.Injector.run(Injector.java:318)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Injector.main(Injector.java:308)