Did another test and got this error:

2009-06-25 21:19:44,663 ERROR mapred.EagerTaskInitializationListener - Job
initialization failed:
java.lang.IllegalArgumentException: Pathname
/d:/Bii/nutch/logs/history/user/_logs/history/localhost_1245956549829_job_200906252102_0001_pc-xxx%xxx_inject+urls
from
d:/Bii/nutch/logs/history/user/_logs/history/localhost_1245956549829_job_200906252102_0001_pc-xxx%5Cxxx_inject+urls
*is not a valid DFS filename*

Some remarks that may help someone give a hint:
1. The log files are not in the DFS but in the local filesystem, so why is
it looking in the DFS for the logs?
2. Of course a Windows path does not fit in the DFS.
3. Even in the local filesystem it is the wrong path... it should be
/d:/Bii/nutch/logs/history/localhost....
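
For what it's worth, the message appears to come from the DFS client's
pathname validation, which rejects any path component containing a ':'
(here the "d:" drive letter). Below is a minimal sketch of that check in
Java, written as an approximation of that style of test rather than the
exact Hadoop source, with the path shortened:

  import java.util.StringTokenizer;

  public class DfsPathCheck {

      // Rough approximation of the validity check the DFS client applies
      // to a pathname; not the exact Hadoop code.
      static boolean isValidDfsName(String src) {
          // The path must be absolute.
          if (!src.startsWith("/")) {
              return false;
          }
          // No component may be "." or ".." or contain ':' -- the "d:"
          // drive letter from a Windows path fails this test.
          StringTokenizer tokens = new StringTokenizer(src, "/");
          while (tokens.hasMoreTokens()) {
              String element = tokens.nextToken();
              if (element.equals(".") || element.equals("..")
                      || element.indexOf(':') >= 0) {
                  return false;
              }
          }
          return true;
      }

      public static void main(String[] args) {
          // The history path from the error above, shortened:
          String p = "/d:/Bii/nutch/logs/history/user/_logs/history/...";
          System.out.println(p + " -> valid DFS name? " + isValidDfsName(p));
          // prints: ... -> valid DFS name? false
      }
  }

Which brings me back to remark 1: the real question is why a local
(Windows) log directory is being handed to the DFS client at all.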



2009/6/25 MilleBii <mille...@gmail.com>

>
>
>
> 2009/6/24 Andrzej Bialecki <a...@getopt.org>
>
>> MilleBii wrote:
>>
>>> What I have also discovered:
>>> + the hadoop script works with Unix-like paths and works fine on Windows
>>> + the nutch script works with Windows paths
>>>
>>
>> bin/nutch works with Windows paths? I think this could happen only by
>> accident - both scripts work with Cygwin paths. On the other hand, arguments
>> passed to the JVM must be regular Windows paths.
>>
> That's what I meant: all paths on the JVM call are Windows paths...
> Actually, this raises a question: should the paths in nutch-site.xml or
> hadoop-site.xml be Unix-like or Windows-like?
>
>
>>
>>> Could it be that there is some incompatibility because one works with
>>> Unix-like paths and the other does not?
>>>
>>
>> Both scripts work fine for me on Windows XP + Cygwin, without any special
>> settings - I suspect there is something strange in your environment or
>> config...
>>
>> Please note that the Hadoop and Nutch scripts are regular shell scripts, so
>> they are aware of Cygwin path conventions; in fact, they don't accept
>> un-escaped Windows paths as arguments (i.e. you need to use forward slashes,
>> or you need to put double quotes around a Windows path).
>>
> Clear, but in a way I don't use any paths... since I'm using only relative
> paths (at least I think).
>
> The test command that I use is very simple: nutch crawl urls
> -dir hcrawl -depth 2
>
> Both the urls and hcrawl directories do exist in the HDFS filesystem... yet
> I get a job-failed error, and when I look at where the problem was, I find
> this strange path issue.
>
>
>>
>>
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>  ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>>
>
>
> --
> -MilleBii-
>



-- 
-MilleBii-
