Hello Bartosz,

I'm running the default Nutch 1.0 version on Windows XP (2 GB RAM)
with Eclipse 3.3.0.  I followed the directions at

http://wiki.apache.org/nutch/RunNutchInEclipse0.9

exactly as stated.  I'm able to run the default Nutch 0.9 release
without any problems in Eclipse.  But when I run 1.0, I always get the
java.io.IOException I mentioned in my last email.  I had assumed it was
due to the plugin issue, but maybe not.  I'm just running a very small
crawl with two seed URLs.
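
For reference, a sketch of my Eclipse launch: I'm running the class
org.apache.nutch.crawl.Crawl with the program arguments below, which
match the parameters in the log, in case someone spots a problem there:

  urls -dir crawl -depth 3 -topN 5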

Here's what hadoop.log says:

2009-04-13 13:41:03,010 INFO  crawl.Crawl - crawl started in: crawl
2009-04-13 13:41:03,025 INFO  crawl.Crawl - rootUrlDir = urls
2009-04-13 13:41:03,025 INFO  crawl.Crawl - threads = 10
2009-04-13 13:41:03,025 INFO  crawl.Crawl - depth = 3
2009-04-13 13:41:03,025 INFO  crawl.Crawl - topN = 5
2009-04-13 13:41:03,479 INFO  crawl.Injector - Injector: starting
2009-04-13 13:41:03,479 INFO  crawl.Injector - Injector: crawlDb: crawl/crawldb
2009-04-13 13:41:03,479 INFO  crawl.Injector - Injector: urlDir: urls
2009-04-13 13:41:03,479 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2009-04-13 13:41:03,588 WARN  mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-04-13 13:41:06,105 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:498)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)


I have not tried Sanjoy's advice yet... it looks like this is a memory issue.
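
For what it's worth, the trace points at MapOutputBuffer's constructor,
which (as far as I understand it) allocates the map-side sort buffer
sized by io.sort.mb, 100 MB by default.  So my plan is to raise the
heap for the Eclipse launch (Run Configuration > Arguments > VM
arguments, e.g. -Xmx512m) and, failing that, shrink the buffer in
hadoop-site.xml.  A sketch only -- I haven't tested it, and the value
is a guess:

  <property>
    <name>io.sort.mb</name>
    <!-- default is 100 (MB); a two-URL crawl needs far less -->
    <value>10</value>
  </property>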

Any advice would be much appreciated,
Frank


2009/4/10 Bartosz Gadzimski <bartek...@o2.pl>:
> Hello Frank,
>
> Please look into hadoop.log; maybe there is something more in there.
>
> About your error - you need to give us more specifics about your Nutch
> configuration.
>
> The default Nutch installation works with no problems (I've never
> changed the src/plugin path).
>
> Please tell us: your version of Nutch,
> any changes you made,
> and any configuration differences (other than adding your domain to
> crawl-urlfilter).
>
> Thanks,
> Bartosz
>
> Frank McCown pisze:
>>
>> Adding cygwin to my PATH solved my problem with whoami.  But now I'm
>> getting an exception when running the crawler:
>>
>> Injector: Converting injected urls to crawl db entries.
>> Exception in thread "main" java.io.IOException: Job failed!
>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>>        at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
>>        at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
>>
>> I know from searching the mailing list that this is normally due to a
>> bad plugin.folders setting in nutch-default.xml, but I used the same
>> value as the tutorial (./src/plugin) to no avail.
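>>
>> In case it matters, the exact property I'm overriding is below -- the
>> path is the tutorial's suggested value, not something I've verified
>> works elsewhere:
>>
>>   <property>
>>     <name>plugin.folders</name>
>>     <value>./src/plugin</value>
>>   </property>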
>>
>> (As an aside, it seems like Hadoop should provide a better error
>> message if the plugin folder doesn't exist.)
>>
>> Anyway, thanks, Bartosz, for your help.
>>
>> Frank
>>
>>
>> 2009/4/10 Bartosz Gadzimski <bartek...@o2.pl>:
>>
>>>
>>> Hello,
>>>
>>> So now you have to install cygwin and make sure you add it to your PATH.
>>>
>>> It's described in http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>>>
>>> After this you should be able to run the "bash" command from a command
>>> prompt (Start > Run > cmd.exe).
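>>>
>>> For example, assuming the default install location (adjust the path
>>> to your setup):
>>>
>>>   set PATH=%PATH%;C:\cygwin\bin
>>>   bash --version
>>>   whoami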
>>>
>>> Then you're done - everything should work.
>>>
>>> I must add this to the wiki; I forgot about the whoami problem.
>>>
>>> Take care,
>>> Bartosz
>>>
>>> sanjoy.gh...@thomsonreuters.com pisze:
>>>
>>>>
>>>> Thanks for the suggestion, Bartosz.  I downloaded whoami, and it
>>>> promptly crashed on "bash".
>>>>
>>>> 09/04/10 12:02:28 WARN fs.FileSystem: uri=file:///
>>>> javax.security.auth.login.LoginException: Login failed: Cannot run
>>>> program "bash": CreateProcess error=2, The system cannot find the file
>>>> specified
>>>>       at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
>>>>       at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
>>>>       at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
>>>>       at org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
>>>>       at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
>>>>       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
>>>>       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
>>>>       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
>>>>       at org.apache.nutch.crawl.Crawl.main(Crawl.java:84)
>>>>
>>>> Where am I going to find "bash" on Windows without running it from a
>>>> cygwin command line?  Is there a way to turn off this security check
>>>> in Hadoop?
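>>>>
>>>> The only knob I've found so far is hadoop.job.ugi, which is supposed
>>>> to hard-code the user and group so Hadoop never shells out to whoami
>>>> or bash.  I haven't verified it, and the values below are just
>>>> placeholders:
>>>>
>>>>   <property>
>>>>     <name>hadoop.job.ugi</name>
>>>>     <value>sanjoy,users</value>
>>>>   </property>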
>>>>
>>>> Thanks,
>>>> Sanjoy
>>>>
>>>> -----Original Message-----
>>>> From: Bartosz Gadzimski [mailto:bartek...@o2.pl]
>>>> Sent: Friday, April 10, 2009 5:06 AM
>>>> To: nutch-dev@lucene.apache.org
>>>> Subject: Re: login failed exception
>>>>
>>>> Hello,
>>>>
>>>> I am not sure if it's the case, but you should try adding whoami to
>>>> your Windows box.
>>>>
>>>> For example, for Windows XP SP2:
>>>> http://www.microsoft.com/downloads/details.aspx?FamilyId=49AE8576-9BB9-4126-9761-BA8011FABF38&displaylang=en
>>>>
>>>>
>>>> Thanks,
>>>> Bartosz
>>>>
>>>> Frank McCown pisze:
>>>>
>>>>>
>>>>> I've been running 0.9 in Eclipse on Windows for some time, and I was
>>>>> successful in running the NutchBean from version 1.0 in Eclipse, but
>>>>> the crawler gave me the same exception as it gave this individual.
>>>>> Maybe there's something else I'm overlooking, but I followed the
>>>>> tutorial at
>>>>>
>>>>> http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>>>>>
>>>>> to a T.  I'll keep working on it though.
>>>>>
>>>>> Frank
>>>>>
>>>>> 2009/4/10 Bartosz Gadzimski <bartek...@o2.pl>:
>>>>>>
>>>>>> fmccown pisze:
>>>>>>>
>>>>>>> You must run Nutch's crawler using cygwin on Windows since cygwin
>>>>>>> has the whoami program.  If you run it from Eclipse on Windows, it
>>>>>>> can't use cygwin's whoami program and will fail with the exceptions
>>>>>>> you saw.  This is an unfortunate design decision in Hadoop which
>>>>>>> makes anything after version 0.9 not work in Eclipse on Windows.
>>>>>>>
>>>>>>
>>>>>> That's not true - please look at
>>>>>> http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>>>>>>
>>>>>> I am using Nutch 1.0 with Eclipse on Windows with no problems.
>>>>>>
>>>>>> Thanks,
>>>>>> Bartosz
>>>>>>
>>>>>
>>>>
>>
>
