Hello Bartosz, I'm running the default Nutch 1.0 version on Windows XP (2 GB RAM) with Eclipse 3.3.0. I followed the directions at
http://wiki.apache.org/nutch/RunNutchInEclipse0.9 exactly as stated. I'm able to run the default Nutch 0.9 release without any problems in Eclipse. But when I run 1.0, I always get the java.io.IOException as stated in my last email. I had assumed it was due to the plugin issue, but maybe not. I'm just running a very small crawl with two seed URLs. Here's what hadoop.log says:

2009-04-13 13:41:03,010 INFO  crawl.Crawl - crawl started in: crawl
2009-04-13 13:41:03,025 INFO  crawl.Crawl - rootUrlDir = urls
2009-04-13 13:41:03,025 INFO  crawl.Crawl - threads = 10
2009-04-13 13:41:03,025 INFO  crawl.Crawl - depth = 3
2009-04-13 13:41:03,025 INFO  crawl.Crawl - topN = 5
2009-04-13 13:41:03,479 INFO  crawl.Injector - Injector: starting
2009-04-13 13:41:03,479 INFO  crawl.Injector - Injector: crawlDb: crawl/crawldb
2009-04-13 13:41:03,479 INFO  crawl.Injector - Injector: urlDir: urls
2009-04-13 13:41:03,479 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2009-04-13 13:41:03,588 WARN  mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-04-13 13:41:06,105 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:498)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)

I have not tried Sanjoy's advice yet... it looks like this is a memory issue.

Any advice would be much appreciated,
Frank

2009/4/10 Bartosz Gadzimski <bartek...@o2.pl>:
> Hello Frank,
>
> Please look into hadoop.log and see if maybe there is something more.
>
> About your error - you must give us more specific details about the
> configuration of your nutch.
>
> The default nutch installation works with no problems (I've never changed
> the src/plugin path).
>
> Please tell us:
> the version of nutch
> any changes
> any different configurations (other than crawl-urlfilter - adding your
> domain).
>
> Thanks,
> Bartosz
>
> Frank McCown pisze:
>>
>> Adding cygwin to my PATH solved my problem with whoami. But now I'm
>> getting an exception when running the crawler:
>>
>> Injector: Converting injected urls to crawl db entries.
>> Exception in thread "main" java.io.IOException: Job failed!
>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>>         at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
>>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
>>
>> I know from searching the mailing list that this is normally due to a
>> bad plugin.folders setting in nutch-default.xml, but I used the
>> same value as the tutorial (./src/plugin) to no avail.
>>
>> (As an aside, it seems like Hadoop should provide a better error message
>> if the plugin folder doesn't exist.)
>>
>> Anyway, thanks, Bartosz, for your help.
>>
>> Frank
>>
>> 2009/4/10 Bartosz Gadzimski <bartek...@o2.pl>:
>>>
>>> Hello,
>>>
>>> So now you have to install cygwin and be sure that you add it to PATH -
>>> it's in http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>>>
>>> After this you should be able to run the "bash" command from a command
>>> prompt (Menu Start > Run > cmd.exe).
>>>
>>> Then you're done - everything will be working.
>>>
>>> I must add it to the wiki; I forgot about the whoami problem.
>>>
>>> Take care,
>>> Bartosz
>>>
>>> sanjoy.gh...@thomsonreuters.com pisze:
>>>>
>>>> Thanks for the suggestion, Bartosz. I downloaded whoami, and it promptly
>>>> crashed on "bash".
>>>>
>>>> 09/04/10 12:02:28 WARN fs.FileSystem: uri=file:///
>>>> javax.security.auth.login.LoginException: Login failed: Cannot run
>>>> program "bash": CreateProcess error=2, The system cannot find the file
>>>> specified
>>>>         at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
>>>>         at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
>>>>         at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
>>>>         at org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
>>>>         at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
>>>>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
>>>>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
>>>>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
>>>>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:84)
>>>>
>>>> Where am I going to find "bash" on Windows without running it from the
>>>> cygwin command line? Is there a way to turn off this security check in
>>>> Hadoop?
>>>>
>>>> Thanks,
>>>> Sanjoy
>>>>
>>>> -----Original Message-----
>>>> From: Bartosz Gadzimski [mailto:bartek...@o2.pl]
>>>> Sent: Friday, April 10, 2009 5:06 AM
>>>> To: nutch-dev@lucene.apache.org
>>>> Subject: Re: login failed exception
>>>>
>>>> Hello,
>>>>
>>>> I am not sure if it's the case, but you should try to add whoami to your
>>>> windows box.
>>>>
>>>> For example, for Windows XP SP2:
>>>> http://www.microsoft.com/downloads/details.aspx?FamilyId=49AE8576-9BB9-4126-9761-BA8011FABF38&displaylang=en
>>>>
>>>> Thanks,
>>>> Bartosz
>>>>
>>>> Frank McCown pisze:
>>>>>
>>>>> I've been running 0.9 in Eclipse on Windows for some time, and I was
>>>>> successful in running the NutchBean from version 1.0 in Eclipse, but
>>>>> the crawler gave me the same exception as it gave this individual.
>>>>> Maybe there's something else I'm overlooking, but I followed the
>>>>> tutorial at
>>>>>
>>>>> http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>>>>>
>>>>> to a T. I'll keep working on it though.
>>>>>
>>>>> Frank
>>>>>
>>>>> 2009/4/10 Bartosz Gadzimski <bartek...@o2.pl>:
>>>>>>
>>>>>> fmccown pisze:
>>>>>>>
>>>>>>> You must run Nutch's crawler using cygwin on Windows since cygwin
>>>>>>> has the whoami program. If you run it from Eclipse on Windows, it
>>>>>>> can't use cygwin's whoami program and will fail with the exceptions
>>>>>>> you saw. This is an unfortunate design decision in Hadoop which
>>>>>>> makes anything after version 0.9 not work in Eclipse on Windows.
>>>>>>
>>>>>> It's not true; please look at
>>>>>> http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>>>>>>
>>>>>> I am using nutch 1.0 with eclipse on windows with no problems.
>>>>>>
>>>>>> Thanks,
>>>>>> Bartosz
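A note on the OutOfMemoryError in Frank's hadoop.log at the top of the thread: MapTask$MapOutputBuffer pre-allocates the map-side sort buffer, whose size is set by io.sort.mb (100 MB by default), and Eclipse run configurations start the JVM with a fairly small default heap. Either raising the heap via the VM arguments of the run configuration (e.g. -Xmx512m) or shrinking the sort buffer should get a small two-URL crawl past the Injector. A sketch of the latter, assuming the stock Nutch 1.0 conf layout (the value is illustrative):

```xml
<!-- conf/hadoop-site.xml: shrink the map-side sort buffer so it fits
     in a small Eclipse-launched JVM heap. The default is 100 (MB). -->
<property>
  <name>io.sort.mb</name>
  <value>10</value>
</property>
```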
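On the plugin.folders problem Frank mentions: the tutorial's relative value ./src/plugin is resolved against the working directory of the Eclipse launch, so it silently matches nothing if the run configuration does not start in the Nutch checkout root. One way to rule that out is an absolute path, placed in conf/nutch-site.xml rather than edited into nutch-default.xml; a sketch, with an illustrative install path:

```xml
<!-- conf/nutch-site.xml: point the plugin loader at the plugin sources.
     An absolute path removes any dependence on the launch working directory. -->
<property>
  <name>plugin.folders</name>
  <value>C:/nutch-1.0/src/plugin</value>
</property>
```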
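And on the 'Cannot run program "bash"' LoginException that Sanjoy hit: Hadoop's UnixUserGroupInformation shells out to bash and whoami to determine the current user, so both must be resolvable via the Windows PATH - from cmd.exe and from whatever launched Eclipse. A sketch for a default Cygwin install (the install path is illustrative; to make it stick for Eclipse, set PATH under System Properties > Environment Variables and restart the IDE):

```bat
rem Make Cygwin's bash and whoami visible for the current cmd.exe session
set PATH=%PATH%;C:\cygwin\bin

rem Quick check that Hadoop's login call will succeed
bash -c "whoami"
```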