Re: login failed exception
Hello Frank,

Yes, it is a memory issue; you must increase the Java heap size. Just follow these instructions (another thing to add to the wiki ;)

Eclipse - Window - Preferences - Java - Installed JREs - edit - Default VM arguments

I've set mine to -Xms5m -Xmx150m because I have about 200 MB of RAM left after running all my apps.

-Xms (initial heap size the JVM starts with)
-Xmx (maximum heap size)

It should help.

Thanks,
Bartosz

Frank McCown wrote:

Hello Bartosz,

I'm running the default Nutch 1.0 version on Windows XP (2 GB RAM) with Eclipse 3.3.0. I followed the directions at http://wiki.apache.org/nutch/RunNutchInEclipse0.9 exactly as stated. I'm able to run the default Nutch 0.9 release without any problems in Eclipse. But when I run 1.0, I always get the java.io.IOException as stated in my last email. I had assumed it was due to the plugin issue, but maybe not. I'm just running a very small crawl with two seed URLs.

Here's what hadoop.log says:

2009-04-13 13:41:03,010 INFO crawl.Crawl - crawl started in: crawl
2009-04-13 13:41:03,025 INFO crawl.Crawl - rootUrlDir = urls
2009-04-13 13:41:03,025 INFO crawl.Crawl - threads = 10
2009-04-13 13:41:03,025 INFO crawl.Crawl - depth = 3
2009-04-13 13:41:03,025 INFO crawl.Crawl - topN = 5
2009-04-13 13:41:03,479 INFO crawl.Injector - Injector: starting
2009-04-13 13:41:03,479 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2009-04-13 13:41:03,479 INFO crawl.Injector - Injector: urlDir: urls
2009-04-13 13:41:03,479 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2009-04-13 13:41:03,588 WARN mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-04-13 13:41:06,105 WARN mapred.LocalJobRunner - job_local_0001
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:498)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)

I have not tried Sanjoy's advice yet... it looks like this is a memory issue.

Any advice would be much appreciated,
Frank

2009/4/10 Bartosz Gadzimski bartek...@o2.pl:

Hello Frank,

Please look into hadoop.log; maybe there is something more there.

About your error: you must give us more specifics about your Nutch configuration. The default Nutch installation works with no problems (I've never changed the src/plugin path).

Please tell us the version of Nutch, any changes you have made, and any configuration that differs from the default (other than adding your domain to crawl-urlfilter).

Thanks,
Bartosz

Frank McCown wrote:

Adding cygwin to my PATH solved my problem with whoami. But now I'm getting an exception when running the crawler:

Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)

I know from searching the mailing list that this is normally due to a bad plugin.folders setting in nutch-default.xml, but I used the same value as the tutorial (./src/plugin) to no avail. (As an aside, it seems like Hadoop should provide a better error message if the plugin folder doesn't exist.)

Anyway, thanks, Bartosz, for your help.

Frank

2009/4/10 Bartosz Gadzimski bartek...@o2.pl:

Hello,

So now you have to install cygwin and make sure that you add it to PATH; it's described in http://wiki.apache.org/nutch/RunNutchInEclipse0.9

After this you should be able to run the bash command from the command prompt (Start menu > Run > cmd.exe).

Then you're done - everything will work.
I must add it to the wiki; I forgot about the whoami problem.

Take care,
Bartosz

sanjoy.gh...@thomsonreuters.com wrote:

Thanks for the suggestion, Bartosz. I downloaded whoami, and it promptly crashed on bash.

09/04/10 12:02:28 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed: Cannot run program "bash": CreateProcess error=2, The system cannot find the file specified
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
    at org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:84)

Where am I going to find bash on
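
The PATH requirement discussed in this thread (both bash from cygwin and the whoami tool must be reachable, otherwise Hadoop's login step fails with "Cannot run program") can be checked before launching a crawl. The following is only a minimal sketch under stated assumptions: the class PathCheck and its method isOnPath are hypothetical names made up for illustration and are not part of Nutch or Hadoop. It simply walks the directories listed in PATH and looks for each command, with or without an .exe suffix.

```java
import java.io.File;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Nutch or Hadoop): checks whether commands are reachable via PATH. */
final class PathCheck {

    /** Returns true if some directory in pathValue contains a file named command or command.exe. */
    static boolean isOnPath(String command, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return false;
        }
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            // Windows executables carry an .exe suffix; Unix ones usually have none.
            for (String name : new String[] {command, command + ".exe"}) {
                if (new File(dir, name).isFile()) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        // These are the two commands the thread reports as missing on Windows.
        for (String command : new String[] {"bash", "whoami"}) {
            String status = isOnPath(command, path) ? "found" : "NOT found on PATH";
            System.out.println(command + ": " + status);
        }
    }
}
```

Run it from the same environment that starts Eclipse, since a program launched from Eclipse normally inherits the PATH that Eclipse itself was started with. If either command is reported as not found, fix PATH first and restart Eclipse.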
[Nutch Wiki] Update of RunNutchInEclipse0.9 by BartoszGadzimski
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification.

The following page has been changed by BartoszGadzimski:
http://wiki.apache.org/nutch/RunNutchInEclipse0%2e9

The comment on the change is:
Added java heap size solution

--
- = RunNutchInEclipse =
+ = Run Nutch In Eclipse on Linux and Windows nutch version 0.9=
  
  This is a work in progress. If you find errors or would like to improve this page, just create an account [UserPreferences] and start editing this page :-)
  
@@ -104, +104 @@

   * click on Run
   * if all works, you should see Nutch getting busy at crawling :-)
  
- == Debug Nutch in Eclipse (not yet tested for 0.9) ==
+ == Java Heap Size problem ==
+ 
+ If you find a line similar to this in hadoop.log:
+ 
+ {{{
+ 2009-04-13 13:41:06,105 WARN mapred.LocalJobRunner - job_local_0001
+ java.lang.OutOfMemoryError: Java heap space
+ }}}
+ 
+ you should increase the amount of RAM available to applications run from Eclipse.
+ 
+ Just set it in:
+ 
+ Eclipse - Window - Preferences - Java - Installed JREs - edit - Default VM arguments
+ 
+ I've set mine to
+ {{{
+ -Xms5m -Xmx150m
+ }}}
+ because I have about 200 MB of RAM left after running all apps
+ 
+ -Xms (initial heap size for running applications)
+ -Xmx (maximum heap size)
+ 
+ 
+ == Debug Nutch in Eclipse ==
   * Set breakpoints and debug a crawl
   * It can be tricky to find out where to set the breakpoint, because of the Hadoop jobs. Here are a few good places to set breakpoints:
  {{{
[Nutch Wiki] Update of RunNutchInEclipse1.0 by BartoszGadzimski
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification.

The following page has been changed by BartoszGadzimski:
http://wiki.apache.org/nutch/RunNutchInEclipse1%2e0

The comment on the change is:
Copied page for 1.0 release

New page:
= Run Nutch In Eclipse on Linux and Windows nutch version 1.0=

This is a work in progress. If you find errors or would like to improve this page, just create an account [UserPreferences] and start editing this page :-)

== Tested with ==
 * Nutch release 1.0
 * Eclipse 3.3 - aka Europa, ganymede
 * Java 1.6
 * Ubuntu (should work on most platforms though)
 * Windows XP

== Before you start ==
Setting up Nutch to run in Eclipse can be tricky, and most of the time you are much faster if you edit Nutch in Eclipse but run the scripts from the command line (my 2 cents). However, it's very useful to be able to debug Nutch in Eclipse. But again, you might be quicker by looking at the logs (logs/hadoop.log)...

== Steps ==

=== For Windows Users ===
If you are running Windows (tested on Windows XP), you must first install cygwin.

Download cygwin from http://www.cygwin.com/setup.exe

Install cygwin and add it to the PATH variable. The setting is in Control Panel, System, Advanced tab, Environment Variables; edit or add PATH.

My PATH looks like: C:\Sun\SDK\bin;C:\cygwin\bin

If you run bash in Start-RUN-cmd.exe it should work.

Then you should install the tools from the Microsoft website (they add the 'whoami' command). Example for Windows XP and sp2:
http://www.microsoft.com/downloads/details.aspx?FamilyId=49AE8576-9BB9-4126-9761-BA8011FABF38&displaylang=en

Then you can follow the rest of these steps.

=== Install Nutch ===
 * Grab a fresh release of Nutch 1.0 - http://lucene.apache.org/nutch/version_control.html
 * Do not build Nutch now.
   Make sure you have no .project and .classpath files in the Nutch directory

=== Create a new java project in Eclipse ===
 * File > New > Project > Java project > click Next
 * Name the project (Nutch_Trunk for instance)
 * Select "Create project from existing source" and use the location where you downloaded Nutch
 * Click on Next, and wait while Eclipse is scanning the folders
 * Add the folder conf to the classpath (third tab and then add class folder)
 * Go to the Order and Export tab, find the entry for the added conf folder and move it to the top. This is required so that Eclipse takes the config resources (nutch-default.xml, nutch-final.xml, etc.) from our conf folder and not from anywhere else.
 * Eclipse should have guessed all the java files that must be added to your classpath. If that is not the case, add src/java, src/test and all plugin src/java and src/test folders to your source folders. Also add all jars in lib and in the plugin lib folders to your libraries
 * Set output dir to tmp_build, create it if necessary
 * DO NOT add build to classpath

=== Configure Nutch ===
 * See the [http://wiki.apache.org/nutch/NutchTutorial Tutorial]
 * Change the property plugin.folders to ./src/plugin in $NUTCH_HOME/conf/nutch-default.xml
 * Make sure Nutch is configured correctly before testing it in Eclipse ;-)

=== Missing org.farng and com.etranslate ===
Eclipse will complain about some import statements in the parse-mp3 and parse-rtf plugins (30 errors in my case). Because of incompatibility with the Apache license, the .jar files that define the necessary classes were not included with the source code.

Download them here:

http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-mp3/lib/
http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-rtf/lib/

Copy the jar files into src/plugin/parse-mp3/lib and src/plugin/parse-rtf/lib/ respectively. Then add the jar files to the build path (First refresh the workspace by pressing F5. Then right-click the project folder > Build Path > Configure Build Path...
Then select the Libraries tab, click Add Jars... and then add each .jar file individually).

=== Build Nutch ===
If you set up the project correctly, Eclipse will build Nutch for you into tmp_build. See below for problems you could run into.

=== Create Eclipse launcher ===
 * Menu Run > Run...
 * create New for Java Application
 * set in Main class {{{ org.apache.nutch.crawl.Crawl }}}
 * on tab Arguments, Program Arguments {{{ urls -dir crawl -depth 3 -topN 50 }}}
 * in VM arguments {{{ -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log }}}
 * click on Run
 * if all works, you should see Nutch getting busy at crawling :-)

== Java Heap Size problem ==

If you find a line similar to this in hadoop.log:

{{{
2009-04-13 13:41:06,105 WARN mapred.LocalJobRunner - job_local_0001
java.lang.OutOfMemoryError: Java heap space
}}}

you should increase the amount of RAM available to applications run from Eclipse.

Just set it in:

Eclipse - Window - Preferences - Java - Installed JREs - edit - Default VM arguments

I've set mine to
{{{
-Xms5m -Xmx150m
}}}
because I have about 200 MB of RAM left after running all apps

-Xms
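
To confirm that the heap settings from the page above actually reached the JVM that Eclipse launches, the limits can be printed from inside a Java program. The following is a minimal sketch, not part of Nutch: the class name HeapCheck and the helper toMegabytes are hypothetical. It relies only on the standard java.lang.Runtime methods maxMemory(), totalMemory() and freeMemory().

```java
/** Hypothetical helper (not part of Nutch): prints the heap limits the running JVM received. */
final class HeapCheck {

    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the most the JVM will try to use; it tracks -Xmx,
        // although the reported value can be slightly lower than the flag.
        System.out.println("max heap (MB):     " + toMegabytes(rt.maxMemory()));
        // totalMemory() is the heap currently reserved; it starts near -Xms and may grow.
        System.out.println("current heap (MB): " + toMegabytes(rt.totalMemory()));
        // freeMemory() is the unused part of the currently reserved heap.
        System.out.println("free in heap (MB): " + toMegabytes(rt.freeMemory()));
    }
}
```

Run it as a Java Application in the same Eclipse workspace. If the reported maximum is far from the value given with -Xmx, the launch is probably using a different installed JRE, or the launch configuration carries its own VM arguments.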
[Nutch Wiki] Trivial Update of RunNutchInEclipse0.9 by BartoszGadzimski
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification.

The following page has been changed by BartoszGadzimski:
http://wiki.apache.org/nutch/RunNutchInEclipse0%2e9

--
- = Run Nutch In Eclipse on Linux and Windows nutch version 0.9=
+ = Run Nutch In Eclipse on Linux and Windows nutch version 0.9 =
  
  This is a work in progress. If you find errors or would like to improve this page, just create an account [UserPreferences] and start editing this page :-)
[Nutch Wiki] Trivial Update of RunNutchInEclipse1.0 by BartoszGadzimski
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification.

The following page has been changed by BartoszGadzimski:
http://wiki.apache.org/nutch/RunNutchInEclipse1%2e0

--
- = Run Nutch In Eclipse on Linux and Windows nutch version 1.0=
+ = Run Nutch In Eclipse on Linux and Windows nutch version 1.0 =
  
  This is a work in progress. If you find errors or would like to improve this page, just create an account [UserPreferences] and start editing this page :-)
[Nutch Wiki] Update of FrontPage by BartoszGadzimski
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification.

The following page has been changed by BartoszGadzimski:
http://wiki.apache.org/nutch/FrontPage

--
   * UpgradeFrom07To08
   * [Upgrading_from_0.8.x_to_0.9]
   * RunNutchInEclipse for v0.8
-  * [RunNutchInEclipse0.9] for v0.9
+  * [RunNutchInEclipse0.9] for v0.9 (Linux and Windows)
+  * [RunNutchInEclipse1.0] for v1.0 (Linux and Windows)
   * [Crawl] - script to crawl (and possible recrawl too)
   * IntranetRecrawl - script to recrawl a crawl
   * MergeCrawl - script to merge 2 (or more) crawls