Hey Steve, I've got Nutch running with Ubuntu and Sun Java 1.6. I'll try to review and help sometime next week.
Steve W. [EMAIL PROTECTED] wrote:
> I would like to report back that I have completed this process and
> have also cleaned up this message and reformatted it as a page under
> the Nutch Wiki entitled GettingNutchRunningWithDebian
>
> Thank you to Justin Hartman at http://justinhartman.com for the tip
> which made it all work.
>
> On 1/24/07, Steve W. <[EMAIL PROTECTED]> wrote:
> >Partial success on the way to installing Nutch 0.8.1 With Debian Etch.
> >
> >http://mfgis.com/docs/nutchconfig.html
> >
> >I would like to relate here my progress towards implementing Nutch
> >0.8.1 on Debian Etch in hope of receiving help at the stage where I
> >have become stuck.
> >
> >So here goes:
> >Disclaimer: I know little to nothing about the inner workings of
> >Java, and Tomcat & Nutch were completely unknown to me a week ago.
> >
> >0. My OS
> ># uname -a
> >Linux 2.6.9-023stab033.6-enterprise #1 SMP Tue Nov 7 16:16:56 MSK
> >2006 i686 GNU/Linux
> ># cat /etc/debian_version
> >testing/unstable
> >
> >I. Install Sun's Java
> >//Sun Java is available as a set of Debian packages and may be easily
> >installed using apt. (To obtain Sun's Java, ensure that 'non-free' is
> >included in /etc/apt/sources.list)
> ># apt-get install sun-java5-bin sun-java5-demo sun-java5-jdk sun-java5-jre
> >
> >//Since there may be more than one flavor of Java on the system (e.g.
> >kaffe) ensure that Sun Java is the chosen alternative
> ># update-alternatives --config java // then select sun java from the menu
> >
> >//If necessary edit /etc/profile to include the following lines:
> >JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-1.5.0.10
> >export JAVA_HOME
> >
> >II. Install Tomcat5.5
> ># apt-get install tomcat5.5 libtomcat5.5-java tomcat5.5-admin tomcat5.5-webapps
> >// Hopefully, tomcat is installed and running, which I was able to verify:
> ># ps -ef |grep tomcat
> >tomcat55 8069 1 0 09:11 ?
00:00:00 su -p -s /bin/sh
> >tomcat55 -c /usr/sbin/rotatelogs
> >"/var/lib/tomcat5.5/logs/catalina_%F.log" 86400
> >tomcat55 8072 8069 0 09:11 ? 00:00:00 /usr/sbin/rotatelogs
> >/var/lib/tomcat5.5/logs/catalina_%F.log 86400
> >tomcat55 8103 1 0 09:11 ? 00:00:47
> >/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/bin/java
> >-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> >-Djava.util.logging.config.file=/var/lib/tomcat5.5/conf/logging.properties
> >-Djava.awt.headless=true -Xmx128M
> >-Djava.endorsed.dirs=/usr/share/tomcat5.5/common/endorsed -classpath
> >:/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/jre//lib/jcert.jar:/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/jre//lib/jnet.jar:/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/jre//lib/jsse.jar:/usr/share/tomcat5.5/bin/bootstrap.jar:/usr/share/tomcat5.5/bin/commons-logging-api.jar
> >-Djava.security.manager
> >-Djava.security.policy==/var/lib/tomcat5.5/conf/catalina.policy
> >-Dcatalina.base=/var/lib/tomcat5.5
> >-Dcatalina.home=/usr/share/tomcat5.5
> >-Djava.io.tmpdir=/var/lib/tomcat5.5/temp
> >org.apache.catalina.startup.Bootstrap start
> >//
> >// So, while the above worked completely smoothly on the architecture
> >described above, I am stalled at this stage on a second Debian machine
> >which is:
> >#uname -a
> >Linux amboro 2.6.15-1-486 #2 Mon Mar 6 15:19:16 UTC 2006 i686 GNU/Linux
> >#cat /etc/debian_version
> >4.0
> >// On this second machine, the previous #apt-get install tomcat5.5
> >libtomcat5.5-java tomcat5.5-admin tomcat5.5-webapps also forces the
> >install of the jsvc ("native application to launch java packages as
> >daemons") package, although at
> >http://packages.debian.org/testing/web/tomcat5.5 jsvc shows as a
> >suggested, not required (depends) package.
> >// Installing the packages yields the following messages:
> >Setting up jsvc (1.0.2~svn20061127-4) ...
> >Setting up libtomcat5.5-java (5.5.20-4) ...
> >Setting up tomcat5.5 (5.5.20-4) ...
> >Adding system user `tomcat55' (UID 108) ...
> >Adding new user `tomcat55' (UID 108) with group `nogroup' ...
> >Not creating home directory `/usr/share/tomcat5.5'.
> >Installing /var/lib/tomcat5.5/conf/tomcat-users.xml.
> >Starting Tomcat servlet engine: tomcat5.5.
> >Setting up tomcat5.5-admin (5.5.20-4) ...
> >invoke-rc.d: initscript tomcat5.5, action "status" failed.
> >Setting up tomcat5.5-webapps (5.5.20-4) ...
> >invoke-rc.d: initscript tomcat5.5, action "status" failed.
> >// So something is wrong. Running:
> ># ps -ef |grep tomcat
> >// Shows multiple processes which look like:
> >root 9136 1 0 12:52 ? 00:00:00 jsvc.exec -user
> >tomcat55 -cp
> >/usr/share/java/commons-daemon.jar:/usr/share/tomcat5.5/bin/bootstrap.jar
> >-outfile /var/lib/tomcat5.5/logs/catalina.out -errfile &1 -pidfile
> >/var/run/tomcat5.5.pid -Djava.awt.headless=true -Xmx128M
> >-Djava.endorsed.dirs=/usr/share/tomcat5.5/common/endorsed
> >-Dcatalina.base=/var/lib/tomcat5.5
> >-Dcatalina.home=/usr/share/tomcat5.5
> >-Djava.io.tmpdir=/var/lib/tomcat5.5/temp -Djava.security.manager
> >-Djava.security.policy=/var/lib/tomcat5.5/conf/catalina.policy
> >-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> >-Djava.util.logging.config.file=/var/lib/tomcat5.5/conf/logging.properties
> >org.apache.catalina.startup.Bootstrap
> >// Now, comparing this to the above (successful) install, I note that
> >under the former tomcat is running under the java virtual machine, but
> >here it is (trying to?) run as a java service. I haven't looked into
> >this further, perhaps there is an easy solution?
> >// To verify that things aren't working:
> ># /etc/init.d/tomcat5.5 restart
> >Starting Tomcat servlet engine: tomcat5.5.
> ># /etc/init.d/tomcat5.5 status
> >Tomcat servlet engine is not running.
> >// Looking at the log files, there is only:
> ># ls -l /var/log/tomcat5.5
> >total 0
> >prw------- 1 tomcat55 nogroup 0 2007-01-24 12:52 catalina.out
> >//
> >// So, for now I will table the above and return to the server in
> >which tomcat IS working.
> ># /etc/init.d/tomcat5.5 status
> >Tomcat servlet engine is running with Java pid
> >/var/lib/tomcat5.5/temp/tomcat5.5.pid
> >// And this is a great time to point out where I and/or Debian diverge
> >from the Nutch tutorial at
> >http://lucene.apache.org/nutch/tutorial8.html namely:
> >1. Starting|Stopping Tomcat (and with it Catalina) may be achieved
> >using /etc/init.d/tomcat5.5 start|stop as noted above (where the
> >tutorial wants you to `sh catalina.sh start`). (Will this come back to
> >bite me later when nutch's files can't be found? We shall see)
> >2. Config file and webapp paths. This is a REALLY BIG DEAL. Debian
> >has its own location for important Catalina configuration files and
> >for the Tomcat webapps root:
> ># ls /etc/tomcat5.5
> >policy.d server.xml web.xml
> ># ls /etc/tomcat5.5/policy.d
> >01system.policy 02debian.policy 03catalina.policy 04webapps.policy
> >50user.policy
> >// These files collectively provide the content of
> >/var/lib/tomcat5.5/catalina.policy and are used in its place. I
> >discovered this a few steps down the line when changes to the
> >catalina.policy were ignored but those in the /etc/tomcat5.5/policy.d
> >were implemented.
> >// Similarly, the root application 'webapps' path is not used under
> >(this) Debian. Instead, the path is:
> >// /usr/share/tomcat5.5-webapps
> >#ls /usr/share/tomcat5.5-webapps
> >ROOT balancer.xml sample.war tomcat-docs
> >webdav.xml
> >ROOT.xml jsp-examples servlets-examples tomcat-docs.xml
> >balancer jsp-examples.xml servlets-examples.xml webdav
> >// Not surprisingly, these are applications provided by the deb
> >package tomcat5.5-webapps.
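A note on the policy.d layout described above: the numeric prefixes (01, 02, ... 50) suggest the fragments are assembled in name order into one policy file. The sketch below assumes plain concatenation in lexical order. That is an assumption; the message only establishes that edits under /etc/tomcat5.5/policy.d take effect while edits to catalina.policy are ignored. The function name merge_policy and its arguments are made up for illustration.

```shell
# Hypothetical helper: concatenate *.policy fragments in lexical (name) order.
# Assumption: the effective policy is a plain concatenation of the fragments.
# $1: directory holding the fragments, $2: output file to write
merge_policy() {
    : > "$2"                      # start with an empty output file
    for f in "$1"/*.policy; do    # the shell expands the glob in sorted order
        [ -f "$f" ] || continue   # skip the literal pattern if nothing matched
        cat "$f" >> "$2"
    done
}
```

For example, `merge_policy /etc/tomcat5.5/policy.d /tmp/merged.policy` would produce a single file to read through without touching Tomcat's own copy.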
> >//
> >// Ok, so I am ready to point my browser to the Tomcat home at
> >http://localhost:8080
> >// Well, that fails with a standard 'unable to connect'
> >// So, what port do I really want? Turning to the conf files in the
> >correct /etc/tomcat5.5 directory
> >// Reviewing /etc/tomcat5.5/server.xml I discover ... Connector
> >port="8180"
> >// So I return to my browser and point to http://localhost:8180
> >// "If you're seeing this page via a web browser, it means you've set
> >up Tomcat successfully. Congratulations!"
> >// All Right! I'm on a roll and life is good.
> >// Returning to the Nutch tutorial at
> >http://lucene.apache.org/nutch/tutorial8.html (skipping for now the
> >indexing and crawling sections and jumping down to the web search
> >section) I am instructed to 'rm -rf ~/local/tomcat/webapps/ROOT*' and
> >'cp nutch*.war ~/local/tomcat/webapps/ROOT.war' (remembering that in
> >the case of my Debian system the webapps path is
> >/usr/share/tomcat5.5-webapps rather than ~/local/tomcat/webapps).
> >// Wait a minute! After struggling to get Tomcat running, I am
> >instructed to throw away all of its webapps in the hope of having
> >Nutch work in the future. I think not. Instead, shouldn't I prefer
> >to install Nutch as one among many applications? Yes, I have seen
> >discussion threads that indicate various files and paths within Nutch
> >are hard-wired to ROOT, and I notice here:
> >http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine that
> >quote "It is not clear why the developers designed the application to
> >run in the root context. However it is possible to modify the
> >application to enable it to be deployed normally." And so I pin my
> >hopes on this.
> >// After all, I have the Tomcat Manager at my disposal
> >http://localhost:8180/manager/html from my Tomcat home page, so I
> >choose to use this in an attempt to install nutch.
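The port hunt described above (8080 refused, 8180 found in server.xml) can also be done from the shell. A minimal sketch, assuming each Connector element carries a port="..." attribute; connector_ports is a hypothetical helper name, and any commented-out connectors in the file would be listed as well.

```shell
# Hypothetical helper: print the port="..." value of each <Connector> element.
# $1: path to a server.xml file
connector_ports() {
    # Join lines first so a Connector element split across lines still matches,
    # then pull out each element and keep only its port attribute.
    tr '\n' ' ' < "$1" \
        | grep -o '<Connector[^>]*>' \
        | sed -n 's/.*[[:space:]]port="\([0-9][0-9]*\)".*/\1/p'
}
```

Run as `connector_ports /etc/tomcat5.5/server.xml`. Requiring whitespace before `port=` keeps attributes such as redirectPort from being picked up instead.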
> >// I must grant myself permission to access the Tomcat Manager pages,
> >and as instructed in (reference?) do so by modifying
> >// /usr/share/tomcat5.5/conf/tomcat-users.xml to include the line:
> >// <user username="me" password="*****" roles="manager"/>
> >// Granted access to the Tomcat Manager I can now list available
> >applications and not surprisingly find that those provided by the deb
> >tomcat5.5-webapps package are both listed and functional.
> >// Thus, Java and Tomcat are installed and verified to be functional.
> >It is time to turn my attention to:
> >//
> >III. Acquire, configure and install Nutch; Build a test index and
> >run a test crawl.
> >// I follow the tutorial at
> >http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine through
> >Section 3.2 without complication and verify I have a successful test
> >index and crawl.
> >#cd /home/me/nutch-0.8.1
> >#bin/nutch org.apache.nutch.searcher.NutchBean blahBlahBlah
> >Total Hits: 8
> >...
> >#bin/nutch readdb testcrawl/crawldb -stats
> >CrawlDb statistics start: testcrawl/crawldb
> >Statistics for CrawlDb: testcrawl/crawldb
> >TOTAL urls: 4727
> >...
> >CrawlDb statistics: done
> >//
> >// OK, to recap: The Tomcat Server is fully functional, as is Nutch
> >as a stand-alone. It is time to:
> >
> >IV. Install Nutch as a Tomcat application.
> >// As noted above, I ignore the advice to wipe Tomcat's ROOT context,
> >opting to (hopefully) install Nutch as one application among many.
> >// I have read that placing a WAR in the webapps folder will result in
> >it being extracted automatically upon Tomcat's next restart, so I
> ># cp /home/me/nutch-0.8.1/nutch-0.8.1.war
> >/usr/share/tomcat5.5-webapps/nutch-0.8.1.war
> >// I also create the context file
> >/usr/share/tomcat5.5-webapps/nutch-0.8.1.xml the contents of which are
> ><Context path="/nutch-0.8.1"
> >docBase="/usr/share/tomcat5.5-webapps/nutch-0.8.1"
> > debug="0" privileged="false" allowLinking="true">
> ></Context>
> >// I restart Tomcat
> ># /etc/init.d/tomcat5.5 restart
> >// Contrary to expectations, the nutch-0.8.1.war file was *NOT*
> >extracted. I do so manually
> ># mkdir nutch-0.8.1 //(in /usr/share/tomcat5.5-webapps)
> ># mv nutch-0.8.1.war nutch-0.8.1 // move the WAR to the folder
> ># cd nutch-0.8.1
> ># jar -xvf nutch-0.8.1.war
> ># /etc/init.d/tomcat5.5 restart
> >// I return to my browser and the Tomcat Manager page and 'List
> >Applications'
> >// I find an entry for nutch-0.8.1! and click 'start'
> >// A message is returned: 'OK - Started application at context path
> >/nutch-0.8.1'
> >// Life is good!
> >// I point my browser to the Nutch home page
> >http://localhost:8180/nutch-0.8.1 which redirects me slightly to
> >// http://localhost:8180/nutch-0.8.1/en
> >// I enter a search term and click 'search'
> >// I get an error dump indicating permission errors, which, thanks to
> >http://nutch.sourceforge.net/cgi-bin/twiki/view/Main/GettingNutchRunningOnDebian
> > I can correct.
> >// Remembering the location of Tomcat's configuration files under
> >/etc/tomcat5.5/policy.d I edit 04webapps.policy and add the following
> >lines:
> >grant codeBase "file:/usr/share/tomcat5.5-webapps/nutch-0.8.1/-" {
> > permission java.util.PropertyPermission "user.dir", "read";
> > permission java.io.FilePermission
> >"/home/me/nutch-0.8.1/testcrawl/*" , "read";
> >};
> >// Restart Tomcat
> ># /etc/init.d/tomcat5.5 restart
> >// Try the search again
> >// And receive an HTTP 500 error which begins:
> >
> >exception
> >org.apache.jasper.JasperException: Exception in JSP: /search.jsp:49
> >
> >46: --%>
> >47:
> >48: <%
> >49: NutchBean bean = NutchBean.get(application, nutchConf);
> >50: // set the character encoding to use when interpreting request values
> >51: request.setCharacterEncoding("UTF-8");
> >52:
> >
> >// And a log file which reads in part:
> >2007-01-24 17:38:08,470 INFO NutchBean - creating new bean
> >2007-01-24 17:38:08,472 INFO NutchBean - opening merged index in
> >/home/walker/nutch-0.8.1/crawl2/index
> >// Everything is fine until here, but:
> >2007-01-24 17:38:08,477 ERROR [jsp] - Servlet.service() for servlet
> >jsp threw exception
> >java.lang.NoClassDefFoundError
> > at org.apache.nutch.searcher.IndexSearcher.getDirectory(IndexSearcher.java:83)
> > at org.apache.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:70)
> > at org.apache.nutch.searcher.NutchBean.init(NutchBean.java:118)
> > at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:105)
> > at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:83)
> > at org.apache.nutch.searcher.NutchBean.get(NutchBean.java:70)
> >
> >// So I am ready to start digging in to this error under the
> >assumption that it may be as simple as a path error.
> >// However, I instead spend a couple of hours preparing this narrative
> >and post it up to the Nutch users' list.
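One detail of the grant block above that may be worth checking: in a Java policy file, a FilePermission path ending in "/*" covers only the entries directly inside that directory, while a path ending in "/-" covers everything beneath it recursively. The log shows the index being opened from an index subdirectory, so files under it would fall outside a "/*" grant. Whether this has anything to do with the NoClassDefFoundError is not established here. The sketch below only shows how a recursive variant of the same grant could be appended; add_nutch_grant and its arguments are hypothetical names.

```shell
# Hypothetical helper: append a grant block with recursive ("/-") read access.
# $1: policy file to append to
# $2: webapp directory (codeBase)
# $3: crawl directory the webapp needs to read
add_nutch_grant() {
    cat >> "$1" <<EOF
grant codeBase "file:$2/-" {
    permission java.util.PropertyPermission "user.dir", "read";
    permission java.io.FilePermission "$3/-", "read";
};
EOF
}
```

With the paths from this message that would be `add_nutch_grant /etc/tomcat5.5/policy.d/04webapps.policy /usr/share/tomcat5.5-webapps/nutch-0.8.1 /home/me/nutch-0.8.1/testcrawl`, followed by a Tomcat restart.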
> >
> >// I will gladly reformat this for inclusion in the Wiki if that
> >should prove of interest to anybody. Naturally I would hope to have a
> >complete solution in hand, and would appreciate any help along the
> >way.
> >
> >S.W.
> >Middle Fork Geographic Information Services
> >middleforkgis-att-gmail-dott-comm
> >24 Jan 2007

--
*--* Mail: [EMAIL PROTECTED]
*--* Voice: 206.892.6269
*--* Cell: 206.225.0154
*--* HTTP://real.com
--------------------------------------
- - - - - - - R e a l - - - - - - - -
