Hey Steve, 
I've got Nutch running on Ubuntu with Sun Java 1.6.
I'll try to review and help sometime next week.


Steve W. [EMAIL PROTECTED] wrote:
> I would like to report back that I have completed this process, and
> have also cleaned up this message and reformatted it as a page on
> the Nutch Wiki entitled GettingNutchRunningWithDebian.
> 
> Thank you to Justin Hartman at http://justinhartman.com for the tip
> which made it all work.
> 
> On 1/24/07, Steve W. <[EMAIL PROTECTED]> wrote:
> >Partial success on the way to installing Nutch 0.8.1 with Debian Etch.
> >
> >http://mfgis.com/docs/nutchconfig.html
> >
> >I would like to relate here my progress towards implementing Nutch
> >0.8.1 on Debian Etch in hope of receiving help at the stage where I
> >have become stuck.
> >
> >So here goes:
> >Disclaimer:  I know little to nothing about the inner workings of
> >Java, and Tomcat & Nutch were completely unknown to me a week ago.
> >
> >0.  My OS
> ># uname -a
> >Linux  2.6.9-023stab033.6-enterprise #1 SMP Tue Nov 7 16:16:56 MSK
> >2006 i686 GNU/Linux
> ># cat /etc/debian_version
> >testing/unstable
> >
> >I.  Install Sun's Java
> >//Sun Java is available as a set of Debian packages and may be easily
> >installed using apt.  (To obtain Sun's Java, ensure that 'non-free' is
> >included in /etc/apt/sources.list)
> ># apt-get install sun-java5-bin sun-java5-demo sun-java5-jdk sun-java5-jre
> >
> >//Since there may be more than one flavor of Java on the system (e.g.
> >kaffe) ensure that Sun Java is the chosen alternative
> ># update-alternatives --config java   // then select sun java from the menu
> >
> >//If necessary edit /etc/profile to include the following lines:
> >JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-1.5.0.10
> >export JAVA_HOME
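To confirm that the Java actually being run is the Sun JVM that JAVA_HOME names, a small Java check can print the running JVM's vendor, version and `java.home` and compare it with the environment variable. This is a minimal sketch written for a current JDK (8 or later), not Java 5; the class `JvmCheck` is hypothetical. It only inspects the JVM you launch it with, which is not necessarily the one Tomcat's init script starts.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports the running JVM and compares it with $JAVA_HOME.
class JvmCheck {
    // java.home often points at the JRE inside the JDK (e.g. <JAVA_HOME>/jre),
    // so "java.home equals JAVA_HOME or lies beneath it" counts as a match.
    static boolean homeMatches(String javaHomeProperty, String javaHomeEnv) {
        if (javaHomeProperty == null || javaHomeEnv == null || javaHomeEnv.isEmpty()) {
            return false;
        }
        Path running = Paths.get(javaHomeProperty).normalize();
        Path configured = Paths.get(javaHomeEnv).normalize();
        // Path.startsWith compares whole path components, not string prefixes.
        return running.startsWith(configured);
    }

    public static void main(String[] args) {
        String home = System.getProperty("java.home");
        String env = System.getenv("JAVA_HOME");
        System.out.println("vendor:    " + System.getProperty("java.vendor"));
        System.out.println("version:   " + System.getProperty("java.version"));
        System.out.println("java.home: " + home);
        System.out.println("JAVA_HOME: " + env);
        System.out.println(homeMatches(home, env)
                ? "OK: running JVM is under JAVA_HOME"
                : "WARNING: running JVM is not under JAVA_HOME");
    }
}
```

Comparing path components rather than raw strings avoids treating a sibling directory such as `.../java-1.5.0-sun-1.5.0.10-other` as a match.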
> >
> >II.  Install Tomcat5.5
> ># apt-get install tomcat5.5 libtomcat5.5-java tomcat5.5-admin tomcat5.5-webapps
> >// Tomcat should now be installed and running, which I was able to verify:
> ># ps -ef |grep tomcat
> >tomcat55  8069     1  0 09:11 ?        00:00:00 su -p -s /bin/sh
> >tomcat55 -c /usr/sbin/rotatelogs
> >"/var/lib/tomcat5.5/logs/catalina_%F.log" 86400
> >tomcat55  8072  8069  0 09:11 ?        00:00:00 /usr/sbin/rotatelogs
> >/var/lib/tomcat5.5/logs/catalina_%F.log 86400
> >tomcat55  8103     1  0 09:11 ?        00:00:47
> >/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/bin/java
> >-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> >-Djava.util.logging.config.file=/var/lib/tomcat5.5/conf/logging.properties
> >-Djava.awt.headless=true -Xmx128M
> >-Djava.endorsed.dirs=/usr/share/tomcat5.5/common/endorsed -classpath
> >:/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/jre//lib/jcert.jar:/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/jre//lib/jnet.jar:/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/jre//lib/jsse.jar:/usr/share/tomcat5.5/bin/bootstrap.jar:/usr/share/tomcat5.5/bin/commons-logging-api.jar
> >-Djava.security.manager
> >-Djava.security.policy==/var/lib/tomcat5.5/conf/catalina.policy
> >-Dcatalina.base=/var/lib/tomcat5.5
> >-Dcatalina.home=/usr/share/tomcat5.5
> >-Djava.io.tmpdir=/var/lib/tomcat5.5/temp
> >org.apache.catalina.startup.Bootstrap start
> >//
> >// While the above worked smoothly on the system described
> >above, I am stalled at this stage on a second Debian machine,
> >which is:
> >#uname -a
> >Linux amboro 2.6.15-1-486 #2 Mon Mar 6 15:19:16 UTC 2006 i686 GNU/Linux
> >#cat /etc/debian_version
> >4.0
> >// On this second machine, the same apt-get install tomcat5.5
> >libtomcat5.5-java tomcat5.5-admin tomcat5.5-webapps command also forces the
> >install of the jsvc ("native application to launch java packages as
> >daemons") package, although at
> >http://packages.debian.org/testing/web/tomcat5.5 jsvc shows as a
> >suggested package, not a required ("Depends") one.
> >// Installing the packages yields the following messages:
> >Setting up jsvc (1.0.2~svn20061127-4) ...
> >Setting up libtomcat5.5-java (5.5.20-4) ...
> >Setting up tomcat5.5 (5.5.20-4) ...
> >Adding system user `tomcat55' (UID 108) ...
> >Adding new user `tomcat55' (UID 108) with group `nogroup' ...
> >Not creating home directory `/usr/share/tomcat5.5'.
> >Installing /var/lib/tomcat5.5/conf/tomcat-users.xml.
> >Starting Tomcat servlet engine: tomcat5.5.
> >Setting up tomcat5.5-admin (5.5.20-4) ...
> >invoke-rc.d: initscript tomcat5.5, action "status" failed.
> >Setting up tomcat5.5-webapps (5.5.20-4) ...
> >invoke-rc.d: initscript tomcat5.5, action "status" failed.
> >// So something is wrong.  Running:
> ># ps -ef |grep tomcat
> >// Shows multiple processes which look like:
> >root      9136     1  0 12:52 ?        00:00:00 jsvc.exec -user
> >tomcat55 -cp 
> >/usr/share/java/commons-daemon.jar:/usr/share/tomcat5.5/bin/bootstrap.jar
> >-outfile /var/lib/tomcat5.5/logs/catalina.out -errfile &1 -pidfile
> >/var/run/tomcat5.5.pid -Djava.awt.headless=true -Xmx128M
> >-Djava.endorsed.dirs=/usr/share/tomcat5.5/common/endorsed
> >-Dcatalina.base=/var/lib/tomcat5.5
> >-Dcatalina.home=/usr/share/tomcat5.5
> >-Djava.io.tmpdir=/var/lib/tomcat5.5/temp -Djava.security.manager
> >-Djava.security.policy=/var/lib/tomcat5.5/conf/catalina.policy
> >-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> >-Djava.util.logging.config.file=/var/lib/tomcat5.5/conf/logging.properties
> >org.apache.catalina.startup.Bootstrap
> >// Now, comparing this to the above (successful) install, I note that
> >in the former, Tomcat runs directly under the Java virtual machine, but
> >here it is (trying to?) run as a daemon via jsvc. I haven't looked into
> >this further; perhaps there is an easy solution?
> >// To verify that things aren't working:
> ># /etc/init.d/tomcat5.5 restart
> >Starting Tomcat servlet engine: tomcat5.5.
> ># /etc/init.d/tomcat5.5 status
> >Tomcat servlet engine is not running.
> >// Looking at the log files, there is only:
> ># ls -l /var/log/tomcat5.5
> >total 0
> >prw------- 1 tomcat55 nogroup 0 2007-01-24 12:52 catalina.out
> >//
> >// So, for now I will table the above and return to the server on
> >which Tomcat IS working.
> ># /etc/init.d/tomcat5.5 status
> >Tomcat servlet engine is running with Java pid
> >/var/lib/tomcat5.5/temp/tomcat5.5.pid
> >// And this is a great time to point out where I and/or Debian diverge
> >from the Nutch tutorial at
> >http://lucene.apache.org/nutch/tutorial8.html namely:
> >1.  Starting|Stopping Tomcat (and with it Catalina) may be achieved
> >using /etc/init.d/tomcat5.5 start|stop as noted above, whereas the
> >tutorial has you run `sh catalina.sh start`. (Will this come back to
> >bite me later when Nutch's files can't be found? We shall see.)
> >2.  Config file and webapp paths.  This is a REALLY BIG DEAL.  Debian
> >has its own location for important Catalina configuration files and
> >for the Tomcat webapps root:
> ># ls /etc/tomcat5.5
> >policy.d  server.xml  web.xml
> ># ls /etc/tomcat5.5/policy.d
> >01system.policy  02debian.policy  03catalina.policy  04webapps.policy  50user.policy
> >// These files collectively provide the content of
> >/var/lib/tomcat5.5/conf/catalina.policy and are used in its place. I
> >discovered this a few steps down the line, when changes to catalina.policy
> >were ignored but those in /etc/tomcat5.5/policy.d took
> >effect.
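One way to picture why edits to catalina.policy had no effect is to treat the policy.d fragments as being merged into a single policy in file-name order, with the numeric prefixes (01, 02, ... 50) controlling the order. The sketch below assumes, without having checked the Debian init script, that the fragments are simply concatenated in that order; `PolicyMerge` and `merge` are hypothetical names, not part of Tomcat or Debian, and the code targets a current JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: merge *.policy fragments in name order, the way a
// startup script might (by assumption) rebuild one policy file from policy.d.
class PolicyMerge {
    static String merge(Path policyDir) {
        List<Path> fragments = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(policyDir, "*.policy")) {
            for (Path p : stream) {
                fragments.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Path is Comparable; within one directory this sorts by file name.
        Collections.sort(fragments);
        StringBuilder merged = new StringBuilder();
        for (Path p : fragments) {
            try {
                merged.append("// from ").append(p.getFileName()).append('\n');
                merged.append(new String(Files.readAllBytes(p), StandardCharsets.UTF_8)).append('\n');
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return merged.toString();
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/etc/tomcat5.5/policy.d");
        System.out.print(merge(dir));
    }
}
```

If the real mechanism works like this, a grant placed in catalina.policy by hand would simply be overwritten or bypassed, which matches the behavior described above; reading /etc/init.d/tomcat5.5 is the way to confirm it.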
> >// Similarly, the root application 'webapps' path is not used under
> >(this) Debian.  Instead, the path is:
> >// /usr/share/tomcat5.5-webapps
> >#ls /usr/share/tomcat5.5-webapps
> >ROOT      balancer.xml      sample.war             tomcat-docs      webdav.xml
> >ROOT.xml  jsp-examples      servlets-examples      tomcat-docs.xml
> >balancer  jsp-examples.xml  servlets-examples.xml  webdav
> >// Not surprisingly, these are applications provided by the deb
> >package tomcat5.5-webapps.
> >//
> >// Ok, so I am ready to point my browser to the Tomcat home at
> >http://localhost:8080
> >// Well, that fails with a standard 'unable to connect'
> >// So, what port do I really want? Turning to the configuration files in
> >the correct directory, /etc/tomcat5.5, and reviewing /etc/tomcat5.5/server.xml,
> >I discover ... Connector port="8180"
> >// So I return to my browser and point to http://localhost:8180
> >// "If you're seeing this page via a web browser, it means you've set
> >up Tomcat successfully.  Congratulations!"
> >// All Right!  I'm on a roll and life is good.
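Rather than guessing ports, the Connector ports can be read out of server.xml and each one probed with a TCP connect. This is a minimal sketch for a current JDK; `ConnectorPorts` and its methods are hypothetical names, and the default path is the Debian location mentioned above. A DOM parse skips XML comments, so commented-out Connectors (which a plain grep would still match) are not reported.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: list <Connector port="..."> values and probe each one.
class ConnectorPorts {
    static List<Integer> fromDocument(Document doc) {
        // Comments are not elements, so commented-out Connectors never appear here.
        NodeList connectors = doc.getElementsByTagName("Connector");
        List<Integer> ports = new ArrayList<>();
        for (int i = 0; i < connectors.getLength(); i++) {
            String port = ((Element) connectors.item(i)).getAttribute("port");
            if (!port.isEmpty()) {
                ports.add(Integer.parseInt(port.trim()));
            }
        }
        return ports;
    }

    static List<Integer> fromString(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return fromDocument(builder.parse(new InputSource(new StringReader(xml))));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse server.xml content", e);
        }
    }

    // True if something accepts a TCP connection on host:port within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        File serverXml = new File(args.length > 0 ? args[0] : "/etc/tomcat5.5/server.xml");
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(serverXml);
        for (int port : fromDocument(doc)) {
            System.out.println("Connector port " + port + ": "
                    + (isListening("localhost", port, 2000) ? "listening" : "not reachable"));
        }
    }
}
```

A server.xml may define more than one Connector, so the output can list several ports; the HTTP one is the port to put in the browser URL.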
> >// Returning to the Nutch tutorial at
> >http://lucene.apache.org/nutch/tutorial8.html (skipping for now the
> >indexing and crawling sections and jumping down to the web search
> >section) I am instructed to 'rm -rf ~/local/tomcat/webapps/ROOT*' and
> >'cp nutch*.war ~/local/tomcat/webapps/ROOT.war' (remembering that in
> >the case of my Debian system the webapps path is
> >/usr/share/tomcat5.5-webapps rather than ~/local/tomcat/webapps).
> >// Wait a minute! After struggling to get Tomcat running, I am
> >instructed to throw away all of its webapps in the hope of having
> >Nutch work in the future. I think not. Instead, shouldn't I install
> >Nutch as one application among many? Yes, I have seen discussion
> >threads indicating that various files and paths within Nutch
> >are hard-wired to ROOT, and I notice that
> >http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine says,
> >quote, "It is not clear why the developers designed the application to
> >run in the root context. However it is possible to modify the
> >application to enable it to be deployed normally." And so I pin my
> >hopes on this.
> >// After all, I have the Tomcat Manager at my disposal at
> >http://localhost:8180/manager/html (linked from my Tomcat home page), so I
> >use it in an attempt to install Nutch.
> >// I must grant myself permission to access the Tomcat Manager pages,
> >and as instructed in (reference?) do so by modifying
> >//    /usr/share/tomcat5.5/conf/tomcat-users.xml to include the line:
> >//    <user username="me" password="*****" roles="manager"/>
> >// Granted access to the Tomcat Manager I can now list available
> >applications and not surprisingly find that those provided by the deb
> >tomcat5.5-webapps package are both listed and functional.
> >// Thus, Java and Tomcat are installed and verified to be functional.
> >It is time to turn my attention to:
> >//
> >III.  Acquire, configure and install Nutch; build a test index and
> >run a test crawl.
> >// I follow the tutorial at
> >http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine through
> >Section 3.2 without complication and verify I have a successful test
> >index and crawl.
> >#cd /home/me/nutch-0.8.1
> >#bin/nutch org.apache.nutch.searcher.NutchBean blahBlahBlah
> >Total Hits: 8
> >...
> >#bin/nutch readdb testcrawl/crawldb -stats
> >CrawlDb statistics start: testcrawl/crawldb
> >Statistics for CrawlDb: testcrawl/crawldb
> >TOTAL urls:     4727
> >...
> >CrawlDb statistics: done
> >//
> >// OK, to recap:  The Tomcat Server is fully functional, as is Nutch
> >as a stand-alone. It is time to:
> >
> >IV.  Install Nutch as a Tomcat application.
> >// As noted above, I ignore the advice to wipe Tomcat's ROOT context,
> >opting to (hopefully) install Nutch as one application among many.
> >// I have read that placing a WAR in the webapps folder will result in
> >it being extracted automatically upon Tomcat's next restart, so I
> ># cp /home/me/nutch-0.8.1/nutch-0.8.1.war
> >/usr/share/tomcat5.5-webapps/nutch-0.8.1.war
> >// I also create the context file
> >/usr/share/tomcat5.5-webapps/nutch-0.8.1.xml the contents of which are
> ><Context path="/nutch-0.8.1" 
> >docBase="/usr/share/tomcat5.5-webapps/nutch-0.8.1"
> >         debug="0" privileged="false" allowLinking="true">
> ></Context>
> >// I restart Tomcat
> ># /etc/init.d/tomcat5.5 restart
> >// Contrary to expectations, the nutch-0.8.1.war file was *NOT*
> >extracted.  I do so manually
> ># mkdir nutch-0.8.1  //(in /usr/share/tomcat5.5-webapps)
> ># mv nutch-0.8.1.war nutch-0.8.1  // move the WAR to the folder
> ># cd nutch-0.8.1
> ># jar -xvf nutch-0.8.1.war
> ># /etc/init.d/tomcat5.5 restart
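The manual `jar -xvf` step above can also be done from Java with `java.util.jar.JarFile`, which is handy for scripting a redeploy. This is a minimal sketch for a current JDK; `WarExtract` is a hypothetical name, and unlike the steps above it leaves the WAR where it is and unpacks into a separate directory. It also refuses entries whose names would escape the target directory (for example `../x`), a check `jar -xvf` itself does not give you.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: unpack a WAR into a directory; returns the number of files written.
class WarExtract {
    static int extract(Path war, Path targetDir) {
        Path root = targetDir.toAbsolutePath().normalize();
        int files = 0;
        try (JarFile jar = new JarFile(war.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                Path out = root.resolve(entry.getName()).normalize();
                // Reject entries that would land outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("entry outside target: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    // Parent directories are created even if the WAR has no directory entries.
                    Files.createDirectories(out.getParent());
                    try (InputStream in = jar.getInputStream(entry)) {
                        Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                    }
                    files++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) {
        // Defaults mirror the paths used above; pass your own as arguments.
        Path war = Paths.get(args.length > 0 ? args[0] : "/usr/share/tomcat5.5-webapps/nutch-0.8.1.war");
        Path dir = Paths.get(args.length > 1 ? args[1] : "/usr/share/tomcat5.5-webapps/nutch-0.8.1");
        System.out.println("extracted " + extract(war, dir) + " files into " + dir);
    }
}
```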
> >// I return to my browser and the Tomcat Manager page and 'List 
> >Applications'
> >// I find an entry for nutch-0.8.1! and click 'start'
> >// A message is returned:  'OK - Started application at context path
> >/nutch-0.8.1'
> >// Life is good!
> >// I point my browser to the Nutch home page
> >http://localhost:8180/nutch-0.8.1, which redirects me to
> >// http://localhost:8180/nutch-0.8.1/en
> >// I enter a search term and click 'search'
> >// I get an error dump indicating permission errors, which, thanks to
> >http://nutch.sourceforge.net/cgi-bin/twiki/view/Main/GettingNutchRunningOnDebian
> > I can correct.
> >// Remembering the location of Tomcat's configuration files under
> >/etc/tomcat5.5/policy.d I edit 04webapps.policy and add the following
> >lines:
> >grant codeBase "file:/usr/share/tomcat5.5-webapps/nutch-0.8.1/-" {
> >    permission java.util.PropertyPermission "user.dir", "read";
> >    permission java.io.FilePermission "/home/me/nutch-0.8.1/testcrawl/*", "read";
> >};
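One detail of this grant may be worth checking. Per the `java.io.FilePermission` documentation, a path ending in "/*" covers only the files and directories directly inside that directory, while "/-" covers everything beneath it recursively; the log below shows the index being opened from an `index` subdirectory of the crawl directory, one level deeper than "/*" reaches. The sketch uses the JDK's own `FilePermission.implies` to compare the two forms (current JDK assumed; `PolicyWildcardCheck` is a hypothetical name and `index/segments` a hypothetical file). Whether this is related to the error that follows has not been tested.

```java
import java.io.FilePermission;

// Compares the non-recursive "/*" grant used above with a recursive "/-" grant,
// using the JDK's own FilePermission matching rules.
class PolicyWildcardCheck {
    static boolean grantAllowsRead(String grantPath, String target) {
        FilePermission granted = new FilePermission(grantPath, "read");
        FilePermission requested = new FilePermission(target, "read");
        return granted.implies(requested);
    }

    public static void main(String[] args) {
        String base = "/home/me/nutch-0.8.1/testcrawl";
        String direct = base + "/index";           // directly inside testcrawl
        String nested = base + "/index/segments";  // hypothetical file one level deeper
        System.out.println("/* allows index:          " + grantAllowsRead(base + "/*", direct));
        System.out.println("/* allows index/segments: " + grantAllowsRead(base + "/*", nested));
        System.out.println("/- allows index/segments: " + grantAllowsRead(base + "/-", nested));
    }
}
```

If nested index files turn out to need read access, one untested option is the recursive form `permission java.io.FilePermission "/home/me/nutch-0.8.1/testcrawl/-", "read";`.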
> >// Restart Tomcat
> ># /etc/init.d/tomcat5.5 restart
> >// Try the search again
> >// And receive an HTTP 500 error which begins:
> >
> >exception
> >org.apache.jasper.JasperException: Exception in JSP: /search.jsp:49
> >
> >46: --%>
> >47:
> >48: <%
> >49:   NutchBean bean = NutchBean.get(application, nutchConf);
> >50:   // set the character encoding to use when interpreting request values
> >51:   request.setCharacterEncoding("UTF-8");
> >52:
> >
> >// And a log file which reads in part:
> >2007-01-24 17:38:08,470 INFO  NutchBean - creating new bean
> >2007-01-24 17:38:08,472 INFO  NutchBean - opening merged index in /home/walker/nutch-0.8.1/crawl2/index
> >// Everything is fine until here, but:
> >2007-01-24 17:38:08,477 ERROR [jsp] - Servlet.service() for servlet jsp threw exception
> >java.lang.NoClassDefFoundError
> >        at org.apache.nutch.searcher.IndexSearcher.getDirectory(IndexSearcher.java:83)
> >        at org.apache.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:70)
> >        at org.apache.nutch.searcher.NutchBean.init(NutchBean.java:118)
> >        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:105)
> >        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:83)
> >        at org.apache.nutch.searcher.NutchBean.get(NutchBean.java:70)
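A NoClassDefFoundError does not necessarily mean a jar is missing. One JVM behavior worth ruling out: if a class's static initializer throws, the first use of that class raises ExceptionInInitializerError and every later use raises NoClassDefFoundError, often with no useful message. Under Tomcat's security manager, a permission denied during class initialization is one way that could happen, so the first error logged after a restart may say more than this one. The self-contained sketch below reproduces only that JVM behavior with a hypothetical class (current JDK assumed); it has not been checked against Nutch's IndexSearcher.

```java
// Hypothetical demonstration: a static initializer that fails once makes every
// later use of the class fail with NoClassDefFoundError.
class StaticInitDemo {
    static class NeedsProperty {
        // Stand-in for a class whose static setup can fail (for example, because
        // a permission it needs is denied); here the failure is simulated.
        static final String VALUE = load();

        private static String load() {
            throw new IllegalStateException("simulated failure during class initialization");
        }

        static String value() {
            return VALUE;
        }
    }

    // Returns the simple name of the error thrown by one attempt to use NeedsProperty.
    static String attempt() {
        try {
            NeedsProperty.value();
            return "no error";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("first use:  " + attempt());  // ExceptionInInitializerError
        System.out.println("second use: " + attempt());  // NoClassDefFoundError
    }
}
```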
> >
> >// So I am ready to start digging into this error, on the
> >assumption that it may be as simple as a path error.
> >// Instead, however, I spend a couple of hours preparing this narrative
> >and post it to the Nutch users' list.
> >
> >// I will gladly reformat this for inclusion in the Wiki if that
> >should prove of interest to anybody.  Naturally I would hope to have a
> >complete solution in hand, and would appreciate any help along the
> >way.
> >
> >S.W.
> >Middle Fork Geographic Information Services
> >middleforkgis-att-gmail-dott-comm
> >24 Jan 2007
> >

-- 
*--* Mail: [EMAIL PROTECTED]
*--* Voice: 206.892.6269
*--* Cell: 206.225.0154
*--* HTTP://real.com
--------------------------------------
- - - - - - - R e a l - - - - - - - -
