Erik, I'm not sure about this, because it has been a long time since I worked with your version (I work with 0.9 now), so I may be wrong about the crawl_generate and crawl_parse folders in the segment structure.
However, two days ago I had the same exception when one of my segments was missing its parse folder. So maybe you need to parse the segments again (bin/nutch parse segments/segmentname).

HTH,
Gal.

-----Original Message-----
From: Erik Höschler [mailto:[EMAIL PROTECTED]
Sent: Friday, January 26, 2007 6:21 PM
To: [email protected]
Subject: Re: Problems Searching an Index with Nutch

Ok, I could not find any crawl_generate or crawl_parse folder. Also, I didn't find Catalina.out anywhere on my system?!?!

One thing I don't understand: Nutch is supposed to create my folder structure. If there is a fault in it, such as the missing folders or the 'db' folder that should normally be 'linkdb', how can I fix it? I didn't change anything in the structure myself, so it must have been created by Nutch directly... Any idea how this could happen?

Thanks for your time ;)

--Erik

Gal Nitzan wrote:
> Well, I guess that db is linkdb for ver 0.7.
>
> Anyway, there is not much info; maybe you can find more in
> Catalina.out ...
>
> One more thing to look for, which just might be the reason (long shot)... check
> each of your segment folders and verify that it contains all 5 folders,
> i.e. content, crawl_generate, crawl_parse, parse_data, parse_text
>
> HTH
>
> Gal.
>
> -----Original Message-----
> From: Erik Höschler [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 26, 2007 5:58 PM
> To: [email protected]
> Subject: Re: Problems Searching an Index with Nutch
>
> Hi,
>
> I checked my folder structure and everything seems to be correct...
>
> :/opt/nutch/crawl.db# l
> insgesamt 8
> drwxr-xr-x 3 root root 53 2007-01-19 14:11 db
> drwxr-xr-x 2 root root 4096 2007-01-19 14:18 index
> drwxr-xr-x 12 root root 4096 2007-01-26 15:06 segments
>
> I'm not sure if I've ever had a linkdb folder, or did you mean the db
> folder listed above?
>
> Greetings,
> Erik
>
> Gal Nitzan wrote:
>
>> Hi,
>>
>> I'm not sure, but it seems to me you are missing the linkdb and segments
>> folders. They should be located on the same level as the index folder.
>>
>> HTH
>>
>> Gal
>>
>> -----Original Message-----
>> From: Erik Höschler [mailto:[EMAIL PROTECTED]
>> Sent: Friday, January 26, 2007 5:04 PM
>> To: [email protected]
>> Cc: Erik
>> Subject: Problems Searching an Index with Nutch
>>
>> Hi,
>>
>> I'm running Nutch-0.7.2. I created an index for my local LAN, which
>> consists of 45,000 pages.
>> I can inspect this index with Luke and everything looks fine. When I try
>> to start a search query with Nutch,
>> I can see the following exception in my JBoss logfile (at the end of the
>> log).
>>
>>
>> //Here I'm redeploying the Nutch.war archive....
>> 2007-01-26 15:55:06,611 INFO [org.jboss.web.tomcat.tc5.TomcatDeployer] deploy, ctxPath=/nutch, warUrl=file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/
>> 2007-01-26 15:55:06,831 DEBUG [tomcat.localhost./nutch.Context] Starting tomcat.localhost./nutch.Context
>> 2007-01-26 15:55:06,832 DEBUG [tomcat.localhost./nutch.Context] Configuring default Resources
>> 2007-01-26 15:55:06,836 DEBUG [tomcat.localhost./nutch.Context] Processing standard container startup
>> 2007-01-26 15:55:06,844 DEBUG [tomcat.localhost./nutch.Context] Setting deployment descriptor public ID to '-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN'
>> 2007-01-26 15:55:06,862 DEBUG [tomcat.localhost./nutch.Context] Setting deployment descriptor public ID to '-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN'
>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Posting standard context attributes
>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Configuring application event listeners
>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Sending application start events
>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Starting filters
>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Starting filter 'CommonHeadersFilter'
>> 2007-01-26 15:55:06,867 DEBUG [tomcat.localhost./nutch.Context] Starting completed //Archive successfully loaded...?!?!
>> 2007-01-26 15:55:06,867 DEBUG [tomcat.localhost./nutch.Context] Checking for jboss.web:j2eeType=WebModule,name=//localhost/nutch,J2EEApplication=none,J2EEServer=none
>>
>>
>> //Here I started a query in my web browser...
>> 2007-01-26 15:55:53,585 INFO [STDOUT] 070126 155553 parsing file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/nutch-default.xml
>> 2007-01-26 15:55:53,591 INFO [STDOUT] 070126 155553 parsing file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/nutch-site.xml
>> 2007-01-26 15:55:53,599 INFO [STDOUT] 070126 155553 Plugins: looking in: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins
>> 2007-01-26 15:55:53,600 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/clustering-carrot2
>> 2007-01-26 15:55:53,600 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/creativecommons
>> 2007-01-26 15:55:53,600 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/index-basic/plugin.xml
>> 2007-01-26 15:55:53,607 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
>> 2007-01-26 15:55:53,609 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/index-more
>> 2007-01-26 15:55:53,609 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/language-identifier
>> 2007-01-26 15:55:53,609 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/nutch-extensionpoints/plugin.xml
>> 2007-01-26 15:55:53,612 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/ontology
>> 2007-01-26 15:55:53,612 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-ext
>> 2007-01-26 15:55:53,613 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-html/plugin.xml
>> 2007-01-26 15:55:53,614 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
>> 2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-js
>> 2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-msword
>> 2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-pdf
>> 2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-rss
>> 2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-text/plugin.xml
>> 2007-01-26 15:55:53,617 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
>> 2007-01-26 15:55:53,617 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-file
>> 2007-01-26 15:55:53,618 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-ftp
>> 2007-01-26 15:55:53,618 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-http/plugin.xml
>> 2007-01-26 15:55:53,619 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
>> 2007-01-26 15:55:53,620 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-httpclient
>> 2007-01-26 15:55:53,620 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-basic/plugin.xml
>> 2007-01-26 15:55:53,622 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
>> 2007-01-26 15:55:53,622 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-more
>> 2007-01-26 15:55:53,622 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-site/plugin.xml
>> 2007-01-26 15:55:53,624 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
>> 2007-01-26 15:55:53,624 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-url/plugin.xml
>> 2007-01-26 15:55:53,626 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
>> 2007-01-26 15:55:53,626 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/urlfilter-prefix
>> 2007-01-26 15:55:53,626 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/urlfilter-regex/plugin.xml
>> 2007-01-26 15:55:53,628 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
>> 2007-01-26 15:55:53,639 INFO [STDOUT] 070126 155553 10 creating new bean
>> 2007-01-26 15:55:53,640 INFO [STDOUT] 070126 155553 10 opening segment indexes in /srv/opt/nutch-0.7.2/crawl.db/segments
>> 2007-01-26 15:55:53,652 ERROR [org.jboss.web.localhost.Engine] StandardWrapperValve[jsp]: Servlet.service() for servlet jsp threw exception
>> java.lang.ArrayIndexOutOfBoundsException
>>
>>
>>
>> In my browser I got the following error ...
>>
>>
>> HTTP Status 500 -
>>
>> ------------------------------------------------------------------------
>>
>> *type* Exception report
>>
>> *message*
>>
>> *description* _The server encountered an internal error () that
>> prevented it from fulfilling this request._
>>
>> *exception*
>>
>> org.apache.jasper.JasperException
>> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:372)
>> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:292)
>> org.apache.jasper.servlet.JspServlet.service(JspServlet.java:236)
>> javax.servlet.http.HttpServlet.service(HttpServlet.java:810)
>> org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:75)
>>
>> *root cause*
>>
>> java.lang.ArrayIndexOutOfBoundsException
>>
>> *note* _The full stack trace of the root cause is available in the
>> Apache Tomcat/5.0.28 logs._
>>
>> ------------------------------------------------------------------------
>>
>>
>> Apache Tomcat/5.0.28
>>
>>
>>
>> I also tested this search on a newly created index (a small one) but
>> got the same error. I also tried to run Nutch-0.8.1, but the result is still the same.
>> I also couldn't find any information about this error, and now I don't
>> know what to do. Maybe you have an idea...
>>
>> Thanks in advance...
>>
>> Yours sincerely,
>> Erik H.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
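
For reference, the segment check suggested in this thread (every segment folder should contain all five subfolders) can be scripted rather than done by hand. The following is only a minimal sketch: the function name `missing_segment_dirs` and the constant `EXPECTED` are made up for illustration and are not part of Nutch, and the subfolder names are the ones listed in the thread, which may not match the layout written by older versions such as 0.7.x.

```python
import os

# Subfolder names as listed in this thread. This is an assumption:
# older Nutch versions (e.g. 0.7.x) may use a different segment layout.
EXPECTED = ("content", "crawl_generate", "crawl_parse", "parse_data", "parse_text")


def missing_segment_dirs(segments_root, expected=EXPECTED):
    """Return {segment_name: [missing subfolders]} for incomplete segments."""
    report = {}
    for name in sorted(os.listdir(segments_root)):
        segment = os.path.join(segments_root, name)
        if not os.path.isdir(segment):
            continue  # ignore stray files next to the segment folders
        missing = [d for d in expected
                   if not os.path.isdir(os.path.join(segment, d))]
        if missing:
            report[name] = missing
    return report
```

Calling `missing_segment_dirs("/opt/nutch/crawl.db/segments")` with the path from the listing above would return an empty dict if every segment is complete; any segment that shows up in the result is a candidate for re-parsing.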
