Ok,

I could not find any crawl_generate or crawl_parse folder. I also couldn't find Catalina.out anywhere on my system.

One thing I don't understand: Nutch is supposed to create my folder structure itself. If something is wrong with it, such as the missing folders or the 'db' folder that should normally be 'linkdb', how can I fix it? I didn't change the structure myself, so Nutch must have created it this way. Any idea how this could happen?

Thanks for your time ;)

--Erik

Gal Nitzan wrote:

Well, I guess that db is the linkdb for version 0.7.

Anyway, there is not much info here; maybe you can find more in
Catalina.out ...

One more thing to look for, which just might be the reason (a long shot): check
each of your segment folders and verify that it contains all 5 folders,
i.e. content, crawl_generate, crawl_parse, parse_data, parse_text
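
The per-segment check described here could be scripted roughly as follows. This is only a sketch: the sub-folder names are taken from the list in this message and may differ between Nutch versions, and the function name `missing_segment_dirs` is made up for illustration.

```python
from pathlib import Path

# Sub-folder names as listed in the message; other Nutch versions may use a
# different segment layout, so treat this tuple as an assumption to adjust.
EXPECTED = ("content", "crawl_generate", "crawl_parse", "parse_data", "parse_text")


def missing_segment_dirs(segments_root, expected=EXPECTED):
    """Map each segment folder name to the expected sub-folders it lacks."""
    report = {}
    for segment in sorted(Path(segments_root).iterdir()):
        if not segment.is_dir():
            continue  # ignore stray files next to the segment folders
        missing = [name for name in expected if not (segment / name).is_dir()]
        if missing:
            report[segment.name] = missing
    return report
```

Calling `missing_segment_dirs("/opt/nutch/crawl.db/segments")` would return a dictionary with one entry per incomplete segment; an empty dictionary means every segment has all the listed sub-folders.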

HTH

Gal.

-----Original Message-----
From: Erik Höschler [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 26, 2007 5:58 PM
To: [email protected]
Subject: Re: Problems Searching an Index with Nutch

Hi,

I checked my folder structure and everything seems to be correct...

:/opt/nutch/crawl.db# l
total 8
drwxr-xr-x   3 root root   53 2007-01-19 14:11 db
drwxr-xr-x   2 root root 4096 2007-01-19 14:18 index
drwxr-xr-x  12 root root 4096 2007-01-26 15:06 segments

I'm not sure I've ever had a linkdb folder. Or did you mean the db folder listed above?

Regards,
Erik

Gal Nitzan wrote:
Hi,

I'm not sure, but it seems to me you are missing the linkdb and segments
folders. They should be located on the same level as the index folder.

HTH

Gal

-----Original Message-----
From: Erik Höschler [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 26, 2007 5:04 PM
To: [email protected]
Cc: Erik
Subject: Problems Searching an Index with Nutch

Hi,

I'm running Nutch-0.7.2. I created an index for my local LAN, which consists of 45,000 pages. I can inspect this index with Luke and everything looks fine. When I try to run a search query with Nutch, I see the following exception in my JBoss log file (at the end of the log).


//Here I'm redeploying the Nutch.war archive....
2007-01-26 15:55:06,611 INFO [org.jboss.web.tomcat.tc5.TomcatDeployer] deploy, ctxPath=/nutch, warUrl=file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/
2007-01-26 15:55:06,831 DEBUG [tomcat.localhost./nutch.Context] Starting tomcat.localhost./nutch.Context
2007-01-26 15:55:06,832 DEBUG [tomcat.localhost./nutch.Context] Configuring default Resources
2007-01-26 15:55:06,836 DEBUG [tomcat.localhost./nutch.Context] Processing standard container startup
2007-01-26 15:55:06,844 DEBUG [tomcat.localhost./nutch.Context] Setting deployment descriptor public ID to '-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN'
2007-01-26 15:55:06,862 DEBUG [tomcat.localhost./nutch.Context] Setting deployment descriptor public ID to '-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN'
2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Posting standard context attributes
2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Configuring application event listeners
2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Sending application start events
2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Starting filters
2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Starting filter 'CommonHeadersFilter'
2007-01-26 15:55:06,867 DEBUG [tomcat.localhost./nutch.Context] Starting completed
//Archive successfully loaded...?!?!
2007-01-26 15:55:06,867 DEBUG [tomcat.localhost./nutch.Context] Checking for jboss.web:j2eeType=WebModule,name=//localhost/nutch,J2EEApplication=none,J2EEServer=none


//Here I started a query in my web browser...
2007-01-26 15:55:53,585 INFO [STDOUT] 070126 155553 parsing file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/nutch-default.xml
2007-01-26 15:55:53,591 INFO [STDOUT] 070126 155553 parsing file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/nutch-site.xml
2007-01-26 15:55:53,599 INFO [STDOUT] 070126 155553 Plugins: looking in: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins
2007-01-26 15:55:53,600 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/clustering-carrot2
2007-01-26 15:55:53,600 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/creativecommons
2007-01-26 15:55:53,600 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/index-basic/plugin.xml
2007-01-26 15:55:53,607 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
2007-01-26 15:55:53,609 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/index-more
2007-01-26 15:55:53,609 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/language-identifier
2007-01-26 15:55:53,609 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/nutch-extensionpoints/plugin.xml
2007-01-26 15:55:53,612 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/ontology
2007-01-26 15:55:53,612 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-ext
2007-01-26 15:55:53,613 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-html/plugin.xml
2007-01-26 15:55:53,614 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-js
2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-msword
2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-pdf
2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-rss
2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-text/plugin.xml
2007-01-26 15:55:53,617 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
2007-01-26 15:55:53,617 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-file
2007-01-26 15:55:53,618 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-ftp
2007-01-26 15:55:53,618 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-http/plugin.xml
2007-01-26 15:55:53,619 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
2007-01-26 15:55:53,620 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-httpclient
2007-01-26 15:55:53,620 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-basic/plugin.xml
2007-01-26 15:55:53,622 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
2007-01-26 15:55:53,622 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-more
2007-01-26 15:55:53,622 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-site/plugin.xml
2007-01-26 15:55:53,624 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
2007-01-26 15:55:53,624 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-url/plugin.xml
2007-01-26 15:55:53,626 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
2007-01-26 15:55:53,626 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/urlfilter-prefix
2007-01-26 15:55:53,626 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/urlfilter-regex/plugin.xml
2007-01-26 15:55:53,628 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
2007-01-26 15:55:53,639 INFO [STDOUT] 070126 155553 10 creating new bean
2007-01-26 15:55:53,640 INFO [STDOUT] 070126 155553 10 opening segment indexes in /srv/opt/nutch-0.7.2/crawl.db/segments
2007-01-26 15:55:53,652 ERROR [org.jboss.web.localhost.Engine] StandardWrapperValve[jsp]: Servlet.service() for servlet jsp threw exception
java.lang.ArrayIndexOutOfBoundsException



In my browser I got the following error:


  HTTP Status 500 -

------------------------------------------------------------------------

*type* Exception report

*message*

*description* _The server encountered an internal error () that prevented it from fulfilling this request._

*exception*

org.apache.jasper.JasperException
        org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:372)
        org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:292)
        org.apache.jasper.servlet.JspServlet.service(JspServlet.java:236)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:810)
        org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:75)

*root cause*

java.lang.ArrayIndexOutOfBoundsException

*note* _The full stack trace of the root cause is available in the Apache Tomcat/5.0.28 logs._

------------------------------------------------------------------------


      Apache Tomcat/5.0.28



I also tested the search on a newly created (small) index but got the same error. I also tried running Nutch-0.8.1, with the same result. I couldn't find any information about this error, and now I don't know what to do. Maybe you have an idea...

Thanks in advance...

Yours sincerely,
Erik H.





