Re: Nutch recrawl script for 0.9 doesn't work with trunk. Help

Alexis Votta Thu, 20 Sep 2007 04:33:56 -0700

Hi Tomislav and Nutch users

I could not solve the problem with your instructions.


I crawled twice. The re-crawl generated crawl/NEWindexes;
crawl/indexes was generated by the first crawl.

I merged them with:

bin/nutch merge crawl/index crawl/indexes/ crawl/NEWindexes/

Now search.jsp shows the following error:
type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

org.apache.jasper.JasperException: java.lang.RuntimeException: java.lang.NullPointerException
        org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:532)
        org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:426)
        org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
        org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:803)

root cause

java.lang.RuntimeException: java.lang.NullPointerException
        org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:204)
        org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:342)
        org.apache.jsp.search_jsp._jspService(search_jsp.java:247)
        org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:384)
        org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
        org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:803)

root cause

java.lang.NullPointerException
        org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:159)
        org.apache.nutch.searcher.FetchedSegments$SummaryThread.run(FetchedSegments.java:177)

Is there any Crawl guru who can help?

On 9/20/07, Tomislav Poljak <[EMAIL PROTECTED]> wrote:
> Hi,
> I had the same problem using the re-crawl scripts from the wiki. They all
> work fine with Nutch versions up to and including 0.9, but with
> nutch-1.0-dev (from trunk) they break at the merge of indexes. The reason
> is that the merge in nutch-0.9 (from the re-crawl scripts):
>
> bin/nutch merge crawl/indexes crawl/NEWindexes
>
> merged the old indexes from crawl/indexes with the new indexes from
> crawl/NEWindexes and stored the result in crawl/indexes. But with
> nutch-1.0-dev (from trunk), merge requires an empty (new) output folder.
>
> A solution that works (I have tried it) is the following:
>
> bin/nutch merge crawl/index crawl/indexes crawl/NEWindexes
>
> where crawl/index is the new (output) folder, crawl/indexes holds the old
> indexes, and crawl/NEWindexes holds the new indexes. Note that you can do
> this with as many indexes as you want to merge (as many re-crawls as you
> have done); you only have to run:
>
> bin/nutch merge crawl/index crawl/indexes1 crawl/indexes2 ...
>
> but crawl/index must not exist beforehand (delete it or back it up).
>
> The Nutch search web application will use the merged index from
> crawl/index; this is from my web application log:
>
> 2007-09-09 20:30:58,949 INFO  searcher.NutchBean - creating new bean
> 2007-09-09 20:30:59,128 INFO  searcher.NutchBean - opening merged index in /home/nutch/test/trunk/crawl/index
>
>
> Hope this will help,
>
> Tomislav
>
>
>
> On Thu, 2007-09-20 at 14:54 +0800, Lyndon Maydwell wrote:
> > /nutch mergesegs $merged_segment -dir $segments
> > if [ $? -ne 0 ]
>
>
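
The merge step described in the quoted reply could be wrapped roughly as
follows. This is only a sketch under assumptions: the function name
merge_indexes, the NUTCH variable and the ".bak" backup suffix are made up
for illustration and are not part of Nutch or of the wiki re-crawl scripts.
Only the "bin/nutch merge <output> <inputs...>" invocation is taken from
the thread.

```shell
#!/bin/sh
# NUTCH is an assumed variable naming the nutch launcher script.
NUTCH="${NUTCH:-bin/nutch}"

# merge_indexes OUTPUT INPUT...
# Moves an existing OUTPUT aside, then runs "nutch merge OUTPUT INPUT...".
merge_indexes() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_indexes OUTPUT INPUT..." >&2
        return 2
    fi
    out="$1"
    shift
    # Per the thread, the trunk merge needs an output folder that does not
    # exist yet, so keep the previous one as a backup instead of reusing it.
    if [ -e "$out" ]; then
        rm -rf "$out.bak" || return 1
        mv "$out" "$out.bak" || return 1
    fi
    "$NUTCH" merge "$out" "$@"
}
```

Called as "merge_indexes crawl/index crawl/indexes crawl/NEWindexes", it
moves a pre-existing crawl/index to crawl/index.bak before merging, which
follows the "delete it or back it up" advice above.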
