[ http://issues.apache.org/jira/browse/NUTCH-288?page=all ]
Stefan Neufeind updated NUTCH-288:
----------------------------------
Attachment: NUTCH-288-OpenSearch-fix.patch
This patch includes Doug's one-line fix to prevent the exception.
It also goes back page by page until it reaches the last result page. The
start value returned in the RSS feed is then the corrected one(!). This makes
it easy to check whether the requested result start and the one received are
identical. If they differ, you are on the last page and were "redirected", so
you know that you don't need to display any further pages in your page
navigation :-)
The patch applies and works fine for me.
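
A minimal sketch of that client-side check, assuming the start values are 0-based hit offsets. The class, method and parameter names here (PageNavigation, pagesToShow, lookAhead) are made up for illustration and are not part of Nutch or of the attached patch:

```java
import java.util.ArrayList;
import java.util.List;

final class PageNavigation {

    /** True when the feed answered with a different start than the one requested. */
    static boolean wasRedirectedToLastPage(int requestedStart, int returnedStart) {
        return returnedStart != requestedStart;
    }

    /**
     * 1-based page numbers to offer in the navigation. If we were redirected,
     * the current page is the last one; otherwise we optimistically offer
     * lookAhead further pages (an assumption of this sketch).
     */
    static List<Integer> pagesToShow(int requestedStart, int returnedStart,
                                     int hitsPerPage, int lookAhead) {
        int currentPage = returnedStart / hitsPerPage + 1;
        int lastPage = wasRedirectedToLastPage(requestedStart, returnedStart)
                ? currentPage
                : currentPage + lookAhead;
        List<Integer> pages = new ArrayList<>();
        for (int p = 1; p <= lastPage; p++) {
            pages.add(p);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Requested page 10 (start=90) but the feed came back with start=20.
        System.out.println(pagesToShow(90, 20, 10, 2)); // prints [1, 2, 3]
    }
}
```

Without the redirect there is no way to tell from a single response how many pages follow, which is why the sketch only looks a fixed number of pages ahead.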
> hitsPerSite-functionality "flawed": problems writing a page-navigation
> ----------------------------------------------------------------------
>
> Key: NUTCH-288
> URL: http://issues.apache.org/jira/browse/NUTCH-288
> Project: Nutch
> Type: Bug
> Components: web gui
> Versions: 0.8-dev
> Reporter: Stefan Neufeind
> Attachments: NUTCH-288-OpenSearch-fix.patch
>
> The per-site deduplication (hitsPerSite = 3) causes problems when trying
> to offer a page navigation (e.g. allowing the user to jump to page 10).
> This is because dedup is done after fetching.
> RSS reports a total of 7763 documents (that is without dedup!), and I set
> it to display 10 items per page. My "naive" approach was to estimate that I
> have 7763/10, rounded up, = 777 pages. But already when moving to page 3 I
> got no more search results (I guess because of dedup). And when moving to
> page 10 I got an exception (see below).
> 2006-05-25 16:24:43 StandardWrapperValve[OpenSearch]: Servlet.service() for servlet OpenSearch threw exception
> java.lang.NegativeArraySizeException
> at org.apache.nutch.searcher.Hits.getHits(Hits.java:65)
> at org.apache.nutch.searcher.OpenSearchServlet.doGet(OpenSearchServlet.java:149)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:214)
> at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:104)
> at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:520)
> at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:198)
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:152)
> at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:104)
> at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:520)
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:137)
> at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:104)
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:118)
> at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:102)
> at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:520)
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:104)
> at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:520)
> at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:929)
> at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:160)
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:799)
> at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:705)
> at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:577)
> at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
> at java.lang.Thread.run(Thread.java:595)
> The only workaround I see for the moment: fetch the RSS without
> deduplication, dedup it myself, and cache the RSS result to improve
> performance. But a cleaner solution would imho be nice. Is there a
> performant way of doing deduplication while knowing for sure how many
> documents are available to view? Presumably this would mean deduplicating
> all search results first ...
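
The workaround described above (dedup everything first, then paginate) could look roughly like the following sketch. It is not Nutch code and not the contents of the attached patch: the class DedupPaging, its methods, and the {site, url} representation of a hit are assumptions made for illustration. The page window is clamped so that a start beyond the last hit yields an empty page rather than a negative length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DedupPaging {

    /** Keeps at most maxPerSite hits per site, in rank order. Each hit is {site, url}. */
    static List<String[]> dedupPerSite(List<String[]> hits, int maxPerSite) {
        Map<String, Integer> seenPerSite = new HashMap<>();
        List<String[]> kept = new ArrayList<>();
        for (String[] hit : hits) {
            int count = seenPerSite.merge(hit[0], 1, Integer::sum);
            if (count <= maxPerSite) {
                kept.add(hit);
            }
        }
        return kept;
    }

    /** Number of pages needed for total hits, rounding up. */
    static int pageCount(int total, int hitsPerPage) {
        return (total + hitsPerPage - 1) / hitsPerPage;
    }

    /** Returns one page; both bounds are clamped to the list size. */
    static <T> List<T> page(List<T> all, int start, int hitsPerPage) {
        int from = Math.max(0, Math.min(start, all.size()));
        int to = Math.min(all.size(), from + hitsPerPage);
        return all.subList(from, to);
    }

    public static void main(String[] args) {
        List<String[]> raw = Arrays.asList(
                new String[] {"a.example", "/1"}, new String[] {"a.example", "/2"},
                new String[] {"a.example", "/3"}, new String[] {"a.example", "/4"},
                new String[] {"b.example", "/1"});
        List<String[]> deduped = dedupPerSite(raw, 3);
        System.out.println(deduped.size());                // prints 4
        System.out.println(pageCount(deduped.size(), 10)); // prints 1
        System.out.println(page(deduped, 90, 10).size());  // prints 0
    }
}
```

The page count is exact only because the whole result list is deduplicated up front, which is the cost the question above is asking about.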
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers