[ http://issues.apache.org/jira/browse/NUTCH-292?page=comments#action_12413778 ]

Stefan Neufeind commented on NUTCH-292:
---------------------------------------

That patch is for the 0.7-branch, right? In 0.8-dev you'd want to do that in 
BasicSummarizer.java. But to me it looks like something similar is already in 
place:

        // Iterate through as long as we're before the end of
        // the document and we haven't hit the max-number-of-items
        // -in-a-summary.
        //
        while ((j < endToken) && (j - startToken < sumLength)) {

But I also suspect it might have something to do with tokens. What I 
experienced is that several search results currently contain arbitrary binary 
data. Those are the cases where a parser plugin has "failed" and parse-text 
was used as a fallback. If I'm right, this could produce very large tokens, 
because a long run of characters contains no whitespace to split on.
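
To illustrate the suspicion, here is a minimal sketch, not the actual Nutch code. It assumes a plain whitespace tokenizer as a stand-in for the real one; the class TokenLengthSketch, its methods, and the MAX_TOKEN_LENGTH cap are all hypothetical names made up for this example. It shows that a limit on the number of tokens does not bound the size of a summary when a single token can be arbitrarily long:

```java
import java.util.ArrayList;
import java.util.List;

public class TokenLengthSketch {

    // Hypothetical cap on token length; not an existing Nutch setting.
    static final int MAX_TOKEN_LENGTH = 64;

    // Split on whitespace only, as a stand-in for a real tokenizer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Truncate every token that is longer than maxLen characters.
    static List<String> capTokens(List<String> tokens, int maxLen) {
        List<String> capped = new ArrayList<>();
        for (String t : tokens) {
            capped.add(t.length() > maxLen ? t.substring(0, maxLen) : t);
        }
        return capped;
    }

    // Total characters in the first sumLength tokens, i.e. what a
    // summary limited only by token count would have to hold.
    static int summaryChars(List<String> tokens, int sumLength) {
        int total = 0;
        for (int j = 0; j < tokens.size() && j < sumLength; j++) {
            total += tokens.get(j).length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Simulate binary-like content: 100000 characters, no whitespace.
        StringBuilder blob = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            blob.append((char) ('!' + (i % 90)));
        }
        List<String> tokens = tokenize("normal words " + blob + " more words");

        System.out.println("tokens: " + tokens.size());
        System.out.println("summary chars, uncapped: " + summaryChars(tokens, 20));
        System.out.println("summary chars, capped: "
                + summaryChars(capTokens(tokens, MAX_TOKEN_LENGTH), 20));
    }
}
```

Whether truncating, skipping, or rejecting such tokens is the right fix is a separate question; the sketch only shows why the token-count check alone might not help.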

@Marcel: Thank you for the fix anyway ... your help is very much appreciated.

> OpenSearchServlet: OutOfMemoryError: Java heap space
> ----------------------------------------------------
>
>          Key: NUTCH-292
>          URL: http://issues.apache.org/jira/browse/NUTCH-292
>      Project: Nutch
>         Type: Bug

>   Components: web gui
>     Versions: 0.8-dev
>     Reporter: Stefan Neufeind
>     Priority: Critical
>  Attachments: summarizer.diff
>
> java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
>       org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:203)
>       org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:329)
>       org.apache.nutch.searcher.OpenSearchServlet.doGet(OpenSearchServlet.java:155)
>       javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
>       javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
> The URL I use is:
> [...]something[...]/opensearch?query=mysearch&start=0&hitsPerSite=3&hitsPerPage=20&sort=url
> It seems to be a problem specific to the data I'm working with. Moving the 
> start from 0 to 10 or changing the query works fine.
> Or maybe it doesn't have to do with sorting but it's just that I hit one "bad 
> search-result" that has a broken summary?
> !! The problem is repeatable. So if anybody has an idea where to search / 
> what to fix, I can easily try that out !!
