[ http://issues.apache.org/jira/browse/NUTCH-292?page=comments#action_12413982 ]
Marcel Schnippe commented on NUTCH-292:
---------------------------------------
Hi Stefan,
Thanks for trying out the patch. Yes, you were right: it was for 0.7. I should
definitely switch, but I have made so many custom changes.
The proper place to apply it would be in summary-basic's getTokens(), like
this:
  private Token[] getTokens(String text) {
    ArrayList result = new ArrayList();
    TokenStream ts = analyzer.tokenStream("content", new StringReader(text));
    Token token = null;
-   while (true) {
+   // Stop once token_deep tokens have been collected instead of
+   // tokenizing the whole document.
+   while (result.size() < token_deep) {
      try {
        token = ts.next();
      } catch (IOException e) {
        token = null;
      }
      if (token == null) { break; }
      result.add(token);
    }
    try {
      ts.close();
    } catch (IOException e) {
      // ignore
    }
    return (Token[]) result.toArray(new Token[result.size()]);
  }
<humor>Beware of bugs in the above code; I have only proved it correct, not
tried it. (D. Knuth)</humor>
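For illustration, here is a minimal, self-contained sketch of the same idea outside Nutch. It assumes a plain whitespace tokenizer (java.util.StringTokenizer) in place of Lucene's TokenStream, and a hypothetical maxTokens parameter standing in for the token_deep setting from the patch above; it is not the summary-basic code itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class BoundedTokens {

    /**
     * Collects at most maxTokens whitespace-separated tokens from text.
     * Hypothetical stand-in for getTokens(): StringTokenizer replaces
     * Lucene's TokenStream, and maxTokens plays the role of token_deep.
     */
    static String[] getTokens(String text, int maxTokens) {
        List<String> result = new ArrayList<>();
        StringTokenizer ts = new StringTokenizer(text);
        // Stop as soon as the cap is reached, so a huge document is never
        // fully materialized as a token array.
        while (result.size() < maxTokens && ts.hasMoreTokens()) {
            result.add(ts.nextToken());
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = getTokens("one two three four five six", 4);
        System.out.println(tokens.length);              // 4
        System.out.println(String.join(",", tokens));   // one,two,three,four
    }
}
```

Capping the loop bounds the per-summary token array to maxTokens entries regardless of document size. The trade-off is that summary fragments can only come from the first maxTokens tokens of a page.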
> OpenSearchServlet: OutOfMemoryError: Java heap space
> ----------------------------------------------------
>
> Key: NUTCH-292
> URL: http://issues.apache.org/jira/browse/NUTCH-292
> Project: Nutch
> Type: Bug
> Components: web gui
> Versions: 0.8-dev
> Reporter: Stefan Neufeind
> Priority: Critical
> Attachments: summarizer.diff
>
> java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
>     org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:203)
>     org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:329)
>     org.apache.nutch.searcher.OpenSearchServlet.doGet(OpenSearchServlet.java:155)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
> The URL I use is:
> [...]something[...]/opensearch?query=mysearch&start=0&hitsPerSite=3&hitsPerPage=20&sort=url
> It seems to be a problem specific to the data I'm working with. Moving the
> start from 0 to 10 or changing the query works fine.
> Or maybe it has nothing to do with sorting, and I have just hit one "bad
> search result" that has a broken summary?
> !! The problem is repeatable. So if anybody has an idea where to search /
> what to fix, I can easily try that out !!
--
This message is automatically generated by JIRA.
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers