[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12378063 ]

Steven Yelton commented on NUTCH-134:
-------------------------------------

Andrzej, my solution to this problem was to fix the comparator so that it 
actually compares the fragments when numFragments() is the same for both 
excerpts.  It sounds like there are grander plans afoot, but this got me past 
my problem of seeing only one summary fragment when I actually had 3 (they 
were treated as equal, so only the last was in the set).
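
A minimal sketch of that kind of tie-break, not the actual patch. The Excerpt class here is a hypothetical stand-in for the one in Summarizer.java, and the comparator names and the choice to compare the joined fragment text are assumptions made for illustration. It shows why a TreeSet whose comparator looks only at counts keeps a single excerpt out of three, and how comparing the fragments themselves keeps all of them.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class ExcerptTieBreakDemo {
    // Hypothetical stand-in for Summarizer's Excerpt; not the real Nutch class.
    static final class Excerpt {
        final int numUniqueTokens;
        final List<String> fragments;

        Excerpt(int numUniqueTokens, String... fragments) {
            this.numUniqueTokens = numUniqueTokens;
            this.fragments = Arrays.asList(fragments);
        }

        int numFragments() {
            return fragments.size();
        }
    }

    // Compares counts only: distinct excerpts with equal counts compare as 0,
    // so a TreeSet treats them as duplicates.
    static final Comparator<Excerpt> COUNTS_ONLY =
        Comparator.<Excerpt>comparingInt(e -> e.numUniqueTokens)
                  .thenComparingInt(Excerpt::numFragments);

    // Assumed tie-break: when the counts match, compare the fragment text.
    static final Comparator<Excerpt> WITH_FRAGMENTS =
        COUNTS_ONLY.thenComparing(e -> String.join("\u0000", e.fragments));

    // Number of excerpts a TreeSet built with the given comparator retains.
    static int retained(Comparator<Excerpt> cmp, List<Excerpt> excerpts) {
        TreeSet<Excerpt> set = new TreeSet<>(cmp);
        set.addAll(excerpts);
        return set.size();
    }

    // Three different excerpts with identical counts.
    static List<Excerpt> sample() {
        return Arrays.asList(
            new Excerpt(2, "first fragment"),
            new Excerpt(2, "second fragment"),
            new Excerpt(2, "third fragment"));
    }

    public static void main(String[] args) {
        System.out.println(retained(COUNTS_ONLY, sample()));     // prints 1
        System.out.println(retained(WITH_FRAGMENTS, sample()));  // prints 3
    }
}
```

Excerpts whose fragments are textually identical would still collapse into one entry under this comparator, which is probably acceptable for a summary.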

Steven

> Summarizer doesn't select the best snippets
> -------------------------------------------
>
>          Key: NUTCH-134
>          URL: http://issues.apache.org/jira/browse/NUTCH-134
>      Project: Nutch
>         Type: Bug
>   Components: searcher
>     Versions: 0.7.2, 0.7.1, 0.7, 0.8-dev
>     Reporter: Andrzej Bialecki
>
> Summarizer.java tries to select the best fragments from the input text, 
> those in which the frequency of query terms is highest. However, the logic 
> in line 223 is flawed: the excerptSet.add() operation adds a new excerpt 
> only if it is not already present, and that test is performed using the 
> Comparator, which compares only the numUniqueTokens. This means that if two 
> or more excerpts score equally high, only the first of them will be 
> retained, and the rest of the equally-scoring excerpts will be discarded in 
> favor of other (possibly lower-scoring) excerpts.
> To fix this, the Set should be replaced with a List + a sort operation. To 
> keep the relative position of excerpts in the original order, the Excerpt 
> class should be extended with an "int order" field, and the collected 
> excerpts should be sorted in that order prior to adding them to the summary.
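
The List + sort approach proposed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the eventual Nutch change: the Excerpt class is a simplified hypothetical stand-in, and selectBest, its max parameter, and the text field are names invented for the example. Only the "int order" field and the two-step sort come from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ExcerptSelectionSketch {
    // Hypothetical, simplified Excerpt; only "order" is taken from the proposal.
    static final class Excerpt {
        final int order;            // position of the excerpt in the original text
        final int numUniqueTokens;  // score used to rank excerpts
        final String text;

        Excerpt(int order, int numUniqueTokens, String text) {
            this.order = order;
            this.numUniqueTokens = numUniqueTokens;
            this.text = text;
        }
    }

    // Keep the "max" highest-scoring excerpts, then restore document order.
    static List<Excerpt> selectBest(List<Excerpt> candidates, int max) {
        List<Excerpt> ranked = new ArrayList<>(candidates);
        // A List keeps equally-scoring excerpts; the sort is stable, so ties
        // stay in their original relative order.
        ranked.sort(Comparator.comparingInt((Excerpt e) -> e.numUniqueTokens).reversed());

        List<Excerpt> best =
            new ArrayList<>(ranked.subList(0, Math.min(max, ranked.size())));
        // Sort the survivors back into their original positions.
        best.sort(Comparator.comparingInt((Excerpt e) -> e.order));
        return best;
    }

    // Builds a small sample and returns the selected excerpt texts, space-separated.
    static String demo() {
        List<Excerpt> candidates = Arrays.asList(
            new Excerpt(0, 2, "delta"),
            new Excerpt(1, 1, "alpha"),
            new Excerpt(2, 3, "beta"),
            new Excerpt(3, 3, "gamma"));
        StringBuilder sb = new StringBuilder();
        for (Excerpt e : selectBest(candidates, 3)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "delta beta gamma"
    }
}
```

Both equally-scoring excerpts ("beta" and "gamma") survive, and the lower-scoring "alpha" is the one dropped, which is the behavior the Set-based code fails to provide.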

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
