[ 
https://issues.apache.org/jira/browse/LUCENE-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-590.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0
                   3.1

Committed revision 1031467, 1031468 (3x)
Thanks Curtis!

> Demo HTML parser gives incorrect summaries when title is repeated as a heading
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-590
>                 URL: https://issues.apache.org/jira/browse/LUCENE-590
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 2.0.0
>            Reporter: Curtis d'Entremont
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-590.patch
>
>
> If an HTML document repeats its title as a heading at the top of the 
> document, the HTMLParser returns only the title as the summary and discards 
> everything else that was added to the summary. It should do the opposite: 
> keep the rest of the summary and chop the title off the beginning. I don't 
> see any benefit to repeating the title in the summary in any case.
> In HTMLParser.jj's getSummary():
>     String sum = summary.toString().trim();
>     String tit = getTitle();
>     if (sum.startsWith(tit) || sum.equals(""))
>       return tit;
>     else
>       return sum;
> Change it to the following (* denotes a line that has changed):
>     String sum = summary.toString().trim();
>     String tit = getTitle();
> *    if (sum.startsWith(tit))             // don't repeat title in summary
> *      return sum.substring(tit.length()).trim();
>     else
>       return sum;
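
The proposed change can be tried outside the parser with a minimal, self-contained sketch. The class name SummaryDemo and the two-argument getSummary(rawSummary, title) are hypothetical stand-ins for illustration only: the real method lives in HTMLParser.jj and reads the parser's own state, and this sketch follows the snippet proposed in the issue, not necessarily the code that was committed.

```java
public class SummaryDemo {

    // Hypothetical stand-in for HTMLParser.getSummary(): the raw summary
    // and the title are passed in instead of being read from parser state.
    static String getSummary(String rawSummary, String title) {
        String sum = rawSummary.trim();
        String tit = title;
        if (sum.startsWith(tit)) {
            // Don't repeat the title in the summary: drop the leading
            // title and any whitespace that followed it.
            return sum.substring(tit.length()).trim();
        } else {
            return sum;
        }
    }

    public static void main(String[] args) {
        // The title is repeated as the first heading of the page body.
        System.out.println(
            getSummary("Welcome Page This page describes the demo.", "Welcome Page"));
        // The summary does not start with the title, so it is returned as is.
        System.out.println(
            getSummary("This page describes the demo.", "Welcome Page"));
    }
}
```

One consequence of this version is worth noting: when the summary is empty, or consists of nothing but the title, the method returns an empty string, whereas the original code fell back to returning the title.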

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
