Hi there.

I'm from the Carrot2 project (a clustering front-end and components)
and we'd love to add a Nutch adapter to our project (or directly to
Nutch -- this is up to you). I've seen some posts that mentioned
Carrot2 -- glad to hear you want to experiment with it.

Anyway, the adapter is already finished, with the exception of
one thing: when I retrieve hits' summaries using:

((NutchBean)bean).getSummary(details, query)

The result is _already_ HTML-escaped. I'd rather have access to the
hit's content as a string, or to the summary as an unescaped string.
Right now, in the FetchedSegments class, you have:

>   public String getSummary(HitDetails details, Query query)
>     throws IOException {
> 
>     String text = getSegment(details).getText(getDocNo(details));
> 
>     return new Summarizer().getSummary(text, query).toString();
>   }

And toString() on a Summary iterates over its Fragments, appending
them to a StringBuffer. The catch is that Fragment's toString()
method encodes everything into HTML entities:

>   /** A fragment of text within a summary. */
>   public static class Fragment {
>     private String text;
[snip]
>     /** Returns the text of this fragment. */
>     public String getText() { return text; }
>     /** Returns an HTML representation of this fragment. */
>     public String toString() { return Entities.encode(text); }
>   }
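
To illustrate what I mean, here is a minimal, self-contained sketch. It
uses stand-in classes, not Nutch's actual code: the encode() helper is a
simplified substitute for Entities.encode, and toPlainText() is a
hypothetical method name I made up for the accessor I'm missing. It only
shows the difference between joining fragments via toString() and via
getText().

```java
import java.util.ArrayList;
import java.util.List;

public class SummarySketch {

  /** Simplified stand-in for Entities.encode (assumption: handles only & < > "). */
  static String encode(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '&': sb.append("&amp;"); break;
        case '<': sb.append("&lt;"); break;
        case '>': sb.append("&gt;"); break;
        case '"': sb.append("&quot;"); break;
        default:  sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Stand-in for the Fragment class quoted above. */
  static class Fragment {
    private final String text;
    Fragment(String text) { this.text = text; }
    /** Returns the raw text of this fragment. */
    public String getText() { return text; }
    /** Returns an HTML-escaped representation of this fragment. */
    @Override public String toString() { return encode(text); }
  }

  /** Stand-in for Summary: an ordered list of fragments. */
  static class Summary {
    private final List<Fragment> fragments = new ArrayList<>();
    void add(Fragment f) { fragments.add(f); }

    /** Joins the fragments' toString() output, so the result is HTML-escaped. */
    @Override public String toString() {
      StringBuilder sb = new StringBuilder();
      for (Fragment f : fragments) sb.append(f);
      return sb.toString();
    }

    /** Hypothetical accessor: joins the raw fragment text without escaping. */
    public String toPlainText() {
      StringBuilder sb = new StringBuilder();
      for (Fragment f : fragments) sb.append(f.getText());
      return sb.toString();
    }
  }

  public static void main(String[] args) {
    Summary s = new Summary();
    s.add(new Fragment("Tom & Jerry "));
    s.add(new Fragment("<cartoon>"));
    System.out.println(s);               // Tom &amp; Jerry &lt;cartoon&gt;
    System.out.println(s.toPlainText()); // Tom & Jerry <cartoon>
  }
}
```

Something along the lines of toPlainText(), or simply a way to get at the
Summary object instead of its string form, would be enough for the adapter.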

Maybe I'm blind... but how can I access the unescaped summary of a hit?

Dawid

--
Carrot2 Project:
http://www.cs.put.poznan.pl/dweiss/carrot



_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
