Hi there.
I'm from the Carrot2 project (a clustering front-end and components)
and we'd love to add a Nutch adapter to our project (or directly to
Nutch -- this is up to you). I've seen some posts that mentioned
Carrot2 -- glad to hear you want to experiment with it.
Anyway, the adapter is actually already finished, with the exception of
one thing: when I retrieve hits' summaries using:
((NutchBean)bean).getSummary(details, query)
The result is _already_ HTML-escaped. I'd rather have access to the
hit's content as a plain string, or to the summary as a plain string.
Right now, in the FetchedSegments class, you have:
> public String getSummary(HitDetails details, Query query)
>     throws IOException {
>
>   String text = getSegment(details).getText(getDocNo(details));
>
>   return new Summarizer().getSummary(text, query).toString();
> }
And toString() on a Summary iterates over its Fragments, appending them
to a StringBuffer... but Fragment's toString() method encodes
everything into HTML entities:
> /** A fragment of text within a summary. */
> public static class Fragment {
>   private String text;
[snip]
>   /** Returns the text of this fragment. */
>   public String getText() { return text; }
>   /** Returns an HTML representation of this fragment. */
>   public String toString() { return Entities.encode(text); }
> }
Maybe I'm blind... but how can I access the unescaped summary of a hit?
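To make the request concrete, here is a minimal, self-contained sketch of
the kind of accessor I have in mind. Everything in it is an assumption for
illustration only: the Fragment class is a stand-in that mirrors the
snippet quoted above (it is not the real Nutch class), escape() is a
simplified substitute for Entities.encode, and plainText()/htmlText() are
hypothetical names, not existing Nutch methods.

```java
import java.util.Arrays;
import java.util.List;

public class PlainSummarySketch {

    /** Stand-in for the quoted Fragment class; NOT the real Nutch class. */
    public static class Fragment {
        private final String text;

        public Fragment(String text) { this.text = text; }

        /** Returns the raw, unescaped text of this fragment. */
        public String getText() { return text; }

        /** Returns an HTML-escaped representation of this fragment. */
        @Override
        public String toString() { return escape(text); }
    }

    /** Simplified escaping, assumed here only to mimic Entities.encode. */
    public static String escape(String s) {
        // '&' must be replaced first so later entities are not re-escaped.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Hypothetical accessor: joins fragments via getText(), no escaping. */
    public static String plainText(List<Fragment> fragments) {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            sb.append(f.getText());
        }
        return sb.toString();
    }

    /** Joins fragments via toString(), i.e. the HTML-escaped form. */
    public static String htmlText(List<Fragment> fragments) {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            sb.append(f.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Fragment> fragments = Arrays.asList(
                new Fragment("AT&T "), new Fragment("<b>news</b>"));
        System.out.println(plainText(fragments)); // AT&T <b>news</b>
        System.out.println(htmlText(fragments));  // AT&amp;T &lt;b&gt;news&lt;/b&gt;
    }
}
```

The clustering components would consume the plainText() form, while the
web front-end could keep using the escaped one.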
Dawid
--
Carrot2 Project:
http://www.cs.put.poznan.pl/dweiss/carrot
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers