Re: strategy for post-processing answer set

Fred Zimmerman Sat, 24 Sep 2011 08:23:45 -0700

OK. This is a very basic question, so please bear with me.

I see where the velocity templates are and I have looked at the
documentation and get the idea of how to write them.


It looks to me as if Solr just brings back the URLs. What I want to do is
get the actual documents in the answer set, simplify their HTML by removing
all the JavaScript, ads, etc., and append them into a single document.

Now ... does Nutch already have the documents? Can I get them from its db?
Or do I have to fetch the documents again with something like wget?
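
To make the "simplify and append" step concrete, here is a minimal sketch in
Python using only the standard library. It assumes the raw HTML of each hit is
already in hand as a string; whether that comes from Nutch's stored content or
from a fresh fetch is exactly the open question above and is not addressed
here. The names BodyTextExtractor, extract_body_text and combine_documents are
hypothetical, and the sketch only drops script/style/noscript content; it makes
no attempt to detect ads, and it ignores pages that have no <body> tag.

```python
from html import escape
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects visible text inside <body>, skipping script/style content."""

    SKIP = {"script", "style", "noscript"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.skip_depth = 0
        self.paragraphs = []  # finished paragraphs
        self.current = []     # text fragments of the paragraph in progress

    def _flush(self):
        # Join the fragments and collapse runs of whitespace.
        text = " ".join("".join(self.current).split())
        if text:
            self.paragraphs.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self._flush()

    def handle_endtag(self, tag):
        if tag == "body":
            self._flush()
            self.in_body = False
        elif tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self._flush()

    def handle_data(self, data):
        if self.in_body and self.skip_depth == 0:
            self.current.append(data)


def extract_body_text(html):
    """Return the body text of one HTML document as a list of paragraphs."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # pick up any text left over if </body> was missing
    return parser.paragraphs


def combine_documents(docs):
    """docs is a list of (url, html) pairs; returns one simplified HTML page."""
    parts = ["<html><body>"]
    for url, html in docs:
        parts.append("<h2>%s</h2>" % escape(url))
        for para in extract_body_text(html):
            parts.append("<p>%s</p>" % escape(para))
    parts.append("</body></html>")
    return "\n".join(parts)
```

Calling combine_documents with the (url, html) pairs for an answer set would
give a single page with one heading per source URL followed by that page's
text.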

Fred

On Fri, Sep 23, 2011 at 16:02, Erik Hatcher <erik.hatc...@gmail.com> wrote:

> conf/velocity by default.  See Solr's example configuration.
>
>   Erik
>
> On Sep 23, 2011, at 12:37, Fred Zimmerman <w...@nimblebooks.com> wrote:
>
> > OK, answered my own question: found the velocity response writer in
> > solrconfig.xml. Next question:
> >
> > where does velocity look for its templates?
> >
> >
> > On Fri, Sep 23, 2011 at 11:57, Fred Zimmerman <w...@nimblebooks.com> wrote:
> >
> >> This seems to be out of date. I am running Solr 3.4.
> >>
> >> * the file structure of apachehome/contrib is different and I don't see
> >> velocity anywhere underneath
> >> * the page referenced below only talks about Solr 1.4 and 4.0
> >>
> >> ?
> >>
> >> On Thu, Sep 22, 2011 at 19:51, Markus Jelsma <markus.jel...@openindex.io> wrote:
> >>
> >>> Hi,
> >>>
> >>> Solr supports the Velocity template engine, and the support is very good.
> >>> It is ideal for generating properly formatted output from the search
> >>> engine. There's a clustering example, and it's easy to format documents
> >>> indexed by Nutch.
> >>>
> >>> http://wiki.apache.org/solr/VelocityResponseWriter
> >>>
> >>> Cheers
> >>>
> >>>> Hi,
> >>>>
> >>>> I would like to take the HTML documents that are the result of a Solr
> >>>> search and combine them into a single HTML document that combines the
> >>>> body text of each individual document.  What is a good strategy for
> >>>> this? I am crawling with Nutch and using Carrot2 for clustering.
> >>>> Fred
> >>>
> >>
> >>
>
