In the next couple of weeks, depending on the holidays, I think.

Dennis

Elena wrote:
Thank you for your response. I was puzzled that summaries were only being
generated for certain URLs.

Is there any date set for the 1.0 release?

Elena


2008/11/25 Dennis Kubes <[EMAIL PROTECTED]>


Elena wrote:

Hello everyone,

I am using Nutch with the Solr plugin, and I am having a problem indexing
redirected URLs. Solr fills in its fields just fine, as if they belonged to
the redirected URL, but Nutch leaves the summary field empty. It seems as if
Nutch tries to generate the summary from the original URL and then queries
Solr, which follows the redirect and fills in the rest of the fields using
the final URL. But I am not quite sure of this.

It depends on which version of Nutch you are using. This was a problem with
some older trunk versions. The issue is that Nutch has the concept of a
representative URL for redirects. A redirect has an original URL and a
redirected-to URL, and logic decides which of the two is stored as the URL
and which is displayed on search results pages. Most of the problems caused
by this mismatch have been fixed in recent patches and should ship in the
new 1.0 release in the next week or so.
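To make the redirect mismatch concrete, here is a small Java model. It is a
hypothetical sketch, not Nutch code: the class RedirectSummaryModel, its
methods, and the rule in chooseRepresentative (temporary redirects display
the original URL, permanent ones the target) are simplifying assumptions.
The page text is keyed by the URL it was actually fetched from, so a lookup
keyed by the displayed URL comes back empty whenever the two differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, NOT Nutch's actual classes: it only illustrates how a
// mismatch between the stored URL and the displayed URL yields an empty summary.
final class RedirectSummaryModel {

    // Stand-in for ParseText in a segment: full page text keyed by the URL
    // the content was fetched from (the redirect target).
    private final Map<String, String> parseText = new HashMap<>();

    // Per document: {storageUrl, displayUrl}.
    private final Map<String, String[]> docs = new HashMap<>();

    // Simplified, hypothetical rule: temporary redirects keep the original URL
    // for display, permanent redirects show the target.
    static String chooseRepresentative(String original, String target, boolean temporary) {
        return temporary ? original : target;
    }

    void addRedirect(String docId, String original, String target,
                     boolean temporary, String fullText) {
        parseText.put(target, fullText);
        String display = chooseRepresentative(original, target, temporary);
        docs.put(docId, new String[] {target, display});
    }

    // Inconsistent lookup: uses the displayed URL as the ParseText key.
    String textForSummaryBuggy(String docId) {
        return parseText.getOrDefault(docs.get(docId)[1], "");
    }

    // Consistent lookup: uses the URL the text was stored under.
    String textForSummaryFixed(String docId) {
        return parseText.getOrDefault(docs.get(docId)[0], "");
    }
}
```

With a temporary redirect the displayed URL is the original one, so the
inconsistent lookup finds no text and the summary is empty; with a permanent
redirect both keys coincide and the text is found either way.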


I would like to know how Nutch generates summaries and why it leaves them
empty for redirects. Perhaps there is a command to generate one particular
field after indexing is done.

Summaries are generated at query time from the full text of the web page,
which is stored in ParseText under the segments. The
org.apache.nutch.searcher.Summarizer plugins are what actually return the
summary text. By default Nutch uses the summary-basic plugin.
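As an illustration of query-time summarizing, the following Java sketch
builds excerpts around query-term hits in a page's full text. It is
hypothetical and does not use Nutch classes: ExcerptSummarizerSketch, its
window parameter, the "..." separators and the fall-back to the start of the
page are assumptions, not the behaviour of summary-basic. Note that empty
input text yields an empty summary, which is the symptom seen when the text
for a URL cannot be found.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, NOT the summary-basic plugin: a query-time excerpt
// builder over a page's full text, in the spirit of what a Summarizer does.
final class ExcerptSummarizerSketch {

    // text:   full page text (in Nutch this would come from ParseText)
    // window: number of words to keep on each side of a hit
    // terms:  query terms, matched case-insensitively
    static String summarize(String text, int window, String... terms) {
        String trimmed = text.trim();
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        Set<String> wanted = new HashSet<>();
        for (String t : terms) {
            wanted.add(t.toLowerCase(Locale.ROOT));
        }

        // Collect [start, end) word ranges around every hit, merging overlaps.
        List<int[]> ranges = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            // Strip punctuation so "time." matches "time".
            String norm = words[i].replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase(Locale.ROOT);
            if (!wanted.contains(norm)) {
                continue;
            }
            int start = Math.max(0, i - window);
            int end = Math.min(words.length, i + window + 1);
            int[] last = ranges.isEmpty() ? null : ranges.get(ranges.size() - 1);
            if (last != null && start <= last[1]) {
                last[1] = Math.max(last[1], end); // overlapping or adjacent: extend
            } else {
                ranges.add(new int[] {start, end});
            }
        }

        // No hits: fall back to the beginning of the text.
        if (ranges.isEmpty()) {
            ranges.add(new int[] {0, Math.min(words.length, 2 * window + 1)});
        }

        StringBuilder sb = new StringBuilder();
        for (int[] r : ranges) {
            if (sb.length() > 0) {
                sb.append(" ... ");   // gap between two excerpts
            } else if (r[0] > 0) {
                sb.append("... ");    // first excerpt starts mid-text
            }
            sb.append(String.join(" ", Arrays.copyOfRange(words, r[0], r[1])));
        }
        if (ranges.get(ranges.size() - 1)[1] < words.length) {
            sb.append(" ...");        // last excerpt ends before the text does
        }
        return sb.toString();
    }
}
```

For example, summarize(text, 2, "parsetext") on the sentence "Nutch stores
the full text of each page in ParseText under segments and builds summaries
at query time." returns "... page in ParseText under segments ...".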

Dennis

Thanks!

