Alex,

See http://wiki.apache.org/solr/FieldCollapsing for how to implement this in
SOLR. Since indexing and searching are delegated to SOLR as of Nutch 1.3
and 2.0, this won't be implemented directly in Nutch.
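Field collapsing in SOLR groups hits by a field value and returns only a few hits per group, so one domain can no longer fill the whole first page. Below is a minimal Python sketch of such a request. It assumes a SOLR instance at a hypothetical `http://localhost:8983/solr/select` and that each document stores its domain in a field such as `site` (check your own schema). The `group.*` parameters are the ones described on the wiki page; `collapsed_query_url` and `collapse` are illustrative helpers, not part of Nutch or SOLR.

```python
from urllib.parse import urlencode

# Hypothetical SOLR endpoint; adjust host, port and core to your deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def collapsed_query_url(query, field="site", per_group=1, rows=10):
    """Build a SOLR select URL that groups hits by `field`.

    Assumes `field` holds the domain of each page (e.g. a `site` field).
    """
    params = {
        "q": query,
        "rows": rows,              # results per page
        "group": "true",           # enable result grouping (field collapsing)
        "group.field": field,      # one group per distinct domain
        "group.limit": per_group,  # max hits shown per domain
        "group.main": "true",      # return groups as a plain flat result list
    }
    return SOLR_SELECT + "?" + urlencode(params)


def collapse(hits, field="site", per_group=1):
    """Client-side illustration of the same idea on an already-ranked list:
    keep at most `per_group` hits per value of `field`, preserving order."""
    counts = {}
    kept = []
    for hit in hits:
        key = hit[field]
        if counts.get(key, 0) < per_group:
            kept.append(hit)
            counts[key] = counts.get(key, 0) + 1
    return kept


if __name__ == "__main__":
    print(collapsed_query_url("mydomain", per_group=1))
```

With `group.limit=1`, a domain with 200 matching sub-pages contributes a single hit, so results from other domains appear on the first page, which is roughly the behaviour described for Google below.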

HTH

Julien


On 4 January 2011 00:10, <[email protected]> wrote:

> Hello,
>
> I used nutch-1.2 to index a few domains. I noticed that nutch correctly
> crawled all sub-pages of the domains. By sub-pages I mean the following: for
> a domain mydomain.com, all links inside it, such as
> mydomain.com/show/photos/1, etc. I also noticed in our apache logs that
> google-bot also crawled all sub-pages.
> However, when searching for mydomain.com, google shows mydomain.com on the
> first page and almost no sub-pages, but nutch shows all sub-pages. If a domain
> has, let's say, 200 sub-pages and we display 10 results per page, it would
> take us 20 pages to get to results from other domains. By contrast, google
> displays results from other domains in second place.
>
> Is there a way of fixing this issue?
>
> Thanks in advance.
> Alex.


-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
