What's wrong with using the scoring-depth plugin?

On 13 December 2013 09:33, Markus Jelsma <[email protected]> wrote:

> Although there is no real notion of depth, as you already figured out, you
> can keep track of it via a scoring filter.
>
> http://grokbase.com/t/nutch/user/1092p10q5g/depth-information-not-being-available-in-crawl-datum
>
> -----Original message-----
> > From:Nguyen Manh Tien <[email protected]>
> > Sent: Friday 13th December 2013 5:30
> > To: [email protected]
> > Subject: Effective way to crawling seed and discover new urls.
> >
> > Hi,
> >
> > I am crawling a list of home pages to discover new articles; the crawler
> > stops at depth 1. But at depth 1, the crawler still adds many new URLs
> > with depth 2, so even though I only crawl up to depth 1, the crawldb still
> > has many, many URLs at depth 2. Is there any way to prevent that, or do we
> > need to implement a custom plugin?
> >
> > And I only want to index the articles discovered at depth 1, not the seeds.
> > Do we have a feature to do that?
> >
> > Thanks.
> > Tien
> >
>
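
The depth-tracking idea discussed in this thread can be illustrated with a minimal Java sketch. It is a standalone model, not the Nutch `ScoringFilter` API and not the code of the scoring-depth plugin. The class `DepthSketch`, the `Page` record, the methods `childDepth`, `keepOutlinks` and `shouldIndex`, and the convention that seeds sit at depth 0 are all hypothetical. It shows the two behaviours Tien asks about: outlinks are only kept while their depth stays within the limit, so no depth-2 URLs would reach the crawldb, and seeds are excluded from indexing.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of depth tracking; NOT the Nutch ScoringFilter API.
 * Assumption: seeds start at depth 0 and each outlink gets parent depth + 1.
 */
public class DepthSketch {

    /** A URL together with its (hypothetical) distance from the seed. */
    record Page(String url, int depth) {}

    /** Depth assigned to an outlink found on a page at parentDepth. */
    static int childDepth(int parentDepth) {
        return parentDepth + 1;
    }

    /**
     * Returns the outlinks to keep. If the outlinks would exceed maxDepth,
     * none are kept, so nothing deeper is ever added to the crawl database.
     */
    static List<Page> keepOutlinks(Page parent, List<String> outlinks, int maxDepth) {
        int depth = childDepth(parent.depth());
        List<Page> kept = new ArrayList<>();
        if (depth > maxDepth) {
            return kept; // parent is already at the limit: drop every outlink
        }
        for (String url : outlinks) {
            kept.add(new Page(url, depth));
        }
        return kept;
    }

    /** Index discovered pages only, never the seeds themselves (depth 0). */
    static boolean shouldIndex(Page page) {
        return page.depth() > 0;
    }

    public static void main(String[] args) {
        int maxDepth = 1; // fetch the seed and the articles it links to, nothing further
        Page seed = new Page("http://example.com/", 0);

        List<Page> articles = keepOutlinks(
                seed, List.of("http://example.com/a1", "http://example.com/a2"), maxDepth);
        System.out.println(articles.size()); // 2 articles kept at depth 1

        List<Page> deeper = keepOutlinks(
                articles.get(0), List.of("http://example.com/x"), maxDepth);
        System.out.println(deeper.size()); // 0: depth 2 exceeds maxDepth

        System.out.println(shouldIndex(seed) + " " + shouldIndex(articles.get(0))); // false true
    }
}
```

The real scoring-depth plugin may number depths differently and has its own configuration properties, so check its description in `nutch-default.xml` rather than relying on the assumptions made here.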



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
