Hi - So you are not using it for scoring, then, but to inspect the graph of the 
web. In that case there's certainly no need to weed out loops using the Loops 
algorithm, nor any need to run the LinkRank job.
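For reference, a minimal sketch of such a WebGraph-only run (the paths and -topn value below are illustrative, assuming a Nutch 1.x trunk checkout) might look like:

```shell
# Build the web graph from existing crawl segments; skip the Loops
# and LinkRank jobs entirely since no scoring is needed.
bin/nutch webgraph -segmentDir crawl/segments -webgraphdb crawl/webgraphdb

# Dump the top nodes by inlink count straight from the graph for inspection.
bin/nutch nodedumper -webgraphdb crawl/webgraphdb -inlinks -topn 1000 -output inlinks_top
```

The NodeDumper output can then be inspected directly without ever updating the CrawlDb scores.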
Markus

 
 
-----Original message-----
> From:Lewis John Mcgibbney <[email protected]>
> Sent: Thursday 11th September 2014 19:53
> To: [email protected]
> Subject: Re: Revisiting Loops Job in Nutch Trunk
> 
> Hi Markus,
> 
> On Wed, Sep 10, 2014 at 10:28 PM, <[email protected]> wrote:
> 
> >
> > Weird, I didn't see my own mail arriving on the list. I sent it via KMail,
> > but am on webmail now, which seems to work.
> 
> 
> sigh ;)
> 
> 
> > Anyway, for vertical search on a whole website I would rely on your
> > (customized) Lucene similarity and proper analysis, but also on downgrading
> > `bad` pages, for which you can write custom classifier plugins in Nutch.
> 
> 
> Yep, this sounds much more appropriate for the task at hand. I have
> debugged the WebGraph code as well as some of the tools within this
> environment... it is not an apples-to-apples fit for what I am trying to
> achieve.
> 
> 
> > That way you can, for example, get rid of hub pages and promote actual
> > content.
> >
> >
> Yeah. I understand.
> 
> 
> >
> > Anyway, it all depends on what you want to achieve, which is....? :)
> >
> 
> 
>    - Networks. Specifically, domain-specific networks:
>    - how they are formed and where they come from;
>    - where the traffic comes from (by server host, server IP, client IP, and
>    by content relevance);
>    - what the graph looks like within these domain-specific networks. By
>    the way, within this context, I think that a dense graph is probably OK. It
>    is actually what I am looking for.
> 