Hi - So you are not using it for scoring, right, but to inspect the graph of the web? In that case there is certainly no need to weed out loops with the Loops job, nor any need to run the LinkRank job.

Markus
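For the record, a minimal sketch of what "WebGraph without Loops or LinkRank" could look like on a Nutch 1.x (trunk) checkout. The paths below (`crawl/segments`, `crawl/webgraphdb`) are illustrative, not prescribed:

```shell
# Build the web graph from existing crawl segments,
# skipping the Loops and LinkRank jobs entirely.
bin/nutch webgraph -segmentDir crawl/segments -webgraphdb crawl/webgraphdb

# Dump the top pages by inlink count for manual inspection of the graph.
bin/nutch nodedumper -webgraphdb crawl/webgraphdb \
  -inlinks -topn 100 -output crawl/inlinks_top100
```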
-----Original message-----
> From: Lewis John Mcgibbney <[email protected]>
> Sent: Thursday 11th September 2014 19:53
> To: [email protected]
> Subject: Re: Revisiting Loops Job in Nutch Trunk
>
> Hi Markus,
>
> On Wed, Sep 10, 2014 at 10:28 PM, <[email protected]> wrote:
>
> > Weird, i didn't see my own mail arriving on the list, i sent it via kmail
> > but am on webmail now, which seems to work.
>
> sigh ;)
>
> > Anyway, for vertical search on a whole website i would rely on your
> > (customized) Lucene similarity and proper analysis, but also downgrading
> > `bad` pages for which you can make custom classifier plugins in Nutch.
>
> Yep, this sounds much more appropriate for the task at hand. I have
> debugged the Webgraph code as well as some of the tools within this
> environment... it is not an apple-for-apple fit for what I am trying to
> achieve.
>
> > That way you can, for example, get rid of hub pages and promote actual
> > content.
>
> Yeah, I understand.
>
> > Anyway, it all depends on what you want to achieve, which is....? :)
>
> - Networks. Specifically, domain-specific networks...
> - How they are formed and where they come from.
> - Where the traffic comes from (by server host, server IP, client IP and
>   by content relevance).
> - What the graph looks like within these domain-specific networks. By the
>   way, within this context, I think that a dense graph is probably OK. I
>   am looking for this, actually.
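To illustrate the hub-page idea raised above: this is not Nutch's API, just a hypothetical pure-Python sketch. Pages whose out-degree greatly exceeds their in-degree often act as link hubs rather than actual content, and a custom scoring or classifier plugin could downgrade them on that basis. The graph, URLs, and threshold here are all made up for the example:

```python
# Hypothetical sketch: flag likely "hub" pages in a small link graph.
# The graph is an adjacency map: page URL -> list of outlinked URLs.
graph = {
    "http://example.org/":  ["http://example.org/a", "http://example.org/b",
                             "http://example.org/c", "http://example.org/d"],
    "http://example.org/a": ["http://example.org/"],
    "http://example.org/b": ["http://example.org/"],
    "http://example.org/c": [],
    "http://example.org/d": [],
}

def hub_pages(graph, ratio=2.0):
    """Return pages whose out-degree is at least `ratio` times their in-degree."""
    # Count inlinks for every page in the graph.
    indeg = {page: 0 for page in graph}
    for outlinks in graph.values():
        for target in outlinks:
            indeg[target] = indeg.get(target, 0) + 1
    # Compare out-degree to in-degree (treating 0 inlinks as 1 to avoid
    # flagging isolated pages purely for having no inlinks).
    return [page for page, outlinks in graph.items()
            if len(outlinks) >= ratio * max(indeg.get(page, 0), 1)]

print(hub_pages(graph))  # → ['http://example.org/']
```

The root page links out four times but receives only two inlinks, so it is flagged; the leaf pages are not.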

