Hi Markus,

On Wed, Sep 10, 2014 at 10:28 PM, <[email protected]> wrote:
>
> Weird, i didn't see my own mail arriving on the list, i sent it via kmail
> but am on webmail now, which seems to work. sigh ;)
>
> Anyway, for vertical search on a whole website i would rely on your
> (customized) Lucene similarity and proper analysis, but also downgrading
> `bad` pages for which you can make custom classifier plugins in Nutch.

Yep, this sounds much more appropriate for the task at hand. I have
debugged the Webgraph code as well as some of the tools within this
environment... it is not an apples-to-apples fit for what I am trying to
achieve.

> That way you can, for example, get rid of hub pages and promote actual
> content.

Yeah. I understand.

> Anyway, it all depends on what you want to achieve, which is....? :)

- Networks. Specifically, domain-specific networks...
- How they are formed and where they come from.
- Where the traffic comes from (by server host, server IP, client IP and
  by content relevance).
- What the graph looks like within these domain-specific networks.

By the way, within this context, I think that a dense graph is probably
OK. That is actually what I am looking for.

