Assimilating with correct thread.

---------- Forwarded message ----------
From: Lewis John Mcgibbney <[email protected]>
Date: Tue, May 3, 2016 at 1:42 PM
Subject: Re: user Digest 3 May 2016 14:53:20 -0000 Issue 2582
To: "[email protected]" <[email protected]>

Hi Bin,

Hope you are doing well! Please see my response below.

On Tue, May 3, 2016 at 7:53 AM, <[email protected]> wrote:
>
> From: Bin Wang <[email protected]>
> To: "Apache.Nutch.User" <[email protected]>
> Cc:
> Date: Mon, 2 May 2016 13:26:27 -0600
> Subject: Visualization Tool for Nutch
>
> Hi there,
>
> Is there a state of the art visualization tool that is Nutch friendly?
>
> I am planning to get the crawldb information into a better format that can
> be digested by Neo4j or Gephi for analysis. However, I have read here
> <http://grokbase.com/t/nutch/user/124fbmankh/how-to-do-detailed-postmortem-analysis-and-visualization-of-nutch-crawl-data>
> and there <http://wiki.apache.org/nutch/bin/nutch%20webgraph> about the
> demand but I don't see any solid tutorial or documentation regarding the
> visualization.
>
> I don't think visualization is a necessity for Nutch but something out of
> the box will be interesting to have. (people love graphs)
>

Mike Joyce and I were previously working on the following (currently stalled):

1. Upgrade the entire MR API to the 'New' MR API within the master branch.
2. Use TinkerPop's ScriptInputFormat [0] to write an extension of the
   WebgraphDB out as input for Gremlin [1].

Once Nutch data is in such a format, we open up another world for graph
analysis of Nutch data.

I'm going to restart working on 1 above... might even get it finished during
ApacheCon next week. We will see.

Lewis

[0] http://tinkerpop.apache.org/javadocs/3.2.0-incubating/full/index.html?org/apache/tinkerpop/gremlin/hadoop/structure/io/script/ScriptInputFormat.html
[1] https://github.com/tinkerpop/gremlin/wiki

--
*Lewis*
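
In the meantime, a stopgap for the Gephi/Neo4j question could look like the sketch below. It is not part of Nutch and not the ScriptInputFormat approach described above. It assumes a plain-text dump of the linkdb in which each target URL sits on its own line, followed by indented `fromUrl: ... anchor: ...` lines; that layout should be checked against the actual dump before relying on it. The function names are hypothetical. The output is a `Source,Target` edge list, which Gephi's spreadsheet importer accepts and which Neo4j can load as CSV.

```python
import csv
import sys


def linkdb_dump_to_edges(lines):
    """Yield (source, target) URL pairs from a text dump.

    Assumed layout (verify against your own dump):
        <target url>\tInlinks:
         fromUrl: <source url> anchor: <anchor text>
    """
    target = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank separator between records
        if line.startswith((" ", "\t")):
            # Indented line: one inlink pointing at the current target.
            body = line.strip()
            if target and body.startswith("fromUrl: "):
                rest = body[len("fromUrl: "):]
                source = rest.split(" anchor:", 1)[0].strip()
                yield source, target
        else:
            # Unindented line: start of a new record, keyed by the target URL.
            target = line.split("\t", 1)[0].strip()


def write_edge_csv(edges, out):
    """Write edges as CSV with a 'Source,Target' header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for source, target in edges:
        writer.writerow([source, target])


if __name__ == "__main__":
    # Reads the dump on stdin and writes the edge list to stdout.
    write_edge_csv(linkdb_dump_to_edges(sys.stdin), sys.stdout)
```

Saved as, say, `dump_to_edges.py` (a made-up name), it would be run as `python dump_to_edges.py < dump.txt > edges.csv`. Anchor text is dropped on purpose to keep the edge list minimal.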

