Re: Analyze in/out links

Renaud Richardet Thu, 09 Aug 2007 08:01:35 -0700

Marcus Herou wrote:

Hi.
I might be a total knucklehead, but how do I use NutchBean or LinkDbReader?
Is there a tutorial on how to integrate them with applications?

NutchBean is used in the main JSP of the webapp; use that as an example. LinkDbReader is used by NutchBean.

Do I need to learn Hadoop and the MapReduce algorithm?

No, that should not be necessary.

HTH,
Renaud

I don't find the
OpenSearchServlet.getServiceLocator.getNutchBean init process so easy to
comprehend. Which files are necessary to get a ServiceLocator? Just a
nutch-site.xml on the classpath, or all the files in WEB-INF/classes? Please
give me a pointer on where to look for integrating with Nutch.

Yes, I have used GraphML before when doing my own crawling and will use it
again this time with Nutch, but thanks for refreshing my memory!

Let's say I create my own IndexReader: what fields are available in the link
index, and which directory should I point the IndexReader to for analyzing links?
I guess this page says something about the structure:
http://wiki.apache.org/nutch/IndexStructure

Kindly

//Marcus





On 8/8/07, Renaud Richardet <[EMAIL PROTECTED]> wrote:

hi Marcus,

Hi.

I have now crawled all the sites I want and am about to create an undirected,
unweighted graph of vertices and edges with Prefuse.

So I have a question:

How do I extract the in/out link info from Nutch

you could use NutchBean, or LinkDbReader, or a custom Lucene searcher...

What will be the semantics of the vertices and links?

and in what format?

Prefuse has a built-in GraphMLReader (and writer), so I would
definitely go with that. Check the sample GraphML in prefuse/data.
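
To make the GraphML suggestion a bit more concrete, here is a minimal sketch of turning a list of (from, to) URL pairs into an undirected, unweighted GraphML document. It assumes the link pairs have already been extracted from Nutch by some means; the class GraphMlSketch and its methods are hypothetical names for illustration and are not part of Nutch or Prefuse. The element layout follows the general GraphML schema, and it has not been checked against Prefuse's GraphMLReader, so compare it with the sample files in prefuse/data before relying on it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Nutch or Prefuse.
final class GraphMlSketch {

    // Escape characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Return the node index for a URL, assigning the next free index if new.
    static int idOf(Map<String, Integer> ids, String url) {
        Integer id = ids.get(url);
        if (id == null) {
            id = ids.size();
            ids.put(url, id);
        }
        return id;
    }

    // Each element of links is assumed to be a {fromUrl, toUrl} pair.
    static String toGraphMl(List<String[]> links) {
        Map<String, Integer> ids = new LinkedHashMap<>(); // URL -> node index
        Set<String> edges = new LinkedHashSet<>();        // "lo hi" index pairs
        for (String[] link : links) {
            int from = idOf(ids, link[0]);
            int to = idOf(ids, link[1]);
            if (from == to) {
                continue; // keep the node, but drop the self-loop
            }
            // Order the endpoints so A->B and B->A collapse into one edge.
            edges.add(Math.min(from, to) + " " + Math.max(from, to));
        }

        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n");
        sb.append("  <key id=\"url\" for=\"node\" attr.name=\"url\" attr.type=\"string\"/>\n");
        sb.append("  <graph edgedefault=\"undirected\">\n");
        for (Map.Entry<String, Integer> e : ids.entrySet()) {
            sb.append("    <node id=\"n").append(e.getValue())
              .append("\"><data key=\"url\">").append(escape(e.getKey()))
              .append("</data></node>\n");
        }
        for (String edge : edges) {
            String[] p = edge.split(" ");
            sb.append("    <edge source=\"n").append(p[0])
              .append("\" target=\"n").append(p[1]).append("\"/>\n");
        }
        sb.append("  </graph>\n</graphml>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example input; the URLs are made up.
        List<String[]> links = Arrays.asList(
                new String[] {"http://a.example/", "http://b.example/"},
                new String[] {"http://b.example/", "http://a.example/"});
        System.out.print(toGraphMl(links));
    }
}
```

Since the graph is undirected and unweighted, reciprocal links are merged into a single edge and no weight attribute is written; each vertex is a page URL and each edge means "at least one link exists between these two pages".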

HTH,
Renaud
