Re: Analyze in/out links

Marcus Herou Wed, 08 Aug 2007 09:02:36 -0700

Hi.

I might be a total knucklehead, but how do I use the NutchBean or LinkDbReader?
Is there any tutorial on how to integrate these with apps? Do I need to learn
Hadoop and the MapReduce algorithm? I don't find the
OpenSearchServlet.getServiceLocator.getNutchBean init process so easy to
comprehend. Which files are necessary to get a ServiceLocator? Just a
nutch-site.xml on the classpath, or all the files in WEB-INF/classes? Please
give me a pointer on where to look for integrating with Nutch.

Yes, I have used GraphML before when doing my own crawling and will use it
again this time with Nutch, but thanks for refreshing my memory!
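
For illustration, the kind of GraphML I mean could be produced roughly like
this. It is only a minimal sketch in plain Java: it assumes the out-links have
already been extracted into a url -> outlinks map, it uses no Nutch or Prefuse
API, and the class name, the "url" data key and the example.com URLs are all
made up for the example. Whether Prefuse's GraphMLReader accepts exactly this
layout should be checked against the samples in prefuse/data.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Nutch or Prefuse.
public class GraphMLSketch {

    /** Escapes the XML special characters in a URL. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds an undirected, unweighted GraphML document from a url -> outlinks map. */
    static String toGraphML(Map<String, List<String>> outlinks) {
        // Assign a node id to every URL seen as a link source or a link target.
        Map<String, String> ids = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
            ids.putIfAbsent(e.getKey(), "n" + ids.size());
            for (String target : e.getValue()) {
                ids.putIfAbsent(target, "n" + ids.size());
            }
        }

        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n");
        sb.append("  <key id=\"url\" for=\"node\" attr.name=\"url\" attr.type=\"string\"/>\n");
        sb.append("  <graph edgedefault=\"undirected\">\n");

        // One <node> per URL, carrying the URL as a data attribute.
        for (Map.Entry<String, String> n : ids.entrySet()) {
            sb.append("    <node id=\"").append(n.getValue()).append("\">")
              .append("<data key=\"url\">").append(escape(n.getKey())).append("</data>")
              .append("</node>\n");
        }

        // One <edge> per linked pair; a->b and b->a collapse into a single
        // undirected edge, and self-links are skipped.
        Set<String> seen = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
            String a = ids.get(e.getKey());
            for (String target : e.getValue()) {
                String b = ids.get(target);
                if (a.equals(b)) {
                    continue;
                }
                String pair = a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
                if (seen.add(pair)) {
                    sb.append("    <edge source=\"").append(a)
                      .append("\" target=\"").append(b).append("\"/>\n");
                }
            }
        }

        sb.append("  </graph>\n");
        sb.append("</graphml>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for links extracted from a crawl.
        Map<String, List<String>> outlinks = new LinkedHashMap<>();
        outlinks.put("http://a.example.com/",
                Arrays.asList("http://b.example.com/", "http://c.example.com/"));
        outlinks.put("http://b.example.com/",
                Arrays.asList("http://a.example.com/"));
        System.out.print(toGraphML(outlinks));
    }
}
```

Collapsing a->b and b->a into one edge matches the undirected, unweighted
graph I am after; for a directed graph the de-duplication step would go away
and edgedefault would become "directed".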

Let's say I create my own IndexReader: what fields are available in the link
index, and which directory should I point the IndexReader to for analyzing
links? I guess this page says something about the structure:
http://wiki.apache.org/nutch/IndexStructure

Kindly

//Marcus

On 8/8/07, Renaud Richardet <[EMAIL PROTECTED]> wrote:
>
> hi Marcus,
> > Hi.
> >
> > I have now crawled all the sites I want and am about to create an
> > undirected, unweighted graph of vertices and edges with Prefuse.
> >
> > So I have a question:
> >
> > How do I extract the in/out link info from Nutch
>
> you could use NutchBean, or LinkDbReader, or a custom Lucene searcher...
>
> what will be the semantics for vertices and links?
> >  and in what format?
> >
> Prefuse has a built-in GraphMLReader (and writer), so I would
> definitely go with that. Check the sample GraphML in prefuse/data.
>
> HTH,
> Renaud
>

-- 
Marcus Herou Solution Architect & Core Java developer Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com
