Hi. I might be a total knucklehead, but how do I use NutchBean or LinkDbReader? Is there any tutorial on how to integrate these with applications? Do I need to learn Hadoop and the MapReduce algorithm? I don't find the OpenSearchServlet.getServiceLocator.getNutchBean init process easy to follow. Which files are needed to get a ServiceLocator: just a nutch-site.xml on the classpath, or all the files in WEB-INF/classes? Please point me to where to look for integrating with Nutch.
Yes, I have used GraphML before when doing my own crawling, and I will use it again this time with Nutch, but thanks for refreshing my memory!

Let's say I create my own IndexReader: which fields are available in the link index, and which directory should I point the IndexReader at to analyze links?

I guess this page says something about the structure:
http://wiki.apache.org/nutch/IndexStructure

Kindly
//Marcus

On 8/8/07, Renaud Richardet <[EMAIL PROTECTED]> wrote:
>
> hi Marcus,
> > Hi.
> >
> > I have now crawled all the sites I want and is about to create an undirected
> > unweighted graph of vertices and edges with Prefuse.
> >
> > So I have a question:
> >
> > How do I extract the in/out link info from Nutch
> you could use NutchBean, or LinkDbReader, or a custom Lucene searcher...
> what will be the semantic for vertices and a links?
> > and on what format ?
> >
> Prefuse has a built-in GraphMLReader (and writer), so I would
> definitively go with that. Check the sample GraphML in prefuse/data.
>
> HTH,
> Renaud
>

--
Marcus Herou
Solution Architect & Core Java developer
Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com
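
For the GraphML route discussed in this thread, here is a minimal sketch of turning (from, to) link pairs into an undirected, unweighted GraphML document. It assumes the link pairs have already been pulled out of Nutch by some other means (for example via LinkDbReader); that extraction step is not shown. The class name `LinkGraphMl`, its methods, and the `url` data key are all hypothetical names chosen for illustration, and I have not verified the output against Prefuse's GraphMLReader.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Nutch or Prefuse.
class LinkGraphMl {

    // Escape XML special characters so URLs are safe inside element text.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Return the numeric id for a URL, assigning the next free id on first sight.
    static int idFor(Map<String, Integer> ids, String url) {
        Integer id = ids.get(url);
        if (id == null) {
            id = ids.size();
            ids.put(url, id);
        }
        return id;
    }

    // links: each element is {fromUrl, toUrl}. Direction is discarded, so
    // A->B and B->A collapse into one edge; self-links are skipped (a choice
    // made for this sketch, not something the thread prescribes).
    static String toGraphMl(List<String[]> links) {
        Map<String, Integer> nodeIds = new LinkedHashMap<>();
        Set<String> edges = new LinkedHashSet<>();
        for (String[] link : links) {
            int from = idFor(nodeIds, link[0]);
            int to = idFor(nodeIds, link[1]);
            if (from == to) {
                continue;
            }
            int lo = Math.min(from, to);
            int hi = Math.max(from, to);
            edges.add("    <edge source=\"n" + lo + "\" target=\"n" + hi + "\"/>\n");
        }

        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n");
        sb.append("  <key id=\"url\" for=\"node\" attr.name=\"url\" attr.type=\"string\"/>\n");
        sb.append("  <graph edgedefault=\"undirected\">\n");
        for (Map.Entry<String, Integer> e : nodeIds.entrySet()) {
            sb.append("    <node id=\"n").append(e.getValue())
              .append("\"><data key=\"url\">").append(escape(e.getKey()))
              .append("</data></node>\n");
        }
        for (String edge : edges) {
            sb.append(edge);
        }
        sb.append("  </graph>\n</graphml>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up example links; real input would come from the crawl.
        List<String[]> links = Arrays.asList(
            new String[] {"http://a.example/", "http://b.example/"},
            new String[] {"http://b.example/", "http://a.example/"});
        System.out.print(toGraphMl(links));
    }
}
```

Each vertex is a URL and each edge means "at least one link exists between these two pages in either direction", which is one possible answer to the question about vertex and link semantics.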
