Hi Alex,

Thanks for your reply. I want to have Nutch crawl a site, then get a list of all pages/images on the site from the crawl. I am fluent in Java, but I'm looking for pointers on where to begin.
From running the tutorial, I did see a file created by the crawl, "links/part-00000", with plaintext info on all the site's pages. Is that the linksdb you refer to?

Thanks,
Branden Makana

On Wednesday, July 14, 2010, Alex McLintock <[email protected]> wrote:
> I'm a bit confused as to what you want to do, your skills available,
> and how much you can code yourself. Presumably you have seen the
> linksdb? and you see that there is code to read from linksdb?
>
> Have you looked at the ReadDB facility? You probably want to look at
> the class org.apache.nutch.crawl.CrawlDbReader
>
> Alex
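
For what it's worth, here is a rough sketch of one way to get the page/image list out of a plaintext dump, such as one written by "bin/nutch readdb <crawldb> -dump <dir>" or "bin/nutch readlinkdb <linkdb> -dump <dir>". It does not call any Nutch classes. It assumes, and this is only an assumption about the dump layout, that each record begins with the URL at the start of a line followed by a tab, and that detail lines are indented or start with other text. The class name DumpUrlLister and the sample text are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

public class DumpUrlLister {

    // Collects the distinct URLs that start a record in a plaintext dump.
    // Assumption: a record's first line starts with the URL at column 0,
    // followed by a tab; any other line is treated as record detail.
    public static Set<String> extractUrls(Reader in) throws IOException {
        Set<String> urls = new LinkedHashSet<>(); // keeps first-seen order
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("http://") || line.startsWith("https://")) {
                int end = line.indexOf('\t');
                urls.add(end < 0 ? line.trim() : line.substring(0, end));
            }
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample text, not real Nutch output.
        String sample = "http://example.com/\tInlinks:\n"
                + " fromUrl: http://example.com/about anchor: Home\n"
                + "\n"
                + "http://example.com/logo.png\tInlinks:\n"
                + " fromUrl: http://example.com/ anchor: \n";
        for (String url : extractUrls(new StringReader(sample))) {
            System.out.println(url);
        }
    }
}
```

To run it against a real dump, pass a java.io.FileReader for the dump file instead of the StringReader. If the dump layout turns out to differ from the assumption above, the startsWith check is the part to adjust.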

