Re

Branden Root Wed, 14 Jul 2010 21:58:41 -0700

Hi Alex,

Thanks for your reply. I want to have Nutch crawl a site, then get a list of 
all pages/images on the site from the crawl. I am fluent in Java, but I'm 
looking for pointers to where to begin.


From running the tutorial, I did see a file created by the crawl,
"links/part-00000", with plaintext info on all the site's pages. Is that the
linkdb you refer to?



Thanks,
Branden Makana



On Wednesday, July 14, 2010, Alex McLintock <[email protected]> wrote:
> I'm a bit confused as to what you want to do, your skills available,
> and how much you can code yourself. Presumably you have seen the
> linksdb? and you see that there is code to read from linksdb?
> 
> Have you looked at the ReadDB facility? You probably want to look at
> the class org.apache.nutch.crawl.CrawlDbReader
> 
> 
> Alex
