I don't think that's doable, as I *think* CrawlDb doesn't record which segment 
a URL is in (or does it? I'm not looking at the code right now, sorry).


But if you know the segment, you should be able to pull the web page data out of it.
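
If the segment isn't known, one workaround is simply to try every segment in turn. Below is a rough Python sketch of that, not a tested recipe. It assumes the segments sit on the local filesystem under something like crawl/segments, and that your Nutch version's readseg accepts "-get <segment_dir> <url>" (run bin/nutch readseg with no arguments to check the usage text). The function names, default paths, and the way the output is collected are all made up for illustration.

```python
import subprocess
from pathlib import Path


def readseg_commands(segments_dir, url, nutch="bin/nutch"):
    """Build one `readseg -get` command per segment directory.

    Assumption: every subdirectory of segments_dir is a segment.
    """
    segments = sorted(p for p in Path(segments_dir).iterdir() if p.is_dir())
    return [[nutch, "readseg", "-get", str(seg), url] for seg in segments]


def dump_url(segments_dir, url, nutch="bin/nutch"):
    """Run readseg against each segment and collect whatever it prints.

    Returns a list of (segment_path, stdout) pairs. What readseg prints for
    a segment that lacks the URL is not something this sketch relies on,
    so inspect the output yourself.
    """
    results = []
    for cmd in readseg_commands(segments_dir, url, nutch):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd[3], proc.stdout))
    return results
```

Calling dump_url("crawl/segments", "http://example.com/") from the Nutch install directory would then give you one output per segment to look through. On a large crawl this scans every segment, so it's slow, but it avoids needing CrawlDb to tell you where the page went.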

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Viksit Gaur <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Thursday, June 12, 2008 2:22:09 AM
> Subject: Retrieving data for a particular URL from crawldb?
> 
> Hi all,
> 
> Is there a way to retrieve a particular page from the nutch crawl using 
> the URL as a key? Since I don't know the segment directory which this 
> page was put into, I can't use nutch readseg. But that tool only gives 
> stats about the URL and not its contents.
> 
> Any ideas on the best way to do this?
> 
> Thanks,
> Viksit
