Re: Retrieving data for a particular URL from crawldb?

Otis Gospodnetic Thu, 12 Jun 2008 09:11:16 -0700

I don't think that's doable, as I *think* CrawlDb doesn't know which segment 
the URL is in (or does it?  Not looking at the code now, sorry).



But if you know the segment, you should be able to pull the web page data out of it.
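
If the segment is not known, one workaround is to try every segment in turn. The following is only a rough sketch, not something taken from the Nutch code: it assumes your Nutch version's SegmentReader supports "readseg -get <segment_dir> <url>" (run "bin/nutch readseg" with no arguments to check the usage), and the helper names and the "bin/nutch" path are hypothetical.

```python
import os
import subprocess


def segment_dirs(segments_root):
    """Return the segment directories under segments_root, newest first."""
    names = [n for n in os.listdir(segments_root)
             if os.path.isdir(os.path.join(segments_root, n))]
    # Segment directories are normally named by timestamp, so reverse
    # lexical order puts the most recent segment first.
    return [os.path.join(segments_root, n) for n in sorted(names, reverse=True)]


def readseg_commands(segments_root, url, nutch="bin/nutch"):
    """Build one 'readseg -get' command line per segment (assumed CLI form)."""
    return [[nutch, "readseg", "-get", seg, url]
            for seg in segment_dirs(segments_root)]


def dump_url(segments_root, url, nutch="bin/nutch"):
    """Run readseg against every segment and print whatever each one returns."""
    for cmd in readseg_commands(segments_root, url, nutch):
        result = subprocess.run(cmd, capture_output=True, text=True)
        print("== segment:", cmd[3])
        print(result.stdout)
```

Something like dump_url("crawl/segments", "http://example.com/") would then print the record found in each segment, if any. The output still has to be inspected by hand, since the sketch makes no attempt to detect which segment actually holds the page.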

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Viksit Gaur <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Thursday, June 12, 2008 2:22:09 AM
> Subject: Retrieving data for a particular URL from crawldb?
> 
> Hi all,
> 
> Is there a way to retrieve a particular page from the nutch crawl using 
> the URL as a key? Since I don't know the segment directory which this 
> page was put into, I can't use nutch readseg. But that tool only gives 
> stats about the URL and not its contents.
> 
> Any ideas on the best way to do this?
> 
> Thanks,
> Viksit
