Use readdb's -expr option for JEXL expressions on CrawlDatum fields, or 
-regex. See CrawlDatum.java for the fields you can query.
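
Since you already drive things from Python, here is a minimal sketch of how the filtering could be pushed into readdb itself instead of post-processing a full dump. The helper name readdb_dump_command, the paths, and the JEXL field names (status, score) are assumptions for illustration; check CrawlDatum.java and the readdb usage text of your Nutch version before relying on them.

```python
import shlex


def readdb_dump_command(crawldb, out_dir, expr=None, regex=None,
                        fmt="csv", nutch="bin/nutch"):
    """Build the argument list for a filtered `nutch readdb ... -dump` call.

    Hypothetical helper: it only assembles the command line and does not
    run Nutch. Options are added only when a value is supplied.
    """
    cmd = [nutch, "readdb", crawldb, "-dump", out_dir, "-format", fmt]
    if expr is not None:
        # JEXL expression evaluated per CrawlDatum (field names assumed).
        cmd += ["-expr", expr]
    if regex is not None:
        # Regular expression filter on the records.
        cmd += ["-regex", regex]
    return cmd


# Example: keep only fetched records with a score above 1.0.
cmd = readdb_dump_command(
    "crawl/crawldb",
    "dump_out",
    expr="(status == 'db_fetched' && score > 1.0)",
)

# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be handed to subprocess.run(cmd, check=True) as is; passing a list rather than a single string avoids shell quoting problems with the parentheses and quotes in the expression.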

 
 
-----Original message-----
> From:Michael Coffey <[email protected]>
> Sent: Wednesday 13th September 2017 3:45
> To: User <[email protected]>
> Subject: querying crawldb
> 
> Hello Nutchians,
> I need to be able to query a (nutch 1.x) crawldb for read-only 
> search/sort/summarize purposes, based on combinations of status, fetch_time, 
> score, and things like that. What is a good tool or process for doing such 
> things?
> Up until now, I've been doing readdb -dump and then processing the output with 
> python code that I wrote. But this is slow and clunky, and my code probably 
> has bugs. I wonder, would Hive be a good tool for this?
> 
