Use -expr (JEXL expressions on CrawlDatum fields) or -regex with readdb. See CrawlDatum.java for the fields you can query.
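Since you are already scripting in Python, here is a rough sketch of how such a filtered dump could be assembled and launched from a script. The helper name build_readdb_dump, the paths, and the field names in the example expression (status, score) are my assumptions for illustration; check CrawlDatum.java in your Nutch version for the names actually exposed to JEXL.

```python
import shlex


def build_readdb_dump(crawldb, out_dir, expr=None, regex=None, nutch="bin/nutch"):
    """Assemble a `nutch readdb <crawldb> -dump <out_dir>` argument list.

    Hypothetical helper, not part of Nutch. It only builds the command;
    pass the result to subprocess.run() to actually execute it.
    """
    cmd = [nutch, "readdb", crawldb, "-dump", out_dir]
    if regex is not None:
        # -regex filter, as mentioned above
        cmd += ["-regex", regex]
    if expr is not None:
        # -expr takes a JEXL expression evaluated against each CrawlDatum
        cmd += ["-expr", expr]
    return cmd


# Example expression; the field names here are assumed, not verified.
cmd = build_readdb_dump(
    "crawl/crawldb",
    "dump_out",
    expr='status == "db_fetched" && score > 0.5',
)

# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems with the expression when you hand it to subprocess.run(cmd, check=True).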
-----Original message-----
> From: Michael Coffey <[email protected]>
> Sent: Wednesday 13th September 2017 3:45
> To: User <[email protected]>
> Subject: querying crawldb
>
> Hello Nutchians,
> I need to be able to query a (nutch 1.x) crawldb for read-only
> search/sort/summarize purposes, based on combinations of status, fetch_time,
> score, and things like that. What is a good tool or process for doing such
> things?
> Up until now, I've been doing readdb-dump and then processing the output with
> python code that I wrote. But this is slow and clunky, and my code probably
> has bugs. I wonder, would Hive be a good tool for this?
>

