Hi Kiran,

For this I think you will need to dive further into the Gora API and codebase. As you can see around line 232 of WebTableReader [0], the Query is set and executed based on the key. What you wish to do would possibly involve setting fields via the Gora Query API [1]. There are some other useful methods in there which you could use for your specific requirements.

If you find something which you think we could integrate into the WebTableReader in a more widely applicable manner then by all means please log a Jira. However, I think that writing your own custom class, cutting out all of the stuff you don't need from the existing WebTableReader, may be the best route to take. Of course I may be wrong about this...
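To give a rough idea of the shape such a custom class might take, here is a minimal sketch of the scan-and-filter loop. It is not Nutch or Gora code: `Cursor`, `cursorOver` and `urlsWithStatus` are hypothetical stand-ins so that it runs without a datastore, and the integer status codes are made up for the example. In a real class the cursor would presumably be the Gora `Result` obtained by executing a `Query` restricted (via `setFields`) to the status field you care about.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StatusFilterSketch {

  /** Hypothetical stand-in for a result cursor: call next(), then read the current row. */
  public interface Cursor {
    boolean next();
    String getKey();
    int getStatusCode();
  }

  /** Hypothetical in-memory cursor (url -> status code) so the sketch runs without a datastore. */
  public static Cursor cursorOver(Map<String, Integer> rows) {
    final Iterator<Map.Entry<String, Integer>> it = rows.entrySet().iterator();
    return new Cursor() {
      private Map.Entry<String, Integer> current;

      public boolean next() {
        if (!it.hasNext()) {
          return false;
        }
        current = it.next();
        return true;
      }

      public String getKey() {
        return current.getKey();
      }

      public int getStatusCode() {
        return current.getValue();
      }
    };
  }

  /** Walks the cursor once and collects the keys whose status code equals wantedCode. */
  public static List<String> urlsWithStatus(Cursor cursor, int wantedCode) {
    List<String> urls = new ArrayList<>();
    while (cursor.next()) {
      if (cursor.getStatusCode() == wantedCode) {
        urls.add(cursor.getKey());
      }
    }
    return urls;
  }

  public static void main(String[] args) {
    // Made-up rows and status codes, purely for illustration.
    Map<String, Integer> rows = new LinkedHashMap<>();
    rows.put("http://example.com/a", 1);
    rows.put("http://example.com/b", 2);
    rows.put("http://example.com/c", 1);

    // Print every url whose (made-up) status code is 1.
    for (String url : urlsWithStatus(cursorOver(rows), 1)) {
      System.out.println(url);
    }
  }
}
```

The point of the design is that only the key and one status field are ever read, which is what should keep this cheaper than a full dump followed by searching inside it.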
Lewis

[0] http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/crawl/WebTableReader.java?view=markup
[1] http://svn.apache.org/repos/asf/gora/trunk/gora-core/src/main/java/org/apache/gora/query/Query.java

On Wed, Jan 16, 2013 at 9:35 AM, kiran chitturi <[email protected]> wrote:

> If i want to fetch the list of urls based on the value of a field in the
> database (like parseStatus, protocolStatus), are there any direct tricks or
> commands for it rather than dumping the webpage (without content and text)
> and searching inside.
>
> For example a command like './bin/nutch readdb -dump $FIELD_NAME
> $FIELD_VALUE $LOCATION', might be quite useful when trying to look in to
> the database after reading stats of the crawl and trying to figure out
> which urls are under (status_redir_temp, status_redir_perm, status_retry,
> status_gone, status_unfetched, status_fetched).
>
> Are there any tips/tricks when trying to deal with large data and trying to
> dump urls based on parseStatus ?
>
> The documentation here (http://wiki.apache.org/nutch/bin/nutch_readdb)
> might not apply to 2.x series.
>
> A page with commands and examples will be very helpful. Can we try to
> create all new documentation separating 2.x and 1.x series ?
>
>
> Thanks,
>
> --
> Kiran Chitturi
>

--
*Lewis*

