Hi Kiran,

Are you still using 2.x?
On Mon, Feb 4, 2013 at 8:57 AM, kiran chitturi <[email protected]> wrote:

> The file clearly shows that urls with status 1 have the protocolStatus(NOT
> FOUND). Those seeds are never moved to status (db_gone) that is status 3 if
> i am correct.
>
> Did anyone had a similar problem ? Any ideas on how to fix it ?

HttpBase [0] suggests that upon receipt of a 404 response code the ProtocolStatus is set to ProtocolStatusCodes.NOTFOUND, which appears to be 14 [1]. What are you expecting to happen here?

> PS : I have made patch which dumps only particular fields through command
> line (Example: ./bin/nutch readdb -dump table_fields -fields
> "status,protocolStatus"). baseUrl is dumped by default along with other
> fields requested. I can upload if anyone is interested.

Please file an issue and attach your patch. Any potential addition to the codebase is welcome. Thanks.

[0] http://svn.apache.org/repos/asf/nutch/branches/2.x/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
[1] http://svn.apache.org/repos/asf/nutch/branches/2.x/src/java/org/apache/nutch/protocol/ProtocolStatusCodes.java

--
Lewis
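
To make the question concrete, here is a minimal sketch of the two steps discussed in this thread: the 404 to NOTFOUND mapping, and the status 1 to status 3 (db_gone) transition that Kiran expects. It is not the HttpBase or DbUpdater code. The class name, method names, and the UNMODELLED sentinel are hypothetical, the value 14 is the one quoted above, and the status numbers 1 and 3 are taken from Kiran's message rather than checked against the source.

```java
// Illustrative sketch only; NOT the Nutch source.
public class StatusSketch {
    // Value quoted in this thread for ProtocolStatusCodes.NOTFOUND.
    static final int NOTFOUND = 14;
    // Hypothetical sentinel for response codes this sketch does not model.
    static final int UNMODELLED = -1;

    // Crawl status numbers as stated in the thread (assumptions, not verified here).
    static final int DB_UNFETCHED = 1;
    static final int DB_GONE = 3;

    // Step 1: map an HTTP response code to a protocol status code.
    static int protocolCodeFor(int httpCode) {
        return httpCode == 404 ? NOTFOUND : UNMODELLED;
    }

    // Step 2: the transition the original poster expects, i.e. a page whose
    // fetch ended in NOTFOUND is moved to db_gone; anything else is unchanged.
    static int expectedDbStatus(int currentStatus, int protocolCode) {
        return protocolCode == NOTFOUND ? DB_GONE : currentStatus;
    }

    public static void main(String[] args) {
        int protocolCode = protocolCodeFor(404);
        int status = expectedDbStatus(DB_UNFETCHED, protocolCode);
        System.out.println("protocolStatus=" + protocolCode + " status=" + status);
    }
}
```

If the dump shows protocolStatus NOTFOUND while status stays at 1, the second step is the one that is not happening, and that is the behaviour worth describing in the issue.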

