Hi Kiran,

Are you still using 2.x?

On Mon, Feb 4, 2013 at 8:57 AM, kiran chitturi
<[email protected]> wrote:

>
> The file clearly shows that URLs with status 1 have the protocolStatus
> (NOT FOUND). Those seeds are never moved to status 3 (db_gone), if I am
> correct.
>
> Has anyone had a similar problem? Any ideas on how to fix it?

HttpBase [0] suggests that upon receipt of a 404 response code the
ProtocolStatus is set to ProtocolStatusCodes.NOTFOUND, which appears
to be 14 [1].
What are you expecting to happen here?
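
To make that concrete, here is a minimal sketch of the mapping as I read
it. The class name, the helper method and the OTHER placeholder are made
up for illustration and are not Nutch's actual API; only the 404 to 14
pairing comes from the sources linked in [0] and [1].

```java
public class NotFoundSketch {
    // Value that ProtocolStatusCodes.NOTFOUND appears to have, per [1].
    static final int NOTFOUND = 14;

    // Placeholder for every other outcome; NOT a real Nutch constant.
    static final int OTHER = -1;

    // Hypothetical helper: maps an HTTP response code to a protocol status code.
    static int toProtocolStatus(int httpCode) {
        return httpCode == 404 ? NOTFOUND : OTHER;
    }

    public static void main(String[] args) {
        // A 404 response yields the NOTFOUND protocol status.
        System.out.println(toProtocolStatus(404));
        // Any other response code falls through to the placeholder.
        System.out.println(toProtocolStatus(200));
    }
}
```

Note that this only covers the protocol status; it says nothing about how
or when the page status in the db gets updated afterwards.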


> PS: I have made a patch which dumps only particular fields through the
> command line (Example: ./bin/nutch readdb -dump table_fields -fields
> "status,protocolStatus"). baseUrl is dumped by default along with the other
> fields requested. I can upload it if anyone is interested.

Please file an issue and attach your patch. Any potential addition to
the codebase is welcome.
Thanks.

[0] http://svn.apache.org/repos/asf/nutch/branches/2.x/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
[1] http://svn.apache.org/repos/asf/nutch/branches/2.x/src/java/org/apache/nutch/protocol/ProtocolStatusCodes.java

-- 
Lewis
