Hi,

I am developing a custom crawler in Java, using Nutch v1.9, to crawl
websites that require form-based authentication. I am following Nutch's
HttpPostAuthentication feature to implement it.

The login parameters required for authentication, such as the HTML form id
and the login POST data (username, password), are specified as key-value
pairs in a configuration file. What is required to identify the HTML login
form: its id or its name? And how can the form's parameters be identified
if the form has neither an id nor a name?
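
To make the second question concrete, here is a rough sketch of how I
picture inspecting a login page by hand: list every form on the page by
position, together with its id, name, action and input field names, so
that a form without id or name can still be picked out by its index or
action. It uses only java.util.regex from the JDK. The FormInspector class
and its methods are my own hypothetical helpers, not part of Nutch, and
regex matching is only an approximation of real HTML parsing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: lists the forms found in an HTML page.
class FormInspector {

    // One <form> element: its position on the page plus identifying attributes.
    static class Form {
        final int index;       // 0-based position of the form in the page
        final String id;       // null when the form has no id attribute
        final String name;     // null when the form has no name attribute
        final String action;   // null when the form has no action attribute
        final Map<String, String> fields = new LinkedHashMap<>();

        Form(int index, String id, String name, String action) {
            this.index = index;
            this.id = id;
            this.name = name;
            this.action = action;
        }

        @Override
        public String toString() {
            return "form[" + index + "] id=" + id + " name=" + name
                    + " action=" + action + " fields=" + fields;
        }
    }

    private static final Pattern FORM = Pattern.compile(
            "<form\\b([^>]*)>(.*?)</form>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private static final Pattern INPUT = Pattern.compile(
            "<input\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Returns the quoted value of one attribute, or null if it is absent.
    // The lookbehind keeps "name" from matching inside e.g. "data-name".
    static String attr(String tagAttributes, String attrName) {
        Pattern p = Pattern.compile(
                "(?<![\\w-])" + Pattern.quote(attrName)
                        + "\\s*=\\s*([\"'])(.*?)\\1",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(tagAttributes);
        return m.find() ? m.group(2) : null;
    }

    // Collects every form in document order with its named input fields.
    static List<Form> listForms(String html) {
        List<Form> forms = new ArrayList<>();
        Matcher fm = FORM.matcher(html);
        while (fm.find()) {
            String attrs = fm.group(1);
            Form form = new Form(forms.size(), attr(attrs, "id"),
                    attr(attrs, "name"), attr(attrs, "action"));
            Matcher im = INPUT.matcher(fm.group(2));
            while (im.find()) {
                String fieldName = attr(im.group(1), "name");
                if (fieldName != null) {
                    String value = attr(im.group(1), "value");
                    form.fields.put(fieldName, value == null ? "" : value);
                }
            }
            forms.add(form);
        }
        return forms;
    }

    public static void main(String[] args) {
        // Made-up sample page: a search form and a login form, neither with id or name.
        String html = "<form action=\"/search\"><input name=\"q\"></form>"
                + "<form action=\"/login\" method=\"post\">"
                + "<input type=\"text\" name=\"username\">"
                + "<input type=\"password\" name=\"password\">"
                + "<input type=\"hidden\" name=\"token\" value=\"abc\">"
                + "</form>";
        for (Form form : listForms(html)) {
            System.out.println(form);
        }
    }
}
```

My assumption is that the field names found this way (including hidden
fields) are what has to go into the login post data key-value pairs, but I
do not know how Nutch itself selects the form when it has no id or name.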

I have also posted this question to the developer mailing list, but did not
receive a reply, and I have been stuck on this for a while. Could somebody
explain how to specify the HTML form parameters of the websites to be
crawled so that form-based authentication works?

Thanks and Regards,
Tizy
