Hi,

Thanks for the reply. Is there an alternative way to do this authentication? Does Nutch's fetcher job accept cookies when fetching pages from the same domain? Could you suggest a workaround for form-based authentication with Nutch?

Thanks,
Tizy

On Tue, Dec 16, 2014 at 1:08 PM, Halil Ibrahim Simsek <[email protected]> wrote:
>
> Hello Tizy,
>
> As far as I know, the current development version of Nutch can do Basic,
> Digest and NTLM based authentication. [1] Nutch cannot do POST-based
> authentication that depends on cookies. By the way, there is a document
> that is supposed to describe this feature, but as far as I can see no code
> has been developed yet. [2]
>
> [1] https://wiki.apache.org/nutch/HttpAuthenticationSchemes
> [2] https://wiki.apache.org/nutch/HttpPostAuthentication
>
> Halil
>
> 2014-12-16 7:16 GMT+02:00 Tizy Ninan <[email protected]>:
> >
> > Hi,
> >
> > I am trying to develop a custom crawler in Java, using Nutch v1.9, to
> > crawl websites that require form-based authentication. I am following
> > the HttpPostAuthentication feature of Nutch to implement it.
> >
> > The login parameters required for authentication, such as the HTML form
> > id and the login POST data (username, password), are specified as
> > key-value pairs in a configuration file. What is required to identify
> > the HTML login form (the id or the name of the form)? How can the form
> > parameters be identified if neither the id nor the name of the form is
> > specified?
> >
> > I have also posted the question to the developer mailing list, but did
> > not receive any reply. I have been stuck on this for a while. Could
> > somebody provide a solution for specifying the HTML form parameters of
> > the websites to be crawled, so that form-based authentication can be
> > performed?
> >
> > Thanks and Regards,
> > Tizy
> >
>

--
Thanks and Regards,
Tizy

