sandeep pujar wrote:
Greetings,
I wanted to know whether anybody has worked on form-based
authentication for the Nutch crawler.
Any pointers or suggestions would help.
I have, without much success. Form-based authentication differs
from site to site: most sites don't use just a plain form with
username/password, but a wide variety of methods to check and
protect the data being sent. In extreme cases a form will embed a
challenge string, run a JavaScript-based MD5 hash, and send only that
hash. In other cases other tricks are played: setting cookies,
redirecting, running scripts, etc. In the end perhaps only 1 out of
50 sites used plain form authentication, and even those used
different field names on the form, so I gave up.
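
To make the difference concrete, here is a rough Python sketch of the two
cases described above: a plain username/password form, and a form with an
embedded challenge string whose password field is replaced by an MD5 hash.
It is not Nutch code. The function name build_login_body, the field-guessing
rules, and the md5(challenge + password) scheme are assumptions made up for
illustration; real sites combine the values in their own ways, which is
exactly the problem.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collects (name, type, value) for every named <input> on the page."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        if a.get("name"):
            field_type = (a.get("type") or "text").lower()
            self.fields.append((a["name"], field_type, a.get("value") or ""))


def build_login_body(html, username, password, challenge_field=None):
    """Guess the POST body for a login form (hypothetical helper).

    Field names differ per site, so fields are matched by input type:
    hidden values are echoed back, the password input gets the password,
    and any text/email input is assumed to be the username field.
    """
    parser = FormFieldParser()
    parser.feed(html)
    body = {}
    for name, field_type, value in parser.fields:
        if field_type == "hidden":
            body[name] = value
        elif field_type == "password":
            body[name] = password
        elif field_type in ("text", "email"):
            body[name] = username
    if challenge_field and challenge_field in body:
        # Assumed scheme: send md5(challenge + password) instead of the
        # clear-text password. A real site may order or salt this differently.
        digest = hashlib.md5(
            (body[challenge_field] + password).encode("utf-8")
        ).hexdigest()
        for name, field_type, _ in parser.fields:
            if field_type == "password":
                body[name] = digest
    return urlencode(body)
```

Calling build_login_body(page, "alice", "secret") covers the plain case;
passing challenge_field="challenge" switches to the hashed variant. Cookies,
redirects, and hashes computed by arbitrary page scripts are not handled
here, and those are the cases that made a general solution impractical.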
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com