Hello,
I used Nutch 1.11 to crawl pages behind a login page.
The http-auth configuration looked like this:
---------------------------------------------------------------------------
<?xml version="1.0"?>
<auth-configuration>
<credentials authMethod="formAuth"
loginUrl="https://sso.coremedia.com/zendesk/index.jsp?brand_id=3187316&amp;locale_id=1&amp;return_to=https%3A%2F%2Fsupport.coremedia.com%2Fhc%2Fen-us&amp;timestamp=1464019963"
loginFormId="loginForm"
loginRedirect="true">
<loginPostData>
<field name="user[email]"
value="username"/>
<field name="user[password]"
value="password"/>
</loginPostData>
<additionalPostHeaders>
</additionalPostHeaders>
</credentials>
</auth-configuration>
--------------------------------------------------------------------
Everything worked fine. Then I updated to 1.13 (I also tried 1.18) and changed
the configuration as described in the http-auth.xml file:
-----------------------------------------------------------------------------
<auth-configuration>
<credentials authMethod="formAuth"
loginUrl="https://sso.coremedia.com/zendesk/index.jsp?brand_id=3187316&amp;locale_id=1&amp;return_to=https%3A%2F%2Fsupport.coremedia.com%2Fhc%2Fen-us&amp;timestamp=1464019963"
loginFormId="loginForm"
loginRedirect="true">
<loginPostData>
<field name="user[email]"
value="username"/>
<field name="user[password]"
value="password"/>
</loginPostData>
<additionalPostHeaders>
</additionalPostHeaders>
<removedFormFields>
</removedFormFields>
<loginCookie>
<policy>BROWSER_COMPATIBILITY</policy>
</loginCookie>
</credentials>
</auth-configuration>
-----------------------------------------------
Now the login no longer works. After some redirects, the server returns an
HTTP 403 response. I tried every loginCookie policy value, but none worked.
The login goes to a Zendesk support system with Atlassian Crowd as the login
provider. Has anything changed between 1.11 and 1.13? Is something stricter
than before?
I found a very similar question on this mailing list
(https://www.mail-archive.com/[email protected]/msg15746.html) from
2017, which has no solution.
I would appreciate any help!
Best regards
Michael
Dr. Michael Fritsch
Technical Editor