Hello Nutch Users,

I am a first-time Nutch user and have been trying to crawl an intranet
portal, https://pilot.mysite.sitecorp.com/user/login, using Nutch 1.15, but I
always get the "No form exists: user-login-form" error shown below. When I
crawl other login pages, such as https://urs.earthdata.nasa.gov/, I do not see
this error. It only happens with this intranet site.

I tried loading the same login page with Selenium ChromeDriver, and it loads
the page and fills in the user ID and password text boxes.

What could be wrong, and how can I troubleshoot this further?
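One way to narrow this down is to check whether the form is present in the HTML that a plain HTTP client receives. Selenium runs the page's JavaScript, so a form that is injected client-side, or that is only served after an SSO redirect, can show up in Chrome yet be missing from the raw response. The Python sketch below is a hypothetical helper: `find_login_form`, `FormFinder`, `fetch` and the `nutch-debug` User-Agent are illustration names, not Nutch APIs. It fetches the page without executing JavaScript and, following the lookup order in the HttpFormAuthentication DEBUG lines of the log (first `id`, then `name`), looks for a `<form>` matching `user-login-form`. It assumes Nutch sees roughly the same HTML that urllib does, which may not hold if the server varies its response by User-Agent, cookies, or client certificate.

```python
import sys
import urllib.request
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collects the id and name attributes of every <form> tag in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []  # list of (id, name) tuples; None when an attribute is absent

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "form":
            attrs = dict(attrs)
            self.forms.append((attrs.get("id"), attrs.get("name")))


def find_login_form(html, form_id):
    """Return "id" or "name" depending on how form_id matched, else None.

    Mirrors the lookup order seen in the DEBUG log: try the id attribute
    first, then fall back to the name attribute.
    """
    finder = FormFinder()
    finder.feed(html)
    finder.close()
    if any(fid == form_id for fid, _ in finder.forms):
        return "id"
    if any(fname == form_id for _, fname in finder.forms):
        return "name"
    return None


def fetch(url):
    """Plain GET with no JavaScript execution; returns (final URL, HTML text)."""
    # Hypothetical User-Agent; substitute the agent string your Nutch setup sends.
    req = urllib.request.Request(url, headers={"User-Agent": "nutch-debug"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.geturl(), resp.read().decode(charset, errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    final_url, html = fetch(sys.argv[1])
    print("final URL after redirects:", final_url)
    match = find_login_form(html, "user-login-form")
    print("form 'user-login-form' found by:", match or "nothing")
```

Run it as `python check_login_form.py https://pilot.mysite.sitecorp.com/user/login` (the file name is arbitrary). If it prints `nothing`, or the final URL differs from the login URL, the form is probably missing from the raw HTML, which would be consistent with the Nutch error. If it reports `id` or `name`, the difference more likely lies in how the Nutch request itself is made.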

Thanks in advance.

2019-07-02 10:36:59,152 DEBUG httpclient.HttpMethodBase - Resorting to protocol version default close connection policy
2019-07-02 10:36:59,153 DEBUG httpclient.HttpMethodBase - Should NOT close connection, using HTTP/1.1
2019-07-02 10:36:59,153 TRACE httpclient.HttpConnection - enter HttpConnection.isResponseAvailable()
2019-07-02 10:36:59,153 TRACE httpclient.HttpConnection - enter HttpConnection.releaseConnection()
2019-07-02 10:36:59,153 DEBUG httpclient.HttpConnection - Releasing connection back to connection manager.
2019-07-02 10:36:59,153 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.releaseConnection(HttpConnection)
2019-07-02 10:36:59,153 DEBUG httpclient.MultiThreadedHttpConnectionManager - Freeing connection, hostConfig=HostConfiguration[host=https://pilot.mysite.sitecorp.com]
2019-07-02 10:36:59,153 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
2019-07-02 10:36:59,153 DEBUG util.IdleConnectionHandler - Adding connection at: 1562078219153
2019-07-02 10:36:59,153 DEBUG httpclient.MultiThreadedHttpConnectionManager - Notifying no-one, there are no waiting threads
2019-07-02 10:36:59,202 DEBUG httpclient.HttpFormAuthentication - No form element found with 'id' = user-login-form, trying 'name'.
2019-07-02 10:36:59,205 DEBUG httpclient.HttpFormAuthentication - No form element found with 'name' = user-login-form
2019-07-02 10:36:59,205 ERROR httpclient.Http - Failed to get protocol output
java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: user-login-form
        at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:500)
        at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:177)
        at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:320)
        at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:343)
Caused by: java.lang.IllegalArgumentException: No form exists: user-login-form
        at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.getLoginFormParams(HttpFormAuthentication.java:219)
        at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.login(HttpFormAuthentication.java:95)
        at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:498)
        ... 3 more
2019-07-02 10:36:59,209 INFO  fetcher.FetcherThread - FetcherThread 41 fetch of https://pilot.mysite.sitecorp.com/user/login failed with: java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: user-login-form
2019-07-02 10:36:59,210 INFO  fetcher.FetcherThread - FetcherThread 41 has no more work available
2019-07-02 10:36:59,210 INFO  fetcher.FetcherThread - FetcherThread 41 -finishing thread FetcherThread, activeThreads=0
2019-07-02 10:36:59,215 INFO  mapreduce.Job - Job job_local487279790_0001 running in uber mode : false
2019-07-02 10:36:59,216 INFO  mapreduce.Job -  map 0% reduce 0%
2019-07-02 10:36:59,635 INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
2019-07-02 10:36:59,635 INFO  fetcher.Fetcher - -activeThreads=0
2019-07-02 10:37:00,218 INFO  mapreduce.Job -  map 100% reduce 100%
2019-07-02 10:37:00,218 INFO  mapreduce.Job - Job job_local487279790_0001 completed successfully
