Any insight into this error?

On Tue, Jul 2, 2019 at 12:23 PM Susheel Kumar <susheel2...@gmail.com> wrote:
> Hello Nutch Users,
>
> I am a first-time Nutch user and have been trying to crawl an intranet
> portal, https://pilot.mysite.sitecorp.com/user/login, using Nutch 1.15.
> I always get the "No form exists: user-login-form" error shown below.
> I tried crawling another login page, https://urs.earthdata.nasa.gov/,
> and do not see the error there, but for this intranet site I always get it.
>
> I tried loading the same URL/login page using Selenium ChromeDriver, and
> it does load and fill in the user id/pwd text boxes.
>
> What could be wrong? How can I troubleshoot this further?
>
> Thanks in advance.
>
> 2019-07-02 10:36:59,152 DEBUG httpclient.HttpMethodBase - Resorting to protocol version default close connection policy
> 2019-07-02 10:36:59,153 DEBUG httpclient.HttpMethodBase - Should NOT close connection, using HTTP/1.1
> 2019-07-02 10:36:59,153 TRACE httpclient.HttpConnection - enter HttpConnection.isResponseAvailable()
> 2019-07-02 10:36:59,153 TRACE httpclient.HttpConnection - enter HttpConnection.releaseConnection()
> 2019-07-02 10:36:59,153 DEBUG httpclient.HttpConnection - Releasing connection back to connection manager.
> 2019-07-02 10:36:59,153 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.releaseConnection(HttpConnection)
> 2019-07-02 10:36:59,153 DEBUG httpclient.MultiThreadedHttpConnectionManager - Freeing connection, hostConfig=HostConfiguration[host=https://pilot.mysite.sitecorp.com]
> 2019-07-02 10:36:59,153 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
> 2019-07-02 10:36:59,153 DEBUG util.IdleConnectionHandler - Adding connection at: 1562078219153
> 2019-07-02 10:36:59,153 DEBUG httpclient.MultiThreadedHttpConnectionManager - Notifying no-one, there are no waiting threads
> 2019-07-02 10:36:59,202 DEBUG httpclient.HttpFormAuthentication - No form element found with 'id' = user-login-form, trying 'name'.
> 2019-07-02 10:36:59,205 DEBUG httpclient.HttpFormAuthentication - No form element found with 'name' = user-login-form
> 2019-07-02 10:36:59,205 ERROR httpclient.Http - Failed to get protocol output
> java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: user-login-form
>     at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:500)
>     at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:177)
>     at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:320)
>     at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:343)
> Caused by: java.lang.IllegalArgumentException: No form exists: user-login-form
>     at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.getLoginFormParams(HttpFormAuthentication.java:219)
>     at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.login(HttpFormAuthentication.java:95)
>     at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:498)
>     ... 3 more
> 2019-07-02 10:36:59,209 INFO fetcher.FetcherThread - FetcherThread 41 fetch of https://pilot.mysite.sitecorp.com/user/login failed with: java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: user-login-form
> 2019-07-02 10:36:59,210 INFO fetcher.FetcherThread - FetcherThread 41 has no more work available
> 2019-07-02 10:36:59,210 INFO fetcher.FetcherThread - FetcherThread 41 -finishing thread FetcherThread, activeThreads=0
> 2019-07-02 10:36:59,215 INFO mapreduce.Job - Job job_local487279790_0001 running in uber mode : false
> 2019-07-02 10:36:59,216 INFO mapreduce.Job - map 0% reduce 0%
> 2019-07-02 10:36:59,635 INFO fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
> 2019-07-02 10:36:59,635 INFO fetcher.Fetcher - -activeThreads=0
> 2019-07-02 10:37:00,218 INFO mapreduce.Job - map 100% reduce 100%
> 2019-07-02 10:37:00,218 INFO mapreduce.Job - Job job_local487279790_0001 completed successfully
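The two DEBUG lines from HttpFormAuthentication in the log above show the form being looked up first by 'id' and then by 'name', with neither found in the HTML that was fetched. One way to narrow this down might be to save the raw HTML that a non-browser client receives for the login URL and run the same id-then-name check on it offline. The following is only a rough sketch in Python, not Nutch's own code: `FormCollector`, `find_form`, and both sample pages are hypothetical names and data made up for illustration. If the saved HTML contains no matching form while the page in a browser does, the form may be rendered by JavaScript or served only after a redirect, though the log alone does not confirm either.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collects the (id, name) attributes of every <form> start tag."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "form":
            attr_map = dict(attrs)
            self.forms.append((attr_map.get("id"), attr_map.get("name")))


def find_form(html, form_key):
    """Look for a form by 'id' first, then by 'name'.

    Returns (matched_attribute, all_forms), where matched_attribute is
    "id", "name", or None when no form matches form_key.
    """
    collector = FormCollector()
    collector.feed(html)
    for form_id, _ in collector.forms:
        if form_id == form_key:
            return "id", collector.forms
    for _, form_name in collector.forms:
        if form_name == form_key:
            return "name", collector.forms
    return None, collector.forms


# Hypothetical sample responses, NOT taken from the real site.
static_page = (
    '<html><body>'
    '<form id="user-login-form" action="/user/login"></form>'
    '</body></html>'
)
script_page = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)

if __name__ == "__main__":
    for label, page in (("static", static_page), ("script-only", script_page)):
        matched, forms = find_form(page, "user-login-form")
        print(label, "matched by:", matched, "forms seen:", forms)
```

Listing every form the parser sees, rather than only reporting a match or a miss, also shows whether the page has a login form under a different id or name than the one configured.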