Any insight into this error?

On Tue, Jul 2, 2019 at 12:23 PM Susheel Kumar <susheel2...@gmail.com> wrote:

> Hello Nutch Users,
>
> I am a first-time Nutch user and have been trying to crawl an intranet
> portal, https://pilot.mysite.sitecorp.com/user/login, using Nutch 1.15,
> and I always get the "No form exists: user-login-form" error below. I tried
> crawling another login page, https://urs.earthdata.nasa.gov/, and do not
> see this error there, but for this intranet site I always get it.
>
> I tried crawling the same URL/login page using Selenium ChromeDriver, and
> it does load the page and fill in the user ID and password text boxes.
>
> What could be wrong? How can I troubleshoot this further?
>
> Thanks in advance.
>
>  2019-07-02 10:36:59,152 DEBUG httpclient.HttpMethodBase - Resorting to
> protocol version default close connection policy
> 2019-07-02 10:36:59,153 DEBUG httpclient.HttpMethodBase - Should NOT close
> connection, using HTTP/1.1
> 2019-07-02 10:36:59,153 TRACE httpclient.HttpConnection - enter
> HttpConnection.isResponseAvailable()
> 2019-07-02 10:36:59,153 TRACE httpclient.HttpConnection - enter
> HttpConnection.releaseConnection()
> 2019-07-02 10:36:59,153 DEBUG httpclient.HttpConnection - Releasing
> connection back to connection manager.
> 2019-07-02 10:36:59,153 TRACE
> httpclient.MultiThreadedHttpConnectionManager - enter
> HttpConnectionManager.releaseConnection(HttpConnection)
> 2019-07-02 10:36:59,153 DEBUG
> httpclient.MultiThreadedHttpConnectionManager - Freeing connection,
> hostConfig=HostConfiguration[host=https://pilot.mysite.sitecorp.com]
> 2019-07-02 10:36:59,153 TRACE
> httpclient.MultiThreadedHttpConnectionManager - enter
> HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
> 2019-07-02 10:36:59,153 DEBUG util.IdleConnectionHandler - Adding
> connection at: 1562078219153
> 2019-07-02 10:36:59,153 DEBUG
> httpclient.MultiThreadedHttpConnectionManager - Notifying no-one, there are
> no waiting threads
> 2019-07-02 10:36:59,202 DEBUG httpclient.HttpFormAuthentication - No form
> element found with 'id' = user-login-form, trying 'name'.
> 2019-07-02 10:36:59,205 DEBUG httpclient.HttpFormAuthentication - No form
> element found with 'name' = user-login-form
> 2019-07-02 10:36:59,205 ERROR httpclient.Http - Failed to get protocol
> output
> java.lang.RuntimeException: java.lang.IllegalArgumentException: No form
> exists: user-login-form
>         at
> org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:500)
>         at
> org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:177)
>         at
> org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:320)
>         at
> org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:343)
> Caused by: java.lang.IllegalArgumentException: No form exists:
> user-login-form
>         at
> org.apache.nutch.protocol.httpclient.HttpFormAuthentication.getLoginFormParams(HttpFormAuthentication.java:219)
>         at
> org.apache.nutch.protocol.httpclient.HttpFormAuthentication.login(HttpFormAuthentication.java:95)
>         at
> org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:498)
>         ... 3 more
> 2019-07-02 10:36:59,209 INFO  fetcher.FetcherThread - FetcherThread 41
> fetch of https://pilot.mysite.sitecorp.com/user/login failed with:
> java.lang.RuntimeException: java.lang.IllegalArgumentException: No form
> exists: user-login-form
> 2019-07-02 10:36:59,210 INFO  fetcher.FetcherThread - FetcherThread 41 has
> no more work available
> 2019-07-02 10:36:59,210 INFO  fetcher.FetcherThread - FetcherThread 41
> -finishing thread FetcherThread, activeThreads=0
> 2019-07-02 10:36:59,215 INFO  mapreduce.Job - Job job_local487279790_0001
> running in uber mode : false
> 2019-07-02 10:36:59,216 INFO  mapreduce.Job -  map 0% reduce 0%
> 2019-07-02 10:36:59,635 INFO  fetcher.Fetcher - -activeThreads=0,
> spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
> 2019-07-02 10:36:59,635 INFO  fetcher.Fetcher - -activeThreads=0
> 2019-07-02 10:37:00,218 INFO  mapreduce.Job -  map 100% reduce 100%
> 2019-07-02 10:37:00,218 INFO  mapreduce.Job - Job job_local487279790_0001
> completed successfully
>
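
One way to narrow this down: the log shows the form being looked up first by 'id' and then by 'name', with both lookups failing. Since Selenium can load and fill the form, one possible explanation (an assumption, not confirmed by the log) is that the HTML returned to a non-browser client does not contain the form at all, for example because it is built by JavaScript. The sketch below is not Nutch's code. It is a standalone Python check that mimics the id-then-name lookup on a saved copy of the raw HTML. The names FormCollector, find_form, static_page and script_page are made up for illustration.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collects the attributes of every <form> element in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "form" also matches <FORM>.
        if tag == "form":
            self.forms.append(dict(attrs))


def find_form(html, form_id):
    """Return the attributes of the first form whose 'id' equals form_id,
    falling back to 'name'. Returns None if no form matches."""
    collector = FormCollector()
    collector.feed(html)
    for attr in ("id", "name"):
        for form in collector.forms:
            if form.get(attr) == form_id:
                return form
    return None


# Hypothetical sample pages: one with the form in the raw HTML,
# one where the page body is filled in later by a script.
static_page = (
    '<html><body><form id="user-login-form" action="/user/login">'
    "</form></body></html>"
)
script_page = (
    '<html><body><div id="app"></div>'
    '<script src="/app.js"></script></body></html>'
)

print(find_form(static_page, "user-login-form"))
print(find_form(script_page, "user-login-form"))
```

If the raw HTML saved from the login URL (fetched without a browser) behaves like script_page here, the form is probably not present in what the crawler receives, which would match the error above.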
