Hi,

I am trying to crawl AJAX-based sites with Nutch using protocol-selenium
with the phantomjs driver. I am using apache-nutch-1.13 compiled from
Nutch's GitHub repository. These crawls are launched as tasks in a system
managed by Mesos. When I launch Nutch's crawl script from a terminal on the
server, everything works perfectly and the site is crawled as requested.
However, when I execute the same crawl script with the same parameters
inside a Mesos task, Nutch raises the following exception:

fetch of http://XXXXX failed with: java.lang.RuntimeException:
org.openqa.selenium.NoSuchElementException: {"errorMessage":"Unable to
find element with tag name
'body'","request":{"headers":{"Accept-Encoding":"gzip,deflate","Connection":"Keep-Alive","Content-Length":"35","Content-Type":"application/json;
charset=utf-8","Host":"localhost:12215","User-Agent":"Apache-HttpClient/4.3.5
(java 1.5)"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"tag
name\",\"value\":\"body\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/a7f98ec0-b8aa-11e6-8b84-232b0d8e1024/element"}}

My first impression was that something was wrong with the environment
variables (HADOOP_HOME, PATH, CLASSPATH...), but I set the same env vars
in the Nutch crawl script as in the terminal and I still get the same
result.
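
To double-check that the two environments really match, the output of
`env` from the terminal and from the Mesos task could be compared. The
following is only a minimal sketch, assuming each output was saved to a
file; the file names terminal.env and mesos.env and the helper names are
hypothetical, not part of Nutch or Mesos.

```python
import sys


def parse_env(text):
    """Parse `env` output (one KEY=VALUE per line) into a dict."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            env[key] = value
    return env


def diff_env(terminal_text, task_text):
    """Return {name: (terminal_value, task_value)} for variables that differ.

    A value of None means the variable is missing on that side.
    """
    a, b = parse_env(terminal_text), parse_env(task_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Hypothetical usage: python diff_env.py terminal.env mesos.env
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for name, (term, task) in diff_env(f1.read(), f2.read()).items():
            print(f"{name}: terminal={term!r} mesos={task!r}")
```

Any variable printed here is set differently, or set on only one side,
which would narrow down whether the environment is really identical.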

Any ideas about what I am doing wrong?

Best regards
Carlos Pérez Miguel
