It's possible that Spark sets the executor environment explicitly, in which case the http_proxy and https_proxy environment variables would not be passed along to the executor. You could try the `--executor_environment_variables` command-line flag when starting the agent to specify these variables, ensuring that they get passed through; see the sketch below.
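For example, a minimal sketch of what that might look like on Mesos 0.24.x, where the agent binary is still named mesos-slave. The master address and the Squid host/port (proxy.example.com:3128) are placeholders, not values from this thread; the flag takes a JSON object whose entries are injected into each executor's environment:

    # Placeholder master address and proxy host/port -- substitute your own.
    # --executor_environment_variables takes a JSON object of variables to
    # set in every executor's environment.
    mesos-slave --master=master.example.com:5050 \
      --executor_environment_variables='{
        "http_proxy": "http://proxy.example.com:3128",
        "https_proxy": "http://proxy.example.com:3128"
      }'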
On Sat, Oct 31, 2015 at 12:06 AM, Zhongyue Luo <[email protected]> wrote:

> Any advice on this issue? I'm having the same problem.
>
> On Fri, Oct 9, 2015 at 4:13 AM, David M <[email protected]> wrote:
>
>> Hi everyone.
>>
>> I have a Mesos cluster (0.24.1) for running Spark (1.5.2) that runs great.
>>
>> I have a requirement to move my Mesos cluster nodes behind a Squid HTTP
>> proxy. All cluster nodes previously had direct outbound Internet access,
>> so accessing SPARK_EXECUTOR_URI from a public source was not a problem.
>>
>> System-wide I have the http_proxy and https_proxy environment variables
>> set. Command-line tools like curl and wget work fine against Internet
>> resources, and after configuring Maven's proxy settings the Mesos build
>> completed successfully.
>>
>> I copied my /etc/hosts file to HDFS and attempted the WordCount example
>> from: http://documentation.altiscale.com/spark-shell-examples-1-1
>>
>> It failed with this in the executor's stderr file:
>>
>> I1008 15:39:48.417644 21698 logging.cpp:172] INFO level logging started!
>> I1008 15:39:48.417819 21698 fetcher.cpp:414] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20151007-154648-2701359370-5050-25191-S3\/spark","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/d3kbcqa49mib13.cloudfront.net\/spark-1.5.1-bin-hadoop2.6.tgz"}}],"sandbox_directory":"\/var\/run\/mesos\/slaves\/20151007-154648-2701359370-5050-25191-S3\/frameworks\/20151008-123957-2701359370-5050-6382-0001\/executors\/20151007-154648-2701359370-5050-25191-S3\/runs\/507827fb-cfb0-4a1d-977d-9b9afb972c29","user":"spark"}
>> I1008 15:39:48.418918 21698 fetcher.cpp:369] Fetching URI 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz'
>> I1008 15:39:48.418936 21698 fetcher.cpp:243] Fetching directly into the sandbox directory
>> I1008 15:39:48.418949 21698 fetcher.cpp:180] Fetching URI 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz'
>> I1008 15:39:48.418958 21698 fetcher.cpp:127] Downloading resource from 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz' to '/var/run/mesos/slaves/20151007-154648-2701359370-5050-25191-S3/frameworks/20151008-123957-2701359370-5050-6382-0001/executors/20151007-154648-2701359370-5050-25191-S3/runs/507827fb-cfb0-4a1d-977d-9b9afb972c29/spark-1.5.1-bin-hadoop2.6.tgz'
>> Failed to fetch 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz': Error downloading resource, received HTTP return code 400
>> Failed to synchronize with slave (it's probably exited)
>>
>> Long troubleshooting story short, it appears that libcurl isn't finding
>> out about my proxy.
>>
>> In ./3rdparty/libprocess/3rdparty/stout/include/stout/posix/net.hpp I added
>>
>>     curl_easy_setopt(curl, CURLOPT_VERBOSE, true);
>>     curl_easy_setopt(curl, CURLOPT_PROXY, "<my squid server hostname here>");
>>     curl_easy_setopt(curl, CURLOPT_PROXYPORT, <my squid server port here>);
>>
>> before
>>
>>     CURLcode curlErrorCode = curl_easy_perform(curl);
>>
>> I then recompiled Mesos, and the WordCount example now succeeds.
>>
>> What is the correct way to set the proxy so that libcurl will make use of it?
>>
>> Thank you.
>> David
>
> --
> *Intel SSG/STO/BDT*
> 880 Zixing Road, Zizhu Science Park, Minhang District, 200241, Shanghai, China
> +862161166500

