Thank you, Andrew. That makes sense to me now. I was confused by "In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster" in http://spark.apache.org/docs/latest/running-on-yarn.html . After your explanation, it's clear now. Thank you.
Best,

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108

On Mon, Jul 7, 2014 at 1:07 PM, Andrew Or <and...@databricks.com> wrote:

> @Yan, the UI should still work. As long as you look into the container that launches the driver, you will find the SparkUI address and port. Note that in yarn-cluster mode the Spark driver doesn't actually run in the Application Manager; just like the executors, it runs in a container that is launched by the Resource Manager after the Application Master requests the container resources. In contrast, in yarn-client mode, your driver is not launched in a container, but in the client process that launched your application (i.e. spark-submit), so the stdout of this program directly contains the SparkUI messages.
>
> @Chester, I'm not sure what has gone wrong, as there are many factors at play here. When you go to the Resource Manager UI, does the "application URL" link point you to the same SparkUI address as indicated in the logs? If so, this is the correct behavior. However, I believe the redirect error has little to do with Spark itself, and more to do with how you set up the cluster. I have actually run into this myself, but I haven't found a workaround. Let me know if you find anything.
>
> 2014-07-07 12:07 GMT-07:00 Chester Chen <ches...@alpinenow.com>:
>
>> As Andrew explained, the port is random rather than 4040, as the Spark driver is started in the Application Master and the port is randomly selected.
>>
>> But I have a similar UI issue. I am running yarn-cluster mode against my local CDH5 cluster.
>> The log states:
>>
>> "14/07/07 11:59:29 INFO ui.SparkUI: Started SparkUI at http://10.0.0.63:58750"
>>
>> but when I click the Spark UI link (ApplicationMaster or http://10.0.0.63:58750), I get a 404 with the redirect URI
>>
>> http://localhost/proxy/application_1404443455764_0010/
>>
>> Looking at the Spark code, I noticed that the "proxy" is really a variable resolved from the proxy HTTP address in yarn-site.xml. But even when I specified the value in yarn-site.xml, it still doesn't work for me.
>>
>> Oddly enough, it works for my co-worker on a Pivotal HD cluster, so I am still looking into what the difference is in terms of cluster setup or something else.
>>
>> Chester
>>
>> On Mon, Jul 7, 2014 at 11:42 AM, Andrew Or <and...@databricks.com> wrote:
>>
>>> I will assume that you are running in yarn-cluster mode. Because the driver is launched in one of the containers, it doesn't make sense to expose port 4040 for the node that contains the container. (Imagine if multiple driver containers are launched on the same node; this would cause a port collision.) If you're launching Spark from a gateway node that is physically near your worker nodes, then you can just launch your application in yarn-client mode, in which case the SparkUI will always be started on port 4040 on the node that you ran spark-submit on. The reason why you sometimes see the red text is that it appears only in the driver containers, not the executor containers. This is because the SparkUI belongs to the SparkContext, which only exists on the driver.
>>>
>>> Andrew
>>>
>>> 2014-07-07 11:20 GMT-07:00 Yan Fang <yanfang...@gmail.com>:
>>>
>>>> Hi guys,
>>>>
>>>> Not sure if you have similar issues. I did not find relevant tickets in JIRA. When I deploy Spark Streaming to YARN, I have the following two issues:
>>>>
>>>> 1. The UI port is random. It is not the default 4040.
>>>> I have to look at the container's log to check the UI port. Is this supposed to be this way?
>>>>
>>>> 2. Most of the time, the UI does not work. The difference between the logs is (I ran the same program):
>>>>
>>>> *14/07/03 11:38:50 INFO spark.HttpServer: Starting HTTP Server
>>>> 14/07/03 11:38:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>> 14/07/03 11:38:50 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:12026
>>>> 14/07/03 11:38:51 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 0
>>>> 14/07/03 11:38:51 INFO executor.Executor: Running task ID 0...*
>>>>
>>>> 14/07/02 16:55:32 INFO spark.HttpServer: Starting HTTP Server
>>>> 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>> 14/07/02 16:55:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14211
>>>>
>>>> *14/07/02 16:55:32 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
>>>> 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>> 14/07/02 16:55:32 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:21867
>>>> 14/07/02 16:55:32 INFO ui.SparkUI: Started SparkUI at http://myNodeName:21867
>>>> 14/07/02 16:55:32 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler*
>>>>
>>>> When the red part comes, the UI works sometimes. Any ideas? Thank you.
>>>>
>>>> Best,
>>>>
>>>> Fang, Yan
>>>> yanfang...@gmail.com
>>>> +1 (206) 849-4108
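On the yarn-site.xml question raised by Chester: the AmIpFilter redirects SparkUI requests through the YARN web proxy, and a redirect that lands on http://localhost/proxy/... usually means the proxy host resolved to localhost. A hedged sketch of the two relevant properties follows; the hostname and ports are placeholders, not values from this thread, and whether this fixes the redirect depends on the cluster's DNS/host setup:

```xml
<!-- yarn-site.xml (placeholders; adjust host and ports to your cluster) -->
<property>
  <!-- If unset, the web proxy runs inside the ResourceManager process. -->
  <name>yarn.web-proxy.address</name>
  <value>rm-host.example.com:8089</value>
</property>
<property>
  <!-- RM web UI address; proxy links are built relative to this. -->
  <name>yarn.resourcemanager.webapp.address</name>
  <value>rm-host.example.com:8088</value>
</property>
```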
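As a quick illustration of the log-scraping step discussed in this thread (finding the random SparkUI port in the driver container's log), here is a minimal shell sketch. The sample log line is the one from Chester's message; in practice you would pipe the output of `yarn logs -applicationId <appId>` (or the container's stdout from the NodeManager log directory) through the same filter instead of a hard-coded string:

```shell
# Sample driver log line, as seen earlier in this thread.
# Real usage (assumption, adjust to your cluster):
#   yarn logs -applicationId application_1404443455764_0010 | sed -n 's/.*Started SparkUI at //p'
line='14/07/07 11:59:29 INFO ui.SparkUI: Started SparkUI at http://10.0.0.63:58750'

# Keep only the address that follows "Started SparkUI at".
echo "$line" | sed -n 's/.*Started SparkUI at //p'
# → http://10.0.0.63:58750
```

Since the message is printed only by the driver (the SparkUI belongs to the SparkContext, as Andrew notes), only the driver container's log will match.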