Re: Consistent recommendation for submitting Spark apps to YARN: --master yarn --deploy-mode x vs. --master yarn-x
Following up on this thread to see if anyone has thoughts or opinions on the approach mentioned below.

Guru Medasani
gdm...@gmail.com

On Aug 3, 2015, at 10:20 PM, Guru Medasani gdm...@gmail.com wrote:

Hi,

I was looking at the spark-submit and spark-shell --help output on both versions (Spark 1.3.1 and Spark 1.5-SNAPSHOT) and at the Spark documentation for submitting Spark applications to YARN. There seems to be a mismatch between the preferred syntax and the documentation.

The Spark documentation (http://spark.apache.org/docs/latest/submitting-applications.html#master-urls) says that we need to specify either yarn-cluster or yarn-client as the master to connect to a YARN cluster:

yarn-client: Connect to a YARN cluster in client mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
yarn-cluster: Connect to a YARN cluster in cluster mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.

The spark-submit --help output, on the other hand, says to use --master yarn with --deploy-mode cluster or client:

Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (Default: client).

I want to bring this to your attention because it is confusing for someone running Spark on YARN. For example, they look at the spark-submit help output and start using that syntax, but when they look at the online documentation or the user mailing list, they see a different spark-submit syntax.
From a quick discussion with other engineers at Cloudera, it seems --deploy-mode is preferred, as it is more consistent with the way things are done with the other cluster managers; i.e., there is no standalone-cluster or standalone-client master. The same applies to Mesos.

Either syntax works, but I would like to propose using '--master yarn --deploy-mode x' instead of '--master yarn-cluster' or '--master yarn-client', as it is consistent with the other cluster managers. This would require updating all Spark documentation pages related to submitting Spark applications to YARN. So far I've identified the following pages:

1) http://spark.apache.org/docs/latest/running-on-yarn.html
2) http://spark.apache.org/docs/latest/submitting-applications.html#master-urls

There is a JIRA to track progress on this as well: https://issues.apache.org/jira/browse/SPARK-9570

The option we choose dictates whether we update the documentation or the spark-submit and spark-shell help pages. Any thoughts on which direction we should go?

Guru Medasani
gdm...@gmail.com
Consistent recommendation for submitting Spark apps to YARN: --master yarn --deploy-mode x vs. --master yarn-x
Hi,

I was looking at the spark-submit and spark-shell --help output on both versions (Spark 1.3.1 and Spark 1.5-SNAPSHOT) and at the Spark documentation for submitting Spark applications to YARN. There seems to be a mismatch between the preferred syntax and the documentation.

The Spark documentation (http://spark.apache.org/docs/latest/submitting-applications.html#master-urls) says that we need to specify either yarn-cluster or yarn-client as the master to connect to a YARN cluster:

yarn-client: Connect to a YARN cluster in client mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
yarn-cluster: Connect to a YARN cluster in cluster mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.

The spark-submit --help output, on the other hand, says to use --master yarn with --deploy-mode cluster or client:

Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (Default: client).

I want to bring this to your attention because it is confusing for someone running Spark on YARN. For example, they look at the spark-submit help output and start using that syntax, but when they look at the online documentation or the user mailing list, they see a different spark-submit syntax.

From a quick discussion with other engineers at Cloudera, it seems --deploy-mode is preferred, as it is more consistent with the way things are done with the other cluster managers; i.e., there is no standalone-cluster or standalone-client master. The same applies to Mesos.
Either syntax works, but I would like to propose using '--master yarn --deploy-mode x' instead of '--master yarn-cluster' or '--master yarn-client', as it is consistent with the other cluster managers. This would require updating all Spark documentation pages related to submitting Spark applications to YARN. So far I've identified the following pages:

1) http://spark.apache.org/docs/latest/running-on-yarn.html
2) http://spark.apache.org/docs/latest/submitting-applications.html#master-urls

There is a JIRA to track progress on this as well: https://issues.apache.org/jira/browse/SPARK-9570

The option we choose dictates whether we update the documentation or the spark-submit and spark-shell help pages. Any thoughts on which direction we should go?

Guru Medasani
gdm...@gmail.com
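To make the two syntaxes being compared concrete, here is a sketch of both spark-submit invocations (the application jar name and main class below are hypothetical placeholders, not from the thread):

```shell
# Proposed, manager-agnostic form: the master and the deploy mode are
# expressed as two separate flags, matching standalone and Mesos usage.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.example.MyApp \
  my-app.jar

# Older documented form: the deploy mode is baked into the master string.
spark-submit \
  --master yarn-cluster \
  --class org.example.MyApp \
  my-app.jar
```

Both launch the driver inside the YARN cluster; the first form is the one proposed for the documentation since switching deploy modes then never requires changing the master URL.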
Re: PSA: Maven 3.3.3 now required to build
Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT this morning:

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message: Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

Should we be using Maven 3.3.3 locally or build/mvn starting from Spark 1.4.1, or only from Spark 1.5?

Guru Medasani
gdm...@gmail.com

On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:

If you use build/mvn or are already using Maven 3.3.3 locally (i.e. via brew on OS X), then this won't affect you, but I wanted to call attention to https://github.com/apache/spark/pull/7852, which makes Maven 3.3.3 the minimum required to build Spark. This heads off problems from some behavior differences that Patrick and I observed between 3.3 and 3.2 last week, on top of the dependency-reduced POM glitch from the 1.4.1 release window. Again, all you need to do is use build/mvn if you don't already have the latest Maven installed and all will be well.

Sean

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
Re: PSA: Maven 3.3.3 now required to build
Thanks Sean. The reason I asked is that the Building Spark documentation for 1.4.1 (https://spark.apache.org/docs/latest/building-spark.html) still says:

Building Spark using Maven requires Maven 3.0.4 or newer and Java 6+.

But I noticed the following warnings from the build of Spark version 1.5.0-SNAPSHOT, so I was wondering whether the changes you mentioned apply only to newer versions of Spark or to 1.4.1 as well:

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message: Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.
[WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed with message: Detected JDK Version: 1.6.0-36 is not in the allowed range 1.7.

Guru Medasani
gdm...@gmail.com

On Aug 3, 2015, at 2:38 PM, Sean Owen so...@cloudera.com wrote:

Using ./build/mvn should always be fine. Your local mvn is fine too if it's 3.3.3 or later (3.3.3 is the latest). That's what any brew users on OS X out there will have, by the way.

On Mon, Aug 3, 2015 at 8:37 PM, Guru Medasani gdm...@gmail.com wrote:

Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT this morning:

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message: Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

Should we be using Maven 3.3.3 locally or build/mvn starting from Spark 1.4.1, or only from Spark 1.5?

Guru Medasani
gdm...@gmail.com

On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:

If you use build/mvn or are already using Maven 3.3.3 locally (i.e. via brew on OS X), then this won't affect you, but I wanted to call attention to https://github.com/apache/spark/pull/7852, which makes Maven 3.3.3 the minimum required to build Spark.
This heads off problems from some behavior differences that Patrick and I observed between 3.3 and 3.2 last week, on top of the dependency-reduced POM glitch from the 1.4.1 release window. Again, all you need to do is use build/mvn if you don't already have the latest Maven installed and all will be well.

Sean
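The enforcer rule quoted above boils down to an ordered comparison of the detected Maven version against a minimum. A rough shell sketch of that same check (version_ok is a hypothetical helper written for this illustration, not part of Spark's build; it assumes GNU sort's -V version ordering is available):

```shell
# version_ok CURRENT MINIMUM
# Succeeds if CURRENT >= MINIMUM under version-number ordering: we sort the
# two versions with sort -V and check that MINIMUM comes out first (or equal).
version_ok() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# The two cases from the thread: 3.2.5 trips the rule, 3.3.3+ passes it.
if version_ok "3.2.5" "3.3.3"; then echo "3.2.5 ok"; else echo "3.2.5 too old"; fi
if version_ok "3.3.9" "3.3.3"; then echo "3.3.9 ok"; else echo "3.3.9 too old"; fi

# If the local Maven is too old, the bundled launcher downloads a suitable one:
#   ./build/mvn -DskipTests clean package
```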
Re: Welcoming some new committers
Congratulations to all the new committers!

Guru Medasani
gdm...@gmail.com

On Jun 17, 2015, at 5:12 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

Hey all,

Over the past 1.5 months we added a number of new committers to the project, and I wanted to welcome them now that all of their respective forms, accounts, etc. are in. Join me in welcoming the following new committers:

- Davies Liu
- DB Tsai
- Kousuke Saruta
- Sandy Ryza
- Yin Huai

Looking forward to more great contributions from all of these folks.

Matei
Re: Reporting serialized task size after task broadcast change?
I thought we could see this on the Spark Web UI storage tab. Maybe I was looking at something else.

On Sep 11, 2014, at 8:47 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

Hmm, well I can't find it now, must have been hallucinating. Do you know off the top of your head where I'd be able to find the size to log it?

On Thu, Sep 11, 2014 at 6:33 PM, Reynold Xin r...@databricks.com wrote:

I didn't know about that.

On Thu, Sep 11, 2014 at 6:29 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

It used to be available on the UI, no?

On Thu, Sep 11, 2014 at 6:26 PM, Reynold Xin r...@databricks.com wrote:

I don't think so. We should probably add a line to log it.

On Thursday, September 11, 2014, Sandy Ryza sandy.r...@cloudera.com wrote:

After the change to broadcast all task data, is there any easy way to discover the serialized size of the data getting sent down for a task?

thanks,
-Sandy
RE: Welcoming two new committers
Congrats Joey and Andrew!

Sent from my Windows Phone

From: Andrew Or <and...@databricks.com>
Sent: 8/9/2014 2:43 AM
To: Prashant Sharma <scrapco...@gmail.com>
Cc: Xiangrui Meng <men...@gmail.com>; Christopher Nguyen <c...@adatao.com>; Joseph Gonzalez <jegon...@eecs.berkeley.edu>; Matei Zaharia <ma...@databricks.com>; d...@spark.incubator.apache.org
Subject: Re: Welcoming two new committers

Thanks everyone. I look forward to continuing to work with all of you!

2014-08-08 3:23 GMT-07:00 Prashant Sharma <scrapco...@gmail.com>:

Congratulations Andrew and Joey.

Prashant Sharma

On Fri, Aug 8, 2014 at 2:10 PM, Xiangrui Meng <men...@gmail.com> wrote:

Congrats, Joey and Andrew!!

-Xiangrui

On Fri, Aug 8, 2014 at 12:14 AM, Christopher Nguyen <c...@adatao.com> wrote:

+1 Joey and Andrew :)

--
Christopher T. Nguyen
Co-founder and CEO, Adatao (http://adatao.com) [ah-'DAY-tao]
linkedin.com/in/ctnguyen

On Thu, Aug 7, 2014 at 10:39 PM, Joseph Gonzalez <jegon...@eecs.berkeley.edu> wrote:

Hi Everyone,

Thank you for inviting me to be a committer. I look forward to working with everyone to ensure the continued success of the Spark project.

Thanks!
Joey

On Thu, Aug 7, 2014 at 9:57 PM, Matei Zaharia <ma...@databricks.com> wrote:

Hi everyone,

The PMC recently voted to add two new committers and PMC members: Joey Gonzalez and Andrew Or. Both have been huge contributors in the past year -- Joey on much of GraphX as well as quite a bit of the initial work in MLlib, and Andrew on Spark Core. Join me in welcoming them as committers!

Matei