Re: Consistent recommendation for submitting Spark apps to YARN, --master yarn --deploy-mode x vs --master yarn-x

2015-08-05 Thread Guru Medasani
Following up on this thread to see if anyone has thoughts or opinions on
the approach mentioned below.


Guru Medasani
gdm...@gmail.com



 On Aug 3, 2015, at 10:20 PM, Guru Medasani gdm...@gmail.com wrote:
 
 Hi,
 
 I was looking at the spark-submit and spark-shell --help output on both
 Spark 1.3.1 and Spark 1.5-SNAPSHOT, as well as the Spark documentation for
 submitting Spark applications to YARN. There seems to be a mismatch between
 the preferred syntax and the documentation.
 
 The Spark documentation
 (http://spark.apache.org/docs/latest/submitting-applications.html#master-urls)
 says that we need to specify either yarn-cluster or yarn-client to connect
 to a YARN cluster:
 
 yarn-client   Connect to a YARN cluster in client mode. The cluster location
               will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
 yarn-cluster  Connect to a YARN cluster in cluster mode. The cluster location
               will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
 The spark-submit --help output, on the other hand, says to use --master yarn
 together with --deploy-mode cluster or client:
 
 Usage: spark-submit [options] <app jar | python file> [app arguments]
 Usage: spark-submit --kill [submission ID] --master [spark://...]
 Usage: spark-submit --status [submission ID] --master [spark://...]
 
 Options:
   --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or
                               local.
   --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally
                               (client) or on one of the worker machines inside
                               the cluster (cluster) (Default: client).
 
 I want to bring this to your attention as it is a bit confusing for someone
 running Spark on YARN. For example, they look at the spark-submit help output
 and start using that syntax, but when they look at the online documentation
 or the user-group mailing list, they see different spark-submit syntax.
 
 From a quick discussion with other engineers at Cloudera, it seems that
 --deploy-mode is preferred as it is more consistent with the way things are
 done with other cluster managers; i.e., there are no standalone-cluster or
 standalone-client masters. The same applies to Mesos.
 
 Either syntax works, but I would like to propose using '--master yarn
 --deploy-mode x' instead of '--master yarn-cluster' or '--master yarn-client',
 as it is consistent with the other cluster managers. This would require
 updating all Spark pages related to submitting Spark applications to YARN.
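 
 To make the equivalence concrete, here is a minimal sketch of the two ways to
 submit the same application (the class name, jar, and argument are
 hypothetical, purely for illustration):
 
 # Current documented syntax: the deploy mode is encoded in the master URL.
 spark-submit --master yarn-cluster --class com.example.MyApp my-app.jar arg1
 
 # Proposed syntax: --master names the cluster manager, --deploy-mode the mode.
 spark-submit --master yarn --deploy-mode cluster \
   --class com.example.MyApp my-app.jar arg1
 
 Both forms resolve the YARN cluster location from HADOOP_CONF_DIR or
 YARN_CONF_DIR.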
 
 So far I’ve identified the following pages.
 
 1) http://spark.apache.org/docs/latest/running-on-yarn.html
 2) http://spark.apache.org/docs/latest/submitting-applications.html#master-urls
 
 There is a JIRA to track the progress on this as well.
 
 https://issues.apache.org/jira/browse/SPARK-9570
  
 The option we choose dictates whether we update the documentation or the
 spark-submit and spark-shell help output.
 
 Any thoughts on which direction we should go?
 
 Guru Medasani
 gdm...@gmail.com
 
 
 



Consistent recommendation for submitting Spark apps to YARN, --master yarn --deploy-mode x vs --master yarn-x

2015-08-03 Thread Guru Medasani
Hi,

I was looking at the spark-submit and spark-shell --help output on both Spark
1.3.1 and Spark 1.5-SNAPSHOT, as well as the Spark documentation for submitting
Spark applications to YARN. There seems to be a mismatch between the preferred
syntax and the documentation.

The Spark documentation
(http://spark.apache.org/docs/latest/submitting-applications.html#master-urls)
says that we need to specify either yarn-cluster or yarn-client to connect to a
YARN cluster:

yarn-client   Connect to a YARN cluster in client mode. The cluster location
              will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
yarn-cluster  Connect to a YARN cluster in cluster mode. The cluster location
              will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
The spark-submit --help output, on the other hand, says to use --master yarn
together with --deploy-mode cluster or client:

Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or
                              local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally
                              (client) or on one of the worker machines inside
                              the cluster (cluster) (Default: client).

I want to bring this to your attention as it is a bit confusing for someone
running Spark on YARN. For example, they look at the spark-submit help output
and start using that syntax, but when they look at the online documentation or
the user-group mailing list, they see different spark-submit syntax.

From a quick discussion with other engineers at Cloudera, it seems that
--deploy-mode is preferred as it is more consistent with the way things are
done with other cluster managers; i.e., there are no standalone-cluster or
standalone-client masters. The same applies to Mesos.

Either syntax works, but I would like to propose using '--master yarn
--deploy-mode x' instead of '--master yarn-cluster' or '--master yarn-client',
as it is consistent with the other cluster managers. This would require
updating all Spark pages related to submitting Spark applications to YARN.
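
The same point can be made with spark-shell. A minimal sketch of the two
equivalent ways to start a shell against YARN (the shell driver always runs
locally, so client mode is the relevant case):

# Current documented syntax
spark-shell --master yarn-client

# Proposed syntax; client is the default deploy mode, shown for explicitness
spark-shell --master yarn --deploy-mode client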

So far I’ve identified the following pages.

1) http://spark.apache.org/docs/latest/running-on-yarn.html
2) http://spark.apache.org/docs/latest/submitting-applications.html#master-urls

There is a JIRA to track the progress on this as well.

https://issues.apache.org/jira/browse/SPARK-9570
 
The option we choose dictates whether we update the documentation or the
spark-submit and spark-shell help output.

Any thoughts on which direction we should go?

Guru Medasani
gdm...@gmail.com





Re: PSA: Maven 3.3.3 now required to build

2015-08-03 Thread Guru Medasani
Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT 
this morning. 

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed
with message:
Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

Should we be using Maven 3.3.3 locally, or build/mvn, starting from Spark
1.4.1 or from Spark 1.5?
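
In case it helps anyone hitting the same warning, here is a minimal sketch of
building with the bundled Maven wrapper instead of a local install (the
profiles shown are illustrative and depend on your target Hadoop version):

# From the root of a Spark checkout; build/mvn fetches a suitable Maven
# under build/ if the local one is too old.
./build/mvn -Pyarn -Phadoop-2.4 -DskipTests clean package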

Guru Medasani
gdm...@gmail.com



 On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:
 
 If you use build/mvn or are already using Maven 3.3.3 locally (i.e.
 via brew on OS X), then this won't affect you, but I wanted to call
 attention to https://github.com/apache/spark/pull/7852 which makes
 Maven 3.3.3 the minimum required to build Spark. This heads off
 problems from some behavior differences that Patrick and I observed
 between 3.3 and 3.2 last week, on top of the dependency-reduced POM
 glitch from the 1.4.1 release window.
 
 Again all you need to do is use build/mvn if you don't already have
 the latest Maven installed and all will be well.
 
 Sean
 
 



Re: PSA: Maven 3.3.3 now required to build

2015-08-03 Thread Guru Medasani
Thanks Sean. The reason I asked is that in the Building Spark documentation
for 1.4.1, I still see this:

https://spark.apache.org/docs/latest/building-spark.html

Building Spark using Maven requires Maven 3.0.4 or newer and Java 6+.

But I noticed the following warnings while building Spark version
1.5.0-SNAPSHOT, so I was wondering whether the changes you mentioned apply only
to newer versions of Spark or to 1.4.1 as well.

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed 
with message:
Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

[WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed 
with message:
Detected JDK Version: 1.6.0-36 is not in the allowed range 1.7.
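
For anyone checking their toolchain against the new requirements, a quick
sketch (plain mvn and java shown; the same idea applies with ./build/mvn):

# Verify the Maven and JDK versions the build will pick up.
mvn -version     # should report Maven 3.3.3 or newer
java -version    # should report 1.7 or newer for 1.5.0-SNAPSHOT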

Guru Medasani
gdm...@gmail.com

 On Aug 3, 2015, at 2:38 PM, Sean Owen so...@cloudera.com wrote:
 
 Using ./build/mvn should always be fine. Your local mvn is fine too if
 it's 3.3.3 or later (3.3.3 is the latest). That's what any brew users
 on OS X out there will have, by the way.
 
 On Mon, Aug 3, 2015 at 8:37 PM, Guru Medasani gdm...@gmail.com wrote:
 Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT
 this morning.
 
 [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
 failed with message:
 Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.
 
 Should we be using maven 3.3.3 locally or build/mvn starting from Spark
 1.4.1 or Spark version 1.5?
 
 Guru Medasani
 gdm...@gmail.com
 
 
 
 On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:
 
 If you use build/mvn or are already using Maven 3.3.3 locally (i.e.
 via brew on OS X), then this won't affect you, but I wanted to call
 attention to https://github.com/apache/spark/pull/7852 which makes
 Maven 3.3.3 the minimum required to build Spark. This heads off
 problems from some behavior differences that Patrick and I observed
 between 3.3 and 3.2 last week, on top of the dependency-reduced POM
 glitch from the 1.4.1 release window.
 
 Again all you need to do is use build/mvn if you don't already have
 the latest Maven installed and all will be well.
 
 Sean
 
 
 



Re: Welcoming some new committers

2015-06-20 Thread Guru Medasani
Congratulations to all the new committers!

Guru Medasani
gdm...@gmail.com



 On Jun 17, 2015, at 5:12 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
 
 Hey all,
 
 Over the past 1.5 months we added a number of new committers to the project, 
 and I wanted to welcome them now that all of their respective forms, 
 accounts, etc. are in. Join me in welcoming the following new committers:
 
 - Davies Liu
 - DB Tsai
 - Kousuke Saruta
 - Sandy Ryza
 - Yin Huai
 
 Looking forward to more great contributions from all of these folks.
 
 Matei
 





Re: Reporting serialized task size after task broadcast change?

2014-09-12 Thread Guru Medasani
I thought we could see this on the Spark Web UI Storage tab. Maybe I was
looking at something else, though.

On Sep 11, 2014, at 8:47 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Hmm, well I can't find it now, must have been hallucinating.  Do you know
 off the top of your head where I'd be able to find the size to log it?
 
 On Thu, Sep 11, 2014 at 6:33 PM, Reynold Xin r...@databricks.com wrote:
 
 I didn't know about that 
 
 On Thu, Sep 11, 2014 at 6:29 PM, Sandy Ryza sandy.r...@cloudera.com
 wrote:
 
 It used to be available on the UI, no?
 
 On Thu, Sep 11, 2014 at 6:26 PM, Reynold Xin r...@databricks.com wrote:
 
 I don't think so. We should probably add a line to log it.
 
 
 On Thursday, September 11, 2014, Sandy Ryza sandy.r...@cloudera.com
 wrote:
 
 After the change to broadcast all task data, is there any easy way to
 discover the serialized size of the data getting sent down for a task?
 
 thanks,
 -Sandy
 
 
 
 
 





RE: Welcoming two new committers

2014-08-09 Thread Guru Medasani
Congrats Joey and Andrew!

Sent from my Windows Phone

From: Andrew Or <and...@databricks.com>
Sent: 8/9/2014 2:43 AM
To: Prashant Sharma <scrapco...@gmail.com>
Cc: Xiangrui Meng <men...@gmail.com>; Christopher Nguyen <c...@adatao.com>;
Joseph Gonzalez <jegon...@eecs.berkeley.edu>; Matei Zaharia <ma...@databricks.com>;
d...@spark.incubator.apache.org
Subject: Re: Welcoming two new committers

Thanks everyone. I look forward to continuing to work with all of you!


2014-08-08 3:23 GMT-07:00 Prashant Sharma scrapco...@gmail.com:

 Congratulations Andrew and Joey.

 Prashant Sharma




 On Fri, Aug 8, 2014 at 2:10 PM, Xiangrui Meng men...@gmail.com wrote:

 Congrats, Joey & Andrew!!

 -Xiangrui

 On Fri, Aug 8, 2014 at 12:14 AM, Christopher Nguyen c...@adatao.com
 wrote:
  +1 Joey & Andrew :)
 
  --
  Christopher T. Nguyen
  Co-founder & CEO, Adatao (http://adatao.com) [ah-'DAY-tao]
  linkedin.com/in/ctnguyen
 
 
 
  On Thu, Aug 7, 2014 at 10:39 PM, Joseph Gonzalez 
 jegon...@eecs.berkeley.edu
  wrote:
 
  Hi Everyone,
 
  Thank you for inviting me to be a committer.  I look forward to working
  with everyone to ensure the continued success of the Spark project.
 
  Thanks!
  Joey
 
 
 
 
  On Thu, Aug 7, 2014 at 9:57 PM, Matei Zaharia ma...@databricks.com
  wrote:
 
    Hi everyone,
   
    The PMC recently voted to add two new committers and PMC members: Joey
    Gonzalez and Andrew Or. Both have been huge contributors in the past year
    -- Joey on much of GraphX as well as quite a bit of the initial work in
    MLlib, and Andrew on Spark Core. Join me in welcoming them as committers!
   
    Matei
  
  
  
  
 
