[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2016-05-20 Thread Mete Kural (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293663#comment-15293663
 ] 

Mete Kural commented on SPARK-3821:
---

Thanks for the information, Nicholas! Now I understand the Spark project's 
strategy around this. As you write, it would be consistent for spark-ec2 not to 
show up in the docs with Spark 2.0.

Thanks for the referral to the Apache Bigtop project. I will examine what's 
available there.

> Develop an automated way of creating Spark images (AMI, Docker, and others)
> ---
>
> Key: SPARK-3821
> URL: https://issues.apache.org/jira/browse/SPARK-3821
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, EC2
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
> Attachments: packer-proposal.html
>
>
> Right now the creation of Spark AMIs or Docker containers is done manually. 
> With tools like [Packer|http://www.packer.io/], we should be able to automate 
> this work, and do so in such a way that multiple types of machine images can 
> be created from a single template.
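The quoted description envisions one template producing several image types. As 
a rough sketch of how that looks in a Packer template with multiple builders 
sharing one set of provisioning scripts (the AMI ID, instance type, and script 
names below are hypothetical, not taken from the actual proposal):

```json
{
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "us-east-1",
      "source_ami": "ami-00000000",
      "instance_type": "m3.large",
      "ssh_username": "ec2-user",
      "ami_name": "spark-base-{{timestamp}}"
    },
    {
      "type": "docker",
      "image": "centos:6",
      "commit": true
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "scripts": ["setup-tools.sh", "install-spark-deps.sh"]
    }
  ]
}
```

Running `packer build` on such a template produces both an AMI and a Docker 
image from the same provisioning steps, which is the automation this issue asks 
for.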



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org




2016-05-19 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292599#comment-15292599
 ] 

Nicholas Chammas commented on SPARK-3821:
-

You can deploy Spark on Docker today just fine. It's just that Spark itself 
does not maintain any official Dockerfiles and likely never will, since the 
project is trying to push deployment tooling outside the main project (which is 
why spark-ec2 was moved out; you will not see spark-ec2 in the official docs 
once Spark 2.0 comes out). You may be more interested in the Apache Bigtop 
project, which focuses on big data system deployment (including Spark) and may 
have something for Docker specifically.

Mesos is a separate matter, because it's a resource manager (analogous to YARN) 
that integrates with Spark at a low level.

If you still think Spark should host and maintain an official Dockerfile and 
Docker images suitable for production use, please open a separate issue. I 
think the maintainers will reject it on the grounds I have explained here, 
though. (I can't say for sure; after all, I'm just a random contributor.)


2016-05-19 Thread Mete Kural (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292488#comment-15292488
 ] 

Mete Kural commented on SPARK-3821:
---

Thank you for the response, Nicholas. spark-ec2 does take care of AMIs for EC2, 
is in fact documented in the Spark documentation as a deployment method, and 
ships with the Spark distribution. However, the same level of support doesn't 
seem to exist for Docker as a deployment method. What's inside the docker 
folder in Spark is not really in shape for a production deployment, is not 
documented in the Spark documentation either, and doesn't seem to have been 
worked on in quite a while. It seems the only way the Spark project officially 
supports running Spark on Docker is via Mesos; would you say that is correct? 
With Docker becoming an industry standard as of a month ago, I hope there will 
be renewed interest within the Spark project in supporting Docker as an 
official deployment method without the Mesos requirement.


2016-05-19 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292198#comment-15292198
 ] 

Nicholas Chammas commented on SPARK-3821:
-

Not sure if there is renewed interest, but at this point this issue is outside 
the scope of the Spark project. The original impetus for this issue was to 
create AMIs for spark-ec2 in an automated fashion, and spark-ec2 has been moved 
out of the main Spark project.

spark-ec2 now lives here: https://github.com/amplab/spark-ec2


2016-05-19 Thread Mete Kural (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292185#comment-15292185
 ] 

Mete Kural commented on SPARK-3821:
---

Is there any new interest in this now that the Docker image format is 
officially the industry's standard container format 
(http://thenewstack.io/open-container-initiative-launches-container-image-format-spec/
 https://blog.docker.com/2016/04/docker-engine-1-11-runc/)?


2015-02-22 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332303#comment-14332303
 ] 

Nicholas Chammas commented on SPARK-3821:
-

For those wanting to use the work being done as part of this issue before it 
gets merged upstream, I posted some [instructions on Stack 
Overflow|http://stackoverflow.com/a/28639669/877069] in response to a related 
question.


2015-02-13 Thread Chris Love (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320874#comment-14320874
 ] 

Chris Love commented on SPARK-3821:
---

I notice that the Packer-built AMI comes with Java 7. How would you recommend 
handling Java 8? Should both be installed?

Also, which AWS Linux were the new AMIs built off of? Will this be in a 1.2.x 
branch or just 1.3?

Thanks

Chris


2015-02-13 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320905#comment-14320905
 ] 

Nicholas Chammas commented on SPARK-3821:
-

If you want Java 8 alongside 7, you can install both to separate paths. For 
spark-ec2's purposes, we only need 7.

The AMIs used as the base are [defined in the Packer 
template|https://github.com/nchammas/spark-ec2/blob/0f313de64ad9542d1a0f0d6f27131ca4bc01d8c3/image-build/spark-packer-template.json#L5-L6].
 The generated AMIs do not include Spark itself--just its dependencies, plus 
related tools for spark-ec2.


2015-02-13 Thread Florian Verhein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320995#comment-14320995
 ] 

Florian Verhein commented on SPARK-3821:


RE: Java, that reminds me... we should probably be using the Oracle JDK rather 
than OpenJDK. But I think this should be a separate issue, so I just created 
SPARK-5813.


2015-01-14 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277104#comment-14277104
 ] 

Nicholas Chammas commented on SPARK-3821:
-

Hmm, I doubt that was intentional, since it seems to be a problem. Maybe 
Shivaram can shed some light on the choice of pre-built distribution.

I'm guessing it was just an oversight and that we need improved logic to 
install a wider variety of distributions so that related software like Tachyon 
always works correctly.




2015-01-14 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277334#comment-14277334
 ] 

Shivaram Venkataraman commented on SPARK-3821:
--

Regarding the pre-built distributions, AFAIK we don't support full Hadoop 2 as 
in YARN. We run CDH4, which has some parts of Hadoop 2, but with MapReduce. 
There is an open PR to add support for Hadoop 2 at 
https://github.com/mesos/spark-ec2/pull/77, and you can see that it gets the 
right [prebuilt 
Spark|https://github.com/mesos/spark-ec2/pull/77/files#diff-1d040c3294246f2b59643d63868fc2adR97]
 in that case.


2015-01-13 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276471#comment-14276471
 ] 

Nicholas Chammas commented on SPARK-3821:
-

[~shivaram] Are we ready to open a PR against {{mesos/spark-ec2}} and start a 
review discussion there?


2015-01-13 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276505#comment-14276505
 ] 

Shivaram Venkataraman commented on SPARK-3821:
--

[~nchammas] Yes -- That sounds good


2015-01-13 Thread Florian Verhein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276572#comment-14276572
 ] 

Florian Verhein commented on SPARK-3821:


Thanks [~nchammas], that makes sense.

Created SPARK-5241.

I'm not sure about the pre-built scenario, but I'm guessing that e.g. 
http://s3.amazonaws.com/spark-related-packages/spark-1.2.0-bin-hadoop2.4.tgz != 
http://s3.amazonaws.com/spark-related-packages/spark-1.2.0-bin-cdh4.tgz. So 
perhaps the intent is that the spark-ec2 scripts only support CDH 
distributions...


2015-01-13 Thread Florian Verhein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276263#comment-14276263
 ] 

Florian Verhein commented on SPARK-3821:


This is great stuff! It'll also help serve as documentation of the AMI 
requirements when using the spark-ec2 scripts.

Re the above, I think everything in create_image.sh can be refactored into 
Packer (plus duplicate removal, e.g. root login). I've attempted to do this in 
a fork of [~nchammas]'s work, but my use case is a bit different in that I need 
to start from a fresh CentOS 6 minimal image (rather than an Amazon Linux AMI) 
and then add other things.

Possibly related to AMI generation in general: I've noticed that the version 
dependencies in the spark-ec2 scripts are broken. I suspect this will need to 
be handled in both the image and the setup. For example:
- It looks like Spark needs to be built with the right Hadoop profile to work, 
but this isn't adhered to. This applies whether Spark is built from a git 
checkout or from an existing build. The same is likely true of Tachyon. This 
is probably the cause of https://issues.apache.org/jira/browse/SPARK-3185.
- The Hadoop native libs are built on the image using 2.4.1, but are then 
copied into whatever Hadoop build is downloaded by the ephemeral-hdfs and 
persistent-hdfs scripts. I suspect that could cause issues too. Since building 
Hadoop is very time consuming, it's something you'd want on the image, which 
creates a dependency.
- The version dependencies for other things like Ganglia aren't documented (I 
believe it is installed on the image but duplicated again in 
spark-ec2/ganglia). I've found that the Ganglia config doesn't work for me 
(but recall that I'm using a different base AMI, so I likely get a different 
Ganglia version). I have a sneaking suspicion that the Hadoop configs in 
spark-ec2 won't work across Hadoop versions either (but, fingers crossed!).

Re the above, I might try keeping the entire Hadoop build (from the image 
creation) for the HDFS setup.

Sorry for the sidetrack, but I'm struggling through all this, so I'm hoping it 
rings a bell for someone.

P.S. With the image automation, it might also be worth considering putting 
more on the image as an option (especially for people happy to build their own 
AMIs). For example, I see no reason why the module init.sh scripts can't be 
run from Packer in order to speed up cluster start-up times :)
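The P.S. idea of pre-running module init scripts at image-build time could look 
roughly like the following extra Packer provisioner entry. This is only a 
sketch; the script paths are assumptions based on the module names mentioned in 
this thread, not the actual repository layout:

```json
{
  "provisioners": [
    {
      "type": "shell",
      "scripts": [
        "spark-ec2/spark/init.sh",
        "spark-ec2/ephemeral-hdfs/init.sh",
        "spark-ec2/ganglia/init.sh"
      ]
    }
  ]
}
```

Anything baked into the image this way no longer runs at cluster launch, which 
is where the suggested start-up saving would come from.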



2015-01-13 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276411#comment-14276411
 ] 

Nicholas Chammas commented on SPARK-3821:
-

Hi [~florianverhein] and thanks for chiming in!

{quote}
Re the above, I think everything in create_image.sh can be refactored to packer 
(+ duplicate removal - e.g. root login).
{quote}

Definitely. I'm hoping to make as few changes as possible to the existing 
{{create_image.sh}} script to reduce the review burden, but after this initial 
proposal is accepted it makes sense to refactor these scripts. There is some 
related work proposed in [SPARK-5189].

Some of the things you call out regarding version mismatches and whatnot sound 
like they might merit their own JIRA issues.

For example:

{quote}
It looks like Spark needs to be built with the right hadoop profile to work, 
but this isn't adhered to. 
{quote}

I haven't tested this out, but from the Spark init script, it looks like the 
correct version of Spark is used in [the pre-built 
scenario|https://github.com/mesos/spark-ec2/blob/3a95101c70e6892a8a48cc54094adaed1458487a/spark/init.sh#L109].
 Not so in the [build-from-git 
scenario|https://github.com/mesos/spark-ec2/blob/3a95101c70e6892a8a48cc54094adaed1458487a/spark/init.sh#L21],
 so nice catch. Could you file a JIRA issue for that?

{quote}
For example, I see no reason why the module init.sh scripts can't be run from 
packer in order to speed start-up times of the cluster
{quote}

Regarding this and other ideas about pre-baking more onto the images, [that's 
how this proposal started, 
actually|https://github.com/nchammas/spark-ec2/blob/9c28878694171ba085a10acd4405c702397d28ce/packer/README.md#base-vs-spark-pre-installed]
 (here's the [original Packer 
template|https://github.com/nchammas/spark-ec2/blob/9c28878694171ba085a10acd4405c702397d28ce/packer/spark-packer.json#L118-L133]).
 We decided to rip that out to reduce the complexity of the initial proposal 
and to make it easier to specify different versions of Spark and Hadoop at 
launch time.


2015-01-12 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274535#comment-14274535
 ] 

Nicholas Chammas commented on SPARK-3821:
-

That's correct. All those paths are just relative to the folder containing 
{{spark-packer.json}}.


2015-01-11 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273187#comment-14273187
 ] 

Nicholas Chammas commented on SPARK-3821:
-

Updated launch stats:
* Launching a cluster with 50 slaves in {{us-east-1}}.
* Stats are for the best of 3 runs.

{{branch-1.3}} @ 
[{{3a95101}}|https://github.com/mesos/spark-ec2/tree/3a95101c70e6892a8a48cc54094adaed1458487a]:
{code}
Cluster is now in 'ssh-ready' state. Waited 460 seconds.
[timing] rsync /root/spark-ec2:  00h 00m 07s
[timing] setup-slave:  00h 00m 28s
[timing] scala init:  00h 00m 11s
[timing] spark init:  00h 00m 07s
[timing] ephemeral-hdfs init:  00h 12m 40s
[timing] persistent-hdfs init:  00h 12m 35s
[timing] spark-standalone init:  00h 00m 00s
[timing] tachyon init:  00h 00m 08s
[timing] ganglia init:  00h 00m 53s
[timing] scala setup:  00h 03m 11s
[timing] spark setup:  00h 21m 20s
[timing] ephemeral-hdfs setup:  00h 00m 48s
[timing] persistent-hdfs setup:  00h 00m 43s
[timing] spark-standalone setup:  00h 01m 19s
[timing] tachyon setup:  00h 03m 06s
[timing] ganglia setup:  00h 00m 32s
{code}


{{packer}} @ 
[{{273c8c5}}|https://github.com/nchammas/spark-ec2/tree/273c8c518fbc6e86e0fb4410efbe77a4d4e4ff5b]:

{code}
Cluster is now in 'ssh-ready' state. Waited 292 seconds.
[timing] rsync /root/spark-ec2:  00h 00m 20s
[timing] setup-slave:  00h 00m 19s
[timing] scala init:  00h 00m 12s
[timing] spark init:  00h 00m 08s
[timing] ephemeral-hdfs init:  00h 12m 58s
[timing] persistent-hdfs init:  00h 12m 55s
[timing] spark-standalone init:  00h 00m 00s
[timing] tachyon init:  00h 00m 10s
[timing] ganglia init:  00h 00m 15s
[timing] scala setup:  00h 03m 19s
[timing] spark setup:  00h 20m 32s
[timing] ephemeral-hdfs setup:  00h 00m 34s
[timing] persistent-hdfs setup:  00h 00m 27s
[timing] spark-standalone setup:  00h 00m 47s
[timing] tachyon setup:  00h 03m 15s
[timing] ganglia setup:  00h 00m 23s
{code}

As you can see, with the exception of time-to-SSH-availability, the timings are 
mostly the same across the current and Packer-generated AMIs. I've proposed 
improvements to cut down the launch times of large clusters in [a separate 
issue|SPARK-5189].

[~shivaram] - At this point I think it's safe to say that the approach proposed 
here is straightforward and worth pursuing. All we need now is a review of [the 
scripts that install various 
stuff|https://github.com/nchammas/spark-ec2/blob/273c8c518fbc6e86e0fb4410efbe77a4d4e4ff5b/packer/spark-packer.json#L63-L66]
 (e.g. Ganglia, Python 2.7, etc.) on the AMI to make sure it all makes sense.


2015-01-02 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263187#comment-14263187
 ] 

Nicholas Chammas commented on SPARK-3821:
-

I need to brush up on my statistics, but I think the difference between base 
AMI and Packer AMI is not statistically significant.

The benchmark just tested time from instance launch to SSH availability. 
Nothing was installed or done with the instances after SSH became available. 
(i.e. I wasn't creating Spark clusters.) I still have to post updated 
benchmarks for full cluster launches.

Is there anything else you wanted to see before reviewing this proposal in more 
detail?


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2015-01-02 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263206#comment-14263206
 ] 

Nicholas Chammas commented on SPARK-3821:
-

I have Packer configured to run {{create_image.sh}}, as well as other scripts I 
added (e.g. to install Python 2.7), to generate the AMIs I am using. So testing 
Packer-generated AMIs against manually-generated ones (by running 
{{create_image.sh}} by hand) should show little difference.

Packer is just tooling to automate the application of existing scripts like 
{{create_image.sh}} towards creating AMIs and other image types like GCE images 
and Docker images. The goal is to make it easy to generate and update Spark 
AMIs (and eventually Docker images too) in an automated fashion.
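
For context, a minimal Packer template in that spirit might look like the 
following. The values here are illustrative only; the real {{spark-packer.json}} 
linked from this thread is the authoritative version:

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "ami-b66ed3de",
    "instance_type": "m3.medium",
    "ssh_username": "ec2-user",
    "ami_name": "spark-base-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "script": "create_image.sh"
  }]
}
```

Packer starts a fresh instance from the source AMI, runs {{create_image.sh}} on 
it, and registers the result as a new AMI.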


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2015-01-02 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263193#comment-14263193
 ] 

Shivaram Venkataraman commented on SPARK-3821:
--

Yeah, you are right that the times are pretty close for the Packer and base 
AMIs. I was just curious whether I was missing something. I don't think there is much else I 
had in mind -- having the full cluster launch times for existing AMI vs. Packer 
would be good and it would also be good to see how Packer compares to images 
created using 
[create_image.sh|https://github.com/mesos/spark-ec2/blob/v4/create_image.sh]


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2015-01-02 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263181#comment-14263181
 ] 

Shivaram Venkataraman commented on SPARK-3821:
--

[~nchammas] Thanks for the benchmark. One thing I am curious about is why the 
Packer AMI is faster than launching just the base Amazon AMI. Is this because 
we spend some time installing things on the base AMI that we avoid with Packer 
? 


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2015-01-01 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262720#comment-14262720
 ] 

Nicholas Chammas commented on SPARK-3821:
-

For lulz, I've benchmarked the start times of a few AMIs to better understand 
what role the AMI plays in cluster launch times.

Background:
* *Time from instance launch to SSH availability*
* {{m3.medium}} HVM instances in {{us-east-1}}
* ~30 launches recorded for each AMI
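
The measurement itself amounts to polling for TCP connectivity on port 22. A 
standalone sketch of that idea (not the actual benchmark script):

```python
import socket
import time

def wait_for_ssh(host, start_time, port=22, timeout=600):
    """Poll until a TCP connection to host:port succeeds, then return
    the elapsed time since start_time (e.g. the instance launch time)."""
    while time.time() - start_time < timeout:
        try:
            with socket.create_connection((host, port), timeout=3):
                return time.time() - start_time
        except OSError:
            time.sleep(5)
    raise TimeoutError("%s:%d not reachable within %d seconds" % (host, port, timeout))
```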

Stats:
* ami-35b1885c (current, as of 1.2.0, Spark AMI):
** Average launch time:  340 seconds
** Median launch time:  342 seconds
** Standard deviation:  33 seconds
* ami-b66ed3de (latest base Amazon AMI):
** Average launch time:  291 seconds
** Median launch time:  279 seconds
** Standard deviation:  89 seconds
* ami-3c610f54 (Packer-generated replacement Spark AMI, based on ami-b66ed3de):
** Average launch time:  275 seconds
** Median launch time:  272 seconds
** Standard deviation:  36 seconds

Something changed since the [benchmark I originally 
posted|https://github.com/nchammas/spark-ec2/blob/1b312fa1f794288c5dbe420c5a6451c4de7bf758/packer/proposal.md#new-amis],
 and I haven't seen the <100 second SSH availability.

I'd say that these numbers here are more reliable since I generated them using 
some scripts and many runs, as opposed to manually.
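
As a rough check on whether the base-vs-Packer difference matters, a 
back-of-the-envelope Welch's t-statistic computed from the summary stats above 
(assuming ~30 independent launches per AMI) comes out well under 2:

```python
import math

# Welch's t-statistic from summary stats: mean, standard deviation, n.
def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    return (mean1 - mean2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Base Amazon AMI (291 +/- 89 s) vs. Packer-generated AMI (275 +/- 36 s).
t = welch_t(291, 89, 30, 275, 36, 30)
print(round(t, 2))  # ~0.91, i.e. the difference is well within noise
```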


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-12-22 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256683#comment-14256683
 ] 

Nicholas Chammas commented on SPARK-3821:
-

Per the discussion earlier, I've 
[updated|https://github.com/nchammas/spark-ec2/tree/packer/packer] the Packer 
build configuration to drop the release-specific builds. I've also added GNU 
parallel to the list of installed tools and will use it in place of the {{while 
... rsync ... & wait}} pattern used throughout the various setup scripts.

I'll test out these changes on small (<5 nodes) and large (>=100 nodes) 
cluster launches and post updated benchmarks as well as an updated README and 
proposal.
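
To illustrate the serial-vs-parallel idea (a sketch of the pattern only, not 
the actual setup scripts; {{push}} is a stand-in for an rsync invocation):

```python
from concurrent.futures import ThreadPoolExecutor

def push(host):
    # Stand-in for the real work, e.g. an rsync to root@<host>.
    return "synced %s" % host

hosts = ["slave1", "slave2", "slave3"]

# Fan the pushes out to all slaves at once and wait for them all --
# the effect GNU parallel (or backgrounded rsyncs plus `wait`) achieves.
with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
    results = list(pool.map(push, hosts))

print(results)  # ['synced slave1', 'synced slave2', 'synced slave3']
```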


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-11-10 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205131#comment-14205131
 ] 

Shivaram Venkataraman commented on SPARK-3821:
--

Regarding reducing init time, I think there are simple things we can do in 
init.sh that will get us most of the way there. For example, we can download 
the tar.gz files for Hadoop and Spark on each machine and untar them in parallel 
instead of rsync-ing at the end. But we can revisit this in a separate change, I 
guess.


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-11-10 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205358#comment-14205358
 ] 

Nicholas Chammas commented on SPARK-3821:
-

Here's the [benchmark of the launch times with the new AMIs that don't have 
Spark or Hadoop pre-installed | 
https://github.com/nchammas/spark-ec2/blob/packer/packer/proposal.md#latest-os-updates-and-ganglia-pre-installed-best-run-of-4].

Yeah, there are several optimizations to {{setup.sh}} that I can submit this 
week, mostly related to parallelizing things properly. Should I submit those 
separately, or roll them into this AMI work?


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-11-10 Thread Dan Osipov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205367#comment-14205367
 ] 

Dan Osipov commented on SPARK-3821:
---

[~nchammas] Excellent work - I look forward to testing it out this week.

{quote}
1. My preference would be to just have a single AMI across Spark versions for a 
couple of reasons.
{quote}

I would actually advocate for baked AMIs. Yes, there are many of them, but IMHO 
there should be a Jenkins job creating these on every release, so it would be a 
fully automated task. These AMIs would be a production-ready release of Spark 
with all dependencies built in.


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-11-10 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205401#comment-14205401
 ] 

Nicholas Chammas commented on SPARK-3821:
-

Thanks for taking a look [~danospv]. Looking forward to your feedback.

Keeping the fully-baked AMIs could totally work. The current scripts allow the 
image creation to be fully automated. We may need some more tooling around 
image management (e.g. like [{{delete-all-registered-spark-amis.py}} | 
https://github.com/nchammas/spark-ec2/blob/packer/packer/delete-all-registered-spark-amis.py]),
 and we will need to maintain the image library we build up.

So it's probably just a question of whether we are ready to accept the 
maintenance / tooling burden at this time, though it's totally feasible.


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-11-08 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203757#comment-14203757
 ] 

Shivaram Venkataraman commented on SPARK-3821:
--

[~nchammas] Thanks for putting this together -- This is looking great ! I just 
had a couple of quick questions and clarifications

1. My preference would be to just have a single AMI across Spark versions for a 
couple of reasons. First it reduces steps for every release (even though 
creating AMIs is definitely much simpler now !). Also the number of AMIs we 
maintain could get large if we do this for every minor and major release like 
1.1.1. [~pwendell] could probably comment more on the release process etc.

2. Could you clarify whether Hadoop is pre-installed in the new AMIs or is 
still installed on startup? The flexibility we currently have of switching between 
Hadoop 1, Hadoop 2, YARN, etc. is useful for testing. (Related Packer question: 
are the [init scripts|https://github.com/nchammas/spark-ec2/blob/packer/packer/spark-packer.json#L129]
 run during AMI creation or during startup?)

3. Do you have some benchmarks for the new AMI without Spark 1.1.0 
pre-installed ? [We right now have old AMI vs. new AMI with 
spark|https://github.com/nchammas/spark-ec2/blob/packer/packer/proposal.md#new-amis---latest-os-updates-and-spark-110-pre-installed-single-run]
 . I see a couple of huge wins in the new AMI (from SSH wait time, ganglia init 
etc.) which I guess we should get even without Spark being pre-installed.


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-11-08 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203786#comment-14203786
 ] 

Nicholas Chammas commented on SPARK-3821:
-

Thanks for the feedback [~shivaram].

{quote}
1. My preference would be to just have a single AMI across Spark versions for a 
couple of reasons. 
{quote}

I agree. Maintaining images for specific versions of Spark is worth it only if 
you're really crazy about getting the lowest cluster launch times possible. 
Well, that was my [original motivation | 
http://apache-spark-developers-list.1001551.n3.nabble.com/EC2-clusters-ready-in-launch-time-30-seconds-td7262.html]
 for doing this work, but ultimately I agree the complexity is not worth it at 
the moment. I'll take this out unless someone wants to advocate for leaving it 
in.

{quote}
2. Could you clarify whether Hadoop is pre-installed in the new AMIs or is 
still installed on startup?
{quote}

Currently, I have it set to install Hadoop 2 on the AMIs with Spark 
pre-installed. Again, this was done with the intention of aiming for the lowest 
launch time possible, but if we'd like to do away with the Spark-pre-installed 
AMIs then this is not an issue.

{quote}
Are the init scripts run during AMI creation or during startup ?
{quote}

For the AMIs with Spark pre-installed, they are run during AMI creation. That's 
why the [init runtimes in the second benchmark | 
https://github.com/nchammas/spark-ec2/blob/214d5e4cac392a0eac21f949fe25c0075044411f/packer/proposal.md#new-amis---latest-os-updates-and-spark-110-pre-installed-single-run]
 are all 0 ms; the init script sees that such and such is already installed and 
just exits.

{quote}
3. Do you have some benchmarks for the new AMI without Spark 1.1.0 
pre-installed ?
{quote}

Nope, but I can run one and get back to you on Monday or Tuesday with those 
numbers.


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-11-07 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203280#comment-14203280
 ] 

Nicholas Chammas commented on SPARK-3821:
-

After much dilly-dallying, I am happy to present:
* A brief proposal / design doc ([fixed JIRA attachment | 
https://issues.apache.org/jira/secure/attachment/12680371/packer-proposal.html],
 [md file on GitHub | 
https://github.com/nchammas/spark-ec2/blob/packer/packer/proposal.md])
* [Initial implementation | 
https://github.com/nchammas/spark-ec2/tree/packer/packer] and [README | 
https://github.com/nchammas/spark-ec2/blob/packer/packer/README.md]
* New AMIs generated by this implementation: [Base AMIs | 
https://github.com/nchammas/spark-ec2/tree/packer/ami-list/base], [Spark 1.1.0 
Pre-Installed | 
https://github.com/nchammas/spark-ec2/tree/packer/ami-list/1.1.0]

To try out the new AMIs with {{spark-ec2}}, you'll need to update [these | 
https://github.com/apache/spark/blob/7e9d975676d56ace0e84c2200137e4cd4eba074a/ec2/spark_ec2.py#L47]
 [two | 
https://github.com/apache/spark/blob/7e9d975676d56ace0e84c2200137e4cd4eba074a/ec2/spark_ec2.py#L593]
 lines (well, really, just the first one) to point to [my {{spark-ec2}} repo on 
the {{packer}} branch | 
https://github.com/nchammas/spark-ec2/tree/packer/packer].

Your candid feedback and/or improvements are most welcome!
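
Concretely, the edit amounts to something like the following (the variable 
names here are illustrative, not the actual identifiers in {{spark_ec2.py}}; see 
the two linked lines for the real code):

```python
# Hypothetical sketch of the two-line change described above: repoint the
# spark-ec2 scripts repo and branch that spark_ec2.py clones at launch.
# Both names below are illustrative, not the real identifiers.
SPARK_EC2_SCRIPTS_REPO = "https://github.com/nchammas/spark-ec2"  # was mesos/spark-ec2
SPARK_EC2_SCRIPTS_BRANCH = "packer"

print(SPARK_EC2_SCRIPTS_REPO, SPARK_EC2_SCRIPTS_BRANCH)
```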


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-10-31 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192273#comment-14192273
 ] 

Nicholas Chammas commented on SPARK-3821:
-

Hey folks, I was hoping to post a design doc here this week and get feedback 
but I will have to push that back to next week. Been very busy this week and 
will be away from a computer all weekend. Apologies. 


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-10-24 Thread Dan Osipov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182956#comment-14182956
 ] 

Dan Osipov commented on SPARK-3821:
---

I'd like to take this on - this is needed for a launch script I'm working on.

Current AMIs are owned by Amazon ID 314332379540 - I assume whatever process 
gets created as a result of this ticket will need to be run by that user 
to host the resulting AMIs. 

Are there manual steps that are currently done to produce 
https://github.com/mesos/spark-ec2/tree/v4/ami-list ?


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-10-24 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183245#comment-14183245
 ] 

Nicholas Chammas commented on SPARK-3821:
-

Hey [~danospv], I'm currently in the middle of working on this. I've assigned 
this JIRA issue to myself to make that clearer. Next week I plan to post a 
brief design doc and perhaps also an initial alpha of this feature working so 
that people can review it and give their feedback. If after reviewing it you 
find that you'd still like to pursue this, feel free to do so.

{quote}
this is needed for a launch script I'm working on
{quote}

Could you elaborate on your use case? The use cases I'm currently targeting are 
focused on improving {{spark-ec2}} launch times and automating updates to any 
Spark machine images or containers.

{quote}
Are there manual steps that are currently done to produce 
https://github.com/mesos/spark-ec2/tree/v4/ami-list ?
{quote}

Yes, 
[{{create_image.sh}}|https://github.com/mesos/spark-ec2/blob/v4/create_image.sh]
 is supposed to be that script, though it was created _ex post facto_ and may 
not yield a proper replica of the AMIs we currently have.

On a related note, the approach I'm pursuing will automate both the creation of 
the AMIs in multiple regions and across virtualization types, and the updating 
of the AMI list under {{ami-list/}}.


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-10-24 Thread Dan Osipov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183276#comment-14183276
 ] 

Dan Osipov commented on SPARK-3821:
---

OK, great!

 Could you elaborate on your use case? The use cases I'm currently targeting 
 are focused on improving spark-ec2 launch times and automating updates to any 
 Spark machine images or containers.

There are a few problems with spark-ec2 script:
* Large clusters take too long to spin up. This is due to serial processing of 
each slave. When done in parallel, performance is much better. 
* It doesn't handle failure well. EC2 nodes may fail to start up, but still 
report that they're running. In those cases spark-ec2 freezes, then fails, 
without cleaning up state after itself (leaves instances, security groups, EBS 
volumes).
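
The cleanup-on-failure behavior described above can be sketched as follows 
({{launch_node}} and {{terminate_node}} are hypothetical stand-ins, not real 
spark-ec2 or EC2 API calls):

```python
from concurrent.futures import ThreadPoolExecutor

def launch_node(name):
    # Stand-in for an EC2 RunInstances call.
    return {"name": name, "state": "running"}

def terminate_node(node):
    # Stand-in for terminating the instance and deleting its resources.
    node["state"] = "terminated"

def launch_cluster(names):
    """Launch all slaves in parallel; on any failure, tear down the
    partially launched cluster instead of leaving orphaned resources."""
    launched = []
    try:
        with ThreadPoolExecutor() as pool:
            for node in pool.map(launch_node, names):
                if node["state"] != "running":
                    raise RuntimeError("node failed: %s" % node["name"])
                launched.append(node)
        return launched
    except Exception:
        for node in launched:
            terminate_node(node)
        raise

print([n["name"] for n in launch_cluster(["slave1", "slave2"])])
```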

I rewrote the steps in a Scala tool. It's not at feature parity with spark-ec2 
yet, but it makes some improvements in the above-mentioned areas. The goal is for 
it to serve the same role as the EMR CLI [1], if you've ever used that - including 
running a job. The problem is that a lot of functionality is still bundled in 
setup.sh, which can be minimized by a) doing most of the work at the AMI bundling 
step and b) performing it in parallel through the launcher. I'd be glad to put 
the script on GitHub so that you can evaluate the approach.

Are you also planning to create AMIs for different combinations of Spark and 
Hadoop versions?

[1] 
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-commands.html


[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-10-24 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183390#comment-14183390
 ] 

Nicholas Chammas commented on SPARK-3821:
-

Going for something like EMR's CLI is potentially very useful, though perhaps a 
bit outside the scope of the original {{spark-ec2}} (and there's nothing wrong 
with that!).

What I'm doing will keep {{spark-ec2}} mostly as-is on the surface, but tackle 
the launch times and parallelism issues you described.

I'm currently only generating AMIs with Hadoop 2 and Spark 1.1.0, or a base AMI 
with everything except Hadoop and Spark. I haven't yet figured out the details 
of how to handle the full version matrix. Right now I'm leaning towards a base 
AMI onto which any version of Spark can be installed relatively quickly, plus 
AMIs for specific versions of Spark starting from 1.1.0.
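
The base-AMI-plus-per-version idea above boils down to generating one image build per cell of the version matrix, plus the base image. A minimal sketch, assuming hypothetical version lists and Packer-style build names (this is not the actual template from the attached proposal):

```python
import itertools

# Hypothetical version matrix: one base AMI, plus an AMI per
# Spark/Hadoop combination starting from Spark 1.1.0.
SPARK_VERSIONS = ["1.1.0", "1.1.1"]
HADOOP_VERSIONS = ["1", "2"]

def packer_builds():
    """Return one build definition per machine image to bake."""
    builds = [{"name": "spark-base"}]   # everything except Spark and Hadoop
    for spark, hadoop in itertools.product(SPARK_VERSIONS, HADOOP_VERSIONS):
        builds.append({
            "name": "spark-%s-hadoop%s" % (spark, hadoop),
            "spark_version": spark,
            "hadoop_version": hadoop,
        })
    return builds
```

Each entry would then be fed to a single Packer template, which is exactly the "multiple types of machine images from a single template" goal in the issue description.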




[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-10-07 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162663#comment-14162663
 ] 

Nicholas Chammas commented on SPARK-3821:
-

[~shivaram] / [~pwendell]:
# In a Spark cluster, what's the difference between what's installed on the 
master and what's installed on the slaves? Is it basically the same stuff, just 
with minor configuration changes?
# Starting from a base AMI, is the rough procedure for creating a fully built 
Spark instance simply running 
[{{create_image.sh}}|https://github.com/mesos/spark-ec2/blob/v3/create_image.sh]
 followed by [{{setup.sh}}|https://github.com/mesos/spark-ec2/blob/v3/setup.sh] 
(minus the stuff that connects to other instances)?




[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2014-10-07 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162755#comment-14162755
 ] 

Shivaram Venkataraman commented on SPARK-3821:
--

1. Yes - the same stuff is installed on the master and the slaves. In fact they 
use the same AMI.

2. The base Spark AMI is created using `create_image.sh` (starting from a base 
Amazon AMI). After that we pass the AMI ID to `spark_ec2.py`, which calls 
`setup.sh` on the master.
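
The two-phase flow described above can be summarized as a small pipeline. A rough sketch only; the command strings are illustrative placeholders, not the exact invocations:

```python
def image_pipeline(base_amazon_ami, cluster_name):
    """Ordered steps from a stock Amazon AMI to a running cluster."""
    return [
        # Phase 1 (one-time): bake the Spark AMI from a stock Amazon AMI.
        "create_image.sh  (run on an instance of %s)" % base_amazon_ami,
        # Phase 2 (every launch): spark_ec2.py reuses the baked AMI and
        # then runs setup.sh on the master to wire the cluster together.
        "spark_ec2.py --ami <spark-ami-id> launch %s" % cluster_name,
        "setup.sh  (invoked on the master by spark_ec2.py)",
    ]
```

The design point is that the expensive baking work happens once per AMI, while per-launch work is limited to what `setup.sh` does on the master.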
