[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293663#comment-15293663 ] Mete Kural commented on SPARK-3821:
---

Thanks for the information, Nicholas! Now I understand the Spark project's strategy around this. spark-ec2 not showing up in the docs with Spark 2.0 would be consistent, as you write. Thanks for the referral to the Apache Bigtop project; I will examine what's available there.

> Develop an automated way of creating Spark images (AMI, Docker, and others)
> ---
>
>          Key: SPARK-3821
>          URL: https://issues.apache.org/jira/browse/SPARK-3821
>      Project: Spark
>   Issue Type: Improvement
>   Components: Build, EC2
>     Reporter: Nicholas Chammas
>     Assignee: Nicholas Chammas
>  Attachments: packer-proposal.html
>
> Right now the creation of Spark AMIs or Docker containers is done manually.
> With tools like [Packer|http://www.packer.io/], we should be able to automate
> this work, and do so in such a way that multiple types of machine images can
> be created from a single template.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292599#comment-15292599 ] Nicholas Chammas commented on SPARK-3821:
-

You can deploy Spark on Docker today just fine. It's just that Spark itself does not maintain any official Dockerfiles, and likely never will, since the project is actually trying to push deployment concerns outside the main project (which is why spark-ec2 was moved out; you will not see spark-ec2 in the official docs once Spark 2.0 comes out). You may be more interested in the Apache Bigtop project, which focuses on big data system deployment (including Spark) and may have something for Docker specifically. Mesos is a separate matter, because it's a resource manager (analogous to YARN) that integrates with Spark at a low level. If you still think Spark should host and maintain an official Dockerfile and Docker images suitable for production use, please open a separate issue. I think the maintainers will reject it on the grounds I have explained here, though. (Can't say for sure; after all, I'm just a random contributor.)
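To make the "you can deploy Spark on Docker today just fine" point concrete, here is a minimal sketch of generating a Dockerfile for a standalone Spark master. Everything concrete in it — the base image, the Spark version, the archive URL, the Hadoop flavor — is an illustrative assumption, not anything the Spark project publishes or endorses:

```shell
# Write a hypothetical Dockerfile for a standalone Spark master.
# Base image, version, and mirror URL below are illustrative only.
cat > Dockerfile <<'EOF'
FROM java:7-jre
ENV SPARK_VERSION=1.6.1
RUN curl -fsSL "http://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.6.tgz" \
    | tar -xz -C /opt \
 && ln -s "/opt/spark-${SPARK_VERSION}-bin-hadoop2.6" /opt/spark
EXPOSE 7077 8080
# Run the master in the foreground (start-master.sh daemonizes and would
# cause the container to exit immediately).
CMD ["/opt/spark/bin/spark-class", "org.apache.spark.deploy.master.Master"]
EOF
```

Building and running this is then just `docker build -t spark-master . && docker run -p 7077:7077 -p 8080:8080 spark-master`.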
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292488#comment-15292488 ] Mete Kural commented on SPARK-3821:
---

Thank you for the response, Nicholas. spark-ec2 does take care of AMIs for EC2, and in fact is documented in the Spark documentation as a deployment method and is distributed with Spark. However, the same level of support doesn't seem to exist for Docker as a deployment method. What's inside the docker folder in Spark is not really in shape for a production deployment, is not documented in the Spark documentation either, and doesn't seem to have been worked on in quite a while. It seems the only way the Spark project officially supports running Spark on Docker is via Mesos; would you say that is correct? With Docker becoming an industry standard as of a month ago, I hope there will be renewed interest within the Spark project in supporting Docker as an official deployment method without the Mesos requirement.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292198#comment-15292198 ] Nicholas Chammas commented on SPARK-3821:
-

Not sure if there is renewed interest, but at this point this issue is outside the scope of the Spark project. The original impetus for this issue was to create AMIs for spark-ec2 in an automated fashion, and spark-ec2 has been moved out of the main Spark project. spark-ec2 now lives here: https://github.com/amplab/spark-ec2
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292185#comment-15292185 ] Mete Kural commented on SPARK-3821:
---

Is there any new interest in this now that the Docker image format is officially the industry's standard container format? (http://thenewstack.io/open-container-initiative-launches-container-image-format-spec/ https://blog.docker.com/2016/04/docker-engine-1-11-runc/)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14332303#comment-14332303 ] Nicholas Chammas commented on SPARK-3821:
-

For those wanting to use the work being done as part of this issue before it gets merged upstream, I posted some [instructions on Stack Overflow|http://stackoverflow.com/a/28639669/877069] in response to a related question.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14320874#comment-14320874 ] Chris Love commented on SPARK-3821:
---

I notice that the Packer-built AMI comes with Java 7. How would you recommend handling Java 8? Should both be installed? Also, which Amazon Linux version were the new AMIs built from? Will this be in a 1.2.x branch or just 1.3? Thanks, Chris
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14320905#comment-14320905 ] Nicholas Chammas commented on SPARK-3821:
-

If you want Java 8 alongside 7, you can install both to separate paths. For spark-ec2's purposes, we only need 7. The AMIs used as the base are [defined in the Packer template|https://github.com/nchammas/spark-ec2/blob/0f313de64ad9542d1a0f0d6f27131ca4bc01d8c3/image-build/spark-packer-template.json#L5-L6]. The generated AMIs do not include Spark itself -- just its dependencies, plus related tools for spark-ec2.
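On installing Java 7 and 8 side by side: both OpenJDK packages can coexist under {{/usr/lib/jvm}} on Amazon-Linux-style images, with a launch script selecting one via {{JAVA_HOME}}. A small sketch of the selection logic — the paths below are typical OpenJDK install locations and are assumptions, not something spark-ec2 ships:

```shell
# Pick a JAVA_HOME for the requested major version. The paths are the
# usual OpenJDK locations on Amazon Linux / CentOS-style systems;
# adjust them for your base image.
java_home_for() {
  case "$1" in
    7) echo /usr/lib/jvm/java-1.7.0 ;;
    8) echo /usr/lib/jvm/java-1.8.0 ;;
    *) echo "unsupported Java version: $1" >&2; return 1 ;;
  esac
}

# Example: select Java 7, which is all spark-ec2 needs.
JAVA_HOME="$(java_home_for 7)"
echo "$JAVA_HOME"
```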
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14320995#comment-14320995 ] Florian Verhein commented on SPARK-3821:

Re: Java, that reminds me... We should probably be using the Oracle JDK rather than OpenJDK. But I think this should be a separate issue, so I just created SPARK-5813.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14277104#comment-14277104 ] Nicholas Chammas commented on SPARK-3821:
-

Hmm, I doubt that was intentional, since it seems to be a problem. Maybe Shivaram can shed some light on the choice of pre-built distribution. I'm guessing it was just an oversight, and we need improved logic to install a wider variety of distributions so that related software like Tachyon always works correctly.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14277334#comment-14277334 ] Shivaram Venkataraman commented on SPARK-3821:
--

Regarding the pre-built distributions, AFAIK we don't support full Hadoop 2 as in YARN. We run CDH4, which has some parts of Hadoop 2, but with MapReduce. There is an open PR to add support for Hadoop 2 at https://github.com/mesos/spark-ec2/pull/77, and you can see that it gets the right [prebuilt Spark|https://github.com/mesos/spark-ec2/pull/77/files#diff-1d040c3294246f2b59643d63868fc2adR97] in that case.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14276471#comment-14276471 ] Nicholas Chammas commented on SPARK-3821:
-

[~shivaram] Are we ready to open a PR against {{mesos/spark-ec2}} and start a review discussion there?
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14276505#comment-14276505 ] Shivaram Venkataraman commented on SPARK-3821:
--

[~nchammas] Yes -- that sounds good.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14276572#comment-14276572 ] Florian Verhein commented on SPARK-3821:

Thanks [~nchammas], that makes sense. Created SPARK-5241. I'm not sure about the pre-built scenario, but am guessing that e.g. http://s3.amazonaws.com/spark-related-packages/spark-1.2.0-bin-hadoop2.4.tgz != http://s3.amazonaws.com/spark-related-packages/spark-1.2.0-bin-cdh4.tgz. So perhaps the intent is that the spark-ec2 scripts only support CDH distributions...
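The guess about the tarball names can be made concrete: the prebuilt packages on S3 are distinguished by a Hadoop-flavor suffix, so choosing the wrong flavor silently fetches a Spark built against different Hadoop libraries. A small sketch of the naming scheme — the flavor strings mirror the two file names quoted above, and the helper name is hypothetical:

```shell
# Map a Spark version and Hadoop flavor to the matching prebuilt
# tarball URL on S3, following the spark-<ver>-bin-<flavor>.tgz
# naming seen in the packages above.
spark_tarball_url() {
  version="$1"
  flavor="$2"
  echo "http://s3.amazonaws.com/spark-related-packages/spark-${version}-bin-${flavor}.tgz"
}

spark_tarball_url 1.2.0 hadoop2.4   # the Hadoop 2.4 build
spark_tarball_url 1.2.0 cdh4        # the CDH4 build -- a different binary
```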
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14276263#comment-14276263 ] Florian Verhein commented on SPARK-3821:

This is great stuff! It'll also help serve as documentation of the AMI requirements for the spark-ec2 scripts. Re the above, I think everything in create_image.sh can be refactored into Packer (plus duplicate removal, e.g. root login). I've attempted to do this in a fork of [~nchammas]'s work, but my use case is a bit different in that I need to start from a fresh CentOS 6 minimal image (rather than an Amazon Linux AMI) and then add other things.

Possibly related to AMI generation in general: I've noticed that the version dependencies in the spark-ec2 scripts are broken. I suspect this will need to be handled in both the image and the setup. For example:
- It looks like Spark needs to be built with the right Hadoop profile to work, but this isn't adhered to. This applies whether Spark is built from a git checkout or from an existing build, and is likely the case with Tachyon too. Probably the cause of https://issues.apache.org/jira/browse/SPARK-3185.
- The Hadoop native libs are built on the image using 2.4.1, but then copied into whatever Hadoop build is downloaded by the ephemeral-hdfs and persistent-hdfs scripts. I suspect that could cause issues too. Since building Hadoop is very time consuming, it's something you'd want on the image -- hence creating a dependency.
- The version dependencies for other things like Ganglia aren't documented (I believe it is installed on the image but duplicated again in spark-ec2/ganglia). I've found that the Ganglia config doesn't work for me (but recall I'm using a different base AMI, so I'll likely get a different Ganglia version). I have a sneaking suspicion that the Hadoop configs in spark-ec2 won't work across Hadoop versions either (but, fingers crossed!).

Re the above, I might try keeping the entire Hadoop build (from the image creation) for the HDFS setup. Sorry for the sidetrack, but I'm struggling through all this, so hoping it might ring a bell for someone.

p.s. With the image automation, it might also be worth considering putting more on the image as an option (especially for people happy to build their own AMIs). For example, I see no reason why the module init.sh scripts can't be run from Packer in order to speed up cluster start-up times :)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14276411#comment-14276411 ] Nicholas Chammas commented on SPARK-3821:
-

Hi [~florianverhein], and thanks for chiming in!

{quote}
Re the above, I think everything in create_image.sh can be refactored to packer (+ duplicate removal - e.g. root login).
{quote}

Definitely. I'm hoping to make as few changes as possible to the existing {{create_image.sh}} script to reduce the review burden, but after this initial proposal is accepted it makes sense to refactor these scripts. There is some related work proposed in [SPARK-5189].

Some of the things you call out regarding version mismatches sound like they might merit their own JIRA issues. For example:

{quote}
It looks like Spark needs to be built with the right hadoop profile to work, but this isn't adhered to.
{quote}

I haven't tested this, but from the Spark init script it looks like the correct version of Spark is used in [the pre-built scenario|https://github.com/mesos/spark-ec2/blob/3a95101c70e6892a8a48cc54094adaed1458487a/spark/init.sh#L109]. Not so in the [build-from-git scenario|https://github.com/mesos/spark-ec2/blob/3a95101c70e6892a8a48cc54094adaed1458487a/spark/init.sh#L21], so nice catch. Could you file a JIRA issue for that?

{quote}
For example, I see no reason why the module init.sh scripts can't be run from packer in order to speed start-up times of the cluster
{quote}

Regarding this and other ideas about pre-baking more onto the images: [that's how this proposal started, actually|https://github.com/nchammas/spark-ec2/blob/9c28878694171ba085a10acd4405c702397d28ce/packer/README.md#base-vs-spark-pre-installed] (here's the [original Packer template|https://github.com/nchammas/spark-ec2/blob/9c28878694171ba085a10acd4405c702397d28ce/packer/spark-packer.json#L118-L133]). We decided to rip that out to reduce the complexity of the initial proposal and make it easier to specify different versions of Spark and Hadoop at launch time.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14274535#comment-14274535 ] Nicholas Chammas commented on SPARK-3821:
-

That's correct. All those paths are just relative to the folder containing {{spark-packer.json}}.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14273187#comment-14273187 ] Nicholas Chammas commented on SPARK-3821:
-

Updated launch stats:
* Launching a cluster with 50 slaves in {{us-east-1}}.
* Stats are for the best of 3 runs.

{{branch-1.3}} @ [{{3a95101}}|https://github.com/mesos/spark-ec2/tree/3a95101c70e6892a8a48cc54094adaed1458487a]:
{code}
Cluster is now in 'ssh-ready' state. Waited 460 seconds.
[timing] rsync /root/spark-ec2: 00h 00m 07s
[timing] setup-slave: 00h 00m 28s
[timing] scala init: 00h 00m 11s
[timing] spark init: 00h 00m 07s
[timing] ephemeral-hdfs init: 00h 12m 40s
[timing] persistent-hdfs init: 00h 12m 35s
[timing] spark-standalone init: 00h 00m 00s
[timing] tachyon init: 00h 00m 08s
[timing] ganglia init: 00h 00m 53s
[timing] scala setup: 00h 03m 11s
[timing] spark setup: 00h 21m 20s
[timing] ephemeral-hdfs setup: 00h 00m 48s
[timing] persistent-hdfs setup: 00h 00m 43s
[timing] spark-standalone setup: 00h 01m 19s
[timing] tachyon setup: 00h 03m 06s
[timing] ganglia setup: 00h 00m 32s
{code}

{{packer}} @ [{{273c8c5}}|https://github.com/nchammas/spark-ec2/tree/273c8c518fbc6e86e0fb4410efbe77a4d4e4ff5b]:
{code}
Cluster is now in 'ssh-ready' state. Waited 292 seconds.
[timing] rsync /root/spark-ec2: 00h 00m 20s
[timing] setup-slave: 00h 00m 19s
[timing] scala init: 00h 00m 12s
[timing] spark init: 00h 00m 08s
[timing] ephemeral-hdfs init: 00h 12m 58s
[timing] persistent-hdfs init: 00h 12m 55s
[timing] spark-standalone init: 00h 00m 00s
[timing] tachyon init: 00h 00m 10s
[timing] ganglia init: 00h 00m 15s
[timing] scala setup: 00h 03m 19s
[timing] spark setup: 00h 20m 32s
[timing] ephemeral-hdfs setup: 00h 00m 34s
[timing] persistent-hdfs setup: 00h 00m 27s
[timing] spark-standalone setup: 00h 00m 47s
[timing] tachyon setup: 00h 03m 15s
[timing] ganglia setup: 00h 00m 23s
{code}

As you can see, with the exception of time-to-SSH-availability, things are mostly the same across the current and Packer-generated AMIs. I've proposed improvements to cut down the launch times of large clusters in [a separate issue|SPARK-5189].

[~shivaram] - At this point I think it's safe to say that the approach proposed here is straightforward and worth pursuing. All we need now is a review of [the scripts that install various things|https://github.com/nchammas/spark-ec2/blob/273c8c518fbc6e86e0fb4410efbe77a4d4e4ff5b/packer/spark-packer.json#L63-L66] (e.g. Ganglia, Python 2.7, etc.) on the AMI to make sure it all makes sense.
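Runs like the two above are easier to compare if the {{[timing]}} lines are totaled mechanically. A small awk sketch that converts each {{00h 12m 40s}} entry to seconds and sums them (the sample input is a subset of the timings above; the helper name is made up for illustration):

```shell
# Sum "[timing] name: HHh MMm SSs" lines from stdin into total seconds.
# Tokens ending in h/m/s are parsed numerically; the last match per
# line wins, so a name like "ephemeral-hdfs" is overwritten by the
# real seconds field that follows it.
total_setup_seconds() {
  awk '/\[timing\]/ {
    h = m = s = 0
    for (i = 1; i <= NF; i++) {
      if ($i ~ /h$/)      h = $i + 0
      else if ($i ~ /m$/) m = $i + 0
      else if ($i ~ /s$/) s = $i + 0
    }
    total += h * 3600 + m * 60 + s
  } END { print total }'
}

# Two of the [timing] lines from the packer run above: 21m20s + 32s.
total_setup_seconds <<'EOF'
[timing] spark setup: 00h 21m 20s
[timing] ganglia setup: 00h 00m 32s
EOF
```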
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanel&focusedCommentId=14263187#comment-14263187 ] Nicholas Chammas commented on SPARK-3821:
-

I need to brush up on my statistics, but I think the difference between the base AMI and the Packer AMI is not statistically significant. The benchmark just tested the time from instance launch to SSH availability. Nothing was installed or done with the instances after SSH became available (i.e. I wasn't creating Spark clusters). I still have to post updated benchmarks for full cluster launches. Is there anything else you wanted to see before reviewing this proposal in more detail?
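A quick way to sanity-check "not statistically significant" for benchmarks like this is to compare the mean and spread of each AMI's time-to-SSH samples: if the means differ by much less than the standard deviations, the distributions overlap heavily. A small awk sketch — the sample numbers are hypothetical, purely for illustration:

```shell
# Mean and sample standard deviation of launch-time samples (one
# number per line on stdin), for eyeballing whether two AMIs'
# time-to-SSH distributions overlap.
mean_sd() {
  awk '{ n++; sum += $1; sumsq += $1 * $1 }
       END { mean = sum / n
             sd = sqrt((sumsq - sum * sum / n) / (n - 1))
             printf "mean=%.1f sd=%.1f\n", mean, sd }'
}

# Hypothetical seconds-to-SSH samples for one AMI:
printf '%s\n' 290 301 287 295 | mean_sd
```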
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263206#comment-14263206 ] Nicholas Chammas commented on SPARK-3821: - I have Packer configured to run {{create_image.sh}}, as well as other scripts I added (e.g. to install Python 2.7), to generate the AMIs I am using. So testing Packer-generated AMIs against manually generated ones (created by running {{create_image.sh}} by hand) should show little difference. Packer is just tooling that automates the application of existing scripts like {{create_image.sh}} to create AMIs and other image types, like GCE images and Docker images. The goal is to make it easy to generate and update Spark AMIs (and eventually Docker images too) in an automated fashion.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263193#comment-14263193 ] Shivaram Venkataraman commented on SPARK-3821: -- Yeah, you are right that the times are pretty close for the Packer and base AMIs. I was just curious whether I was missing something. I don't think there is much else I had in mind -- having the full cluster launch times for the existing AMI vs. Packer would be good, and it would also be good to see how Packer compares to images created using [create_image.sh|https://github.com/mesos/spark-ec2/blob/v4/create_image.sh].
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263181#comment-14263181 ] Shivaram Venkataraman commented on SPARK-3821: -- [~nchammas] Thanks for the benchmark. One thing I am curious about is why the Packer AMI is faster to launch than the base Amazon AMI. Is this because we spend some time installing things on the base AMI that we avoid with Packer?
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262720#comment-14262720 ] Nicholas Chammas commented on SPARK-3821: - For lulz, I've benchmarked the start times of a few AMIs to better understand what role the AMI plays in cluster launch times.

Background:
* *Time from instance launch to SSH availability*
* {{m3.medium}} HVM instances in {{us-east-1}}
* ~30 launches recorded for each AMI

Stats:
* ami-35b1885c (current, as of 1.2.0, Spark AMI):
** Average launch time: 340 seconds
** Median launch time: 342 seconds
** Standard deviation: 33 seconds
* ami-b66ed3de (latest base Amazon AMI):
** Average launch time: 291 seconds
** Median launch time: 279 seconds
** Standard deviation: 89 seconds
* ami-3c610f54 (Packer-generated replacement Spark AMI, based on ami-b66ed3de):
** Average launch time: 275 seconds
** Median launch time: 272 seconds
** Standard deviation: 36 seconds

Something changed since the [benchmark I originally posted|https://github.com/nchammas/spark-ec2/blob/1b312fa1f794288c5dbe420c5a6451c4de7bf758/packer/proposal.md#new-amis], and I haven't seen the 100-second SSH availability again. I'd say that the numbers here are more reliable, since I generated them with scripts over many runs, as opposed to manually.
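Summary statistics like those above can be produced with Python's standard {{statistics}} module. A sketch with illustrative sample values (not the actual ~30 recorded measurements):

```python
import statistics

# Hypothetical launch-to-SSH times in seconds for one AMI; the real
# benchmark recorded ~30 launches per AMI.
launch_times = [272, 268, 281, 275, 260, 290, 270]

mean = statistics.mean(launch_times)
median = statistics.median(launch_times)
stdev = statistics.stdev(launch_times)  # sample standard deviation
```

Running this over each AMI's recorded times yields the average/median/standard-deviation triples reported above.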
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256683#comment-14256683 ] Nicholas Chammas commented on SPARK-3821: - Per the discussion earlier, I've [updated|https://github.com/nchammas/spark-ec2/tree/packer/packer] the Packer build configuration to drop the release-specific builds. I've also added GNU parallel to the list of installed tools and will use it in place of the {{while ... rsync ... wait}} pattern used throughout the various setup scripts. I'll test out these changes on small (< 5 nodes) and large (>= 100 nodes) cluster launches and post updated benchmarks as well as an updated README and proposal.
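The same fan-out that GNU parallel provides for the {{while ... rsync ... wait}} pattern can be sketched in Python with a thread pool. The host names and the command builder here are placeholders; in spark-ec2 the command would be an rsync or ssh invocation per slave:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_on_all(hosts, make_cmd, max_workers=16):
    """Run one local command per host concurrently; return host -> exit code.

    make_cmd(host) returns an argv list, e.g. something like
    ["rsync", "-az", "/root/spark-ec2", host + ":/root/"] (hypothetical).
    """
    def run(host):
        return host, subprocess.run(make_cmd(host)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, hosts))

# Demonstration with a harmless no-op command instead of rsync.
codes = run_on_all(["host1", "host2"], lambda h: [sys.executable, "-c", "pass"])
```

Unlike the backgrounded-loop-plus-wait shell pattern, the pool bounds concurrency and collects per-host exit codes for error reporting.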
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205131#comment-14205131 ] Shivaram Venkataraman commented on SPARK-3821: -- Regarding reducing init time, I think there are simple things we can do in init.sh that will get us most of the way there. For example, we can download the tar.gz files for Hadoop and Spark on each machine and untar them in parallel, instead of rsync-ing at the end. But we can revisit this in a separate change, I guess.
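The download-and-untar-per-slave idea can be sketched as building the shell command each slave would run itself, replacing the post-hoc rsync from the master. The URL and destination path are illustrative, not the actual spark-ec2 values:

```python
def fetch_and_untar_cmd(url, dest="/root"):
    """Shell command for one slave: download a tarball and unpack it
    in place, instead of rsync-ing the unpacked tree from the master."""
    tarball = url.rsplit("/", 1)[-1]
    return ("cd {dest} && wget -q {url} && "
            "tar xzf {tarball} && rm {tarball}").format(
                dest=dest, url=url, tarball=tarball)

# Hypothetical Spark distribution URL for illustration only.
cmd = fetch_and_untar_cmd("https://example.com/spark-1.1.0-bin-hadoop2.tgz")
```

Each slave running this concurrently shifts the bandwidth cost from one master-to-all rsync fan-out onto parallel downloads.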
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205358#comment-14205358 ] Nicholas Chammas commented on SPARK-3821: - Here's the [benchmark of the launch times with the new AMIs that don't have Spark or Hadoop pre-installed|https://github.com/nchammas/spark-ec2/blob/packer/packer/proposal.md#latest-os-updates-and-ganglia-pre-installed-best-run-of-4]. Yeah, there are several optimizations to {{setup.sh}} that I can submit this week, mostly related to parallelizing things properly. Should I submit those separately, or roll them into this AMI work?
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205367#comment-14205367 ] Dan Osipov commented on SPARK-3821: --- [~nchammas] Excellent work, I look forward to testing it out this week. {quote} 1. My preference would be to just have a single AMI across Spark versions for a couple of reasons. {quote} I would actually advocate for baked AMIs. Yes, there are many of them, but IMHO there should be a Jenkins job creating these on every release, so it would be a fully automated task. These AMIs would be a production-ready release of Spark with all dependencies built in.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205401#comment-14205401 ] Nicholas Chammas commented on SPARK-3821: - Thanks for taking a look, [~danospv]. Looking forward to your feedback. Keeping the fully-baked AMIs could totally work. The current scripts allow the image creation to be fully automated. We may need some more tooling around image management (e.g. [{{delete-all-registered-spark-amis.py}}|https://github.com/nchammas/spark-ec2/blob/packer/packer/delete-all-registered-spark-amis.py]), and we will need to maintain the image library we build up. So it's probably just a question of whether we are ready to accept the maintenance/tooling burden at this time, though it's totally feasible.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203757#comment-14203757 ] Shivaram Venkataraman commented on SPARK-3821: -- [~nchammas] Thanks for putting this together -- this is looking great! I just had a couple of quick questions and clarifications.

1. My preference would be to just have a single AMI across Spark versions, for a couple of reasons. First, it reduces steps for every release (even though creating AMIs is definitely much simpler now!). Also, the number of AMIs we maintain could get large if we do this for every minor and major release like 1.1.1. [~pwendell] could probably comment more on the release process etc.

2. Could you clarify whether Hadoop is pre-installed in the new AMIs, or is it still installed on startup? The flexibility we right now have of switching between Hadoop 1, Hadoop 2, YARN etc. is useful for testing. (Related Packer question: are the [init scripts|https://github.com/nchammas/spark-ec2/blob/packer/packer/spark-packer.json#L129] run during AMI creation or during startup?)

3. Do you have some benchmarks for the new AMI without Spark 1.1.0 pre-installed? [We right now have old AMI vs. new AMI with Spark|https://github.com/nchammas/spark-ec2/blob/packer/packer/proposal.md#new-amis---latest-os-updates-and-spark-110-pre-installed-single-run]. I see a couple of huge wins in the new AMI (from SSH wait time, ganglia init, etc.) which I guess we should get even without Spark being pre-installed.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203786#comment-14203786 ] Nicholas Chammas commented on SPARK-3821: - Thanks for the feedback, [~shivaram].

{quote} 1. My preference would be to just have a single AMI across Spark versions for a couple of reasons. {quote} I agree. Maintaining images for specific versions of Spark is worth it only if you're really crazy about getting the lowest cluster launch times possible. Well, that was my [original motivation|http://apache-spark-developers-list.1001551.n3.nabble.com/EC2-clusters-ready-in-launch-time-30-seconds-td7262.html] for doing this work, but ultimately I agree the complexity is not worth it at the moment. I'll take this out unless someone wants to advocate for leaving it in.

{quote} 2. Could you clarify whether Hadoop is pre-installed in the new AMIs, or is it still installed on startup? {quote} Currently, I have it set to install Hadoop 2 on the AMIs with Spark pre-installed. Again, this was done with the intention of aiming for the lowest launch time possible, but if we'd like to do away with the Spark-pre-installed AMIs then this is not an issue.

{quote} Are the init scripts run during AMI creation or during startup? {quote} For the AMIs with Spark pre-installed, they are run during AMI creation. That's why the [init runtimes in the second benchmark|https://github.com/nchammas/spark-ec2/blob/214d5e4cac392a0eac21f949fe25c0075044411f/packer/proposal.md#new-amis---latest-os-updates-and-spark-110-pre-installed-single-run] are all 0 ms; the init script sees that such and such is already installed and just exits.

{quote} 3. Do you have some benchmarks for the new AMI without Spark 1.1.0 pre-installed? {quote} Nope, but I can run one and get back to you on Monday or Tuesday with those numbers.
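The "already installed, just exits" behavior described above is a simple idempotency guard. A sketch of the pattern using a marker file (the marker-file scheme is illustrative; the real init scripts check for the installed software itself):

```python
import os
import tempfile

def init_step(name, install, marker_dir):
    """Run install() at most once; later calls see the marker and exit fast.

    This mirrors how an init script on a pre-baked AMI can report ~0 ms:
    it detects an existing installation and returns immediately.
    """
    marker = os.path.join(marker_dir, name + ".done")
    if os.path.exists(marker):
        return False  # already installed; nothing to do
    install()
    open(marker, "w").close()
    return True

markers = tempfile.mkdtemp()
ran = []
first = init_step("scala", lambda: ran.append("installed"), markers)
second = init_step("scala", lambda: ran.append("installed"), markers)
```

The first call performs the install; the second is a fast no-op, which is why the init timings on the pre-baked AMI all collapse to zero.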
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203280#comment-14203280 ] Nicholas Chammas commented on SPARK-3821: - After much dilly-dallying, I am happy to present:
* A brief proposal/design doc ([fixed JIRA attachment|https://issues.apache.org/jira/secure/attachment/12680371/packer-proposal.html], [md file on GitHub|https://github.com/nchammas/spark-ec2/blob/packer/packer/proposal.md])
* [Initial implementation|https://github.com/nchammas/spark-ec2/tree/packer/packer] and [README|https://github.com/nchammas/spark-ec2/blob/packer/packer/README.md]
* New AMIs generated by this implementation: [Base AMIs|https://github.com/nchammas/spark-ec2/tree/packer/ami-list/base], [Spark 1.1.0 Pre-Installed|https://github.com/nchammas/spark-ec2/tree/packer/ami-list/1.1.0]

To try out the new AMIs with {{spark-ec2}}, you'll need to update [these|https://github.com/apache/spark/blob/7e9d975676d56ace0e84c2200137e4cd4eba074a/ec2/spark_ec2.py#L47] [two|https://github.com/apache/spark/blob/7e9d975676d56ace0e84c2200137e4cd4eba074a/ec2/spark_ec2.py#L593] lines (well, really, just the first one) to point to [my {{spark-ec2}} repo on the {{packer}} branch|https://github.com/nchammas/spark-ec2/tree/packer/packer]. Your candid feedback and/or improvements are most welcome!
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192273#comment-14192273 ] Nicholas Chammas commented on SPARK-3821: - Hey folks, I was hoping to post a design doc here this week and get feedback, but I will have to push that back to next week. I've been very busy this week and will be away from a computer all weekend. Apologies.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182956#comment-14182956 ] Dan Osipov commented on SPARK-3821: --- I'd like to take this on - it is needed for a launch script I'm working on. The current AMIs are owned by Amazon ID 314332379540 - I assume whatever process gets created as a result of this ticket will need to be run by that user to host the resulting AMIs. Are there manual steps that are currently done to produce https://github.com/mesos/spark-ec2/tree/v4/ami-list ?
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183245#comment-14183245 ] Nicholas Chammas commented on SPARK-3821: - Hey [~danospv], I'm currently in the middle of working on this. I've assigned this JIRA issue to myself to make that clearer. Next week I plan to post a brief design doc, and perhaps also an initial alpha of this feature, so that people can review it and give their feedback. If after reviewing it you find that you'd still like to pursue this, feel free to do so.

{quote} this is needed for a launch script I'm working on {quote} Could you elaborate on your use case? The use cases I'm currently targeting are focused on improving {{spark-ec2}} launch times and automating updates to any Spark machine images or containers.

{quote} Are there manual steps that are currently done to produce https://github.com/mesos/spark-ec2/tree/v4/ami-list ? {quote} Yes, [{{create_image.sh}}|https://github.com/mesos/spark-ec2/blob/v4/create_image.sh] is supposed to be that script, though it was created _ex post facto_ and may not yield a proper replica of the AMIs we currently have. On a related note, the approach I'm pursuing will automate both the creation of the AMIs in multiple regions and across virtualization types, as well as updates to the AMI list under {{ami-list/}}.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183276#comment-14183276 ] Dan Osipov commented on SPARK-3821: --- OK, great!

{quote} Could you elaborate on your use case? The use cases I'm currently targeting are focused on improving spark-ec2 launch times and automating updates to any Spark machine images or containers. {quote}

There are a few problems with the spark-ec2 script:
* Large clusters take too long to spin up. This is due to serial processing of each slave. When done in parallel, performance is much better.
* It doesn't handle failure well. EC2 nodes may fail to start up but still report that they're running. In those cases spark-ec2 freezes, then fails, without cleaning up state after itself (it leaves instances, security groups, and EBS volumes behind).

I rewrote the steps in a Scala tool. It's not at feature parity with spark-ec2 yet, but it makes some improvements in the above-mentioned areas. The goal is for it to serve the same role as the EMR CLI [1], if you've ever used that - including running a job. The problem is that a lot of functionality is still bundled in setup.sh, which can be minimized by a) doing most of the work at the AMI bundling step, and b) performing it in parallel through the launcher. I'd be glad to put the script on GitHub so that you can evaluate the approach. Are you also planning to create AMIs for different combinations of Spark and Hadoop versions?

[1] http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-commands.html
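The leaked-state failure mode described above (instances, security groups, and EBS volumes left behind) is usually handled with a cleanup-on-failure wrapper around the launch sequence. A sketch against a hypothetical EC2-like client interface, not real boto calls:

```python
class LaunchFailure(Exception):
    pass

def launch_cluster(client, count, timeout_s=300):
    """Launch `count` instances; on any failure, terminate whatever
    was created before re-raising, so no partial state is left behind.

    `client` is a hypothetical object with launch/wait_for_ssh/terminate
    methods, standing in for real EC2 API calls.
    """
    instance_ids = []
    try:
        instance_ids = client.launch(count)
        for iid in instance_ids:
            # An instance can report "running" yet never become reachable,
            # so gate on SSH availability with a timeout.
            if not client.wait_for_ssh(iid, timeout_s):
                raise LaunchFailure("no SSH on %s" % iid)
        return instance_ids
    except Exception:
        for iid in instance_ids:
            client.terminate(iid)  # clean up partial state before failing
        raise
```

The same try/except shape extends to security groups and volumes: record each resource as it is created, and tear the list down in reverse order on failure.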
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183390#comment-14183390 ] Nicholas Chammas commented on SPARK-3821: - Going for something like EMR's CLI is potentially very useful, though perhaps a bit outside the scope of the original {{spark-ec2}} (and there's nothing wrong with that!). What I'm doing will keep {{spark-ec2}} mostly as-is on the surface, but tackle the launch times and parallelism as you described. I'm currently only generating AMIs with Hadoop 2 and Spark 1.1.0, or a base AMI with everything except Hadoop and Spark. I haven't yet figured out the details of how to handle the full version matrix. Right now I'm leaning towards having a base AMI that any version of Spark can be installed on relatively quickly, plus AMIs for specific versions of Spark starting from 1.1.0.
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162663#comment-14162663 ] Nicholas Chammas commented on SPARK-3821: - [~shivaram] / [~pwendell]:
# In a Spark cluster, what's the difference between what's installed on the master and what's installed on the slaves? Is it basically the same stuff, just with minor configuration changes?
# Starting from a base AMI, is the rough procedure for creating a fully built Spark instance simply running [{{create_image.sh}}|https://github.com/mesos/spark-ec2/blob/v3/create_image.sh] followed by [{{setup.sh}}|https://github.com/mesos/spark-ec2/blob/v3/setup.sh] (minus the stuff that connects to other instances)?
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162755#comment-14162755 ] Shivaram Venkataraman commented on SPARK-3821: --
1. Yes - the same stuff is installed on the master and the slaves. In fact, they have the same AMI.
2. The base Spark AMI is created using `create_image.sh` (from a base Amazon AMI). After that, we pass the AMI-ID to `spark_ec2.py`, which calls `setup.sh` on the master.