Re: Spark 1.0.0 - Java 8

2014-05-30 Thread Surendranauth Hiraman
With respect to virtual hosts, my team uses Vagrant/Virtualbox. We have 3
CentOS VMs with 4 GB RAM each - 2 worker nodes and a master node.

Everything works fine, though if you are using MapR, you have to make sure
they are all on the same subnet.

-Suren



On Fri, May 30, 2014 at 12:20 PM, Upender Nimbekar upent...@gmail.com
wrote:

 Great news! I've been awaiting this release to start doing some coding
 with Spark using Java 8. Can I run the Spark 1.0 examples on a virtual host
 with 16 GB of RAM and a fairly decent amount of hard disk? Or do I really need
 to use a cluster of machines?
 Second, are there any good examples of using MLlib on Spark? Please point
 me in the right direction.

 Thanks
 Upender

 On Fri, May 30, 2014 at 6:12 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0
 is a milestone release as the first in the 1.0 line of releases,
 providing API stability for Spark's core interfaces.

 Spark 1.0.0 is Spark's largest release ever, with contributions from
 117 developers. I'd like to thank everyone involved in this release -
 it was truly a community effort with fixes, features, and
 optimizations contributed from dozens of organizations.

 This release expands Spark's standard libraries, introducing a new SQL
 package (SparkSQL) which lets users integrate SQL queries into
 existing Spark workflows. MLlib, Spark's machine learning library, is
 expanded with sparse vector support and several new algorithms. The
 GraphX and Streaming libraries also introduce new features and
 optimizations. Spark's core engine adds support for secured YARN
 clusters, a unified tool for submitting Spark applications, and
 several performance and stability improvements. Finally, Spark adds
 support for Java 8 lambda syntax and improves coverage of the Java and
 Python APIs.

 Those features only scratch the surface - check out the release notes
 here:
 http://spark.apache.org/releases/spark-release-1-0-0.html

 Note that since release artifacts were posted recently, certain
 mirrors may not have working downloads for a few hours.

 - Patrick





-- 

SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: suren.hira...@velos.io
W: www.velos.io


Re: Spark 1.0.0 - Java 8

2014-05-30 Thread Aaron Davidson
Also, the Spark examples can run out of the box on a single machine, as
well as a cluster. See the Master URLs heading here:
http://spark.apache.org/docs/latest/submitting-applications.html#master-urls
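
As a rough illustration of the master URL forms listed on that page, here is
a minimal Java sketch. The class MasterUrlSketch and its describe method are
hypothetical and not part of Spark; only the URL forms themselves (local,
local[K], local[*], spark://HOST:PORT, mesos://HOST:PORT, yarn-client,
yarn-cluster) come from the Spark 1.0 documentation. The host name master-vm
is a made-up example.

```java
public class MasterUrlSketch {

    // Hypothetical helper (not a Spark API): says where a job would run
    // for each master URL form documented for Spark 1.0.
    public static String describe(String master) {
        if (master.equals("local")) {
            return "single JVM with one worker thread";
        }
        if (master.equals("local[*]")) {
            return "single JVM with one worker thread per core";
        }
        if (master.matches("local\\[\\d+\\]")) {
            return "single JVM with a fixed number of worker threads";
        }
        if (master.startsWith("spark://")) {
            return "Spark standalone cluster";
        }
        if (master.startsWith("mesos://")) {
            return "Mesos cluster";
        }
        if (master.equals("yarn-client") || master.equals("yarn-cluster")) {
            return "YARN cluster";
        }
        return "unrecognized master URL";
    }

    public static void main(String[] args) {
        // A single machine or VM: no cluster needed.
        System.out.println(describe("local[4]"));
        // A standalone master on a made-up host name.
        System.out.println(describe("spark://master-vm:7077"));
    }
}
```

So a single virtual host would typically use one of the local forms, while a
multi-VM setup with a separate master node would use a spark:// URL.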


Re: Spark and Java 8

2014-05-07 Thread Kristoffer Sjögren
Running Hadoop and HDFS on an unsupported JVM runtime sounds a little
adventurous. But as long as Spark can run in a separate Java 8 runtime, it's
all good. I think having lambdas and type inference is huge when writing
these jobs, and using Scala (paying the price of complexity, poor tooling,
etc.) for this tiny feature is often not justified.
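
To make the lambda point concrete, here is a minimal, self-contained Java
sketch that does not depend on Spark. The interface Fn and the filter helper
are hypothetical stand-ins: Fn is loosely modeled on the single-method
function interfaces in Spark's Java API (simplified here, with no checked
exception), and filter works on a plain local list rather than an RDD.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LambdaSketch {

    // Hypothetical stand-in for a single-method function interface.
    // Any interface with exactly one abstract method can be the target
    // of a Java 8 lambda.
    public interface Fn<T, R> {
        R call(T value);
    }

    // Hypothetical helper: keeps the items for which the predicate is true.
    public static <T> List<T> filter(List<T> input, Fn<T, Boolean> predicate) {
        List<T> kept = new ArrayList<>();
        for (T item : input) {
            if (predicate.call(item)) {
                kept.add(item);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines =
            Arrays.asList("INFO start", "ERROR disk full", "INFO done");

        // Java 6/7 style: an anonymous inner class.
        List<String> oldStyle = filter(lines, new Fn<String, Boolean>() {
            @Override
            public Boolean call(String line) {
                return line.contains("ERROR");
            }
        });

        // Java 8 style: a lambda; the parameter type is inferred.
        List<String> newStyle = filter(lines, line -> line.contains("ERROR"));

        System.out.println(oldStyle); // prints [ERROR disk full]
        System.out.println(newStyle); // prints [ERROR disk full]
    }
}
```

Both calls do the same work; the lambda form just drops the boilerplate, and
the anonymous-class form is what code targeting Java 6 or 7 has to use.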


On Wed, May 7, 2014 at 2:03 AM, Dean Wampler deanwamp...@gmail.com wrote:

 Cloudera customers will need to put pressure on them to support Java 8.
 They only officially supported Java 7 when Oracle stopped supporting Java 6.

 dean


 On Wed, May 7, 2014 at 5:05 AM, Matei Zaharia matei.zaha...@gmail.com wrote:

 Java 8 support is a feature in Spark, but vendors need to decide for
 themselves when they’d like to support Java 8 commercially. You can still run
 Spark on Java 7 or 6 without taking advantage of the new features (indeed
 our builds are always against Java 6).

 Matei

 On May 6, 2014, at 8:59 AM, Ian O'Connell i...@ianoconnell.com wrote:

 I think the distinction there might be that they never said they ran that code
 under CDH5, just that Spark supports it and Spark runs under CDH5, not that
 you can use these features while running under CDH5.

 They could use Mesos or the standalone scheduler to run them.


 On Tue, May 6, 2014 at 6:16 AM, Kristoffer Sjögren sto...@gmail.com wrote:

 Hi

 I just read an article [1] about Spark, CDH5 and Java 8 but did not understand
 exactly how Spark can run Java 8 on a YARN cluster at runtime. Is Spark
 using a separate JVM that runs on the data nodes, or is it reusing the YARN JVM
 runtime somehow, like Hadoop 1?

 CDH5 only supports Java 7 [2], as far as I know.

 Cheers,
 -Kristoffer


 [1]
 http://blog.cloudera.com/blog/2014/04/making-apache-spark-easier-to-use-in-java-with-java-8/
 [2]
 http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Requirements-and-Supported-Versions/CDH5-Requirements-and-Supported-Versions.html









 --
 Dean Wampler, Ph.D.
 Typesafe
 @deanwampler
 http://typesafe.com
 http://polyglotprogramming.com



Re: Spark and Java 8

2014-05-06 Thread Marcelo Vanzin
Hi Kristoffer,

You're correct that CDH5 only supports up to Java 7 at the moment. But
YARN apps do not run in the same JVM as YARN itself (and I believe MR1
doesn't either), so it might be possible to pass arguments in a way
that tells YARN to launch the application master / executors with the
Java 8 runtime. I have never tried this, so I don't know if it's
really possible, and it's obviously not supported (also because Java 8
support is part of Spark 1.0, which hasn't been released yet).

You're welcome to try it out, and if you get it to work in some
manner, it would be great to hear back.


-- 
Marcelo


Re: Spark and Java 8

2014-05-06 Thread Ian O'Connell
I think the distinction there might be that they never said they ran that code
under CDH5, just that Spark supports it and Spark runs under CDH5, not that
you can use these features while running under CDH5.

They could use Mesos or the standalone scheduler to run them.



Re: Spark and Java 8

2014-05-06 Thread Matei Zaharia
Java 8 support is a feature in Spark, but vendors need to decide for themselves
when they’d like to support Java 8 commercially. You can still run Spark on Java 7
or 6 without taking advantage of the new features (indeed our builds are always
against Java 6).

Matei
