Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Patrick Wendell
Spark is a bit different from Hadoop MapReduce, so maybe that's a source of some confusion. Spark is often used as a substrate for building different types of analytics applications, so @DeveloperApi marks internal APIs that we'd like to expose to application writers, but that might be more volatile
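For reference, a minimal Scala sketch of what such an annotation looks like in application-facing code; the class and method below are hypothetical, and only the @DeveloperApi annotation itself (from org.apache.spark.annotation) comes from the thread:

    import org.apache.spark.annotation.DeveloperApi

    // Hypothetical example: a public-but-volatile API aimed at people
    // building frameworks on top of Spark, not at typical end users.
    @DeveloperApi
    class MetricsSink {
      def report(name: String, value: Long): Unit = println(name + "=" + value)
    }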

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Colin McCabe
On Fri, May 30, 2014 at 2:11 PM, Patrick Wendell wrote: > Hey guys, thanks for the insights. Also, I realize Hadoop has gotten > way better about this with 2.2+ and I think it's great progress. > > We have well-defined API levels in Spark and also automated checking > of API violations for new pu

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Marcelo Vanzin
Hi Patrick, On Fri, May 30, 2014 at 2:11 PM, Patrick Wendell wrote: > 2. private[spark] > 3. @Experimental or @DeveloperApi I understand @Experimental, but when would you use @DeveloperApi instead of private[spark]? Seems to me that, for the API user, they both mean something very similar, if not exactly

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Patrick Wendell
Hey guys, thanks for the insights. Also, I realize Hadoop has gotten way better about this with 2.2+ and I think it's great progress. We have well-defined API levels in Spark and also automated checking of API violations for new pull requests. When doing code reviews we always enforce the narrowes

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Marcelo Vanzin
On Fri, May 30, 2014 at 12:05 PM, Colin McCabe wrote: > I don't know if Scala provides any mechanisms to do this beyond what Java > provides. In fact it does. You can say something like "private[foo]" and the annotated element will be visible to all classes under "foo" (where "foo" is any packa
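A minimal sketch of the package-qualified visibility being described (the package and names below are hypothetical):

    package org.apache.spark.util

    // Visible to everything under org.apache.spark, but not to user code outside it.
    private[spark] class InternalHelper {
      // Narrower still: visible only within org.apache.spark.util.
      private[util] def detail(): Int = 42
    }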

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Colin McCabe
First of all, I think it's great that you're thinking about this. API stability is super important and it would be good to see Spark get on top of this. I want to clarify a bit about Hadoop. The problem that Hadoop faces is that the Java package system isn't very flexible. If you have a method

Re: Streaming example stops outputting (Java, Kafka at least)

2014-05-30 Thread Nan Zhu
If local[2] is expected, then the streaming doc is actually misleading, as the given example is:

    import org.apache.spark.api.java.function._
    import org.apache.spark.streaming._
    import org.apache.spark.streaming.api._

    // Create a StreamingContext with a local master
    val ssc = new StreamingContext
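A sketch of the fix under discussion, assuming the three-argument StreamingContext constructor shown in the docs; with plain "local" the lone thread is occupied by the receiver, so allocating at least two lets receiving and processing run together:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // "local[2]" gives the local master two threads: one for the
    // (e.g. Kafka) receiver and at least one for batch processing.
    val ssc = new StreamingContext("local[2]", "KafkaWordCount", Seconds(1))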

Re: Streaming example stops outputting (Java, Kafka at least)

2014-05-30 Thread Patrick Wendell
Yeah - Spark Streaming needs at least two threads to run. I actually thought we warned the user if they only use one (@tdas?) but the warning might not be working correctly - or I'm misremembering. On Fri, May 30, 2014 at 6:38 AM, Sean Owen wrote: > Thanks Nan, that does appear to fix it. I was u

Re: Why does spark REPL not embed scala REPL?

2014-05-30 Thread Aaron Davidson
There's some discussion here as well on just using the Scala REPL for 2.11: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-on-Scala-2-11-td6506.html#a6523 Matei's response mentions the features we needed to change from the Scala REPL (class-based wrappers and where to output the g

Re: Spark 1.0.0 - Java 8

2014-05-30 Thread Surendranauth Hiraman
With respect to virtual hosts, my team uses Vagrant/VirtualBox. We have 3 CentOS VMs with 4 GB RAM each - 2 worker nodes and a master node. Everything works fine, though if you are using MapR, you have to make sure they are all on the same subnet. -Suren On Fri, May 30, 2014 at 12:20 PM, Upend

bin/spark-shell --jars option

2014-05-30 Thread Andrew Ash
Hi Spark users, In past Spark releases I always had to add jars to multiple places when using the spark-shell, and I'm looking to cut down on those. The --jars option looks like it does what I want, but it doesn't work. I did a quick experiment on latest branch-1.0 and found this: # 0) jar not
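For context, the shape of the invocation being tested; the jar paths below are placeholders, and (if memory serves) --jars takes a comma-separated list:

    bin/spark-shell --jars /path/to/extra-lib.jar,/path/to/another.jar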

Spark 1.0.0 - Java 8

2014-05-30 Thread Upender Nimbekar
Great news! I've been awaiting this release to start doing some coding with Spark using Java 8. Can I run Spark 1.0 examples on a virtual host with 16 GB RAM and a fairly decent amount of hard disk? Or do I really need to use a cluster of machines? Second, are there any good examples of using MLlib o

Re: Streaming example stops outputting (Java, Kafka at least)

2014-05-30 Thread Sean Owen
Thanks Nan, that does appear to fix it. I was using "local". Can anyone say whether that's to be expected or whether it could be a bug somewhere? On Fri, May 30, 2014 at 2:42 PM, Nan Zhu wrote: > Hi, Sean > > I had the same problem > > but when I changed MASTER="local" to MASTER="local[2]" > >

Re: Streaming example stops outputting (Java, Kafka at least)

2014-05-30 Thread Nan Zhu
Hi, Sean I had the same problem, but when I changed MASTER="local" to MASTER="local[2]", everything went back to normal. Hadn't gotten a chance to ask here. Best, -- Nan Zhu On Friday, May 30, 2014 at 9:09 AM, Sean Owen wrote: > Guys I'm struggling to debug some strange behavior in a sim

Streaming example stops outputting (Java, Kafka at least)

2014-05-30 Thread Sean Owen
Guys I'm struggling to debug some strange behavior in a simple Streaming + Java + Kafka example -- in fact, a simplified version of JavaKafkaWordCount, that is just calling print() on a sequence of messages. Data is flowing, but it only appears to work for a few periods -- sometimes 0 -- before ce

Re: Announcing Spark 1.0.0

2014-05-30 Thread Rahul Singhal
Is it intentional/ok that the tag v1.0.0 is behind tag v1.0.0-rc11? Thanks, Rahul Singhal On 30/05/14 3:43 PM, "Patrick Wendell" wrote: >I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 >is a milestone release as the first in the 1.0 line of releases, >providing API st

Re: Why does spark REPL not embed scala REPL?

2014-05-30 Thread Kan Zhang
One reason is that the standard Scala REPL uses object-based wrappers, and their static initializers will be run on remote worker nodes, which may fail due to differences between driver and worker nodes. See discussion here: https://groups.google.com/d/msg/scala-internals/h27CFLoJXjE/JoobM6NiUMQJ On Fri, Ma
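A rough, simplified sketch of the difference being described; the real generated wrapper names differ, and this only illustrates the failure mode:

    // Object-based wrapping (standard Scala REPL, simplified): the initializer
    // is static, so a worker deserializing a closure that references `data`
    // re-runs it remotely -- failing if the path exists only on the driver.
    object Line1Obj { val data = scala.io.Source.fromFile("/driver/only/path").mkString }

    // Class-based wrapping (simplified): `data` is an instance field, captured
    // and serialized with the closure instead of being re-initialized remotely.
    class Line1Cls extends Serializable { val data = "initialized once, on the driver" }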

Re: Announcing Spark 1.0.0

2014-05-30 Thread Christopher Nguyen
Awesome work, Pat et al.! -- Christopher T. Nguyen Co-founder & CEO, Adatao linkedin.com/in/ctnguyen On Fri, May 30, 2014 at 3:12 AM, Patrick Wendell wrote: > I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 > is a milestone release as the first in the

Announcing Spark 1.0.0

2014-05-30 Thread Patrick Wendell
I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces. Spark 1.0.0 is Spark's largest release ever, with contributions from 117 developers. I'd like to thank everyon

Why does spark REPL not embed scala REPL?

2014-05-30 Thread Aniket
My apologies in advance if this is a dev mailing list topic. I am working on a small project to provide a web interface to the Spark REPL. The interface will allow people to use the Spark REPL and perform exploratory analysis on the data. I already have a Play application running that provides a web interface

Re: Please change instruction about "Launching Applications Inside the Cluster"

2014-05-30 Thread Sandy Ryza
They should be - in the sense that the docs now recommend using spark-submit and thus include entirely different invocations. On Fri, May 30, 2014 at 12:46 AM, Reynold Xin wrote: > Can you take a look at the latest Spark 1.0 docs and see if they are fixed? > > https://github.com/apache/spark/tr
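For reference, the style of invocation the 1.0 docs recommend; SparkPi is the bundled example class, and the jar path below is a placeholder:

    bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master local[4] \
      /path/to/spark-examples.jar 100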

Re: Please change instruction about "Launching Applications Inside the Cluster"

2014-05-30 Thread Reynold Xin
Can you take a look at the latest Spark 1.0 docs and see if they are fixed? https://github.com/apache/spark/tree/master/docs Thanks. On Thu, May 29, 2014 at 5:29 AM, Lizhengbing (bing, BIPA) < zhengbing...@huawei.com> wrote: > The instructions are at > http://spark.apache.org/docs/0.9.0/