Big data mini course: urllib2 connection refused when checking cluster

2013-12-13 Thread David Gingrich
I'm going through the big data mini course (http://ampcamp.berkeley.edu/big-data-mini-course/launching-a-bdas-cluster-on-ec2.html) and am getting urllib2 connection refused errors. They're thrown in the check_spark_cluster() function while waiting for the cluster to start. The URL is

Why spark has to use static method?

2013-12-13 Thread Jie Deng
Hi all, thanks for your time to read this. When I first tried to write a new Java class and use Spark in it, I kept getting an exception: Exception in thread main org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException: org.dcu.test.SparkPrefix

Re: Why spark has to use static method?

2013-12-13 Thread Yadid Ayzenberg
Hi Jie, it seems that SparkPrefix is not serializable. You can try adding implements Serializable and see if that solves the problem. Yadid On 12/13/13 5:10 AM, Jie Deng wrote: Hi all, thanks for your time to read this. When I first tried to write a new Java class and use Spark in it,
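A minimal Scala sketch of the fix Yadid suggests (class and variable names here are hypothetical, not from the original thread): any object captured by an RDD closure is serialized and shipped to the workers, so its class must be serializable.

```scala
import org.apache.spark.SparkContext

// Hypothetical class standing in for SparkPrefix: because an instance is
// referenced inside an RDD closure below, its class must be serializable;
// otherwise Spark throws NotSerializableException when shipping the task.
class PrefixMatcher(prefix: String) extends Serializable {
  def matches(s: String): Boolean = s.startsWith(prefix)
}

object SerializableExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "SerializableExample")
    val matcher = new PrefixMatcher("spark")
    // matcher is captured by the filter closure and serialized with the task.
    val hits = sc.parallelize(Seq("spark", "hadoop", "sparkling"))
      .filter(matcher.matches)
      .count()
    println(s"matches: $hits")
    sc.stop()
  }
}
```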

Re: Why spark has to use static method?

2013-12-13 Thread Jie Deng
Thanks Yadid, that really works! So does that mean the static method worked only because Spark had not distributed the task yet, and that the right way to use Spark is to make every class Serializable? Thanks a lot! 2013/12/13 Yadid Ayzenberg ya...@media.mit.edu Hi Jie, it seems that SparkPrefix is

Re: Why spark has to use static method?

2013-12-13 Thread Jie Deng
Great, Thanks a lot!!! 2013/12/13 Yadid Ayzenberg ya...@media.mit.edu In order for Spark to ship your objects to the slaves, they must be serializable. Also make sure to read the Data Serialization section in the tuning guide: http://spark.incubator.apache.org/docs/latest/tuning.html If
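Following the Data Serialization section of the tuning guide linked above, Kryo in that era was configured via system properties plus a registrator class. A sketch under that assumption (the registered class and registrator names are hypothetical):

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical application class to register with Kryo.
class SparkPrefix(val prefix: String) extends Serializable

// Registrator named in the spark.kryo.registrator property below.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[SparkPrefix])
  }
}

object KryoSetup {
  def configure(): Unit = {
    // Spark 0.8-era configuration: set before creating the SparkContext.
    System.setProperty("spark.serializer",
      "org.apache.spark.serializer.KryoSerializer")
    System.setProperty("spark.kryo.registrator", "MyRegistrator")
  }
}
```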

Re: Spark Vs R (Univariate Kernel Density Estimation)

2013-12-13 Thread ABEL ALEJANDRO CORONADO IRUEGAS
Thank you, I have a year of homework now ;) On Thu, Dec 12, 2013 at 2:51 PM, Imran Rashid im...@quantifind.com wrote: Ah, got it, makes a lot more sense now. I couldn't figure out what w was; I should have figured it was weights. As Evan suggested, using zip is almost certainly what you want.
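To illustrate the zip suggestion: a hedged Scala sketch of a weighted univariate Gaussian KDE, where the samples and their weights live in two parallel RDDs (all names, values, and the bandwidth are illustrative, not from the original thread).

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object WeightedKde {
  // Standard Gaussian kernel.
  def kernel(u: Double): Double =
    math.exp(-0.5 * u * u) / math.sqrt(2 * math.Pi)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "WeightedKde")
    val samples = sc.parallelize(Seq(1.0, 2.0, 2.5, 4.0))
    val weights = sc.parallelize(Seq(0.1, 0.4, 0.3, 0.2)) // sums to 1
    val h = 0.5 // bandwidth
    val x = 2.0 // evaluation point

    // zip pairs each sample with its weight; both RDDs must have the same
    // number of partitions and the same number of elements per partition.
    val density = samples.zip(weights)
      .map { case (xi, wi) => wi * kernel((x - xi) / h) / h }
      .sum()

    println(s"estimated density at $x = $density")
    sc.stop()
  }
}
```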

Re: writing to HDFS with a given username

2013-12-13 Thread Philip Ogren
Well, it only uses my user name when I run my application in local mode (i.e., Spark is running on my laptop with a master URL of local). Not a general solution for you, I'm afraid! On 12/12/2013 5:38 PM, Koert Kuipers wrote: Hey Philip, how do you get spark to write to hdfs with your user
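One general approach that does exist on the Hadoop side (independent of Spark, and not what Philip describes) is UserGroupInformation, which under simple, non-Kerberos authentication lets a client perform filesystem calls as a named user. A sketch under that assumption; the user name and path are hypothetical:

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

object WriteAsUser {
  def main(args: Array[String]): Unit = {
    // Impersonate "etl" for this block of filesystem calls. With simple
    // (non-Kerberos) auth, HDFS trusts the client-supplied user name.
    val ugi = UserGroupInformation.createRemoteUser("etl")
    ugi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        val fs = FileSystem.get(new Configuration())
        val out = fs.create(new Path("/user/etl/output.txt"))
        out.writeUTF("written as etl, not as the local login user")
        out.close()
      }
    })
  }
}
```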

exposing spark through a web service

2013-12-13 Thread Philip Ogren
Hi Spark Community, I would like to expose my Spark application/libraries via a web service in order to launch jobs, interact with users, etc. I'm sure there are hundreds of ways to think about doing this, each with a variety of technology stacks that could be applied. So, I know there is no

Re: exposing spark through a web service

2013-12-13 Thread Mark Hamstra
https://github.com/apache/incubator-spark/pull/222 On Fri, Dec 13, 2013 at 8:36 AM, Philip Ogren philip.og...@oracle.com wrote: Hi Spark Community, I would like to expose my Spark application/libraries via a web service in order to launch jobs, interact with users, etc. I'm sure there are

Re: writing to HDFS with a given username

2013-12-13 Thread Koert Kuipers
That's great, didn't realize this was in master already. On Thu, Dec 12, 2013 at 8:10 PM, Shao, Saisai saisai.s...@intel.com wrote: Hi Koert, Spark with multi-user support has been merged into the master branch with this patch (https://github.com/apache/incubator-spark/pull/23); you can check out

some wrong link in Spark Summit web page

2013-12-13 Thread Nan Zhu
Hi, I'm not sure if this is the right place to talk about this; if not, I'm very sorry about that. - 9-9:30am The State of Spark, and Where We’re Going Next http://spark-summit.org/talk/zaharia-the-state-of-spark-and-where-were-going/ – pptx

Re: exposing spark through a web service

2013-12-13 Thread Patrick Wendell
Hey Philip, To elaborate a bit, this is a proposed patch for integrating something like a RESTful server into Spark. If you want to take a look at the documentation in that patch and comment on whether it would partially or fully solve your use case, that would be great. - Patrick On Fri,
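Independent of that patch, the general shape of such a service is a single long-lived SparkContext behind an HTTP endpoint. A minimal sketch using the JDK's built-in HTTP server; the port, route, and placeholder job are illustrative, and a real service would dispatch on request parameters to application-specific jobs.

```scala
import java.net.InetSocketAddress

import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import org.apache.spark.SparkContext

object SparkHttpService {
  def main(args: Array[String]): Unit = {
    // One long-lived SparkContext shared across requests; creating a
    // context per request would be far too expensive.
    val sc = new SparkContext("local", "SparkHttpService")

    val server = HttpServer.create(new InetSocketAddress(8090), 0)
    server.createContext("/count", new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = {
        // Trivial placeholder job standing in for real application logic.
        val n = sc.parallelize(1 to 1000).count()
        val body = s"count = $n\n".getBytes("UTF-8")
        exchange.sendResponseHeaders(200, body.length)
        exchange.getResponseBody.write(body)
        exchange.close()
      }
    })
    server.start()
    println("listening on http://localhost:8090/count")
  }
}
```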

Re: some wrong link in Spark Summit web page

2013-12-13 Thread Patrick Wendell
Thanks for reporting this; we'll figure it out. On Fri, Dec 13, 2013 at 10:04 AM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, I'm not sure if this is the right place to talk about this; if not, I'm very sorry about that - 9-9:30am The State of Spark, and Where We’re Going

Re: reading a specific key-value

2013-12-13 Thread K. Shankari
I think that you want the lookup() method in PairRDDFunctions? http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions It is supposed to be more efficient than filter... Shankari On Thu, Dec 12, 2013 at 7:30 PM, Yadid ya...@media.mit.edu wrote:

Re: reading a specific key-value

2013-12-13 Thread Mark Hamstra
Right, if your RDD has a Partitioner, then lookup() will use that to determine which partition contains the key that you want to lookup and only run a task on that partition. That still doesn't efficiently solve the lookup-a-set-of-keys problem, but extending lookup() to efficiently handle a
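A short Scala sketch of the behavior Mark describes (the data and partition count are illustrative): after partitionBy, the RDD carries a partitioner, so lookup() can target the single partition that may hold the key.

```scala
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.SparkContext._

object LookupExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "LookupExample")
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))

    // Without a partitioner, lookup() must scan every partition.
    // After partitionBy, pairs.partitioner is Some(HashPartitioner),
    // so lookup() runs a task only on the partition that can hold the key.
    val partitioned = pairs.partitionBy(new HashPartitioner(4)).cache()

    println(partitioned.lookup("b")) // Seq(2)
    sc.stop()
  }
}
```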

Spark Summit 2013 videos available now

2013-12-13 Thread Andy Konwinski
Hi Spark user@ and dev@ list members, We are happy to announce that videos and slides of all talks from the first Spark Summit last week, Dec 2-3 in Downtown SF, are now available on the Spark Summit 2013 webpage at http://spark-summit.org/summit-2013. There is a link for each talk's slides and

Re: some wrong link in Spark Summit web page

2013-12-13 Thread Andy Konwinski
Hey Nan, Thanks for pointing that out. It should be fixed now. Andy -- Forwarded message -- From: Nan Zhu zhunanmcg...@gmail.com Date: Fri, Dec 13, 2013 at 10:04 AM Subject: some wrong link in Spark Summit web page To: user@spark.incubator.apache.org Hi, I'm not sure

Fwd: Re: reading a specific key-value

2013-12-13 Thread Yadid Ayzenberg
oops, meant to send to the entire list... Original Message Subject: Re: reading a specific key-value Date: Fri, 13 Dec 2013 14:56:22 -0500 From: Yadid Ayzenberg ya...@media.mit.edu To: K. Shankari shank...@eecs.berkeley.edu It says more efficient if the RDD

Re: Re: reading a specific key-value

2013-12-13 Thread Mark Hamstra
It means that the partitioner (Option[Partitioner]) field of the RDD is Some(p), not None, which in turn means that for a key k, the RDD knows how to find the partition that contains k. For that to be true, the RDD has to have been partitioned by key, and after that only

Re: reading a specific key-value

2013-12-13 Thread Yadid Ayzenberg
Thanks, I understand. I'm using the Java newAPIHadoopRDD. It seems that there is no way to define that partitioner when creating the RDD, correct? Does this mean I have to call partitionBy? It seems like it would be a lot more efficient to be able to define the partitioner on RDD creation. Yadid
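A hedged Scala rendering of the situation Yadid describes (the thread uses the Java API; the input path and types here are assumptions): an RDD loaded via the new Hadoop API starts with no partitioner, so an explicit partitionBy, plus cache so the shuffle is paid once, is needed before lookup() becomes a single-partition job.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.SparkContext._

object HadoopRddLookup {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "HadoopRddLookup")

    // The loaded RDD has partitioner == None; the new-API loaders take no
    // partitioner argument, so repartitioning by key is a separate step.
    val raw = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](
      "hdfs:///data/input") // hypothetical path

    // Hadoop record readers reuse Writable objects, so convert to plain
    // types before partitioning; cache so the shuffle cost is paid once.
    val byOffset = raw.map { case (k, v) => (k.get, v.toString) }
      .partitionBy(new HashPartitioner(16))
      .cache()

    println(byOffset.lookup(0L))
    sc.stop()
  }
}
```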

Re: writing to HDFS with a given username

2013-12-13 Thread Matei Zaharia
Yup, this should be in Spark 0.9 and 0.8.1. Matei On Dec 13, 2013, at 9:41 AM, Koert Kuipers ko...@tresata.com wrote: That's great, didn't realize this was in master already. On Thu, Dec 12, 2013 at 8:10 PM, Shao, Saisai saisai.s...@intel.com wrote: Hi Koert, Spark with