Re: Replacing Spark's native scheduler with Sparrow

2014-11-08 Thread Tathagata Das
Let me chime in on the discussion as well. Spark Streaming is another use case where the scheduler's task-launching throughput and task latency can limit the batch interval and the overall latencies achievable by Spark Streaming. Let's say we want to do batches of 20 ms (to achieve end-to-end latenc

[RESULT] [VOTE] Designating maintainers for some Spark components

2014-11-08 Thread Matei Zaharia
Thanks everyone for voting on this. With all of the PMC votes being for, the vote passes, but there were some concerns that I wanted to address for everyone who brought them up, as well as in the wording we will use for this policy. First, like every Apache project, Spark follows the Apache voti

Re: proposal / discuss: multiple Serializers within a SparkContext?

2014-11-08 Thread Sandy Ryza
Ah, awesome. Passing custom serializers when persisting an RDD is exactly one of the things I was thinking of. -Sandy On Fri, Nov 7, 2014 at 1:19 AM, Matei Zaharia wrote: > Yup, the JIRA for this was https://issues.apache.org/jira/browse/SPARK-540 > (one of our older JIRAs). I think it would

Re: Replacing Spark's native scheduler with Sparrow

2014-11-08 Thread Michael Armbrust
> > However, I haven't seen it be as > high as the 100ms Michael quoted (maybe this was for jobs with tasks that > have much larger objects that take a long time to deserialize?). > I was thinking more about the average end-to-end latency for launching a query that has 100s of partitions. It's also

Re: Should new YARN shuffle service work with "yarn-alpha"?

2014-11-08 Thread Patrick Wendell
Great - I think that should work, but if there are any issues we can definitely fix them up. On Sat, Nov 8, 2014 at 12:47 AM, Sean Owen wrote: > Oops, that was my mistake. I moved network/shuffle into yarn, when > it's just that network/yarn should be removed from yarn-alpha. That > makes yarn-al

MLlib related query

2014-11-08 Thread Manu Kaul
Hi All, I would like to contribute code to the MLlib library with some other ML algorithms, but I was wondering if there were any research papers that led to the development of these libraries using Breeze? I see papers for Apache Spark, but not for MLlib. Thanks, Manu -- The greater danger for

Re: Should new YARN shuffle service work with "yarn-alpha"?

2014-11-08 Thread Sean Owen
Oops, that was my mistake. I moved network/shuffle into yarn, when it's just that network/yarn should be removed from yarn-alpha. That makes yarn-alpha work. I'll run tests and open a quick JIRA / PR for the change. On Sat, Nov 8, 2014 at 8:23 AM, Patrick Wendell wrote: > This second error is som

Re: EC2 clusters ready in launch time + 30 seconds

2014-11-08 Thread Nicholas Chammas
I've posted an initial proposal and implementation of using Packer to automate generating Spark AMIs to SPARK-3821

Re: Should new YARN shuffle service work with "yarn-alpha"?

2014-11-08 Thread Patrick Wendell
I think you might be conflating two things. The first error you posted was because YARN didn't standardize the shuffle API in alpha versions so our spark-network-yarn module won't compile. We should just disable that module if yarn alpha is used. spark-network-yarn is a leaf in the intra-module dep