Re: Spark Improvement Proposals(Internet mail)

2016-10-17 Thread 黄明
There’s no need to compare to Flink’s Streaming Model. Spark should focus more on how to go beyond itself. From the beginning, Spark’s success has come from its unified model, which can satisfy SQL, Streaming, Machine Learning Models and Graph Jobs … all in one. But from 1.6 to 2.0, the abstraction

trying to use Spark applications with modified Kryo

2016-10-17 Thread Prasun Ratn
Hi, I want to run some Spark applications with some changes in the Kryo serializer. Please correct me, but I think I need to recompile Spark (instead of just the Spark applications) in order to use the newly built Kryo serializer? I obtained the Kryo 3.0.3 source and built it (mvn package install).

[VOTE] Release Apache Spark 1.6.3 (RC1)

2016-10-17 Thread Reynold Xin
Please vote on releasing the following candidate as Apache Spark version 1.6.3. The vote is open until Thursday, Oct 20, 2016 at 18:00 PDT and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.6.3 [ ] -1 Do not release this package because ...

Re: cutting 2.0.2?

2016-10-17 Thread Sean Owen
(I don't think 2.0.2 will be released for a while if at all but that's not what you're asking I think) It's a fairly safe change, but also isn't exactly a fix in my opinion. Because there are some other changes to make it all work for SPARC, I think it's more realistic to look to the 2.1.0

Re: source for org.spark-project.hive:1.2.1.spark2

2016-10-17 Thread Sean Owen
IIRC this was all about shading of dependencies, not changes to the source. On Mon, Oct 17, 2016 at 6:26 PM Ryan Blue wrote: > Are these changes that the Hive community has rejected? I don't see a > compelling reason to have a long-term Spark fork of Hive. > > rb > >

Custom Monitoring of Spark applications

2016-10-17 Thread Nicolae Rosca
Hi all, I am trying to write a custom Source for counting errors and output that with the Spark sink mechanism (CSV or JMX), and I am having some problems understanding how this works. 1. I defined the Source, added counters created with MetricRegistry and registered the Source

Re: Spark Improvement Proposals

2016-10-17 Thread Cody Koeninger
I think narrowly focusing on Flink or benchmarks is missing my point. My point is evolve or die. Spark's governance and organization are hampering its ability to evolve technologically, and they need to change. On Sun, Oct 16, 2016 at 9:21 PM, Debasish Das wrote: >

Re: cutting 2.0.2?

2016-10-17 Thread Cody Koeninger
SPARK-17841: three-line bugfix that has a week-old PR. SPARK-17812: being able to specify starting offsets is a must-have for a Kafka MVP in my opinion; already has a PR. SPARK-17813: I can put in a PR for this tonight if it'll be considered. On Mon, Oct 17, 2016 at 12:28 AM, Reynold Xin

Re: Spark Improvement Proposals

2016-10-17 Thread Tomasz Gawęda
Maybe my mail was not clear enough. I didn't want to write "let's focus on Flink" or any other framework. The idea with benchmarks was to show two things: - why some people are doing bad PR for Spark - how, in an easy way, we can change it and show that Spark is still on top. No more, no

Re: cutting 2.0.2?

2016-10-17 Thread Erik O'Shaughnessy
I would very much like to see SPARK-16962 included in 2.0.2 as it addresses unaligned memory access patterns that crash non-x86 platforms. I believe this falls in the category of "correctness fix". We (Oracle SAE) have applied the fixes for SPARK-16962 to branch-2.0 and have not encountered any

Re: trying to use Spark applications with modified Kryo

2016-10-17 Thread Steve Loughran
On 17 Oct 2016, at 10:02, Prasun Ratn wrote: Hi, I want to run some Spark applications with some changes in the Kryo serializer. Please correct me, but I think I need to recompile Spark (instead of just the Spark applications) in order to use

Re: trying to use Spark applications with modified Kryo

2016-10-17 Thread Prasun Ratn
Thanks a lot Steve! On Mon, Oct 17, 2016 at 4:59 PM, Steve Loughran wrote: > > On 17 Oct 2016, at 10:02, Prasun Ratn wrote: > > Hi > > I want to run some Spark applications with some changes in Kryo serializer. > > Please correct me, but I think I

Fwd: Large variation in spark in Task Deserialization Time

2016-10-17 Thread Pulasthi Supun Wickramasinghe
Hi Devs/All, I am seeing a huge variation in Spark Task Deserialization Time for my collect and reduce operations. While most tasks complete within 100 ms, a few take more than a couple of seconds, which slows the entire program down. I have attached a screenshot of the web UI where you can see the

Indexing w spark joins?

2016-10-17 Thread Michael Segel
Hi, Apologies if I’ve asked this question before but I didn’t see it in the list and I’m certain that my last surviving brain cell has gone on strike over my attempt to reduce my caffeine intake… Posting this to both user and dev because I think the question / topic jumps into both camps.

[build system] jenkins downtime for backups delayed by a hung build

2016-10-17 Thread shane knapp
i just noticed that jenkins was still in quiet mode this morning due to a hung build. i killed the build, backups happened, and the queue is now happily building. sorry for any delay! shane

Re: source for org.spark-project.hive:1.2.1.spark2

2016-10-17 Thread Ryan Blue
Are these changes that the Hive community has rejected? I don't see a compelling reason to have a long-term Spark fork of Hive. rb On Sat, Oct 15, 2016 at 5:27 AM, Steve Loughran wrote: > > On 15 Oct 2016, at 01:28, Ryan Blue wrote: > > The