Re: Handling questions in the mailing lists

2016-11-08 Thread Denny Lee
Agreed that simply moving the questions to SO will not solve anything, but I think the point of the call-out about meta-tags is that we need to abide by SO rules: if we were to just jump in and start creating meta-tags, we would be violating at minimum the spirit and at maximum the actual conventi

RE: Handling questions in the mailing lists

2016-11-08 Thread assaf.mendelson
I like the document and I think it is good, but I still feel like we are missing an important part here. Look at SO today. There are: - 4658 unanswered questions under the apache-spark tag. - 394 unanswered questions under the spark-dataframe tag. - 639 unanswered questions

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-08 Thread vaquar khan
*+1 (non binding)* On Tue, Nov 8, 2016 at 10:21 PM, Weiqing Yang wrote: > +1 (non binding) > > > Environment: CentOS Linux release 7.0.1406 (Core) / openjdk version > "1.8.0_111" > > > > ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver > -Dpyspark -Dsparkr -DskipTests cl

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-08 Thread Weiqing Yang
+1 (non binding) Environment: CentOS Linux release 7.0.1406 (Core) / openjdk version "1.8.0_111" ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dpyspark -Dsparkr -DskipTests clean package ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dpy

Re: Connectors using new Kafka consumer API

2016-11-08 Thread Mark Grover
I think they are open to others helping; in fact, more than one person has worked on the JIRA so far. But it has been moving really slowly, and that's preventing adoption of Spark's new connector in secure Kafka environments. On Tue, Nov 8, 2016 at 7:59 PM, Cody Koeninger wrote: > Have you asked

Re: Connectors using new Kafka consumer API

2016-11-08 Thread Cody Koeninger
Have you asked the assignee on the Kafka JIRA whether they'd be willing to accept help on it? On Tue, Nov 8, 2016 at 5:26 PM, Mark Grover wrote: > Hi all, > We currently have a new direct stream connector, thanks to work by Cody and > others on SPARK-12177. > > However, that can't be used in secu

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-08 Thread Liwei Lin
+1 (non-binding) Cheers, Liwei On Tue, Nov 8, 2016 at 9:50 PM, Ricardo Almeida < ricardo.alme...@actnowib.com> wrote: > +1 (non-binding) > > over Ubuntu 16.10, Java 8 (OpenJDK 1.8.0_111) built with Hadoop 2.7.3, > YARN, Hive > > > On 8 November 2016 at 12:38, Herman van Hövell tot Westerflier <

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-08 Thread Jagadeesan As
+1 (non binding) Ubuntu 14.04.2 OpenJDK-1.8.0_72 -Pyarn -Phadoop-2.7 -Psparkr -Pkinesis-asl -Phive-thriftserver Cheers, Jagadeesan A S From: Reynold Xin To: "dev@spark.apache.org" Date: 08-11-16 11:40 AM Subject: [VOTE] Release Apache Spark 2.0.2 (RC3) Please vote on relea

Connectors using new Kafka consumer API

2016-11-08 Thread Mark Grover
Hi all, We currently have a new direct stream connector, thanks to work by Cody and others on SPARK-12177. However, that can't be used in secure clusters that require Kerberos authentication. That's because Kafka currently doesn't support delegation tokens (KAFKA-1696

Re: Diffing execution plans to understand an optimizer bug

2016-11-08 Thread Herman van Hövell tot Westerflier
Replied in the ticket. On Tue, Nov 8, 2016 at 11:36 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > SPARK-18367 : limit() > makes the lame walk again > > On Tue, Nov 8, 2016 at 5:00 PM Nicholas Chammas < > nicholas.cham...@gmail.com>

Re: Diffing execution plans to understand an optimizer bug

2016-11-08 Thread Nicholas Chammas
SPARK-18367: limit() makes the lame walk again On Tue, Nov 8, 2016 at 5:00 PM Nicholas Chammas wrote: > Hmm, it doesn’t seem like I can access the output of > df._jdf.queryExecution().hiveResultString() from Python, and until I can > boil the i

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-08 Thread Dongjoon Hyun
+1 (non-binding) It's built and tested on CentOS 6.8 / OpenJDK 1.8.0_111 with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr` profile. Cheers! Dongjoon. On 2016-11-08 14:03 (-0800), Michael Armbrust wrote: > +1 > > On Tue, Nov 8, 2016 at 1:17 PM, Sean Owen wrote: >

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-08 Thread Michael Armbrust
+1 On Tue, Nov 8, 2016 at 1:17 PM, Sean Owen wrote: > +1 binding > > (See comments on last vote; same results, except, the regression we > identified is fixed now.) > > > On Tue, Nov 8, 2016 at 6:10 AM Reynold Xin wrote: > >> Please vote on releasing the following candidate as Apache Spark vers

Re: Diffing execution plans to understand an optimizer bug

2016-11-08 Thread Nicholas Chammas
Hmm, it doesn’t seem like I can access the output of df._jdf.queryExecution().hiveResultString() from Python, and until I can boil the issue down a bit, I’m stuck with using Python. I’ll have a go at using regexes to strip some stuff from the printed plans. The one that’s working for me to strip t

Re: Diffing execution plans to understand an optimizer bug

2016-11-08 Thread Reynold Xin
If you want to peek into the internals and do crazy things, it is much easier to do it in Scala with df.queryExecution. For explain string output, you can work around the comparison simply by doing replaceAll("#\\d+", "#x") similar to the patch here: https://github.com/apache/spark/commit/fd90541
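Since the explain output is plain text, the same normalization works from Python. The following is a minimal sketch, assuming you already hold the two plan strings (for example, captured from `explain(True)` output before and after the change). `normalize_plan` and `diff_plans` are hypothetical helper names; the regex mirrors `replaceAll("#\\d+", "#x")`, and the diff uses the standard-library `difflib`.

```python
import difflib
import re

# Expression IDs such as "id#12" or "count#45L" differ between runs,
# so mask them before diffing, mirroring replaceAll("#\\d+", "#x").
_EXPR_ID = re.compile(r"#\d+")


def normalize_plan(plan: str) -> str:
    """Replace every '#<digits>' expression ID with '#x'."""
    return _EXPR_ID.sub("#x", plan)


def diff_plans(before: str, after: str) -> str:
    """Return a unified diff of two plan strings after masking IDs.

    An empty string means the plans differ only in expression IDs.
    """
    lines = difflib.unified_diff(
        normalize_plan(before).splitlines(),
        normalize_plan(after).splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
    )
    return "\n".join(lines)
```

Masking rather than deleting the IDs keeps the attribute names readable while making two otherwise identical plans compare equal.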

Diffing execution plans to understand an optimizer bug

2016-11-08 Thread Nicholas Chammas
I’m trying to understand what I think is an optimizer bug. To do that, I’d like to compare the execution plans for a certain query with and without a certain change, to understand how that change is impacting the plan. How would I do that in PySpark? I’m working with 2.0.1, but I can use master if
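One way to get the plans into strings from PySpark is to capture what `explain()` prints. This is a sketch under an assumption: PySpark 2.x's `DataFrame.explain()` appears to emit the plan through Python's own `print`, so redirecting Python's stdout captures it. If a given version writes from the JVM side instead, this will return an empty string. `capture_stdout` is a hypothetical helper name.

```python
import contextlib
import io
from typing import Callable


def capture_stdout(fn: Callable[[], None]) -> str:
    """Run fn() and return whatever it printed to Python's stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        fn()
    return buf.getvalue()
```

Usage, assuming `df` is the DataFrame under investigation: `before = capture_stdout(lambda: df.explain(True))`, then apply the change, capture `after` the same way, and compare the two strings (for example by writing them to files and running `diff`).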

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-08 Thread Sean Owen
+1 binding (See comments on last vote; same results, except, the regression we identified is fixed now.) On Tue, Nov 8, 2016 at 6:10 AM Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.0.2. The vote is open until Thu, Nov 10, 2016 at 22:00 PDT an

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-08 Thread Shixiong(Ryan) Zhu
+1 On Tue, Nov 8, 2016 at 5:50 AM, Ricardo Almeida < ricardo.alme...@actnowib.com> wrote: > +1 (non-binding) > > over Ubuntu 16.10, Java 8 (OpenJDK 1.8.0_111) built with Hadoop 2.7.3, > YARN, Hive > > > On 8 November 2016 at 12:38, Herman van Hövell tot Westerflier < > hvanhov...@databricks.com>

Re: Spark Improvement Proposals

2016-11-08 Thread Ryan Blue
On lazy consensus as opposed to voting: First, why lazy consensus? The proposal was for consensus, which is at least three +1 votes and no vetoes. Consensus has no losing side; it requires getting to a point where there is agreement. Isn't that agreement what we want to achieve with these proposals
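The difference between the two rules can be stated as two small predicates. This is a sketch, assuming the usual Apache reading: consensus (as Ryan describes it) needs at least three binding +1 votes and no binding -1, while lazy consensus treats silence as assent and passes unless someone casts a binding -1. The `Vote` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    value: int      # +1, 0 or -1
    binding: bool   # True for PMC members' votes


def consensus_passes(votes: Iterable[Vote]) -> bool:
    """At least three binding +1 votes and no binding -1 (veto)."""
    votes = list(votes)
    plus = sum(1 for v in votes if v.binding and v.value > 0)
    vetoes = sum(1 for v in votes if v.binding and v.value < 0)
    return plus >= 3 and vetoes == 0


def lazy_consensus_passes(votes: Iterable[Vote]) -> bool:
    """Silence is assent: passes unless a binding -1 is cast."""
    return not any(v.binding and v.value < 0 for v in votes)
```

Note that under lazy consensus an empty vote list passes, which is exactly the property Ryan is questioning.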

Re: Handling questions in the mailing lists

2016-11-08 Thread Michael Segel
Guys… please take what I say with a grain of salt… The issue is that the input is a stream of messages where they are addressed in a LIFO manner. This means that messages may be ignored. The stream of data (user@spark for example) is semi-structured in that the stream contains a lot of message

Re: Spark Improvement Proposals

2016-11-08 Thread Cody Koeninger
So there are some minor things (the Where section heading appears to be dropped; wherever this document is posted, it needs to actually link to a JIRA filter showing current / past SIPs), but it doesn't look like I can comment on the Google Doc. The major substantive issue that I have is that this v

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-08 Thread Ricardo Almeida
+1 (non-binding) over Ubuntu 16.10, Java 8 (OpenJDK 1.8.0_111) built with Hadoop 2.7.3, YARN, Hive On 8 November 2016 at 12:38, Herman van Hövell tot Westerflier < hvanhov...@databricks.com> wrote: > +1 > > On Tue, Nov 8, 2016 at 7:09 AM, Reynold Xin wrote: > >> Please vote on releasing the fo

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-08 Thread Herman van Hövell tot Westerflier
+1 On Tue, Nov 8, 2016 at 7:09 AM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.0.2. The vote is open until Thu, Nov 10, 2016 at 22:00 PDT and passes if > a majority of at least 3 +1 PMC votes are cast. > > [ ] +1 Release this package as Apache
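The release rule quoted above ("a majority of at least 3 +1 PMC votes") is Apache's majority approval. The sketch below assumes the standard reading: the release passes with at least three binding (PMC) +1 votes and more binding +1 than binding -1 votes. Non-binding votes, such as the "(non binding)" ones in this thread, are not counted, and a -1 is not a veto. The function name is hypothetical.

```python
def release_vote_passes(binding_plus: int, binding_minus: int) -> bool:
    """Majority approval: at least three binding +1 votes and more
    binding +1 than binding -1 votes. Non-binding votes are excluded
    by the caller; a single -1 cannot block the release."""
    return binding_plus >= 3 and binding_plus > binding_minus
```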