Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Stavros Kontopoulos
Hi Xiao, I just tested it, it seems ok. There are some questions about which properties we should keep when restoring the config. Otherwise it looks ok to me. The reason this should go in 2.4 is that streaming on k8s is something people want to try day one (or at least it is cool to try) and

Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Erik Erlandson
FWIW, Pandas is dropping Py2 support at the end of this year. Tensorflow is less clear. They only support py3 on windows, but there is no reference to any policy about py2 on their roadmap or the

Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Matei Zaharia
I’d like to understand the maintenance burden of Python 2 before deprecating it. Since it is not EOL yet, it might make sense to only deprecate it once it’s EOL (which is still over a year from now). Supporting Python 2+3 seems less burdensome than supporting, say, multiple Scala versions in

Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Mark Hamstra
What is the disadvantage to deprecating now in 2.4.0? I mean, it doesn't change the code at all; it's just a notification that we will eventually cease supporting Py2. Wouldn't users prefer to get that notification sooner rather than later? On Mon, Sep 17, 2018 at 12:58 PM Matei Zaharia wrote:

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Yinan Li
We can merge the PR and get SPARK-23200 resolved if the whole point is to make streaming on k8s work first. But given that this is not a blocker for 2.4, I think we can take a bit more time here and get it right. With that being said, I would expect it to be resolved soon. On Mon, Sep 17, 2018 at

Re: Python friendly API for Spark 3.0

2018-09-17 Thread Leif Walsh
I agree with Reynold, at some point you’re going to run into the parts of the pandas API that aren’t distributable. More feature parity will be good, but users are still eventually going to hit a feature cliff. Moreover, it’s not just the pandas API that people want to use, but also the set of

Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Matei Zaharia
That’s a good point — I’d say there’s just a risk of creating a perception issue. First, some users might feel that this means they have to migrate now, which is before Python itself drops support; they might also be surprised that we did this in a minor release (e.g. might we drop Python 2

Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Reynold Xin
i'd like to second that. if we want to communicate timeline, we can add to the release notes saying py2 will be deprecated in 3.0, and removed in a 3.x release. -- excuse the brevity and lower case due to wrist injury On Mon, Sep 17, 2018 at 4:24 PM Matei Zaharia wrote: > That’s a good point

Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-17 Thread Sean Owen
+1 . Licenses and sigs check out as in previous 2.3.x releases. A build from source with most profiles passed for me. On Mon, Sep 17, 2018 at 8:17 AM Saisai Shao wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.3.2. > > The vote is open until September 21

Re: [Discuss] Datasource v2 support for manipulating partitions

2018-09-17 Thread tigerquoll
Hi Jayesh, I get where you are coming from - partitions are just an implementation optimisation that we really shouldn’t be bothering the end user with. Unfortunately that view is like saying RPC is like a procedure call, and details of the network transport should be hidden from the end user.

Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-17 Thread Wenchen Fan
+1. All the blocker issues are resolved in 2.3.2 AFAIK. On Tue, Sep 18, 2018 at 9:23 AM Sean Owen wrote: > +1 . Licenses and sigs check out as in previous 2.3.x releases. A > build from source with most profiles passed for me. > On Mon, Sep 17, 2018 at 8:17 AM Saisai Shao > wrote: > > > >

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Marcelo Vanzin
You can log in to https://repository.apache.org and see what's wrong. Just find that staging repo and look at the messages. In your case it seems related to your signature. failureMessage: No public key: Key with id: () was not able to be located on http://gpg-keyserver.de/. Upload your public

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Erik Erlandson
I have no binding vote, but I second Stavros’ recommendation for SPARK-23200. Per the parallel threads on Py2 support, I would also like to propose deprecating Py2 starting with this 2.4 release. On Mon, Sep 17, 2018 at 10:38 AM Marcelo Vanzin wrote: > You can log in to https://repository.apache.org

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Xiao Li
Hi, Erik and Stavros, The bug fix SPARK-23200 is not a blocker for the 2.4 release, but it sounds important for streaming on K8S. Could the K8S-oriented committers speed up the reviews? Thanks, Xiao. On Mon, Sep 17, 2018 at 11:04 AM Erik Erlandson wrote: > > I have no binding vote but I second Stavros’

Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Erik Erlandson
I like Mark’s concept for deprecating Py2 starting with 2.4: it may seem like a ways off, but even now there may be some Spark versions supporting Py2 past the point where Py2 is no longer receiving security patches. On Sun, Sep 16, 2018 at 12:26 PM Mark Hamstra wrote: > We could also deprecate

Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Mark Hamstra
If we're going to do that, then we need to do it right now, since 2.4.0 is already in release candidates. On Mon, Sep 17, 2018 at 10:57 AM Erik Erlandson wrote: > I like Mark’s concept for deprecating Py2 starting with 2.4: It may seem > like a ways off but even now there may be some spark

Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-17 Thread Saisai Shao
+1 from my own side. Thanks, Saisai. On Tue, Sep 18, 2018 at 9:34 AM Wenchen Fan wrote: > +1. All the blocker issues are resolved in 2.3.2 AFAIK. > > On Tue, Sep 18, 2018 at 9:23 AM Sean Owen wrote: > >> +1 . Licenses and sigs check out as in previous 2.3.x releases. A >> build from source with most

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Saisai Shao
Hi Wenchen, I think you need to set SPHINXPYTHON to python3 before building the docs, to work around the doc issue ( https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc1-docs/_site/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression ). Here are the notes for the release page:

Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Erik Erlandson
I think that makes sense. The main benefit of deprecating *prior* to 3.0 would be informational - making the community aware of the upcoming transition earlier. But there are other ways to start informing the community between now and 3.0, besides formal deprecation. I have some residual

Metastore problem on Spark2.3 with Hive3.0

2018-09-17 Thread 白也诗无敌
Hi, guys. I am using Spark 2.3 and have run into a metastore problem. It looks like a compatibility issue, because Spark 2.3 still uses hive-metastore-1.2.1-spark2. Is there any solution? The Hive metastore version is 3.0 and the stack trace is below:

Re: Metastore problem on Spark2.3 with Hive3.0

2018-09-17 Thread Dongjoon Hyun
Hi, Jerry. There is a JIRA issue for that, https://issues.apache.org/jira/browse/SPARK-24360 . So far, it's in progress for Hive 3.1.0 Metastore for Apache Spark 2.5.0. You can track that issue there. Bests, Dongjoon. On Mon, Sep 17, 2018 at 7:01 PM 白也诗无敌 <445484...@qq.com> wrote: > Hi, guys

Re: how can solve this error

2018-09-17 Thread Wenchen Fan
Have you read https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html ? On Mon, Sep 17, 2018 at 4:46 AM hagersaleh wrote: > I wrote code to connect Kafka with Spark using Python and I run the code in > Jupyter > my code > import os > #os.environ['PYSPARK_SUBMIT_ARGS'] =

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Stavros Kontopoulos
-1 I would like to see: https://github.com/apache/spark/pull/22392 in, as discussed here: https://issues.apache.org/jira/browse/SPARK-23200. It is important IMHO for streaming on K8s. I just started testing it btw. Also 2.12.7 (https://contributors.scala-lang.org/t/2-12-7-release/2301,

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Nicholas Chammas
I believe -1 votes are merited only for correctness bugs and regressions since the previous release. Does SPARK-23200 count as either? On Mon, Sep 17, 2018 at 9:40 AM Stavros Kontopoulos < stavros.kontopou...@lightbend.com> wrote: > -1 > > I would like to see:

[VOTE] SPARK 2.3.2 (RC6)

2018-09-17 Thread Saisai Shao
Please vote on releasing the following candidate as Apache Spark version 2.3.2. The vote is open until September 21 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.3.2 [ ] -1 Do not release this package because ...

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Stavros Kontopoulos
I am just following Wenchen Fan's comment (of course it is not merged yet, but I wanted to bring this to the attention of the dev list): "We should definitely merge it to branch 2.4, but I won't block the release since it's not that critical and it's still in progress. After it's merged, feel free to

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Holden Karau
Deprecating Py 2 in the 2.4 release probably doesn't belong in the RC vote thread. Personally I think we might be a little too late in the game to deprecate it in 2.4, but I think calling it out as "soon to be deprecated" in the release docs would be sensible to give folks extra time to prepare.