Re: [DISCUSS] Java specific APIs design concern and choice

2020-05-07 Thread Hyukjin Kwon
Hi all, I would like to proceed with this. Are there more thoughts on this? If not, I would like to go ahead with the proposal here. On Thu, Apr 30, 2020 at 10:54 PM, Hyukjin Kwon wrote: > Nothing is urgent. I just don't want to leave it undecided and just keep > adding Java APIs inconsistently as it's

Re: Spark FP-growth

2020-05-07 Thread Aditya Addepalli
Hi, I understand that this is not a priority with everything going on, but if you think generating rules for only a single consequent adds value, I would like to contribute. Thanks & Regards, Aditya On Sat, May 2, 2020 at 9:34 PM Aditya Addepalli wrote: > Hi Sean, > > I understand your

Re: Spark FP-growth

2020-05-07 Thread Aditya Addepalli
Absolutely. I meant to say the confidence calculation depends on the support calculations and hence would reduce the time. Thanks for pointing that out. On Thu, 7 May, 2020, 11:56 pm Sean Owen, wrote: > The confidence calculation is pretty trivial, the work is finding the > supports needed. Not

Re: Spark FP-growth

2020-05-07 Thread Aditya Addepalli
Hi Sean, 1. I was thinking that by specifying the consequent we can (somehow?) skip the confidence calculation for all the other consequents. This would greatly reduce the time taken as we avoid computation for consequents we don't care about. 2. Is limiting rule size even possible? I thought
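The idea raised here (generating rules for only one consequent) can be sketched in plain Python. This is a minimal illustration, not Spark's FP-growth implementation; the function name, the toy `freq` table, and the support values are all hypothetical. It also shows why the saving is limited: each rule still needs the support of its antecedent, so the support-counting work is not avoided, only the final rule list shrinks.

```python
# Hedged sketch: rule generation restricted to a single consequent.
# freq maps frozenset(itemset) -> support fraction; values are made up.

def rules_for_consequent(freq, consequent, min_conf):
    """Emit (antecedent, consequent, confidence) rules for one consequent only."""
    target = frozenset([consequent])
    rules = []
    for itemset, supp in freq.items():
        if consequent in itemset and len(itemset) > 1:
            antecedent = itemset - target
            # Antecedent support is still required, so support counting
            # cannot be skipped even for a single consequent.
            conf = supp / freq[antecedent]
            if conf >= min_conf:
                rules.append((antecedent, consequent, conf))
    return rules

freq = {
    frozenset({"milk"}): 0.6,
    frozenset({"bread"}): 0.8,
    frozenset({"milk", "bread"}): 0.5,
}
print(rules_for_consequent(freq, "bread", 0.5))
# one rule: milk -> bread with confidence 0.5 / 0.6
```

Only the filtering of the output changes; the frequent-itemset mining that produces `freq` is unchanged.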

[VOTE] Release Spark 2.4.6 (RC1)

2020-05-07 Thread Holden Karau
Please vote on releasing the following candidate as Apache Spark version 2.4.6. The vote is open until February 5th 11PM PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.4.6 [ ] -1 Do not release this package because

Re: [VOTE] Release Spark 2.4.6 (RC1)

2020-05-07 Thread Holden Karau
Sorry correction: this vote is open until Monday May 11th at 9am pacific. On Thu, May 7, 2020 at 11:29 AM Holden Karau wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.6. > > The vote is open until February 5th 11PM PST and passes if a majority +1 > PMC

Re: Spark FP-growth

2020-05-07 Thread Sean Owen
Yes, you can get the correct support this way by accounting for how many rows were filtered out, but not the right confidence, as it depends on counting support in rows without the items of interest. But computing confidence depends on computing all that support; how would you optimize it even if

Re: Spark FP-growth

2020-05-07 Thread Sean Owen
The confidence calculation is pretty trivial, the work is finding the supports needed. Not sure how to optimize that. On Thu, May 7, 2020, 1:12 PM Aditya Addepalli wrote: > Hi Sean, > > 1. > I was thinking that by specifying the consequent we can (somehow?) skip > the confidence calculation for
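The point that confidence is trivial once supports are known can be made concrete with a small sketch (plain Python, not Spark code; the `supports` table and counts are hypothetical): confidence of a rule antecedent → consequent is just the ratio of two already-computed supports.

```python
# Minimal sketch: confidence is one division once supports exist.
# supports maps frozenset(itemset) -> support count; values are made up.

def rule_confidence(supports, antecedent, consequent):
    """confidence(A -> C) = support(A ∪ C) / support(A)."""
    combined = frozenset(antecedent) | frozenset(consequent)
    return supports[combined] / supports[frozenset(antecedent)]

supports = {
    frozenset({"bread"}): 4,
    frozenset({"bread", "butter"}): 3,
}

print(rule_confidence(supports, {"bread"}, {"butter"}))  # 0.75
```

The expensive step, as the thread notes, is populating `supports` by scanning the transactions, which is what FP-growth itself does.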

Re: [VOTE] Release Spark 2.4.6 (RC1)

2020-05-07 Thread Holden Karau
Thanks for catching that, I was looking at the old release and looked for all the refs to 2.4.5 but missed that. 1343 ( https://repository.apache.org/content/repositories/orgapachespark-1343/ ) is the one to vote on, the others were dry runs of the release script which were a little less dry than

Re: [VOTE] Release Spark 2.4.6 (RC1)

2020-05-07 Thread Dongjoon Hyun
Hi, Holden. The following link looks outdated. It was a link used at Spark 2.4.5 RC2. - https://repository.apache.org/content/repositories/orgapachespark-1340/ Instead, in the Apache repo, there are three candidates. Is 1343 the one we vote on? -

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-05-07 Thread Xiao Li
Below are the three major blockers. I think we should start discussing how to unblock the release. -

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-05-07 Thread Sean Owen
So, this RC1 doesn't pass of course, but what's the status of RC2 - are there outstanding issues? On Tue, Mar 31, 2020 at 10:04 PM Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.0.0. > > The vote is open until 11:59pm Pacific time Fri Apr 3,

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-05-07 Thread Jungtaek Lim
I don't see any new features/functions for these blockers. For SPARK-31257 (which is filed and marked as a blocker by me), I agree unifying create table syntax shouldn't be a blocker for Spark 3.0.0, as that is a new feature, but even if we put the proposal aside, the problem remains the same and