Re: Spark 3.0 preview release feature list and major changes

2019-10-19 Thread Erik Erlandson
I'd like to get SPARK-27296 into 3.0:
SPARK-27296 Efficient User Defined Aggregators

On Mon, Oct 7, 2019 at 3:03 PM Xingbo Jiang wrote:
> Hi all,
>
> I went over all the finished JIRA tickets targeted to Spark 3.0.0, here
> I'm listing all

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread Weichen Xu
Wait... I have some additions:

*New API:*
SPARK-25097 Support prediction on single instance in KMeans/BiKMeans/GMM
SPARK-28045 add missing RankingEvaluator
SPARK-29121 Support Dot Product for Vectors

*Behavior change or new API with behavior change:*
SPARK-23265 Update multi-column error

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread Xingbo Jiang
Hi all,

Here is the updated feature list:
SPARK-11215 Multiple columns support added to various Transformers: StringIndexer
SPARK-11150 Implement Dynamic Partition Pruning
SPARK-13677

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread Sean Owen
See the JIRA: this is too open-ended and not obviously just due to choices in data representation, what you're trying to do, etc. It's correctly closed IMHO. However, identifying the issue more narrowly, and something that looks ripe for optimization, would be useful.

On Thu, Oct 10, 2019 at

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread antonkulaga
I think SPARK-28547 should definitely be included. At the moment there are some flaws in Spark's architecture, and it performs miserably or even freezes wherever the number of columns exceeds 10-15K (even the simple describe function takes ages while the

Re: Spark 3.0 preview release feature list and major changes

2019-10-09 Thread Xiao Li
SPARK-29345 Add an API that allows a user to define and observe arbitrary metrics on streaming queries

Let us add this too.

Cheers,
Xiao

On Tue, Oct 8, 2019 at 10:31 PM Wenchen Fan wrote:
> Regarding DS v2, I'd like to remove
> SPARK-26785

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Wenchen Fan
Regarding DS v2, I'd like to remove
SPARK-26785 data source v2 API refactor: streaming write
SPARK-26956 remove streaming output mode from data source v2 APIs
and put the umbrella ticket

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Dongjoon Hyun
Thank you for preparing the 3.0-preview, Xingbo!

Bests,
Dongjoon.

On Tue, Oct 8, 2019 at 2:32 PM Xingbo Jiang wrote:
> What's the process to propose a feature to be included in the final Spark
>> 3.0 release?
>>
>
> I don't know whether there exists any specific process here, normally

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Xingbo Jiang
>
> What's the process to propose a feature to be included in the final Spark
> 3.0 release?
>

I don't know whether any specific process exists here; normally you just merge the feature into Spark master before the release code freeze, and then the feature would probably be included in the

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Li Jin
Thanks for the summary! I have a semi-related question: what's the process for proposing a feature to be included in the final Spark 3.0 release? In particular, I am interested in https://issues.apache.org/jira/browse/SPARK-28006. I am happy to do the work, so I want to make sure I don't miss

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Xingbo Jiang
Hi all,

Thanks for all the feedback. Here is the updated feature list:
SPARK-11215 Multiple columns support added to various Transformers: StringIndexer
SPARK-11150 Implement Dynamic

Re: Spark 3.0 preview release feature list and major changes

2019-10-07 Thread Hyukjin Kwon
Cogroup Pandas UDF missing:
SPARK-27463 Support Dataframe Cogroup via Pandas UDFs

Vectorized R execution:
SPARK-26759 Arrow optimization in SparkR's interoperability

Oct 8, 2019 (Tue) AM

Re: Spark 3.0 preview release feature list and major changes

2019-10-07 Thread Jungtaek Lim
Thanks for putting together this nice summary of Spark 3.0 improvements! I'd like to add some items from the Structured Streaming side:
SPARK-28199 Move Trigger implementations to Triggers.scala and avoid exposing these to the end users (removal of

Spark 3.0 preview release feature list and major changes

2019-10-07 Thread Xingbo Jiang
Hi all,

I went over all the finished JIRA tickets targeted to Spark 3.0.0. Here I'm listing all the notable features and major changes that are ready to test/deliver; please don't hesitate to add more to the list:
SPARK-11215 Multiple columns