[VOTE] Release Spark 3.0.2 (RC1)

2021-02-15 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 3.0.2. The vote is open until February 19th, 9 AM (PST) and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 3.0.2
[ ] -1 Do not release this package

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-15 Thread Ye Xianjin
Hi, thanks to Ryan and Wenchen for leading this. I'd like to add my two cents here. In production environments, the function catalog might be used by multiple systems, such as Spark, Presto, and Hive. Is it possible that this function catalog is designed as a unified function catalog

Re: [DISCUSS] assignee practice on committers+ (possible issue on preemption)

2021-02-15 Thread Jungtaek Lim
Thanks for the input, Hyukjin! I have kept to my own policy across all the discussions I have raised: I provide a hypothetical example close to the actual one and avoid pointing at it directly. The main purpose of the discussion is to ensure our policy / consensus makes sense, no more. I

Re: [DISCUSS] assignee practice on committers+ (possible issue on preemption)

2021-02-15 Thread Hyukjin Kwon
I remember raising a similar issue a long time ago on the dev mailing list. I agree that setting no assignee makes sense in most cases, and I also think we share similar thoughts about assignees on umbrella JIRAs, follow-up tasks, the case where it's clear with a design doc, etc. It makes

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-15 Thread Ryan Blue
Thanks for the positive feedback, everyone. It sounds like there is a clear path forward for calling functions. Even without a prototype, the `invoke` plans show that Wenchen's suggested optimization can be done, and incorporating it as an optional extension to this proposal solves many of the
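The trade-off under discussion can be illustrated with a toy sketch (all names here are hypothetical, not Spark's actual FunctionCatalog API): a function can be called through a generic row-based interface, where arguments arrive boxed in a row container, or through an `invoke`-style method that takes the arguments directly, which an engine can bind to without building an intermediate row.

```python
# Toy sketch of the two call paths (hypothetical names, not Spark's API).

class RowBasedAdd:
    def produce_result(self, row):
        # Generic interface: arguments arrive packed in a row container.
        return row[0] + row[1]

class InvokeAdd:
    def invoke(self, left, right):
        # "invoke" style: arguments are passed directly, so an engine
        # can generate a direct call instead of constructing a row.
        return left + right

def call_function(fn, args):
    # An engine could prefer the direct path when the function offers one,
    # falling back to the generic row-based interface otherwise.
    if hasattr(fn, "invoke"):
        return fn.invoke(*args)
    return fn.produce_result(list(args))

assert call_function(RowBasedAdd(), (2, 3)) == 5
assert call_function(InvokeAdd(), (2, 3)) == 5
```

This mirrors how the optimization can be layered on as an optional extension: functions that only implement the generic interface keep working, while those that also expose the direct form can be called more cheaply.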

Re: Is there any inplict RDD cache operation for query optimizations?

2021-02-15 Thread attilapiros
Hi, there is a good reason why the decision about caching is left to the user: Spark does not know about the future of the DataFrames and RDDs. Think about how your program is running (you are still running the program), so there is an exact point where the execution is, and when Spark reaches an
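The point above can be made concrete with a toy analogy in plain Python (this is not Spark code): a lazy "dataset" recomputes its lineage on every action, because the engine cannot know whether the result will be reused; only an explicit, user-driven `cache()` avoids the repeated work.

```python
# Toy analogy (not Spark): lazy evaluation with explicit, user-driven caching.

class LazyDataset:
    def __init__(self, compute):
        self._compute = compute      # lineage: how to produce the data
        self._cached = None          # filled only by an explicit cache()

    def map(self, f):
        # Transformations are lazy: we only record a new lineage step.
        return LazyDataset(lambda: [f(x) for x in self._materialize()])

    def cache(self):
        # The *user* opts in, because only the user knows whether this
        # dataset will be reused by a later action.
        self._cached = self._compute()
        return self

    def _materialize(self):
        return self._cached if self._cached is not None else self._compute()

    def collect(self):
        # An "action" triggers computation of the whole lineage.
        return self._materialize()

scans = []
base = LazyDataset(lambda: scans.append("scan") or [1, 2, 3])

doubled = base.map(lambda x: x * 2)
doubled.collect()                    # recomputes the lineage
doubled.collect()                    # recomputes again: no implicit reuse
assert scans == ["scan", "scan"]

cached = base.map(lambda x: x * 2).cache()
assert cached.collect() == [2, 4, 6]
cached.collect()                     # served from the cache, no extra scan
assert scans == ["scan", "scan", "scan"]
```

Without the explicit `cache()`, every action walks the lineage again; the engine has no way to predict, at the point of execution, that the same dataset will be needed later.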