Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Wenchen Fan
FYI: the Presto UDF API also takes individual parameters instead of the row parameter. I think this direction is at least worth a try so that we can see the performance difference. It's also mentioned in the design doc as an alternative

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Wenchen Fan
Hi Holden, As Hyukjin said, following existing designs is not the principle of DS v2 API design. We should make sure the DS v2 API makes sense. AFAIK we didn't fully follow the catalog API design from Hive and I believe Ryan also agrees with it. I think the problem here is we were discussing

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-09 Thread Yikun Jiang
+1, Tested build and basic features on aarch64 (ARM64) environment. Regards, Yikun On Tue, Feb 9, 2021 at 8:24 PM, Yuming Wang wrote: > +1. Tested a batch of queries with YARN client mode. > > On Tue, Feb 9, 2021 at 2:57 PM 郑瑞峰 wrote: > >> +1 (non-binding) >> >> Thank you, Hyukjin >> >> >> --

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Hyukjin Kwon
Just dropping a few lines. I remember that one of the goals in DSv2 is to correct the mistakes we made in the current Spark code. There would not be much point if we just follow and mimic what Spark currently does. It might just end up with another copy of Spark APIs, e.g.

Re: dataFrame.na.fill() fails for column with dot

2021-02-09 Thread Terry Kim
You probably need to update f.name here as well, but we can discuss further when you create a JIRA/PR. Thanks, Terry On Tue, Feb 9, 2021

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Holden Karau
I think this proposal is a good set of trade-offs and has existed in the community for a long period of time. I especially appreciate how the design is focused on a minimal useful component, with future optimizations considered from a point of view of making sure it's flexible, but actual concrete

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Ryan Blue
I don’t think that using Invoke really works. The usability is poor if something goes wrong and it can’t handle varargs or parameterized use cases well. There isn’t a significant enough performance penalty for passing data as a row to justify making the API much more difficult and less expressive.
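The two API shapes being compared in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: `Row`, `RowFunction`, `Add2`, `produceResult`, and `invoke` are hypothetical, simplified stand-ins, not the interfaces from the SPIP or from Spark. It shows why a variable number of arguments fits a single row parameter naturally, while individual parameters fix the arity in the method signature.

```scala
// Hypothetical, simplified types for illustration; not the SPIP's API.
object UdfStyles {
  // Stand-in for an input row: positional, untyped access to values.
  final case class Row(values: Any*) {
    def getInt(i: Int): Int = values(i).asInstanceOf[Int]
    def numFields: Int = values.length
  }

  // Style 1: a single entry point that receives all arguments as a row.
  trait RowFunction[R] {
    def produceResult(input: Row): R
  }

  // Style 2: arguments are passed as individual, typed parameters.
  trait Add2 {
    def invoke(a: Int, b: Int): Int
  }

  // A sum over any number of arguments needs only one row-style method.
  object SumAll extends RowFunction[Int] {
    def produceResult(input: Row): Int =
      (0 until input.numFields).map(input.getInt).sum
  }

  // The individual-parameter style is typed but fixed at two arguments.
  object AddPair extends Add2 {
    def invoke(a: Int, b: Int): Int = a + b
  }
}
```

With these definitions, `UdfStyles.SumAll.produceResult(UdfStyles.Row(1, 2, 3))` sums three values, whereas `UdfStyles.AddPair` would need a separate method for every arity. The performance side of the trade-off is not addressed by this sketch.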

Re: dataFrame.na.fill() fails for column with dot

2021-02-09 Thread Terry Kim
Thanks Amandeep. This seems like a valid bug to me as quoted columns are not handled properly for na.fill(). I think the better place to fix it is in DataFrameNaFunctions.scala
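As background on the quoting involved, a column name that contains a dot is normally wrapped in backticks so that it is read as one column rather than as a nested field path, and an embedded backtick is escaped by doubling it. The helper below is a minimal sketch of that convention only; `ColumnQuoting` and `quoteIdentifier` are names chosen here for illustration, and this is not the fix discussed in the thread.

```scala
// Illustrative helper; the object and method names are assumptions.
object ColumnQuoting {
  // Wrap a column name in backticks, doubling any embedded backtick,
  // so that a name such as "a.b" is treated as a single identifier.
  def quoteIdentifier(name: String): String =
    "`" + name.replace("`", "``") + "`"
}
```

For example, `ColumnQuoting.quoteIdentifier("ColWith.Dot")` yields the name wrapped in backticks, which is the form that would be passed where a single column is expected.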

Re: Hyperparameter Optimization via Randomization

2021-02-09 Thread Phillip Henry
Hi, Sean. I've added a comment in the new class to suggest a look at Hyperopt etc if the user is using Python. Anyway I've created a pull request: https://github.com/apache/spark/pull/31535 and all tests, style checks etc pass. Wish me luck :) And thanks for the support :) Phillip On Mon,

Re: [DISCUSS] Add RocksDB StateStore

2021-02-09 Thread Hyukjin Kwon
I mean I am okay with adding it as an external module for the extra clarification :-) On Tue, Feb 9, 2021 at 11:10 PM, Hyukjin Kwon wrote: > I'm good with this too. > > On Tue, Feb 9, 2021 at 4:16 PM, DB Tsai wrote: > >> +1 to add it as an external module so people can test it out and give >> feedback easier.

Re: [DISCUSS] Add RocksDB StateStore

2021-02-09 Thread Hyukjin Kwon
I'm good with this too. On Tue, Feb 9, 2021 at 4:16 PM, DB Tsai wrote: > +1 to add it as an external module so people can test it out and give > feedback easier. > > On Mon, Feb 8, 2021 at 10:22 PM Gabor Somogyi > wrote: > > > > +1 adding it any way. > > > > On Mon, 8 Feb 2021, 21:54 Holden Karau,

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-09 Thread Yuming Wang
+1. Tested a batch of queries with YARN client mode. On Tue, Feb 9, 2021 at 2:57 PM 郑瑞峰 wrote: > +1 (non-binding) > > Thank you, Hyukjin > > > -- Original Message -- > *From:* "Gengliang Wang" ; > *Sent:* Tuesday, Feb 9, 2021, 1:50 PM > *To:* "Sean Owen"; > *Cc:* "Hyukjin

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Wenchen Fan
Hi Ryan, Sorry if I didn't make it clear. I was referring to implementing UDF using codegen, not calling the UDF with codegen or not. Calling UDF is Spark's job and it doesn't matter if the UDF API uses row or individual parameters, as you said. My point is, it's a bad idea to follow the

dataFrame.na.fill() fails for column with dot

2021-02-09 Thread Amandeep Sharma
Hi guys, Apologies for the long mail. I am running the code snippet below: import org.apache.spark.sql.SparkSession object ColumnNameWithDot { def main(args: Array[String]): Unit = { val spark = SparkSession.builder.appName("Simple Application") .config("spark.master", "local").getOrCreate()

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-09 Thread Jungtaek Lim
+1 (non-binding) * verified signatures * built custom distribution with the kubernetes & hadoop-cloud profiles enabled * built custom docker image from dist * ran applications "rate to kafka" & "kafka to kafka" on k8s cluster (local k3s) Thanks for driving the release! Jungtaek Lim (HeartSaVioR)