Re: Apache Training contribution for Spark - Feedback welcome

2019-07-29 Thread Lars Francke
On Mon, Jul 29, 2019 at 2:46 PM Sean Owen wrote:
> TL;DR is: take the below as feedback to consider, and proceed as you see fit. Nobody's suggesting you can't do this.
>
> On Mon, Jul 29, 2019 at 2:58 AM Lars Francke wrote:
> > The way I read your point is that anyone can publish material (w

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-29 Thread Hyukjin Kwon
From my look, +1 on the proposal, considering ANSI and other DBMSes in general.

On Tue, Jul 30, 2019 at 3:21 PM, Wenchen Fan wrote:
> We can add a config for a certain behavior if it makes sense, but the most important thing we want to reach agreement on here is: what should be the default behavior

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-29 Thread Wenchen Fan
We can add a config for a certain behavior if it makes sense, but the most important thing we want to reach agreement on here is: what should be the default behavior? Let's explore the solution space of table insertion behavior first. At compile time:
1. always add cast
2. add cast following the A
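As a concrete illustration of the two behaviors under discussion (table and column names here are hypothetical, and the second statement's outcome depends on which default the community settles on):

```sql
-- A table with an INT column; 'USING parquet' is just an example source.
CREATE TABLE t (i INT) USING parquet;

-- Historical Spark behavior: the implicit cast of an invalid string
-- returns NULL, so the row is silently inserted with i = NULL.
INSERT INTO t VALUES ('not a number');

-- Under the proposed ANSI-style behavior, the same statement would
-- instead fail at runtime with a cast error rather than storing NULL.
```

If I recall correctly, later Spark releases ended up making this choice configurable via a store-assignment-policy setting, which is exactly the kind of config/default split this thread is debating.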

Fwd: The result of Math.log(3.0) is different on x86_64 and aarch64?

2019-07-29 Thread Sean Owen
That is really interesting re: the recent threads about the value of log() and pow() in the JVM. I think it's worth copying to dev@ here.

---------- Forwarded message ---------
From: Tianhua huang
Date: Mon, Jul 29, 2019 at 5:28 AM
Subject: Fwd: The result of Math.log(3.0) is different on x86_64
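One likely explanation, sketched below as a minimal standalone Java program: `java.lang.Math.log` is allowed to be replaced by a platform-specific intrinsic, so its last-bit result can vary across architectures, while `java.lang.StrictMath.log` follows the fdlibm algorithms and is specified to return bit-identical results on every JVM and platform:

```java
public class LogCheck {
    public static void main(String[] args) {
        // Math.log may be JIT-compiled to a hardware-specific intrinsic,
        // so the result can differ in the last bit between x86_64 and aarch64.
        double platform = Math.log(3.0);

        // StrictMath.log uses the fdlibm reference implementation and is
        // required by its spec to be bit-for-bit reproducible everywhere.
        double strict = StrictMath.log(3.0);

        // Print the raw bit patterns so a last-ulp difference is visible.
        System.out.println("Math.log(3.0)       = " + Double.toHexString(platform));
        System.out.println("StrictMath.log(3.0) = " + Double.toHexString(strict));
        System.out.println("bit-identical: "
                + (Double.doubleToLongBits(platform) == Double.doubleToLongBits(strict)));
    }
}
```

So if cross-architecture reproducibility matters more than the last bit of speed, comparing against (or switching to) StrictMath is a reasonable way to diagnose and work around the discrepancy.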

Re: Apache Training contribution for Spark - Feedback welcome

2019-07-29 Thread Sean Owen
TL;DR is: take the below as feedback to consider, and proceed as you see fit. Nobody's suggesting you can't do this.

On Mon, Jul 29, 2019 at 2:58 AM Lars Francke wrote:
> The way I read your point is that anyone can publish material (which includes source code) under the ALv2 outside of the AS

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-29 Thread Russell Spitzer
I understand Spark is making the decisions; I'm saying the actual final effect of the null decision would be different depending on the insertion target, if the target has different behaviors for null.

On Mon, Jul 29, 2019 at 5:26 AM Wenchen Fan wrote:
> > I'm a big -1 on null values for invalid cas

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-29 Thread Wenchen Fan
> I'm a big -1 on null values for invalid casts.

This is why we want to introduce the ANSI mode, so that an invalid cast fails at runtime. But we have to keep the null behavior for a while, to keep backward compatibility. Spark has returned null for invalid casts since the first day of Spark SQL; we can't

Re: Re: How to force sorted merge join to broadcast join

2019-07-29 Thread Wenchen Fan
You can try the EXPLAIN COST query and see if it works for you.

On Mon, Jul 29, 2019 at 5:34 PM Rubén Berenguel wrote:
> I think there is no way of doing that (at least I don't remember one right now). The closest I remember now is that you can run the SQL "ANALYZE TABLE table_name COMPUTE STATISTICS" t

Re: Re: How to force sorted merge join to broadcast join

2019-07-29 Thread Rubén Berenguel
I think there is no way of doing that (at least I don't remember one right now). The closest I remember is that you can run the SQL "ANALYZE TABLE table_name COMPUTE STATISTICS" to compute them regardless of having a query (it also helps the cost-based optimiser, if I remember correctly), but as far as dis

Re: Logistic Regression Iterations causing High GC in Spark 2.3

2019-07-29 Thread Dhrubajyoti Hati
Actually I didn't have any of the GC tuning in the beginning, and adding it also didn't make any difference. As mentioned earlier, I tried a low number of executors with higher configuration and vice versa. Nothing helps. As for the code, it's simple logistic regression, nothing with explicit broadcast o

Re: Re: How to force sorted merge join to broadcast join

2019-07-29 Thread zhangliyun
Thanks! After using the syntax provided in the link, SELECT /*+ BROADCAST(A) */ ..., I got what I wanted. But I want to ask: besides using queryExecution.stringWithStats (DataFrame API) to show the table statistics, is there any way to show the table statistics with explain xxx in the Spark SQL command l
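Pulling the suggestions in this thread together, a hedged sketch of the pure-SQL route (my_table, other_table, and key are placeholder names):

```sql
-- Compute table-level statistics (note the trailing S: COMPUTE STATISTICS).
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- Optionally, column-level statistics for the cost-based optimizer.
ANALYZE TABLE my_table COMPUTE STATISTICS FOR COLUMNS key;

-- EXPLAIN COST prints the optimized plan annotated with size and row-count
-- estimates, which is the closest SQL-side equivalent of
-- queryExecution.stringWithStats in the DataFrame API.
EXPLAIN COST
SELECT /*+ BROADCAST(a) */ *
FROM my_table a JOIN other_table b ON a.key = b.key;
```

With fresh statistics in place, a small enough table should be picked for a broadcast join even without the hint, once its estimated size falls under the autoBroadcastJoinThreshold.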

Re: Apache Training contribution for Spark - Feedback welcome

2019-07-29 Thread Lars Francke
Happy to discuss this here, but you're also invited to bring those points up at dev@training, as other projects might have similar concerns. The request for assistance still stands: if anyone here is interested in helping out reviewing and improving the material, please reach out.

On Sat, Jul 27, 2

Re: [DISCUSS] New sections in Github Pull Request description template

2019-07-29 Thread Hyukjin Kwon
Thanks, guys. Let me probably mimic the template and open a PR soon - currently I am stuck on some other work. I will take a look in a few days.

On Sat, Jul 27, 2019 at 3:32 AM, Bryan Cutler wrote:
> The k8s template is pretty good. Under the behavior change section, it would be good to add instruction

Re: Logistic Regression Iterations causing High GC in Spark 2.3

2019-07-29 Thread Jörn Franke
I would remove all the GC tuning and add it back later once you have found the underlying root cause. Usually more GC means you need to provide more memory, because something has changed (your application, Spark version, etc.). We don't have your full code to give exact advice, but you may want to rethink