Re: Optimizer rule ConvertToLocalRelation causes expressions to be eager-evaluated in Planning phase

2018-06-08 Thread Li Jin
Sorry, I am confused now... My UDF gets executed for each row anyway (because I am applying it to a column and want to execute the UDF on each row). The difference is that with the "ConvertToLocalRelation" optimization it gets executed for each row on the driver, in the optimization stage? On Fri, Jun

Re: Optimizer rule ConvertToLocalRelation causes expressions to be eager-evaluated in Planning phase

2018-06-08 Thread Li Jin
I see. Thanks for the clarification. It's not a big issue, but I am surprised my UDF can be executed in the planning phase. If my UDF is doing something expensive it could get weird. On Fri, Jun 8, 2018 at 3:44 PM, Reynold Xin wrote: > But from the user's perspective, optimization is not run

Re: Optimizer rule ConvertToLocalRelation causes expressions to be eager-evaluated in Planning phase

2018-06-08 Thread Reynold Xin
But from the user's perspective, optimization is not run, right? So it is still lazy. On Fri, Jun 8, 2018 at 12:35 PM Li Jin wrote: > Hi All, > > Sorry for the long email title. I am a bit surprised to find that the > current optimizer rule "ConvertToLocalRelation" causes expressions to be >

Optimizer rule ConvertToLocalRelation causes expressions to be eager-evaluated in Planning phase

2018-06-08 Thread Li Jin
Hi All, Sorry for the long email title. I am a bit surprised to find that the current optimizer rule "ConvertToLocalRelation" causes expressions to be eager-evaluated in the planning phase. This can be demonstrated with the following code: scala> val myUDF = udf((x: String) => { println("UDF
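
For readers of the archive, the behaviour discussed in this thread can be sketched with a toy model in plain Scala. This is only an illustration, not Spark's Catalyst code: `Plan`, `LocalRows`, `Project`, `ToyPlanner`, and `myUdf` are all hypothetical names, and the `optimize` function is an assumed simplification of what a rule like "ConvertToLocalRelation" does when a projection sits directly on top of in-memory rows.

```scala
// Toy model only: these types are hypothetical and are NOT Spark's Catalyst classes.
sealed trait Plan
// Rows that already live in driver memory (loosely analogous to a local relation).
final case class LocalRows(rows: Seq[String]) extends Plan
// A per-row function applied on top of a child plan.
final case class Project(f: String => String, child: Plan) extends Plan

object ToyPlanner {
  // Counts UDF invocations so we can observe when evaluation happens.
  var udfCalls = 0

  // Stand-in for a user-defined function with an observable side effect.
  val myUdf: String => String = { x =>
    udfCalls += 1
    x.toUpperCase
  }

  // Assumed folding rule: a projection over in-memory rows is computed
  // immediately, so the function runs during optimization.
  def optimize(plan: Plan): Plan = plan match {
    case Project(f, child) =>
      optimize(child) match {
        case LocalRows(rows) => LocalRows(rows.map(f))
        case other           => Project(f, other)
      }
    case other => other
  }

  // Execution: only applies functions that were not folded away.
  def execute(plan: Plan): Seq[String] = plan match {
    case LocalRows(rows)   => rows
    case Project(f, child) => execute(child).map(f)
  }
}
```

In this sketch, building `Project(ToyPlanner.myUdf, LocalRows(Seq("a", "b")))` calls the function zero times, `optimize` calls it once per row, and `execute` on the optimized plan calls it no further. The total number of invocations is the same as without the folding rule; what changes is that they happen while the plan is being optimized rather than when it is executed, which is the point raised in the replies.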

[VOTE] [RESULT] Spark 2.3.1 (RC4)

2018-06-08 Thread Marcelo Vanzin
The vote passes. Thanks to all who helped with the release! I'll follow up later with a release announcement once everything is published. +1 (* = binding): - Marcelo Vanzin * - Reynold Xin * - Sean Owen * - Denny Lee - Dongjoon Hyun - Ricardo Almeida - Hyukjin Kwon - John Zhuge - Mark Hamstra *