RE: [Spark SQL][reopen SPARK-16951]:Alternative implementation of NOT IN to Anti-join

2020-05-12 Thread Shuang, Linna1
Thanks, Linna

From: Mich Talebzadeh
Sent: Tuesday, May 12, 2020 11:16 PM
To: Shuang, Linna1
Cc: user@spark.apache.org
Subject: Re: [Spark SQL][reopen SPARK-16951]:Alternative implementation of NOT IN to Anti-join

Hi Linna, Please provide a background to it and your solution. The assumption

[Spark SQL][reopen SPARK-16951]:Alternative implementation of NOT IN to Anti-join

2020-05-11 Thread Shuang, Linna1
Hello, This JIRA (SPARK-16951) was closed with the resolution "Won't Fix" on 23/Feb/17. But in our TPC-H tests we hit a performance issue with Q16, which uses a NOT IN subquery that is translated into a broadcast nested loop join. This query takes almost half of the total time of all 22 queries. For
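
For readers following the thread, here is a minimal sketch of why a NOT IN subquery cannot always be swapped for a plain anti-join: the two differ when the subquery returns a NULL. This is not Spark code and not the approach proposed in the JIRA. It uses Python's built-in sqlite3 module as a stand-in SQL engine, and the tables `part` and `excluded` and their columns are hypothetical names made up for the illustration.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part (p_partkey INTEGER);
CREATE TABLE excluded (e_partkey INTEGER);
INSERT INTO part VALUES (1), (2), (3);
INSERT INTO excluded VALUES (2), (NULL);
""")

# NOT IN: a NULL in the subquery result makes the predicate evaluate to
# NULL (unknown) for every non-matching row, so those rows are filtered out.
not_in = conn.execute("""
    SELECT p_partkey FROM part
    WHERE p_partkey NOT IN (SELECT e_partkey FROM excluded)
    ORDER BY p_partkey
""").fetchall()

# NOT EXISTS: the usual anti-join form, which simply ignores NULL keys
# on the subquery side.
anti_join = conn.execute("""
    SELECT p_partkey FROM part p
    WHERE NOT EXISTS (
        SELECT 1 FROM excluded e WHERE e.e_partkey = p.p_partkey
    )
    ORDER BY p_partkey
""").fetchall()

print("NOT IN    :", not_in)
print("NOT EXISTS:", anti_join)
```

With the NULL present, the NOT IN query returns no rows while the NOT EXISTS query returns keys 1 and 3. Any rewrite of NOT IN into an anti-join therefore has to either prove the subquery column is non-nullable or handle NULLs explicitly.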