[GitHub] [spark] wangyum commented on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown

2021-03-22 Thread GitBox
wangyum commented on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-803815504 This patch pushes down the data column when the number of `InSet` values exceeds `spark.sql.parquet.pushdown.inFilterThreshold`. The benchmark and its results:
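As a rough illustration of the threshold behavior described in this comment, the sketch below shows one way a large value set could still yield a pushed-down filter: when the set is larger than the threshold, it is replaced by a coarser min/max range on the same column. This is an assumption made for illustration, not something the thread confirms about the patch. The `Filter` case classes and `InSetPushdownSketch.rewrite` are hypothetical stand-ins, not Spark's actual classes or API.

```scala
// Hypothetical, simplified filter model; these are NOT Spark's classes.
sealed trait Filter
final case class In(column: String, values: Seq[Int]) extends Filter
final case class GreaterThanOrEqual(column: String, value: Int) extends Filter
final case class LessThanOrEqual(column: String, value: Int) extends Filter
final case class And(left: Filter, right: Filter) extends Filter

object InSetPushdownSketch {
  // Returns the filter to push down for `column IN values`.
  // `threshold` plays the role of spark.sql.parquet.pushdown.inFilterThreshold.
  def rewrite(column: String, values: Set[Int], threshold: Int): Option[Filter] =
    if (values.isEmpty) {
      // Nothing useful to push down for an empty set.
      None
    } else if (values.size <= threshold) {
      // Small set: push the exact IN filter (sorted for determinism).
      Some(In(column, values.toSeq.sorted))
    } else {
      // Large set (assumed behavior): push a min/max range instead.
      Some(And(
        GreaterThanOrEqual(column, values.min),
        LessThanOrEqual(column, values.max)))
    }
}
```

In this sketch the range filter matches a superset of the rows selected by the original set, so the exact predicate would still have to be evaluated on whatever rows survive the pushed-down filter.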

[GitHub] [spark] wangyum commented on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown

2020-12-11 Thread GitBox
wangyum commented on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-743135912 @cloud-fan @HyukjinKwon @gengliangwang Do you have more comments? This is an automated message from the Apache

[GitHub] [spark] wangyum commented on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown

2020-12-11 Thread GitBox
wangyum commented on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-743109098 Test on a real production case:

Before this PR | After this PR
--- | ---

[GitHub] [spark] wangyum commented on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown

2020-12-04 Thread GitBox
wangyum commented on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-738804876 retest this please.

[GitHub] [spark] wangyum commented on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown

2020-12-04 Thread GitBox
wangyum commented on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-738663385 retest this please.

[GitHub] [spark] wangyum commented on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown

2020-12-04 Thread GitBox
wangyum commented on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-738661573

```scala
package org.apache.spark.sql.execution.benchmark

import java.io.File

import scala.util.Random

import org.apache.spark.SparkConf
import
```