[jira] [Assigned] (SPARK-18766) Push Down Filter Through BatchEvalPython

2016-12-10 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-18766:
---

Assignee: Xiao Li

> Push Down Filter Through BatchEvalPython
> 
>
> Key: SPARK-18766
> URL: https://issues.apache.org/jira/browse/SPARK-18766
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.0.2
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.2.0
>
>
> Currently, when a Python UDF is used in a Filter, {{BatchEvalPython}} is always 
> generated below {{FilterExec}}. However, not all of the predicates need to be 
> evaluated after the Python UDF runs. Thus, we can push such predicates down 
> through {{BatchEvalPython}}.
> {noformat}
> >>> df = spark.createDataFrame([(1, "1"), (2, "2"), (1, "2"), (1, "2")], ["key", "value"])
> >>> from pyspark.sql.functions import udf, col
> >>> from pyspark.sql.types import BooleanType
> >>> my_filter = udf(lambda a: a < 2, BooleanType())
> >>> sel = df.select(col("key"), col("value")).filter((my_filter(col("key"))) & (df.value < "2"))
> >>> sel.explain(True)
> {noformat}
> {noformat}
> == Physical Plan ==
> *Project [key#0L, value#1]
> +- *Filter ((isnotnull(value#1) && pythonUDF0#9) && (value#1 < 2))
>    +- BatchEvalPython [(key#0L)], [key#0L, value#1, pythonUDF0#9]
>       +- Scan ExistingRDD[key#0L,value#1]
> {noformat}
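The idea behind the proposed rule can be sketched in plain Python: partition the AND-ed conjuncts of the Filter into those that reference a Python UDF (which must stay above {{BatchEvalPython}}) and those that do not (which can be pushed below it). Note that the {{Predicate}} class and its {{uses_python_udf}} flag below are hypothetical stand-ins for illustration, not Spark's actual Catalyst expression API.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Predicate:
    # Hypothetical stand-in for a Catalyst expression; `uses_python_udf`
    # marks whether the predicate references a Python UDF result.
    text: str
    uses_python_udf: bool

def split_conjuncts(conjuncts: List[Predicate]) -> Tuple[List[Predicate], List[Predicate]]:
    """Partition AND-ed predicates: the first list can be evaluated
    below BatchEvalPython (pushed down); the second must stay above it."""
    pushable = [p for p in conjuncts if not p.uses_python_udf]
    kept = [p for p in conjuncts if p.uses_python_udf]
    return pushable, kept

# The three conjuncts from the example plan above.
conjuncts = [
    Predicate("isnotnull(value#1)", False),
    Predicate("pythonUDF0#9", True),
    Predicate("value#1 < 2", False),
]
pushable, kept = split_conjuncts(conjuncts)
print([p.text for p in pushable])  # non-UDF predicates: candidates for pushdown
print([p.text for p in kept])      # UDF predicate: stays in the Filter above
```

With this split, only {{pythonUDF0#9}} remains in the Filter above {{BatchEvalPython}}, while the null check and the comparison on {{value}} can be evaluated before rows are shipped to the Python worker.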



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18766) Push Down Filter Through BatchEvalPython

2016-12-07 Thread Apache Spark (JIRA)


Apache Spark reassigned SPARK-18766:


Assignee: (was: Apache Spark)







[jira] [Assigned] (SPARK-18766) Push Down Filter Through BatchEvalPython

2016-12-07 Thread Apache Spark (JIRA)


Apache Spark reassigned SPARK-18766:


Assignee: Apache Spark



