Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22104
I mean, the current code still breaks partitioned tables: the Python UDF ends up in the FileScan's `PartitionFilters`, where it cannot be evaluated during partition pruning:
```
== Physical Plan ==
*(3) Project [_c0#223, pythonUDF0#231 AS v1#226]
+- BatchEvalPython [<lambda>(0)], [_c0#223, pythonUDF0#231]
   +- *(2) Project [_c0#223]
      +- *(2) Filter (pythonUDF0#230 = 0)
         +- BatchEvalPython [<lambda>(0)], [_c0#223, pythonUDF0#230]
            +- *(1) FileScan csv [_c0#223] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/tmp/tab3], PartitionFilters: [(<lambda>(0) = 0)], PushedFilters: [], ReadSchema: struct<_c0:string>
```
For instance:
```python
from pyspark.sql.functions import udf, lit
spark.range(1).selectExpr("id", "id as value").write.mode("overwrite").format('csv').partitionBy("id").save("/tmp/tab3")
df = spark.read.csv('/tmp/tab3')
df2 = df.withColumn('v1', udf(lambda x: x, 'int')(lit(0)))
df2 = df2.filter(df2['v1'] == 0)
df2.explain()
```
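The underlying rule can be illustrated outside Spark. This is a toy sketch, not Spark's actual `FileSourceStrategy`: a planner should push into partition pruning only the predicates it can evaluate natively at scan time, while opaque predicates (like Python UDF calls) must stay behind as a post-scan Filter. The `split_filters` helper and the string-based predicates below are hypothetical, for illustration only.

```python
# Toy sketch of safe filter pushdown (NOT Spark internals).
# Predicates are modeled as plain strings; a real planner would
# inspect expression trees instead.
def split_filters(filters, can_evaluate_natively):
    """Partition predicates into those safe to push into the scan
    and those that must remain in a post-scan Filter."""
    pushed, kept = [], []
    for f in filters:
        (pushed if can_evaluate_natively(f) else kept).append(f)
    return pushed, kept

# A Python UDF call is opaque to the scan, so it must not be pushed.
pushed, kept = split_filters(
    ["id = 0", "udf(v1) = 0"],
    lambda f: "udf" not in f,
)
# pushed == ["id = 0"]; kept == ["udf(v1) = 0"]
```

In the plan above, the bug is that the `<lambda>` predicate lands in the "pushed" bucket (`PartitionFilters`) even though the driver has no way to run a Python UDF while pruning partitions.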