Dobiasd commented on issue #27128: [SPARK-30421][SQL] Dropped columns still
available for filtering
URL: https://github.com/apache/spark/pull/27128#issuecomment-584495487
Not only was this PR closed, but the [Jira
issue](https://issues.apache.org/jira/browse/SPARK-30421) was also resolved as
"Won't Fix". Could somebody please explain to me why? Is the observed behavior
intended (i.e., it's not a bug, it's a feature), or is it just not worth the
effort to fix?
To me, the example below still looks wrong: `bar` has been dropped from `df2`, yet filtering on it succeeds.
```scala
scala> val df1 = Seq((0, "a"), (1, "b")).toDF("foo", "bar")
df1: DataFrame = [foo: int, bar: string]
scala> val df2 = df1.drop("bar")
df2: DataFrame = [foo: int]
scala> df2.printSchema
root
|-- foo: integer (nullable = false)
scala> df2.where($"bar" === "a").show
+---+
|foo|
+---+
|  0|
+---+
```
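The behavior I would have expected can be approximated on the caller side with an explicit schema check. This is only a rough sketch: `whereStrict` is a hypothetical helper name, not a Spark API, and it relies on the caller listing the column names the condition uses rather than inspecting the `Column` expression itself.

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical helper (not part of Spark): fail fast if the filter refers to
// a column that is absent from the DataFrame's current schema.
def whereStrict(df: DataFrame, condition: Column, usedColumns: String*): DataFrame = {
  // Assumption: default case-insensitive name resolution, so compare in lower case.
  val available = df.columns.map(_.toLowerCase).toSet
  val missing = usedColumns.filterNot(c => available.contains(c.toLowerCase))
  require(missing.isEmpty, s"Column(s) not in schema: ${missing.mkString(", ")}")
  df.where(condition)
}

// In spark-shell, with df2 as defined above:
// whereStrict(df2, $"bar" === "a", "bar")  // throws IllegalArgumentException
// whereStrict(df2, $"foo" === 0, "foo")    // filters as usual
```

This does not change how Spark resolves `bar`; it only turns the surprising case into an error before the filter is applied.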
pandas, for comparison, behaves as I would expect and raises a `KeyError`:
```python
>>> import pandas as pd
>>> df1 = pd.DataFrame(data={'foo': [0, 1], 'bar': ["a", "b"]})
>>> df2 = df1.drop(columns=["bar"])
>>> df2.info()
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
foo    2 non-null int64
dtypes: int64(1)
memory usage: 144.0 bytes
>>> df2[df2["bar"] == "a"]
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'bar'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py", line 2995, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'bar'
```