Dobiasd commented on issue #27128: [SPARK-30421][SQL] Dropped columns still available for filtering
URL: https://github.com/apache/spark/pull/27128#issuecomment-584495487

Not only was this PR closed, but the [Jira issue](https://issues.apache.org/jira/browse/SPARK-30421) was also resolved as "Won't Fix"? Could somebody please explain to me why? Is the observed behavior intended (i.e., it's not a bug, it's a feature), or is it just not worth the effort to fix it?

To me, the example below still looks wrong.

```scala
scala> val df1 = Seq((0, "a"), (1, "b")).toDF("foo", "bar")
df1: DataFrame = [foo: int, bar: string]

scala> val df2 = df1.drop("bar")
df2: DataFrame = [foo: int]

scala> df2.printSchema
root
 |-- foo: integer (nullable = false)

scala> df2.where($"bar" === "a").show
+---+
|foo|
+---+
|  0|
+---+
```

Pandas, for comparison, behaves correctly:

```python
>>> import pandas as pd
>>> df1 = pd.DataFrame(data={'foo': [0, 1], 'bar': ["a", "b"]})
>>> df2 = df1.drop(columns=["bar"])
>>> df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
foo    2 non-null int64
dtypes: int64(1)
memory usage: 144.0 bytes
>>> df2[df2["bar"] == "a"]
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'bar'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py", line 2995, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'bar'
```
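For anyone who wants the stricter, pandas-like behavior in the meantime, a caller-side guard might look roughly like the sketch below. It assumes a `spark-shell` session with `df2` from the Scala example above; `whereStrict` is a hypothetical helper name, not part of the Spark API, and the caller has to list the columns the condition uses explicitly.

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical helper (not a Spark API): fail fast if any column the caller
// intends to filter on is absent from the DataFrame's visible schema.
// Note: this name check is case-sensitive, while Spark's own column
// resolution is case-insensitive by default.
def whereStrict(df: DataFrame, usedColumns: Seq[String], condition: Column): DataFrame = {
  val missing = usedColumns.filterNot(c => df.columns.contains(c))
  require(missing.isEmpty, s"Columns not in schema: ${missing.mkString(", ")}")
  df.where(condition)
}

// With df2 from the example above, this throws IllegalArgumentException
// instead of silently filtering on the dropped column:
// whereStrict(df2, Seq("bar"), $"bar" === "a")
```

This only works around the issue on the caller side; it does not change how Spark itself resolves the dropped column.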
