itholic commented on code in PR #42956:
URL: https://github.com/apache/spark/pull/42956#discussion_r1328210599
##########
python/pyspark/pandas/tests/connect/test_parity_internal.py:
##########
@@ -15,18 +15,86 @@
# limitations under the License.
#
import unittest
+import pandas as pd
from pyspark.pandas.tests.test_internal import InternalFrameTestsMixin
from pyspark.testing.connectutils import ReusedConnectTestCase
from pyspark.testing.pandasutils import PandasOnSparkTestUtils
+from pyspark.pandas.internal import (
+ InternalFrame,
+ SPARK_DEFAULT_INDEX_NAME,
+ SPARK_INDEX_NAME_FORMAT,
+)
+from pyspark.pandas.utils import spark_column_equals
class InternalFrameParityTests(
InternalFrameTestsMixin, PandasOnSparkTestUtils, ReusedConnectTestCase
):
- @unittest.skip("TODO(SPARK-43654): Enable InternalFrameParityTests.test_from_pandas.")
def test_from_pandas(self):
- super().test_from_pandas()
+ pdf = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
Review Comment:
I copied this from
[test_internal.py](https://github.com/apache/spark/blob/master/python/pyspark/pandas/tests/test_internal.py#L31-L107),
excluding the assertions that use `spark_column_equals`, e.g.
`self.assertTrue(spark_column_equals(internal.spark_column_for(("a",)),
sdf["a"]))`.
The reason is that `spark_column_equals` currently works differently under Spark
Connect than in "Non-Connect" mode: under Connect we can't compare the two
Column objects directly, as shown below:
**Non-Connect**
```python
>>> sdf = spark.range(10)
>>> sdf.id._jc.equals(sdf.id._jc)
True
```
**Connect**
```python
>>> sdf = spark.range(10)
>>> sdf.id._jc.equals(sdf.id._jc)
# [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jc` is not supported in Spark
# Connect as it depends on the JVM. If you need to use this attribute, do not use
# Spark Connect when creating your session.
```
On second thought, given the review comments, maybe we should find a
proper way to compare Column objects instead of splitting the tests.
Our current way of comparing two Column objects in Spark Connect relies on
comparing the `repr` of each Column. That is a bit hacky, so we should fix it
even though it works properly in our current code base.
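To make the `repr`-based comparison concrete, here is a minimal sketch of the idea. It is not the actual `spark_column_equals` implementation: `FakeColumn` and `columns_equal_by_repr` are hypothetical names, and the stand-in class exists only so the snippet runs without a Spark session. Stripping backticks before comparing is also an assumption about how quoting might differ between two otherwise identical expressions.

```python
class FakeColumn:
    """Hypothetical stand-in for a Column; only its repr matters here."""

    def __init__(self, expr: str) -> None:
        self._expr = expr

    def __repr__(self) -> str:
        return f"Column<'{self._expr}'>"


def columns_equal_by_repr(left: object, right: object) -> bool:
    """Treat two columns as equal when their printed expressions match."""
    # Assumption: backtick quoting is the only cosmetic difference worth ignoring.
    return repr(left).replace("`", "") == repr(right).replace("`", "")


same = columns_equal_by_repr(FakeColumn("a"), FakeColumn("`a`"))
different = columns_equal_by_repr(FakeColumn("a"), FakeColumn("b"))
print(same, different)
```

The weakness is visible in the sketch: equality depends entirely on how a column prints, so two distinct expressions that happen to print identically would be reported as equal.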
@zhengruifeng May I ask for your thoughts on this? In Non-Connect mode we
check equality by directly accessing the underlying Java object; do you think
an equivalent comparison is possible in Spark Connect?