itholic commented on code in PR #46063:
URL: https://github.com/apache/spark/pull/46063#discussion_r1568157339


##########
python/pyspark/sql/tests/test_dataframe.py:
##########
@@ -1011,36 +1011,6 @@ def test_dataframe_error_context(self):
                 pyspark_fragment="eqNullSafe",
             )
 
-            # DataFrameQueryContext with pysparkLoggingInfo - and
-            with self.assertRaises(AnalysisException) as pe:
-                df.withColumn("and_invalid_type", df.id & "string").collect()
-            self.check_error(
-                exception=pe.exception,
-                error_class="DATATYPE_MISMATCH.BINARY_OP_WRONG_TYPE",
-                message_parameters={
-                    "inputType": '"BOOLEAN"',
-                    "actualDataType": '"BIGINT"',
-                    "sqlExpr": '"(id AND string)"',
-                },
-                query_context_type=QueryContextType.DataFrame,
-                pyspark_fragment="and",
-            )
-
-            # DataFrameQueryContext with pysparkLoggingInfo - or
-            with self.assertRaises(AnalysisException) as pe:
-                df.withColumn("or_invalid_type", df.id | "string").collect()
-            self.check_error(
-                exception=pe.exception,
-                error_class="DATATYPE_MISMATCH.BINARY_OP_WRONG_TYPE",
-                message_parameters={
-                    "inputType": '"BOOLEAN"',
-                    "actualDataType": '"BIGINT"',
-                    "sqlExpr": '"(id OR string)"',
-                },
-                query_context_type=QueryContextType.DataFrame,
-                pyspark_fragment="or",
-            )

Review Comment:
   In the previous PR, the error message changed because we called the JVM's "fn" function directly.
   
   **Before the previous PR**
   ```
   >>> df.withColumn("or_invalid_type", df.id | "string").collect()
   Traceback (most recent call last):
   py4j.protocol.Py4JError: An error occurred while calling o40.or. Trace:
   py4j.Py4JException: Method or([class java.lang.String]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:329)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:840)
   ```
   
   **After the previous PR**
   ```
   >>> df.withColumn("or_invalid_type", df.id | "string").collect()
   Traceback (most recent call last):
   pyspark.errors.exceptions.captured.AnalysisException: [DATATYPE_MISMATCH.BINARY_OP_DIFF_TYPES] Cannot resolve "(id OR string)" due to data type mismatch: the left and right operands of the binary operator have incompatible types ("BIGINT" and "STRING"). SQLSTATE: 42K09;
   'Project [id#0L, (id#0L OR string) AS or_invalid_type#2]
   +- Range (0, 10, step=1, splits=Some(12))
   ```
   
   We changed the approach in the current PR so that "fn" is no longer called directly, and the error message has returned to its original state.
   
   Additionally, the JVM also doesn't have a `DataFrameQueryContext` for the "and", "or" and "not" functions, which aligns with the current behavior.
   
   I guess such an exception for `and`, `or`, and `not` is caused by:
   
   
https://github.com/apache/spark/blob/6c827c10dc15e03178277a415c0e26e2d9d3a2f9/python/pyspark/sql/column.py#L402-L408
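   
   For context, a paraphrased sketch of that helper (simplified, not the exact source): a non-`Column` operand is passed to py4j as-is, and since Scala's `Column.or` has no `java.lang.String` overload (see the Py4JException above), the reflection-based method lookup fails before the analyzer ever sees the plan:
   
   ```python
   def _bin_op(name: str):
       def _(self: "Column", other) -> "Column":
           # A Python str like "string" is not wrapped into a Column here,
           # so py4j tries to resolve e.g. or(java.lang.String) on the JVM
           # side and raises Py4JException.
           jc = other._jc if isinstance(other, Column) else other
           njc = getattr(self._jc, name)(jc)
           return Column(njc)
       return _
   ```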
   


