[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41927: [SPARK-44216] [PYTHON] Make assertSchemaEqual API with ignore_nullable optional flag

via GitHub Mon, 10 Jul 2023 16:49:35 -0700


HyukjinKwon commented on code in PR #41927:
URL: https://github.com/apache/spark/pull/41927#discussion_r1259014832



##########
python/pyspark/testing/utils.py:
##########
@@ -221,7 +221,130 @@ def check_error(
         )
 
 
-def assertDataFrameEqual(df: DataFrame, expected: DataFrame, check_row_order: bool = False):
+def assertSchemaEqual(
+    df_schema: StructType, expected_schema: StructType, ignore_nullable: bool = False
+):
+    """
+    A util function to assert equality between DataFrame schemas `df_schema`
+    and `expected_schema`, with optional parameter `ignore_nullable`.
+
+    .. versionadded:: 3.5.0
+
+    Parameters
+    ----------
+    df_schema : StructType
+        The DataFrame schema that is being compared or tested.
+
+    expected_schema : StructType
+        The expected schema, for comparison with the actual schema.
+
+    ignore_nullable : bool, optional
+        A flag indicating whether the nullable flag should be ignored in schema comparison.
+        If set to `False` (default), the nullable flag is checked during schema comparison.
+        If set to `True`, the nullable flag in the schemas is not taken into account.
+
+    Examples
+    --------
+    >>> from pyspark.sql.types import StructType, StructField, ArrayType, IntegerType, DoubleType
+    >>> s1 = StructType([StructField("names", ArrayType(DoubleType(), True), True)])
+    >>> s2 = StructType([StructField("names", ArrayType(DoubleType(), True), True)])
+    >>> assertSchemaEqual(s1, s2) # pass
+    >>> s1 = StructType([StructField("names", ArrayType(IntegerType(), True), True)])
+    >>> s2 = StructType([StructField("names", ArrayType(DoubleType(), False), True)])
+    >>> assertSchemaEqual(s1, s2) # fail  # doctest: +IGNORE_EXCEPTION_DETAIL
+    Traceback (most recent call last):
+    ...
+    PySparkAssertionError: [DIFFERENT_SCHEMA] Schemas do not match:
+    [df]
+    StructField("names", ArrayType(IntegerType(), True), True)
+    <BLANKLINE>
+    [expected]
+    StructField("names", ArrayType(DoubleType(), False), True)
+    <BLANKLINE>
+    """
+
+    def compare_schemas_ignore_nullable(s1, s2):

Review Comment:
   There is existing code you can refer to for producing a clear error message for nested types, e.g., https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L1906-L2189
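For illustration only, here is a minimal sketch (not the PR's implementation, and not the code in `types.py` linked above) of how a comparison could walk nested types, report the path of each mismatch, and skip nullability when `ignore_nullable=True`. It assumes the JSON form of a schema as returned by `StructType.jsonValue()`, where structs, arrays, and maps carry `nullable`, `containsNull`, and `valueContainsNull` respectively. The names `schema_diffs` and `_kind` are hypothetical, and field metadata is deliberately not compared.

```python
from typing import Any, List


def _kind(value: Any) -> Any:
    # Primitive types serialize as bare strings ("integer", "double", ...);
    # complex types serialize as dicts with a "type" key ("struct", "array", "map").
    return value["type"] if isinstance(value, dict) else value


def schema_diffs(
    actual: Any, expected: Any, ignore_nullable: bool = False, path: str = "root"
) -> List[str]:
    """Return one message per mismatch, each prefixed with its nested path.

    Hypothetical helper: takes the JSON form of a schema, e.g. the output of
    StructType.jsonValue(). Field metadata is not compared.
    """
    diffs: List[str] = []
    if _kind(actual) != _kind(expected):
        diffs.append(f"{path}: {_kind(actual)} != {_kind(expected)}")
        return diffs
    if not isinstance(actual, dict) or not isinstance(expected, dict):
        # Equal primitive types (or a shape this sketch does not handle).
        if actual != expected:
            diffs.append(f"{path}: {actual!r} != {expected!r}")
        return diffs

    kind = actual["type"]
    if kind == "struct":
        fields_a, fields_e = actual["fields"], expected["fields"]
        if len(fields_a) != len(fields_e):
            diffs.append(f"{path}: {len(fields_a)} fields != {len(fields_e)} fields")
            return diffs
        for fa, fe in zip(fields_a, fields_e):
            field_path = f"{path}.{fe['name']}"
            if fa["name"] != fe["name"]:
                diffs.append(f"{field_path}: name {fa['name']} != {fe['name']}")
            # StructField nullability is only checked when not ignored.
            if not ignore_nullable and fa["nullable"] != fe["nullable"]:
                diffs.append(f"{field_path}: nullable {fa['nullable']} != {fe['nullable']}")
            diffs += schema_diffs(fa["type"], fe["type"], ignore_nullable, field_path)
    elif kind == "array":
        # Array element nullability is only checked when not ignored.
        if not ignore_nullable and actual["containsNull"] != expected["containsNull"]:
            diffs.append(
                f"{path}: containsNull {actual['containsNull']} != {expected['containsNull']}"
            )
        diffs += schema_diffs(
            actual["elementType"], expected["elementType"], ignore_nullable, f"{path}.element"
        )
    elif kind == "map":
        # Map value nullability is only checked when not ignored.
        if not ignore_nullable and actual["valueContainsNull"] != expected["valueContainsNull"]:
            diffs.append(
                f"{path}: valueContainsNull {actual['valueContainsNull']} "
                f"!= {expected['valueContainsNull']}"
            )
        diffs += schema_diffs(actual["keyType"], expected["keyType"], ignore_nullable, f"{path}.key")
        diffs += schema_diffs(
            actual["valueType"], expected["valueType"], ignore_nullable, f"{path}.value"
        )
    elif actual != expected:
        # Other complex types (e.g. user-defined types): plain equality.
        diffs.append(f"{path}: {actual!r} != {expected!r}")
    return diffs
```

A caller could pass `df.schema.jsonValue()` and `expected_schema.jsonValue()` and raise an assertion error when the returned list is non-empty. For the failing docstring example above, the sketch would point at `root.names` (the `containsNull` difference) and `root.names.element` (`integer` vs `double`), and with `ignore_nullable=True` only the element type difference would remain.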


