[PR] [SPARK-45554][PYTHON] Introduce flexible parameter to assertSchemaEqual [spark]

via GitHub Thu, 19 Oct 2023 01:08:39 -0700


itholic opened a new pull request, #43450:
URL: https://github.com/apache/spark/pull/43450


   ### What changes were proposed in this pull request?
   
   This PR proposes adding three new parameters to `assertSchemaEqual`: 
`ignoreNullable`, `ignoreColumnOrder`, and `ignoreColumnName`, to give users 
more flexibility in schema testing.
   
   
   ### Why are the changes needed?
   
   To enhance the utility of `assertSchemaEqual` by accommodating various 
common schema comparison scenarios that users might encounter, without 
necessitating manual adjustments or workarounds.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. `assertSchemaEqual` now accepts three new optional parameters:
   
   Parameter | Type | Comment
   -- | -- | --
   ignoreNullable | Boolean [optional] | Specifies whether a column’s nullable property is included when checking for schema equality.<br><br>When set to True (default), the nullable property of the columns being compared is not taken into account, and the columns are considered equal even if they have different nullable settings.<br><br>When set to False, columns are considered equal only if they have the same nullable setting.
   ignoreColumnOrder | Boolean [optional] | Specifies whether to compare columns in the order they appear in the DataFrames or by column name.<br><br>When set to False (default), columns are compared in the order they appear in the DataFrames.<br><br>When set to True, a column in the expected DataFrame is compared to the column with the same name in the actual DataFrame.<br><br>ignoreColumnOrder cannot be set to True if ignoreColumnName is also set to True.
   ignoreColumnName | Boolean [optional] | Specifies whether to fail the initial schema equality check if the column names in the two DataFrames are different.<br><br>When set to False (default), column names are checked and the function fails if they are different.<br><br>When set to True, the function succeeds even if column names are different. Column data types are compared for columns in the order they appear in the DataFrames.<br><br>ignoreColumnName cannot be set to True if ignoreColumnOrder is also set to True.
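
For illustration, the semantics described in the table can be sketched in plain Python. This is not the PySpark implementation: `Field`, `schemas_match`, and the snake_case keyword names are hypothetical stand-ins, column types are simplified to strings, and raising `ValueError` for the conflicting combination is only an assumption about how that restriction might be enforced.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for a schema column (not a PySpark type)."""
    name: str
    dtype: str
    nullable: bool = True


def schemas_match(
    actual: List[Field],
    expected: List[Field],
    ignore_nullable: bool = True,
    ignore_column_order: bool = False,
    ignore_column_name: bool = False,
) -> bool:
    """Illustrative comparison that mirrors the three options in the table."""
    # Assumption: the mutually exclusive combination is rejected up front.
    if ignore_column_order and ignore_column_name:
        raise ValueError(
            "ignore_column_order and ignore_column_name cannot both be True"
        )
    if len(actual) != len(expected):
        return False
    if ignore_column_order:
        # Pair columns by name rather than by position.
        actual = sorted(actual, key=lambda f: f.name)
        expected = sorted(expected, key=lambda f: f.name)
    for a, e in zip(actual, expected):
        if not ignore_column_name and a.name != e.name:
            return False
        if a.dtype != e.dtype:
            return False
        if not ignore_nullable and a.nullable != e.nullable:
            return False
    return True


actual = [Field("id", "bigint", False), Field("name", "string", True)]
reordered = [Field("name", "string", True), Field("id", "bigint", True)]
renamed = [Field("key", "bigint", False), Field("label", "string", True)]

default_result = schemas_match(actual, reordered)
by_name_result = schemas_match(actual, reordered, ignore_column_order=True)
strict_result = schemas_match(
    actual, reordered, ignore_nullable=False, ignore_column_order=True
)
renamed_result = schemas_match(actual, renamed, ignore_column_name=True)
```

With these inputs, the default comparison fails because the columns are in a different order, pairing by name succeeds while nullability is ignored, and the same pairing fails once `ignore_nullable` is turned off because `id` differs in its nullable flag.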
   
   
   
   
   ### How was this patch tested?
   
   Added usage examples for each parameter to the doctests.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   



