Rafal Wojdyla created SPARK-40363:
-------------------------------------

             Summary: Add SQL misc function to assert/check column value
                 Key: SPARK-40363
                 URL: https://issues.apache.org/jira/browse/SPARK-40363
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Rafal Wojdyla

A SQL function that asserts a condition on a column and:

 * fails when the condition is not met
 * returns the original value otherwise

Related: SPARK-32793

But {{assert_true}} and {{raise_error}} do not really cut it. With {{assert_true}} you have to actually collect the empty column, and the check might not happen if you drop the assertion column, which you will likely do since it's empty. It would be handy to have a function that returns some value as part of the check; in most cases that would be the checked column.

I'm working with PySpark, so here's a Python implementation:

{code:python}
from typing import Callable, Optional, Union, overload

from pyspark.sql import Column
from pyspark.sql import functions as F

# NOTE: str_to_col is a helper that is not shown in this report; presumably it
# converts a column name (str) into a Column and leaves a Column unchanged.


@overload
def assert_col_condition(
    col: Union[str, Column],
    cond: Callable[[Column], Column],
    error_msg: Optional[str] = None,
) -> Column:
    """Asserts condition on a column, IFF it holds returns the original value under `col`"""
    ...


@overload
def assert_col_condition(
    col: Union[str, Column], cond: Column, error_msg: Optional[str] = None
) -> Column:
    """Asserts condition on a column, IFF it holds returns the original value under `col`"""
    ...


def assert_col_condition(
    col: Union[str, Column],
    cond: Union[Column, Callable[[Column], Column]],
    error_msg: Optional[str] = None,
) -> Column:
    col = str_to_col(col)
    # A callable condition is applied to the column to build the boolean Column.
    if not isinstance(cond, Column):
        cond = cond(col)
    # Raise where the condition is false; otherwise pass the original value through.
    return F.when(
        ~cond, F.raise_error(error_msg or f"Assertion failed: {cond}")
    ).otherwise(col)
{code}
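
To illustrate why a check that returns the checked value is harder to lose than a separate assertion column, here is a minimal Spark-free sketch in plain Python. It is only a model under stated assumptions: rows are dictionaries of zero-argument callables so that a column is computed only when it is requested, and every name in it ({{LazyRow}}, {{collect}}, {{assert_value}}, and this local {{assert_true}}) is hypothetical and not part of any Spark API. Actual Spark behavior depends on the query plan and optimizer.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical model: a lazy row maps column names to zero-argument callables,
# so a column is only computed when something asks for it.
LazyRow = Dict[str, Callable[[], Any]]


def assert_true(flag: bool, error_msg: Optional[str] = None) -> None:
    """Side-effect-only check: raises on failure, otherwise yields nothing."""
    if not flag:
        raise ValueError(error_msg or "Assertion failed")
    return None


def assert_value(
    value: Any, cond: Callable[[Any], bool], error_msg: Optional[str] = None
) -> Any:
    """Pass-through check: raises on failure, otherwise returns `value` unchanged."""
    if not cond(value):
        raise ValueError(error_msg or f"Assertion failed for value: {value!r}")
    return value


def collect(rows: List[LazyRow], columns: List[str]) -> List[Dict[str, Any]]:
    """Evaluate only the requested columns; all other columns are never computed."""
    return [{name: row[name]() for name in columns} for row in rows]


# Three input rows with a single column "x"; the second value violates x > 0.
rows: List[LazyRow] = [{"x": (lambda v=v: v)} for v in [1, -2, 3]]

# Pattern 1: the check lives in a separate, empty "check" column.
side_check = [
    {**row, "check": (lambda x=row["x"]: assert_true(x() > 0))} for row in rows
]

# Pattern 2: the check wraps "x" itself and returns the original value.
checked = [
    {"x": (lambda x=row["x"]: assert_value(x(), lambda v: v > 0))} for row in rows
]

# Dropping the "check" column means the assertion never runs; -2 slips through.
print(collect(side_check, ["x"]))

# The pass-through check cannot be dropped without dropping "x" itself.
try:
    collect(checked, ["x"])
except ValueError as exc:
    print(f"caught: {exc}")
```

In this model the side assertion is silently skipped as soon as its column is no longer selected, while the pass-through variant fails on the bad row because the check is part of computing the value that is actually used.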