This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 53dae3d0440 [SPARK-43009][PYTHON][FOLLOWUP] Parameterized `sql_formatter.sql()` with Any constants
53dae3d0440 is described below

commit 53dae3d0440f5acad1fd30b17fe27ed208860960
Author: Max Gekk <max.g...@gmail.com>
AuthorDate: Mon Jun 19 09:31:50 2023 +0900

    [SPARK-43009][PYTHON][FOLLOWUP] Parameterized `sql_formatter.sql()` with Any constants

    ### What changes were proposed in this pull request?
    In this PR, I propose to change the API of parameterized SQL and replace the type of argument values from `string` to `Any` in `sql_formatter`. The language API can accept `Any` objects from which it is possible to construct literal expressions.

    ### Why are the changes needed?
    To align the API with PySpark's `sql()`. The current implementation of the parameterized `sql()` requires arguments as string values that are parsed into SQL literal expressions, which causes the following issues:
    1. SQL comments are skipped while parsing, so some fragments of the input might be dropped. For example, given `'Europe -- Amsterdam'`, the `-- Amsterdam` part is excluded from the input.
    2. Special characters in string values must be escaped, for instance `'E\'Twaun Moore'`.

    ### Does this PR introduce _any_ user-facing change?
    Yes.

    ### How was this patch tested?
    By running the affected test suite:
    ```
    $ python/run-tests --parallelism=1 --testnames 'pyspark.pandas.sql_formatter'
    ```

    Closes #41644 from MaxGekk/fix-pandas-sql_formatter.
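As an aside for readers, the two pitfalls motivating this change (SQL comment markers inside values, and manual quote escaping) can be illustrated with a small standalone sketch. Note that `to_sql_literal` below is a hypothetical helper written for illustration only, not Spark's actual literal conversion:

```python
import datetime


def to_sql_literal(value):
    """Hypothetical helper: render a Python value as a SQL literal text,
    escaping strings centrally instead of trusting caller-supplied SQL text."""
    if isinstance(value, str):
        # Double embedded single quotes; a value like "Europe -- Amsterdam"
        # stays inside the quoted literal instead of starting a SQL comment.
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, datetime.date):
        return f"DATE'{value.isoformat()}'"
    return str(value)


# With string-typed args the caller had to hand-write "'E\\'Twaun Moore'" or
# "DATE'2023-03-21'"; with Any-typed args the conversion is done uniformly.
print(to_sql_literal("E'Twaun Moore"))            # 'E''Twaun Moore'
print(to_sql_literal(datetime.date(2023, 4, 2)))  # DATE'2023-04-02'
print(to_sql_literal(7))                          # 7
```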
Authored-by: Max Gekk <max.g...@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/pandas/sql_formatter.py | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/python/pyspark/pandas/sql_formatter.py b/python/pyspark/pandas/sql_formatter.py
index f87dd3ff29f..4387a1e0909 100644
--- a/python/pyspark/pandas/sql_formatter.py
+++ b/python/pyspark/pandas/sql_formatter.py
@@ -43,7 +43,7 @@ _CAPTURE_SCOPES = 3
 def sql(
     query: str,
     index_col: Optional[Union[str, List[str]]] = None,
-    args: Dict[str, str] = {},
+    args: Optional[Dict[str, Any]] = None,
     **kwargs: Any,
 ) -> DataFrame:
     """
@@ -103,10 +103,14 @@ def sql(
         Also note that the index name(s) should be matched to the existing name.
     args : dict
-        A dictionary of parameter names to string values that are parsed as SQL literal
-        expressions. For example, dict keys: "rank", "name", "birthdate"; dict values:
-        "1", "'Steven'", "DATE'2023-03-21'". The fragments of string values belonged to SQL
-        comments are skipped while parsing.
+        A dictionary of parameter names to Python objects that can be converted to
+        SQL literal expressions. See
+        <a href="https://spark.apache.org/docs/latest/sql-ref-datatypes.html">
+        Supported Data Types</a> for supported value types in Python.
+        For example, dictionary keys: "rank", "name", "birthdate";
+        dictionary values: 1, "Steven", datetime.date(2023, 4, 2).
+        Dict value can be also a `Column` of literal expression, in that case it is taken as is.
+

         .. versionadded:: 3.4.0

@@ -166,7 +170,7 @@ def sql(
     And substitude named parameters with the `:` prefix by SQL literals.

-    >>> ps.sql("SELECT * FROM range(10) WHERE id > :bound1", args={"bound1":"7"})
+    >>> ps.sql("SELECT * FROM range(10) WHERE id > :bound1", args={"bound1":7})
        id
     0   8
     1   9

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org