zhengruifeng commented on code in PR #37233:
URL: https://github.com/apache/spark/pull/37233#discussion_r925397202
##########
python/pyspark/sql/dataframe.py:
##########
@@ -1422,6 +1422,53 @@ def colRegex(self, colName: str) -> Column:
jc = self._jdf.colRegex(colName)
return Column(jc)
+ def convert(self, schema: StructType) -> "DataFrame":
+ """
+ Returns a new DataFrame where each row is reconciled to match the specified schema.
+ Spark will:
+ 1, Reorder columns and/or inner fields by name to match the specified schema.
+ 2, Project away columns and/or inner fields that are not needed by the specified schema.
+ Missing columns and/or inner fields (present in the specified schema but not input
+ DataFrame) lead to failures.
+ 3, Cast the columns and/or inner fields to match the data types in the specified schema,
+ if the types are compatible, e.g., numeric to numeric (error if overflows), but not string
+ to int.
+ 4, Carry over the metadata from the specified schema, while the columns and/or inner fields
+ still keep their own metadata if not overwritten by the specified schema.
+ 5, Fail if the nullability are not compatible. For example, the column and/or inner field
+ is nullable but the specified schema requires them to be not nullable.
+
+ .. versionadded:: 3.4.0
+
+ Parameters
+ ----------
+ schema : StructType
+ specified schema.
+
+ Examples
+ --------
+ >>> df = spark.createDataFrame([("a", 1)], ["i", "j"])
+ >>> df.schema
+ StructType([StructField('i', StringType(), True), StructField('j', LongType(), True)])
+ >>> schema = StructType([StructField("j", StringType()), StructField("i", StringType())])
+ >>> df2 = df.convert(schema)
+ >>> df2.schema
+ StructType([StructField('j', StringType(), True), StructField('i', StringType(), True)])
+ >>> df2.show()
+ +---+---+
+ | j| i|
+ +---+---+
+ | 1| a|
+ +---+---+
+ """
+ assert schema is not None
+ sc = self.sparkSession._sc
+ assert sc is not None and sc._jvm is not None
+ jschema = sc._jvm.org.apache.spark.sql.api.python.PythonSQLUtils.parseStructTypeFromJson(
Review Comment:
```
In [8]: sc._jvm.org.apache.spark.sql.types.StructType.fromDDL
Out[8]: <py4j.java_gateway.JavaMember at 0x13656c4f0>
In [9]: sc._jvm.org.apache.spark.sql.types.StructType.fromString
---------------------------------------------------------------------------
Py4JError Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 sc._jvm.org.apache.spark.sql.types.StructType.fromString
File ~/Dev/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1547, in JavaClass.__getattr__(self, name)
1544 return get_return_value(
1545 answer, self._gateway_client, self._fqn, name)
1546 else:
-> 1547 raise Py4JError(
1548 "{0}.{1} does not exist in the JVM".format(self._fqn, name))
Py4JError: org.apache.spark.sql.types.StructType.fromString does not exist in the JVM
```
`fromDDL` resolves through py4j, but `fromString` does not. Maybe that is because `fromString` is `private[sql]`?
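As a side note on the docstring in the diff, here is a minimal pure-Python sketch of the reconciliation rules it lists (reorder, project, cast, merge metadata, check nullability). It is not the Spark implementation: `Field`, `can_cast`, `reconcile_schema`, `cast_value` and `convert_row` are hypothetical names, only flat schemas with four toy types are modeled, and the "anything may be cast to string" rule is an assumption inferred from the docstring example, where a `LongType` column becomes `StringType`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Toy type model (assumption): "int", "long", "double", "string".
INT_RANGES = {"int": (-(2**31), 2**31 - 1), "long": (-(2**63), 2**63 - 1)}
NUMERIC = {"int", "long", "double"}


@dataclass
class Field:
    name: str
    dtype: str
    nullable: bool = True
    metadata: Dict[str, Any] = field(default_factory=dict)


def can_cast(src: str, dst: str) -> bool:
    # Same type, anything to string (assumed), or numeric to numeric.
    if src == dst or dst == "string":
        return True
    return src in NUMERIC and dst in NUMERIC


def reconcile_schema(source: List[Field], target: List[Field]) -> List[Field]:
    by_name = {f.name: f for f in source}
    result = []
    for want in target:  # rules 1 and 2: follow the target order, drop extras
        have = by_name.get(want.name)
        if have is None:
            raise ValueError(f"missing column: {want.name}")
        if not can_cast(have.dtype, want.dtype):  # rule 3
            raise TypeError(f"cannot cast {have.dtype} to {want.dtype}")
        if have.nullable and not want.nullable:  # rule 5
            raise ValueError(f"column {want.name} is nullable in the input")
        merged = {**have.metadata, **want.metadata}  # rule 4: target wins
        result.append(Field(want.name, want.dtype, want.nullable, merged))
    return result


def cast_value(value: Any, dtype: str) -> Any:
    if value is None:
        return None
    if dtype == "string":
        return str(value)
    if dtype == "double":
        return float(value)
    lo, hi = INT_RANGES[dtype]
    as_int = int(value)
    if not lo <= as_int <= hi:  # "error if overflows"
        raise OverflowError(f"{value} does not fit in {dtype}")
    return as_int


def convert_row(row: Dict[str, Any], source: List[Field], target: List[Field]) -> Dict[str, Any]:
    fields = reconcile_schema(source, target)
    return {f.name: cast_value(row[f.name], f.dtype) for f in fields}


# Mirrors the docstring example: columns (i: string, j: long) -> (j: string, i: string).
src = [Field("i", "string"), Field("j", "long")]
dst = [Field("j", "string"), Field("i", "string")]
print(convert_row({"i": "a", "j": 1}, src, dst))  # {'j': '1', 'i': 'a'}
```

The schema check runs before any value is touched, so an incompatible type or nullability fails up front, while overflow can only be detected per value.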
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]