viirya commented on a change in pull request #32332:
URL: https://github.com/apache/spark/pull/32332#discussion_r678904955
##########
File path: python/pyspark/sql/session.py
##########
@@ -599,6 +609,9 @@ def createDataFrame(self, data, schema=None,
samplingRatio=None, verifySchema=Tr
the sample ratio of rows used for inferring
verifySchema : bool, optional
verify data types of every row against schema. Enabled by default.
+ Specifically, if schema is provided, schema verification will be performed at the
+ preparation phase; if schema is not provided and can be inferred (eg. from UDT),
+ schema verification will be performed at the convertion phase.
Review comment:
No `preparation` phase is exposed to users, so I think it is confusing to say
that schema verification is performed at the preparation phase.
Could we remove that wording?
E.g. "Specifically, if schema is provided, schema verification will be
performed against the provided schema; if schema is not provided and can be
inferred (e.g. from UDT), schema verification will be performed against the
inferred schema."
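To make the suggested wording concrete, here is a toy sketch in plain Python of
"verify against the provided schema, otherwise against the inferred one". It is
only an illustration: `infer_schema`, `verify_row`, and `create_rows` are
hypothetical names, schemas are modeled as plain dicts, and none of this is
PySpark's actual implementation.

```python
# Toy model only: these helpers are hypothetical and are NOT PySpark internals.

def infer_schema(row):
    """Infer a {field: type} schema from a single row given as a dict."""
    return {name: type(value) for name, value in row.items()}


def verify_row(row, schema):
    """Raise TypeError if any field in the schema does not match the row."""
    for name, expected in schema.items():
        value = row.get(name)
        if not isinstance(value, expected):
            raise TypeError(
                f"field {name}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )


def create_rows(data, schema=None, verify_schema=True):
    """Check rows against the provided schema, or else the inferred one."""
    # Assumption for this sketch: inference looks at the first row only.
    effective = schema if schema is not None else infer_schema(data[0])
    if verify_schema:
        for row in data:
            verify_row(row, effective)
    return data, effective
```

In this sketch the user never needs to know when the check runs, only which
schema the rows are checked against, which is the point of the proposed
docstring wording.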
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]