[GitHub] [spark] viirya commented on a change in pull request #32332: [SPARK-35211][PYTHON] verify inferred schema for _create_dataframe

GitBox Thu, 29 Jul 2021 00:54:56 -0700


viirya commented on a change in pull request #32332:
URL: https://github.com/apache/spark/pull/32332#discussion_r678904955




##########
File path: python/pyspark/sql/session.py
##########
@@ -599,6 +609,9 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
             the sample ratio of rows used for inferring
         verifySchema : bool, optional
             verify data types of every row against schema. Enabled by default.
+            Specifically, if schema is provided, schema verification will be performed at the
+            preparation phase; if schema is not provided and can be inferred (eg. from UDT),
+            schema verification will be performed at the convertion phase.

Review comment:
       There is no such `preparation` phase exposed to users. I think it is confusing to say that schema verification is performed at the preparation phase. Could we remove it?
   
   E.g. "Specifically, if schema is provided, schema verification will be performed against the provided schema; if schema is not provided and can be inferred (e.g. from UDT), schema verification will be performed against the inferred schema."
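   To make the suggested wording concrete, here is a minimal pure-Python sketch of the two cases. It is not PySpark's implementation: the helpers `create_rows`, `infer_schema`, and `verify_row`, and the list-of-(name, type) schema, are hypothetical stand-ins. When a schema is passed, every row is checked against it; when it is omitted, a schema is inferred first and every row is checked against that inferred schema. Inference here looks only at the first row, a deliberate simplification.

```python
from typing import Any, List, Optional, Sequence, Tuple

# Hypothetical schema representation (not PySpark's StructType):
# an ordered list of (field_name, python_type) pairs.
Schema = List[Tuple[str, type]]


def infer_schema(row: Sequence[Any], names: Sequence[str]) -> Schema:
    """Hypothetical inference: take each field's type from a single row."""
    return [(name, type(value)) for name, value in zip(names, row)]


def verify_row(row: Sequence[Any], schema: Schema) -> None:
    """Raise if the row does not match the schema; None values are allowed."""
    if len(row) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(row)}")
    for (name, expected), value in zip(schema, row):
        if value is not None and not isinstance(value, expected):
            raise TypeError(
                f"field {name}: {expected.__name__} can not accept "
                f"object {value!r} in type {type(value).__name__}"
            )


def create_rows(
    data: Sequence[Sequence[Any]],
    names: Sequence[str],
    schema: Optional[Schema] = None,
    verify_schema: bool = True,
) -> Tuple[Schema, List[Tuple[Any, ...]]]:
    """Hypothetical stand-in for the verifySchema behaviour discussed here."""
    if schema is None:
        if not data:
            raise ValueError("cannot infer a schema from empty data")
        # No schema given: infer one (simplified: from the first row only).
        schema = infer_schema(data[0], names)
    if verify_schema:
        # The same check runs whether the schema was provided or inferred.
        for row in data:
            verify_row(row, schema)
    return schema, [tuple(row) for row in data]
```

   For example, `create_rows([("a", 1), ("b", "two")], ["x", "y"])` raises `TypeError` because `"two"` does not match the `int` type inferred from the first row, while passing `verify_schema=False` returns both rows unchecked.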




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

