zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720401238
##########
File path: python/pyspark/sql/session.py
##########
@@ -445,7 +487,12 @@ def _inferSchemaFromList(self, data, names=None):
             raise ValueError("Some of types cannot be determined after inferring")
         return schema
-    def _inferSchema(self, rdd, samplingRatio=None, names=None):
+    def _inferSchema(
+        self,
+        rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
Review comment:
Just wondering about this: I have a feeling that the annotation should be
either `RDD[Any]` (type-wise, we can invoke this on an arbitrary RDD) or, if we
want to signal that it can succeed only on certain types of RDDs, a narrower
type with the `Literal*` variants omitted (we don't support schema inference
on these).
The same applies to `_inferSchemaFromList`.
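
To make the two options concrete, here is a rough sketch of what the signatures could look like. It is only an illustration: `RDD` below is a hypothetical stand-in class so the snippet runs without Spark, `RowLike` is a simplified assumption rather than the alias used in the PR, and the function names are made up.

```python
from typing import Any, Dict, Generic, List, Tuple, TypeVar, Union, get_type_hints

T = TypeVar("T")


class RDD(Generic[T]):
    """Hypothetical stand-in for pyspark.RDD, used only to make the sketch runnable."""


# Assumption: a simplified row-like alias; the real alias in the PR may differ.
RowLike = Union[List[Any], Tuple[Any, ...], Dict[str, Any]]


# Option 1: accept any RDD, because the call is valid type-wise on arbitrary input
# (inference may still fail at runtime for unsupported element types).
def infer_schema_any(rdd: "RDD[Any]") -> None:
    ...


# Option 2: signal which inputs can succeed by listing only row-like elements,
# leaving out the Literal* variants that schema inference does not support.
def infer_schema_rows(rdd: "RDD[RowLike]") -> None:
    ...


# The string annotations resolve to the intended generic aliases.
print(get_type_hints(infer_schema_any)["rdd"])
print(get_type_hints(infer_schema_rows)["rdd"])
```

Either form avoids advertising bare literals as valid input; the second is more informative to callers, at the cost of being stricter than what the function accepts at the type level.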