zero323 commented on a change in pull request #34136:
URL: https://github.com/apache/spark/pull/34136#discussion_r720405240



##########
File path: python/pyspark/sql/session.py
##########
@@ -492,28 +539,40 @@ def _inferSchema(self, rdd, samplingRatio=None, names=None):
                 prefer_timestamp_ntz=prefer_timestamp_ntz)).reduce(_merge_type)
         return schema
 
-    def _createFromRDD(self, rdd, schema, samplingRatio):
+    def _createFromRDD(
+        self,
+        rdd: "RDD[Union[DateTimeLiteral, LiteralType, DecimalLiteral, RowLike]]",
+        schema: Optional[Union[DataType, List[str]]],
+        samplingRatio: Optional[float],
+    ) -> Tuple["RDD[Tuple]", StructType]:

Review comment:
       Following the notes from the above, this could be overloaded to distinguish between cases where we can and cannot infer the schema. Might be overkill, though.
   
   Just a heads-up ‒ I've encountered some problems related to these specific `Unions` while working on SPARK-36894. This surfaces only with the `self` type (which is, ironically, not validated), and I am thinking about introducing some `TypeVars` (a more precise choice anyway) as a fix.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


