allisonwang-db commented on code in PR #43360:
URL: https://github.com/apache/spark/pull/43360#discussion_r1364340178


##########
python/pyspark/sql/readwriter.py:
##########
@@ -69,20 +83,26 @@ class DataFrameReader(OptionUtils):
     def __init__(self, spark: "SparkSession"):
         self._jreader = spark._jsparkSession.read()
         self._spark = spark
+        self._format: Optional[Union[str, Type[DataSource]]] = None
+        self._schema: Optional[Union[str, StructType]] = None
+        self._options: Dict[str, "OptionalPrimitiveType"] = dict()
 
     def _df(self, jdf: JavaObject) -> "DataFrame":
         from pyspark.sql.dataframe import DataFrame
 
         return DataFrame(jdf, self._spark)
 
-    def format(self, source: str) -> "DataFrameReader":
+    def format(self, source: Union[str, Type[DataSource]]) -> "DataFrameReader":

Review Comment:
   Ah, you mean we can't pass a Python data source object when using it from 
Scala, right? For example, this won't work: `val df = 
spark.read.format(MyPythonDataSource).load()`. That is a valid point. 
   
   @HyukjinKwon just brainstorming here: to use a short name, a data source 
must be registered somewhere. So we would first need an API to register a data 
source, something like `spark.dataSource.register(...)`, similar to UDF 
registration.
   
   But can we make it more Pythonic? What if I just want to import a Python 
data source from a package and use it directly, without registering it? 
Accepting a `DataSource` class in `format` easily allows this use case:
   
   ```python
   from src.my.datasources import MyDataSource
   
   df = spark.read.format(MyDataSource).load()
   ``` 
   
   This is similar to UDFs, which can be used in PySpark without being 
registered first.
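   
   To make the two paths concrete, here is a rough sketch of how a short-name 
lookup and a directly passed class could coexist. Everything in it is a 
hypothetical stand-in for discussion (the `DataSource` base class, the `name` 
method, the `_registry` dict, `register`, and `resolve`); none of it is the 
actual PySpark implementation or the API proposed in this PR:

```python
from typing import Dict, Type, Union


class DataSource:
    """Hypothetical stand-in for a Python data source base class."""

    @classmethod
    def name(cls) -> str:
        # Short name used for lookup; defaults to the class name.
        return cls.__name__


class MyDataSource(DataSource):
    @classmethod
    def name(cls) -> str:
        return "my_source"


# Hypothetical registry mapping short names to data source classes.
_registry: Dict[str, Type[DataSource]] = {}


def register(source: Type[DataSource]) -> None:
    """Make a data source class available under its short name."""
    _registry[source.name()] = source


def resolve(source: Union[str, Type[DataSource]]) -> Type[DataSource]:
    """Return the data source class for a short name or a class."""
    if isinstance(source, str):
        # Short names only work if the class was registered beforehand.
        if source not in _registry:
            raise ValueError(f"Data source {source!r} is not registered")
        return _registry[source]
    if isinstance(source, type) and issubclass(source, DataSource):
        # A class can be used as-is, with no registration step.
        return source
    raise TypeError("source must be a short name or a DataSource subclass")
```

   With this shape, `resolve(MyDataSource)` works immediately, while 
`resolve("my_source")` only works after `register(MyDataSource)` has been 
called. The string path is the only one that could be reached from Scala.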



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

