allisonwang-db commented on code in PR #43360:
URL: https://github.com/apache/spark/pull/43360#discussion_r1364340178
##########
python/pyspark/sql/readwriter.py:
##########
@@ -69,20 +83,26 @@ class DataFrameReader(OptionUtils):
     def __init__(self, spark: "SparkSession"):
         self._jreader = spark._jsparkSession.read()
         self._spark = spark
+        self._format: Optional[Union[str, Type[DataSource]]] = None
+        self._schema: Optional[Union[str, StructType]] = None
+        self._options: Dict[str, "OptionalPrimitiveType"] = dict()

     def _df(self, jdf: JavaObject) -> "DataFrame":
         from pyspark.sql.dataframe import DataFrame

         return DataFrame(jdf, self._spark)

-    def format(self, source: str) -> "DataFrameReader":
+    def format(self, source: Union[str, Type[DataSource]]) -> "DataFrameReader":
Review Comment:
Ah, you meant we can't use a Python data source object from Scala, right? For example, this won't work: `val df =
spark.read.format(MyPythonDataSource).load()`. That's a valid point.
@HyukjinKwon just brainstorming here: in order to use a short name, a data
source must be registered somewhere, so we first need an API to register a
data source, something like `spark.dataSource.register(...)`, similar to
UDFs.
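A rough sketch of what that registration flow could look like (hypothetical:
neither `spark.dataSource.register` nor the short name `"my_source"` exist
yet; both are assumptions for illustration):
```python
# Hypothetical API: register a Python data source class, analogous to
# spark.udf.register for UDFs, so it can be looked up by a short name.
spark.dataSource.register(MyDataSource)

# After registration, refer to the source by its short name:
df = spark.read.format("my_source").load()
```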
But can we make it more Pythonic? What if I just want to import a Python
data source from a package and use it directly, without registering it? Having
this `DataSource` object, we can easily support this use case:
```python
from src.my.datasources import MyDataSource
df = spark.read.format(MyDataSource).load()
```
This is similar to UDFs, where we don't need to register them to use them in
PySpark.
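For reference, here is a minimal example of the UDF behavior I mean, using the
existing `pyspark.sql.functions.udf` API; no `spark.udf.register(...)` call is
needed unless you want to invoke the UDF from SQL:
```python
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

# Create a UDF and use it inline, without registering it anywhere.
plus_one = udf(lambda x: x + 1, IntegerType())
spark.range(5).select(plus_one("id").alias("id_plus_one")).show()
```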