allisonwang-db commented on code in PR #50716: URL: https://github.com/apache/spark/pull/50716#discussion_r2076020575
##########
python/docs/source/user_guide/sql/python_data_source.rst:
##########
@@ -520,4 +520,6 @@ The following example demonstrates how to implement a basic Data Source using Ar
 Usage Notes
 -----------

-- During Data Source resolution, built-in and Scala/Java Data Sources take precedence over Python Data Sources with the same name; to explicitly use a Python Data Source, make sure its name does not conflict with the other Data Sources.
+- During Data Source resolution, built-in and Scala/Java Data Sources take precedence over Python Data Sources with the same name; to explicitly use a Python Data Source, make sure its name does not conflict with the other non-Python Data Sources.
+- It is allowed to register multiple Python Data Sources with the same name. Later registrations will overwrite earlier ones.
+- To automatically register a data source, export it as ``DefaultSource`` in a top level module with name prefix ``pyspark_``. See `pyspark_huggingface <https://github.com/huggingface/pyspark_huggingface>`_ for an example.

Review Comment:
```suggestion
- To automatically register a data source, export it as ``DefaultSource`` in a top level module with name prefix ``pyspark_``.
```

##########
python/docs/source/user_guide/sql/python_data_source.rst:
##########
@@ -520,4 +520,6 @@ The following example demonstrates how to implement a basic Data Source using Ar
 Usage Notes
 -----------

-- During Data Source resolution, built-in and Scala/Java Data Sources take precedence over Python Data Sources with the same name; to explicitly use a Python Data Source, make sure its name does not conflict with the other Data Sources.
+- During Data Source resolution, built-in and Scala/Java Data Sources take precedence over Python Data Sources with the same name; to explicitly use a Python Data Source, make sure its name does not conflict with the other non-Python Data Sources.
+- It is allowed to register multiple Python Data Sources with the same name. Later registrations will overwrite earlier ones.

Review Comment:
Can we also mention this includes statically registered data sources?

##########
python/docs/source/user_guide/sql/python_data_source.rst:
##########
@@ -520,4 +520,6 @@ The following example demonstrates how to implement a basic Data Source using Ar
 Usage Notes
 -----------

-- During Data Source resolution, built-in and Scala/Java Data Sources take precedence over Python Data Sources with the same name; to explicitly use a Python Data Source, make sure its name does not conflict with the other Data Sources.
+- During Data Source resolution, built-in and Scala/Java Data Sources take precedence over Python Data Sources with the same name; to explicitly use a Python Data Source, make sure its name does not conflict with the other non-Python Data Sources.
+- It is allowed to register multiple Python Data Sources with the same name. Later registrations will overwrite earlier ones.
+- To automatically register a data source, export it as ``DefaultSource`` in a top level module with name prefix ``pyspark_``. See `pyspark_huggingface <https://github.com/huggingface/pyspark_huggingface>`_ for an example.

Review Comment:
Thanks for updating the doc. Let's use a separate PR to document this feature.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
