nchammas opened a new pull request #26718: [SPARK-27990] [SPARK-29903] Add recursiveFileLookup option to Python DataFrameReader
URL: https://github.com/apache/spark/pull/26718
 
 
   ### What changes were proposed in this pull request?
   
   This PR adds the `recursiveFileLookup` option to the Python DataFrameReader API.
   
   
   ### Why are the changes needed?
   
   This PR maintains Python feature parity with Scala.
   
   ### Does this PR introduce any user-facing change?
   
   Yes.
   
   Before this PR, the option could only be set through the generic `option()` call:
   
   ```python
   spark.read.option("recursiveFileLookup", True).text("test-data").show()
   ```
   
   With this PR, you can also pass the option directly to the format-specific method:
   
   ```python
   spark.read.text("test-data", recursiveFileLookup=True).show()
   ```
   
   This option now also shows up in the Python API docs.
   
   ### How was this patch tested?
   
   I tested this manually by creating the following directories with dummy data:
   
   ```
   test-data
   ├── 1.txt
   └── nested
      └── 2.txt
   test-parquet
   ├── nested
   │  ├── _SUCCESS
   │  └── part-00000-...-.parquet
   ├── _SUCCESS
   └── part-00000-...-.parquet
   ```
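
For reference, the `test-data` layout above could be recreated with a short script along these lines. This is only a sketch: the helper name `make_text_fixture` and the file contents (`one`, `two`) are placeholders rather than anything from the PR, and the Parquet fixture is not covered.

```python
import tempfile
from pathlib import Path


def make_text_fixture(root: Path) -> Path:
    """Create test-data/1.txt and test-data/nested/2.txt under root."""
    base = root / "test-data"
    # parents=True creates test-data itself along with the nested directory.
    (base / "nested").mkdir(parents=True, exist_ok=True)
    # Placeholder contents; the PR does not say what the dummy data was.
    (base / "1.txt").write_text("one\n")
    (base / "nested" / "2.txt").write_text("two\n")
    return base


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = make_text_fixture(Path(tmp))
        # Print every text file relative to the fixture root.
        for name in sorted(p.relative_to(base).as_posix() for p in base.rglob("*.txt")):
            print(name)
```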
   
   I then ran the following tests and confirmed the output looked good:
   
   ```python
   spark.read.parquet("test-parquet", recursiveFileLookup=True).show()
   spark.read.text("test-data", recursiveFileLookup=True).show()
   spark.read.csv("test-data", recursiveFileLookup=True).show()
   ```
   
   `python/pyspark/sql/tests/test_readwriter.py` seems pretty sparse. I'm happy to add my tests there, though it seems we have been deferring testing like this to the Scala side of things.
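
If tests were added on the Python side, one could look roughly like the sketch below. It assumes a local Spark build that includes this PR. The class name, helper name, and file contents are made up for illustration, and it is not modeled on the existing tests in `test_readwriter.py`.

```python
import tempfile
import unittest
from pathlib import Path

try:
    from pyspark.sql import SparkSession
except ImportError:  # lets this module import where PySpark is not installed
    SparkSession = None


def write_nested_text(root: Path) -> Path:
    """Hypothetical helper: one text file at the top level, one in a subdirectory."""
    base = root / "test-data"
    (base / "nested").mkdir(parents=True)
    (base / "1.txt").write_text("one\n")
    (base / "nested" / "2.txt").write_text("two\n")
    return base


@unittest.skipIf(SparkSession is None, "PySpark is not installed")
class RecursiveFileLookupSketch(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.spark = SparkSession.builder.master("local[1]").getOrCreate()

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_text_reads_nested_files(self):
        with tempfile.TemporaryDirectory() as tmp:
            base = write_nested_text(Path(tmp))
            # Keyword form added by this PR.
            df = self.spark.read.text(str(base), recursiveFileLookup=True)
            # Collect while the temporary directory still exists.
            values = sorted(row.value for row in df.collect())
        self.assertEqual(values, ["one", "two"])
```

Saved as a module, this can be run with `python -m unittest`. The test is skipped when PySpark cannot be imported.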
