HyukjinKwon commented on a change in pull request #28652:
URL: https://github.com/apache/spark/pull/28652#discussion_r430981667



##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2219,6 +2219,20 @@ def semanticHash(self):
         """
         return self._jdf.semanticHash()
 
+    @since(3.1)
+    def inputFiles(self):
+        """
+        Returns a best-effort snapshot of the files that compose this :class:`DataFrame`.
+        This method simply asks each constituent BaseRelation for its respective files and
+        takes the union of all results. Depending on the source relations, this may not find
+        all input files. Duplicates are removed.
+
+        >>> df = spark.read.load("examples/src/main/resources/people.json", format="json")
+        >>> len(df.inputFiles())
+        1
+        """
+        return [f for f in self._jdf.inputFiles()]

Review comment:
       You can just `return list(self._jdf.inputFiles())`
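   For illustration, a minimal sketch (plain Python, no Spark required) of why the suggestion is a safe drop-in: a pass-through list comprehension over an iterable produces the same result as calling `list(...)` on it. The tuple below is a hypothetical stand-in for the Java-side `self._jdf.inputFiles()` result.

   ```python
   # Hypothetical stand-in for the sequence returned by self._jdf.inputFiles().
   files = ("a.json", "b.json", "c.json")

   # The PR's original form: a comprehension that just passes elements through.
   via_comprehension = [f for f in files]

   # The reviewer's suggested form: construct the list directly.
   via_list = list(files)

   assert via_comprehension == via_list == ["a.json", "b.json", "c.json"]
   ```

   Beyond brevity, `list(...)` avoids an unnecessary per-element loop in Python bytecode and states the intent (materialize as a list) directly.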




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


