[GitHub] [spark] HyukjinKwon commented on a change in pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

GitBox Thu, 20 May 2021 18:09:27 -0700


HyukjinKwon commented on a change in pull request #32546:
URL: https://github.com/apache/spark/pull/32546#discussion_r636569400




##########
File path: python/pyspark/sql/readwriter.py
##########
@@ -793,28 +793,15 @@ def orc(self, path, mergeSchema=None, pathGlobFilter=None, recursiveFileLookup=N
         Parameters
         ----------
         path : str or list
-        mergeSchema : str or bool, optional
-            sets whether we should merge schemas collected from all
-            ORC part-files. This will override ``spark.sql.orc.mergeSchema``.
-            The default value is specified in ``spark.sql.orc.mergeSchema``.
-        pathGlobFilter : str or bool
-            an optional glob pattern to only include files with paths matching
-            the pattern. The syntax follows `org.apache.hadoop.fs.GlobFilter`.
-            It does not change the behavior of
-            `partition discovery <https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#partition-discovery>`_.  # noqa
-        recursiveFileLookup : str or bool
-            recursively scan a directory for files. Using this option
-            disables
-            `partition discovery <https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#partition-discovery>`_.  # noqa
 
-            modification times occurring before the specified time. The provided timestamp
-            must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
-        modifiedBefore : an optional timestamp to only include files with
-            modification times occurring before the specified time. The provided timestamp
-            must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
-        modifiedAfter : an optional timestamp to only include files with
-            modification times occurring after the specified time. The provided timestamp
-            must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
+        Other Parameters
+        ----------------
+        Extra options
+            For the extra options, refer to
+            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option>`_  # noqa
+            and
+            `Generic File Source Options <https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html`>_  # noqa

Review comment:
       Shall we remove this too?

##########
File path: python/pyspark/sql/streaming.py
##########
@@ -637,20 +637,14 @@ def orc(self, path, mergeSchema=None, pathGlobFilter=None, recursiveFileLookup=N
 
         .. versionadded:: 2.3.0
 
-        Parameters
-        ----------
-        mergeSchema : str or bool, optional
-            sets whether we should merge schemas collected from all
-            ORC part-files. This will override ``spark.sql.orc.mergeSchema``.
-            The default value is specified in ``spark.sql.orc.mergeSchema``.
-        pathGlobFilter : str or bool, optional
-            an optional glob pattern to only include files with paths matching
-            the pattern. The syntax follows `org.apache.hadoop.fs.GlobFilter`.
-            It does not change the behavior of `partition discovery`_.
-        recursiveFileLookup : str or bool, optional
-            recursively scan a directory for files. Using this option
-            disables
-            `partition discovery <https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#partition-discovery>`_.  # noqa
+        Other Parameters
+        ----------------
+        Extra options
+            For the extra options, refer to
+            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option>`_  # noqa
+            and
+            `Generic File Source Options <https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html`>_  # noqa

Review comment:
       Shall we remove this too?
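
For readers following the diff: the option names dropped from the docstrings
(mergeSchema, pathGlobFilter, recursiveFileLookup, modifiedBefore,
modifiedAfter) and the timestamp format YYYY-MM-DDTHH:mm:ss come from the
removed lines quoted above. The sketch below is only an illustration of how a
caller might collect those options and sanity-check the timestamp format
before handing them to a reader. The helper name build_orc_read_options and
its snake_case parameters are hypothetical and are not part of PySpark, and
the strptime check is a rough approximation of the documented format, not
Spark's own validation.

```python
from datetime import datetime

# Format quoted in the removed docstring lines: YYYY-MM-DDTHH:mm:ss
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def build_orc_read_options(merge_schema=None, path_glob_filter=None,
                           recursive_file_lookup=None,
                           modified_before=None, modified_after=None):
    """Hypothetical helper (not a PySpark API): gather the ORC read options
    named in the docstring into a dict, skipping any left as None."""
    for ts in (modified_before, modified_after):
        if ts is not None:
            # Raises ValueError if the string does not parse with the format.
            datetime.strptime(ts, TIMESTAMP_FORMAT)

    candidates = {
        "mergeSchema": merge_schema,
        "pathGlobFilter": path_glob_filter,
        "recursiveFileLookup": recursive_file_lookup,
        "modifiedBefore": modified_before,
        "modifiedAfter": modified_after,
    }
    return {name: value for name, value in candidates.items()
            if value is not None}


opts = build_orc_read_options(
    merge_schema=True,
    path_glob_filter="*.orc",
    modified_after="2020-06-01T13:00:00",
)
# Assuming an active SparkSession named `spark` and a real input path,
# the dict could then be passed along as:
#     spark.read.options(**opts).orc("/path/to/orc")
```

Keeping the option names in one place like this mirrors the intent of the
pull request, which moves the option descriptions out of each docstring and
into a single documentation page.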



