HyukjinKwon commented on a change in pull request #32161:
URL: https://github.com/apache/spark/pull/32161#discussion_r635789076
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -812,46 +812,14 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   /**
    * Loads a Parquet file, returning the result as a `DataFrame`.
    *
-   * You can set the following Parquet-specific option(s) for reading Parquet files:
-   * <ul>
-   * <li>`mergeSchema` (default is the value specified in `spark.sql.parquet.mergeSchema`): sets
-   * whether we should merge schemas collected from all Parquet part-files. This will override
-   * `spark.sql.parquet.mergeSchema`.</li>
-   * <li>`pathGlobFilter`: an optional glob pattern to only include files with paths matching
-   * the pattern. The syntax follows <code>org.apache.hadoop.fs.GlobFilter</code>.
-   * It does not change the behavior of partition discovery.</li>
-   * <li>`modifiedBefore` (batch only): an optional timestamp to only include files with
-   * modification times occurring before the specified Time. The provided timestamp
-   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)</li>
-   * <li>`modifiedAfter` (batch only): an optional timestamp to only include files with
-   * modification times occurring after the specified Time. The provided timestamp
-   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)</li>
-   * <li>`recursiveFileLookup`: recursively scan a directory for files. Using this option
-   * disables partition discovery</li>
-   * <li>`datetimeRebaseMode` (default is the value specified in the SQL config
-   * `spark.sql.parquet.datetimeRebaseModeInRead`): the rebasing mode for the values
-   * of the `DATE`, `TIMESTAMP_MICROS`, `TIMESTAMP_MILLIS` logical types from the Julian to
-   * Proleptic Gregorian calendar:
-   * <ul>
-   * <li>`EXCEPTION` : Spark fails in reads of ancient dates/timestamps that are ambiguous
-   * between the two calendars</li>
-   * <li>`CORRECTED` : loading of dates/timestamps without rebasing</li>
-   * <li>`LEGACY` : perform rebasing of ancient dates/timestamps from the Julian to Proleptic
-   * Gregorian calendar</li>
-   * </ul>
-   * </li>
-   * <li>`int96RebaseMode` (default is the value specified in the SQL config
-   * `spark.sql.parquet.int96RebaseModeInRead`): the rebasing mode for `INT96` timestamps
-   * from the Julian to Proleptic Gregorian calendar:
-   * <ul>
-   * <li>`EXCEPTION` : Spark fails in reads of ancient `INT96` timestamps that are ambiguous
-   * between the two calendars</li>
-   * <li>`CORRECTED` : loading of timestamps without rebasing</li>
-   * <li>`LEGACY` : perform rebasing of ancient `INT96` timestamps from the Julian to Proleptic
-   * Gregorian calendar</li>
-   * </ul>
-   * </li>
-   * </ul>
+   * Parquet-specific option(s) for reading Parquet files can be found in

Review comment:
       The sentence doesn't make sense: it says "Parquet-specific options" but the link points to the general options.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
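For context, the options documented in the scaladoc being moved are passed to `DataFrameReader` via `.option(key, value)` before calling `.parquet(...)`. A minimal sketch of how a caller might use a few of them, assuming a local Spark session and a hypothetical `data/` directory of Parquet files (neither is part of the PR):

```scala
// Sketch only: requires Spark on the classpath; the path and option
// values are illustrative assumptions, not taken from the PR.
import org.apache.spark.sql.SparkSession

object ParquetOptionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-options-sketch")
      .master("local[*]")
      .getOrCreate()

    val df = spark.read
      // Merge schemas collected from all part-files; overrides the
      // spark.sql.parquet.mergeSchema SQL config for this read.
      .option("mergeSchema", "true")
      // Only include files whose paths match this glob
      // (org.apache.hadoop.fs.GlobFilter syntax); does not change
      // partition discovery.
      .option("pathGlobFilter", "*.parquet")
      // Batch only: keep files modified after this timestamp,
      // in YYYY-MM-DDTHH:mm:ss form.
      .option("modifiedAfter", "2020-06-01T13:00:00")
      // Fail on ancient dates/timestamps that are ambiguous between
      // the Julian and Proleptic Gregorian calendars.
      .option("datetimeRebaseMode", "EXCEPTION")
      .parquet("data/")

    df.printSchema()
    spark.stop()
  }
}
```

Note these are per-read options: `mergeSchema` and `datetimeRebaseMode` take their defaults from the corresponding SQL configs (`spark.sql.parquet.mergeSchema`, `spark.sql.parquet.datetimeRebaseModeInRead`) when not set here.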