HyukjinKwon commented on a change in pull request #32745:
URL: https://github.com/apache/spark/pull/32745#discussion_r644497726
##########
File path: docs/sql-data-sources-text.md
##########
@@ -57,7 +57,7 @@ Data source options of text can be set via:
</tr>
Review comment:
Can we change `wholetext`'s default value to `<code>false</code>`?
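
(For context, not part of the suggestion itself: `wholetext` controls whether each input file is read as a single row instead of one row per line, and it defaults to `false`. A minimal sketch, with a made-up input path:)

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("wholetext-example").getOrCreate()

// Default behaviour (wholetext=false): one row per line of the input files.
val lines = spark.read.text("/tmp/notes")            // placeholder path

// wholetext=true: one row per file, with the whole file content in the `value` column.
val files = spark.read.option("wholetext", true).text("/tmp/notes")
```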
##########
File path: docs/sql-data-sources-jdbc.md
##########
@@ -111,115 +120,144 @@ logging into the data sources.
partition stride, not for filtering the rows in table. So all rows in
the table will be
partitioned and returned. This option applies only to reading.
</td>
+ <td>read</td>
</tr>
<tr>
- <td><code>numPartitions</code></td>
- <td>
- The maximum number of partitions that can be used for parallelism in
table reading and
- writing. This also determines the maximum number of concurrent JDBC
connections.
- If the number of partitions to write exceeds this limit, we decrease it
to this limit by
- calling <code>coalesce(numPartitions)</code> before writing.
- </td>
+ <td><code>numPartitions</code></td>
+ <td>(none)</td>
+ <td>
+ The maximum number of partitions that can be used for parallelism in
table reading and
+ writing. This also determines the maximum number of concurrent JDBC
connections.
+ If the number of partitions to write exceeds this limit, we decrease it
to this limit by
+ calling <code>coalesce(numPartitions)</code> before writing.
+ </td>
+ <td>read/write</td>
</tr>
<tr>
<td><code>queryTimeout</code></td>
+ <td><code>0</code></td>
<td>
The number of seconds the driver will wait for a Statement object to
execute to the given
number of seconds. Zero means there is no limit. In the write path, this
option depends on
how JDBC drivers implement the API <code>setQueryTimeout</code>, e.g.,
the h2 JDBC driver
checks the timeout of each query instead of an entire JDBC batch.
- It defaults to <code>0</code>.
</td>
+ <td>read/write</td>
</tr>
<tr>
<td><code>fetchsize</code></td>
+ <td><code>0</code></td>
<td>
- The JDBC fetch size, which determines how many rows to fetch per round
trip. This can help performance on JDBC drivers which default to low fetch size
(e.g. Oracle with 10 rows). This option applies only to reading.
+ The JDBC fetch size, which determines how many rows to fetch per round
trip. This can help performance on JDBC drivers which default to low fetch size
(e.g. Oracle with 10 rows).
</td>
+ <td>read</td>
</tr>
<tr>
- <td><code>batchsize</code></td>
- <td>
- The JDBC batch size, which determines how many rows to insert per round
trip. This can help performance on JDBC drivers. This option applies only to
writing. It defaults to <code>1000</code>.
- </td>
+ <td><code>batchsize</code></td>
+ <td><code>1000</code></td>
+ <td>
+ The JDBC batch size, which determines how many rows to insert per round
trip. This can help performance on JDBC drivers. This option applies only to
writing.
+ </td>
+ <td>write</td>
</tr>
<tr>
- <td><code>isolationLevel</code></td>
- <td>
- The transaction isolation level, which applies to current connection.
It can be one of <code>NONE</code>, <code>READ_COMMITTED</code>,
<code>READ_UNCOMMITTED</code>, <code>REPEATABLE_READ</code>, or
<code>SERIALIZABLE</code>, corresponding to standard transaction isolation
levels defined by JDBC's Connection object, with default of
<code>READ_UNCOMMITTED</code>. This option applies only to writing. Please
refer the documentation in <code>java.sql.Connection</code>.
- </td>
+ <td><code>isolationLevel</code></td>
+ <td><code>READ_UNCOMMITTED</code></td>
+ <td>
+ The transaction isolation level, which applies to the current connection. It
can be one of <code>NONE</code>, <code>READ_COMMITTED</code>,
<code>READ_UNCOMMITTED</code>, <code>REPEATABLE_READ</code>, or
<code>SERIALIZABLE</code>, corresponding to standard transaction isolation
levels defined by JDBC's Connection object, with default of
<code>READ_UNCOMMITTED</code>. Please refer to the documentation in
<code>java.sql.Connection</code>.
+ </td>
+ <td>write</td>
</tr>
<tr>
- <td><code>sessionInitStatement</code></td>
- <td>
- After each database session is opened to the remote DB and before
starting to read data, this option executes a custom SQL statement (or a PL/SQL
block). Use this to implement session initialization code. Example:
<code>option("sessionInitStatement", """BEGIN execute immediate 'alter session
set "_serial_direct_read"=true'; END;""")</code>
- </td>
+ <td><code>sessionInitStatement</code></td>
+ <td>(none)</td>
+ <td>
+ After each database session is opened to the remote DB and before
starting to read data, this option executes a custom SQL statement (or a PL/SQL
block). Use this to implement session initialization code. Example:
<code>option("sessionInitStatement", """BEGIN execute immediate 'alter session
set "_serial_direct_read"=true'; END;""")</code>
+ </td>
+ <td>read</td>
</tr>
<tr>
<td><code>truncate</code></td>
+ <td><code>false</code></td>
<td>
- This is a JDBC writer related option. When
<code>SaveMode.Overwrite</code> is enabled, this option causes Spark to
truncate an existing table instead of dropping and recreating it. This can be
more efficient, and prevents the table metadata (e.g., indices) from being
removed. However, it will not work in some cases, such as when the new data has
a different schema. It defaults to <code>false</code>. This option applies only
to writing. In case of failures, users should turn off <code>truncate</code>
option to use <code>DROP TABLE</code> again. Also, due to the different
behavior of <code>TRUNCATE TABLE</code> among DBMS, it's not always safe to use
this. MySQLDialect, DB2Dialect, MsSqlServerDialect, DerbyDialect, and
OracleDialect supports this while PostgresDialect and default JDBCDirect
doesn't. For unknown and unsupported JDBCDirect, the user option
<code>truncate</code> is ignored.
+ This is a JDBC writer related option. When
<code>SaveMode.Overwrite</code> is enabled, this option causes Spark to
truncate an existing table instead of dropping and recreating it. This can be
more efficient, and prevents the table metadata (e.g., indices) from being
removed. However, it will not work in some cases, such as when the new data has
a different schema. In case of failures, users should turn off the
<code>truncate</code> option to use <code>DROP TABLE</code> again. Also, due to
the different behavior of <code>TRUNCATE TABLE</code> among DBMSs, it's not
always safe to use this. MySQLDialect, DB2Dialect, MsSqlServerDialect,
DerbyDialect, and OracleDialect support this, while PostgresDialect and the
default JDBCDialect don't. For unknown and unsupported JDBCDialects, the user
option <code>truncate</code> is ignored.
    </td>
+    <td>write</td>
</tr>
<tr>
<td><code>cascadeTruncate</code></td>
+ <td>the default cascading truncate behaviour of the JDBC database in
question, specified in the <code>isCascadeTruncate</code> in each
JDBCDialect</td>
<td>
- This is a JDBC writer related option. If enabled and supported by the
JDBC database (PostgreSQL and Oracle at the moment), this options allows
execution of a <code>TRUNCATE TABLE t CASCADE</code> (in the case of PostgreSQL
a <code>TRUNCATE TABLE ONLY t CASCADE</code> is executed to prevent
inadvertently truncating descendant tables). This will affect other tables, and
thus should be used with care. This option applies only to writing. It defaults
to the default cascading truncate behaviour of the JDBC database in question,
specified in the <code>isCascadeTruncate</code> in each JDBCDialect.
+ This is a JDBC writer related option. If enabled and supported by the
JDBC database (PostgreSQL and Oracle at the moment), this option allows
execution of a <code>TRUNCATE TABLE t CASCADE</code> (in the case of PostgreSQL
a <code>TRUNCATE TABLE ONLY t CASCADE</code> is executed to prevent
inadvertently truncating descendant tables). This will affect other tables, and
thus should be used with care.
</td>
+ <td>write</td>
</tr>
<tr>
<td><code>createTableOptions</code></td>
+ <td><code>""</code></td>
Review comment:
```suggestion
<td><code></code></td>
```
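
(As a quick illustration of how the read- and write-path options documented in the table above are typically passed; the connection URL, table names, and credentials below are placeholders, not taken from this PR:)

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("jdbc-options-example").getOrCreate()

// Read path: numPartitions, fetchsize and queryTimeout apply when loading.
val orders = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/sales")  // placeholder URL
  .option("dbtable", "public.orders")                     // placeholder table
  .option("user", "spark")                                // placeholder credentials
  .option("password", "secret")
  .option("numPartitions", "8")      // also caps concurrent JDBC connections
  .option("fetchsize", "1000")       // rows fetched per round trip
  .option("queryTimeout", "30")      // seconds; 0 (the default) means no limit
  .load()

// Write path: batchsize, isolationLevel and truncate apply when saving.
orders.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/sales")
  .option("dbtable", "public.orders_backup")              // placeholder table
  .option("user", "spark")
  .option("password", "secret")
  .option("batchsize", "1000")                 // rows inserted per round trip
  .option("isolationLevel", "READ_COMMITTED")  // default is READ_UNCOMMITTED
  .option("truncate", "true")   // with Overwrite: TRUNCATE instead of DROP/CREATE
  .mode(SaveMode.Overwrite)
  .save()
```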
##########
File path: docs/sql-data-sources-json.md
##########
@@ -114,62 +114,62 @@ Data source options of JSON can be set via:
<tr>
<!-- TODO(SPARK-35433): Add timeZone to Data Source Option for CSV, too.
-->
<td><code>timeZone</code></td>
- <td>None</td>
+ <td>The SQL config <code>spark.sql.session.timeZone</code></td>
Review comment:
```suggestion
<td>(value of <code>spark.sql.session.timeZone</code> configuration)</td>
```
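
(To show what the suggested wording means in practice: when the `timeZone` option is not set, the session-level `spark.sql.session.timeZone` configuration is used, and setting the option overrides it for that read only. A small sketch; the input path is a placeholder:)

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("json-timezone-example")
  .config("spark.sql.session.timeZone", "UTC")  // session default used when the option is absent
  .getOrCreate()

// No timeZone option: timestamps in the JSON are parsed/formatted with the session time zone (UTC).
val eventsUtc = spark.read.json("/tmp/events.json")        // placeholder path

// Explicit timeZone option: overrides the session setting for this read only.
val eventsSeoul = spark.read
  .option("timeZone", "Asia/Seoul")
  .json("/tmp/events.json")
```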
##########
File path: docs/sql-data-sources-json.md
##########
@@ -114,62 +114,62 @@ Data source options of JSON can be set via:
<tr>
<!-- TODO(SPARK-35433): Add timeZone to Data Source Option for CSV, too.
-->
<td><code>timeZone</code></td>
- <td>None</td>
+ <td>The SQL config <code>spark.sql.session.timeZone</code></td>
Review comment:
to match with https://spark.apache.org/docs/latest/configuration.html
##########
File path: docs/sql-data-sources-json.md
##########
@@ -114,62 +114,62 @@ Data source options of JSON can be set via:
<tr>
<!-- TODO(SPARK-35433): Add timeZone to Data Source Option for CSV, too.
-->
<td><code>timeZone</code></td>
- <td>None</td>
+ <td>The SQL config <code>spark.sql.session.timeZone</code></td>
Review comment:
can you fix other instances too? e.g. in CSV as well
https://github.com/apache/spark/blob/73fd6de9a18e8b550fd9afbcf9c87efa598fd76e/docs/sql-data-sources-csv.md
##########
File path: docs/sql-data-sources-json.md
##########
@@ -114,62 +114,62 @@ Data source options of JSON can be set via:
<tr>
<!-- TODO(SPARK-35433): Add timeZone to Data Source Option for CSV, too.
-->
<td><code>timeZone</code></td>
- <td>None</td>
+ <td>The SQL config <code>spark.sql.session.timeZone</code></td>
Review comment:
and parquet
https://github.com/apache/spark/blob/73fd6de9a18e8b550fd9afbcf9c87efa598fd76e/docs/sql-data-sources-parquet.md
##########
File path: docs/sql-data-sources-json.md
##########
@@ -114,62 +114,62 @@ Data source options of JSON can be set via:
<tr>
<!-- TODO(SPARK-35433): Add timeZone to Data Source Option for CSV, too.
-->
<td><code>timeZone</code></td>
- <td>None</td>
+ <td>The SQL config <code>spark.sql.session.timeZone</code></td>
Review comment:
and avro
https://github.com/apache/spark/blob/73fd6de9a18e8b550fd9afbcf9c87efa598fd76e/docs/sql-data-sources-avro.md
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]