[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20057

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
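For context on the merged change: the PR adds a `cascadeTruncate` JDBC option (per its title). A minimal usage sketch from the writer side — the table name, URL, and DataFrame are placeholders, not taken from the thread:

```scala
// Hypothetical example: ask Spark to truncate the target table (with CASCADE,
// on dialects that support it) instead of dropping it on overwrite.
df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://127.0.0.1/db")  // placeholder URL
  .option("dbtable", "weblogs")                     // placeholder table
  .option("truncate", "true")
  .option("cascadeTruncate", "true")                // option added by this PR
  .mode("overwrite")
  .save()
```

Whether CASCADE is actually emitted is decided per dialect, as the review comments below work out.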
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r203955129

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -102,7 +102,12 @@ object JdbcUtils extends Logging {
     val dialect = JdbcDialects.get(options.url)
     val statement = conn.createStatement
     try {
-      statement.executeUpdate(dialect.getTruncateQuery(options.table))
+      if (options.isCascadeTruncate.isDefined) {
+        statement.executeUpdate(dialect.getTruncateQuery(options.table,
+          options.isCascadeTruncate))
+      } else {
+        statement.executeUpdate(dialect.getTruncateQuery(options.table))
+      }
--- End diff --

Sorry, missed this comment. I'll make the change (@gatorsmile, I think this is the comment you meant, right?)
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r203949498

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -105,7 +105,12 @@ object JdbcUtils extends Logging {
     val statement = conn.createStatement
     try {
       statement.setQueryTimeout(options.queryTimeout)
-      statement.executeUpdate(dialect.getTruncateQuery(options.table))
+      if (options.isCascadeTruncate.isDefined) {
+        statement.executeUpdate(dialect.getTruncateQuery(options.table,
+          options.isCascadeTruncate))
+      } else {
+        statement.executeUpdate(dialect.getTruncateQuery(options.table))
+      }
--- End diff --

+1 to the above comment of @dongjoon-hyun
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r203948972

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -120,11 +121,26 @@ abstract class JdbcDialect extends Serializable {
    * The SQL query that should be used to truncate a table. Dialects can override this method to
    * return a query that is suitable for a particular database. For PostgreSQL, for instance,
    * a different query is used to prevent "TRUNCATE" affecting other tables.
-   * @param table The name of the table.
+   * @param table The table to truncate
    * @return The SQL query to use for truncating a table
    */
   @Since("2.3.0")
   def getTruncateQuery(table: String): String = {
+    getTruncateQuery(table, isCascadingTruncateTable)
+  }
+
+  /**
+   * The SQL query that should be used to truncate a table. Dialects can override this method to
+   * return a query that is suitable for a particular database. For PostgreSQL, for instance,
+   * a different query is used to prevent "TRUNCATE" affecting other tables.
+   * @param table The table to truncate
+   * @param cascade Whether or not to cascade the truncation
+   * @return The SQL query to use for truncating a table
+   */
+  @Since("2.4.0")
+  def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
--- End diff --

Nit: indent.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r172685440

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala ---
@@ -95,4 +95,21 @@ private case object OracleDialect extends JdbcDialect {
   }

   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The table to truncate
+   * @param cascade Whether or not to cascade the truncation. Default value is the
+   *                value of isCascadingTruncateTable()
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+    cascade match {
+      case Some(true) => s"TRUNCATE TABLE $table CASCADE"
+      case _ => s"TRUNCATE TABLE $table"
+    }
+  }
+
--- End diff --

Let's remove line 114.
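The dialect-level dispatch quoted above can be exercised on its own; this is a self-contained sketch of the same pattern match, outside the `JdbcDialect` hierarchy (the object name is made up for illustration):

```scala
// Standalone sketch of the cascade dispatch used in the Oracle dialect above:
// only an explicit Some(true) produces a CASCADE clause.
object TruncateQuerySketch {
  def getTruncateQuery(table: String, cascade: Option[Boolean]): String =
    cascade match {
      case Some(true) => s"TRUNCATE TABLE $table CASCADE"
      case _          => s"TRUNCATE TABLE $table"
    }
}

// TruncateQuerySketch.getTruncateQuery("weblogs", Some(true))
//   returns "TRUNCATE TABLE weblogs CASCADE"
// TruncateQuerySketch.getTruncateQuery("weblogs", None)
//   returns "TRUNCATE TABLE weblogs"
```

Matching `Some(true)` rather than `_ == true` means both `None` and `Some(false)` fall through to the plain query, which is the conservative default the reviewers ask for.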
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r172685274

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -120,14 +121,30 @@ abstract class JdbcDialect extends Serializable {
    * The SQL query that should be used to truncate a table. Dialects can override this method to
    * return a query that is suitable for a particular database. For PostgreSQL, for instance,
    * a different query is used to prevent "TRUNCATE" affecting other tables.
-   * @param table The name of the table.
+   * @param table The table to truncate
    * @return The SQL query to use for truncating a table
    */
   @Since("2.3.0")
   def getTruncateQuery(table: String): String = {
+    getTruncateQuery(table, isCascadingTruncateTable)
+  }
+
+  /**
+   * The SQL query that should be used to truncate a table. Dialects can override this method to
+   * return a query that is suitable for a particular database. For PostgreSQL, for instance,
+   * a different query is used to prevent "TRUNCATE" affecting other tables.
+   * @param table The table to truncate
+   * @param cascade Whether or not to cascade the truncation
+   * @return The SQL query to use for truncating a table
+   */
+  @Since("2.4.0")
+  def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
     s"TRUNCATE TABLE $table"
   }
+
--- End diff --

Let's remove this redundant empty line addition.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169693135

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -102,7 +102,12 @@ object JdbcUtils extends Logging {
     val dialect = JdbcDialects.get(options.url)
     val statement = conn.createStatement
     try {
-      statement.executeUpdate(dialect.getTruncateQuery(options.table))
+      if (options.isCascadeTruncate.isDefined) {
+        statement.executeUpdate(dialect.getTruncateQuery(options.table,
+          options.isCascadeTruncate))
+      } else {
+        statement.executeUpdate(dialect.getTruncateQuery(options.table))
+      }
--- End diff --

Ping.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169692801

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ---
@@ -839,35 +839,68 @@ class JDBCSuite extends SparkFunSuite
   }

   test("table exists query by jdbc dialect") {
-    val MySQL = JdbcDialects.get("jdbc:mysql://127.0.0.1/db")
-    val Postgres = JdbcDialects.get("jdbc:postgresql://127.0.0.1/db")
+    val mysql = JdbcDialects.get("jdbc:mysql://127.0.0.1/db")
+    val postgres = JdbcDialects.get("jdbc:postgresql://127.0.0.1/db")
     val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db")
     val h2 = JdbcDialects.get(url)
     val derby = JdbcDialects.get("jdbc:derby:db")
     val table = "weblogs"
     val defaultQuery = s"SELECT * FROM $table WHERE 1=0"
     val limitQuery = s"SELECT 1 FROM $table LIMIT 1"
-    assert(MySQL.getTableExistsQuery(table) == limitQuery)
-    assert(Postgres.getTableExistsQuery(table) == limitQuery)
+    assert(mysql.getTableExistsQuery(table) == limitQuery)
+    assert(postgres.getTableExistsQuery(table) == limitQuery)
     assert(db2.getTableExistsQuery(table) == defaultQuery)
     assert(h2.getTableExistsQuery(table) == defaultQuery)
     assert(derby.getTableExistsQuery(table) == defaultQuery)
   }

   test("truncate table query by jdbc dialect") {
-    val MySQL = JdbcDialects.get("jdbc:mysql://127.0.0.1/db")
-    val Postgres = JdbcDialects.get("jdbc:postgresql://127.0.0.1/db")
+    val mysql = JdbcDialects.get("jdbc:mysql://127.0.0.1/db")
+    val postgres = JdbcDialects.get("jdbc:postgresql://127.0.0.1/db")
     val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db")
     val h2 = JdbcDialects.get(url)
     val derby = JdbcDialects.get("jdbc:derby:db")
+    val oracle = JdbcDialects.get("jdbc:oracle://127.0.0.1/db")
+    val teradata = JdbcDialects.get("jdbc:teradata://127.0.0.1/db")
+
     val table = "weblogs"
     val defaultQuery = s"TRUNCATE TABLE $table"
     val postgresQuery = s"TRUNCATE TABLE ONLY $table"
-    assert(MySQL.getTruncateQuery(table) == defaultQuery)
-    assert(Postgres.getTruncateQuery(table) == postgresQuery)
+    val teradataQuery = s"DELETE FROM $table ALL"
+
+    assert(mysql.getTruncateQuery(table) == defaultQuery)
+    assert(postgres.getTruncateQuery(table) == postgresQuery)
     assert(db2.getTruncateQuery(table) == defaultQuery)
     assert(h2.getTruncateQuery(table) == defaultQuery)
     assert(derby.getTruncateQuery(table) == defaultQuery)
+    assert(oracle.getTruncateQuery(table) == defaultQuery)
+    assert(teradata.getTruncateQuery(table) == teradataQuery)
+  }
+
+  test("SPARK-22880: Truncate table with CASCADE by jdbc dialect") {
+    // cascade in a truncate should only be applied for databases that support this,
+    // even if the parameter is passed.
+    val mysql = JdbcDialects.get("jdbc:mysql://127.0.0.1/db")
+    val postgres = JdbcDialects.get("jdbc:postgresql://127.0.0.1/db")
+    val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db")
+    val h2 = JdbcDialects.get(url)
+    val derby = JdbcDialects.get("jdbc:derby:db")
+    val oracle = JdbcDialects.get("jdbc:oracle://127.0.0.1/db")
+    val teradata = JdbcDialects.get("jdbc:teradata://127.0.0.1/db")
+
+    val table = "weblogs"
+    val defaultQuery = s"TRUNCATE TABLE $table"
+    val postgresQuery = s"TRUNCATE TABLE ONLY $table CASCADE"
+    val oracleQuery = s"TRUNCATE TABLE $table CASCADE"
+    val teradataQuery = s"DELETE FROM $table ALL"
+
+    assert(mysql.getTruncateQuery(table, Some(true)) == defaultQuery)
+    assert(postgres.getTruncateQuery(table, Some(true)) == postgresQuery)
+    assert(db2.getTruncateQuery(table, Some(true)) == defaultQuery)
+    assert(h2.getTruncateQuery(table, Some(true)) == defaultQuery)
+    assert(derby.getTruncateQuery(table, Some(true)) == defaultQuery)
+    assert(oracle.getTruncateQuery(table, Some(true)) == oracleQuery)
+    assert(teradata.getTruncateQuery(table, Some(true)) == teradataQuery)
--- End diff --

Let's group the same ones like the following.

```
Seq(mysql, db2, h2, derby).foreach { dialect =>
  assert(dialect.getTruncateQuery(table, Some(true)) == defaultQuery)
}
```
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169692604

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ---
@@ -839,35 +839,68 @@ class JDBCSuite extends SparkFunSuite
   }

   test("table exists query by jdbc dialect") {
-    val MySQL = JdbcDialects.get("jdbc:mysql://127.0.0.1/db")
-    val Postgres = JdbcDialects.get("jdbc:postgresql://127.0.0.1/db")
+    val mysql = JdbcDialects.get("jdbc:mysql://127.0.0.1/db")
+    val postgres = JdbcDialects.get("jdbc:postgresql://127.0.0.1/db")
     val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db")
     val h2 = JdbcDialects.get(url)
     val derby = JdbcDialects.get("jdbc:derby:db")
     val table = "weblogs"
     val defaultQuery = s"SELECT * FROM $table WHERE 1=0"
     val limitQuery = s"SELECT 1 FROM $table LIMIT 1"
-    assert(MySQL.getTableExistsQuery(table) == limitQuery)
-    assert(Postgres.getTableExistsQuery(table) == limitQuery)
+    assert(mysql.getTableExistsQuery(table) == limitQuery)
+    assert(postgres.getTableExistsQuery(table) == limitQuery)
     assert(db2.getTableExistsQuery(table) == defaultQuery)
     assert(h2.getTableExistsQuery(table) == defaultQuery)
     assert(derby.getTableExistsQuery(table) == defaultQuery)
   }

   test("truncate table query by jdbc dialect") {
--- End diff --

For this test case, we added `oracle` and `teradata`. It's okay to update this.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169692336

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ---
@@ -805,13 +805,13 @@ class JDBCSuite extends SparkFunSuite
   }

   test("PostgresDialect type mapping") {
--- End diff --

This test case, too.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169692391

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ---
@@ -839,35 +839,68 @@ class JDBCSuite extends SparkFunSuite
   }

   test("table exists query by jdbc dialect") {
--- End diff --

This test case, too.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169692053

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ---
@@ -684,17 +684,17 @@ class JDBCSuite extends SparkFunSuite
   }

   test("quote column names by jdbc dialect") {
--- End diff --

I understand the indentation change is for consistency, but let's keep the unrelated code untouched in this PR. We can do a refactoring PR later if needed.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169689517

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -31,4 +31,19 @@ private case object TeradataDialect extends JdbcDialect {
     case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
     case _ => None
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
--- End diff --

Please add a function description stating that Teradata doesn't support cascading DELETE. You may need to explain the lack of TRUNCATE syntax, too.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169688539

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -31,4 +31,19 @@ private case object TeradataDialect extends JdbcDialect {
     case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
     case _ => None
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
--- End diff --

Please add more description about Teradata's lack of `TRUNCATE` syntax here. Otherwise, people may mistakenly try to make a PR for this.
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169583476

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -31,4 +31,19 @@ private case object TeradataDialect extends JdbcDialect {
     case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
     case _ => None
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The table to truncate.
+   * @param cascade Whether or not to cascade the truncation. Default value is the
+   *                value of isCascadingTruncateTable(). Ignored for Teradata as it is unsupported
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+    s"TRUNCATE TABLE $table"
--- End diff --

Thanks @klinvill for helping out :-) @dongjoon-hyun I've made the changes to `TeradataDialect` and updated the tests accordingly.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169530627

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -31,4 +31,19 @@ private case object TeradataDialect extends JdbcDialect {
     case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
     case _ => None
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The table to truncate.
+   * @param cascade Whether or not to cascade the truncation. Default value is the
+   *                value of isCascadingTruncateTable(). Ignored for Teradata as it is unsupported
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+    s"TRUNCATE TABLE $table"
--- End diff --

Thank you so much for confirming, @klinvill! @danielvdende, could you update `TeradataDialect` according to @klinvill's advice?
Github user klinvill commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169526767

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -31,4 +31,19 @@ private case object TeradataDialect extends JdbcDialect {
     case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
     case _ => None
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The table to truncate.
+   * @param cascade Whether or not to cascade the truncation. Default value is the
+   *                value of isCascadingTruncateTable(). Ignored for Teradata as it is unsupported
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+    s"TRUNCATE TABLE $table"
--- End diff --

Hi @dongjoon-hyun, I was the original author of the TeradataDialect and @gatorsmile reviewed and committed it. You are correct, Teradata does not support the TRUNCATE statement. Instead, Teradata uses a DELETE statement, so you should be able to use `DELETE FROM $table ALL` instead of `TRUNCATE TABLE $table`.
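Following that advice, the Teradata override would emit a DELETE rather than a TRUNCATE. A simplified sketch of the shape the method takes (standalone, without the real `JdbcDialect` base class; the object name is made up), consistent with the `teradataQuery` expected by the test suite elsewhere in this thread:

```scala
// Sketch: Teradata has no TRUNCATE statement, so emit DELETE ... ALL.
// The cascade flag is kept for signature compatibility but deliberately ignored.
object TeradataTruncateSketch {
  def getTruncateQuery(table: String, cascade: Option[Boolean] = Some(false)): String =
    s"DELETE FROM $table ALL"
}

// TeradataTruncateSketch.getTruncateQuery("weblogs")
//   returns "DELETE FROM weblogs ALL"
```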
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169443807

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -31,4 +31,19 @@ private case object TeradataDialect extends JdbcDialect {
     case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
     case _ => None
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The table to truncate.
+   * @param cascade Whether or not to cascade the truncation. Default value is the
+   *                value of isCascadingTruncateTable(). Ignored for Teradata as it is unsupported
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+    s"TRUNCATE TABLE $table"
--- End diff --

Gentle ping, @klinvill.
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r169025602

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
@@ -85,15 +85,24 @@ private object PostgresDialect extends JdbcDialect {
     s"SELECT 1 FROM $table LIMIT 1"
   }

+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
   /**
-   * The SQL query used to truncate a table. For Postgres, the default behaviour is to
-   * also truncate any descendant tables. As this is a (possibly unwanted) side-effect,
-   * the Postgres dialect adds 'ONLY' to truncate only the table in question
-   * @param table The name of the table.
-   * @return The SQL query to use for truncating a table
-   */
-  override def getTruncateQuery(table: String): String = {
-    s"TRUNCATE TABLE ONLY $table"
+   * The SQL query used to truncate a table. For Postgres, the default behaviour is to
+   * also truncate any descendant tables. As this is a (possibly unwanted) side-effect,
+   * the Postgres dialect adds 'ONLY' to truncate only the table in question
+   * @param table The table to truncate
+   * @param cascade Whether or not to cascade the truncation. Default value is the value of
+   *                isCascadingTruncateTable()
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+    cascade match {
+      case Some(true) => s"TRUNCATE TABLE ONLY $table CASCADE"
--- End diff --

Done :-)
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168971182

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala ---
@@ -49,4 +49,17 @@ private object DB2Dialect extends JdbcDialect {
   }

   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The table to truncate.
+   * @param cascade Whether or not to cascade the truncation. Default value is the
+   *                value of isCascadingTruncateTable(). Ignored for DB2 as it is unsupported.
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+    s"TRUNCATE TABLE $table"
+  }
--- End diff --

Since this is the same code as in JdbcDialect, it seems we don't need to add it here. This applies not only to DB2Dialect, but also to DerbyDialect/MsSQLDialect/MySQLDialect.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168971074

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -31,4 +31,19 @@ private case object TeradataDialect extends JdbcDialect {
     case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
     case _ => None
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The table to truncate.
+   * @param cascade Whether or not to cascade the truncation. Default value is the
+   *                value of isCascadingTruncateTable(). Ignored for Teradata as it is unsupported
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+    s"TRUNCATE TABLE $table"
--- End diff --

Currently, `TeradataDialect` seems to generate an unsupported syntax because it inherits it from `JdbcDialects`. Before doing this PR, we had better check this. Ping @klinvill and @gatorsmile, the original author and committer of this code. Could you give us some comments on this?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168970903

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -31,4 +31,19 @@ private case object TeradataDialect extends JdbcDialect {
     case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
     case _ => None
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The table to truncate.
+   * @param cascade Whether or not to cascade the truncation. Default value is the
+   *                value of isCascadingTruncateTable(). Ignored for Teradata as it is unsupported
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+    s"TRUNCATE TABLE $table"
--- End diff --

At the beginning, I saw that `teradata` was missing from your test case, so I asked you to add it. But today I double-checked it in order to review this line. Unfortunately, `TRUNCATE` syntax does not appear in the Teradata documentation, and it's not supported according to some old community articles.
- Teradata SQL Reference: https://info.teradata.com/HTMLPubs/DB_TTU_16_00/index.html#page/SQL_Reference%2FB035-1146-160K%2Fpds1472240516459.html%23
- Community (DELETE/TRUNCATE COMMAND): https://community.teradata.com/t5/Database/Delete-truncate-command/td-p/434
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168969792

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -102,7 +102,12 @@ object JdbcUtils extends Logging {
     val dialect = JdbcDialects.get(options.url)
     val statement = conn.createStatement
     try {
-      statement.executeUpdate(dialect.getTruncateQuery(options.table))
+      if (options.isCascadeTruncate.isDefined) {
+        statement.executeUpdate(dialect.getTruncateQuery(options.table,
+          options.isCascadeTruncate))
+      } else {
+        statement.executeUpdate(dialect.getTruncateQuery(options.table))
+      }
--- End diff --

What about the following?

```scala
val truncateQuery = if (options.isCascadeTruncate.isDefined) {
  dialect.getTruncateQuery(options.table, options.isCascadeTruncate)
} else {
  dialect.getTruncateQuery(options.table)
}
statement.executeUpdate(truncateQuery)
```
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168969101

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
@@ -85,15 +85,24 @@ private object PostgresDialect extends JdbcDialect {
     s"SELECT 1 FROM $table LIMIT 1"
   }

+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
   /**
-   * The SQL query used to truncate a table. For Postgres, the default behaviour is to
-   * also truncate any descendant tables. As this is a (possibly unwanted) side-effect,
-   * the Postgres dialect adds 'ONLY' to truncate only the table in question
-   * @param table The name of the table.
-   * @return The SQL query to use for truncating a table
-   */
-  override def getTruncateQuery(table: String): String = {
-    s"TRUNCATE TABLE ONLY $table"
+   * The SQL query used to truncate a table. For Postgres, the default behaviour is to
+   * also truncate any descendant tables. As this is a (possibly unwanted) side-effect,
+   * the Postgres dialect adds 'ONLY' to truncate only the table in question
+   * @param table The table to truncate
+   * @param cascade Whether or not to cascade the truncation. Default value is the value of
+   *                isCascadingTruncateTable()
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+      table: String,
+      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+    cascade match {
+      case Some(true) => s"TRUNCATE TABLE ONLY $table CASCADE"
--- End diff --

Yep. Thank you for testing. Could you add more description about this to the parameter description of `cascade`?

```
 * @param cascade Whether or not to cascade the truncation. Default value is the value of
 *                isCascadingTruncateTable()
```
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168943159 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala --- @@ -85,15 +85,24 @@ private object PostgresDialect extends JdbcDialect { s"SELECT 1 FROM $table LIMIT 1" } + override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + /** - * The SQL query used to truncate a table. For Postgres, the default behaviour is to - * also truncate any descendant tables. As this is a (possibly unwanted) side-effect, - * the Postgres dialect adds 'ONLY' to truncate only the table in question - * @param table The name of the table. - * @return The SQL query to use for truncating a table - */ - override def getTruncateQuery(table: String): String = { -s"TRUNCATE TABLE ONLY $table" + * The SQL query used to truncate a table. For Postgres, the default behaviour is to + * also truncate any descendant tables. As this is a (possibly unwanted) side-effect, + * the Postgres dialect adds 'ONLY' to truncate only the table in question + * @param table The table to truncate + * @param cascade Whether or not to cascade the truncation. Default value is the value of + *isCascadingTruncateTable() + * @return The SQL query to use for truncating a table + */ + override def getTruncateQuery( + table: String, + cascade: Option[Boolean] = isCascadingTruncateTable): String = { +cascade match { + case Some(true) => s"TRUNCATE TABLE ONLY $table CASCADE" --- End diff -- Sure, I made a quick example, as you can see using `TRUNCATE TABLE ONLY $table CASCADE` will truncate the foreign key-ed table, but leave the inheritance relationship intact:

```
postgres=# CREATE SCHEMA test;
CREATE SCHEMA
postgres=# CREATE TABLE parent(a INT);
CREATE TABLE
postgres=# ALTER TABLE parent ADD CONSTRAINT some_constraint PRIMARY KEY(a);
ALTER TABLE
postgres=# CREATE TABLE child(b INT) INHERITS (parent);
CREATE TABLE
postgres=# CREATE TABLE forkey(c INT);
CREATE TABLE
postgres=# ALTER TABLE forkey ADD FOREIGN KEY(c) REFERENCES parent(a);
ALTER TABLE
postgres=# INSERT INTO parent VALUES(1);
INSERT 0 1
postgres=# select * from parent;
 a
---
 1
(1 row)

postgres=# select * from child;
 a | b
---+---
(0 rows)

postgres=# INSERT INTO child VALUES(2);
INSERT 0 1
postgres=# select * from child;
 a | b
---+---
 2 |
(1 row)

postgres=# select * from parent;
 a
---
 1
 2
(2 rows)

postgres=# INSERT INTO forkey VALUES(1);
INSERT 0 1
postgres=# select * from forkey;
 c
---
 1
(1 row)

postgres=# TRUNCATE TABLE ONLY parent CASCADE;
NOTICE:  truncate cascades to table "forkey"
TRUNCATE TABLE
postgres=# select * from parent;
 a
---
 2
(1 row)

postgres=# select * from child;
 a | b
---+---
 2 |
(1 row)

postgres=# select * from forkey;
 c
---
(0 rows)
```
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
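For reference, the dialect logic under discussion boils down to the following match. This is a sketch using the names from the diff above, not the final merged code:

```scala
// Sketch of the Postgres truncate-query logic from the diff (assumed names).
object PostgresTruncateSketch {
  def isCascadingTruncateTable(): Option[Boolean] = Some(false)

  def getTruncateQuery(
      table: String,
      cascade: Option[Boolean] = isCascadingTruncateTable()): String = {
    cascade match {
      // 'ONLY' keeps inheritance children out of the truncation; 'CASCADE' still
      // empties tables that reference `table` via foreign keys, as the psql
      // session demonstrates.
      case Some(true) => s"TRUNCATE TABLE ONLY $table CASCADE"
      case _ => s"TRUNCATE TABLE ONLY $table"
    }
  }
}

println(PostgresTruncateSketch.getTruncateQuery("parent", Some(true)))
// TRUNCATE TABLE ONLY parent CASCADE
```

So with this shape, cascading is opt-in per call while the dialect default stays non-cascading.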
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168930195 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala --- @@ -85,15 +85,24 @@ private object PostgresDialect extends JdbcDialect { s"SELECT 1 FROM $table LIMIT 1" } + override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + /** - * The SQL query used to truncate a table. For Postgres, the default behaviour is to - * also truncate any descendant tables. As this is a (possibly unwanted) side-effect, - * the Postgres dialect adds 'ONLY' to truncate only the table in question - * @param table The name of the table. - * @return The SQL query to use for truncating a table - */ - override def getTruncateQuery(table: String): String = { -s"TRUNCATE TABLE ONLY $table" + * The SQL query used to truncate a table. For Postgres, the default behaviour is to + * also truncate any descendant tables. As this is a (possibly unwanted) side-effect, + * the Postgres dialect adds 'ONLY' to truncate only the table in question + * @param table The table to truncate + * @param cascade Whether or not to cascade the truncation. Default value is the value of + *isCascadingTruncateTable() + * @return The SQL query to use for truncating a table + */ + override def getTruncateQuery( + table: String, + cascade: Option[Boolean] = isCascadingTruncateTable): String = { +cascade match { + case Some(true) => s"TRUNCATE TABLE ONLY $table CASCADE" --- End diff -- I believe you already tested this in PostgreSQL. To be sure about using `ONLY` and `CASCADE` at the same time, could you share with us the result of a manual test of this with PostgreSQL tables, descendant tables, and foreign-key referenced tables here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168929874 --- Diff: docs/sql-programming-guide.md --- @@ -1372,6 +1372,13 @@ the following case-insensitive options: This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false. This option applies only to writing. + + +cascadeTruncate + +This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this options allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with care. This option applies only to writing. It defaults to the default cascading truncate behaviour of the JDBC database in question, specified in the isCascadeTruncate in each JDBCDialect. --- End diff -- Yeah, so the "problem" there is that PostgreSQL has inheritance across tables (as we discovered in [SPARK-22729](https://github.com/apache/spark/pull/19911). To be completely transparent to users, I think the `ONLY` part of the query for PostgreSQL should also be configurable, but I think that's outside the scope of this PR. I've added a comment saying that for PostgreSQL a `ONLY` is added to prevent descendant tables from being truncated. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168929798 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala --- @@ -94,5 +94,21 @@ private case object OracleDialect extends JdbcDialect { case _ => value } + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. + * @param cascade Whether or not to cascade the truncation. Default value is the + *value of isCascadingTruncateTable() + * @return The SQL query to use for truncating a table + */ + override def getTruncateQuery( --- End diff -- Done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168929792 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala --- @@ -42,4 +42,17 @@ private object MsSqlServerDialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. + * @param cascade Whether or not to cascade the truncation. Default value is the + *value of isCascadingTruncateTable(). Ignored for MsSql as it is unsupported. --- End diff -- Hmm, I disagree with this, the reason I added it was to show users that even if they specified `cascadeTruncate` being `true`, it doesn't take effect for some databases (like MsSQL). The `isCascadingTruncateTable` only shows the default behaviour, which for all databases at the moment is false. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168928044 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -120,11 +121,13 @@ abstract class JdbcDialect extends Serializable { * The SQL query that should be used to truncate a table. Dialects can override this method to * return a query that is suitable for a particular database. For PostgreSQL, for instance, * a different query is used to prevent "TRUNCATE" affecting other tables. - * @param table The name of the table. + * @param table The table to truncate + * @param cascade Whether or not to cascade the truncation * @return The SQL query to use for truncating a table */ @Since("2.3.0") - def getTruncateQuery(table: String): String = { + def getTruncateQuery(table: String, + cascade: Option[Boolean] = isCascadingTruncateTable): String = { --- End diff -- For this one, we need to keep the original one for backward compatibility like the following? ``` @Since("2.3.0") def getTruncateQuery(table: String): String = { getTruncateQuery(table, isCascadingTruncateTable) } @Since("2.4.0") def getTruncateQuery(table: String, cascade: Option[Boolean] = isCascadingTruncateTable) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
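The backward-compatible shape suggested here could look roughly like the following. This is a sketch with simplified class contents (`@Since` annotations and the other dialect methods omitted), not the merged Spark code:

```scala
// Sketch of the suggested backward-compatible API (simplified; not the merged code).
abstract class JdbcDialect extends Serializable {
  def isCascadingTruncateTable(): Option[Boolean] = None

  // Original 2.3.0 signature, kept so existing callers and dialects still compile.
  def getTruncateQuery(table: String): String =
    getTruncateQuery(table, isCascadingTruncateTable())

  // New 2.4.0 overload; the default cascade value is the dialect's own behaviour.
  def getTruncateQuery(table: String, cascade: Option[Boolean]): String =
    s"TRUNCATE TABLE $table"
}

object DemoDialect extends JdbcDialect

// The one-argument call resolves to the old method, which delegates to the new one.
println(DemoDialect.getTruncateQuery("weblogs")) // TRUNCATE TABLE weblogs
```

Keeping the one-argument method as a thin delegate means source and binary compatibility are preserved while dialects only need to override the two-argument variant.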
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927868 --- Diff: docs/sql-programming-guide.md --- @@ -1372,6 +1372,13 @@ the following case-insensitive options: This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false. This option applies only to writing. + + +cascadeTruncate + +This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this options allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with care. This option applies only to writing. It defaults to the default cascading truncate behaviour of the JDBC database in question, specified in the isCascadeTruncate in each JDBCDialect. --- End diff -- Ur, for PostgreSQL, we are generating `TRUNCATE TABLE ONLY $table CASCADE`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927785 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala --- @@ -64,7 +64,16 @@ private class AggregatedDialect(dialects: List[JdbcDialect]) extends JdbcDialect } } - override def getTruncateQuery(table: String): String = { -dialects.head.getTruncateQuery(table) + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. --- End diff -- ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927699 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala --- @@ -94,5 +94,21 @@ private case object OracleDialect extends JdbcDialect { case _ => value } + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. + * @param cascade Whether or not to cascade the truncation. Default value is the + *value of isCascadingTruncateTable() + * @return The SQL query to use for truncating a table + */ + override def getTruncateQuery( --- End diff -- Could you move this function after `isCascadingTruncateTable` like the other files? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927651 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala --- @@ -94,5 +94,21 @@ private case object OracleDialect extends JdbcDialect { case _ => value } + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927631 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala --- @@ -42,4 +42,17 @@ private object MsSqlServerDialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. + * @param cascade Whether or not to cascade the truncation. Default value is the + *value of isCascadingTruncateTable(). Ignored for MsSql as it is unsupported. --- End diff -- And, for all dialects, please remove it or move it consistently. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927633 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala --- @@ -46,4 +46,17 @@ private case object MySQLDialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927608 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala --- @@ -42,4 +42,17 @@ private object MsSqlServerDialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. + * @param cascade Whether or not to cascade the truncation. Default value is the + *value of isCascadingTruncateTable(). Ignored for MsSql as it is unsupported. --- End diff -- I think we can ignore `Ignored for MsSql as it is unsupported.` here. If you want, you can make a function comment of `override def isCascadingTruncateTable(): Option[Boolean] = Some(false)`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927597 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala --- @@ -42,4 +42,17 @@ private object MsSqlServerDialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. --- End diff -- `The JDBCOptions`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927546 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala --- @@ -49,4 +49,17 @@ private object DB2Dialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. --- End diff -- `The JDBCOptions`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927495 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -120,11 +121,13 @@ abstract class JdbcDialect extends Serializable { * The SQL query that should be used to truncate a table. Dialects can override this method to * return a query that is suitable for a particular database. For PostgreSQL, for instance, * a different query is used to prevent "TRUNCATE" affecting other tables. - * @param table The name of the table. + * @param table The table to truncate + * @param cascade Whether or not to cascade the truncation * @return The SQL query to use for truncating a table */ @Since("2.3.0") - def getTruncateQuery(table: String): String = { + def getTruncateQuery(table: String, + cascade: Option[Boolean] = isCascadingTruncateTable): String = { --- End diff -- @danielvdende . Could you take a look the guide? This guide is helpful in Apache Spark community. - https://github.com/databricks/scala-style-guide --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927408 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala --- @@ -41,4 +41,16 @@ private object DerbyDialect extends JdbcDialect { Option(JdbcType("DECIMAL(31,5)", java.sql.Types.DECIMAL)) case _ => None } + + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. + * @param cascade Whether or not to cascade the truncation. Default value is the + *value of isCascadingTruncateTable(). Ignored for Derby as it is unsupported. + * @return The SQL query to use for truncating a table + */ + override def getTruncateQuery(table: String, +cascade: Option[Boolean] = isCascadingTruncateTable): String = { --- End diff -- indentation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927400 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -102,7 +102,12 @@ object JdbcUtils extends Logging { val dialect = JdbcDialects.get(options.url) val statement = conn.createStatement try { - statement.executeUpdate(dialect.getTruncateQuery(options.table)) + if (options.isCascadeTruncate.isDefined) { +statement.executeUpdate(dialect.getTruncateQuery(options.table, + options.isCascadeTruncate)) + } else { +statement.executeUpdate(dialect.getTruncateQuery(options.table)) + } --- End diff -- I see. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168927406 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala --- @@ -41,4 +41,16 @@ private object DerbyDialect extends JdbcDialect { Option(JdbcType("DECIMAL(31,5)", java.sql.Types.DECIMAL)) case _ => None } + + /** + * The SQL query used to truncate a table. --- End diff -- Indentation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168922010 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -860,14 +860,41 @@ class JDBCSuite extends SparkFunSuite val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db") val h2 = JdbcDialects.get(url) val derby = JdbcDialects.get("jdbc:derby:db") +val oracle = JdbcDialects.get("jdbc:oracle://127.0.0.1/db") + val table = "weblogs" val defaultQuery = s"TRUNCATE TABLE $table" val postgresQuery = s"TRUNCATE TABLE ONLY $table" + assert(MySQL.getTruncateQuery(table) == defaultQuery) assert(Postgres.getTruncateQuery(table) == postgresQuery) assert(db2.getTruncateQuery(table) == defaultQuery) assert(h2.getTruncateQuery(table) == defaultQuery) assert(derby.getTruncateQuery(table) == defaultQuery) +assert(oracle.getTruncateQuery(table) == defaultQuery) + } + + test("SPARK-22880: Truncate table with CASCADE by jdbc dialect") { +// cascade in a truncate should only be applied for databases that support this, +// even if the parameter is passed. +val MySQL = JdbcDialects.get("jdbc:mysql://127.0.0.1/db") +val Postgres = JdbcDialects.get("jdbc:postgresql://127.0.0.1/db") +val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db") +val h2 = JdbcDialects.get(url) +val derby = JdbcDialects.get("jdbc:derby:db") +val oracle = JdbcDialects.get("jdbc:oracle://127.0.0.1/db") --- End diff -- Done 👍 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
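The behaviour these tests pin down can be summarised per dialect. The sketch below reflects what the thread describes — only PostgreSQL and Oracle honour the cascade flag, and the exact Oracle query form is an assumption, not taken from the merged code:

```scala
// Sketch of expected truncate queries per dialect and cascade flag, as discussed
// in this thread (assumed behaviour, not the actual JdbcDialects implementation).
def expectedTruncateQuery(dialect: String, table: String, cascade: Boolean): String =
  (dialect, cascade) match {
    case ("postgresql", true)  => s"TRUNCATE TABLE ONLY $table CASCADE"
    case ("postgresql", false) => s"TRUNCATE TABLE ONLY $table"
    case ("oracle", true)      => s"TRUNCATE TABLE $table CASCADE"
    // MySQL, MsSQL, DB2, Derby, H2, and Teradata ignore the cascade flag.
    case _                     => s"TRUNCATE TABLE $table"
  }

println(expectedTruncateQuery("postgresql", "weblogs", cascade = true))
// TRUNCATE TABLE ONLY weblogs CASCADE
```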
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168922006 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -860,14 +860,41 @@ class JDBCSuite extends SparkFunSuite val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db") val h2 = JdbcDialects.get(url) val derby = JdbcDialects.get("jdbc:derby:db") +val oracle = JdbcDialects.get("jdbc:oracle://127.0.0.1/db") + val table = "weblogs" val defaultQuery = s"TRUNCATE TABLE $table" val postgresQuery = s"TRUNCATE TABLE ONLY $table" + assert(MySQL.getTruncateQuery(table) == defaultQuery) assert(Postgres.getTruncateQuery(table) == postgresQuery) assert(db2.getTruncateQuery(table) == defaultQuery) assert(h2.getTruncateQuery(table) == defaultQuery) assert(derby.getTruncateQuery(table) == defaultQuery) +assert(oracle.getTruncateQuery(table) == defaultQuery) + } + + test("SPARK-22880: Truncate table with CASCADE by jdbc dialect") { +// cascade in a truncate should only be applied for databases that support this, +// even if the parameter is passed. +val MySQL = JdbcDialects.get("jdbc:mysql://127.0.0.1/db") --- End diff -- Good point, I saw a few other tests in there (untouched by the PR initially) that had this, fixed those now too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168921851 --- Diff: docs/sql-programming-guide.md --- @@ -1372,6 +1372,13 @@ the following case-insensitive options: This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false. This option applies only to writing. + + +cascadeTruncate + +This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this options allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with care. This option applies only to writing. --- End diff -- As mentioned in another comment, I think we should use the value of `isCascadingTruncateTable` as the default, rather than always `false`. Seems like the correct use of that variable. I can add a sentence to the docs specifying that. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168921833 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -102,7 +102,12 @@ object JdbcUtils extends Logging { val dialect = JdbcDialects.get(options.url) val statement = conn.createStatement try { - statement.executeUpdate(dialect.getTruncateQuery(options.table)) + if (options.isCascadeTruncate.isDefined) { +statement.executeUpdate(dialect.getTruncateQuery(options.table, + options.isCascadeTruncate)) + } else { +statement.executeUpdate(dialect.getTruncateQuery(options.table)) + } --- End diff -- see the previous comment on why this is an `Option[Boolean]` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168921808 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala --- @@ -119,6 +119,8 @@ class JDBCOptions( // // if to truncate the table from the JDBC database val isTruncate = parameters.getOrElse(JDBC_TRUNCATE, "false").toBoolean + + val isCascadeTruncate: Option[Boolean] = parameters.get(JDBC_CASCADE_TRUNCATE).map(_.toBoolean) --- End diff -- The reason I didn't do that is because of the existence of the `isCascadingTruncateTable` function for each dialect. According to the docs, that indicates whether or not a `TRUNCATE TABLE` command results in cascading behaviour by default for a given dialect. I thought it would be nice to then use that value as the default value for `isCascadeTruncate`. In that way, if there ever is a dialect that cascades truncations by default, we don't 'hardcode' a default value. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
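The precedence described here — an explicit user-supplied `cascadeTruncate` wins, otherwise the dialect's `isCascadingTruncateTable` value applies — maps naturally onto `Option.orElse`. A minimal sketch (`resolveCascade` is a hypothetical helper, not part of the PR):

```scala
// Hypothetical helper showing the intended precedence: the explicit user option
// first, falling back to the dialect's default cascading behaviour.
def resolveCascade(
    userOption: Option[Boolean],
    dialectDefault: Option[Boolean]): Option[Boolean] =
  userOption.orElse(dialectDefault)

println(resolveCascade(None, Some(false)))       // Some(false): dialect default used
println(resolveCascade(Some(true), Some(false))) // Some(true): user option wins
```

This is why the option is an `Option[Boolean]` rather than a plain `Boolean` with a hardcoded `false`: absence and an explicit `false` remain distinguishable.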
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168885013 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -860,14 +860,41 @@ class JDBCSuite extends SparkFunSuite val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db") val h2 = JdbcDialects.get(url) val derby = JdbcDialects.get("jdbc:derby:db") +val oracle = JdbcDialects.get("jdbc:oracle://127.0.0.1/db") + val table = "weblogs" val defaultQuery = s"TRUNCATE TABLE $table" val postgresQuery = s"TRUNCATE TABLE ONLY $table" + assert(MySQL.getTruncateQuery(table) == defaultQuery) assert(Postgres.getTruncateQuery(table) == postgresQuery) assert(db2.getTruncateQuery(table) == defaultQuery) assert(h2.getTruncateQuery(table) == defaultQuery) assert(derby.getTruncateQuery(table) == defaultQuery) +assert(oracle.getTruncateQuery(table) == defaultQuery) + } + + test("SPARK-22880: Truncate table with CASCADE by jdbc dialect") { +// cascade in a truncate should only be applied for databases that support this, +// even if the parameter is passed. +val MySQL = JdbcDialects.get("jdbc:mysql://127.0.0.1/db") +val Postgres = JdbcDialects.get("jdbc:postgresql://127.0.0.1/db") +val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db") +val h2 = JdbcDialects.get(url) +val derby = JdbcDialects.get("jdbc:derby:db") +val oracle = JdbcDialects.get("jdbc:oracle://127.0.0.1/db") --- End diff -- Could you add `teradata` test case, too? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168884880 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -860,14 +860,41 @@ class JDBCSuite extends SparkFunSuite val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db") val h2 = JdbcDialects.get(url) val derby = JdbcDialects.get("jdbc:derby:db") +val oracle = JdbcDialects.get("jdbc:oracle://127.0.0.1/db") + val table = "weblogs" val defaultQuery = s"TRUNCATE TABLE $table" val postgresQuery = s"TRUNCATE TABLE ONLY $table" + assert(MySQL.getTruncateQuery(table) == defaultQuery) assert(Postgres.getTruncateQuery(table) == postgresQuery) assert(db2.getTruncateQuery(table) == defaultQuery) assert(h2.getTruncateQuery(table) == defaultQuery) assert(derby.getTruncateQuery(table) == defaultQuery) +assert(oracle.getTruncateQuery(table) == defaultQuery) + } + + test("SPARK-22880: Truncate table with CASCADE by jdbc dialect") { +// cascade in a truncate should only be applied for databases that support this, +// even if the parameter is passed. +val MySQL = JdbcDialects.get("jdbc:mysql://127.0.0.1/db") --- End diff -- Since this is variable, what about - `MySQL` -> `mysql` - `Postgres` -> `postgres` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168883676 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala --- @@ -31,4 +31,16 @@ private case object TeradataDialect extends JdbcDialect { case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR)) case _ => None } + + /** + * The SQL query used to truncate a table. --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168883594 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala --- @@ -94,5 +94,20 @@ private case object OracleDialect extends JdbcDialect { case _ => value } + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. + * @param cascade Whether or not to cascade the truncation. Default value is the +*value of isCascadingTruncateTable() + * @return The SQL query to use for truncating a table + */ + override def getTruncateQuery(table: String, +cascade: Option[Boolean] = isCascadingTruncateTable): String = { --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168883612 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala --- @@ -89,11 +89,17 @@ private object PostgresDialect extends JdbcDialect { * The SQL query used to truncate a table. For Postgres, the default behaviour is to * also truncate any descendant tables. As this is a (possibly unwanted) side-effect, * the Postgres dialect adds 'ONLY' to truncate only the table in question - * @param table The name of the table. + * @param table The table to truncate + * @param cascade Whether or not to cascade the truncation. Default value is the value of +*isCascadingTruncateTable() --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168883542 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala --- @@ -46,4 +46,16 @@ private case object MySQLDialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168883570 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala --- @@ -94,5 +94,20 @@ private case object OracleDialect extends JdbcDialect { case _ => value } + /** + * The SQL query used to truncate a table. --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168883553 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala --- @@ -46,4 +46,16 @@ private case object MySQLDialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. + * @param cascade Whether or not to cascade the truncation. Default value is the + *value of isCascadingTruncateTable(). Ignored for MySQL as it is unsupported. + * @return The SQL query to use for truncating a table + */ + override def getTruncateQuery(table: String, +cascade: Option[Boolean] = isCascadingTruncateTable): String = { --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168883530 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala --- @@ -42,4 +42,16 @@ private object MsSqlServerDialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. + * @param cascade Whether or not to cascade the truncation. Default value is the + *value of isCascadingTruncateTable(). Ignored for MsSql as it is unsupported. + * @return The SQL query to use for truncating a table + */ + override def getTruncateQuery(table: String, +cascade: Option[Boolean] = isCascadingTruncateTable): String = { --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168883267 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala --- @@ -49,4 +49,16 @@ private object DB2Dialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. + * @param cascade Whether or not to cascade the truncation. Default value is the + *value of isCascadingTruncateTable(). Ignored for DB2 as it is unsupported. + * @return The SQL query to use for truncating a table + */ + override def getTruncateQuery(table: String, +cascade: Option[Boolean] = isCascadingTruncateTable): String = { --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168883425 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala --- @@ -42,4 +42,16 @@ private object MsSqlServerDialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168883216 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala --- @@ -49,4 +49,16 @@ private object DB2Dialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. --- End diff -- Indentation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168883182 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala --- @@ -64,7 +64,15 @@ private class AggregatedDialect(dialects: List[JdbcDialect]) extends JdbcDialect } } - override def getTruncateQuery(table: String): String = { -dialects.head.getTruncateQuery(table) + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. + * @param cascade Whether or not to cascade the truncation. Default value is the + *value of isCascadingTruncateTable() + * @return The SQL query to use for truncating a table + */ + override def getTruncateQuery(table: String, +cascade: Option[Boolean] = isCascadingTruncateTable): String = { --- End diff -- Indentation? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168883141 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala --- @@ -64,7 +64,15 @@ private class AggregatedDialect(dialects: List[JdbcDialect]) extends JdbcDialect } } - override def getTruncateQuery(table: String): String = { -dialects.head.getTruncateQuery(table) + /** + * The SQL query used to truncate a table. --- End diff -- Indentation? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168882930 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -102,7 +102,12 @@ object JdbcUtils extends Logging { val dialect = JdbcDialects.get(options.url) val statement = conn.createStatement try { - statement.executeUpdate(dialect.getTruncateQuery(options.table)) + if (options.isCascadeTruncate.isDefined) { +statement.executeUpdate(dialect.getTruncateQuery(options.table, + options.isCascadeTruncate)) + } else { +statement.executeUpdate(dialect.getTruncateQuery(options.table)) + } --- End diff -- Simply, one line if we are using `boolean` instead of `Option`? ``` statement.executeUpdate(dialect.getTruncateQuery(options.table, options.isCascadeTruncate)) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
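The simplification suggested here hinges on the parameter type: if `getTruncateQuery` takes an option whose empty case defers to the dialect default, the call site no longer needs the `if`/`else`. A minimal sketch, with names that mirror the PR but stand-in implementations rather than the real Spark classes:

```scala
object CallSiteSketch {
  // Stand-in for a dialect whose default is "do not cascade".
  def isCascadingTruncateTable(): Option[Boolean] = Some(false)

  def getTruncateQuery(table: String,
      cascade: Option[Boolean] = isCascadingTruncateTable()): String = {
    if (cascade.getOrElse(false)) s"TRUNCATE TABLE $table CASCADE"
    else s"TRUNCATE TABLE $table"
  }

  // One-line call site: an unset user option (None) defers to the
  // dialect default via orElse, so no if/else branch is needed.
  def truncateQuery(table: String, isCascadeTruncate: Option[Boolean]): String =
    getTruncateQuery(table, isCascadeTruncate.orElse(isCascadingTruncateTable()))

  def main(args: Array[String]): Unit = {
    assert(truncateQuery("t", None) == "TRUNCATE TABLE t")
    assert(truncateQuery("t", Some(true)) == "TRUNCATE TABLE t CASCADE")
  }
}
```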
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168882599 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala --- @@ -119,6 +119,8 @@ class JDBCOptions( // // if to truncate the table from the JDBC database val isTruncate = parameters.getOrElse(JDBC_TRUNCATE, "false").toBoolean + + val isCascadeTruncate: Option[Boolean] = parameters.get(JDBC_CASCADE_TRUNCATE).map(_.toBoolean) --- End diff -- Sorry. I'm not sure that I lost the previous context, but what about `getOrElse` like `val isTruncate` on line 121? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
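The difference between the two parsing styles is whether "unset" survives. A small sketch in plain Scala, with a hypothetical parameter map:

```scala
object OptionParsingSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical writer options; "cascadeTruncate" is deliberately unset.
    val parameters = Map("truncate" -> "true")

    // getOrElse style (like isTruncate): an unset key collapses to false.
    val isTruncate: Boolean = parameters.getOrElse("truncate", "false").toBoolean

    // Option style: None means "the user said nothing", leaving room for
    // the dialect's own default instead of forcing false.
    val isCascadeTruncate: Option[Boolean] =
      parameters.get("cascadeTruncate").map(_.toBoolean)

    assert(isTruncate)
    assert(isCascadeTruncate.isEmpty)
  }
}
```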
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r168882085 --- Diff: docs/sql-programming-guide.md --- @@ -1372,6 +1372,13 @@ the following case-insensitive options: This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false. This option applies only to writing. + + +cascadeTruncate + +This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this options allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with care. This option applies only to writing. --- End diff -- Maybe, do we need a default value description like `It defaults to false`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user Stephan202 commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r165814532 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala --- @@ -49,4 +49,16 @@ private object DB2Dialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + /** + * The SQL query used to truncate a table. + * @param table The JDBCOptions. + * @param cascade Whether or not to cascade the truncation. Default value is the + *value of isCascadingTruncateTable() --- End diff -- For dialects not supporting cascading, should the documentation perhaps indicate that the `cascade` parameter is ignored? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user Stephan202 commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r165814475 --- Diff: docs/sql-programming-guide.md --- @@ -1339,6 +1339,13 @@ the following case-insensitive options: This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false. This option applies only to writing. + + +cascadeTruncate + +This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this options allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with case. This option applies only to writing. --- End diff -- About the documentation: `s/used with case/used with care/` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r159371171 --- Diff: docs/sql-programming-guide.md --- @@ -1339,6 +1339,13 @@ the following case-insensitive options: This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false. This option applies only to writing. + + +cascadeTruncate + +This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this options allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with case. This option applies only to writing. --- End diff -- @dongjoon-hyun one thing I'm thinking of (just curious to hear your opinion): could we use the value of `isCascadingTruncateTable` for each dialect as the default value for the `cascade` boolean? In that way, there is only a single boolean per dialect specifying what the default behaviour is with regard to cascading during truncates. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
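The idea floated here, letting each dialect's `isCascadingTruncateTable()` seed the default for `cascade`, can be sketched like this. Scala evaluates default arguments with virtual dispatch, so each dialect's override applies; the dialect below is a stand-in, not Spark's real `MySQLDialect` (which ignores `cascade` entirely rather than honoring an explicit `Some(true)`):

```scala
object DialectDefaultSketch {
  trait Dialect {
    def isCascadingTruncateTable(): Option[Boolean] = None
    def getTruncateQuery(table: String,
        cascade: Option[Boolean] = isCascadingTruncateTable()): String =
      if (cascade.getOrElse(false)) s"TRUNCATE TABLE $table CASCADE"
      else s"TRUNCATE TABLE $table"
  }

  // MySQL-like: truncate does not cascade by default, so Some(false).
  object NoCascadeDialect extends Dialect {
    override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
  }

  def main(args: Array[String]): Unit = {
    // A one-argument call picks up the dialect's own default.
    assert(NoCascadeDialect.getTruncateQuery("t") == "TRUNCATE TABLE t")
    // In this sketch an explicit Some(true) overrides the default.
    assert(NoCascadeDialect.getTruncateQuery("t", Some(true)) == "TRUNCATE TABLE t CASCADE")
  }
}
```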
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r159370871 --- Diff: docs/sql-programming-guide.md --- @@ -1339,6 +1339,13 @@ the following case-insensitive options: This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false. This option applies only to writing. + + +cascadeTruncate + +This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this options allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with case. This option applies only to writing. --- End diff -- @dongjoon-hyun apologies if I misrepresented your comments/opinions from the previous PR, wasn't my intention :-). I've given it some more thought, and I can see your point about the default value. I'll make the change we discussed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r159262664 --- Diff: docs/sql-programming-guide.md --- @@ -1339,6 +1339,13 @@ the following case-insensitive options: This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false. This option applies only to writing. + + +cascadeTruncate + +This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this options allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with case. This option applies only to writing. --- End diff -- Hi, @danielvdende . I think you can ignore my comment in previous PR. There were many directional comments on that PR and it's not the final one. Your previous PR is merged by @gatorsmile . For me, I also still don't agree on the default value inside `JdbcDialects.scala` in this PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r159183234 --- Diff: docs/sql-programming-guide.md --- @@ -1339,6 +1339,13 @@ the following case-insensitive options: This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false. This option applies only to writing. + + +cascadeTruncate + +This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this options allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with case. This option applies only to writing. --- End diff -- I think it raises the question of how complete/incomplete the Spark JDBC API should be, and what the use cases are that it should serve. For the most simple cases, in which no key constraints are set between tables, you won't need this option. However, as soon as foreign key constraints are introduced, it is very important. I agree that not every functionality from SQL (dialects) should be included, but I personally feel this is quite fundamental functionality. Moreover, as it's configuration option users that don't want it also don't have to use it. I think we also discussed this functionality in previous PR with @dongjoon-hyun here: [SPARK-22729](https://github.com/apache/spark/pull/19911) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r159118000 --- Diff: docs/sql-programming-guide.md --- @@ -1339,6 +1339,13 @@ the following case-insensitive options: This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false. This option applies only to writing. + + +cascadeTruncate + +This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this options allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with case. This option applies only to writing. --- End diff -- This conf only takes effect when `SaveMode.Overwrite` is enabled and `truncate` is set to true. My doubt is that the usage scenario of this extra conf does not benefit most users. Could we hold this PR? When other users have similar requests, we can revisit it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r158557222 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -120,11 +121,12 @@ abstract class JdbcDialect extends Serializable { * The SQL query that should be used to truncate a table. Dialects can override this method to * return a query that is suitable for a particular database. For PostgreSQL, for instance, * a different query is used to prevent "TRUNCATE" affecting other tables. - * @param table The name of the table. + * @param table The table to truncate + * @param cascade (OPTIONAL) Whether or not to cascade the truncation. Default: false * @return The SQL query to use for truncating a table */ @Since("2.3.0") - def getTruncateQuery(table: String): String = { + def getTruncateQuery(table: String, cascade: Boolean = false): String = { --- End diff -- I think the default for all dialects should be false, regardless of whether the cascade feature is even supported. And for those for which it is supported, it should default to false. It's a feature that you should only use if you explicitly want to cascade imho. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r158556753 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -120,11 +121,12 @@ abstract class JdbcDialect extends Serializable { * The SQL query that should be used to truncate a table. Dialects can override this method to * return a query that is suitable for a particular database. For PostgreSQL, for instance, * a different query is used to prevent "TRUNCATE" affecting other tables. - * @param table The name of the table. + * @param table The table to truncate + * @param cascade (OPTIONAL) Whether or not to cascade the truncation. Default: false * @return The SQL query to use for truncating a table */ @Since("2.3.0") - def getTruncateQuery(table: String): String = { + def getTruncateQuery(table: String, cascade: Boolean = false): String = { --- End diff -- That is the main reason I request the change. I don't think this implementation is correct on for all dialects. Anyway, it's only my opinion. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r158555400 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -120,11 +121,12 @@ abstract class JdbcDialect extends Serializable { * The SQL query that should be used to truncate a table. Dialects can override this method to * return a query that is suitable for a particular database. For PostgreSQL, for instance, * a different query is used to prevent "TRUNCATE" affecting other tables. - * @param table The name of the table. + * @param table The table to truncate + * @param cascade (OPTIONAL) Whether or not to cascade the truncation. Default: false * @return The SQL query to use for truncating a table */ @Since("2.3.0") - def getTruncateQuery(table: String): String = { + def getTruncateQuery(table: String, cascade: Boolean = false): String = { --- End diff -- That would force all dialects to implement this method though, which would lead to unnecessary code duplication. Moreover, removing the default value here would change the public API, as it would force others who have written custom dialects to change calls to this method. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r158549591 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -120,11 +121,12 @@ abstract class JdbcDialect extends Serializable { * The SQL query that should be used to truncate a table. Dialects can override this method to * return a query that is suitable for a particular database. For PostgreSQL, for instance, * a different query is used to prevent "TRUNCATE" affecting other tables. - * @param table The name of the table. + * @param table The table to truncate + * @param cascade (OPTIONAL) Whether or not to cascade the truncation. Default: false * @return The SQL query to use for truncating a table */ @Since("2.3.0") - def getTruncateQuery(table: String): String = { + def getTruncateQuery(table: String, cascade: Boolean = false): String = { --- End diff -- Could you remove the default value here? It could be safe to prevent accidental inheritance of this wrong implementation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
GitHub user danielvdende opened a pull request: https://github.com/apache/spark/pull/20057

[SPARK-22880][SQL] Add cascadeTruncate option to JDBC datasource

This commit adds the `cascadeTruncate` option to the JDBC datasource API, for databases that support this functionality (PostgreSQL and Oracle at the moment). This allows for applying a cascading truncate that affects tables that have foreign key constraints on the table being truncated.

## What changes were proposed in this pull request?

Add `cascadeTruncate` option to JDBC datasource API. Allow this to affect the `TRUNCATE` query for databases that support this option.

## How was this patch tested?

Existing tests for `truncateQuery` were updated. Also, an additional test was added to ensure that the correct syntax was applied, and that enabling the config for databases that do not support this option does not result in invalid queries.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/danielvdende/spark SPARK-22880

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20057.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20057

commit 842eca7fa282a81f0f4d459242869c022ef21080
Author: Daniel van der Ende
Date: 2017-12-22T15:58:28Z

[SPARK-22880][SQL] Add cascadeTruncate option to JDBC datasource

This commit adds the cascadeTruncate option to the JDBC datasource API, for databases that support this functionality (PostgreSQL and Oracle at the moment). This allows for applying a cascading truncate that affects tables that have foreign key constraints on the table being truncated.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org