[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390517#comment-17390517 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
* bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
* e88244d233d323364916c4fc240083566ddc4e56 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1272)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Support Bulk Insert For Spark Sql
> -
>
> Key: HUDI-2208
> URL: https://issues.apache.org/jira/browse/HUDI-2208
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
> Labels: pull-request-available
>
> Support the bulk insert for spark sql

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390493#comment-17390493 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
* bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
* 46f6c46dd03ca0d342ad0b8e774f2c8b77429c0d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1194)
* e88244d233d323364916c4fc240083566ddc4e56 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1272)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390465#comment-17390465 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
* bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
* 46f6c46dd03ca0d342ad0b8e774f2c8b77429c0d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1194)
* e88244d233d323364916c4fc240083566ddc4e56 UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390446#comment-17390446 ]

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r679789141

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
       .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
       .toBoolean
-    val operation = if (isOverwrite) {
-      if (table.partitionColumnNames.nonEmpty) {
-        INSERT_OVERWRITE_OPERATION_OPT_VAL // overwrite partition
-      } else {
-        INSERT_OPERATION_OPT_VAL
+    val enableBulkInsert = parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+      DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+    val isPartitionedTable = table.partitionColumnNames.nonEmpty
+    val isPrimaryKeyTable = primaryColumns.nonEmpty
+    val operation =
+      (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+        case (true, true, _, _) =>
+          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert.")
+        case (_, true, true, _) if isPartitionedTable =>
+          throw new IllegalArgumentException(s"Insert Overwrite Partition can not use bulk insert.")
+        case (_, true, _, true) =>
+          throw new IllegalArgumentException(s"Bulk insert cannot support drop duplication." +
+            s" Please disable $INSERT_DROP_DUPS_OPT_KEY and try again.")
+        // if enableBulkInsert is true, use bulk insert for the insert overwrite non-partitioned table.
+        case (_, true, true, _) if !isPartitionedTable => BULK_INSERT_OPERATION_OPT_VAL

Review comment:
   `HoodieSparkSqlWriter` cannot handle the mode for `insert overwrite partitioned table`; we should translate the write type to `INSERT_OVERWRITE_TABLE` for such a case.
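To make the rules in the `match` above easier to check, here is a minimal standalone Scala model of them. It does not call Hudi: `InsertContext`, `chooseOperation` and the `"bulk_insert"` string are hypothetical stand-ins for the PR's local variables and `BULK_INSERT_OPERATION_OPT_VAL`, and the sketch returns an `Either` instead of throwing so the outcomes can be compared directly. The quoted snippet stops after the bulk-insert case, so every other combination is reported as `None` rather than guessed.

```scala
// Standalone model of the bulk-insert operation choice; no Hudi classes are used.
object OperationSketch {
  // Hypothetical bundle of the four tuple flags plus isPartitionedTable.
  final case class InsertContext(
      isPrimaryKeyTable: Boolean,
      enableBulkInsert: Boolean,
      isOverwrite: Boolean,
      dropDuplicate: Boolean,
      isPartitionedTable: Boolean)

  // Hypothetical stand-in for BULK_INSERT_OPERATION_OPT_VAL.
  val BulkInsert = "bulk_insert"

  // Left(message) mirrors the IllegalArgumentException cases,
  // Right(Some(op)) is the bulk-insert case, and Right(None) covers
  // combinations the quoted diff does not show.
  def chooseOperation(ctx: InsertContext): Either[String, Option[String]] =
    (ctx.isPrimaryKeyTable, ctx.enableBulkInsert, ctx.isOverwrite, ctx.dropDuplicate) match {
      case (true, true, _, _) =>
        Left("Table with primaryKey can not use bulk insert.")
      case (_, true, true, _) if ctx.isPartitionedTable =>
        Left("Insert Overwrite Partition can not use bulk insert.")
      case (_, true, _, true) =>
        Left("Bulk insert cannot support drop duplication.")
      // Insert overwrite of a non-partitioned table uses bulk insert.
      case (_, true, true, _) if !ctx.isPartitionedTable =>
        Right(Some(BulkInsert))
      case _ =>
        Right(None)
    }
}
```

Because Scala tries `case` clauses in order, a bulk insert with drop-duplicates enabled is rejected by the third clause before the non-partitioned overwrite clause is ever reached.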
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389901#comment-17389901 ]

ASF GitHub Bot commented on HUDI-2208:
--

nsivabalan commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r679153080

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
       .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
       .toBoolean
-    val operation = if (isOverwrite) {
-      if (table.partitionColumnNames.nonEmpty) {
-        INSERT_OVERWRITE_OPERATION_OPT_VAL // overwrite partition
-      } else {
-        INSERT_OPERATION_OPT_VAL
+    val enableBulkInsert = parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+      DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+    val isPartitionedTable = table.partitionColumnNames.nonEmpty
+    val isPrimaryKeyTable = primaryColumns.nonEmpty
+    val operation =
+      (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+        case (true, true, _, _) =>
+          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert.")
+        case (_, true, true, _) if isPartitionedTable =>
+          throw new IllegalArgumentException(s"Insert Overwrite Partition can not use bulk insert.")
+        case (_, true, _, true) =>
+          throw new IllegalArgumentException(s"Bulk insert cannot support drop duplication." +
+            s" Please disable $INSERT_DROP_DUPS_OPT_KEY and try again.")
+        // if enableBulkInsert is true, use bulk insert for the insert overwrite non-partitioned table.
+        case (_, true, true, _) if !isPartitionedTable => BULK_INSERT_OPERATION_OPT_VAL

Review comment:
   Also, if there are any valid optimizations, we should probably move them to HoodieSparkSqlWriter so that both the Spark datasource and SQL DML benefit :)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388522#comment-17388522 ]

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r678031341

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
       // Convert to RDD[HoodieRecord]
       val genericRecords: RDD[GenericRecord] = HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-      val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || operation.equals(WriteOperationType.UPSERT);
+      val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+        operation.equals(WriteOperationType.UPSERT) ||
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
   If `COMBINE_BEFORE_INSERT_PROP` is enabled, `SparkInsertCommitActionExecutor` will run `combineOnCondition` in `AbstractWriteHelper`, which needs the `precombine` value. But the current code ignores the `COMBINE_BEFORE_INSERT_PROP` case, so the `precombine` value is not extracted into the HoodiePayload.

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
       .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
       .toBoolean
-    val operation = if (isOverwrite) {
-      if (table.partitionColumnNames.nonEmpty) {
-        INSERT_OVERWRITE_OPERATION_OPT_VAL // overwrite partition
-      } else {
-        INSERT_OPERATION_OPT_VAL
+    val enableBulkInsert = parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+      DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+    val isPartitionedTable = table.partitionColumnNames.nonEmpty
+    val isPrimaryKeyTable = primaryColumns.nonEmpty
+    val operation =
+      (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+        case (true, true, _, _) =>
+          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert.")

Review comment:
   Because we currently check primary-key uniqueness when inserting data into a pk-table, just as a database does. Bulk insert cannot do that yet.

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
         RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
         PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
         PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+        ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,
+        HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine

Review comment:
   We need to ensure that data in a pk-table is unique, just like in a database, so I combine the input.
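The uniqueness argument above relies on combining the input before it is written. A minimal sketch of that idea using plain Scala collections (no Spark or Hudi), assuming a keep-the-largest-ordering-value rule; in Hudi the actual merge rule comes from the configured payload class, and `Row` and `combineByKey` are hypothetical names:

```scala
// Minimal model of "combine before insert": among input rows that share a
// record key, keep only the one with the largest precombine (ordering) value.
object PrecombineSketch {
  // Hypothetical row: record key, ordering (precombine) value, payload.
  final case class Row(key: String, ts: Long, value: String)

  def combineByKey(rows: Seq[Row]): Seq[Row] =
    rows
      .groupBy(_.key)
      .values
      .map(_.maxBy(_.ts)) // on equal ts, maxBy keeps the first row encountered
      .toSeq
      .sortBy(_.key)      // sorted only to make the output deterministic
}
```

After this step each record key appears at most once, which is the property a primary-key table needs before the rows reach the writer.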
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388246#comment-17388246 ]

ASF GitHub Bot commented on HUDI-2208:
--

nsivabalan commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r677405475

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
       .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
       .toBoolean
-    val operation = if (isOverwrite) {
-      if (table.partitionColumnNames.nonEmpty) {
-        INSERT_OVERWRITE_OPERATION_OPT_VAL // overwrite partition
-      } else {
-        INSERT_OPERATION_OPT_VAL
+    val enableBulkInsert = parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+      DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+    val isPartitionedTable = table.partitionColumnNames.nonEmpty
+    val isPrimaryKeyTable = primaryColumns.nonEmpty
+    val operation =
+      (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+        case (true, true, _, _) =>
+          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert.")

Review comment:
   May I know why we have this constraint?

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
       // Convert to RDD[HoodieRecord]
       val genericRecords: RDD[GenericRecord] = HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-      val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || operation.equals(WriteOperationType.UPSERT);
+      val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+        operation.equals(WriteOperationType.UPSERT) ||
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
   Sorry, I don't follow. Precombine is just one field as is, right? I'm not sure what you mean by "not compute the preCombine field value"; can you shed some more light, please? In general, for inserts we don't do any precombine. But if this config (COMBINE_BEFORE_INSERT_PROP) is enabled, we need to do preCombine.

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
         RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
         PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
         PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+        ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,

Review comment:
   You can add one, but make the default true.

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
       .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
       .toBoolean
-    val operation = if (isOverwrite) {
-      if (table.partitionColumnNames.nonEmpty) {
-        INSERT_OVERWRITE_OPERATION_OPT_VAL // overwrite partition
-      } else {
-        INSERT_OPERATION_OPT_VAL
+    val enableBulkInsert = parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+      DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+    val isPartitionedTable = table.partitionColumnNames.nonEmpty
+    val isPrimaryKeyTable = primaryColumns.nonEmpty
+    val operation =
+      (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+        case (true, true, _, _) =>
+          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert.")
+        case (_, true, true, _) if isPartitionedTable =>
+          throw new IllegalArgumentException(s"Insert Overwrite Partition can not use bulk insert.")
+        case (_, true, _, true) =>
+          throw new IllegalArgumentException(s"Bulk insert cannot support drop duplication." +
+            s" Please disable $INSERT_DROP_DUPS_OPT_KEY and try again.")
+        // if enableBulkInsert is true, use bulk insert for the insert overwrite non-partitioned table.
+        case (_, true, true, _) if !isPartitionedTable => BULK_INSERT_OPERATION_OPT_VAL

Review comment:
   I'm just trying to understand the SQL DML here. We already handle save modes within HoodieSparkSqlWriter, so what is required in addition to that? I'm trying to avoid duplication if possible. I mean, for some of the cases listed here, it's just about overwrite mode.

##
File path: hu
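The `shouldCombine` condition being discussed can be modeled as a pure function to see which inputs trigger combining. This is a sketch only: the option keys, the `Operation` type and the `combineDefault` parameter are hypothetical, and the default is passed in because the quoted diff is cut off before the real default for `COMBINE_BEFORE_INSERT_PROP`.

```scala
// Standalone model of the shouldCombine condition from the HoodieSparkSqlWriter diff.
object ShouldCombineSketch {
  // Hypothetical key names standing in for INSERT_DROP_DUPS_OPT_KEY and
  // HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.
  val DropDupsKey = "drop.dups"
  val CombineBeforeInsertKey = "combine.before.insert"

  // Hypothetical stand-in for WriteOperationType.
  sealed trait Operation
  case object Insert extends Operation
  case object Upsert extends Operation

  // Combine if drop-dups is on, the operation is an upsert, or
  // combine-before-insert is enabled. Like the original, params(DropDupsKey)
  // throws if the drop-dups key is absent.
  def shouldCombine(params: Map[String, String], op: Operation, combineDefault: Boolean): Boolean =
    params(DropDupsKey).toBoolean ||
      op == Upsert ||
      params.get(CombineBeforeInsertKey).map(_.toBoolean).getOrElse(combineDefault)
}
```

With the third clause in place, a plain insert is combined only when combine-before-insert is switched on, which matches the point that inserts normally skip precombine.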
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388127#comment-17388127 ]

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r677536750

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
         RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
         PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
         PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+        ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,

Review comment:
   So why is there no option for bulk_insert without row_writer? Must it use row_writer?
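nsivabalan's reply in this thread suggests adding such an option but defaulting it to true. A minimal sketch of that shape, with a hypothetical option key (the PR snippet does not name one) and a plain `Map` in place of Hudi's write options:

```scala
// Sketch of an opt-out row-writer flag for SQL bulk insert.
object RowWriterOptionSketch {
  // Hypothetical option key; no such key appears in the PR snippet.
  val SqlBulkInsertRowWriterKey = "hypothetical.sql.bulk_insert.use_row_writer"

  // The row writer only applies when bulk insert is enabled. Users can turn it
  // off explicitly, but it defaults to "true" as suggested in the review.
  def useRowWriter(params: Map[String, String], enableBulkInsert: Boolean): Boolean =
    enableBulkInsert && params.getOrElse(SqlBulkInsertRowWriterKey, "true").toBoolean
}
```

Keeping the default at true preserves the behaviour in the diff (row writer on whenever bulk insert is on) while still giving users a way to bulk insert without it.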
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388126#comment-17388126 ]

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r677534597

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
         RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
         PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
         PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+        ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,
+        HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine

Review comment:
   For a pk-table in insert mode, there is no need to combine the input, right?
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388122#comment-17388122 ]

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r677534597

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
         RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
         PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
         PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+        ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,
+        HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine

Review comment:
   Even for a pk-table in insert mode, there is no need to combine the input, right?
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387961#comment-17387961 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
* bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
* 46f6c46dd03ca0d342ad0b8e774f2c8b77429c0d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1194)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387926#comment-17387926 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
* a427b11ca3c1146c57cdd0ce502ae56da01e66fc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1186)
* bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
* 46f6c46dd03ca0d342ad0b8e774f2c8b77429c0d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1194)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387911#comment-17387911 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
* a427b11ca3c1146c57cdd0ce502ae56da01e66fc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1186)
* bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
* 46f6c46dd03ca0d342ad0b8e774f2c8b77429c0d UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387906#comment-17387906 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
* a427b11ca3c1146c57cdd0ce502ae56da01e66fc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1186)
* bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387812#comment-17387812 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
* a427b11ca3c1146c57cdd0ce502ae56da01e66fc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1186)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387805#comment-17387805 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
* 80aeae998fc99dd3f31662a78c8dc5a9dce57252 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1179)
* a427b11ca3c1146c57cdd0ce502ae56da01e66fc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1186)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387785#comment-17387785 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN * 80aeae998fc99dd3f31662a78c8dc5a9dce57252 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1179) * a427b11ca3c1146c57cdd0ce502ae56da01e66fc UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387780#comment-17387780 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN * 80aeae998fc99dd3f31662a78c8dc5a9dce57252 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1179)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387756#comment-17387756 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN * 0f565dae9227a4653aed57ee74937d697b6a583a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1165) * 80aeae998fc99dd3f31662a78c8dc5a9dce57252 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1179)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387743#comment-17387743 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN * 0f565dae9227a4653aed57ee74937d697b6a583a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1165) * 80aeae998fc99dd3f31662a78c8dc5a9dce57252 UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387461#comment-17387461 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN * 0f565dae9227a4653aed57ee74937d697b6a583a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1165)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387409#comment-17387409 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN * ceeb9a317a103002e7f6cd198e19310955ab7572 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1162) * 0f565dae9227a4653aed57ee74937d697b6a583a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1165)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387375#comment-17387375 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN * ceeb9a317a103002e7f6cd198e19310955ab7572 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1162) * 0f565dae9227a4653aed57ee74937d697b6a583a UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387372#comment-17387372 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * 8d35aa66c41c3dc087a8b1514aa12274c4b661ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1144) * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN * ceeb9a317a103002e7f6cd198e19310955ab7572 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1162) * 0f565dae9227a4653aed57ee74937d697b6a583a UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387367#comment-17387367 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * 8d35aa66c41c3dc087a8b1514aa12274c4b661ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1144) * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN * ceeb9a317a103002e7f6cd198e19310955ab7572 UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387362#comment-17387362 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * 8d35aa66c41c3dc087a8b1514aa12274c4b661ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1144) * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387360#comment-17387360 ] ASF GitHub Bot commented on HUDI-2208: -- pengzhiwei2018 commented on a change in pull request #3328: URL: https://github.com/apache/hudi/pull/3328#discussion_r676621769 ## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala ## @@ -247,6 +247,13 @@ object DataSourceWriteOptions { .defaultValue("false") .withDocumentation("When set to true, will perform write operations directly using the spark native " + "`Row` representation, avoiding any additional conversion costs.") + /** + * Enable the bulk insert for sql insert statement. + */ + val SQL_ENABLE_BULK_INSERT:ConfigProperty[String] = ConfigProperty +.key("hoodie.sq.bulk.insert.enable") Review comment: Good catch! ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala ## @@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand { RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","), PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields, PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName, +ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString, Review comment: We already have ENABLE_ROW_WRITER_OPT_KEY to configure the Row writer. enableBulkInsert is only used to control the Row writer for bulk insert.
## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala ## @@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand { RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","), PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields, PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName, +ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString, +HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine Review comment: Because we must guarantee primary-key uniqueness for a primary-key table, we should combine the input for such tables.
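The reply above argues that a table with a primary key needs its input combined before insert so that each key is written at most once. A minimal pure-Scala sketch of that idea follows, assuming a hypothetical `Rec` type whose `ts` field plays the role of the precombine (ordering) field; it is not Hudi's implementation, which merges duplicate records through the configured payload class.

```scala
// Hypothetical record type: `id` stands in for the primary key and
// `ts` for the precombine (ordering) field.
final case class Rec(id: String, ts: Long, value: String)

object CombineSketch {
  // Keep exactly one record per primary key: the one with the largest `ts`.
  // On a tie, the record that appears later in the input wins.
  def combineByKey(input: Seq[Rec]): Seq[Rec] =
    input
      .groupBy(_.id)
      .values
      .map(group => group.reduceLeft((a, b) => if (b.ts >= a.ts) b else a))
      .toSeq
      .sortBy(_.id) // deterministic output order, for readability only
}
```

Without this step, two input rows sharing key `"1"` would both be written by a plain bulk insert, breaking the uniqueness the primary key promises.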
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387284#comment-17387284 ] ASF GitHub Bot commented on HUDI-2208: -- leesf commented on a change in pull request #3328: URL: https://github.com/apache/hudi/pull/3328#discussion_r676542684 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala ## @@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand { RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","), PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields, PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName, +ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString, +HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine Review comment: Why should the combine be enabled when the table has a primary key?
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387283#comment-17387283 ] ASF GitHub Bot commented on HUDI-2208: -- leesf commented on a change in pull request #3328: URL: https://github.com/apache/hudi/pull/3328#discussion_r676542368 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala ## @@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand { RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","), PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields, PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName, +ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString, Review comment: Here enableBulkInsert means using the ROW_WRITER. Should we also introduce another config to control the ROW_WRITER?
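The diff under review passes the SQL bulk-insert flag straight through as the row-writer option and ties combine-before-insert to whether the table has a primary key. Those two option entries can be sketched as a plain `Map`, assuming the string keys `hoodie.datasource.write.row.writer.enable` (the key the test in this PR sets) and `hoodie.combine.before.insert` (taken here to be the key behind `HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP`). The helper name `bulkInsertOptions` is hypothetical.

```scala
object InsertOptionsSketch {
  // String keys assumed to correspond to ENABLE_ROW_WRITER_OPT_KEY and
  // HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP in the diff.
  val RowWriterKey = "hoodie.datasource.write.row.writer.enable"
  val CombineBeforeInsertKey = "hoodie.combine.before.insert"

  // Hypothetical helper mirroring the two entries added in the diff:
  // the row writer follows the SQL bulk-insert flag, and input combining
  // is enabled only for tables that declare a primary key.
  def bulkInsertOptions(enableBulkInsert: Boolean, isPrimaryKeyTable: Boolean): Map[String, String] =
    Map(
      RowWriterKey -> enableBulkInsert.toString,
      CombineBeforeInsertKey -> isPrimaryKeyTable.toString
    )
}
```

Because the row-writer entry is derived from the bulk-insert flag, turning on SQL bulk insert also turns on the row writer, which is the coupling the review question is about.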
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387279#comment-17387279 ] ASF GitHub Bot commented on HUDI-2208: -- leesf commented on a change in pull request #3328: URL: https://github.com/apache/hudi/pull/3328#discussion_r676541180 ## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala ## @@ -247,6 +247,13 @@ object DataSourceWriteOptions { .defaultValue("false") .withDocumentation("When set to true, will perform write operations directly using the spark native " + "`Row` representation, avoiding any additional conversion costs.") + /** + * Enable the bulk insert for sql insert statement. + */ + val SQL_ENABLE_BULK_INSERT:ConfigProperty[String] = ConfigProperty +.key("hoodie.sq.bulk.insert.enable") Review comment: nit: sql
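The nit matters because a config key is matched by its exact string: if the property is defined as `hoodie.sq.bulk.insert.enable`, a user who sets `hoodie.sql.bulk.insert.enable` silently gets the default. A minimal sketch of that lookup, modelling the session configuration as a plain `Map` (an assumption; the real code reads Spark and Hudi config objects) and assuming a default of `false` for the new property:

```scala
object ConfigKeySketch {
  // Key name with the review nit applied ("sq" -> "sql").
  val SqlEnableBulkInsertKey = "hoodie.sql.bulk.insert.enable"

  // Look up a boolean flag by exact key, falling back to a default.
  // The default of "false" is an assumption made for this sketch.
  def isEnabled(conf: Map[String, String], key: String, default: String = "false"): Boolean =
    conf.getOrElse(key, default).trim.toBoolean
}
```

In a Spark SQL session the flag would then be set with `set hoodie.sql.bulk.insert.enable = true` before the `insert into` statement, the same way the test in this PR uses `set hoodie.datasource.write.row.writer.enable = false`.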
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387002#comment-17387002 ] ASF GitHub Bot commented on HUDI-2208: -- pengzhiwei2018 commented on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-886327353 Hi @leesf, I have introduced another config key to enable bulk insert for SQL. Please take a look when you have time.
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386679#comment-17386679 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * 8d35aa66c41c3dc087a8b1514aa12274c4b661ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1144)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386667#comment-17386667 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * d14438fa452f1f9be2235ba6baffd8059b655a15 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1131) * 8d35aa66c41c3dc087a8b1514aa12274c4b661ca Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1144)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386661#comment-17386661 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * d14438fa452f1f9be2235ba6baffd8059b655a15 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1131) * 8d35aa66c41c3dc087a8b1514aa12274c4b661ca UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386659#comment-17386659 ] ASF GitHub Bot commented on HUDI-2208: -- pengzhiwei2018 commented on a change in pull request #3328: URL: https://github.com/apache/hudi/pull/3328#discussion_r675960618 ## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala ## @@ -144,7 +145,7 @@ class TestInsertTable extends TestHoodieSqlBase { | partitioned by (dt) | location '${tmp.getCanonicalPath}/$tableName' """.stripMargin) - + spark.sql("set hoodie.datasource.write.row.writer.enable = false") Review comment: remove this now
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386197#comment-17386197 ] ASF GitHub Bot commented on HUDI-2208: -- hudi-bot edited a comment on pull request #3328: URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427 ## CI report: * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN * d14438fa452f1f9be2235ba6baffd8059b655a15 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1131)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386165#comment-17386165 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* 8087f44f7c591e0b50f113fd415a76136d376a8c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1123)
* d14438fa452f1f9be2235ba6baffd8059b655a15 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1131)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386147#comment-17386147 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* 8087f44f7c591e0b50f113fd415a76136d376a8c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1123)
* d14438fa452f1f9be2235ba6baffd8059b655a15 UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386099#comment-17386099 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* 8087f44f7c591e0b50f113fd415a76136d376a8c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1123)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386066#comment-17386066 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 52f852a534a7d1106ed25dcb48374e5c9947380c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1103)
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* 8087f44f7c591e0b50f113fd415a76136d376a8c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1123)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386011#comment-17386011 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 52f852a534a7d1106ed25dcb48374e5c9947380c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1103)
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
* 8087f44f7c591e0b50f113fd415a76136d376a8c UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386007#comment-17386007 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 52f852a534a7d1106ed25dcb48374e5c9947380c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1103)
* 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385646#comment-17385646 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 52f852a534a7d1106ed25dcb48374e5c9947380c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1103)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385618#comment-17385618 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 1a30511e4a029d862d22864659a5be0fac7a4fd0 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1101)
* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 52f852a534a7d1106ed25dcb48374e5c9947380c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1103)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385587#comment-17385587 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 1a30511e4a029d862d22864659a5be0fac7a4fd0 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1101)
* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
* 52f852a534a7d1106ed25dcb48374e5c9947380c UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385584#comment-17385584 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 834080ee7aa7c5ad2fba139d7dd8f66ddaf75b0d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1098)
* 1a30511e4a029d862d22864659a5be0fac7a4fd0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1101)
* 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385570#comment-17385570 ]

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r674875373

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/HoodieOptionConfig.scala
##
@@ -172,6 +178,15 @@ object HoodieOptionConfig {
     params.get(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key)
   }
+  /**
+   * Whether enable the bulk insert for sql insert statement when there is no primaryKey in the table.
+   */
+  def enableBulkInsert(options: Map[String, String]): Boolean = {

Review comment:
I saw that ENABLE_ROW_WRITER_OPT_KEY is currently used only for bulk insert, so I reused this config.

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
       // Convert to RDD[HoodieRecord]
       val genericRecords: RDD[GenericRecord] = HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-      val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || operation.equals(WriteOperationType.UPSERT);
+      val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+        operation.equals(WriteOperationType.UPSERT) ||
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
Yes. If COMBINE_BEFORE_INSERT_PROP is enabled for insert, the current code does not compute the precombine field value, which produces incorrect results when inserting duplicate records.
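The reply above argues that `shouldCombine` must also be true when `COMBINE_BEFORE_INSERT_PROP` is enabled; otherwise the precombine value is never computed and records with duplicate keys cannot be resolved correctly. A minimal Scala sketch of that reasoning follows. It is not the Hudi implementation: `CombineDecision`, `Rec`, `shouldCombine` and `preCombine` are hypothetical names, the truncated third clause of the diff is assumed to reduce to a single boolean (`combineBeforeInsert`), and "combine" is modeled as keeping, per record key, the record with the largest ordering value.

```scala
// Hypothetical model of the shouldCombine decision discussed in this review.
// None of these names are Hudi APIs; they only illustrate the reasoning.
object CombineDecision {
  sealed trait Operation
  case object Insert extends Operation
  case object Upsert extends Operation
  case object BulkInsert extends Operation

  // A record with its key and the value of the (assumed) precombine field.
  final case class Rec(key: String, orderingVal: Long, payload: String)

  // Assumption: the truncated third clause is a plain boolean read of
  // COMBINE_BEFORE_INSERT_PROP, modeled here as `combineBeforeInsert`.
  def shouldCombine(insertDropDups: Boolean,
                    operation: Operation,
                    combineBeforeInsert: Boolean): Boolean =
    insertDropDups || operation == Upsert || combineBeforeInsert

  // Keep one record per key: the one with the largest ordering value.
  // This only works if orderingVal was actually computed for each record.
  def preCombine(records: Seq[Rec]): Seq[Rec] =
    records.groupBy(_.key).values.map(_.maxBy(_.orderingVal)).toSeq.sortBy(_.key)
}
```

With `combineBeforeInsert = true`, a plain `Insert` now routes through the combine step, so two versions of `id1` collapse into the one with the larger ordering value. In this model, with all three flags false no deduplication step runs at all.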
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385560#comment-17385560 ]

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r674872620

##
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala
##
@@ -144,7 +145,7 @@ class TestInsertTable extends TestHoodieSqlBase {
  | partitioned by (dt)
  | location '${tmp.getCanonicalPath}/$tableName'
  """.stripMargin)
-
+ spark.sql("set hoodie.datasource.write.row.writer.enable = false")

Review comment:
Should all of these be set to false?
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385557#comment-17385557 ]

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r674869873

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/HoodieOptionConfig.scala
##
@@ -172,6 +178,15 @@ object HoodieOptionConfig {
     params.get(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key)
   }
+  /**
+   * Whether enable the bulk insert for sql insert statement when there is no primaryKey in the table.
+   */
+  def enableBulkInsert(options: Map[String, String]): Boolean = {

Review comment:
ENABLE_ROW_WRITER_OPT_KEY is a different type of bulk_insert. Does Spark SQL support both types?
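The question turns on how the SQL insert path picks a write mode. Going only by the `enableBulkInsert` doc comment (bulk insert "when there is no primaryKey in the table") and pengzhiwei2018's reply in this thread that `ENABLE_ROW_WRITER_OPT_KEY` is reused as the switch, the choice can be sketched as below. This is an assumption-laden illustration, not the PR's code: `SqlInsertPath`, `WritePath` and `choose` are hypothetical, and the real body of `enableBulkInsert` is truncated in the diff and may differ.

```scala
// Hypothetical model of the SQL insert-path choice discussed in this review.
// None of these names exist in Hudi; they only encode the reading stated above.
object SqlInsertPath {
  sealed trait WritePath
  // Bulk insert through the row-writer path (flag reused as the SQL switch).
  case object RowWriterBulkInsert extends WritePath
  // Every other case falls back to the regular insert path.
  case object RegularInsert extends WritePath

  def choose(hasPrimaryKey: Boolean, rowWriterEnabled: Boolean): WritePath =
    if (!hasPrimaryKey && rowWriterEnabled) RowWriterBulkInsert
    else RegularInsert
}
```

This model has no branch for a non-row-writer bulk insert; whether Spark SQL should expose that second mode is exactly what the review comment asks.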
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385553#comment-17385553 ]

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r674866640

##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
       // Convert to RDD[HoodieRecord]
       val genericRecords: RDD[GenericRecord] = HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-      val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || operation.equals(WriteOperationType.UPSERT);
+      val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+        operation.equals(WriteOperationType.UPSERT) ||
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
Is there a bug here?
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385549#comment-17385549 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 834080ee7aa7c5ad2fba139d7dd8f66ddaf75b0d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1098)
* 1a30511e4a029d862d22864659a5be0fac7a4fd0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1101)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385547#comment-17385547 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 834080ee7aa7c5ad2fba139d7dd8f66ddaf75b0d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1098)
* 1a30511e4a029d862d22864659a5be0fac7a4fd0 UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385490#comment-17385490 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 834080ee7aa7c5ad2fba139d7dd8f66ddaf75b0d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1098)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385464#comment-17385464 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 834080ee7aa7c5ad2fba139d7dd8f66ddaf75b0d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1098)
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385461#comment-17385461 ]

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot commented on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427

## CI report:

* 834080ee7aa7c5ad2fba139d7dd8f66ddaf75b0d UNKNOWN
[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql
[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385459#comment-17385459 ]

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 opened a new pull request #3328:
URL: https://github.com/apache/hudi/pull/3328

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.