[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-30 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390517#comment-17390517
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
   * e88244d233d323364916c4fc240083566ddc4e56 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1272)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Bulk Insert For Spark Sql
> -
>
> Key: HUDI-2208
> URL: https://issues.apache.org/jira/browse/HUDI-2208
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
>
> Support the bulk insert for spark sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-30 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390493#comment-17390493
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
   * 46f6c46dd03ca0d342ad0b8e774f2c8b77429c0d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1194)
 
   * e88244d233d323364916c4fc240083566ddc4e56 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1272)
 
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-30 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390465#comment-17390465
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
   * 46f6c46dd03ca0d342ad0b8e774f2c8b77429c0d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1194)
 
   * e88244d233d323364916c4fc240083566ddc4e56 UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-30 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390446#comment-17390446
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r679789141



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
       .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
       .toBoolean
 
-    val operation = if (isOverwrite) {
-      if (table.partitionColumnNames.nonEmpty) {
-        INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-      } else {
-        INSERT_OPERATION_OPT_VAL
+    val enableBulkInsert = parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+      DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+    val isPartitionedTable = table.partitionColumnNames.nonEmpty
+    val isPrimaryKeyTable = primaryColumns.nonEmpty
+    val operation =
+      (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+        case (true, true, _, _) =>
+          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert.")
+        case (_, true, true, _) if isPartitionedTable =>
+          throw new IllegalArgumentException(s"Insert Overwrite Partition can not use bulk insert.")
+        case (_, true, _, true) =>
+          throw new IllegalArgumentException(s"Bulk insert cannot support drop duplication." +
+            s" Please disable $INSERT_DROP_DUPS_OPT_KEY and try again.")
+        // if enableBulkInsert is true, use bulk insert for the insert overwrite non-partitioned table.
+        case (_, true, true, _) if !isPartitionedTable => BULK_INSERT_OPERATION_OPT_VAL

Review comment:
   HoodieSparkSqlWriter cannot handle the mode for `insert overwrite
partitioned table`, so we should translate the write type to
INSERT_OVERWRITE_TABLE in that case.
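
For reference, the case analysis in the hunk above can be tried outside Hudi. The following is a minimal sketch, not the PR's code: `OperationSketch`, `chooseOperation`, and the lowercase operation strings are hypothetical stand-ins for the `*_OPERATION_OPT_VAL` constants, and the three branches after the bulk insert cases are assumptions (the overwrite branch follows the removed lines of the hunk).

```scala
object OperationSketch {
  // Hypothetical stand-ins for the *_OPERATION_OPT_VAL constants.
  val Insert = "insert"
  val BulkInsert = "bulk_insert"
  val InsertOverwrite = "insert_overwrite"

  def chooseOperation(
      isPrimaryKeyTable: Boolean,
      enableBulkInsert: Boolean,
      isOverwrite: Boolean,
      dropDuplicate: Boolean,
      isPartitionedTable: Boolean): String =
    (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
      case (true, true, _, _) =>
        throw new IllegalArgumentException("Table with primaryKey can not use bulk insert.")
      case (_, true, true, _) if isPartitionedTable =>
        throw new IllegalArgumentException("Insert Overwrite Partition can not use bulk insert.")
      case (_, true, _, true) =>
        throw new IllegalArgumentException("Bulk insert cannot support drop duplication.")
      // Bulk insert for insert overwrite on a non-partitioned table (last case in the hunk).
      case (_, true, true, _) => BulkInsert
      // Assumption: a plain insert with bulk insert enabled also maps to bulk insert.
      case (_, true, false, _) => BulkInsert
      // Assumption, following the removed lines: overwrite of a partitioned table.
      case (_, false, true, _) if isPartitionedTable => InsertOverwrite
      // Assumption: everything else is a regular insert.
      case _ => Insert
    }
}
```

Keeping the rejected combinations first means an invalid setting fails before any operation is chosen, which is the ordering the hunk uses as well.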





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-29 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389901#comment-17389901
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

nsivabalan commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r679153080



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
       .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
       .toBoolean
 
-    val operation = if (isOverwrite) {
-      if (table.partitionColumnNames.nonEmpty) {
-        INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-      } else {
-        INSERT_OPERATION_OPT_VAL
+    val enableBulkInsert = parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+      DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+    val isPartitionedTable = table.partitionColumnNames.nonEmpty
+    val isPrimaryKeyTable = primaryColumns.nonEmpty
+    val operation =
+      (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+        case (true, true, _, _) =>
+          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert.")
+        case (_, true, true, _) if isPartitionedTable =>
+          throw new IllegalArgumentException(s"Insert Overwrite Partition can not use bulk insert.")
+        case (_, true, _, true) =>
+          throw new IllegalArgumentException(s"Bulk insert cannot support drop duplication." +
+            s" Please disable $INSERT_DROP_DUPS_OPT_KEY and try again.")
+        // if enableBulkInsert is true, use bulk insert for the insert overwrite non-partitioned table.
+        case (_, true, true, _) if !isPartitionedTable => BULK_INSERT_OPERATION_OPT_VAL

Review comment:
   Also, if there are any valid optimizations, then we should probably move
them to HoodieSparkSqlWriter so that both Spark datasource and SQL DML benefit
:)





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-28 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388522#comment-17388522
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r678031341



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
 
       // Convert to RDD[HoodieRecord]
       val genericRecords: RDD[GenericRecord] = HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-      val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || operation.equals(WriteOperationType.UPSERT);
+      val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+        operation.equals(WriteOperationType.UPSERT) ||
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
   If `COMBINE_BEFORE_INSERT_PROP` is enabled,
`SparkInsertCommitActionExecutor` will call `combineOnCondition` in
`AbstractWriteHelper`, which needs the `precombine` value.
But here we have ignored the `COMBINE_BEFORE_INSERT_PROP` case, so the
`precombine` value is not extracted into the HoodiePayload.
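
The condition discussed in this comment can be sketched as a pure function. This is a minimal sketch under assumptions: the two key strings stand in for `INSERT_DROP_DUPS_OPT_KEY.key` and `COMBINE_BEFORE_INSERT_PROP.key`, the `"false"` defaults are assumed (the default argument is cut off in the hunk above), the operation is a plain string rather than a `WriteOperationType`, and `ShouldCombineSketch` is a hypothetical name.

```scala
object ShouldCombineSketch {
  // Assumed key strings; stand-ins for the Hudi config constants.
  val DropDupsKey = "hoodie.datasource.write.insert.drop.duplicates"
  val CombineBeforeInsertKey = "hoodie.combine.before.insert"

  // True when the precombine value has to be extracted for the write:
  // drop-duplicates is on, the operation is an upsert, or combine-before-insert is on.
  def shouldCombine(parameters: Map[String, String], operation: String): Boolean =
    parameters.getOrElse(DropDupsKey, "false").toBoolean ||
      operation == "upsert" ||
      parameters.getOrElse(CombineBeforeInsertKey, "false").toBoolean
}
```

With only the first two terms, an insert that sets combine-before-insert would skip the precombine extraction, which is the gap the third term closes.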

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
       .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
       .toBoolean
 
-    val operation = if (isOverwrite) {
-      if (table.partitionColumnNames.nonEmpty) {
-        INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-      } else {
-        INSERT_OPERATION_OPT_VAL
+    val enableBulkInsert = parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+      DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+    val isPartitionedTable = table.partitionColumnNames.nonEmpty
+    val isPrimaryKeyTable = primaryColumns.nonEmpty
+    val operation =
+      (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+        case (true, true, _, _) =>
+          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert.")

Review comment:
   Because we currently do a primary key uniqueness check when inserting
data into a pk-table, just as a database does. Bulk insert currently cannot
do such a check.

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
         RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
         PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
         PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+        ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,
+        HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine

Review comment:
   We need to ensure that the data is unique for a pk-table, just as in a
database, so I combine the input.
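
To illustrate what combining the input means here, below is a minimal sketch over plain Scala collections rather than Hudi records. `CombineSketch`, `Record`, and `combineByKey` are hypothetical names, and keeping the record with the largest precombine value per key is an assumption about the payload's merge rule, not something stated in this thread.

```scala
object CombineSketch {
  // Hypothetical record: a primary key, a precombine (ordering) value, and a payload.
  final case class Record(key: String, preCombine: Long, value: String)

  // Keep one record per key, choosing the one with the largest precombine value.
  // The result is sorted by key only to make the output deterministic.
  def combineByKey(input: Seq[Record]): Seq[Record] =
    input
      .groupBy(_.key)
      .values
      .map(_.maxBy(_.preCombine))
      .toSeq
      .sortBy(_.key)
}
```

After this step each primary key appears at most once in the batch, so a uniqueness check against the table only has to consider existing data.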





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-27 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388246#comment-17388246
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

nsivabalan commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r677405475



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
       .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
       .toBoolean
 
-    val operation = if (isOverwrite) {
-      if (table.partitionColumnNames.nonEmpty) {
-        INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-      } else {
-        INSERT_OPERATION_OPT_VAL
+    val enableBulkInsert = parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+      DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+    val isPartitionedTable = table.partitionColumnNames.nonEmpty
+    val isPrimaryKeyTable = primaryColumns.nonEmpty
+    val operation =
+      (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+        case (true, true, _, _) =>
+          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert.")

Review comment:
   May I know why we have this constraint?

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
 
       // Convert to RDD[HoodieRecord]
       val genericRecords: RDD[GenericRecord] = HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-      val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || operation.equals(WriteOperationType.UPSERT);
+      val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+        operation.equals(WriteOperationType.UPSERT) ||
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
   Sorry, I don't follow. Precombine is just one field as is, right? I'm not
sure what you mean by "not compute the preCombine field value". Can you shed
some more light, please?
   In general, for inserts we don't do any precombine. But if this config
(COMBINE_BEFORE_INSERT_PROP) is enabled, we need to do preCombine.

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
         RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
         PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
         PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+        ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,

Review comment:
   You can add one, but make the default true.

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
       .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
       .toBoolean
 
-    val operation = if (isOverwrite) {
-      if (table.partitionColumnNames.nonEmpty) {
-        INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-      } else {
-        INSERT_OPERATION_OPT_VAL
+    val enableBulkInsert = parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+      DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+    val isPartitionedTable = table.partitionColumnNames.nonEmpty
+    val isPrimaryKeyTable = primaryColumns.nonEmpty
+    val operation =
+      (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+        case (true, true, _, _) =>
+          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert.")
+        case (_, true, true, _) if isPartitionedTable =>
+          throw new IllegalArgumentException(s"Insert Overwrite Partition can not use bulk insert.")
+        case (_, true, _, true) =>
+          throw new IllegalArgumentException(s"Bulk insert cannot support drop duplication." +
+            s" Please disable $INSERT_DROP_DUPS_OPT_KEY and try again.")
+        // if enableBulkInsert is true, use bulk insert for the insert overwrite non-partitioned table.
+        case (_, true, true, _) if !isPartitionedTable => BULK_INSERT_OPERATION_OPT_VAL

Review comment:
   I am just trying to understand the SQL DML here. We already handle save
modes within HoodieSparkSqlWriter, so I am trying to understand what is
required in addition to that, and to avoid duplication if possible. I mean,
for some of the cases listed here, it's just about overwrite mode.

##
File path: 
hu

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-27 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388127#comment-17388127
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r677536750



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
         RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
         PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
         PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+        ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,

Review comment:
   So why is there no option here for bulk_insert without row_writer? Must
it use row_writer?





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-27 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388126#comment-17388126
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r677534597



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
         RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
         PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
         PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+        ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,
+        HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine

Review comment:
   For a pk-table in insert mode, there is no need to combine the input,
right?





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-27 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388122#comment-17388122
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r677534597



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
         RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
         PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
         PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+        ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,
+        HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine

Review comment:
   Even for a pk-table, in insert mode, there is no need to combine the
input, right?





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-27 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387961#comment-17387961
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
   * 46f6c46dd03ca0d342ad0b8e774f2c8b77429c0d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1194)
 
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-27 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387926#comment-17387926
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * a427b11ca3c1146c57cdd0ce502ae56da01e66fc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1186)
 
   * bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
   * 46f6c46dd03ca0d342ad0b8e774f2c8b77429c0d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1194)
 
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-27 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387911#comment-17387911
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * a427b11ca3c1146c57cdd0ce502ae56da01e66fc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1186)
 
   * bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
   * 46f6c46dd03ca0d342ad0b8e774f2c8b77429c0d UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-27 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387906#comment-17387906
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * a427b11ca3c1146c57cdd0ce502ae56da01e66fc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1186)
 
   * bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387812#comment-17387812
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * a427b11ca3c1146c57cdd0ce502ae56da01e66fc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1186)
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387805#comment-17387805
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * 80aeae998fc99dd3f31662a78c8dc5a9dce57252 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1179)
   * a427b11ca3c1146c57cdd0ce502ae56da01e66fc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1186)
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387785#comment-17387785
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * 80aeae998fc99dd3f31662a78c8dc5a9dce57252 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1179)
   * a427b11ca3c1146c57cdd0ce502ae56da01e66fc UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387780#comment-17387780
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * 80aeae998fc99dd3f31662a78c8dc5a9dce57252 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1179)
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387756#comment-17387756
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * 0f565dae9227a4653aed57ee74937d697b6a583a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1165)
   * 80aeae998fc99dd3f31662a78c8dc5a9dce57252 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1179)
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387743#comment-17387743
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * 0f565dae9227a4653aed57ee74937d697b6a583a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1165)
   * 80aeae998fc99dd3f31662a78c8dc5a9dce57252 UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387461#comment-17387461
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * 0f565dae9227a4653aed57ee74937d697b6a583a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1165)
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387409#comment-17387409
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * ceeb9a317a103002e7f6cd198e19310955ab7572 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1162)
   * 0f565dae9227a4653aed57ee74937d697b6a583a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1165)
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387375#comment-17387375
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * ceeb9a317a103002e7f6cd198e19310955ab7572 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1162)
   * 0f565dae9227a4653aed57ee74937d697b6a583a UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387372#comment-17387372
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * 8d35aa66c41c3dc087a8b1514aa12274c4b661ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1144)
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * ceeb9a317a103002e7f6cd198e19310955ab7572 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1162)
   * 0f565dae9227a4653aed57ee74937d697b6a583a UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387367#comment-17387367
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * 8d35aa66c41c3dc087a8b1514aa12274c4b661ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1144)
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * ceeb9a317a103002e7f6cd198e19310955ab7572 UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387362#comment-17387362
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * 8d35aa66c41c3dc087a8b1514aa12274c4b661ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1144)
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387360#comment-17387360
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r676621769



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -247,6 +247,13 @@ object DataSourceWriteOptions {
 .defaultValue("false")
 .withDocumentation("When set to true, will perform write operations directly using the spark native " +
   "`Row` representation, avoiding any additional conversion costs.")
+  /**
+   * Enable the bulk insert for sql insert statement.
+   */
+  val SQL_ENABLE_BULK_INSERT:ConfigProperty[String] = ConfigProperty
+.key("hoodie.sq.bulk.insert.enable")

Review comment:
   Good catch!
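
   For illustration, the builder-style declaration in the hunk above can be sketched in a self-contained form. This is a minimal sketch, not Hudi's actual `ConfigProperty` class: `ConfigProp`, `SqlWriteOptions` and `resolve` are hypothetical stand-ins, the key is spelled with `sql` as the review nit suggests, and the default of `"false"` is an assumption because the hunk does not show the new property's default.

```scala
// Hypothetical stand-in for a builder-style config property; not Hudi's real class.
final case class ConfigProp[T](key: String, default: Option[T] = None, doc: String = "") {
  def defaultValue(v: T): ConfigProp[T] = copy(default = Some(v))
  def withDocumentation(d: String): ConfigProp[T] = copy(doc = d)
}

object SqlWriteOptions {
  // Key spelled with "sql" (assumed corrected spelling); default value is an assumption.
  val SqlEnableBulkInsert: ConfigProp[String] =
    ConfigProp[String]("hoodie.sql.bulk.insert.enable")
      .defaultValue("false")
      .withDocumentation("Enable the bulk insert for sql insert statement.")

  // Look up the user-supplied value, falling back to the declared default.
  def resolve[T](prop: ConfigProp[T], options: Map[String, T]): Option[T] =
    options.get(prop.key).orElse(prop.default)
}
```

   With this shape, a misspelled key such as `hoodie.sq.bulk.insert.enable` would silently never match what users set, which is why the nit matters.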

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
 RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
 PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
 PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,

Review comment:
   We already have ENABLE_ROW_WRITER_OPT_KEY to configure the row writer. 
Here, enableBulkInsert is only used to control the row writer for bulk 
insert. 
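
   One way to read this reply is sketched below. It is an assumption-laden illustration, not the code in this PR: `RowWriterSketch`, `operationFor` and `useRowWriter` are hypothetical names, and the rule that the row writer only applies to bulk insert is taken from the comment above.

```scala
object RowWriterSketch {
  sealed trait WriteOperation
  case object Insert extends WriteOperation
  case object BulkInsert extends WriteOperation

  // Hypothetical helper: the SQL-level flag selects the write operation.
  def operationFor(enableBulkInsert: Boolean): WriteOperation =
    if (enableBulkInsert) BulkInsert else Insert

  // The row writer is only considered when the operation is a bulk insert.
  def useRowWriter(op: WriteOperation, rowWriterEnabled: Boolean): Boolean =
    op == BulkInsert && rowWriterEnabled
}
```

   Under this reading, setting the row-writer option from `enableBulkInsert` has no effect on a regular insert.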

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
 RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
 PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
 PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,
+HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine

Review comment:
   Because we must guarantee uniqueness for a primary-key table, we should 
combine the input for such tables.
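
   What "combine the input" means for a primary-key table can be sketched as a de-duplication by key. This is only an illustration under stated assumptions, not Hudi's implementation: the `Record` shape, the `orderingValue` field and the keep-the-largest rule are hypothetical.

```scala
object CombineSketch {
  // Hypothetical record shape: a primary key, an ordering value, and a payload.
  final case class Record(key: String, orderingValue: Long, payload: String)

  // Keep exactly one record per key, preferring the largest ordering value.
  // The result is sorted by key only to make the output deterministic.
  def combineByKey(records: Seq[Record]): Seq[Record] =
    records
      .groupBy(_.key)
      .values
      .map(_.maxBy(_.orderingValue))
      .toSeq
      .sortBy(_.key)
}
```

   Without such a step, two input rows sharing a key would both be written, which breaks the uniqueness guarantee mentioned above.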





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387284#comment-17387284
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r676542684



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
 RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
 PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
 PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,
+HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine

Review comment:
   Why should the combine be enabled when the table has a primaryKey?





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387283#comment-17387283
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r676542368



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
 RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
 PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
 PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,

Review comment:
   Here enableBulkInsert means using ROW_WRITER. Should we also introduce 
another config to control the ROW_WRITER?





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387279#comment-17387279
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r676541180



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -247,6 +247,13 @@ object DataSourceWriteOptions {
 .defaultValue("false")
 .withDocumentation("When set to true, will perform write operations directly using the spark native " +
   "`Row` representation, avoiding any additional conversion costs.")
+  /**
+   * Enable the bulk insert for sql insert statement.
+   */
+  val SQL_ENABLE_BULK_INSERT:ConfigProperty[String] = ConfigProperty
+.key("hoodie.sq.bulk.insert.enable")

Review comment:
   nit: sql





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-25 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387002#comment-17387002
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 commented on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-886327353


   Hi @leesf, I have introduced another config key to enable bulk insert 
for SQL. Please take a look when you have time.



[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-24 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386679#comment-17386679
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * 8d35aa66c41c3dc087a8b1514aa12274c4b661ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1144)
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-24 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386667#comment-17386667
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * d14438fa452f1f9be2235ba6baffd8059b655a15 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1131)
 
   * 8d35aa66c41c3dc087a8b1514aa12274c4b661ca Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1144)
 
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-24 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386661#comment-17386661
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * d14438fa452f1f9be2235ba6baffd8059b655a15 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1131)
 
   * 8d35aa66c41c3dc087a8b1514aa12274c4b661ca UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-24 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386659#comment-17386659
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r675960618



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala
##
@@ -144,7 +145,7 @@ class TestInsertTable extends TestHoodieSqlBase {
| partitioned by (dt)
| location '${tmp.getCanonicalPath}/$tableName'
""".stripMargin)
-
+  spark.sql("set hoodie.datasource.write.row.writer.enable = false")

Review comment:
   remove this now





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-23 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386197#comment-17386197
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * d14438fa452f1f9be2235ba6baffd8059b655a15 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1131)
 
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-23 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386165#comment-17386165
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * 8087f44f7c591e0b50f113fd415a76136d376a8c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1123)
 
   * d14438fa452f1f9be2235ba6baffd8059b655a15 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1131)
 
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-23 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386147#comment-17386147
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * 8087f44f7c591e0b50f113fd415a76136d376a8c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1123)
 
   * d14438fa452f1f9be2235ba6baffd8059b655a15 UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-23 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386099#comment-17386099
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * 8087f44f7c591e0b50f113fd415a76136d376a8c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1123)
 
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-23 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386066#comment-17386066
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 52f852a534a7d1106ed25dcb48374e5c9947380c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1103)
 
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * 8087f44f7c591e0b50f113fd415a76136d376a8c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1123)
 
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-23 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386011#comment-17386011
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 52f852a534a7d1106ed25dcb48374e5c9947380c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1103)
 
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * 8087f44f7c591e0b50f113fd415a76136d376a8c UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-23 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386007#comment-17386007
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 52f852a534a7d1106ed25dcb48374e5c9947380c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1103)
 
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385646#comment-17385646
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 52f852a534a7d1106ed25dcb48374e5c9947380c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1103)
 
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385618#comment-17385618
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 1a30511e4a029d862d22864659a5be0fac7a4fd0 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1101)
 
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 52f852a534a7d1106ed25dcb48374e5c9947380c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1103)
 
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385587#comment-17385587
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 1a30511e4a029d862d22864659a5be0fac7a4fd0 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1101)
 
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 52f852a534a7d1106ed25dcb48374e5c9947380c UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385584#comment-17385584
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 834080ee7aa7c5ad2fba139d7dd8f66ddaf75b0d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1098)
 
   * 1a30511e4a029d862d22864659a5be0fac7a4fd0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1101)
 
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385570#comment-17385570
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r674875373



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/HoodieOptionConfig.scala
##
@@ -172,6 +178,15 @@ object HoodieOptionConfig {
 params.get(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key)
   }
 
+  /**
+   * Whether enable the bulk insert for sql insert statement when there is no 
primaryKey in the table.
+   */
+  def enableBulkInsert(options: Map[String, String]): Boolean = {

Review comment:
   I saw that ENABLE_ROW_WRITER_OPT_KEY is currently used only for bulk 
insert, so I reused this config.
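
A minimal sketch of what reusing the row-writer flag could look like, not the PR's actual method body. It assumes the key string is the one set in the TestInsertTable diff (`hoodie.datasource.write.row.writer.enable`); the object name and the default value of "false" are hypothetical and chosen only for illustration.

```scala
object BulkInsertOptionSketch {
  // Key string as it appears in the TestInsertTable diff for this PR.
  val RowWriterEnableKey = "hoodie.datasource.write.row.writer.enable"

  // Assumption: the default is a placeholder, not taken from the PR.
  // Reads the reused row-writer flag from the SQL options map.
  def enableBulkInsert(options: Map[String, String],
                       default: String = "false"): Boolean =
    options.getOrElse(RowWriterEnableKey, default).trim.toBoolean
}
```

With this shape, one flag decides both whether the row writer is used and whether the SQL insert goes through bulk insert, which is the coupling the review question is about.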

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
 
   // Convert to RDD[HoodieRecord]
   val genericRecords: RDD[GenericRecord] = 
HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-  val shouldCombine = 
parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || 
operation.equals(WriteOperationType.UPSERT);
+  val shouldCombine = 
parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+operation.equals(WriteOperationType.UPSERT) ||
+
parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
   Yes. If COMBINE_BEFORE_INSERT_PROP is enabled for insert, the precombine 
field value is not computed, which produces an incorrect result when 
inserting duplicate records.
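
The condition in the diff above can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code in HoodieSparkSqlWriter: the key strings, the `WriteOp` type, and the "false" defaults are hypothetical stand-ins (the diff is cut off before the default of the third term), and `getOrElse` is used for all lookups so the example runs on an empty map.

```scala
object ShouldCombineSketch {
  // Assumed key strings; the PR refers to them through config constants.
  val InsertDropDupsKey = "hoodie.datasource.write.insert.drop.duplicates"
  val CombineBeforeInsertKey = "hoodie.combine.before.insert"

  // Hypothetical stand-in for WriteOperationType.
  sealed trait WriteOp
  case object Insert extends WriteOp
  case object Upsert extends WriteOp

  // The precombine value is computed when any of the three conditions holds.
  // The third term is the one added in the diff: combine-before-insert.
  def shouldCombine(parameters: Map[String, String], operation: WriteOp): Boolean =
    parameters.getOrElse(InsertDropDupsKey, "false").toBoolean ||
      operation == Upsert ||
      parameters.getOrElse(CombineBeforeInsertKey, "false").toBoolean
}
```

Without the third term, a plain insert with combine-before-insert enabled would skip computing the precombine value, which is the case described in the comment.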





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385560#comment-17385560
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r674872620



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala
##
@@ -144,7 +145,7 @@ class TestInsertTable extends TestHoodieSqlBase {
| partitioned by (dt)
| location '${tmp.getCanonicalPath}/$tableName'
""".stripMargin)
-
+  spark.sql("set hoodie.datasource.write.row.writer.enable = false")

Review comment:
   all set to false?





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385557#comment-17385557
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r674869873



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/HoodieOptionConfig.scala
##
@@ -172,6 +178,15 @@ object HoodieOptionConfig {
 params.get(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key)
   }
 
+  /**
+   * Whether enable the bulk insert for sql insert statement when there is no 
primaryKey in the table.
+   */
+  def enableBulkInsert(options: Map[String, String]): Boolean = {

Review comment:
   ENABLE_ROW_WRITER_OPT_KEY is a different type of bulk_insert. Does Spark 
SQL support both types?





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385553#comment-17385553
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

leesf commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r674866640



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
 
   // Convert to RDD[HoodieRecord]
   val genericRecords: RDD[GenericRecord] = 
HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-  val shouldCombine = 
parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || 
operation.equals(WriteOperationType.UPSERT);
+  val shouldCombine = 
parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+operation.equals(WriteOperationType.UPSERT) ||
+
parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
   Is there a bug here?





[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385549#comment-17385549
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 834080ee7aa7c5ad2fba139d7dd8f66ddaf75b0d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1098)
 
   * 1a30511e4a029d862d22864659a5be0fac7a4fd0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1101)
 
   
   

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385547#comment-17385547
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 834080ee7aa7c5ad2fba139d7dd8f66ddaf75b0d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1098)
   * 1a30511e4a029d862d22864659a5be0fac7a4fd0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Support Bulk Insert For Spark Sql
> -
>
> Key: HUDI-2208
> URL: https://issues.apache.org/jira/browse/HUDI-2208
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
> Labels: pull-request-available
>
> Support the bulk insert for spark sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385490#comment-17385490
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 834080ee7aa7c5ad2fba139d7dd8f66ddaf75b0d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1098)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Support Bulk Insert For Spark Sql
> -
>
> Key: HUDI-2208
> URL: https://issues.apache.org/jira/browse/HUDI-2208
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
> Labels: pull-request-available
>
> Support the bulk insert for spark sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385464#comment-17385464
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 834080ee7aa7c5ad2fba139d7dd8f66ddaf75b0d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1098)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Support Bulk Insert For Spark Sql
> -
>
> Key: HUDI-2208
> URL: https://issues.apache.org/jira/browse/HUDI-2208
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
> Labels: pull-request-available
>
> Support the bulk insert for spark sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385461#comment-17385461
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot commented on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 834080ee7aa7c5ad2fba139d7dd8f66ddaf75b0d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Support Bulk Insert For Spark Sql
> -
>
> Key: HUDI-2208
> URL: https://issues.apache.org/jira/browse/HUDI-2208
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
> Labels: pull-request-available
>
> Support the bulk insert for spark sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HUDI-2208) Support Bulk Insert For Spark Sql

2021-07-22 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385459#comment-17385459
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 opened a new pull request #3328:
URL: https://github.com/apache/hudi/pull/3328


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.




> Support Bulk Insert For Spark Sql
> -
>
> Key: HUDI-2208
> URL: https://issues.apache.org/jira/browse/HUDI-2208
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
>
> Support the bulk insert for spark sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
