Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub


yihua merged PR #10999:
URL: https://github.com/apache/hudi/pull/10999


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10999:
URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050765903

   
   ## CI report:
   
   * 15e59507262bb635269fc03c820b518558eb267a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23201)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10999:
URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050680031

   
   ## CI report:
   
   * d392ef9a33b9019a8fadb9c4117cdca48116b48f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23198)
   * 15e59507262bb635269fc03c820b518558eb267a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23201)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10999:
URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050673612

   
   ## CI report:
   
   * d392ef9a33b9019a8fadb9c4117cdca48116b48f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23198)
   * 15e59507262bb635269fc03c820b518558eb267a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub


yihua commented on code in PR #10999:
URL: https://github.com/apache/hudi/pull/10999#discussion_r1561708412


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala:
##
@@ -1405,4 +1405,24 @@ class TestMORDataSource extends HoodieSparkClientTestBase with SparkDatasetMixin
   basePath
 }
   }
+
+  @Test
+  def testMergerStrategySet(): Unit = {
+    val (writeOpts, _) = getWriterReaderOpts()
+    val input = recordsToStrings(dataGen.generateInserts("000", 1)).asScala
+    val inputDf= spark.read.json(spark.sparkContext.parallelize(input, 1))
+    val mergerStrategyName = "asfdasf"

Review Comment:
   Could you use a more readable name here?



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##
@@ -309,6 +310,7 @@ class HoodieSparkSqlWriterInternal {
   .setPartitionMetafileUseBaseFormat(useBaseFormatMetaFile)
   .setShouldDropPartitionColumns(hoodieConfig.getBooleanOrDefault(HoodieTableConfig.DROP_PARTITION_COLUMNS))
   .setCommitTimezone(timelineTimeZone)
+  .setRecordMergerStrategy(recordMergerStrategy)

Review Comment:
   Could you inline `hoodieConfig.getStringOrDefault(DataSourceWriteOptions.RECORD_MERGER_STRATEGY)` here, in place of `recordMergerStrategy`?



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala:
##
@@ -1405,4 +1405,24 @@ class TestMORDataSource extends HoodieSparkClientTestBase with SparkDatasetMixin
   basePath
 }
   }
+
+  @Test
+  def testMergerStrategySet(): Unit = {
+    val (writeOpts, _) = getWriterReaderOpts()
+    val input = recordsToStrings(dataGen.generateInserts("000", 1)).asScala
+    val inputDf= spark.read.json(spark.sparkContext.parallelize(input, 1))
+    val mergerStrategyName = "asfdasf"
+    inputDf.write.format("org.apache.hudi")

Review Comment:
   ```suggestion
   inputDf.write.format("hudi")
   ```
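The new test is quoted only up to the `inputDf.write.format(...)` call, so how it checks the result is not visible in this thread. As a hedged illustration, one way a test like this could confirm that the Spark SQL writer persisted the strategy is to read it back from the table's `.hoodie/hoodie.properties` file. The Scala sketch below uses only JDK APIs. The property key `hoodie.compaction.record.merger.strategy` is an assumption (it is assumed to match `HoodieTableConfig.RECORD_MERGER_STRATEGY` in the 0.14.x line and may differ in other releases), and `MergerStrategyCheck` and `readMergerStrategy` are hypothetical names, not part of the PR.

```scala
import java.nio.file.{Files, Path}
import java.util.Properties

object MergerStrategyCheck {
  // ASSUMPTION: table-level key under which the merger strategy is stored in
  // .hoodie/hoodie.properties; verify against the Hudi release in use.
  val TableMergerStrategyKey = "hoodie.compaction.record.merger.strategy"

  /** Hypothetical helper: returns the persisted merger strategy, if present. */
  def readMergerStrategy(basePath: Path): Option[String] = {
    val propsFile = basePath.resolve(".hoodie").resolve("hoodie.properties")
    if (!Files.exists(propsFile)) None
    else {
      val props = new Properties()
      val in = Files.newInputStream(propsFile)
      try props.load(in) finally in.close()
      // getProperty returns null for a missing key; wrap it in an Option
      Option(props.getProperty(TableMergerStrategyKey))
    }
  }
}
```

In a real test one would more likely go through `HoodieTableMetaClient` and its table config rather than parse the file by hand; reading the file directly just keeps the sketch free of Hudi and Spark dependencies.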






Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10999:
URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050521034

   
   ## CI report:
   
   * d392ef9a33b9019a8fadb9c4117cdca48116b48f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23198)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10999:
URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050508350

   
   ## CI report:
   
   * d392ef9a33b9019a8fadb9c4117cdca48116b48f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   

