Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]
yihua merged PR #10999:
URL: https://github.com/apache/hudi/pull/10999

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]
hudi-bot commented on PR #10999:
URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050765903

## CI report:

* 15e59507262bb635269fc03c820b518558eb267a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23201)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]
hudi-bot commented on PR #10999:
URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050680031

## CI report:

* d392ef9a33b9019a8fadb9c4117cdca48116b48f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23198)
* 15e59507262bb635269fc03c820b518558eb267a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23201)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]
hudi-bot commented on PR #10999:
URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050673612

## CI report:

* d392ef9a33b9019a8fadb9c4117cdca48116b48f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23198)
* 15e59507262bb635269fc03c820b518558eb267a UNKNOWN

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]
yihua commented on code in PR #10999:
URL: https://github.com/apache/hudi/pull/10999#discussion_r1561708412

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala: ##

@@ -1405,4 +1405,24 @@ class TestMORDataSource extends HoodieSparkClientTestBase with SparkDatasetMixin
       basePath
     }
   }
+
+  @Test
+  def testMergerStrategySet(): Unit = {
+    val (writeOpts, _) = getWriterReaderOpts()
+    val input = recordsToStrings(dataGen.generateInserts("000", 1)).asScala
+    val inputDf= spark.read.json(spark.sparkContext.parallelize(input, 1))
+    val mergerStrategyName = "asfdasf"

Review Comment:
Could you use a more readable name here?

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ##

@@ -309,6 +310,7 @@ class HoodieSparkSqlWriterInternal {
           .setPartitionMetafileUseBaseFormat(useBaseFormatMetaFile)
           .setShouldDropPartitionColumns(hoodieConfig.getBooleanOrDefault(HoodieTableConfig.DROP_PARTITION_COLUMNS))
           .setCommitTimezone(timelineTimeZone)
+          .setRecordMergerStrategy(recordMergerStrategy)

Review Comment:
Could you inline `hoodieConfig.getStringOrDefault(DataSourceWriteOptions.RECORD_MERGER_STRATEGY)` here?

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala: ##

@@ -1405,4 +1405,24 @@ class TestMORDataSource extends HoodieSparkClientTestBase with SparkDatasetMixin
       basePath
     }
   }
+
+  @Test
+  def testMergerStrategySet(): Unit = {
+    val (writeOpts, _) = getWriterReaderOpts()
+    val input = recordsToStrings(dataGen.generateInserts("000", 1)).asScala
+    val inputDf= spark.read.json(spark.sparkContext.parallelize(input, 1))
+    val mergerStrategyName = "asfdasf"
+    inputDf.write.format("org.apache.hudi")

Review Comment:
```suggestion
    inputDf.write.format("hudi")
```
Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]
hudi-bot commented on PR #10999:
URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050521034

## CI report:

* d392ef9a33b9019a8fadb9c4117cdca48116b48f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23198)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]
hudi-bot commented on PR #10999:
URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050508350

## CI report:

* d392ef9a33b9019a8fadb9c4117cdca48116b48f UNKNOWN

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build