[GitHub] [hudi] garyli1019 closed pull request #2783: [DOCS]Add docs for 0.8.0 release
garyli1019 closed pull request #2783: URL: https://github.com/apache/hudi/pull/2783 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] garyli1019 commented on pull request #2783: [DOCS]Add docs for 0.8.0 release
garyli1019 commented on pull request #2783: URL: https://github.com/apache/hudi/pull/2783#issuecomment-815496613 closing pr for now, will reopen once fixed
[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
[ https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Tiwari updated HUDI-1716:
--------------------------------
    Status: In Progress  (was: Open)

> rt view w/ MOR tables fails after schema evolution
> --------------------------------------------------
>
>                 Key: HUDI-1716
>                 URL: https://issues.apache.org/jira/browse/HUDI-1716
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Storage Management
>            Reporter: sivabalan narayanan
>            Assignee: Aditya Tiwari
>            Priority: Major
>              Labels: pull-request-available, sev:critical, user-support-issues
>             Fix For: 0.9.0
>
> The realtime view of an MOR table fails if the schema in an existing log file has been evolved to add a new field. Writing succeeds, but reading fails.
> More info: https://github.com/apache/hudi/issues/2675
>
> Gist of the stack trace:
> Caused by: org.apache.avro.AvroTypeException: Found hoodie.hudi_trips_cow.hudi_trips_cow_record, expecting hoodie.hudi_trips_cow.hudi_trips_cow_record, missing required field evolvedField
>   at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
>   at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>   at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
>   at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:215)
>   at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
>   at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.deserializeRecords(HoodieAvroDataBlock.java:165)
>   at org.apache.hudi.common.table.log.block.HoodieDataBlock.createRecordsFromContentBytes(HoodieDataBlock.java:128)
>   at org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecords(HoodieDataBlock.java:106)
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processDataBlock(AbstractHoodieLogRecordScanner.java:289)
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:324)
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:252)
>   ... 24 more
> 21/03/25 11:27:03 WARN TaskSetManager: Lost task 0.0 in stage 83.0 (TID 667, sivabala-c02xg219jgh6.attlocal.net, executor driver): org.apache.hudi.exception.HoodieException: Exception when reading log file
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:261)
>   at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:100)
>   at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:93)
>   at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:75)
>   at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:230)
>   at org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:328)
>   at org.apache.hudi.HoodieMergeOnReadRDD$$anon$3.<init>(HoodieMergeOnReadRDD.scala:210)
>   at org.apache.hudi.HoodieMergeOnReadRDD.payloadCombineFileIterator(HoodieMergeOnReadRDD.scala:200)
>   at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:77)
>
> Logs from local run: https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198
> Diff with which the above logs were generated: https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec
>
> Steps to reproduce in spark shell:
> 1. Create an MOR table with schema1.
> 2. Ingest (with schema1) until log files are created; verify via hudi-cli. It took me two batches of updates to see a log file.
> 3. Create a new schema2 with one additional field; ingest a batch with schema2 that updates existing records.
> 4. Read the entire dataset.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
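The "missing required field evolvedField" error in the trace above comes from Avro schema resolution: when records written with the old schema are decoded against the evolved reader schema, any field present only in the reader schema must declare a default, or ResolvingDecoder fails. A minimal sketch of a backward-compatible evolved schema (record name taken from the trace; the other field names are illustrative, not from the issue):

```
{
  "type": "record",
  "name": "hudi_trips_cow_record",
  "namespace": "hoodie.hudi_trips_cow",
  "fields": [
    {"name": "uuid", "type": "string"},
    {"name": "fare", "type": "double"},
    {"name": "evolvedField", "type": ["null", "string"], "default": null}
  ]
}
```

With the nullable union and `"default": null` declared, Avro can fill in `evolvedField` when it is absent from older log blocks; without a default, decoding old data with the new schema fails exactly as shown in the stack trace.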
[GitHub] [hudi] jintaoguan commented on a change in pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
jintaoguan commented on a change in pull request #2773:
URL: https://github.com/apache/hudi/pull/2773#discussion_r609351211

## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/ClusteringCommand.java

@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.cli.commands;
+
+import org.apache.hudi.cli.HoodieCLI;
+import org.apache.hudi.cli.commands.SparkMain.SparkCommand;
+import org.apache.hudi.cli.utils.InputStreamConsumer;
+import org.apache.hudi.cli.utils.SparkUtil;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.utilities.UtilHelpers;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.launcher.SparkLauncher;
+import org.apache.spark.util.Utils;
+import org.springframework.shell.core.CommandMarker;
+import org.springframework.shell.core.annotation.CliCommand;
+import org.springframework.shell.core.annotation.CliOption;
+import org.springframework.stereotype.Component;
+import scala.collection.JavaConverters;
+
+@Component
+public class ClusteringCommand implements CommandMarker {
+
+  private static final Logger LOG = LogManager.getLogger(ClusteringCommand.class);
+
+  @CliCommand(value = "clustering schedule", help = "Schedule Clustering")
+  public String scheduleClustering(
+      @CliOption(key = "sparkMemory", help = "Spark executor memory",
+          unspecifiedDefaultValue = "1G") final String sparkMemory,
+      @CliOption(key = "propsFilePath", help = "path to properties file on localfs or dfs with configurations for hoodie client for clustering",
+          unspecifiedDefaultValue = "") final String propsFilePath,
+      @CliOption(key = "hoodieConfigs", help = "Any configuration that can be set in the properties file can be passed here in the form of an array",
+          unspecifiedDefaultValue = "") final String[] configs) throws Exception {
+    HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();

Review comment: Good catch! Thanks.
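Based on the `@CliOption` keys shown in the diff above (`sparkMemory`, `propsFilePath`, `hoodieConfigs`), invoking the new command from the hudi-cli shell would look roughly like this; the table path, properties file, and config value are illustrative, not from the PR:

```
hudi-cli> connect --path /tmp/hoodie/my_table
hudi-cli> clustering schedule --sparkMemory 2G \
    --propsFilePath file:///tmp/clusteringjob.properties \
    --hoodieConfigs hoodie.clustering.plan.strategy.max.num.groups=30
```

Per the option defaults in the code, all three flags are optional: `sparkMemory` falls back to `1G`, and the properties file and inline configs default to empty.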
[GitHub] [hudi] n3nash commented on pull request #2388: [HUDI-1353] add incremental timeline support for pending clustering ops
n3nash commented on pull request #2388: URL: https://github.com/apache/hudi/pull/2388#issuecomment-815488402 @satishkotha gentle reminder
[GitHub] [hudi] codecov-io edited a comment on pull request #2785: [HUDI-1775] Add option for compaction parallelism
codecov-io edited a comment on pull request #2785:
URL: https://github.com/apache/hudi/pull/2785#issuecomment-815400787

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=h1) Report
> Merging [#2785](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=desc) (a53b11e) into [master](https://codecov.io/gh/apache/hudi/commit/3a926aacf6552fc06005db4a7880a233db904330?el=desc) (3a926aa) will **increase** coverage by `0.01%`.
> The diff coverage is `94.11%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2785/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##             master    #2785      +/-   ##
============================================
+ Coverage     47.05%   47.07%   +0.01%
- Complexity     3357     3359       +2
  Files           484      484
  Lines         23094    23107      +13
  Branches       2456     2457       +1
+ Hits          10868    10878      +10
- Misses        11280    11282       +2
- Partials        946      947       +1
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `36.94% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
| hudicommon | `50.77% <ø> (-0.02%)` | `0.00 <ø> (ø)` | |
| hudiflink | `56.71% <94.11%> (+0.12%)` | `0.00 <0.00> (ø)` | |
| hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudisparkdatasource | `71.33% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudisync | `45.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
| huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudiutilities | `9.37% <ø> (ø)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...in/java/org/apache/hudi/table/HoodieTableSink.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNpbmsuamF2YQ==) | `11.90% <0.00%> (-0.30%)` | `2.00 <0.00> (ø)` | |
| [...va/org/apache/hudi/configuration/FlinkOptions.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9jb25maWd1cmF0aW9uL0ZsaW5rT3B0aW9ucy5qYXZh) | `89.42% <100.00%> (+0.35%)` | `11.00 <0.00> (ø)` | |
| [...he/hudi/sink/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbkZ1bmN0aW9uLmphdmE=) | `88.00% <100.00%> (+0.32%)` | `17.00 <0.00> (+2.00)` | |
| [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `78.12% <0.00%> (-1.57%)` | `26.00% <0.00%> (ø%)` | |
[GitHub] [hudi] n3nash commented on issue #2743: Do we have any TTL mechanism in Hudi?
n3nash commented on issue #2743: URL: https://github.com/apache/hudi/issues/2743#issuecomment-815473655 @aditiwari01 Here is the ticket, and it is assigned to you for now :) BTW, there is some relevant work happening here: https://github.com/apache/hudi/pull/2452. Please comment on the PR for further changes.
[GitHub] [hudi] n3nash closed issue #2743: Do we have any TTL mechanism in Hudi?
n3nash closed issue #2743: URL: https://github.com/apache/hudi/issues/2743
[jira] [Updated] (HUDI-1777) Add SparkDatasource support for delete_partition API
[ https://issues.apache.org/jira/browse/HUDI-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal updated HUDI-1777: -- Labels: feature-request sev:normal (was: ) > Add SparkDatasource support for delete_partition API > > > Key: HUDI-1777 > URL: https://issues.apache.org/jira/browse/HUDI-1777 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: Nishith Agarwal >Assignee: Aditya Tiwari >Priority: Major > Labels: feature-request, sev:normal > > The `delete_partition` API is supported through the hoodie write client but > not through spark datasource; this ticket tracks the effort to add support > there. > See [https://github.com/apache/hudi/pull/2452] for more details.
[jira] [Created] (HUDI-1777) Add SparkDatasource support for delete_partition API
Nishith Agarwal created HUDI-1777: - Summary: Add SparkDatasource support for delete_partition API Key: HUDI-1777 URL: https://issues.apache.org/jira/browse/HUDI-1777 Project: Apache Hudi Issue Type: Improvement Components: Writer Core Reporter: Nishith Agarwal Assignee: Aditya Tiwari The `delete_partition` API is supported through the hoodie write client but not through spark datasource; this ticket tracks the effort to add support there. See [https://github.com/apache/hudi/pull/2452] for more details.
[GitHub] [hudi] n3nash commented on issue #2623: org.apache.hudi.exception.HoodieDependentSystemUnavailableException:System HBASE unavailable.
n3nash commented on issue #2623: URL: https://github.com/apache/hudi/issues/2623#issuecomment-815470053 @root18039532923 Let me know if your issue was resolved after backporting that PR.
[GitHub] [hudi] danny0405 closed pull request #2785: [HUDI-1775] Add option for compaction parallelism
danny0405 closed pull request #2785: URL: https://github.com/apache/hudi/pull/2785
[GitHub] [hudi] n3nash commented on issue #2680: [SUPPORT]Hive sync error by using run_sync_tool.sh
n3nash commented on issue #2680: URL: https://github.com/apache/hudi/issues/2680#issuecomment-815465310 @ztcheck What changes did you make to `run_sync_tool.sh`? Can you list the jars you added to the classpath? It seems like some of the classes should be packaged in the `hudi-hive-sync-bundle` but are not. Once you provide the packages you added to your classpath, we can see how to add those to the bundle.
[GitHub] [hudi] n3nash commented on issue #2692: [SUPPORT] Corrupt Blocks in Google Cloud Storage
n3nash commented on issue #2692: URL: https://github.com/apache/hudi/issues/2692#issuecomment-815463131 @stackfun Can you respond to @vburenin's question? We can try to go from there.
[GitHub] [hudi] n3nash edited a comment on issue #2692: [SUPPORT] Corrupt Blocks in Google Cloud Storage
n3nash edited a comment on issue #2692: URL: https://github.com/apache/hudi/issues/2692#issuecomment-815462588 @vburenin Can you please open a JIRA ticket with the details on "huge data losses with hudi 0.5.0 and EMR"? This seems super critical and I would like to know the issues ASAP; I don't want to pollute this thread.
[GitHub] [hudi] n3nash commented on issue #2692: [SUPPORT] Corrupt Blocks in Google Cloud Storage
n3nash commented on issue #2692: URL: https://github.com/apache/hudi/issues/2692#issuecomment-815462588 @vburenin Can you please open a JIRA ticket with the details on "huge data losses with hudi 0.5.0 and EMR"? I don't want to pollute this thread.
[GitHub] [hudi] aditiwari01 commented on issue #2743: Do we have any TTL mechanism in Hudi?
aditiwari01 commented on issue #2743: URL: https://github.com/apache/hudi/issues/2743#issuecomment-815462565 @n3nash Thanks for the clarification. Can we create a jira for the same? I can't pick this up right away but will try to contribute as and when I get time. Meanwhile I will try to use the low-level API directly to unblock myself.
[jira] [Updated] (HUDI-1711) Avro Schema Exception with Spark 3.0 in 0.7
[ https://issues.apache.org/jira/browse/HUDI-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishith Agarwal updated HUDI-1711:
----------------------------------
    Labels: sev:critical user-support-issues  (was: sev:triage user-support-issues)

> Avro Schema Exception with Spark 3.0 in 0.7
> -------------------------------------------
>
>                 Key: HUDI-1711
>                 URL: https://issues.apache.org/jira/browse/HUDI-1711
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: DeltaStreamer
>            Reporter: Balaji Varadarajan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: sev:critical, user-support-issues
>
> GH: https://github.com/apache/hudi/issues/2705
>
> 21/03/22 10:10:35 WARN util.package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
> 21/03/22 10:10:35 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.RuntimeException: Error while decoding: java.lang.NegativeArraySizeException: -1255727808
> createexternalrow(...) [truncated Catalyst deserializer expression over before/after rows with fields StructField(id,IntegerType,false), StructField(name,StringType,true), StructField(type,StringType,true), StructField(url,StringType,true), StructField(user,StringType,true), StructField(password,StringType,true), StructField(create_time,StringType,true), StructField(create_user,StringType,true), StructField(update_time,StringType,true), StructField(update_user,StringType,true), StructField(del_flag,IntegerType,true), plus a non-null source struct with fields version, connector, name, ts_ms, snapshot, ...; the struct type parameters were stripped in the archived message]
[GitHub] [hudi] n3nash closed issue #2705: [SUPPORT] Can not read data schema using Spark3.0.2 on k8s with hudi-utilities (build in 2.12 and spark3)
n3nash closed issue #2705: URL: https://github.com/apache/hudi/issues/2705
[GitHub] [hudi] n3nash commented on issue #2705: [SUPPORT] Can not read data schema using Spark3.0.2 on k8s with hudi-utilities (build in 2.12 and spark3)
n3nash commented on issue #2705: URL: https://github.com/apache/hudi/issues/2705#issuecomment-815461755 Closing this issue since it requires a bug fix; please follow the JIRA above for updates/details.
[jira] [Assigned] (HUDI-1711) Avro Schema Exception with Spark 3.0 in 0.7
[ https://issues.apache.org/jira/browse/HUDI-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishith Agarwal reassigned HUDI-1711:
-------------------------------------
    Assignee: sivabalan narayanan

> Avro Schema Exception with Spark 3.0 in 0.7
> -------------------------------------------
>
>                 Key: HUDI-1711
>                 URL: https://issues.apache.org/jira/browse/HUDI-1711
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: DeltaStreamer
>            Reporter: Balaji Varadarajan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: sev:triage, user-support-issues
>
> GH: https://github.com/apache/hudi/issues/2705
[hudi] branch master updated: [MINOR] Some unit test code optimize (#2782)
This is an automated email from the ASF dual-hosted git repository.

wangxianghu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 18459d4  [MINOR] Some unit test code optimize (#2782)
18459d4 is described below

commit 18459d4045ec4a85081c227893b226a4d759f84b
Author: Simon <3656...@qq.com>
AuthorDate: Thu Apr 8 13:35:03 2021 +0800

    [MINOR] Some unit test code optimize (#2782)

    * Optimized code

    * Optimized code
---
 .../java/org/apache/hudi/utils/TestConcatenatingIterator.java | 9 +
 .../hudi/integ/testsuite/converter/TestUpdateConverter.java   | 9 +
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/TestConcatenatingIterator.java b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/TestConcatenatingIterator.java
index af4c4fb..fc591ed 100644
--- a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/TestConcatenatingIterator.java
+++ b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/TestConcatenatingIterator.java
@@ -23,6 +23,7 @@ import org.junit.jupiter.api.Test;
 
 import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.Iterator;
 import java.util.List;
 
@@ -36,8 +37,8 @@ public class TestConcatenatingIterator {
   @Test
   public void testConcatBasic() {
     Iterator i1 = Arrays.asList(5, 3, 2, 1).iterator();
-    Iterator i2 = new ArrayList().iterator(); // empty iterator
-    Iterator i3 = Arrays.asList(3).iterator();
+    Iterator i2 = Collections.emptyIterator(); // empty iterator
+    Iterator i3 = Collections.singletonList(3).iterator();
 
     ConcatenatingIterator ci = new ConcatenatingIterator<>(Arrays.asList(i1, i2, i3));
     List allElements = new ArrayList<>();
@@ -51,9 +52,9 @@ public class TestConcatenatingIterator {
   @Test
   public void testConcatError() {
-    Iterator i1 = new ArrayList().iterator(); // empty iterator
+    Iterator i1 = Collections.emptyIterator(); // empty iterator
 
-    ConcatenatingIterator ci = new ConcatenatingIterator<>(Arrays.asList(i1));
+    ConcatenatingIterator ci = new ConcatenatingIterator<>(Collections.singletonList(i1));
     assertFalse(ci.hasNext());
     try {
       ci.next();
diff --git a/hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/converter/TestUpdateConverter.java b/hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/converter/TestUpdateConverter.java
index c48d1b1..e162448 100644
--- a/hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/converter/TestUpdateConverter.java
+++ b/hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/converter/TestUpdateConverter.java
@@ -21,6 +21,7 @@ package org.apache.hudi.integ.testsuite.converter;
 import static junit.framework.TestCase.assertTrue;
 
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
 import java.util.Map;
 
@@ -65,7 +66,7 @@ public class TestUpdateConverter {
     // 2. DFS converter reads existing records and generates random updates for the same row keys
     UpdateConverter updateConverter = new UpdateConverter(schemaStr, minPayloadSize,
-        Arrays.asList("timestamp"), Arrays.asList("_row_key"));
+        Collections.singletonList("timestamp"), Collections.singletonList("_row_key"));
     List insertRowKeys = inputRDD.map(r -> r.get("_row_key").toString()).collect();
     assertTrue(inputRDD.count() == 10);
     JavaRDD outputRDD = updateConverter.convert(inputRDD);
@@ -75,7 +76,7 @@ public class TestUpdateConverter {
     Map inputRecords = inputRDD.mapToPair(r -> new Tuple2<>(r.get("_row_key").toString(), r))
         .collectAsMap();
     List updateRecords = outputRDD.collect();
-    updateRecords.stream().forEach(updateRecord -> {
+    updateRecords.forEach(updateRecord -> {
       GenericRecord inputRecord = inputRecords.get(updateRecord.get("_row_key").toString());
       assertTrue(areRecordsDifferent(inputRecord, updateRecord));
     });
@@ -87,11 +88,11 @@ public class TestUpdateConverter {
    */
  private boolean areRecordsDifferent(GenericRecord in, GenericRecord up) {
    for (Field field : in.getSchema().getFields()) {
-     if (field.name() == "_row_key") {
+     if (field.name().equals("_row_key")) {
        continue;
      } else {
        // Just convert all types to string for now since all are primitive
-       if (in.get(field.name()).toString() != up.get(field.name()).toString()) {
+       if (!in.get(field.name()).toString().equals(up.get(field.name()).toString())) {
          return true;
        }
      }
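The patch above makes two kinds of fixes: it replaces reference comparison (`==`/`!=`) on `String`s with `equals`, and swaps throwaway `ArrayList`/`Arrays.asList` instances for the `Collections` factory methods. A minimal standalone sketch (class name and values are illustrative, not from the Hudi codebase) of why the original comparisons were bugs:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class EqualsVsReference {
    public static void main(String[] args) {
        // Two equal strings that are distinct objects: '==' compares identity, not content,
        // so the original `field.name() == "_row_key"` check could silently fail.
        String a = new String("_row_key");
        String b = "_row_key";
        System.out.println(a == b);        // false: different objects
        System.out.println(a.equals(b));   // true: same characters

        // Collections.emptyIterator() avoids allocating a throwaway ArrayList,
        // and Collections.singletonList(x) states the single-element intent
        // more clearly than Arrays.asList(x).
        Iterator<Integer> empty = Collections.emptyIterator();
        List<Integer> one = Collections.singletonList(3);
        System.out.println(empty.hasNext()); // false
        System.out.println(one.size());      // 1
    }
}
```

This is the same reason the diff hunk rewrites the `toString()` comparison with `!...equals(...)`: interned literals can make `==` appear to work in tests while failing on runtime-constructed strings.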
[GitHub] [hudi] wangxianghu merged pull request #2782: [MINOR] Some unit test code optimize
wangxianghu merged pull request #2782: URL: https://github.com/apache/hudi/pull/2782 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash closed issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
n3nash closed issue #2707: URL: https://github.com/apache/hudi/issues/2707
[GitHub] [hudi] n3nash commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
n3nash commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-815460958

@ssdong Thanks for opening the PR! Closing this issue now
[GitHub] [hudi] n3nash commented on pull request #2783: [DOCS]Add docs for 0.8.0 release
n3nash commented on pull request #2783: URL: https://github.com/apache/hudi/pull/2783#issuecomment-815457616

@garyli1019 The CI is failing, can you take a look?
[GitHub] [hudi] n3nash commented on issue #2743: Do we have any TTL mechanism in Hudi?
n3nash commented on issue #2743: URL: https://github.com/apache/hudi/issues/2743#issuecomment-815456351

@aditiwari01 I think you mentioned 2 issues here:

1. Record-level TTL -> We don't have such a feature in Hudi. Like others have pointed out, using the `hudiTable.deletePartitions()` API is a way to manage older partitions. Yes, you could partition based on `_hoodie_commit_time` or any other date-based partitioning that structures your table to be eligible for deleting older partitions completely.
2. Duplicates across partitions -> If you have an update workload and are using the `upsert` API, yes, using a GlobalIndex will help eliminate duplicates in your table. As @nsivabalan pointed out, we don't have such support out of the box in the Spark datasource, but we do have a low-level API as pointed out above. We welcome contributions and it would be good to add this support in the Spark datasource - let me know if you want to contribute this feature and we can guide you.
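For the date-based partitioning approach suggested in the comment above, the bookkeeping amounts to computing which partition paths fall past a TTL cutoff and then deleting those partitions through Hudi's delete-partitions support. The `partitionsOlderThan` helper below is a hypothetical illustration of the cutoff computation only, not a Hudi API:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

public class PartitionTtl {
    // Assumes partitions are laid out by date as yyyy/MM/dd (a common Hudi layout).
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Return the partition paths whose date is strictly older than `ttlDays`
    // days before `today`. These become the candidates to pass to a
    // deletePartitions-style call.
    static List<String> partitionsOlderThan(List<String> partitionPaths, LocalDate today, long ttlDays) {
        LocalDate cutoff = today.minusDays(ttlDays);
        return partitionPaths.stream()
            .filter(p -> LocalDate.parse(p, FMT).isBefore(cutoff))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> parts = List.of("2021/01/01", "2021/03/15", "2021/04/07");
        // With a 30-day TTL from 2021-04-08, the cutoff is 2021-03-09,
        // so only 2021/01/01 qualifies for deletion.
        System.out.println(partitionsOlderThan(parts, LocalDate.of(2021, 4, 8), 30));
        // → [2021/01/01]
    }
}
```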
[GitHub] [hudi] codecov-io edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-io edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-792430670

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=h1) Report
> Merging [#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=desc) (151b9d4) into [master](https://codecov.io/gh/apache/hudi/commit/e970e1f48302aec3af7eeca009a2c793757cd501?el=desc) (e970e1f) will **decrease** coverage by `42.94%`.
> The diff coverage is `n/a`.
> :exclamation: Current head 151b9d4 differs from pull request most recent head a63cf5e. Consider uploading reports for the commit a63cf5e to get more accurate results.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##             master   #2645       +/-   ##
============================================
- Coverage     52.32%   9.37%   -42.95%
+ Complexity     3689      48     -3641
  Files           483      54      -429
  Lines         23095    1995    -21100
  Branches       2460     235     -2225
============================================
- Hits          12084     187    -11897
+ Misses         9942    1795     -8147
+ Partials       1069      13     -1056
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `9.37% <ø> (-60.33%)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
| [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
| [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
| [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
| [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
| [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
| [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
| [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
| [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
| [...lities/schema/SchemaProviderWithPostProces
[GitHub] [hudi] yanghua commented on pull request #2325: [HUDI-699]Fix CompactionCommand and add unit test for CompactionCommand
yanghua commented on pull request #2325: URL: https://github.com/apache/hudi/pull/2325#issuecomment-815419885

> @wangxianghu: It's OK now.

Thanks for your patience, I will do a final check soon.
[GitHub] [hudi] yanghua commented on pull request #2747: [HUDI-1743] Added support for SqlFileBasedTransformer
yanghua commented on pull request #2747: URL: https://github.com/apache/hudi/pull/2747#issuecomment-815417193

> @yanghua - I don't see the unit tests for the existing transformers except for two functions. I don't have time now to write unit tests; can I handle it in a separate pull request where I can write unit tests for all transformers?

It's better to follow a unified contribution guide. If we can test it, we should test it, so that we can ensure code quality.

> This is blocking my data pipelines, can we make an exception and merge this pull request? I'm happy to create a JIRA to track the unit tests for all transformers. thoughts?

You can pick this patch into your internal branch in the meantime. wdyt?
[jira] [Created] (HUDI-1776) Support AlterCommand For Hoodie
pengzhiwei created HUDI-1776:

Summary: Support AlterCommand For Hoodie
Key: HUDI-1776
URL: https://issues.apache.org/jira/browse/HUDI-1776
Project: Apache Hudi
Issue Type: Sub-task
Components: Spark Integration
Reporter: pengzhiwei
Assignee: pengzhiwei
Fix For: 0.9.0

Support AlterCommand for hoodie. The AlterCommand will change the hoodie.properties file and the metastore.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] ssdong commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
ssdong commented on pull request #2784: URL: https://github.com/apache/hudi/pull/2784#issuecomment-815413234

@satishkotha Thank you for your review! I'll take a look when I get back - currently on a day trip. 😄 Basically, I want to stop the abuse of `REQUESTED` here, at least for the insert-overwrite write operation, and separate it from `INFLIGHT`; with non-empty inflight commit files, we would otherwise suffer information loss. However, as you pointed out, this solution should also work against empty inflight files, i.e. clustering. I consider this a start to clean up and clarify the various commit-file logics, as we have another issue of creating completely empty `REQUESTED` commit files.
[GitHub] [hudi] ssdong commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
ssdong commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r609231425

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java

@@ -72,9 +76,14 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
       HoodieReplaceCommitMetadata replaceCommitMetadata = HoodieReplaceCommitMetadata
           .fromBytes(metaClient.getActiveTimeline().getInstantDetails(hoodieInstant).get(), HoodieReplaceCommitMetadata.class);
       archivedMetaWrapper.setHoodieReplaceCommitMetadata(ReplaceArchivalHelper.convertReplaceCommitMetadata(replaceCommitMetadata));
+    } else if (hoodieInstant.isInflight()) {
+      // inflight replacecommit files have the same metadata body as HoodieCommitMetadata

Review comment: Thanks for pointing that out. Will test against clustering and see what happens. If it doesn't work, will find an alternative way. 😄
[GitHub] [hudi] susudong commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
susudong commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r609229811

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java

Review comment: Thanks for pointing that out! Let me test it with clustering and see what happens. 😄
[GitHub] [hudi] lw309637554 commented on pull request #2765: [HUDI-1716]: Resolving default values for schema from dataframe
lw309637554 commented on pull request #2765: URL: https://github.com/apache/hudi/pull/2765#issuecomment-815405833

LGTM
[GitHub] [hudi] lw309637554 commented on pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
lw309637554 commented on pull request #2773: URL: https://github.com/apache/hudi/pull/2773#issuecomment-815405634

@jintaoguan added some minor comments
[GitHub] [hudi] lw309637554 commented on a change in pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
lw309637554 commented on a change in pull request #2773: URL: https://github.com/apache/hudi/pull/2773#discussion_r609227257

## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/ClusteringCommand.java

@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.cli.commands;
+
+import org.apache.hudi.cli.HoodieCLI;
+import org.apache.hudi.cli.commands.SparkMain.SparkCommand;
+import org.apache.hudi.cli.utils.InputStreamConsumer;
+import org.apache.hudi.cli.utils.SparkUtil;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.utilities.UtilHelpers;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.launcher.SparkLauncher;
+import org.apache.spark.util.Utils;
+import org.springframework.shell.core.CommandMarker;
+import org.springframework.shell.core.annotation.CliCommand;
+import org.springframework.shell.core.annotation.CliOption;
+import org.springframework.stereotype.Component;
+import scala.collection.JavaConverters;
+
+@Component
+public class ClusteringCommand implements CommandMarker {
+
+  private static final Logger LOG = LogManager.getLogger(ClusteringCommand.class);
+
+  @CliCommand(value = "clustering schedule", help = "Schedule Clustering")
+  public String scheduleClustering(
+      @CliOption(key = "sparkMemory", help = "Spark executor memory",
+          unspecifiedDefaultValue = "1G") final String sparkMemory,
+      @CliOption(key = "propsFilePath", help = "path to properties file on localfs or dfs with configurations for hoodie client for clustering",
+          unspecifiedDefaultValue = "") final String propsFilePath,
+      @CliOption(key = "hoodieConfigs", help = "Any configuration that can be set in the properties file can be passed here in the form of an array",
+          unspecifiedDefaultValue = "") final String[] configs) throws Exception {
+    HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();
+    String sparkPropertiesPath =
+        Utils.getDefaultPropertiesFile(JavaConverters.mapAsScalaMapConverter(System.getenv()).asScala());
+    SparkLauncher sparkLauncher = SparkUtil.initLauncher(sparkPropertiesPath);
+
+    // First get a clustering instant time and pass it to spark launcher for scheduling clustering
+    String clusteringInstantTime = HoodieActiveTimeline.createNewInstantTime();
+
+    sparkLauncher.addAppArgs(SparkCommand.CLUSTERING_SCHEDULE.toString(), client.getBasePath(),
+        client.getTableConfig().getTableName(), clusteringInstantTime, sparkMemory, propsFilePath);
+    UtilHelpers.validateAndAddProperties(configs, sparkLauncher);
+    Process process = sparkLauncher.launch();
+    InputStreamConsumer.captureOutput(process);
+    int exitCode = process.waitFor();
+    if (exitCode != 0) {
+      return "Failed to schedule clustering for " + clusteringInstantTime;
+    }
+    return "Attempted to schedule clustering for " + clusteringInstantTime;
+  }
+
+  @CliCommand(value = "clustering run", help = "Run Clustering")
+  public String runClustering(
+      @CliOption(key = "parallelism", help = "Parallelism for hoodie clustering",
+          unspecifiedDefaultValue = "1") final String parallelism,
+      @CliOption(key = "sparkMemory", help = "Spark executor memory",
+          unspecifiedDefaultValue = "4G") final String sparkMemory,
+      @CliOption(key = "retry", help = "Number of retries",
+          unspecifiedDefaultValue = "1") final String retry,
+      @CliOption(key = "clusteringInstant", help = "Clustering instant time",
+          mandatory = true) final String clusteringInstantTime,
+      @CliOption(key = "propsFilePath", help = "path to properties file on localfs or dfs with configurations for hoodie client for compacting",
+          unspecifiedDefaultValue = "") final String propsFilePath,
+      @CliOption(key = "hoodieConfigs", help = "Any configuration that can be set in the properties file can be passed here in the form of an array",
+          unspecifiedDefaultValue = "") final String[] configs
+  ) throws Exception {
+    HoodieTableMetaClie
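The `ClusteringCommand` methods above follow one pattern: launch an external process, stream its output, wait for it, and map the exit code to a status string. A minimal self-contained sketch of that control flow, using `ProcessBuilder` as an illustrative stand-in for `SparkLauncher` (not the Hudi CLI's actual implementation):

```java
import java.io.IOException;

public class LaunchPattern {
    // Launch a command, wait for completion, and report success/failure based
    // on the exit code - the same control flow as scheduleClustering above.
    static String runAndReport(String taskName, String... command) {
        try {
            Process process = new ProcessBuilder(command)
                .inheritIO() // stream child output, like InputStreamConsumer.captureOutput
                .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                return "Failed to run " + taskName;
            }
            return "Succeeded to run " + taskName;
        } catch (IOException | InterruptedException e) {
            return "Failed to launch " + taskName + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // `true` is a POSIX no-op command that always exits 0.
        System.out.println(runAndReport("noop", "true"));
    }
}
```

Note that `waitFor()` blocks the CLI thread until the Spark job finishes, which is why the real command only reports "Attempted"/"Failed" after the child process exits.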
[GitHub] [hudi] lw309637554 commented on a change in pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
lw309637554 commented on a change in pull request #2773: URL: https://github.com/apache/hudi/pull/2773#discussion_r609227046

## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/ClusteringCommand.java

+    return "Attempted to schedule clustering for " + clusteringInstantTime;

Review comment: Succeed to schedule clustering for " + clusteringInstantTime
[GitHub] [hudi] lw309637554 commented on a change in pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
lw309637554 commented on a change in pull request #2773: URL: https://github.com/apache/hudi/pull/2773#discussion_r609224532

## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/ClusteringCommand.java

+    HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();

Review comment: Why do we not need initFS here, just like the compaction command?

    HoodieTableMetaClient client = checkAndGetMetaClient();
    boolean initialized = HoodieCLI.initConf();
    HoodieCLI.initFS(initialized);
[GitHub] [hudi] codecov-io commented on pull request #2785: [HUDI-1775] Add option for compaction parallelism
codecov-io commented on pull request #2785: URL: https://github.com/apache/hudi/pull/2785#issuecomment-815400787

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=h1) Report

> Merging [#2785](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=desc) (4fca1f0) into [master](https://codecov.io/gh/apache/hudi/commit/3a926aacf6552fc06005db4a7880a233db904330?el=desc) (3a926aa) will **decrease** coverage by `37.68%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2785/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##             master   #2785       +/-   ##
============================================
- Coverage     47.05%   9.37%    -37.69%
+ Complexity     3357      48      -3309
  Files           484      54       -430
  Lines         23094    1995     -21099
  Branches       2456     235      -2221
- Hits          10868     187     -10681
+ Misses        11280    1795      -9485
+ Partials        946      13       -933
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `9.37% <ø> (ø)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [.../org/apache/hudi/cli/commands/MetadataCommand.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL01ldGFkYXRhQ29tbWFuZC5qYXZh) | | | | | [...di/common/table/log/block/HoodieAvroDataBlock.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVBdnJvRGF0YUJsb2NrLmphdmE=) | | | | | [...ache/hudi/cli/commands/ArchivedCommitsCommand.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL0FyY2hpdmVkQ29tbWl0c0NvbW1hbmQuamF2YQ==) | | | | | [...rg/apache/hudi/common/model/HoodieAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUF2cm9QYXlsb2FkLmphdmE=) | | | | | [...di/hadoop/hive/HoodieCombineRealtimeFileSplit.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2hpdmUvSG9vZGllQ29tYmluZVJlYWx0aW1lRmlsZVNwbGl0LmphdmE=) | | | | | [...oning/compaction/CompactionV1MigrationHandler.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY29tcGFjdGlvbi9Db21wYWN0aW9uVjFNaWdyYXRpb25IYW5kbGVyLmphdmE=) | | | | | [.../hudi/table/format/cow/CopyOnWriteInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvY293L0NvcHlPbldyaXRlSW5wdXRGb3JtYXQuamF2YQ==) | | | | | 
[...hudi/hadoop/hive/HoodieCombineHiveInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2hpdmUvSG9vZGllQ29tYmluZUhpdmVJbnB1dEZvcm1hdC5qYXZh) | | | | | [...in/java/org/apache/hudi/common/model/BaseFile.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jhc2VGaWxlLmphdmE=) | | | | | [...g/apache/hudi/common/function/FunctionWrapper.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Z1bmN0aW9uL0Z1bmN0aW9uV3JhcHBlci5qYXZh) | | | | | ... and [418 more](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree-more) | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1775) Add option for compaction parallelism
[ https://issues.apache.org/jira/browse/HUDI-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1775: - Labels: pull-request-available (was: ) > Add option for compaction parallelism > - > > Key: HUDI-1775 > URL: https://issues.apache.org/jira/browse/HUDI-1775 > Project: Apache Hudi > Issue Type: Task > Components: Flink Integration >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] danny0405 opened a new pull request #2785: [HUDI-1775] Add option for compaction parallelism
danny0405 opened a new pull request #2785: URL: https://github.com/apache/hudi/pull/2785

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] lw309637554 commented on a change in pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
lw309637554 commented on a change in pull request #2773: URL: https://github.com/apache/hudi/pull/2773#discussion_r609215373

File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java

@@ -1013,26 +1014,22 @@ public void testHoodieAsyncClusteringJob() throws Exception {
   HoodieDeltaStreamer ds = new HoodieDeltaStreamer(cfg, jsc);
   deltaStreamerTestRunner(ds, cfg, (r) -> {
     TestHelpers.assertAtLeastNCommits(2, tableBasePath, dfs);
+    String scheduleClusteringInstantTime = HoodieActiveTimeline.createNewInstantTime();

Review comment: Yes. We also have a doc for async compaction usage: https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance
[jira] [Updated] (HUDI-1775) Add option for compaction parallelism
[ https://issues.apache.org/jira/browse/HUDI-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-1775: - Issue Type: Task (was: New Feature) > Add option for compaction parallelism > - > > Key: HUDI-1775 > URL: https://issues.apache.org/jira/browse/HUDI-1775 > Project: Apache Hudi > Issue Type: Task > Components: Flink Integration >Reporter: Danny Chen >Priority: Major >
[jira] [Created] (HUDI-1775) Add option for compaction parallelism
Danny Chen created HUDI-1775: Summary: Add option for compaction parallelism Key: HUDI-1775 URL: https://issues.apache.org/jira/browse/HUDI-1775 Project: Apache Hudi Issue Type: New Feature Components: Flink Integration Reporter: Danny Chen
[jira] [Commented] (HUDI-1674) add partition level delete DOC or example
[ https://issues.apache.org/jira/browse/HUDI-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316820#comment-17316820 ] liwei commented on HUDI-1674: - [~shivnarayan] The Spark datasource does not have a delete-partition API; it needs to go through the catalog. https://stackoverflow.com/questions/52531327/drop-partitions-from-spark After [https://github.com/apache/hudi/pull/2645] lands, we can support 'alter table xx drop partition ()'. > add partition level delete DOC or example > - > > Key: HUDI-1674 > URL: https://issues.apache.org/jira/browse/HUDI-1674 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: liwei >Priority: Minor > Labels: docs, user-support-issues > Attachments: image-2021-03-08-09-57-05-768.png > > > !image-2021-03-08-09-57-05-768.png!
[GitHub] [hudi] zherenyu831 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
zherenyu831 commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r609200500

File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java

@@ -245,7 +245,7 @@ public final void reset() {
   bootstrapIndex = null;
   // Initialize with new Hoodie timeline.
-  init(metaClient, getTimeline());
+  init(metaClient, metaClient.reloadActiveTimeline());

Review comment: I think the root problem is why we are calling reset() when closing the timeline.
[GitHub] [hudi] zherenyu831 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
zherenyu831 commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r609192912

File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java

@@ -245,7 +245,7 @@ public final void reset() {
   bootstrapIndex = null;
   // Initialize with new Hoodie timeline.
-  init(metaClient, getTimeline());
+  init(metaClient, metaClient.reloadActiveTimeline());

Review comment: @satishkotha Since this part is called after archival and the archived commits are still in the timeline, the post-process will try to load the bytes from them, which causes an IO error.
[GitHub] [hudi] zherenyu831 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
zherenyu831 commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r609186934

File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java

@@ -105,14 +114,15 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
   return archivedMetaWrapper;
 }

-  public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant,
-      HoodieCommitMetadata hoodieCommitMetadata) {
-    HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry();
-    archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp());
-    archivedMetaWrapper.setActionState(hoodieInstant.getState().name());
-    archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata));
-    archivedMetaWrapper.setActionType(ActionType.commit.name());
-    return archivedMetaWrapper;
+  public static Option getRequestedReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant pendingReplaceInstant) throws IOException {
+    final HoodieInstant requestedInstant = HoodieTimeline.getReplaceCommitRequestedInstant(pendingReplaceInstant.getTimestamp());
+
+    Option content = metaClient.getActiveTimeline().getInstantDetails(requestedInstant);
+    if (!content.isPresent() || content.get().length == 0) {
+      LOG.warn("No content found in requested file for instant " + pendingReplaceInstant);
+      return Option.of(new HoodieRequestedReplaceMetadata());

Review comment: I will try what you suggested.

Review comment (same hunk): The current logic uses `org.apache.hudi.common.model.HoodieCommitMetadata.fromBytes()` to fetch an empty deltacommit (bytes = []) and creates a new metadata instance whose bytes are not empty, so I was thinking it may be better to keep the same behaviour for the replacecommit requested instant.
[GitHub] [hudi] codecov-io edited a comment on pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
codecov-io edited a comment on pull request #2773: URL: https://github.com/apache/hudi/pull/2773#issuecomment-813928206

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2773?src=pr&el=h1) Report

> Merging [#2773](https://codecov.io/gh/apache/hudi/pull/2773?src=pr&el=desc) (582e348) into [master](https://codecov.io/gh/apache/hudi/commit/920537cac83d59ac05676fb952d5479c41adf757?el=desc) (920537c) will **increase** coverage by `17.31%`.
> The diff coverage is `0.00%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2773/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2773?src=pr&el=tree)

```diff
@@              Coverage Diff              @@
##             master    #2773       +/-   ##
=============================================
+ Coverage     52.30%   69.61%    +17.31%
+ Complexity     3689      373      -3316
  Files           483       54       -429
  Lines         23099     1998     -21101
  Branches       2460      236      -2224
- Hits          12082     1391     -10691
+ Misses         9949      475      -9474
+ Partials       1068      132       -936
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `69.61% <0.00%> (-0.13%)` | `0.00 <0.00> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2773?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...org/apache/hudi/utilities/HoodieClusteringJob.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZUNsdXN0ZXJpbmdKb2IuamF2YQ==) | `62.50% <0.00%> (-2.72%)` | `9.00 <0.00> (ø)` | | | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.08% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | | | [.../common/bloom/HoodieDynamicBoundedBloomFilter.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0hvb2RpZUR5bmFtaWNCb3VuZGVkQmxvb21GaWx0ZXIuamF2YQ==) | | | | | [...java/org/apache/hudi/common/fs/StorageSchemes.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL1N0b3JhZ2VTY2hlbWVzLmphdmE=) | | | | | [...e/hudi/common/table/timeline/dto/FileGroupDTO.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GaWxlR3JvdXBEVE8uamF2YQ==) | | | | | [...he/hudi/hadoop/SafeParquetRecordReaderWrapper.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL1NhZmVQYXJxdWV0UmVjb3JkUmVhZGVyV3JhcHBlci5qYXZh) | | | | | [...n/java/org/apache/hudi/common/HoodieCleanStat.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL0hvb2RpZUNsZWFuU3RhdC5qYXZh) | | | | | 
[.../hudi/common/config/SerializableConfiguration.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9TZXJpYWxpemFibGVDb25maWd1cmF0aW9uLmphdmE=) | | | | | [...e/hudi/exception/HoodieDeltaStreamerException.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZURlbHRhU3RyZWFtZXJFeGNlcHRpb24uamF2YQ==) | | | | | [...org/apache/hudi/common/model/HoodieFileFormat.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUZpbGVGb3JtYXQuamF2YQ==) | | | | | ... and [421 more](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree-more) | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm
[GitHub] [hudi] hddong commented on pull request #1946: [HUDI-1176] Upgrade to log4j2
hddong commented on pull request #1946: URL: https://github.com/apache/hudi/pull/1946#issuecomment-815376490 @wangxianghu: have upgraded to `2.13.3` and fixed the warning.
[jira] [Resolved] (HUDI-1750) Fail to load user's class if user move hudi-spark-bundle_2.11-0.7.0.jar into spark classpath
[ https://issues.apache.org/jira/browse/HUDI-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz resolved HUDI-1750. --- Resolution: Fixed > Fail to load user's class if user move hudi-spark-bundle_2.11-0.7.0.jar into > spark classpath > > > Key: HUDI-1750 > URL: https://issues.apache.org/jira/browse/HUDI-1750 > Project: Apache Hudi > Issue Type: Bug >Reporter: lrz >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > Attachments: image-2021-04-01-10-55-43-760.png > > > Hudi uses Class.forName(clazzName) to load the user's class, which resolves > against the caller's classloader; see here: > !image-2021-04-01-10-55-43-760.png! > If the user moves the hudi-spark-bundle jar into the Spark classpath and uses > --jars to add custom jars, the caller's classloader will be the AppClassLoader, > while the custom jars are loaded by Spark's MutableURLClassLoader, which leads > to a ClassNotFoundException.
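The classloader mismatch above can be sketched in plain Java. This is an illustrative helper, not Hudi's actual fix: it resolves a class against the thread context classloader (which, inside a Spark task, is the MutableURLClassLoader that also knows about `--jars` entries) instead of the caller's classloader. The class name and method here are hypothetical; stdlib classes are used only so the snippet runs standalone.

```java
public class ContextClassLoaderDemo {
    // Resolve clazzName against the context classloader when one is set,
    // falling back to this class's own loader otherwise.
    static Class<?> loadUserClass(String clazzName) {
        try {
            ClassLoader ctx = Thread.currentThread().getContextClassLoader();
            return Class.forName(clazzName, true,
                    ctx != null ? ctx : ContextClassLoaderDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new RuntimeException("Failed to load class " + clazzName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadUserClass("java.lang.String").getName());
    }
}
```

A bare `Class.forName(clazzName)` pins resolution to whichever loader defined the calling class, which is exactly the failure mode described in the issue.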
[jira] [Resolved] (HUDI-1751) DeltaStream print many unnecessary warn log because of passing hoodie config to kafka consumer
[ https://issues.apache.org/jira/browse/HUDI-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz resolved HUDI-1751. --- Resolution: Fixed > DeltaStream print many unnecessary warn log because of passing hoodie config > to kafka consumer > -- > > Key: HUDI-1751 > URL: https://issues.apache.org/jira/browse/HUDI-1751 > Project: Apache Hudi > Issue Type: Improvement >Reporter: lrz >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > > Because we add both Kafka parameters and Hudi configs in the same properties > file (such as kafka-source.properties), the kafkaParams object created from it > also picks up some Hoodie configs, which leads to the warning being printed: > !https://wa.vision.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/2/15/qwx352829/76572ba9f4094fb29b018db91fbf1450/image.png!
[jira] [Resolved] (HUDI-1749) Clean/Compaction/Rollback command maybe never exit when operation fail
[ https://issues.apache.org/jira/browse/HUDI-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz resolved HUDI-1749. --- Resolution: Fixed > Clean/Compaction/Rollback command maybe never exit when operation fail > -- > > Key: HUDI-1749 > URL: https://issues.apache.org/jira/browse/HUDI-1749 > Project: Apache Hudi > Issue Type: Bug >Reporter: lrz >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > There are two issues: > 1) After a Clean/Compaction/Rollback command finishes, the YARN application > will always show as failed because the command exits directly without waiting > for the SparkContext to stop. > 2) When a Clean/Compaction/Rollback command fails with an exception, it never > exits because the SparkContext did not stop. The Spark UI uses Jetty, which > introduces non-daemon threads, and sparkContext.stop() must stop the UI to > shut those threads down.
[GitHub] [hudi] jintaoguan commented on a change in pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
jintaoguan commented on a change in pull request #2773: URL: https://github.com/apache/hudi/pull/2773#discussion_r609161570

File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java

@@ -1013,26 +1014,22 @@ public void testHoodieAsyncClusteringJob() throws Exception {
   HoodieDeltaStreamer ds = new HoodieDeltaStreamer(cfg, jsc);
   deltaStreamerTestRunner(ds, cfg, (r) -> {
     TestHelpers.assertAtLeastNCommits(2, tableBasePath, dfs);
+    String scheduleClusteringInstantTime = HoodieActiveTimeline.createNewInstantTime();

Review comment: Sure, I will make it compatible with the old usage mode. The behavior will be: 1) if the user provides an instant time, we will use it to schedule clustering and return it to the user; 2) if the user doesn't provide an instant time, we will generate one and return it to the user.
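The two-branch behavior described above can be sketched as a tiny helper. This is illustrative only: the class and method names are hypothetical, not Hudi's `HoodieActiveTimeline` API; the 14-digit `yyyyMMddHHmmss` pattern mirrors the shape of Hudi commit timestamps but is an assumption of the sketch.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ClusteringInstantResolver {
    // 1) A user-supplied instant time is used as-is;
    // 2) otherwise a fresh instant time is generated and returned.
    static String resolveInstantTime(String userInstant) {
        if (userInstant != null && !userInstant.isEmpty()) {
            return userInstant;
        }
        return LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMddHHmmss"));
    }
}
```

Either way the caller gets back the instant that was actually scheduled, which keeps the new code compatible with the old usage mode.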
[GitHub] [hudi] nsivabalan commented on issue #2770: [SUPPORT] How column _hoodie_is_deleted works?
nsivabalan commented on issue #2770: URL: https://github.com/apache/hudi/issues/2770#issuecomment-815315164 Sorry, what feature are you looking for? Can you please clarify? Hudi automatically deletes records that have "_hoodie_is_deleted" set to true. In other words, if you have a batch of writes with a mixed set of records (inserts, updates, deletes), Hudi will honor all three; you just need to use "upsert" as your operation type.
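The merge semantics described above can be modeled with a toy upsert in plain Java. This is a sketch of the behavior, not Hudi's write path: records are keyed strings, and the third field stands in for the `_hoodie_is_deleted` column.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SoftDeleteUpsertDemo {
    // Each batch record is {recordKey, value, isDeleted}. Records flagged
    // "true" are removed from the table; everything else is upserted by key,
    // so one batch can mix inserts, updates and deletes.
    static Map<String, String> upsert(Map<String, String> table, List<String[]> batch) {
        Map<String, String> merged = new HashMap<>(table);
        for (String[] rec : batch) {
            if (Boolean.parseBoolean(rec[2])) {
                merged.remove(rec[0]);       // _hoodie_is_deleted = true
            } else {
                merged.put(rec[0], rec[1]);  // insert or update
            }
        }
        return merged;
    }
}
```

The point of the column is exactly this: a single "upsert" operation carries the delete signal inline with the data, so no separate delete call is needed.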
[GitHub] [hudi] rubenssoto commented on issue #2770: [SUPPORT] How column _hoodie_is_deleted works?
rubenssoto commented on issue #2770: URL: https://github.com/apache/hudi/issues/2770#issuecomment-815296739 @nsivabalan I think the error is on my side: I didn't filter the deleted records on the first batch. It could be a great feature for Hudi in the future.
[GitHub] [hudi] stackfun commented on issue #2771: [SUPPORT] Log files are not compacted
stackfun commented on issue #2771: URL: https://github.com/apache/hudi/issues/2771#issuecomment-815292886 Setting the "hoodie.compaction.target.io" config worked like a charm. Thanks a lot!
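For reference, the setting mentioned above caps how much IO a single compaction run may spend, so raising it lets more log files be compacted per run. A properties-file sketch (the value below matches what I believe is the shipped default of 512000 MB, i.e. 500 GB; treat it as an example, not a recommendation):

```properties
# Amount of IO (in MB) a compaction run is allowed to spend; the planner
# stops adding file slices to the plan once this budget is reached.
hoodie.compaction.target.io=512000
```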
[GitHub] [hudi] stackfun closed issue #2771: [SUPPORT] Log files are not compacted
stackfun closed issue #2771: URL: https://github.com/apache/hudi/issues/2771
[GitHub] [hudi] satishkotha commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
satishkotha commented on pull request #2784: URL: https://github.com/apache/hudi/pull/2784#issuecomment-815291682 @ssdong thanks for bringing this up and contributing. I added some comments, please take a look. Also, looks like there are some CI failures. Please fix those as well.
[GitHub] [hudi] satishkotha commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
satishkotha commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r609094597

File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java

@@ -105,14 +114,15 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
   return archivedMetaWrapper;
 }

-  public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant,
-      HoodieCommitMetadata hoodieCommitMetadata) {
-    HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry();
-    archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp());
-    archivedMetaWrapper.setActionState(hoodieInstant.getState().name());
-    archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata));
-    archivedMetaWrapper.setActionType(ActionType.commit.name());
-    return archivedMetaWrapper;
+  public static Option getRequestedReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant pendingReplaceInstant) throws IOException {
+    final HoodieInstant requestedInstant = HoodieTimeline.getReplaceCommitRequestedInstant(pendingReplaceInstant.getTimestamp());
+
+    Option content = metaClient.getActiveTimeline().getInstantDetails(requestedInstant);
+    if (!content.isPresent() || content.get().length == 0) {
+      LOG.warn("No content found in requested file for instant " + pendingReplaceInstant);
+      return Option.of(new HoodieRequestedReplaceMetadata());

Review comment: Why not return Option.empty() and skip archival for this case?

File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java

@@ -72,9 +76,14 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
   HoodieReplaceCommitMetadata replaceCommitMetadata = HoodieReplaceCommitMetadata
       .fromBytes(metaClient.getActiveTimeline().getInstantDetails(hoodieInstant).get(), HoodieReplaceCommitMetadata.class);
   archivedMetaWrapper.setHoodieReplaceCommitMetadata(ReplaceArchivalHelper.convertReplaceCommitMetadata(replaceCommitMetadata));
+} else if (hoodieInstant.isInflight()) {
+  // inflight replacecommit files have the same meta data body as HoodieCommitMetadata

Review comment: We also use replacecommit for the 'clustering' operation. Clustering has an empty replacecommit.inflight file, so this may not work.

File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java

@@ -245,7 +245,7 @@ public final void reset() {
   bootstrapIndex = null;
   // Initialize with new Hoodie timeline.
-  init(metaClient, getTimeline());
+  init(metaClient, metaClient.reloadActiveTimeline());

Review comment: IIUC, this is breaking some fundamental assumptions. There are many places where we pass a "trimmed" timeline for time-travel queries etc. You are replacing that with all instants from the active timeline, which is not desired. Is this change needed if we handle empty partitionToReplaceFileIds in archival?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
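The "return Option.empty() and skip archival" alternative being discussed can be sketched with `java.util.Optional` standing in for Hudi's `Option` (class and method names here are hypothetical, and the payload is simplified to raw bytes): an empty result lets the archival loop simply skip the instant instead of fabricating a default metadata object with empty partitionToReplaceFileIds.

```java
import java.util.Optional;

public class RequestedMetadataLookup {
    // Empty when the requested-replacecommit file is absent or zero-length,
    // so callers can skip archiving that instant rather than deserialize a
    // placeholder metadata object.
    static Optional<byte[]> requestedReplaceContent(byte[] content) {
        if (content == null || content.length == 0) {
            return Optional.empty();
        }
        return Optional.of(content);
    }
}
```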
[GitHub] [hudi] kvallala commented on issue #2528: [SUPPORT] Spark read hudi data from hive (metastore)
kvallala commented on issue #2528: URL: https://github.com/apache/hudi/issues/2528#issuecomment-815182803 We are having the same issue. It works with `spark.sql.hive.convertMetastoreParquet=false` when querying the Hudi table from a Spark session, but we see duplicates when querying through the external Hive metastore. Could you please suggest the required configuration for the external Hive Metastore so it works when querying from Hue/Spark SQL or any other speed layer (like Dremio) that connects to the Hive Metastore to query Hudi tables?
[GitHub] [hudi] ze-engineering-code-challenge commented on pull request #2665: [HUDI-1160] Support update partial fields for CoW table
ze-engineering-code-challenge commented on pull request #2665: URL: https://github.com/apache/hudi/pull/2665#issuecomment-815168500 Hello @liujinhui1994 Do I need to enable any option to make this work? I'm trying to do an upsert on a Hudi table with version 0.8.0 and it didn't work :( Caused by: org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'cf_categoria' not found
[GitHub] [hudi] vingov commented on pull request #2747: [HUDI-1743] Added support for SqlFileBasedTransformer
vingov commented on pull request #2747: URL: https://github.com/apache/hudi/pull/2747#issuecomment-815167427 @yanghua - I don't see unit tests for the existing transformers except for two functions, and I don't have time to write unit tests right now. Can I handle it in a separate pull request where I write unit tests for all transformers? This is blocking my data pipelines; can we make an exception and merge this pull request? I'm happy to create a JIRA to track the unit tests for all transformers. Thoughts?
[GitHub] [hudi] codecov-io commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
codecov-io commented on pull request #2784: URL: https://github.com/apache/hudi/pull/2784#issuecomment-815166346 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2784?src=pr&el=h1) Report > Merging [#2784](https://codecov.io/gh/apache/hudi/pull/2784?src=pr&el=desc) (5572b9f) into [master](https://codecov.io/gh/apache/hudi/commit/920537cac83d59ac05676fb952d5479c41adf757?el=desc) (920537c) will **decrease** coverage by `42.93%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2784/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2784?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##            master    #2784       +/-   ##
============================================
- Coverage    52.30%    9.37%    -42.94%
+ Complexity    3689       48      -3641
============================================
  Files          483       54       -429
  Lines        23099     1995     -21104
  Branches      2460      235      -2225
============================================
- Hits         12082      187     -11895
+ Misses        9949     1795      -8154
+ Partials      1068       13      -1055
```

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | `?` | `?` |
| hudiclient | `?` | `?` |
| hudicommon | `?` | `?` |
| hudiflink | `?` | `?` |
| hudihadoopmr | `?` | `?` |
| hudisparkdatasource | `?` | `?` |
| hudisync | `?` | `?` |
| huditimelineservice | `?` | `?` |
| hudiutilities | `9.37% <ø> (-60.38%)` | `0.00 <ø> (ø)` |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| `...va/org/apache/hudi/utilities/IdentitySplitter.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` |
| `...va/org/apache/hudi/utilities/schema/SchemaSet.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` |
| `...a/org/apache/hudi/utilities/sources/RowSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` |
| `.../org/apache/hudi/utilities/sources/AvroSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` |
| `.../org/apache/hudi/utilities/sources/JsonSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` |
| `...rg/apache/hudi/utilities/sources/CsvDFSSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` |
| `...g/apache/hudi/utilities/sources/JsonDFSSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` |
| `...apache/hudi/utilities/sources/JsonKafkaSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` |
| `...pache/hudi/utilities/sources/ParquetDFSSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` |
| `...lities/schema/SchemaProviderWithPostProcessor.java` | | |
[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
ssdong commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-815156685 Hi @satishkotha @jsbali! I've created the pull request for this issue. I observed more issues along the way, tried my best to clarify them, and hopefully wrote a detailed enough description in the PR. Let me know. Thanks!
[GitHub] [hudi] ssdong opened a new pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
ssdong opened a new pull request #2784: URL: https://github.com/apache/hudi/pull/2784 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request ### Summary: This pull request fixes the archival logic within the insert overwrite API against requested & inflight commit files. ### Issue: `0.7.0` throws the exception `Caused by: java.lang.IllegalArgumentException: Positive number of partitions required`, while `0.9.0-SNAPSHOT` (latest master branch) adds `java.util.NoSuchElementException: No value present in Option` on top of it when Hudi tries to archive replace commit files (`COMPLETED`, `REQUESTED` and `INFLIGHT`). Please check out issue https://github.com/apache/hudi/issues/2707 and ticket https://issues.apache.org/jira/browse/HUDI-1740 for further information about the above two exceptions. The inner causes are somewhat involved; I've tried my best to understand them and to apply a proper fix, instead of applying a tricky one-line patch just to make the errors go away. Of course, the approaches I am taking are open to discussion. ### Fixes 1. The `Positive number of partitions required` error is easier to fix: we just have to filter out empty `partitionToReplaceFileIds` for `COMPLETED` replace commit files within `ReplaceArchivalHelper.java`. 2. `java.util.NoSuchElementException: No value present in Option` is much more complicated; it happens due to a call to `ClusteringUtils.getRequestedReplaceMetadata()` against _both_ `REQUESTED` and `INFLIGHT` commit files to retrieve their metadata body. Now, I get the idea that we are encouraged to use existing utils classes for code reuse. However, a closer inspection of `getRequestedReplaceMetadata` shows that the clustering feature retrieves the metadata for an `INFLIGHT` commit file through a `REQUESTED` instant.
This is _not_ fundamentally wrong, since there is no "clustering plan" for either `REQUESTED` or `INFLIGHT` replace commit files, so the outcome is the same for both, as is also pointed out in the comment within `getRequestedReplaceMetadata`. However, since the `REQUESTED` instant is empty (there is a corresponding [ticket](https://issues.apache.org/jira/browse/HUDI-1740) for it), it generates an `Option.empty()` which is later fetched by `.get()`, triggering the `NoSuchElementException`. What's more, it _loses_ the information in the `INFLIGHT` commit file when fetching via the `REQUESTED` instant, as we observed in the following screenshot: https://user-images.githubusercontent.com/3754011/113918516-7e937a80-981d-11eb-84b6-e2c4bec2c3b1.png This does not make sense to me: we pretty much _abuse_ the `REQUESTED` concept to deal with `INFLIGHT`, with `REQUESTED` itself being empty. I've taken the approach of defining an extra field (placeholder) for `INFLIGHT` and reusing `HoodieCommitMetadata` for deserializing `INFLIGHT`, since they share the same structure (a `COMPLETED` replace commit extends `HoodieCommitMetadata` with the extra `partitionToReplaceFileIds` field). Here's what I gained after adopting this strategy: https://user-images.githubusercontent.com/3754011/113919657-cff03980-981e-11eb-928e-65c719d15ca5.png ### The overall outcome after the fix on the latest master branch: https://user-images.githubusercontent.com/3754011/113919768-ee563500-981e-11eb-80e8-fe62ba868709.png Let me know if there is anything I am missing. :) _The simplest solution to the 2nd issue is to actually have `ClusteringUtils.getRequestedReplaceMetadata` return `Option.of(new HoodieRequestedReplaceMetadata())` upon retrieving the empty `REQUESTED` replace commit file, for both `REQUESTED` and `INFLIGHT`.
I chose not to fix the problem this way, fearing it would merely put a bandage on an inappropriate approach rather than addressing it._ ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: - *Fix insert overwrite API archival* ## Committer checklist - [x] Has a corresponding JIRA in PR title & commit - [x] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
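The `No value present` failure the PR describes is the standard behavior of calling `.get()` on an empty option. The sketch below reproduces it with `java.util.Optional` — the JDK type, standing in for Hudi's `org.apache.hudi.common.util.Option`, which behaves the same way in this respect:

```java
import java.util.NoSuchElementException;
import java.util.Optional;

public class EmptyOptionSketch {
    public static void main(String[] args) {
        // An empty replacecommit.requested file deserializes to "no metadata",
        // analogous to an empty Optional.
        Optional<String> requestedReplaceMetadata = Optional.empty();

        try {
            // Blindly calling get(), as the pre-fix archival path effectively did,
            // throws instead of returning a usable (if empty) metadata object.
            requestedReplaceMetadata.get();
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException: " + e.getMessage());
        }

        // A defensive alternative: fall back to an explicit empty value.
        String metadata = requestedReplaceMetadata.orElse("<empty replace metadata>");
        System.out.println(metadata); // prints <empty replace metadata>
    }
}
```

This mirrors the PR's simpler alternative fix — returning an explicit empty `HoodieRequestedReplaceMetadata` instead of letting `.get()` blow up — without taking a position on which approach Hudi should adopt.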
[GitHub] [hudi] nsivabalan commented on a change in pull request #2783: [DOCS]Add docs for 0.8.0 release
nsivabalan commented on a change in pull request #2783: URL: https://github.com/apache/hudi/pull/2783#discussion_r608886318 ## File path: docs/_docs/0.8.0/1_1_spark_quick_start_guide.md ## @@ -0,0 +1,530 @@ +--- +version: 0.8.0 +title: "Quick-Start Guide" +permalink: /docs/spark_quick-start-guide.html +toc: true +last_modified_at: 2019-12-30T15:59:57-04:00 +--- + +This guide provides a quick peek at Hudi's capabilities using spark-shell. Using Spark datasources, we will walk through +code snippets that allow you to insert and update a Hudi table of default table type: +[Copy on Write](/docs/concepts.html#copy-on-write-table). +After each write operation we will also show how to read the data both snapshot and incrementally. +# Scala example + +## Setup + +Hudi works with Spark-2.x & Spark 3.x versions. You can follow instructions [here](https://spark.apache.org/downloads.html) for setting up spark. Review comment: fix min versions here for spark2 ## File path: docs/_docs/0.8.0/0_3_migration_guide.md ## @@ -0,0 +1,72 @@ +--- +version: 0.8.0 +title: Migration Guide +keywords: hudi, migration, use case +permalink: /docs/migration_guide.html +summary: In this page, we will discuss some available tools for migrating your existing table into a Hudi table +last_modified_at: 2019-12-30T15:59:57-04:00 +--- + +Hudi maintains metadata such as commit timeline and indexes to manage a table. The commit timeline helps to understand the actions happening on a table as well as the current state of a table. Indexes are used by Hudi to maintain a record key to file id mapping to efficiently locate a record. At the moment, Hudi supports writing only the Parquet columnar format. +To be able to start using Hudi for your existing table, you will need to migrate your existing table into a Hudi managed table. There are a couple of ways to achieve this.
+ + +## Approaches + + +### Use Hudi for new partitions alone + +Hudi can be used to manage an existing table without affecting/altering the historical data already present in the +table. Hudi has been implemented to be compatible with such a mixed table with a caveat that either the complete +Hive partition is Hudi managed or not. Thus the lowest granularity at which Hudi manages a table is a Hive +partition. Start using the datasource API or the WriteClient to write to the table and make sure you start writing +to a new partition or convert your last N partitions into Hudi instead of the entire table. Note, since the historical + partitions are not managed by HUDI, none of the primitives provided by HUDI work on the data in those partitions. More concretely, one cannot perform upserts or incremental pull on such older partitions not managed by the HUDI table. +Take this approach if your table is an append only type of table and you do not expect to perform any updates to existing (or non Hudi managed) partitions. + + +### Convert existing table to Hudi + +Import your existing table into a Hudi managed table. Since all the data is Hudi managed, none of the limitations + of Approach 1 apply here. Updates spanning any partitions can be applied to this table and Hudi will efficiently + make the update available to queries. Note that not only do you get to use all Hudi primitives on this table, + there are other additional advantages of doing this. Hudi automatically manages file sizes of a Hudi managed table + . You can define the desired file size when converting this table and Hudi will ensure it writes out files + adhering to the config. It will also ensure that smaller files later get corrected by routing some new inserts into + small files rather than writing new small ones thus maintaining the health of your cluster. + +There are a few options when choosing this approach. 
+ +**Option 1** Review comment: shouldn't we also briefly talk about bootstrap here as one of the options? ## File path: docs/_docs/0.8.0/1_1_spark_quick_start_guide.md ## @@ -0,0 +1,530 @@ +--- +version: 0.8.0 +title: "Quick-Start Guide" +permalink: /docs/spark_quick-start-guide.html +toc: true +last_modified_at: 2019-12-30T15:59:57-04:00 +--- + +This guide provides a quick peek at Hudi's capabilities using spark-shell. Using Spark datasources, we will walk through +code snippets that allows you to insert and update a Hudi table of default table type: +[Copy on Write](/docs/concepts.html#copy-on-write-table). +After each write operation we will also show how to read the data both snapshot and incrementally. +# Scala example + +## Setup + +Hudi works with Spark-2.x & Spark 3.x versions. You can follow instructions [here](https://spark.apache.org/downloads.html) for setting up spark. +From the extracted directory run spark-shell with Hudi as: + +```scala +// spark-shell +spark-shell \ + --packages org.apache.hudi:hudi-spark-bundle_2.12:0.7.0,org.apache.spark:spark-avro_2.12:3.0.1 \ + --conf 'spark.s
[jira] [Updated] (HUDI-1740) insert_overwrite_table and insert_overwrite first replacecommit has empty partitionToReplaceFileIds
[ https://issues.apache.org/jira/browse/HUDI-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susu Dong updated HUDI-1740: Description: insert_overwrite_table and insert_overwrite first replacecommit has empty partitionToReplaceFileIds which messes up archival code. Fix: The code needs to only proceed if partitionToReplaceFileIds is not empty. Updates: Archival also breaks upon requested/inflight commits in 0.9.0-SNAPSHOT. It wasn't an issue in 0.7.0, so this Jira ticket is fixing two things. Please refer to the detailed description in the PR. was: insert_overwrite_table and insert_overwrite first replacecommit has empty partitionToReplaceFileIds which messes up archival code. Fix The code needs to only proceed if partitionToReplaceFileIds is not Empty. > insert_overwrite_table and insert_overwrite first replacecommit has empty > partitionToReplaceFileIds > --- > > Key: HUDI-1740 > URL: https://issues.apache.org/jira/browse/HUDI-1740 > Project: Apache Hudi > Issue Type: Bug >Reporter: Jagmeet Bali >Assignee: Susu Dong >Priority: Minor > Labels: pull-request-available > > insert_overwrite_table and insert_overwrite first replacecommit has empty > partitionToReplaceFileIds which messes up archival code. > Fix: The code needs to only proceed if partitionToReplaceFileIds is not empty. > > Updates: Archival also breaks upon requested/inflight commits in > 0.9.0-SNAPSHOT. It wasn't an issue in 0.7.0, so this Jira ticket is fixing two > things. Please refer to the detailed description in the PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1739) insert_overwrite_table and insert_overwrite create empty replacecommit.requested file which breaks archival
[ https://issues.apache.org/jira/browse/HUDI-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susu Dong reassigned HUDI-1739: --- Assignee: Susu Dong > insert_overwrite_table and insert_overwrite create empty > replacecommit.requested file which breaks archival > --- > > Key: HUDI-1739 > URL: https://issues.apache.org/jira/browse/HUDI-1739 > Project: Apache Hudi > Issue Type: Bug >Reporter: Jagmeet Bali >Assignee: Susu Dong >Priority: Minor > > Fixes can be to > # Ignore empty replacecommit.requested files. > # Standardise the replacecommit.requested format across all invocations be > it from clustering or this use case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1774) Add support for delete_partition with spark ds
[ https://issues.apache.org/jira/browse/HUDI-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-1774: - Assignee: liwei > Add support for delete_partition with spark ds > -- > > Key: HUDI-1774 > URL: https://issues.apache.org/jira/browse/HUDI-1774 > Project: Apache Hudi > Issue Type: Improvement > Components: Spark Integration >Reporter: sivabalan narayanan >Assignee: liwei >Priority: Major > > I see we have added support for delete_partitions at the write client, but we > don't have the support in the Spark datasource. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on issue #2743: Do we have any TTL mechanism in Hudi?
nsivabalan commented on issue #2743: URL: https://github.com/apache/hudi/issues/2743#issuecomment-815015923 @lw309637554 @satishkotha : fyi we are yet to add spark ds support for this "delete_partition" operation.
[jira] [Commented] (HUDI-1674) add partition level delete DOC or example
[ https://issues.apache.org/jira/browse/HUDI-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316432#comment-17316432 ] sivabalan narayanan commented on HUDI-1674: --- [~309637554]: we are yet to add this operation to spark ds: https://issues.apache.org/jira/browse/HUDI-1774 > add partition level delete DOC or example > - > > Key: HUDI-1674 > URL: https://issues.apache.org/jira/browse/HUDI-1674 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: liwei >Priority: Minor > Labels: docs, user-support-issues > Attachments: image-2021-03-08-09-57-05-768.png > > > !image-2021-03-08-09-57-05-768.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on issue #2399: [SUPPORT] Hudi deletes not being properly commited
nsivabalan commented on issue #2399: URL: https://github.com/apache/hudi/issues/2399#issuecomment-815011207 btw, we have filed a feature request to support reusing existing hudi configs https://issues.apache.org/jira/browse/HUDI-1640
[jira] [Assigned] (HUDI-1760) Incorrect Documentation for HoodieWriteConfigs
[ https://issues.apache.org/jira/browse/HUDI-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li reassigned HUDI-1760: - Assignee: Gary Li > Incorrect Documentation for HoodieWriteConfigs > -- > > Key: HUDI-1760 > URL: https://issues.apache.org/jira/browse/HUDI-1760 > Project: Apache Hudi > Issue Type: Bug >Reporter: Pratyaksh Sharma >Assignee: Gary Li >Priority: Major > > GH Issue - https://github.com/apache/hudi/issues/2760 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] BenjMaq commented on issue #2399: [SUPPORT] Hudi deletes not being properly commited
BenjMaq commented on issue #2399: URL: https://github.com/apache/hudi/issues/2399#issuecomment-814990246 Just want to add that I faced the same issue. For me, the problem was related to the option `.option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.ComplexKeyGenerator")` that I set for the `UPSERT` but not for the `DELETE`. Thanks @afeldman1 for coming back to explain the fix; it led me on the right track. By the way, I'm also wondering why this fails?
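One way to avoid this class of mismatch is to define the writer options once and reuse them for every operation. The sketch below does that with a plain map of option strings — the `hoodie.*` keys are Hudi's documented configuration names, but the helper structure itself is hypothetical, purely for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class SharedHudiOptionsSketch {
    // Hypothetical helper: one place defines the options every write must share,
    // so upsert and delete cannot diverge on e.g. the key generator class.
    static Map<String, String> baseWriteOptions() {
        Map<String, String> opts = new HashMap<>();
        opts.put("hoodie.datasource.write.recordkey.field", "uuid");
        opts.put("hoodie.datasource.write.partitionpath.field", "partitionpath");
        opts.put("hoodie.datasource.write.keygenerator.class",
                 "org.apache.hudi.keygen.ComplexKeyGenerator");
        return opts;
    }

    // Only the operation differs between calls; everything else stays identical.
    static Map<String, String> forOperation(String operation) {
        Map<String, String> opts = baseWriteOptions();
        opts.put("hoodie.datasource.write.operation", operation);
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> upsert = forOperation("upsert");
        Map<String, String> delete = forOperation("delete");
        // Both operations see the same key generator, so delete records are
        // keyed identically to the records they are meant to remove.
        System.out.println(upsert.get("hoodie.datasource.write.keygenerator.class")
            .equals(delete.get("hoodie.datasource.write.keygenerator.class"))); // prints true
    }
}
```

In a real pipeline the resulting map would be passed to `DataFrameWriter.options(...)` for both the upsert and the delete write, which is exactly the symmetry that was missing in the report above.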
[jira] [Updated] (HUDI-73) Support vanilla Avro Kafka Source in HoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-73: Fix Version/s: (was: 0.8.0) 0.9.0 > Support vanilla Avro Kafka Source in HoodieDeltaStreamer > > > Key: HUDI-73 > URL: https://issues.apache.org/jira/browse/HUDI-73 > Project: Apache Hudi > Issue Type: New Feature > Components: DeltaStreamer >Reporter: Balaji Varadarajan >Assignee: sivabalan narayanan >Priority: Blocker > Labels: pull-request-available, sev:high, user-support-issues > Fix For: 0.9.0 > > > Context : [https://github.com/uber/hudi/issues/597] > Currently, Avro Kafka Source expects the installation to use Confluent > version with SchemaRegistry server running. We need to support the Kafka > installations which do not use Schema Registry by allowing > FileBasedSchemaProvider to be integrated to AvroKafkaSource. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1774) Add support for delete_partition with spark ds
sivabalan narayanan created HUDI-1774: - Summary: Add support for delete_partition with spark ds Key: HUDI-1774 URL: https://issues.apache.org/jira/browse/HUDI-1774 Project: Apache Hudi Issue Type: Improvement Components: Spark Integration Reporter: sivabalan narayanan I see we have added support for delete_partitions at the write client, but we don't have the support in the Spark datasource. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] garyli1019 opened a new pull request #2783: [DOCS]Add docs for 0.8.0 release
garyli1019 opened a new pull request #2783: URL: https://github.com/apache/hudi/pull/2783 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] li36909 commented on pull request #2754: [HUDI-1751] Remove irrelevant properties from passing to kafkaConsumer which in turn prints lot of warn logs
li36909 commented on pull request #2754: URL: https://github.com/apache/hudi/pull/2754#issuecomment-814932095 @n3nash @pratyakshsharma thank you
[GitHub] [hudi] li36909 commented on pull request #2752: [HUDI-1749] Clean/Compaction/Rollback command maybe never exit when operation fail
li36909 commented on pull request #2752: URL: https://github.com/apache/hudi/pull/2752#issuecomment-814931307 @n3nash thank you
[GitHub] [hudi] li36909 commented on pull request #2753: [HUDI-1750] Fail to load user's class if user move hudi-spark-bundle jar into spark classpath
li36909 commented on pull request #2753: URL: https://github.com/apache/hudi/pull/2753#issuecomment-814930600 @nsivabalan thank you
[GitHub] [hudi] codecov-io edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-io edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-792430670 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=h1) Report > Merging [#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=desc) (647e322) into [master](https://codecov.io/gh/apache/hudi/commit/e970e1f48302aec3af7eeca009a2c793757cd501?el=desc) (e970e1f) will **increase** coverage by `17.40%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##            master    #2645       +/-   ##
============================================
+ Coverage    52.32%   69.72%    +17.40%
+ Complexity    3689      373      -3316
============================================
  Files          483       54       -429
  Lines        23095     1995     -21100
  Branches      2460      235      -2225
============================================
- Hits         12084     1391     -10693
+ Misses        9942      473      -9469
+ Partials      1069      131       -938
```

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | `?` | `?` |
| hudiclient | `?` | `?` |
| hudicommon | `?` | `?` |
| hudiflink | `?` | `?` |
| hudihadoopmr | `?` | `?` |
| hudisparkdatasource | `?` | `?` |
| hudisync | `?` | `?` |
| huditimelineservice | `?` | `?` |
| hudiutilities | `69.72% <ø> (+0.03%)` | `0.00 <ø> (ø)` |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| `.../versioning/compaction/CompactionPlanMigrator.java` | | |
| `...ecution/datasources/Spark3ParsePartitionUtil.scala` | | |
| `...rg/apache/hudi/cli/commands/CompactionCommand.java` | | |
| `...di/hadoop/BootstrapColumnStichingRecordReader.java` | | |
| `...org/apache/hudi/common/util/SpillableMapUtils.java` | | |
| `...ain/scala/org/apache/hudi/HoodieBootstrapRDD.scala` | | |
| `...he/hudi/common/model/EmptyHoodieRecordPayload.java` | | |
| `...i/table/format/cow/Int64TimestampColumnReader.java` | | |
| `...e/hudi/table/format/mor/MergeOnReadTableState.java` | | |
| `.../hudi/table/format/cow/ParquetSplitReaderUtil.java` | | |
| ... and 409 more | | |
[GitHub] [hudi] codecov-io edited a comment on pull request #2765: [HUDI-1716]: Resolving default values for schema from dataframe
codecov-io edited a comment on pull request #2765: URL: https://github.com/apache/hudi/pull/2765#issuecomment-813008111 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a change in pull request #2769: [HUDI-1762] Added HiveStylePartitionExtractor to support Hive style partitions
nsivabalan commented on a change in pull request #2769: URL: https://github.com/apache/hudi/pull/2769#discussion_r608577463 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveStylePartitionValueExtractor.java ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.hive; + +import java.util.Collections; +import java.util.List; + +/** + * Extractor for Hive style partitioned tables, where the partition folders are key=value pairs. + * + * This implementation extracts the partition value of yyyy-mm-dd from a path of type datestr=yyyy-mm-dd. + */ +public class HiveStylePartitionValueExtractor implements PartitionValueExtractor { + private static final long serialVersionUID = 1L; + + @Override + public List<String> extractPartitionValuesInPath(String partitionPath) { +// the partition path is expected to be in the format partition_key=partition_value. +String[] splits = partitionPath.split("="); Review comment: Maybe I am being very nitpicky, but is there a chance that the partition path has "=" in its field name? I know it does not make sense. Anyways, the partition path field name is "datestr" in your example; can it be "partition=path"?
If yes, Collections.singletonList(splits[1]) at line 40 might break, right? @n3nash: do we make any such assumptions in Hudi's code base wrt the partition path's field name?
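The reviewer's question about "=" appearing in the partition field name can be checked with a small standalone snippet. This is not Hudi code: the class name is hypothetical, and the `split("=", 2)` variant at the end is simply one possible mitigation, not what the PR does.

```java
// Quick check of how String.split behaves when the hive-style partition
// field name itself contains '=' -- the edge case raised in the review.
public class PartitionSplitDemo {
    static void check(boolean cond, String msg) {
        if (!cond) throw new AssertionError(msg);
    }

    public static void main(String[] args) {
        // Normal hive-style path: key=value splits cleanly into two parts.
        String[] ok = "datestr=2021-04-07".split("=");
        check(ok.length == 2 && ok[1].equals("2021-04-07"), "normal case");

        // A field name containing '=' yields three parts, so splits[1]
        // returns a fragment of the key, not the partition value.
        String[] broken = "partition=path=2021-04-07".split("=");
        check(broken.length == 3 && broken[1].equals("path"), "ambiguous key");

        // Splitting with a limit of 2 keeps everything after the first '='
        // together, which tolerates '=' in the value (though not in the key).
        String[] limited = "datestr=a=b".split("=", 2);
        check(limited.length == 2 && limited[1].equals("a=b"), "limit 2");

        System.out.println("all checks passed");
    }
}
```

So with `splits[1]`, a key like `partition=path` would indeed yield the wrong value, as the review suspects.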
[GitHub] [hudi] nsivabalan commented on pull request #2720: [HUDI-1719]hive on spark/mr,Incremental query of the mor table, the partition field is incorrect
nsivabalan commented on pull request #2720: URL: https://github.com/apache/hudi/pull/2720#issuecomment-814844457 @xushiyan: I see you have disabled a test in TestHoodieCombineHiveInputFormat. Can you help explain the reason?
[GitHub] [hudi] codecov-io commented on pull request #2782: [MINOR] ut code optimize
codecov-io commented on pull request #2782: URL: https://github.com/apache/hudi/pull/2782#issuecomment-814831752 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2782?src=pr&el=h1) Report > Merging [#2782](https://codecov.io/gh/apache/hudi/pull/2782?src=pr&el=desc) (ca38e68) into [master](https://codecov.io/gh/apache/hudi/commit/e926c1a45ca95fa1911f6f88a0577554f2797760?el=desc) (e926c1a) will **decrease** coverage by `41.35%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2782/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2782?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2782 +/- ## - Coverage 50.73% 9.37% -41.36% + Complexity 3064 48 -3016 Files 419 54 -365 Lines 18797 1995 -16802 Branches 1922 235 -1687 - Hits 9536 187 -9349 + Misses 8485 1795 -6690 + Partials 776 13 -763 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.37% <ø> (-60.06%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. 
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2782?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | | | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | | | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | | | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlc
[jira] [Closed] (HUDI-1773) HoodieFileGroup code optimize
[ https://issues.apache.org/jira/browse/HUDI-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1773. -- Resolution: Done 3a926aacf6552fc06005db4a7880a233db904330 > HoodieFileGroup code optimize > - > > Key: HUDI-1773 > URL: https://issues.apache.org/jira/browse/HUDI-1773 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: 谢波 >Assignee: 谢波 >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > optimize HoodieFileGroup getAllFileSlicesIncludingInflight and > getAllFileSlices > remove unused import. > import java.util.Map; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1773) HoodieFileGroup code optimize
[ https://issues.apache.org/jira/browse/HUDI-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-1773: --- Fix Version/s: 0.9.0 > HoodieFileGroup code optimize > - > > Key: HUDI-1773 > URL: https://issues.apache.org/jira/browse/HUDI-1773 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: 谢波 >Assignee: 谢波 >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > optimize HoodieFileGroup getAllFileSlicesIncludingInflight and > getAllFileSlices > remove unused import. > import java.util.Map;
[hudi] branch master updated: [HUDI-1773] HoodieFileGroup code optimize (#2781)
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 3a926aa [HUDI-1773] HoodieFileGroup code optimize (#2781) 3a926aa is described below commit 3a926aacf6552fc06005db4a7880a233db904330 Author: hiscat <46845236+mylanpan...@users.noreply.github.com> AuthorDate: Wed Apr 7 18:16:03 2021 +0800 [HUDI-1773] HoodieFileGroup code optimize (#2781) --- .../main/java/org/apache/hudi/common/model/HoodieFileGroup.java| 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java b/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java index 849f08e..6979c30 100644 --- a/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java +++ b/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java @@ -26,7 +26,6 @@ import org.apache.hudi.common.util.collection.Pair; import java.io.Serializable; import java.util.Comparator; import java.util.List; -import java.util.Map; import java.util.TreeMap; import java.util.stream.Collectors; import java.util.stream.Stream; @@ -133,7 +132,7 @@ public class HoodieFileGroup implements Serializable { * Get all the the file slices including in-flight ones as seen in underlying file-system. 
*/ public Stream<FileSlice> getAllFileSlicesIncludingInflight() { -return fileSlices.entrySet().stream().map(Map.Entry::getValue); +return fileSlices.values().stream(); } /** @@ -148,7 +147,7 @@ public class HoodieFileGroup implements Serializable { */ public Stream<FileSlice> getAllFileSlices() { if (!timeline.empty()) { - return fileSlices.entrySet().stream().map(Map.Entry::getValue).filter(this::isFileSliceCommitted); + return fileSlices.values().stream().filter(this::isFileSliceCommitted); } return Stream.empty(); } @@ -182,7 +181,7 @@ public class HoodieFileGroup implements Serializable { * Obtain the latest file slice, upto an instantTime i.e < maxInstantTime. * * @param maxInstantTime Max Instant Time - * @return + * @return the latest file slice */ public Option<FileSlice> getLatestFileSliceBefore(String maxInstantTime) { return Option.fromJavaOptional(getAllFileSlices().filter(
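The refactor in this commit swaps `fileSlices.entrySet().stream().map(Map.Entry::getValue)` for `fileSlices.values().stream()`. A minimal standalone sketch (plain strings standing in for `FileSlice` objects, class name hypothetical) shows the two forms yield the same elements in the same order on a `TreeMap`:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ValuesStreamDemo {
    public static void main(String[] args) {
        // A TreeMap keyed by commit time, mirroring the fileSlices field
        // (values are plain strings here instead of FileSlice objects).
        TreeMap<String, String> fileSlices = new TreeMap<>();
        fileSlices.put("20210401", "slice-1");
        fileSlices.put("20210402", "slice-2");
        fileSlices.put("20210403", "slice-3");

        // Before: stream the entry set and project out the values.
        List<String> before = fileSlices.entrySet().stream()
            .map(Map.Entry::getValue)
            .collect(Collectors.toList());

        // After: stream the values view directly -- same elements, same
        // key-sorted encounter order, one less intermediate mapping step.
        List<String> after = fileSlices.values().stream()
            .collect(Collectors.toList());

        if (!before.equals(after)) throw new AssertionError("not equivalent");
        System.out.println(after); // [slice-1, slice-2, slice-3]
    }
}
```

Since `TreeMap.values()` iterates in key order, the behavior is unchanged; the shorter form simply drops the redundant `entrySet`-to-value mapping.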
[GitHub] [hudi] yanghua merged pull request #2781: [HUDI-1773] HoodieFileGroup code optimize
yanghua merged pull request #2781: URL: https://github.com/apache/hudi/pull/2781
[jira] [Closed] (HUDI-1772) HoodieFileGroupId compareTo logical error(fileId self compare)
[ https://issues.apache.org/jira/browse/HUDI-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1772. -- Resolution: Fixed f4f9dd9d83a6a852c0e733802c6c49747cde5531 > HoodieFileGroupId compareTo logical error(fileId self compare) > -- > > Key: HUDI-1772 > URL: https://issues.apache.org/jira/browse/HUDI-1772 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: 谢波 >Assignee: 谢波 >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > >
[hudi] branch master updated (dadd081 -> f4f9dd9)
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from dadd081 [HUDI-1751] DeltaStreamer print many unnecessary warn log (#2754) add f4f9dd9 [HUDI-1772] HoodieFileGroupId compareTo logical error(fileId self compare) (#2780) No new revisions were added by this update. Summary of changes: .../src/main/java/org/apache/hudi/common/model/HoodieFileGroupId.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
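The one-line fix in HoodieFileGroupId.java is not quoted in this digest, so the following is a hypothetical reconstruction of the "self compare" pattern the ticket title describes: a two-field `compareTo` that accidentally compares `fileId` against itself instead of the other object's. The class below is a simplified stand-in, not the actual Hudi source.

```java
// Sketch of the bug class: a composite id ordered by partitionPath, then fileId.
public class FileGroupId implements Comparable<FileGroupId> {
    private final String partitionPath;
    private final String fileId;

    FileGroupId(String partitionPath, String fileId) {
        this.partitionPath = partitionPath;
        this.fileId = fileId;
    }

    @Override
    public int compareTo(FileGroupId o) {
        int c = partitionPath.compareTo(o.partitionPath);
        if (c != 0) {
            return c;
        }
        // Buggy form: fileId.compareTo(fileId) -- always 0, so two distinct
        // file ids in the same partition would compare as equal.
        return fileId.compareTo(o.fileId); // corrected: compare against o.fileId
    }

    public static void main(String[] args) {
        FileGroupId a = new FileGroupId("2021/04/07", "f1");
        FileGroupId b = new FileGroupId("2021/04/07", "f2");
        if (a.compareTo(b) >= 0) throw new AssertionError("ordering broken");
        System.out.println("f1 sorts before f2 as expected");
    }
}
```

With the self-compare, any sorted collection of ids within one partition would treat all file groups as duplicates, which is why the one-character fix matters.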
[GitHub] [hudi] yanghua merged pull request #2780: [HUDI-1772] HoodieFileGroupId compareTo logical error(fileId self compare)
yanghua merged pull request #2780: URL: https://github.com/apache/hudi/pull/2780
[GitHub] [hudi] ztcheck edited a comment on issue #2680: [SUPPORT]Hive sync error by using run_sync_tool.sh
ztcheck edited a comment on issue #2680: URL: https://github.com/apache/hudi/issues/2680#issuecomment-814772442 @n3nash, yes, `hudi-hive-sync-bundle` is already in the script `run_sync_tool.sh`. I use the default value of `HUDI_HIVE_UBER_JAR` in the script, like this: ' HUDI_HIVE_UBER_JAR=`ls -c $DIR/../../packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-*.jar | grep -v source | head -1` '
[GitHub] [hudi] ztcheck edited a comment on issue #2680: [SUPPORT]Hive sync error by using run_sync_tool.sh
ztcheck edited a comment on issue #2680: URL: https://github.com/apache/hudi/issues/2680#issuecomment-800020976 My environment is k8s.