[jira] [Created] (HUDI-4381) Add support for reading Protobuf data from Kafka
Vinay created HUDI-4381: --- Summary: Add support for reading Protobuf data from Kafka Key: HUDI-4381 URL: https://issues.apache.org/jira/browse/HUDI-4381 Project: Apache Hudi Issue Type: New Feature Components: deltastreamer Reporter: Vinay Currently DeltaStreamer supports Avro/JSON while reading from Kafka. We should add support for Protobuf as well. The schema can be read from a file or a schema registry -- This message was sent by Atlassian Jira (v8.20.10#820010)
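[Editor's note] A minimal sketch of what a Protobuf payload deserializer for such a Kafka source could look like. `MyEvent` is a hypothetical class generated by protoc from a .proto file; it is not part of Hudi, and this is not the ticket's actual implementation.

{code:java}
import com.google.protobuf.InvalidProtocolBufferException;

// Illustrative only: turn a raw Kafka value (byte[]) into a typed Protobuf message.
public class ProtobufKafkaDeserializer {
  public static MyEvent deserialize(byte[] kafkaValue) throws InvalidProtocolBufferException {
    // Every protoc-generated message class exposes a static parseFrom(byte[]).
    // MyEvent is a placeholder for whatever message type the .proto defines.
    return MyEvent.parseFrom(kafkaValue);
  }
}
{code}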
[jira] [Updated] (HUDI-3496) Add note for S3 Versioned Bucket
[ https://issues.apache.org/jira/browse/HUDI-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-3496: Status: In Progress (was: Open) > Add note for S3 Versioned Bucket > > > Key: HUDI-3496 > URL: https://issues.apache.org/jira/browse/HUDI-3496 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Vinay >Assignee: Vinay >Priority: Trivial > > We faced an issue where the AWS SDK List API was choking because the > number of delete markers for the /.hoodie/ and .hoodie/temp folders had increased > to 1000. > > This task is to add a note on setting up the Lifecycle rule properly -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3496) Add note for S3 Versioned Bucket
Vinay created HUDI-3496: --- Summary: Add note for S3 Versioned Bucket Key: HUDI-3496 URL: https://issues.apache.org/jira/browse/HUDI-3496 Project: Apache Hudi Issue Type: Task Components: docs Reporter: Vinay Assignee: Vinay We faced an issue where the AWS SDK List API was choking because the number of delete markers for the /.hoodie/ and .hoodie/temp folders had increased to 1000. This task is to add a note on setting up the Lifecycle rule properly -- This message was sent by Atlassian Jira (v8.20.1#820001)
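[Editor's note] A hedged sketch of the kind of lifecycle rule the note would document, using the AWS SDK for Java v1. The bucket name, prefix, and retention period are placeholders; the exact rule a deployment needs depends on its versioning policy.

{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketLifecycleConfiguration;
import com.amazonaws.services.s3.model.lifecycle.LifecycleFilter;
import com.amazonaws.services.s3.model.lifecycle.LifecyclePrefixPredicate;

public class HoodieLifecycleRuleExample {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    // Expire delete markers and old noncurrent versions under the Hudi metadata prefix,
    // so they do not accumulate and slow down List calls on versioned buckets.
    BucketLifecycleConfiguration.Rule rule = new BucketLifecycleConfiguration.Rule()
        .withId("expire-hoodie-delete-markers")
        .withFilter(new LifecycleFilter(new LifecyclePrefixPredicate("my_table/.hoodie/")))
        .withExpiredObjectDeleteMarker(true)
        .withNoncurrentVersionExpirationInDays(7)
        .withStatus(BucketLifecycleConfiguration.ENABLED);
    s3.setBucketLifecycleConfiguration("my-bucket",
        new BucketLifecycleConfiguration().withRules(rule));
  }
}
{code}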
[jira] [Commented] (HUDI-310) DynamoDB/Kinesis Change Capture using Delta Streamer
[ https://issues.apache.org/jira/browse/HUDI-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471309#comment-17471309 ] Vinay commented on HUDI-310: [~vinoth] I remember discussing this; sorry, it went to the backlog. I am taking this up. > DynamoDB/Kinesis Change Capture using Delta Streamer > > > Key: HUDI-310 > URL: https://issues.apache.org/jira/browse/HUDI-310 > Project: Apache Hudi > Issue Type: New Feature > Components: DeltaStreamer >Reporter: Vinoth Chandar >Assignee: Vinay >Priority: Major > > The goal here is to do CDC from DynamoDB and then have it be ingested into S3 > as a Hudi dataset. > A few resources: > # DynamoDB Streams > [https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html] > provides change capture logs in Kinesis. > # Walkthrough > [https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.KCLAdapter.Walkthrough.html] > Code [https://github.com/awslabs/dynamodb-streams-kinesis-adapter] > # Spark Streaming has support for reading Kinesis streams > [https://spark.apache.org/docs/2.4.4/streaming-kinesis-integration.html] one > of the many resources showing how to change the Spark Kinesis example code to > consume a DynamoDB stream > [https://medium.com/@ravi72munde/using-spark-streaming-with-dynamodb-d325b9a73c79] > # In DeltaStreamer, we need to add some form of KinesisSource that returns an > RDD with new data every time `fetchNewData` is called > [https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/Source.java]. > DeltaStreamer itself does not use Spark Streaming APIs. > # Internally, we have Avro, JSON, Row sources that extract data in these > formats. > Open questions: > # Should this just be a KinesisSource inside Hudi that needs to be > configured differently, or do we need two sources: DynamoDBKinesisSource (that > does some DynamoDB Stream specific setup/assumptions) and a plain > KinesisSource? What's more valuable to do, if we have to pick one? > # For Kafka integration, we just reused the KafkaRDD in Spark Streaming > easily and avoided writing a lot of code by hand. Could we pull the same > thing off for Kinesis? (probably needs digging through Spark code) > # What's the format of the data for DynamoDB streams? > > > We should probably flesh these out before going ahead with implementation. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-310) DynamoDB/Kinesis Change Capture using Delta Streamer
[ https://issues.apache.org/jira/browse/HUDI-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-310: -- Assignee: Vinay (was: Suneel Marthi) > DynamoDB/Kinesis Change Capture using Delta Streamer > > > Key: HUDI-310 > URL: https://issues.apache.org/jira/browse/HUDI-310 > Project: Apache Hudi > Issue Type: New Feature > Components: DeltaStreamer >Reporter: Vinoth Chandar >Assignee: Vinay >Priority: Major > > The goal here is to do CDC from DynamoDB and then have it be ingested into S3 > as a Hudi dataset. > A few resources: > # DynamoDB Streams > [https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html] > provides change capture logs in Kinesis. > # Walkthrough > [https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.KCLAdapter.Walkthrough.html] > Code [https://github.com/awslabs/dynamodb-streams-kinesis-adapter] > # Spark Streaming has support for reading Kinesis streams > [https://spark.apache.org/docs/2.4.4/streaming-kinesis-integration.html] one > of the many resources showing how to change the Spark Kinesis example code to > consume a DynamoDB stream > [https://medium.com/@ravi72munde/using-spark-streaming-with-dynamodb-d325b9a73c79] > # In DeltaStreamer, we need to add some form of KinesisSource that returns an > RDD with new data every time `fetchNewData` is called > [https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/Source.java]. > DeltaStreamer itself does not use Spark Streaming APIs. > # Internally, we have Avro, JSON, Row sources that extract data in these > formats. > Open questions: > # Should this just be a KinesisSource inside Hudi that needs to be > configured differently, or do we need two sources: DynamoDBKinesisSource (that > does some DynamoDB Stream specific setup/assumptions) and a plain > KinesisSource? What's more valuable to do, if we have to pick one? > # For Kafka integration, we just reused the KafkaRDD in Spark Streaming > easily and avoided writing a lot of code by hand. Could we pull the same > thing off for Kinesis? (probably needs digging through Spark code) > # What's the format of the data for DynamoDB streams? > > > We should probably flesh these out before going ahead with implementation. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
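[Editor's note] A hedged illustration of the per-shard fetch loop that a `fetchNewData` implementation for a KinesisSource would run, using the AWS SDK for Java v1 Kinesis client. Stream and shard identifiers are placeholders, and the checkpoint handling (persisting the last sequence number) is only noted, not implemented.

{code:java}
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.GetRecordsRequest;
import com.amazonaws.services.kinesis.model.GetRecordsResult;
import com.amazonaws.services.kinesis.model.GetShardIteratorRequest;
import com.amazonaws.services.kinesis.model.Record;
import com.amazonaws.services.kinesis.model.ShardIteratorType;
import java.util.List;

public class KinesisFetchExample {
  // Pulls one batch of records from a single shard, starting just after a
  // previously checkpointed sequence number.
  public static List<Record> fetch(String stream, String shardId, String afterSeq) {
    AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
    String iterator = kinesis.getShardIterator(new GetShardIteratorRequest()
        .withStreamName(stream)
        .withShardId(shardId)
        .withShardIteratorType(ShardIteratorType.AFTER_SEQUENCE_NUMBER)
        .withStartingSequenceNumber(afterSeq)).getShardIterator();
    GetRecordsResult result = kinesis.getRecords(
        new GetRecordsRequest().withShardIterator(iterator).withLimit(1000));
    // The caller would persist the last record's sequence number as the new checkpoint.
    return result.getRecords();
  }
}
{code}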
[jira] [Resolved] (HUDI-2257) Add a note to set keygenerator class while deleting data
[ https://issues.apache.org/jira/browse/HUDI-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay resolved HUDI-2257. - Resolution: Done > Add a note to set keygenerator class while deleting data > > > Key: HUDI-2257 > URL: https://issues.apache.org/jira/browse/HUDI-2257 > Project: Apache Hudi > Issue Type: Improvement > Components: Docs >Reporter: Vinay >Assignee: Vinay >Priority: Minor > Labels: pull-request-available > > Copying examples from this blog > [https://hudi.apache.org/blog/delete-support-in-hudi/] does not work as is > for a non-partitioned table; users have to explicitly set the following option > in order for delete to work > {code:java} > option("hoodie.datasource.write.keygenerator.class","org.apache.hudi.keygen.NonpartitionedKeyGenerator") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
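[Editor's note] A hedged sketch of the full delete flow the note documents, with the key generator option set explicitly for a non-partitioned table. The table name, record key field, and path are placeholders; the option keys come from the issue text and the Hudi datasource options.

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class NonPartitionedDeleteExample {
  // toDelete holds the record keys to remove from the table.
  public static void delete(Dataset<Row> toDelete) {
    toDelete.write().format("hudi")
        .option("hoodie.datasource.write.operation", "delete")
        .option("hoodie.datasource.write.recordkey.field", "uuid")
        // Without this, delete fails on a non-partitioned table (per HUDI-2257).
        .option("hoodie.datasource.write.keygenerator.class",
                "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
        .option("hoodie.table.name", "test_table")
        .mode(SaveMode.Append)
        .save("s3://bucket/path/test_table");
  }
}
{code}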
[jira] [Updated] (HUDI-2499) jdbc-url, user and pass are required to be passed in HMS mode
[ https://issues.apache.org/jira/browse/HUDI-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2499: Summary: jdbc-url, user and pass are required to be passed in HMS mode (was: jdbc-url is required to be passed in HMS mode) > jdbc-url, user and pass are required to be passed in HMS mode > > > Key: HUDI-2499 > URL: https://issues.apache.org/jira/browse/HUDI-2499 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Affects Versions: 0.9.0 >Reporter: Vinay >Assignee: Vinay >Priority: Major > > When trying out HMS mode, the command fails if jdbc-url is not passed. This is > not a required property for HMS mode. > {code:java} > Exception in thread "main" com.beust.jcommander.ParameterException: The following option is required: > [--jdbc-url] at > com.beust.jcommander.JCommander.validateOptions(JCommander.java:381) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2499) jdbc-url is required to be passed in HMS mode
[ https://issues.apache.org/jira/browse/HUDI-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2499: Affects Version/s: 0.9.0 > jdbc-url is required to be passed in HMS mode > - > > Key: HUDI-2499 > URL: https://issues.apache.org/jira/browse/HUDI-2499 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Vinay >Assignee: Vinay >Priority: Major > > When trying out HMS mode, the command fails if jdbc-url is not passed. This is > not a required property for HMS mode. > {code:java} > Exception in thread "main" com.beust.jcommander.ParameterException: The following option is required: > [--jdbc-url] at > com.beust.jcommander.JCommander.validateOptions(JCommander.java:381) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2499) jdbc-url is required to be passed in HMS mode
[ https://issues.apache.org/jira/browse/HUDI-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2499: Component/s: Hive Integration > jdbc-url is required to be passed in HMS mode > - > > Key: HUDI-2499 > URL: https://issues.apache.org/jira/browse/HUDI-2499 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Affects Versions: 0.9.0 >Reporter: Vinay >Assignee: Vinay >Priority: Major > > When trying out HMS mode, the command fails if jdbc-url is not passed. This is > not a required property for HMS mode. > {code:java} > Exception in thread "main" com.beust.jcommander.ParameterException: The following option is required: > [--jdbc-url] at > com.beust.jcommander.JCommander.validateOptions(JCommander.java:381) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2499) jdbc-url is required to be passed in HMS mode
[ https://issues.apache.org/jira/browse/HUDI-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2499: Status: In Progress (was: Open) > jdbc-url is required to be passed in HMS mode > - > > Key: HUDI-2499 > URL: https://issues.apache.org/jira/browse/HUDI-2499 > Project: Apache Hudi > Issue Type: Bug >Reporter: Vinay >Assignee: Vinay >Priority: Major > > When trying out HMS mode, the command fails if jdbc-url is not passed. This is > not a required property for HMS mode. > {code:java} > Exception in thread "main" com.beust.jcommander.ParameterException: The following option is required: > [--jdbc-url] at > com.beust.jcommander.JCommander.validateOptions(JCommander.java:381) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2499) jdbc-url is required to be passed in HMS mode
Vinay created HUDI-2499: --- Summary: jdbc-url is required to be passed in HMS mode Key: HUDI-2499 URL: https://issues.apache.org/jira/browse/HUDI-2499 Project: Apache Hudi Issue Type: Bug Reporter: Vinay Assignee: Vinay When trying out HMS mode, the command fails if jdbc-url is not passed. This is not a required property for HMS mode. {code:java} Exception in thread "main" com.beust.jcommander.ParameterException: The following option is required: [--jdbc-url] at com.beust.jcommander.JCommander.validateOptions(JCommander.java:381) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
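[Editor's note] A sketch of the direction a fix could take, not the actual Hudi patch: stop marking the option required at JCommander parse time and validate it only when the JDBC sync mode is selected. The class and field names here are illustrative.

{code:java}
import com.beust.jcommander.Parameter;

public class HiveSyncParamsSketch {
  // No longer required at parse time, so HMS mode can omit it.
  @Parameter(names = {"--jdbc-url"}, description = "Hive JDBC url", required = false)
  public String jdbcUrl;

  @Parameter(names = {"--sync-mode"}, description = "jdbc or hms")
  public String syncMode = "jdbc";

  // Enforce the option only for the mode that actually needs it.
  public void validate() {
    if ("jdbc".equals(syncMode) && jdbcUrl == null) {
      throw new IllegalArgumentException("--jdbc-url is required in jdbc sync mode");
    }
  }
}
{code}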
[jira] [Created] (HUDI-2498) Support Hive sync to work with s3
Vinay created HUDI-2498: --- Summary: Support Hive sync to work with s3 Key: HUDI-2498 URL: https://issues.apache.org/jira/browse/HUDI-2498 Project: Apache Hudi Issue Type: New Feature Components: Hive Integration Reporter: Vinay Assignee: Vinay Currently, Hive sync does not work with S3 out of the box; we have to add dependencies explicitly to the run_hive_sync script to make it work. It works fine on EMR but does not work with standalone clusters -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2257) Add a note to set keygenerator class while deleting data
[ https://issues.apache.org/jira/browse/HUDI-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2257: Status: In Progress (was: Open) > Add a note to set keygenerator class while deleting data > > > Key: HUDI-2257 > URL: https://issues.apache.org/jira/browse/HUDI-2257 > Project: Apache Hudi > Issue Type: Improvement > Components: Docs >Reporter: Vinay >Assignee: Vinay >Priority: Minor > > Copying examples from this blog > [https://hudi.apache.org/blog/delete-support-in-hudi/] does not work as is > for a non-partitioned table; users have to explicitly set the following option > in order for delete to work > {code:java} > option("hoodie.datasource.write.keygenerator.class","org.apache.hudi.keygen.NonpartitionedKeyGenerator") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2257) Add a note to set keygenerator class while deleting data
[ https://issues.apache.org/jira/browse/HUDI-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2257: Description: Copying examples from this blog [https://hudi.apache.org/blog/delete-support-in-hudi/] does not work as is for a non-partitioned table; users have to explicitly set the following option in order for delete to work {code:java} option("hoodie.datasource.write.keygenerator.class","org.apache.hudi.keygen.NonpartitionedKeyGenerator") {code} was:Copying examples from this blog [https://hudi.apache.org/blog/delete-support-in-hudi/] , does not work as is for Non-Partitioned table, user have to explic > Add a note to set keygenerator class while deleting data > > > Key: HUDI-2257 > URL: https://issues.apache.org/jira/browse/HUDI-2257 > Project: Apache Hudi > Issue Type: Improvement > Components: Docs >Reporter: Vinay >Assignee: Vinay >Priority: Minor > > Copying examples from this blog > [https://hudi.apache.org/blog/delete-support-in-hudi/] does not work as is > for a non-partitioned table; users have to explicitly set the following option > in order for delete to work > {code:java} > option("hoodie.datasource.write.keygenerator.class","org.apache.hudi.keygen.NonpartitionedKeyGenerator") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2257) Add a note to set keygenerator class while deleting data
[ https://issues.apache.org/jira/browse/HUDI-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2257: Description: Copying examples from this blog [https://hudi.apache.org/blog/delete-support-in-hudi/] , does not work as is for Non-Partitioned table, user have to explic > Add a note to set keygenerator class while deleting data > > > Key: HUDI-2257 > URL: https://issues.apache.org/jira/browse/HUDI-2257 > Project: Apache Hudi > Issue Type: Improvement > Components: Docs >Reporter: Vinay >Assignee: Vinay >Priority: Minor > > Copying examples from this blog > [https://hudi.apache.org/blog/delete-support-in-hudi/] , does not work as is > for Non-Partitioned table, user have to explic -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2257) Add a note to set keygenerator class while deleting data
Vinay created HUDI-2257: --- Summary: Add a note to set keygenerator class while deleting data Key: HUDI-2257 URL: https://issues.apache.org/jira/browse/HUDI-2257 Project: Apache Hudi Issue Type: Improvement Components: Docs Reporter: Vinay Assignee: Vinay -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-2192) Clean up Multiple versions of scala libraries detected Warning
[ https://issues.apache.org/jira/browse/HUDI-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay resolved HUDI-2192. - Resolution: Fixed Fixed - 5a94b6bf54b18739da55ebde10adf93f133e3204 > Clean up Multiple versions of scala libraries detected Warning > -- > > Key: HUDI-2192 > URL: https://issues.apache.org/jira/browse/HUDI-2192 > Project: Apache Hudi > Issue Type: Improvement > Components: Code Cleanup >Reporter: Vinay >Assignee: Vinay >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > > Building from source results in the following warning > > {code:java} > [INFO] --- scala-maven-plugin:3.3.1:testCompile (scala-test-compile) @ > hudi-cli --- > [WARNING] Expected all dependencies to require Scala version: 2.11.12 > [WARNING] org.apache.hudi:hudi-cli:0.9.0-SNAPSHOT requires scala version: > 2.11.12 > [WARNING] org.apache.hudi:hudi-spark-client:0.9.0-SNAPSHOT requires scala > version: 2.11.12 > [WARNING] org.apache.hudi:hudi-spark_2.11:0.9.0-SNAPSHOT requires scala > version: 2.11.12 > [WARNING] org.apache.hudi:hudi-spark-client:0.9.0-SNAPSHOT requires scala > version: 2.11.12 > [WARNING] org.apache.hudi:hudi-spark-common_2.11:0.9.0-SNAPSHOT requires > scala version: 2.11.12 > [WARNING] org.apache.hudi:hudi-spark_2.11:0.9.0-SNAPSHOT requires scala > version: 2.11.12 > [WARNING] com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7.1 > requires scala version: 2.11.8 > [WARNING] Multiple versions of scala libraries detected! > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2192) Clean up Multiple versions of scala libraries detected Warning
Vinay created HUDI-2192: --- Summary: Clean up Multiple versions of scala libraries detected Warning Key: HUDI-2192 URL: https://issues.apache.org/jira/browse/HUDI-2192 Project: Apache Hudi Issue Type: Improvement Components: Code Cleanup Reporter: Vinay Assignee: Vinay Fix For: 0.9.0 Building from source results in the following warning {code:java} [INFO] --- scala-maven-plugin:3.3.1:testCompile (scala-test-compile) @ hudi-cli --- [WARNING] Expected all dependencies to require Scala version: 2.11.12 [WARNING] org.apache.hudi:hudi-cli:0.9.0-SNAPSHOT requires scala version: 2.11.12 [WARNING] org.apache.hudi:hudi-spark-client:0.9.0-SNAPSHOT requires scala version: 2.11.12 [WARNING] org.apache.hudi:hudi-spark_2.11:0.9.0-SNAPSHOT requires scala version: 2.11.12 [WARNING] org.apache.hudi:hudi-spark-client:0.9.0-SNAPSHOT requires scala version: 2.11.12 [WARNING] org.apache.hudi:hudi-spark-common_2.11:0.9.0-SNAPSHOT requires scala version: 2.11.12 [WARNING] org.apache.hudi:hudi-spark_2.11:0.9.0-SNAPSHOT requires scala version: 2.11.12 [WARNING] com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7.1 requires scala version: 2.11.8 [WARNING] Multiple versions of scala libraries detected! {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2192) Clean up Multiple versions of scala libraries detected Warning
[ https://issues.apache.org/jira/browse/HUDI-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2192: Status: In Progress (was: Open) > Clean up Multiple versions of scala libraries detected Warning > -- > > Key: HUDI-2192 > URL: https://issues.apache.org/jira/browse/HUDI-2192 > Project: Apache Hudi > Issue Type: Improvement > Components: Code Cleanup >Reporter: Vinay >Assignee: Vinay >Priority: Minor > Fix For: 0.9.0 > > > Building from source results in the following warning > > {code:java} > [INFO] --- scala-maven-plugin:3.3.1:testCompile (scala-test-compile) @ > hudi-cli --- > [WARNING] Expected all dependencies to require Scala version: 2.11.12 > [WARNING] org.apache.hudi:hudi-cli:0.9.0-SNAPSHOT requires scala version: > 2.11.12 > [WARNING] org.apache.hudi:hudi-spark-client:0.9.0-SNAPSHOT requires scala > version: 2.11.12 > [WARNING] org.apache.hudi:hudi-spark_2.11:0.9.0-SNAPSHOT requires scala > version: 2.11.12 > [WARNING] org.apache.hudi:hudi-spark-client:0.9.0-SNAPSHOT requires scala > version: 2.11.12 > [WARNING] org.apache.hudi:hudi-spark-common_2.11:0.9.0-SNAPSHOT requires > scala version: 2.11.12 > [WARNING] org.apache.hudi:hudi-spark_2.11:0.9.0-SNAPSHOT requires scala > version: 2.11.12 > [WARNING] com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7.1 > requires scala version: 2.11.8 > [WARNING] Multiple versions of scala libraries detected! > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-2168) AccessControlException for anonymous user
[ https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay resolved HUDI-2168. - Resolution: Fixed > AccessControlException for anonymous user > - > > Key: HUDI-2168 > URL: https://issues.apache.org/jira/browse/HUDI-2168 > Project: Apache Hudi > Issue Type: Task > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Trivial > Labels: pull-request-available > > Users are facing the following exception while executing test cases dependent > on starting the Hive service > > {code:java} > Got exception: org.apache.hadoop.security.AccessControlException Permission > denied: user=anonymous, access=WRITE > {code} > This is specifically happening at the time of clearing the Hive DB > {code:java} > client.updateHiveSQL("drop database if exists " + > hiveSyncConfig.databaseName); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2168) AccessControlException for anonymous user
[ https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2168: Status: In Progress (was: Open) > AccessControlException for anonymous user > - > > Key: HUDI-2168 > URL: https://issues.apache.org/jira/browse/HUDI-2168 > Project: Apache Hudi > Issue Type: Task > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Trivial > > Users are facing the following exception while executing test cases dependent > on starting the Hive service > > {code:java} > Got exception: org.apache.hadoop.security.AccessControlException Permission > denied: user=anonymous, access=WRITE > {code} > This is specifically happening at the time of clearing the Hive DB > {code:java} > client.updateHiveSQL("drop database if exists " + > hiveSyncConfig.databaseName); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2168) AccessControlException for anonymous user
Vinay created HUDI-2168: --- Summary: AccessControlException for anonymous user Key: HUDI-2168 URL: https://issues.apache.org/jira/browse/HUDI-2168 Project: Apache Hudi Issue Type: Task Components: Testing Reporter: Vinay Assignee: Vinay Users are facing the following exception while executing test cases dependent on starting the Hive service {code:java} Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=anonymous, access=WRITE {code} This is specifically happening at the time of clearing the Hive DB {code:java} client.updateHiveSQL("drop database if exists " + hiveSyncConfig.databaseName); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
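[Editor's note] One direction such a fix could take, sketched here under assumptions: run the cleanup DDL over Hive JDBC as an explicit user instead of the default "anonymous", which sidesteps the WRITE permission failure above. The JDBC URL and credentials are placeholders; this is not the actual Hudi patch.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveCleanupSketch {
  public static void dropDatabase(String dbName) throws Exception {
    // Connect as the current OS user rather than leaving the user blank,
    // which HiveServer2 would otherwise treat as "anonymous".
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000", System.getProperty("user.name"), "");
         Statement stmt = conn.createStatement()) {
      stmt.execute("drop database if exists " + dbName + " cascade");
    }
  }
}
{code}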
[jira] [Updated] (HUDI-2083) Hudi CLI does not work with S3
[ https://issues.apache.org/jira/browse/HUDI-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2083: Status: In Progress (was: Open) > Hudi CLI does not work with S3 > -- > > Key: HUDI-2083 > URL: https://issues.apache.org/jira/browse/HUDI-2083 > Project: Apache Hudi > Issue Type: Task > Components: CLI >Reporter: Vinay >Assignee: Vinay >Priority: Major > > Hudi CLI gives exception when trying to connect to s3 path > {code:java} > create --path s3://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 > --tableType MERGE_ON_READ > Failed to get instance of org.apache.hadoop.fs.FileSystem > org.apache.hudi.exception.HoodieIOException: Failed to get instance of > org.apache.hadoop.fs.FileSystem > at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:98) > = > create --path s3a://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 > --tableType MERGE_ON_READ > Command failed java.lang.RuntimeException: java.lang.ClassNotFoundException: > Class org.apache.hadoop.fs.s3a.S3AFileSystem not found > java.lang.ClassNotFoundException: Class > org.apache.hadoop.fs.s3a.S3AFileSystem not found > java.lang.RuntimeException: java.lang.ClassNotFoundException: Class > org.apache.hadoop.fs.s3a.S3AFileSystem not found > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195) > at > org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654) > {code} > This could be because target/lib folder does not contain hadoop-aws or aws-s3 > dependency. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay resolved HUDI-1910. - Resolution: Implemented > Supporting Kafka based checkpointing for HoodieDeltaStreamer > > > Key: HUDI-1910 > URL: https://issues.apache.org/jira/browse/HUDI-1910 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Labels: pull-request-available, sev:normal, triaged > > HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some > users have requested support for Kafka based checkpoints for freshness > auditing purposes. This ticket tracks any implementation for that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1910: Status: In Progress (was: Open) > Supporting Kafka based checkpointing for HoodieDeltaStreamer > > > Key: HUDI-1910 > URL: https://issues.apache.org/jira/browse/HUDI-1910 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Labels: pull-request-available, sev:normal, triaged > > HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some > users have requested support for Kafka based checkpoints for freshness > auditing purposes. This ticket tracks any implementation for that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1944) Support Hudi to read from Kafka Consumer Group Offset
[ https://issues.apache.org/jira/browse/HUDI-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1944: Status: In Progress (was: Open) > Support Hudi to read from Kafka Consumer Group Offset > - > > Key: HUDI-1944 > URL: https://issues.apache.org/jira/browse/HUDI-1944 > Project: Apache Hudi > Issue Type: Sub-task > Components: DeltaStreamer >Reporter: Vinay >Assignee: Vinay >Priority: Major > > Currently, Hudi provides options to read from the latest or earliest offset. We should > also provide users an option to read from the consumer group offset. > This change will be in `KafkaOffsetGen`, where we can add a method to support > this functionality -- This message was sent by Atlassian Jira (v8.3.4#803005)
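[Editor's note] A hedged sketch of the kind of method the ticket proposes for `KafkaOffsetGen`: resolve starting offsets from the consumer group's committed positions. The fallback behaviour when a partition has no committed offset is an assumption, and the consumer properties are expected to already carry group.id and deserializer settings.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class GroupOffsetExample {
  public static Map<TopicPartition, Long> fromGroupOffsets(Properties consumerProps, String topic) {
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
      Map<TopicPartition, Long> offsets = new HashMap<>();
      for (PartitionInfo p : consumer.partitionsFor(topic)) {
        TopicPartition tp = new TopicPartition(topic, p.partition());
        OffsetAndMetadata committed = consumer.committed(tp);
        // No committed offset yet for this partition: fall back to earliest (0 here for brevity).
        offsets.put(tp, committed == null ? 0L : committed.offset());
      }
      return offsets;
    }
  }
}
{code}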
[jira] [Commented] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370650#comment-17370650 ] Vinay commented on HUDI-1910: - Done - 039aeb6dcee0a8eb4372c079ec07b8fc2582e41f > Supporting Kafka based checkpointing for HoodieDeltaStreamer > > > Key: HUDI-1910 > URL: https://issues.apache.org/jira/browse/HUDI-1910 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Labels: pull-request-available, sev:normal, triaged > > HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some > users have requested support for Kafka based checkpoints for freshness > auditing purposes. This ticket tracks any implementation for that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2083) Hudi CLI does not work with S3
[ https://issues.apache.org/jira/browse/HUDI-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2083: Description: Hudi CLI gives exception when trying to connect to s3 path {code:java} create --path s3://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 --tableType MERGE_ON_READ Failed to get instance of org.apache.hadoop.fs.FileSystem org.apache.hudi.exception.HoodieIOException: Failed to get instance of org.apache.hadoop.fs.FileSystem at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:98) = create --path s3a://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 --tableType MERGE_ON_READ Command failed java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195) at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654) {code} This could be because target/lib folder does not contain hadoop-aws or aws-s3 dependency. was: Hudi CLI gives exception when trying to connect to s3 path {code:java} create --path s3://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 --tableType MERGE_ON_READ Failed to get instance of org.apache.hadoop.fs.FileSystem org.apache.hudi.exception.HoodieIOException: Failed to get instance of org.apache.hadoop.fs.FileSystem at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:98) = create --path s3a://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 --tableType MERGE_ON_READ Command failed java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195) at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654) {code} This could be because target/lib folder does not contain hadoop-aws or was-s3 dependency. 
> Hudi CLI does not work with S3 > -- > > Key: HUDI-2083 > URL: https://issues.apache.org/jira/browse/HUDI-2083 > Project: Apache Hudi > Issue Type: Task > Components: CLI >Reporter: Vinay >Assignee: Vinay >Priority: Major > > Hudi CLI gives exception when trying to connect to s3 path > {code:java} > create --path s3://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 > --tableType MERGE_ON_READ > Failed to get instance of org.apache.hadoop.fs.FileSystem > org.apache.hudi.exception.HoodieIOException: Failed to get instance of > org.apache.hadoop.fs.FileSystem > at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:98) > = > create --path s3a://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 > --tableType MERGE_ON_READ > Command failed java.lang.RuntimeException: java.lang.ClassNotFoundException: > Class org.apache.hadoop.fs.s3a.S3AFileSystem not found > java.lang.ClassNotFoundException: Class > org.apache.hadoop.fs.s3a.S3AFileSystem not found > java.lang.RuntimeException: java.lang.ClassNotFoundException: Class > org.apache.hadoop.fs.s3a.S3AFileSystem not found > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195) > at > org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654) > {code} > This could be because target/lib folder does not contain hadoop-aws or aws-s3 > dependency. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2083) Hudi CLI does not with S3
Vinay created HUDI-2083: --- Summary: Hudi CLI does not with S3 Key: HUDI-2083 URL: https://issues.apache.org/jira/browse/HUDI-2083 Project: Apache Hudi Issue Type: Task Components: CLI Reporter: Vinay Assignee: Vinay Hudi CLI gives exception when trying to connect to s3 path {code:java} create --path s3://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 --tableType MERGE_ON_READ Failed to get instance of org.apache.hadoop.fs.FileSystem org.apache.hudi.exception.HoodieIOException: Failed to get instance of org.apache.hadoop.fs.FileSystem at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:98) = create --path s3a://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 --tableType MERGE_ON_READ Command failed java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195) at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654) {code} This could be because target/lib folder does not contain hadoop-aws or was-s3 dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2083) Hudi CLI does not work with S3
[ https://issues.apache.org/jira/browse/HUDI-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2083: Summary: Hudi CLI does not work with S3 (was: Hudi CLI does not with S3) > Hudi CLI does not work with S3 > -- > > Key: HUDI-2083 > URL: https://issues.apache.org/jira/browse/HUDI-2083 > Project: Apache Hudi > Issue Type: Task > Components: CLI >Reporter: Vinay >Assignee: Vinay >Priority: Major > > Hudi CLI gives exception when trying to connect to s3 path > {code:java} > create --path s3://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 > --tableType MERGE_ON_READ > Failed to get instance of org.apache.hadoop.fs.FileSystem > org.apache.hudi.exception.HoodieIOException: Failed to get instance of > org.apache.hadoop.fs.FileSystem > at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:98) > = > create --path s3a://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 > --tableType MERGE_ON_READ > Command failed java.lang.RuntimeException: java.lang.ClassNotFoundException: > Class org.apache.hadoop.fs.s3a.S3AFileSystem not found > java.lang.ClassNotFoundException: Class > org.apache.hadoop.fs.s3a.S3AFileSystem not found > java.lang.RuntimeException: java.lang.ClassNotFoundException: Class > org.apache.hadoop.fs.s3a.S3AFileSystem not found > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195) > at > org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654) > {code} > This could be because target/lib folder does not contain hadoop-aws or was-s3 > dependency. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability
[ https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1976: Status: In Progress (was: Open) > Upgrade hive, jackson, log4j, hadoop to remove vulnerability > > > Key: HUDI-1976 > URL: https://issues.apache.org/jira/browse/HUDI-1976 > Project: Apache Hudi > Issue Type: Task > Components: Hive Integration >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Blocker > Labels: pull-request-available > Fix For: 0.9.0 > > > [https://github.com/apache/hudi/issues/2827] > [https://github.com/apache/hudi/issues/2826] > [https://github.com/apache/hudi/issues/2824|https://github.com/apache/hudi/issues/2826] > [https://github.com/apache/hudi/issues/2823|https://github.com/apache/hudi/issues/2826] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability
[ https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-1976: --- Assignee: Vinay > Upgrade hive, jackson, log4j, hadoop to remove vulnerability > > > Key: HUDI-1976 > URL: https://issues.apache.org/jira/browse/HUDI-1976 > Project: Apache Hudi > Issue Type: Task > Components: Hive Integration >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Blocker > Labels: pull-request-available > Fix For: 0.9.0 > > > [https://github.com/apache/hudi/issues/2827] > [https://github.com/apache/hudi/issues/2826] > [https://github.com/apache/hudi/issues/2824|https://github.com/apache/hudi/issues/2826] > [https://github.com/apache/hudi/issues/2823|https://github.com/apache/hudi/issues/2826] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2082) Provide option to choose Spark or Flink Delta Streamer
Vinay created HUDI-2082: --- Summary: Provide option to choose Spark or Flink Delta Streamer Key: HUDI-2082 URL: https://issues.apache.org/jira/browse/HUDI-2082 Project: Apache Hudi Issue Type: New Feature Components: DeltaStreamer Reporter: Vinay Assignee: Vinay Currently, Hudi supports Flink as well as Spark engines; there are two different classes for DeltaStreamer: 1. HoodieDeltaStreamer 2. HoodieFlinkStreamer We should have a provision to pass a flag like --runner to choose between Flink or Spark and have a single entry point class which will take all the common configs. Based on the runner flag, we can call HoodieDeltaStreamer or HoodieFlinkStreamer. This also takes care of making DeltaStreamer generic -- This message was sent by Atlassian Jira (v8.3.4#803005)
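[Editor's note] A minimal sketch of the proposed single entry point, assuming hypothetical --runner values "spark" and "flink". The dispatch targets are commented out since they are named after the two existing classes mentioned above rather than compiled against them.

{code:java}
public class UnifiedDeltaStreamerSketch {
  public static void main(String[] args) throws Exception {
    String runner = "spark"; // default engine
    for (int i = 0; i < args.length - 1; i++) {
      if ("--runner".equals(args[i])) {
        runner = args[i + 1];
      }
    }
    if ("flink".equalsIgnoreCase(runner)) {
      // HoodieFlinkStreamer.main(args);  // Flink engine
    } else {
      // HoodieDeltaStreamer.main(args);  // Spark engine (default)
    }
  }
}
{code}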
[jira] [Commented] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x
[ https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369811#comment-17369811 ] Vinay commented on HUDI-1975: - [~nishith29] I have tried the 1st option; the build is passing locally. Asked the reporter on [https://github.com/apache/hudi/issues/2774] to try out this change - [https://github.com/apache/hudi/pull/3160] > Upgrade java-prometheus-client from 3.1.2 to 4.x > > > Key: HUDI-1975 > URL: https://issues.apache.org/jira/browse/HUDI-1975 > Project: Apache Hudi > Issue Type: Task >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Blocker > Labels: pull-request-available > Fix For: 0.9.0 > > > Find more details here -> https://github.com/apache/hudi/issues/2774 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x
[ https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1975: Status: In Progress (was: Open) > Upgrade java-prometheus-client from 3.1.2 to 4.x > > > Key: HUDI-1975 > URL: https://issues.apache.org/jira/browse/HUDI-1975 > Project: Apache Hudi > Issue Type: Task >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Blocker > Fix For: 0.9.0 > > > Find more details here -> https://github.com/apache/hudi/issues/2774 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x
[ https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-1975: --- Assignee: Vinay > Upgrade java-prometheus-client from 3.1.2 to 4.x > > > Key: HUDI-1975 > URL: https://issues.apache.org/jira/browse/HUDI-1975 > Project: Apache Hudi > Issue Type: Task >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Blocker > Fix For: 0.9.0 > > > Find more details here -> https://github.com/apache/hudi/issues/2774 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-2060) Create Tests for KafkaOffsetGen
[ https://issues.apache.org/jira/browse/HUDI-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay resolved HUDI-2060. - Resolution: Done Done - ed1a5daa9a15e9123aa7fdba5ce8262d1cae0704 > Create Tests for KafkaOffsetGen > --- > > Key: HUDI-2060 > URL: https://issues.apache.org/jira/browse/HUDI-2060 > Project: Apache Hudi > Issue Type: Test > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Minor > Labels: pull-request-available > > We do not have tests for KafkaOffsetGen; there are important functions like > `getNextOffsetRanges` that should be tested -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-2067) Sync all the options of FlinkOptions to FlinkStreamerConfig
[ https://issues.apache.org/jira/browse/HUDI-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-2067: --- Assignee: Vinay > Sync all the options of FlinkOptions to FlinkStreamerConfig > --- > > Key: HUDI-2067 > URL: https://issues.apache.org/jira/browse/HUDI-2067 > Project: Apache Hudi > Issue Type: Task > Components: Flink Integration >Reporter: Danny Chen >Assignee: Vinay >Priority: Major > Fix For: 0.9.0 > > > Sync the options so that the {{HoodieFlinkStreamer}} can have more config > options. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2060) Create Tests for KafkaOffsetGen
Vinay created HUDI-2060: --- Summary: Create Tests for KafkaOffsetGen Key: HUDI-2060 URL: https://issues.apache.org/jira/browse/HUDI-2060 Project: Apache Hudi Issue Type: Test Components: Testing Reporter: Vinay Assignee: Vinay We do not have tests for KafkaOffsetGen; there are important functions like `getNextOffsetRanges` that should be tested -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2060) Create Tests for KafkaOffsetGen
[ https://issues.apache.org/jira/browse/HUDI-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2060: Status: In Progress (was: Open) > Create Tests for KafkaOffsetGen > --- > > Key: HUDI-2060 > URL: https://issues.apache.org/jira/browse/HUDI-2060 > Project: Apache Hudi > Issue Type: Test > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Minor > > We do not have tests for KafkaOffsetGen; there are important functions like > `getNextOffsetRanges` that should be tested -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-2020) Add Concurrency based configs to Write Configs
[ https://issues.apache.org/jira/browse/HUDI-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay closed HUDI-2020. --- Resolution: Not A Problem > Add Concurrency based configs to Write Configs > -- > > Key: HUDI-2020 > URL: https://issues.apache.org/jira/browse/HUDI-2020 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: Vinay >Priority: Minor > > Some of the configs mentioned here - > [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing] > are not present in any class. > > We should add this to HoodieWriteConfig class. Following configs are to be > added > {code:java} > hoodie.write.lock.provider > hoodie.write.lock.zookeeper.url > hoodie.write.lock.zookeeper.port > hoodie.write.lock.zookeeper.lock_key > hoodie.write.lock.zookeeper.base_path > hoodie.write.lock.hivemetastore.database > hoodie.write.lock.hivemetastore.table > {code} > ` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2020) Add Concurrency based configs to Write Configs
[ https://issues.apache.org/jira/browse/HUDI-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2020: Status: New (was: Open) > Add Concurrency based configs to Write Configs > -- > > Key: HUDI-2020 > URL: https://issues.apache.org/jira/browse/HUDI-2020 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: Vinay >Priority: Minor > > Some of the configs mentioned here - > [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing] > are not present in any class. > > We should add this to HoodieWriteConfig class. Following configs are to be > added > {code:java} > hoodie.write.lock.provider > hoodie.write.lock.zookeeper.url > hoodie.write.lock.zookeeper.port > hoodie.write.lock.zookeeper.lock_key > hoodie.write.lock.zookeeper.base_path > hoodie.write.lock.hivemetastore.database > hoodie.write.lock.hivemetastore.table > {code} > ` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-2020) Add Concurrency based configs to Write Configs
[ https://issues.apache.org/jira/browse/HUDI-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-2020: --- Assignee: (was: Vinay) > Add Concurrency based configs to Write Configs > -- > > Key: HUDI-2020 > URL: https://issues.apache.org/jira/browse/HUDI-2020 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: Vinay >Priority: Minor > > Some of the configs mentioned here - > [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing] > are not present in any class. > > We should add this to HoodieWriteConfig class. Following configs are to be > added > {code:java} > hoodie.write.lock.provider > hoodie.write.lock.zookeeper.url > hoodie.write.lock.zookeeper.port > hoodie.write.lock.zookeeper.lock_key > hoodie.write.lock.zookeeper.base_path > hoodie.write.lock.hivemetastore.database > hoodie.write.lock.hivemetastore.table > {code} > ` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2020) Add Concurrency based configs to Write Configs
[ https://issues.apache.org/jira/browse/HUDI-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2020: Status: Open (was: New) > Add Concurrency based configs to Write Configs > -- > > Key: HUDI-2020 > URL: https://issues.apache.org/jira/browse/HUDI-2020 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: Vinay >Priority: Minor > > Some of the configs mentioned here - > [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing] > are not present in any class. > > We should add this to HoodieWriteConfig class. Following configs are to be > added > {code:java} > hoodie.write.lock.provider > hoodie.write.lock.zookeeper.url > hoodie.write.lock.zookeeper.port > hoodie.write.lock.zookeeper.lock_key > hoodie.write.lock.zookeeper.base_path > hoodie.write.lock.hivemetastore.database > hoodie.write.lock.hivemetastore.table > {code} > ` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2020) Add Concurrency based configs to Write Configs
[ https://issues.apache.org/jira/browse/HUDI-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367224#comment-17367224 ] Vinay commented on HUDI-2020: - Not an issue, configs are provided in LockConfiguration class > Add Concurrency based configs to Write Configs > -- > > Key: HUDI-2020 > URL: https://issues.apache.org/jira/browse/HUDI-2020 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: Vinay >Assignee: Vinay >Priority: Minor > > Some of the configs mentioned here - > [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing] > are not present in any class. > > We should add this to HoodieWriteConfig class. Following configs are to be > added > {code:java} > hoodie.write.lock.provider > hoodie.write.lock.zookeeper.url > hoodie.write.lock.zookeeper.port > hoodie.write.lock.zookeeper.lock_key > hoodie.write.lock.zookeeper.base_path > hoodie.write.lock.hivemetastore.database > hoodie.write.lock.hivemetastore.table > {code} > ` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2020) Add Concurrency based configs to Write Configs
[ https://issues.apache.org/jira/browse/HUDI-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2020: Status: In Progress (was: Open) > Add Concurrency based configs to Write Configs > -- > > Key: HUDI-2020 > URL: https://issues.apache.org/jira/browse/HUDI-2020 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: Vinay >Assignee: Vinay >Priority: Minor > > Some of the configs mentioned here - > [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing] > are not present in any class. > > We should add this to HoodieWriteConfig class. Following configs are to be > added > {code:java} > hoodie.write.lock.provider > hoodie.write.lock.zookeeper.url > hoodie.write.lock.zookeeper.port > hoodie.write.lock.zookeeper.lock_key > hoodie.write.lock.zookeeper.base_path > hoodie.write.lock.hivemetastore.database > hoodie.write.lock.hivemetastore.table > {code} > ` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2035) Create document for PrometheusReporter
Vinay created HUDI-2035: --- Summary: Create document for PrometheusReporter Key: HUDI-2035 URL: https://issues.apache.org/jira/browse/HUDI-2035 Project: Apache Hudi Issue Type: Task Components: Docs Reporter: Vinay Although PrometheusReporter has been released, there is no documentation for it -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1872) Move HoodieFlinkStreamer into hudi-utilities module
[ https://issues.apache.org/jira/browse/HUDI-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364250#comment-17364250 ] Vinay commented on HUDI-1872: - Assigning to me as per the discussion on the mailing list; also, the PR is in a closed state - [https://github.com/apache/hudi/pull/2922] cc [~danny0405] > Move HoodieFlinkStreamer into hudi-utilities module > --- > > Key: HUDI-1872 > URL: https://issues.apache.org/jira/browse/HUDI-1872 > Project: Apache Hudi > Issue Type: Task > Components: Flink Integration >Reporter: Danny Chen >Assignee: 谢波 >Priority: Major > Labels: pull-request-available, sev:normal > Fix For: 0.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1872) Move HoodieFlinkStreamer into hudi-utilities module
[ https://issues.apache.org/jira/browse/HUDI-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-1872: --- Assignee: Vinay (was: 谢波) > Move HoodieFlinkStreamer into hudi-utilities module > --- > > Key: HUDI-1872 > URL: https://issues.apache.org/jira/browse/HUDI-1872 > Project: Apache Hudi > Issue Type: Task > Components: Flink Integration >Reporter: Danny Chen >Assignee: Vinay >Priority: Major > Labels: pull-request-available, sev:normal > Fix For: 0.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1872) Move HoodieFlinkStreamer into hudi-utilities module
[ https://issues.apache.org/jira/browse/HUDI-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1872: Status: In Progress (was: Open) > Move HoodieFlinkStreamer into hudi-utilities module > --- > > Key: HUDI-1872 > URL: https://issues.apache.org/jira/browse/HUDI-1872 > Project: Apache Hudi > Issue Type: Task > Components: Flink Integration >Reporter: Danny Chen >Assignee: Vinay >Priority: Major > Labels: pull-request-available, sev:normal > Fix For: 0.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2020) Add Concurrency based configs to Write Configs
Vinay created HUDI-2020: --- Summary: Add Concurrency based configs to Write Configs Key: HUDI-2020 URL: https://issues.apache.org/jira/browse/HUDI-2020 Project: Apache Hudi Issue Type: Improvement Components: Writer Core Reporter: Vinay Assignee: Vinay Some of the configs mentioned here - [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing] are not present in any class. We should add this to HoodieWriteConfig class. Following configs are to be added {code:java} hoodie.write.lock.provider hoodie.write.lock.zookeeper.url hoodie.write.lock.zookeeper.port hoodie.write.lock.zookeeper.lock_key hoodie.write.lock.zookeeper.base_path hoodie.write.lock.hivemetastore.database hoodie.write.lock.hivemetastore.table {code} ` -- This message was sent by Atlassian Jira (v8.3.4#803005)
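[Editor's note] A hedged sketch of wiring the lock configs listed above, which (per the resolution comment on this ticket) live in the LockConfiguration class rather than HoodieWriteConfig. The ZooKeeper endpoint values and lock provider class name are placeholders/assumptions; the property keys come from the issue text.

{code:java}
import java.util.Properties;

public class LockConfigExample {
  public static Properties zkLockProps() {
    Properties props = new Properties();
    // Lock provider class name assumed from Hudi's ZooKeeper-based provider.
    props.put("hoodie.write.lock.provider",
        "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider");
    props.put("hoodie.write.lock.zookeeper.url", "zk-host");
    props.put("hoodie.write.lock.zookeeper.port", "2181");
    props.put("hoodie.write.lock.zookeeper.lock_key", "test_table");
    props.put("hoodie.write.lock.zookeeper.base_path", "/hudi/locks");
    return props;
  }
}
{code}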
[jira] [Commented] (HUDI-2003) Auto Compute Compression
[ https://issues.apache.org/jira/browse/HUDI-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362925#comment-17362925 ] Vinay commented on HUDI-2003: - [~nishith29] Please do update the description if I have missed anything here > Auto Compute Compression > > > Key: HUDI-2003 > URL: https://issues.apache.org/jira/browse/HUDI-2003 > Project: Apache Hudi > Issue Type: Bug > Components: Writer Core >Reporter: Vinay >Priority: Major > > Context: > Submitted a Spark job to read 3-4B ORC records and write to Hudi format. > The following table captures all the runs that I carried out based on different options > > ||CONFIG ||Number of Files Created||Size of each file|| > |PARQUET_FILE_MAX_BYTES=DEFAULT|30K|21MB| > |PARQUET_FILE_MAX_BYTES=1GB|3700|178MB| > |PARQUET_FILE_MAX_BYTES=1GB > COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE=110|Same as before|Same as before| > |PARQUET_FILE_MAX_BYTES=1GB > BULKINSERT_PARALLELISM=100|Same as before|Same as before| > |PARQUET_FILE_MAX_BYTES=4GB|1600|675MB| > |PARQUET_FILE_MAX_BYTES=6GB|669|1012MB| > Based on these runs, it appears that the compression ratio is off. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-2004) Move KafkaOffsetGen.CheckpointUtils test cases to independent class and improve coverage
[ https://issues.apache.org/jira/browse/HUDI-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay resolved HUDI-2004. - Resolution: Done Done - 769dd2d7c98558146eb4accb75b6d8e339ae6e0f > Move KafkaOffsetGen.CheckpointUtils test cases to independent class and > improve coverage > > > Key: HUDI-2004 > URL: https://issues.apache.org/jira/browse/HUDI-2004 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Minor > Labels: pull-request-available > > Currently, KafkaOffsetGen.CheckpointUtils test cases are present in > TestKafkaSource, which starts up HDFS, Hive, and ZK services locally. This is not > required for the CheckpointUtils test cases; hence they should be moved to an independent > test class of their own. > > Also, CheckpointUtils.strToOffsets and CheckpointUtils.offsetsToStr are not > unit tested currently -- This message was sent by Atlassian Jira (v8.3.4#803005)
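[Editor's note] A hedged sketch of the kind of standalone round-trip test this ticket asks for, with no HDFS/Hive/ZK services. The checkpoint string format ("topic,partition:offset,...") and the import location of KafkaOffsetGen are assumptions about the Hudi codebase; adjust to the real serialization.

{code:java}
import static org.junit.jupiter.api.Assertions.assertEquals;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.junit.jupiter.api.Test;
// Assumed location of the helper under test:
import org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen.CheckpointUtils;

public class TestCheckpointUtilsSketch {
  @Test
  public void testStrToOffsets() {
    // Assumed checkpoint format: topic name, then partition:offset pairs.
    String checkpoint = "test_topic,0:100,1:200";
    Map<TopicPartition, Long> offsets = CheckpointUtils.strToOffsets(checkpoint);
    assertEquals(Long.valueOf(100), offsets.get(new TopicPartition("test_topic", 0)));
    assertEquals(Long.valueOf(200), offsets.get(new TopicPartition("test_topic", 1)));
  }
}
{code}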
[jira] [Comment Edited] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362688#comment-17362688 ] Vinay edited comment on HUDI-1910 at 6/14/21, 9:09 AM: --- [~nishith29] Makes sense, so you are suggesting to include a COMMIT_OFFSET_TO_KAFKA config in the KafkaOffsetGen.Config class so that users can include it in the property file, like we pass the topic name, and then use it here - [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L474] and call the commitOffsetToKafka function. Is that correct? If this approach looks good, I can test this change out and create a PR was (Author: vinaypatil18): [~nishith29] Make sense, so you suggesting to include COMMIT_OFFSET_TO_KAFKA config in KafkaOffsetGen.Config class so that users can include it in property file like we pass topic name. And then use it here - [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L474] and call commitOffsetToKafka function. If this approach looks good, I can test this change out and create a PR > Supporting Kafka based checkpointing for HoodieDeltaStreamer > > > Key: HUDI-1910 > URL: https://issues.apache.org/jira/browse/HUDI-1910 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Labels: sev:normal, triaged > > HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some > users have requested support for Kafka based checkpoints for freshness > auditing purposes. This ticket tracks any implementation for that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x
[ https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362691#comment-17362691 ] Vinay edited comment on HUDI-1975 at 6/14/21, 7:54 AM: --- [~nishith29] Updated metrics.version in the pom to 3.1.2; the build fails with
{code:java}
/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java:[128,49] cannot find symbol
{code}
MetricsRegistry does not have a gauge method in version 3.1.2; it is part of the metrics-core dependency. There is a workaround described here - [https://github.com/eclipse/microprofile-metrics/issues/244] was (Author: vinaypatil18): Updated the metrics.version in pom to 3.1.2 , the build fails with {code:java} /hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java:[128,49] cannot find symbol {code} MetricsRegistry does not gauge method in 3.1.2 version, this is part of metrics-core dependency. There is a workaround of doing so here - [https://github.com/eclipse/microprofile-metrics/issues/244] > Upgrade java-prometheus-client from 3.1.2 to 4.x > > > Key: HUDI-1975 > URL: https://issues.apache.org/jira/browse/HUDI-1975 > Project: Apache Hudi > Issue Type: Task >Reporter: Nishith Agarwal >Priority: Blocker > Fix For: 0.9.0 > > > Find more details here -> https://github.com/apache/hudi/issues/2774 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x
[ https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362691#comment-17362691 ] Vinay commented on HUDI-1975: - Updated metrics.version in the pom to 3.1.2; the build fails with
{code:java}
/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java:[128,49] cannot find symbol
{code}
MetricsRegistry does not have a gauge method in version 3.1.2; it is part of the metrics-core dependency. There is a workaround described here - [https://github.com/eclipse/microprofile-metrics/issues/244] > Upgrade java-prometheus-client from 3.1.2 to 4.x > > > Key: HUDI-1975 > URL: https://issues.apache.org/jira/browse/HUDI-1975 > Project: Apache Hudi > Issue Type: Task >Reporter: Nishith Agarwal >Priority: Blocker > Fix For: 0.9.0 > > > Find more details here -> https://github.com/apache/hudi/issues/2774 -- This message was sent by Atlassian Jira (v8.3.4#803005)
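For reference on the missing symbol: MetricRegistry.gauge(name, supplier) was only added in metrics-core 4.x, so on 3.1.2 the usual fallback is to register a Gauge explicitly. A minimal sketch against Dropwizard's MetricRegistry directly, not Hudi's Metrics.java wrapper:
{code:java}
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class GaugeFallbackSketch {
  public static void main(String[] args) {
    MetricRegistry registry = new MetricRegistry();
    final long[] value = {0L};
    // register(name, metric) exists in 3.x; Gauge is a functional interface,
    // so a lambda can supply the current value
    registry.register("hoodie.sample.gauge", (Gauge<Long>) () -> value[0]);
    value[0] = 42L; // reporters now see 42 for hoodie.sample.gauge
  }
}
{code}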
[jira] [Commented] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362688#comment-17362688 ] Vinay commented on HUDI-1910: - [~nishith29] Makes sense, so you are suggesting to include a COMMIT_OFFSET_TO_KAFKA config in the KafkaOffsetGen.Config class so that users can include it in the property file, like we pass the topic name. And then use it here - [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L474] and call the commitOffsetToKafka function. If this approach looks good, I can test this change out and create a PR > Supporting Kafka based checkpointing for HoodieDeltaStreamer > > > Key: HUDI-1910 > URL: https://issues.apache.org/jira/browse/HUDI-1910 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Labels: sev:normal, triaged > > HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some > users have requested support for Kafka based checkpoints for freshness > auditing purposes. This ticket tracks any implementation for that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362564#comment-17362564 ] Vinay commented on HUDI-1910: - [~nishith29] Instead of updating the HoodieWriteCommitCallbackMessage and asking the user to enable the callback config to commit offsets to Kafka, I have another approach in mind. Should we just take the flag as a config in DeltaStreamer, e.g. --commit-offset-to-kafka? We already get the checkpointStr, which contains the end offset of each partition, here - [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L265] If the commit is successful and commit-offset-to-kafka is true - [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L474] , we can commit the offsets back to Kafka as well:
{code:java}
private void commitOffsetToKafka(String checkpointStr) {
  // checkpointStr => hoodie_test,0:30,1:35
  // offsetMap => {hoodie_test-0=30, hoodie_test-1=35}
  Map<TopicPartition, Long> offsetMap = KafkaOffsetGen.CheckpointUtils.strToOffsets(checkpointStr);
  Map<String, Object> kafkaParams = new HashMap<>();
  props.keySet().stream()
      .filter(prop -> !prop.toString().startsWith("hoodie."))
      .forEach(prop -> kafkaParams.put(prop.toString(), props.get(prop.toString())));
  Map<TopicPartition, OffsetAndMetadata> offsetAndMetadataMap = new HashMap<>(offsetMap.size());
  offsetMap.forEach((key, value) -> offsetAndMetadataMap.put(key, new OffsetAndMetadata(value)));
  try (KafkaConsumer<?, ?> consumer = new KafkaConsumer<>(kafkaParams)) {
    consumer.commitAsync(offsetAndMetadataMap, new OffsetCommitCallback() {
      @Override
      public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
        LOG.info("Offsets committed to Kafka successfully " + offsets.toString());
      }
    });
  }
}
{code}
What do you think of this approach? > Supporting Kafka based checkpointing for HoodieDeltaStreamer > > > Key: HUDI-1910 > URL: https://issues.apache.org/jira/browse/HUDI-1910 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Labels: sev:normal, triaged > > HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some > users have requested support for Kafka based checkpoints for freshness > auditing purposes. This ticket tracks any implementation for that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1910: Status: In Progress (was: Open) > Supporting Kafka based checkpointing for HoodieDeltaStreamer > > > Key: HUDI-1910 > URL: https://issues.apache.org/jira/browse/HUDI-1910 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Labels: sev:normal, triaged > > HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some > users have requested support for Kafka based checkpoints for freshness > auditing purposes. This ticket tracks any implementation for that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation
[ https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reopened HUDI-1997: - > Fix hoodie.datasource.hive_sync.auto_create_database documentation > --- > > Key: HUDI-1997 > URL: https://issues.apache.org/jira/browse/HUDI-1997 > Project: Apache Hudi > Issue Type: Bug > Components: Docs >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Fix For: 0.9.0 > > > hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting > to true according to docs but actually defaults to false for 0.7 & 0.8 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation
[ https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay resolved HUDI-1997. - Resolution: Fixed > Fix hoodie.datasource.hive_sync.auto_create_database documentation > --- > > Key: HUDI-1997 > URL: https://issues.apache.org/jira/browse/HUDI-1997 > Project: Apache Hudi > Issue Type: Bug > Components: Docs >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Fix For: 0.9.0 > > > hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting > to true according to docs but actually defaults to false for 0.7 & 0.8 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation
[ https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362550#comment-17362550 ] Vinay commented on HUDI-1997: - Fixed - 64a8f53b25dd21fb737f783a5cf5d316fc0ae56d > Fix hoodie.datasource.hive_sync.auto_create_database documentation > --- > > Key: HUDI-1997 > URL: https://issues.apache.org/jira/browse/HUDI-1997 > Project: Apache Hudi > Issue Type: Bug > Components: Docs >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Fix For: 0.9.0 > > > hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting > to true according to docs but actually defaults to false for 0.7 & 0.8 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation
[ https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1997: Status: Closed (was: Patch Available) > Fix hoodie.datasource.hive_sync.auto_create_database documentation > --- > > Key: HUDI-1997 > URL: https://issues.apache.org/jira/browse/HUDI-1997 > Project: Apache Hudi > Issue Type: Bug > Components: Docs >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Fix For: 0.9.0 > > > hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting > to true according to docs but actually defaults to false for 0.7 & 0.8 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2004) Move KafkaOffsetGen.CheckpointUtils test cases to independent class and improve coverage
[ https://issues.apache.org/jira/browse/HUDI-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362549#comment-17362549 ] Vinay commented on HUDI-2004: - Created PR - https://github.com/apache/hudi/pull/3072 > Move KafkaOffsetGen.CheckpointUtils test cases to independent class and > improve coverage > > > Key: HUDI-2004 > URL: https://issues.apache.org/jira/browse/HUDI-2004 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Minor > Labels: pull-request-available > > Currently KafkaOffsetGen.CheckpointUtils test cases are present in > TestKafkaSource which starts up hdfs, hive,zk service locally. This is not > required for CheckpointUtils test cases, hence should be moved to independent > test case of its own > > Also, .CheckpointUtils.strToOffsets and CheckpointUtils.offsetsToStr are not > unit tested currently -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2004) Move KafkaOffsetGen.CheckpointUtils test cases to independent class and improve coverage
[ https://issues.apache.org/jira/browse/HUDI-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2004: Labels: pull-request-available (was: ) > Move KafkaOffsetGen.CheckpointUtils test cases to independent class and > improve coverage > > > Key: HUDI-2004 > URL: https://issues.apache.org/jira/browse/HUDI-2004 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Minor > Labels: pull-request-available > > Currently KafkaOffsetGen.CheckpointUtils test cases are present in > TestKafkaSource which starts up hdfs, hive,zk service locally. This is not > required for CheckpointUtils test cases, hence should be moved to independent > test case of its own > > Also, .CheckpointUtils.strToOffsets and CheckpointUtils.offsetsToStr are not > unit tested currently -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation
[ https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362465#comment-17362465 ] Vinay commented on HUDI-1997: - Created PR - https://github.com/apache/hudi/pull/3066 > Fix hoodie.datasource.hive_sync.auto_create_database documentation > --- > > Key: HUDI-1997 > URL: https://issues.apache.org/jira/browse/HUDI-1997 > Project: Apache Hudi > Issue Type: Bug > Components: Docs >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Fix For: 0.9.0 > > > hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting > to true according to docs but actually defaults to false for 0.7 & 0.8 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation
[ https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1997: Status: Patch Available (was: In Progress) > Fix hoodie.datasource.hive_sync.auto_create_database documentation > --- > > Key: HUDI-1997 > URL: https://issues.apache.org/jira/browse/HUDI-1997 > Project: Apache Hudi > Issue Type: Bug > Components: Docs >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Fix For: 0.9.0 > > > hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting > to true according to docs but actually defaults to false for 0.7 & 0.8 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2004) Move KafkaOffsetGen.CheckpointUtils test cases to independent class and improve coverage
[ https://issues.apache.org/jira/browse/HUDI-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2004: Status: In Progress (was: Open) > Move KafkaOffsetGen.CheckpointUtils test cases to independent class and > improve coverage > > > Key: HUDI-2004 > URL: https://issues.apache.org/jira/browse/HUDI-2004 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Minor > > Currently KafkaOffsetGen.CheckpointUtils test cases are present in > TestKafkaSource which starts up hdfs, hive,zk service locally. This is not > required for CheckpointUtils test cases, hence should be moved to independent > test case of its own > > Also, .CheckpointUtils.strToOffsets and CheckpointUtils.offsetsToStr are not > unit tested currently -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1148) Revisit log messages seen when wiriting or reading through Hudi
[ https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay closed HUDI-1148. --- > Revisit log messages seen when wiriting or reading through Hudi > --- > > Key: HUDI-1148 > URL: https://issues.apache.org/jira/browse/HUDI-1148 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Affects Versions: 0.9.0 >Reporter: Balaji Varadarajan >Assignee: Vinay >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > > [https://github.com/apache/hudi/issues/1906] > > Some of these Log messages can be made debug. We need to generally see the > verbosity of log messages when running hudi operations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1148) Revisit log messages seen when wiriting or reading through Hudi
[ https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362376#comment-17362376 ] Vinay commented on HUDI-1148: - Fixed - f3d7b49bfea0630dcf488c087755485d8d088270 > Revisit log messages seen when wiriting or reading through Hudi > --- > > Key: HUDI-1148 > URL: https://issues.apache.org/jira/browse/HUDI-1148 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Affects Versions: 0.9.0 >Reporter: Balaji Varadarajan >Assignee: Vinay >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > > [https://github.com/apache/hudi/issues/1906] > > Some of these Log messages can be made debug. We need to generally see the > verbosity of log messages when running hudi operations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x
[ https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362375#comment-17362375 ] Vinay commented on HUDI-1975: - [~nishith29] Currently in pom.xml the version is specified as
{code:xml}
<prometheus.version>0.8.0</prometheus.version>
{code}
Where do we have to upgrade the version to 4.x? Also, can you please link the GH issue explaining why we are doing this > Upgrade java-prometheus-client from 3.1.2 to 4.x > > > Key: HUDI-1975 > URL: https://issues.apache.org/jira/browse/HUDI-1975 > Project: Apache Hudi > Issue Type: Task >Reporter: Nishith Agarwal >Priority: Blocker > Fix For: 0.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability
[ https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362372#comment-17362372 ] Vinay commented on HUDI-1976: - WIP PR - https://github.com/apache/hudi/pull/3071 > Upgrade hive, jackson, log4j, hadoop to remove vulnerability > > > Key: HUDI-1976 > URL: https://issues.apache.org/jira/browse/HUDI-1976 > Project: Apache Hudi > Issue Type: Task > Components: Hive Integration >Reporter: Nishith Agarwal >Priority: Blocker > Fix For: 0.9.0 > > > [https://github.com/apache/hudi/issues/2827] > [https://github.com/apache/hudi/issues/2826] > [https://github.com/apache/hudi/issues/2824] > [https://github.com/apache/hudi/issues/2823] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2003) Auto Compute Compression
Vinay created HUDI-2003: --- Summary: Auto Compute Compression Key: HUDI-2003 URL: https://issues.apache.org/jira/browse/HUDI-2003 Project: Apache Hudi Issue Type: Bug Components: Writer Core Reporter: Vinay Context: Submitted a Spark job that read 3-4B ORC records and wrote them out in Hudi format. The following table lists all the runs carried out with different options:
||CONFIG ||Number of Files Created||Size of each file||
|PARQUET_FILE_MAX_BYTES=DEFAULT|30K|21MB|
|PARQUET_FILE_MAX_BYTES=1GB|3700|178MB|
|PARQUET_FILE_MAX_BYTES=1GB COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE=110|Same as before|Same as before|
|PARQUET_FILE_MAX_BYTES=1GB BULKINSERT_PARALLELISM=100|Same as before|Same as before|
|PARQUET_FILE_MAX_BYTES=4GB|1600|675MB|
|PARQUET_FILE_MAX_BYTES=6GB|669|1012MB|
Based on these runs, it feels like the compression ratio is off. -- This message was sent by Atlassian Jira (v8.3.4#803005)
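For readers trying to reproduce the runs in the table above, a minimal sketch of one such run, assuming the Spark datasource write path; hoodie.parquet.max.file.size is the key behind PARQUET_FILE_MAX_BYTES, and hoodie.parquet.compression.ratio (default 0.1) is the estimate this ticket proposes to compute automatically. The input, field names, and paths are hypothetical:
{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class CompressionRunSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("hudi-compression-run").getOrCreate();
    Dataset<Row> inputDf = spark.read().orc("s3://bucket/input/orc"); // hypothetical source
    inputDf.write().format("hudi")
        .option("hoodie.table.name", "compression_test")
        .option("hoodie.datasource.write.recordkey.field", "uuid")
        .option("hoodie.datasource.write.precombine.field", "update_time")
        .option("hoodie.datasource.write.operation", "bulk_insert")
        // the PARQUET_FILE_MAX_BYTES=1GB row from the table above
        .option("hoodie.parquet.max.file.size", String.valueOf(1024L * 1024 * 1024))
        // the sizing logic uses this estimated compression ratio when deciding
        // when to roll to a new file, so a bad estimate skews actual file sizes
        .option("hoodie.parquet.compression.ratio", "0.1")
        .mode(SaveMode.Append)
        .save("s3://bucket/hudi/compression_test"); // hypothetical target
  }
}
{code}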
[jira] [Updated] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation
[ https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1997: Status: In Progress (was: Open) > Fix hoodie.datasource.hive_sync.auto_create_database documentation > --- > > Key: HUDI-1997 > URL: https://issues.apache.org/jira/browse/HUDI-1997 > Project: Apache Hudi > Issue Type: Bug > Components: Docs >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Fix For: 0.9.0 > > > hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting > to true according to docs but actually defaults to false for 0.7 & 0.8 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation
[ https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-1997: --- Assignee: Vinay > Fix hoodie.datasource.hive_sync.auto_create_database documentation > --- > > Key: HUDI-1997 > URL: https://issues.apache.org/jira/browse/HUDI-1997 > Project: Apache Hudi > Issue Type: Bug > Components: Docs >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Fix For: 0.9.0 > > > hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting > to true according to docs but actually defaults to false for 0.7 & 0.8 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1942) HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi synchronizes Hive
[ https://issues.apache.org/jira/browse/HUDI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay resolved HUDI-1942. - Fix Version/s: (was: 0.8.0) Resolution: Fixed Fixed - [2a7e1e0|https://github.com/apache/hudi/commit/2a7e1e091e69c53acc0a19e3d792ca15a3d7db62] > HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi > synchronizes Hive > - > > Key: HUDI-1942 > URL: https://issues.apache.org/jira/browse/HUDI-1942 > Project: Apache Hudi > Issue Type: Bug > Components: newbie >Reporter: yao.zhou >Assignee: Vinay >Priority: Major > Labels: easy-fix, pull-request-available > Fix For: 0.9.0 > > > HIVE_AUTO_CREATE_DATABASE_OPT_KEY = > "hoodie.datasource.hive_sync.auto_create_database" > DEFAULT_HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "true" > in HoodieSparkSqlWriter.buildSyncConfig > hiveSyncConfig.autoCreateDatabase = > parameters.get(HIVE_AUTO_CREATE_DATABASE_OPT_KEY).exists(r => r.toBoolean) > * This method sets the parameter to false -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1892) NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at hudi 0.9.0
[ https://issues.apache.org/jira/browse/HUDI-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay resolved HUDI-1892. - Resolution: Fixed > NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at > hudi 0.9.0 > --- > > Key: HUDI-1892 > URL: https://issues.apache.org/jira/browse/HUDI-1892 > Project: Apache Hudi > Issue Type: Bug >Reporter: shenbing >Assignee: Vinay >Priority: Major > Labels: pull-request-available > > using compiled hudi 0.9.0 with hadoop3.0.0 and hive3.1.1 after resolving > dependency conflicts, > I import hudi-spark-bundle_2.11-0.9.0-SNAPSHOT.jar into my project. > When I using OverwriteNonDefaultsWithLatestAvroPayload to update field with > new value, I got the error. > {code:java} > Caused by: java.util.concurrent.ExecutionException: > org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new > record with old value in storage, for new record {HoodieRecord{key=HoodieKey > { recordKey=1 partitionPath=date=1}, currentLocation='HoodieRecordLocation > {instantTime=20210510160355, fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', > newLocation='HoodieRecordLocation {instantTime=20210510160400, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value > {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": > "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": > "date=1", "_hoodie_file_name": > "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", > "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}} > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141) > ... 34 more > Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to > combine/merge new record with old value in storage, for new record > {HoodieRecord{key=HoodieKey { recordKey=1 partitionPath=date=1}, > currentLocation='HoodieRecordLocation {instantTime=20210510160355, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', > newLocation='HoodieRecordLocation {instantTime=20210510160400, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value > {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": > "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": > "date=1", "_hoodie_file_name": > "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", > "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}} > at > org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:290) > at > org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122) > at > org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112) > at > org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ... 
3 more > Caused by: java.lang.NullPointerException > at > org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.overwriteField(OverwriteWithLatestAvroPayload.java:97) > at > org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.lambda$combineAndGetUpdateValue$0(OverwriteNonDefaultsWithLatestAvroPayload.java:67) > at java.util.ArrayList.forEach(ArrayList.java:1259) > at > org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.combineAndGetUpdateValue(OverwriteNonDefaultsWithLatestAvroPayload.java:64) > at > org.apache.hudi.common.model.HoodieRecordPayload.combineAndGetUpdateValue(HoodieRecordPayload.java:81) > at > org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:276) > ... 8 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1892) NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at hudi 0.9.0
[ https://issues.apache.org/jira/browse/HUDI-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360060#comment-17360060 ] Vinay commented on HUDI-1892: - Fixed - 11360f707e969747e1a30791acb23857cc376589 > NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at > hudi 0.9.0 > --- > > Key: HUDI-1892 > URL: https://issues.apache.org/jira/browse/HUDI-1892 > Project: Apache Hudi > Issue Type: Bug >Reporter: shenbing >Assignee: Vinay >Priority: Major > Labels: pull-request-available > > using compiled hudi 0.9.0 with hadoop3.0.0 and hive3.1.1 after resolving > dependency conflicts, > I import hudi-spark-bundle_2.11-0.9.0-SNAPSHOT.jar into my project. > When I using OverwriteNonDefaultsWithLatestAvroPayload to update field with > new value, I got the error. > {code:java} > Caused by: java.util.concurrent.ExecutionException: > org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new > record with old value in storage, for new record {HoodieRecord{key=HoodieKey > { recordKey=1 partitionPath=date=1}, currentLocation='HoodieRecordLocation > {instantTime=20210510160355, fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', > newLocation='HoodieRecordLocation {instantTime=20210510160400, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value > {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": > "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": > "date=1", "_hoodie_file_name": > "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", > "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}} > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141) > ... 34 more > Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to > combine/merge new record with old value in storage, for new record > {HoodieRecord{key=HoodieKey { recordKey=1 partitionPath=date=1}, > currentLocation='HoodieRecordLocation {instantTime=20210510160355, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', > newLocation='HoodieRecordLocation {instantTime=20210510160400, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value > {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": > "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": > "date=1", "_hoodie_file_name": > "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", > "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}} > at > org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:290) > at > org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122) > at > org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112) > at > org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ... 
3 more > Caused by: java.lang.NullPointerException > at > org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.overwriteField(OverwriteWithLatestAvroPayload.java:97) > at > org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.lambda$combineAndGetUpdateValue$0(OverwriteNonDefaultsWithLatestAvroPayload.java:67) > at java.util.ArrayList.forEach(ArrayList.java:1259) > at > org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.combineAndGetUpdateValue(OverwriteNonDefaultsWithLatestAvroPayload.java:64) > at > org.apache.hudi.common.model.HoodieRecordPayload.combineAndGetUpdateValue(HoodieRecordPayload.java:81) > at > org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:276) > ... 8 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1892) NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at hudi 0.9.0
[ https://issues.apache.org/jira/browse/HUDI-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359328#comment-17359328 ] Vinay commented on HUDI-1892: - Looking at the code this issue could be because the defaultValue is not empty but the actual value of the field is null (This could be the case if field schema is Union type like "null,string") Creating a PR to use String.valueOf(value) in the below function to handle this edge case {code:java} public Boolean overwriteField(Object value, Object defaultValue) { return defaultValue == null ? value == null : defaultValue.toString().equals(value.toString()); } {code} > NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at > hudi 0.9.0 > --- > > Key: HUDI-1892 > URL: https://issues.apache.org/jira/browse/HUDI-1892 > Project: Apache Hudi > Issue Type: Bug >Reporter: shenbing >Assignee: Vinay >Priority: Major > > using compiled hudi 0.9.0 with hadoop3.0.0 and hive3.1.1 after resolving > dependency conflicts, > I import hudi-spark-bundle_2.11-0.9.0-SNAPSHOT.jar into my project. > When I using OverwriteNonDefaultsWithLatestAvroPayload to update field with > new value, I got the error. > {code:java} > Caused by: java.util.concurrent.ExecutionException: > org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new > record with old value in storage, for new record {HoodieRecord{key=HoodieKey > { recordKey=1 partitionPath=date=1}, currentLocation='HoodieRecordLocation > {instantTime=20210510160355, fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', > newLocation='HoodieRecordLocation {instantTime=20210510160400, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value > {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": > "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": > "date=1", "_hoodie_file_name": > "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", > "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}} > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141) > ... 
34 more > Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to > combine/merge new record with old value in storage, for new record > {HoodieRecord{key=HoodieKey { recordKey=1 partitionPath=date=1}, > currentLocation='HoodieRecordLocation {instantTime=20210510160355, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', > newLocation='HoodieRecordLocation {instantTime=20210510160400, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value > {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": > "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": > "date=1", "_hoodie_file_name": > "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", > "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}} > at > org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:290) > at > org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122) > at > org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112) > at > org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ... 3 more > Caused by: java.lang.NullPointerException > at > org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.overwriteField(OverwriteWithLatestAvroPayload.java:97) > at > org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.lambda$combineAndGetUpdateValue$0(OverwriteNonDefaultsWithLatestAvroPayload.java:67) > at java.util.ArrayList.forEach(ArrayList.java:1259) > at > org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.combineAndGetUpdateValue(OverwriteNonDefaultsWithLatestAvroPayload.java:64) > at > org.apache.hudi.common.model.HoodieRecordPayload.combineAndGetUpdateValue(HoodieRecordPayload.java:81) > at > org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:276) > ... 8 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
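Spelled out, the proposed change reads roughly as below; a sketch of the String.valueOf fix described in the comment above, not necessarily the exact merged patch:
{code:java}
// Null-safe variant: String.valueOf(null) yields the string "null" rather than
// throwing the NPE seen at OverwriteWithLatestAvroPayload.overwriteField
public Boolean overwriteField(Object value, Object defaultValue) {
  return defaultValue == null
      ? value == null
      : String.valueOf(defaultValue).equals(String.valueOf(value));
}
{code}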
[jira] [Assigned] (HUDI-1892) NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at hudi 0.9.0
[ https://issues.apache.org/jira/browse/HUDI-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-1892: --- Assignee: Vinay > NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at > hudi 0.9.0 > --- > > Key: HUDI-1892 > URL: https://issues.apache.org/jira/browse/HUDI-1892 > Project: Apache Hudi > Issue Type: Bug >Reporter: shenbing >Assignee: Vinay >Priority: Major > > using compiled hudi 0.9.0 with hadoop3.0.0 and hive3.1.1 after resolving > dependency conflicts, > I import hudi-spark-bundle_2.11-0.9.0-SNAPSHOT.jar into my project. > When I using OverwriteNonDefaultsWithLatestAvroPayload to update field with > new value, I got the error. > {code:java} > Caused by: java.util.concurrent.ExecutionException: > org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new > record with old value in storage, for new record {HoodieRecord{key=HoodieKey > { recordKey=1 partitionPath=date=1}, currentLocation='HoodieRecordLocation > {instantTime=20210510160355, fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', > newLocation='HoodieRecordLocation {instantTime=20210510160400, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value > {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": > "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": > "date=1", "_hoodie_file_name": > "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", > "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}} > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141) > ... 34 more > Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to > combine/merge new record with old value in storage, for new record > {HoodieRecord{key=HoodieKey { recordKey=1 partitionPath=date=1}, > currentLocation='HoodieRecordLocation {instantTime=20210510160355, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', > newLocation='HoodieRecordLocation {instantTime=20210510160400, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value > {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": > "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": > "date=1", "_hoodie_file_name": > "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", > "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}} > at > org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:290) > at > org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122) > at > org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112) > at > org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ... 
3 more > Caused by: java.lang.NullPointerException > at > org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.overwriteField(OverwriteWithLatestAvroPayload.java:97) > at > org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.lambda$combineAndGetUpdateValue$0(OverwriteNonDefaultsWithLatestAvroPayload.java:67) > at java.util.ArrayList.forEach(ArrayList.java:1259) > at > org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.combineAndGetUpdateValue(OverwriteNonDefaultsWithLatestAvroPayload.java:64) > at > org.apache.hudi.common.model.HoodieRecordPayload.combineAndGetUpdateValue(HoodieRecordPayload.java:81) > at > org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:276) > ... 8 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1892) NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at hudi 0.9.0
[ https://issues.apache.org/jira/browse/HUDI-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1892: Status: In Progress (was: Open) > NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at > hudi 0.9.0 > --- > > Key: HUDI-1892 > URL: https://issues.apache.org/jira/browse/HUDI-1892 > Project: Apache Hudi > Issue Type: Bug >Reporter: shenbing >Assignee: Vinay >Priority: Major > > using compiled hudi 0.9.0 with hadoop3.0.0 and hive3.1.1 after resolving > dependency conflicts, > I import hudi-spark-bundle_2.11-0.9.0-SNAPSHOT.jar into my project. > When I using OverwriteNonDefaultsWithLatestAvroPayload to update field with > new value, I got the error. > {code:java} > Caused by: java.util.concurrent.ExecutionException: > org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new > record with old value in storage, for new record {HoodieRecord{key=HoodieKey > { recordKey=1 partitionPath=date=1}, currentLocation='HoodieRecordLocation > {instantTime=20210510160355, fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', > newLocation='HoodieRecordLocation {instantTime=20210510160400, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value > {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": > "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": > "date=1", "_hoodie_file_name": > "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", > "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}} > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141) > ... 34 more > Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to > combine/merge new record with old value in storage, for new record > {HoodieRecord{key=HoodieKey { recordKey=1 partitionPath=date=1}, > currentLocation='HoodieRecordLocation {instantTime=20210510160355, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', > newLocation='HoodieRecordLocation {instantTime=20210510160400, > fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value > {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": > "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": > "date=1", "_hoodie_file_name": > "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", > "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}} > at > org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:290) > at > org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122) > at > org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112) > at > org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ... 
3 more > Caused by: java.lang.NullPointerException > at > org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.overwriteField(OverwriteWithLatestAvroPayload.java:97) > at > org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.lambda$combineAndGetUpdateValue$0(OverwriteNonDefaultsWithLatestAvroPayload.java:67) > at java.util.ArrayList.forEach(ArrayList.java:1259) > at > org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.combineAndGetUpdateValue(OverwriteNonDefaultsWithLatestAvroPayload.java:64) > at > org.apache.hudi.common.model.HoodieRecordPayload.combineAndGetUpdateValue(HoodieRecordPayload.java:81) > at > org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:276) > ... 8 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1909) Skip the commits with empty files for flink streaming reader
[ https://issues.apache.org/jira/browse/HUDI-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1909: Status: In Progress (was: Open) > Skip the commits with empty files for flink streaming reader > > > Key: HUDI-1909 > URL: https://issues.apache.org/jira/browse/HUDI-1909 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Danny Chen >Assignee: Vinay >Priority: Major > Fix For: 0.9.0 > > > Log warnings instead of throwing to make the reader more robust. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1909) Skip the commits with empty files for flink streaming reader
[ https://issues.apache.org/jira/browse/HUDI-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-1909: --- Assignee: Vinay (was: Danny Chen) > Skip the commits with empty files for flink streaming reader > > > Key: HUDI-1909 > URL: https://issues.apache.org/jira/browse/HUDI-1909 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Danny Chen >Assignee: Vinay >Priority: Major > Fix For: 0.9.0 > > > Log warnings instead of throwing to make the reader more robust. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1847) Add ability to decouple configs for scheduling inline and running async
[ https://issues.apache.org/jira/browse/HUDI-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-1847: --- Assignee: Vinay > Add ability to decouple configs for scheduling inline and running async > --- > > Key: HUDI-1847 > URL: https://issues.apache.org/jira/browse/HUDI-1847 > Project: Apache Hudi > Issue Type: Improvement > Components: Compaction >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Major > Labels: sev:high > > Currently, there are 2 ways to enable compaction: > > # Inline - This will schedule compaction inline and execute inline > # Async - This option is only available for HoodieDeltaStreamer based jobs. > This turns on scheduling inline and running async as part of the same spark > job. > > Users need a config to be able to schedule only inline while having an > ability to execute in their own spark job -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1847) Add ability to decouple configs for scheduling inline and running async
[ https://issues.apache.org/jira/browse/HUDI-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357787#comment-17357787 ] Vinay commented on HUDI-1847: - [~nishith29] Thank you for laying out all the steps clearly. I would like to start working on this issue and will let you know if I face any problems > Add ability to decouple configs for scheduling inline and running async > --- > > Key: HUDI-1847 > URL: https://issues.apache.org/jira/browse/HUDI-1847 > Project: Apache Hudi > Issue Type: Improvement > Components: Compaction >Reporter: Nishith Agarwal >Priority: Major > Labels: sev:high > > Currently, there are 2 ways to enable compaction: > > # Inline - This will schedule compaction inline and execute inline > # Async - This option is only available for HoodieDeltaStreamer based jobs. > This turns on scheduling inline and running async as part of the same spark > job. > > Users need a config to be able to schedule only inline while having an > ability to execute in their own spark job -- This message was sent by Atlassian Jira (v8.3.4#803005)
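What the decoupling amounts to in config terms, as a rough sketch: hoodie.compact.inline is the existing inline switch, while the schedule-only key below is a hypothetical name for the knob being requested.
{code:java}
import java.util.Properties;

public class CompactionDecoupleSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    // do not execute compaction inline with the write...
    props.setProperty("hoodie.compact.inline", "false");
    // ...but still schedule a compaction plan on each write (key name assumed);
    // a separately launched spark job then executes the pending compactions
    props.setProperty("hoodie.compact.schedule.inline", "true");
    props.list(System.out);
  }
}
{code}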
[jira] [Commented] (HUDI-1942) HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi synchronizes Hive
[ https://issues.apache.org/jira/browse/HUDI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357786#comment-17357786 ] Vinay commented on HUDI-1942: - [~yao.z...@yuanxi.onaliyun.com] Yes, I have already added that in the PR; can you please review [GitHub Pull Request #3036|https://github.com/apache/hudi/pull/3036]? > HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi > synchronizes Hive > - > > Key: HUDI-1942 > URL: https://issues.apache.org/jira/browse/HUDI-1942 > Project: Apache Hudi > Issue Type: Bug > Components: newbie >Reporter: yao.zhou >Assignee: Vinay >Priority: Major > Labels: easy-fix, pull-request-available > Fix For: 0.8.0, 0.9.0 > > > HIVE_AUTO_CREATE_DATABASE_OPT_KEY = > "hoodie.datasource.hive_sync.auto_create_database" > DEFAULT_HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "true" > in HoodieSparkSqlWriter.buildSyncConfig > hiveSyncConfig.autoCreateDatabase = > parameters.get(HIVE_AUTO_CREATE_DATABASE_OPT_KEY).exists(r => r.toBoolean) > * This method sets the parameter to false -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1942) HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi synchronizes Hive
[ https://issues.apache.org/jira/browse/HUDI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-1942: --- Assignee: Vinay > HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi > synchronizes Hive > - > > Key: HUDI-1942 > URL: https://issues.apache.org/jira/browse/HUDI-1942 > Project: Apache Hudi > Issue Type: Bug > Components: newbie >Reporter: yao.zhou >Assignee: Vinay >Priority: Major > Labels: easy-fix > Fix For: 0.8.0, 0.9.0 > > > HIVE_AUTO_CREATE_DATABASE_OPT_KEY = > "hoodie.datasource.hive_sync.auto_create_database" > DEFAULT_HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "true" > in HoodieSparkSqlWriter.buildSyncConfig > hiveSyncConfig.autoCreateDatabase = > parameters.get(HIVE_AUTO_CREATE_DATABASE_OPT_KEY).exists(r => r.toBoolean) > * This method sets the parameter to false -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1942) HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi synchronizes Hive
[ https://issues.apache.org/jira/browse/HUDI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1942: Status: In Progress (was: Open) > HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi > synchronizes Hive > - > > Key: HUDI-1942 > URL: https://issues.apache.org/jira/browse/HUDI-1942 > Project: Apache Hudi > Issue Type: Bug > Components: newbie >Reporter: yao.zhou >Assignee: Vinay >Priority: Major > Labels: easy-fix > Fix For: 0.8.0, 0.9.0 > > > HIVE_AUTO_CREATE_DATABASE_OPT_KEY = > "hoodie.datasource.hive_sync.auto_create_database" > DEFAULT_HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "true" > in HoodieSparkSqlWriter.buildSyncConfig > hiveSyncConfig.autoCreateDatabase = > parameters.get(HIVE_AUTO_CREATE_DATABASE_OPT_KEY).exists(r => r.toBoolean) > * This method sets the parameter to false -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1942) HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi synchronizes Hive
[ https://issues.apache.org/jira/browse/HUDI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357776#comment-17357776 ] Vinay commented on HUDI-1942: - Tried the code snippet locally:
{code:scala}
scala> var parameters = Map("hoodie.datasource.hive_sync.auto_create_database" -> "true")
scala> val a = parameters.get("hoodie.datasource.hive_sync.auto_create_database").exists(r => r.toBoolean)
scala> print(a)
true
{code}
This means the only way we get a false value is if the parameters map does not contain HIVE_AUTO_CREATE_DATABASE_OPT_KEY; one fix is to use parameters.getOrElse, like we do in the `DataSourceUtils.buildHiveSyncConfig` method - https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java#L271 > HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi > synchronizes Hive > - > > Key: HUDI-1942 > URL: https://issues.apache.org/jira/browse/HUDI-1942 > Project: Apache Hudi > Issue Type: Bug > Components: newbie >Reporter: yao.zhou >Assignee: Vinay >Priority: Major > Labels: easy-fix > Fix For: 0.8.0, 0.9.0 > > > HIVE_AUTO_CREATE_DATABASE_OPT_KEY = > "hoodie.datasource.hive_sync.auto_create_database" > DEFAULT_HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "true" > in HoodieSparkSqlWriter.buildSyncConfig > hiveSyncConfig.autoCreateDatabase = > parameters.get(HIVE_AUTO_CREATE_DATABASE_OPT_KEY).exists(r => r.toBoolean) > * This method sets the parameter to false -- This message was sent by Atlassian Jira (v8.3.4#803005)
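As a point of comparison, the Java side already reads the key with an explicit default; a minimal sketch of that pattern (the TypedProperties package location is assumed for this tree):
{code:java}
import org.apache.hudi.common.config.TypedProperties;

public class AutoCreateDbDefaultSketch {
  public static void main(String[] args) {
    TypedProperties props = new TypedProperties();
    // exists(r => r.toBoolean) yields false for an absent key; an explicit
    // default honors the documented default of true instead
    boolean autoCreateDatabase =
        props.getBoolean("hoodie.datasource.hive_sync.auto_create_database", true);
    System.out.println(autoCreateDatabase); // prints true when the key is unset
  }
}
{code}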
[jira] [Commented] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry
[ https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357761#comment-17357761 ] Vinay commented on HUDI-1281: - cf90f17732e313cf71248a8baaf10307463d9b6e > deltacommit is not part of ActionType used in HoodieArchivedMetaEntry > -- > > Key: HUDI-1281 > URL: https://issues.apache.org/jira/browse/HUDI-1281 > Project: Apache Hudi > Issue Type: Wish > Components: Code Cleanup >Reporter: satish >Assignee: Vinay >Priority: Minor > Labels: code-cleanup, pull-request-available > > incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java > does not include deltacommit > Both commit/deltacommit use 'commit' command which can be confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry
[ https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay resolved HUDI-1281. - Resolution: Fixed > deltacommit is not part of ActionType used in HoodieArchivedMetaEntry > -- > > Key: HUDI-1281 > URL: https://issues.apache.org/jira/browse/HUDI-1281 > Project: Apache Hudi > Issue Type: Wish > Components: Code Cleanup >Reporter: satish >Assignee: Vinay >Priority: Minor > Labels: code-cleanup, pull-request-available > > incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java > does not include deltacommit > Both commit/deltacommit use 'commit' command which can be confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry
[ https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay closed HUDI-1281. --- > deltacommit is not part of ActionType used in HoodieArchivedMetaEntry > -- > > Key: HUDI-1281 > URL: https://issues.apache.org/jira/browse/HUDI-1281 > Project: Apache Hudi > Issue Type: Wish > Components: Code Cleanup >Reporter: satish >Assignee: Vinay >Priority: Minor > Labels: code-cleanup, pull-request-available > > incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java > does not include deltacommit > Both commit/deltacommit use 'commit' command which can be confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1148) Revisit log messages seen when wiriting or reading through Hudi
[ https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356174#comment-17356174 ] Vinay commented on HUDI-1148: - [~vbalaji] I can take a look at this. Looking at the code, we are printing the entire Hadoop conf:
{code:java}
LOG.info(String.format("Hadoop Configuration: fs.defaultFS: [%s], Config:[%s], FileSystem: [%s]",
    conf.getRaw("fs.defaultFS"), conf.toString(), fs.toString()));
{code}
Will make this a debug log. Also, will check the logs while running HoodieDeltaStreamer and writing to CoW/MoR tables > Revisit log messages seen when wiriting or reading through Hudi > --- > > Key: HUDI-1148 > URL: https://issues.apache.org/jira/browse/HUDI-1148 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Affects Versions: 0.9.0 >Reporter: Balaji Varadarajan >Assignee: Vinay >Priority: Minor > Fix For: 0.9.0 > > > [https://github.com/apache/hudi/issues/1906] > > Some of these Log messages can be made debug. We need to generally see the > verbosity of log messages when running hudi operations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
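The change itself is a one-liner; a sketch of the demoted log call, reusing the conf and fs already in scope at that call site:
{code:java}
// guard with isDebugEnabled() so the String.format cost is also skipped by default
if (LOG.isDebugEnabled()) {
  LOG.debug(String.format("Hadoop Configuration: fs.defaultFS: [%s], Config:[%s], FileSystem: [%s]",
      conf.getRaw("fs.defaultFS"), conf.toString(), fs.toString()));
}
{code}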
[jira] [Updated] (HUDI-1148) Revisit log messages seen when wiriting or reading through Hudi
[ https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1148: Status: In Progress (was: Open) > Revisit log messages seen when wiriting or reading through Hudi > --- > > Key: HUDI-1148 > URL: https://issues.apache.org/jira/browse/HUDI-1148 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Affects Versions: 0.9.0 >Reporter: Balaji Varadarajan >Assignee: Vinay >Priority: Minor > Fix For: 0.9.0 > > > [https://github.com/apache/hudi/issues/1906] > > Some of these Log messages can be made debug. We need to generally see the > verbosity of log messages when running hudi operations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1148) Revisit log messages seen when wiriting or reading through Hudi
[ https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-1148: --- Assignee: Vinay > Revisit log messages seen when wiriting or reading through Hudi > --- > > Key: HUDI-1148 > URL: https://issues.apache.org/jira/browse/HUDI-1148 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Affects Versions: 0.9.0 >Reporter: Balaji Varadarajan >Assignee: Vinay >Priority: Minor > Fix For: 0.9.0 > > > [https://github.com/apache/hudi/issues/1906] > > Some of these Log messages can be made debug. We need to generally see the > verbosity of log messages when running hudi operations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry
[ https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1281: Status: Open (was: New) > deltacommit is not part of ActionType used in HoodieArchivedMetaEntry > -- > > Key: HUDI-1281 > URL: https://issues.apache.org/jira/browse/HUDI-1281 > Project: Apache Hudi > Issue Type: Wish > Components: Code Cleanup >Reporter: satish >Assignee: Vinay >Priority: Minor > Labels: code-cleanup > > incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java > does not include deltacommit > Both commit and deltacommit use the 'commit' command, which can be confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry
[ https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1281: Status: In Progress (was: Open) > deltacommit is not part of ActionType used in HoodieArchivedMetaEntry > -- > > Key: HUDI-1281 > URL: https://issues.apache.org/jira/browse/HUDI-1281 > Project: Apache Hudi > Issue Type: Wish > Components: Code Cleanup >Reporter: satish >Assignee: Vinay >Priority: Minor > Labels: code-cleanup > > incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java > does not include deltacommit > Both commit and deltacommit use the 'commit' command, which can be confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry
[ https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354433#comment-17354433 ] Vinay edited comment on HUDI-1281 at 5/31/21, 12:29 PM: [~shivnarayan] Since there is no activity on this ticket, I can pick it up. I see we have not yet added `deltacommit` to the ActionType enum. {code:java} public enum ActionType { //TODO HUDI-1281 make deltacommit part of this commit, savepoint, compaction, clean, rollback, replacecommit } {code} was (Author: vinaypatil18): [~shivnarayan] Since there is no activity on this ticket, I can pick it up. I see we have not yet added `deltacommit` to the ActionType enum. ``` public enum ActionType { //TODO HUDI-1281 make deltacommit part of this commit, savepoint, compaction, clean, rollback, replacecommit } ``` > deltacommit is not part of ActionType used in HoodieArchivedMetaEntry > -- > > Key: HUDI-1281 > URL: https://issues.apache.org/jira/browse/HUDI-1281 > Project: Apache Hudi > Issue Type: Wish > Components: Code Cleanup >Reporter: satish >Assignee: Vinay >Priority: Minor > Labels: code-cleanup > > incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java > does not include deltacommit > Both commit and deltacommit use the 'commit' command, which can be confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry
[ https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay reassigned HUDI-1281: --- Assignee: Vinay > deltacommit is not part of ActionType used in HoodieArchivedMetaEntry > -- > > Key: HUDI-1281 > URL: https://issues.apache.org/jira/browse/HUDI-1281 > Project: Apache Hudi > Issue Type: Wish > Components: Code Cleanup >Reporter: satish >Assignee: Vinay >Priority: Minor > Labels: code-cleanup > > incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java > does not include deltacommit > Both commit and deltacommit use the 'commit' command, which can be confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry
[ https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354433#comment-17354433 ] Vinay commented on HUDI-1281: - [~shivnarayan] Since there is no activity on this ticket, I can pick it up. I see we have not yet added `deltacommit` to the ActionType enum. ``` public enum ActionType { //TODO HUDI-1281 make deltacommit part of this commit, savepoint, compaction, clean, rollback, replacecommit } ``` > deltacommit is not part of ActionType used in HoodieArchivedMetaEntry > -- > > Key: HUDI-1281 > URL: https://issues.apache.org/jira/browse/HUDI-1281 > Project: Apache Hudi > Issue Type: Wish > Components: Code Cleanup >Reporter: satish >Assignee: Vinay >Priority: Minor > Labels: code-cleanup > > incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java > does not include deltacommit > Both commit and deltacommit use the 'commit' command, which can be confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
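For reference, a minimal sketch of what the change proposed in the comment could look like, assuming the new constant can simply be appended to the existing list; whether adding `deltacommit` affects anything serialized via HoodieArchivedMetaEntry would need to be verified against the actual code:

{code:java}
// Sketch: add deltacommit alongside the existing action types, following
// the lower-case naming convention of the constants quoted above.
public enum ActionType {
  commit, savepoint, compaction, clean, rollback, replacecommit, deltacommit
}
{code}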