[jira] [Created] (HUDI-4381) Add support for reading Protobuf data from Kafka

2022-07-11 Thread Vinay (Jira)
Vinay created HUDI-4381:
---

 Summary: Add support for reading Protobuf data from Kafka
 Key: HUDI-4381
 URL: https://issues.apache.org/jira/browse/HUDI-4381
 Project: Apache Hudi
  Issue Type: New Feature
  Components: deltastreamer
Reporter: Vinay


Currently, DeltaStreamer supports Avro/JSON while reading from Kafka. We should 
add support for Protobuf as well.

The schema can be read from a file or from a schema registry.
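For a concrete starting point, here is a minimal, hypothetical sketch of the 
file-based path (class and method names are illustrative, not an agreed design):

{code:java}
// Hypothetical sketch only. Assumes a self-contained .proto (no imports) whose
// descriptor set was written out via `protoc --descriptor_set_out=schema.desc`.
import com.google.protobuf.DescriptorProtos;
import com.google.protobuf.Descriptors;
import com.google.protobuf.DynamicMessage;
import java.io.FileInputStream;

public class ProtobufDecodeSketch {
  public static DynamicMessage decode(byte[] kafkaValue, String descFile,
      String messageName) throws Exception {
    DescriptorProtos.FileDescriptorSet set =
        DescriptorProtos.FileDescriptorSet.parseFrom(new FileInputStream(descFile));
    Descriptors.FileDescriptor fd = Descriptors.FileDescriptor.buildFrom(
        set.getFile(0), new Descriptors.FileDescriptor[] {});
    Descriptors.Descriptor descriptor = fd.findMessageTypeByName(messageName);
    // DynamicMessage keeps the source schema-agnostic, similar to the Avro path.
    return DynamicMessage.parseFrom(descriptor, kafkaValue);
  }
}
{code}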



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-3496) Add note for S3 Versioned Bucket

2022-02-24 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-3496:

Status: In Progress  (was: Open)

> Add note for S3 Versioned Bucket
> 
>
> Key: HUDI-3496
> URL: https://issues.apache.org/jira/browse/HUDI-3496
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Vinay
>Assignee: Vinay
>Priority: Trivial
>
> We faced an issue where the AWS SDK List API was choking because the number of 
> delete markers for the /.hoodie/ and .hoodie/temp folders had increased 
> to 1000. 
>  
> This task is to add a note on setting up the S3 Lifecycle rule properly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3496) Add note for S3 Versioned Bucket

2022-02-24 Thread Vinay (Jira)
Vinay created HUDI-3496:
---

 Summary: Add note for S3 Versioned Bucket
 Key: HUDI-3496
 URL: https://issues.apache.org/jira/browse/HUDI-3496
 Project: Apache Hudi
  Issue Type: Task
  Components: docs
Reporter: Vinay
Assignee: Vinay


We faced an issue where the AWS SDK List API was choking because the number of 
delete markers for the /.hoodie/ and .hoodie/temp folders had increased to 
1000. 

This task is to add a note on setting up the S3 Lifecycle rule properly.
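Purely as an illustration of the kind of rule such a note might recommend (not 
part of the issue text; bucket, prefix, and retention values are made up, using 
the AWS SDK v2):

{code:java}
// Illustrative sketch: a lifecycle rule that cleans up expired delete markers
// under a table's .hoodie/ prefix on a versioned bucket.
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.*;

public class HoodieLifecycleRuleSketch {
  public static void main(String[] args) {
    try (S3Client s3 = S3Client.create()) {
      LifecycleRule rule = LifecycleRule.builder()
          .id("expire-hoodie-delete-markers")
          .filter(LifecycleRuleFilter.builder().prefix("warehouse/table/.hoodie/").build())
          .status(ExpirationStatus.ENABLED)
          // Removes delete markers once all noncurrent versions are gone.
          .expiration(LifecycleExpiration.builder().expiredObjectDeleteMarker(true).build())
          .noncurrentVersionExpiration(
              NoncurrentVersionExpiration.builder().noncurrentDays(7).build())
          .build();
      s3.putBucketLifecycleConfiguration(PutBucketLifecycleConfigurationRequest.builder()
          .bucket("some-bucket")
          .lifecycleConfiguration(BucketLifecycleConfiguration.builder().rules(rule).build())
          .build());
    }
  }
}
{code}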



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HUDI-310) DynamoDB/Kinesis Change Capture using Delta Streamer

2022-01-09 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471309#comment-17471309
 ] 

Vinay commented on HUDI-310:


[~vinoth]  I remember discussing this; sorry it went to the backlog. I am 
taking this up. 

> DynamoDB/Kinesis Change Capture using Delta Streamer
> 
>
> Key: HUDI-310
> URL: https://issues.apache.org/jira/browse/HUDI-310
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Vinoth Chandar
>Assignee: Vinay
>Priority: Major
>
> The goal here is to do CDC from DynamoDB and then have it be ingested into S3 
> as a Hudi dataset. 
> A few resources: 
>  # DynamoDB Streams 
> [https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html]
>  provides change capture logs in Kinesis. 
>  # Walkthrough 
> [https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.KCLAdapter.Walkthrough.html]
>  Code [https://github.com/awslabs/dynamodb-streams-kinesis-adapter] 
>  # Spark Streaming has support for reading Kinesis streams 
> [https://spark.apache.org/docs/2.4.4/streaming-kinesis-integration.html]; one 
> of the many resources showing how to change the Spark Kinesis example code to 
> consume a DynamoDB stream: 
> [https://medium.com/@ravi72munde/using-spark-streaming-with-dynamodb-d325b9a73c79]
>  # In DeltaStreamer, we need to add some form of KinesisSource that returns an 
> RDD with new data every time `fetchNewData` is called 
> [https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/Source.java]
>  (a skeletal sketch follows below). DeltaStreamer itself does not use Spark 
> Streaming APIs. 
>  # Internally, we have Avro, JSON, and Row sources that extract data in these 
> formats. 
> Open questions: 
>  # Should this just be a KinesisSource inside Hudi that needs to be 
> configured differently, or do we need two sources: a DynamoDBKinesisSource (that 
> makes some DynamoDB Stream specific setup/assumptions) and a plain 
> KinesisSource? What's more valuable to do, if we have to pick one? 
>  # For Kafka integration, we reused the KafkaRDD from Spark Streaming and 
> easily avoided writing a lot of code by hand. Could we pull the same thing off 
> for Kinesis? (probably needs digging through Spark code) 
>  # What's the format of the data for DynamoDB streams? 
>  
> We should probably flesh these out before going ahead with implementation. 
>  
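A minimal skeleton of the KinesisSource idea in item #4, assuming the Source 
contract linked there. Package names, the checkpoint format, and the fetch 
logic are assumptions, not a settled design:

{code:java}
import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.utilities.schema.SchemaProvider;
import org.apache.hudi.utilities.sources.InputBatch;
import org.apache.hudi.utilities.sources.Source;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class KinesisSource extends Source<JavaRDD<String>> {

  // Note: the real Source constructor may also take a SourceType enum.
  public KinesisSource(TypedProperties props, JavaSparkContext sparkContext,
      SparkSession sparkSession, SchemaProvider schemaProvider) {
    super(props, sparkContext, sparkSession, schemaProvider);
  }

  @Override
  protected InputBatch<JavaRDD<String>> fetchNewData(Option<String> lastCkptStr,
      long sourceLimit) {
    // 1. Parse lastCkptStr into per-shard positions (e.g. "shardId:sequenceNumber,...").
    // 2. Pull up to sourceLimit records via the Kinesis client / KCL adapter.
    // 3. Parallelize the payloads and hand back the new checkpoint string.
    JavaRDD<String> batch = sparkContext.emptyRDD(); // placeholder for the real fetch
    return new InputBatch<>(Option.of(batch), lastCkptStr.orElse(""));
  }
}
{code}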



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-310) DynamoDB/Kinesis Change Capture using Delta Streamer

2022-01-09 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-310:
--

Assignee: Vinay  (was: Suneel Marthi)

> DynamoDB/Kinesis Change Capture using Delta Streamer
> 
>
> Key: HUDI-310
> URL: https://issues.apache.org/jira/browse/HUDI-310
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Vinoth Chandar
>Assignee: Vinay
>Priority: Major
>
> The goal here is to do CDC from DynamoDB and then have it be ingested into S3 
> as a Hudi dataset. 
> A few resources: 
>  # DynamoDB Streams 
> [https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html]
>  provides change capture logs in Kinesis. 
>  # Walkthrough 
> [https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.KCLAdapter.Walkthrough.html]
>  Code [https://github.com/awslabs/dynamodb-streams-kinesis-adapter] 
>  # Spark Streaming has support for reading Kinesis streams 
> [https://spark.apache.org/docs/2.4.4/streaming-kinesis-integration.html]; one 
> of the many resources showing how to change the Spark Kinesis example code to 
> consume a DynamoDB stream: 
> [https://medium.com/@ravi72munde/using-spark-streaming-with-dynamodb-d325b9a73c79]
>  # In DeltaStreamer, we need to add some form of KinesisSource that returns an 
> RDD with new data every time `fetchNewData` is called 
> [https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/Source.java]
> . DeltaStreamer itself does not use Spark Streaming APIs. 
>  # Internally, we have Avro, JSON, and Row sources that extract data in these 
> formats. 
> Open questions: 
>  # Should this just be a KinesisSource inside Hudi that needs to be 
> configured differently, or do we need two sources: a DynamoDBKinesisSource (that 
> makes some DynamoDB Stream specific setup/assumptions) and a plain 
> KinesisSource? What's more valuable to do, if we have to pick one? 
>  # For Kafka integration, we reused the KafkaRDD from Spark Streaming and 
> easily avoided writing a lot of code by hand. Could we pull the same thing off 
> for Kinesis? (probably needs digging through Spark code) 
>  # What's the format of the data for DynamoDB streams? 
>  
> We should probably flesh these out before going ahead with implementation. 
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HUDI-2257) Add a note to set keygenerator class while deleting data

2021-10-19 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HUDI-2257.
-
Resolution: Done

> Add a note to set keygenerator class while deleting data
> 
>
> Key: HUDI-2257
> URL: https://issues.apache.org/jira/browse/HUDI-2257
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>  Labels: pull-request-available
>
> Copying examples from this blog 
> [https://hudi.apache.org/blog/delete-support-in-hudi/] does not work as-is 
> for a non-partitioned table; the user has to explicitly set the following 
> option for the delete to work:
> {code:java}
> option("hoodie.datasource.write.keygenerator.class","org.apache.hudi.keygen.NonpartitionedKeyGenerator")
> {code}
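For illustration, a hedged Java sketch of the full write (assumes `spark`, 
`basePath`, and `tableName` are in scope; "uuid" as the record key field is 
just an example):

{code:java}
// Hedged usage sketch; the option keys are standard Hudi datasource configs.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class DeleteNonPartitionedSketch {
  public static void deleteSome(SparkSession spark, String basePath, String tableName) {
    Dataset<Row> toDelete = spark.read().format("hudi").load(basePath).limit(10);
    toDelete.write().format("hudi")
        .option("hoodie.datasource.write.operation", "delete")
        .option("hoodie.datasource.write.recordkey.field", "uuid")
        .option("hoodie.datasource.write.keygenerator.class",
            "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
        .option("hoodie.table.name", tableName)
        .mode(SaveMode.Append)
        .save(basePath);
  }
}
{code}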



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2499) jdbc-url, user and pass is required to be passed in HMS mode

2021-09-29 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2499:

Summary: jdbc-url, user and pass is required to be passed in HMS mode  
(was: jdbc-url is required to be passed in HMS mode)

> jdbc-url, user and pass is required to be passed in HMS mode
> 
>
> Key: HUDI-2499
> URL: https://issues.apache.org/jira/browse/HUDI-2499
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.9.0
>Reporter: Vinay
>Assignee: Vinay
>Priority: Major
>
> When trying out HMS mode, the command fails if jdbc-url is not passed. This is 
> not a required property for HMS mode. 
> {code:java}
> Exception in thread "main" com.beust.jcommander.ParameterException: The 
> following option is required: [--jdbc-url]
> at com.beust.jcommander.JCommander.validateOptions(JCommander.java:381)
> {code}
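A minimal sketch of the shape of the fix (field and class names here are 
hypothetical, not the actual HiveSyncTool code): declare the option without 
`required = true` and validate it only for the JDBC path.

{code:java}
import com.beust.jcommander.JCommander;
import com.beust.jcommander.Parameter;

public class HiveSyncParamsSketch {
  @Parameter(names = {"--jdbc-url"}, description = "Hive JDBC url") // was: required = true
  public String jdbcUrl;

  @Parameter(names = {"--sync-mode"}, description = "jdbc or hms")
  public String syncMode = "jdbc";

  public static void main(String[] args) {
    HiveSyncParamsSketch params = new HiveSyncParamsSketch();
    JCommander.newBuilder().addObject(params).build().parse(args);
    // Enforce the option only where it is actually needed.
    if ("jdbc".equals(params.syncMode) && params.jdbcUrl == null) {
      throw new IllegalArgumentException("--jdbc-url is required in jdbc mode");
    }
  }
}
{code}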



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2499) jdbc-url is required to be passed in HMS mode

2021-09-29 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2499:

Affects Version/s: 0.9.0

> jdbc-url is required to be passed in HMS mode
> -
>
> Key: HUDI-2499
> URL: https://issues.apache.org/jira/browse/HUDI-2499
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Vinay
>Assignee: Vinay
>Priority: Major
>
> When trying out HMS mode, the command fails if jdbc-url is not passed. This is 
> not a required property for HMS mode. 
> {code:java}
> Exception in thread "main" com.beust.jcommander.ParameterException: The 
> following option is required: [--jdbc-url]
> at com.beust.jcommander.JCommander.validateOptions(JCommander.java:381)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2499) jdbc-url is required to be passed in HMS mode

2021-09-29 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2499:

Component/s: Hive Integration

> jdbc-url is required to be passed in HMS mode
> -
>
> Key: HUDI-2499
> URL: https://issues.apache.org/jira/browse/HUDI-2499
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.9.0
>Reporter: Vinay
>Assignee: Vinay
>Priority: Major
>
> When trying out HMS mode, the command fails if jdbc-url is not passed. This is 
> not a required property for HMS mode. 
> {code:java}
> Exception in thread "main" com.beust.jcommander.ParameterException: The 
> following option is required: [--jdbc-url]
> at com.beust.jcommander.JCommander.validateOptions(JCommander.java:381)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2499) jdbc-url is required to be passed in HMS mode

2021-09-29 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2499:

Status: In Progress  (was: Open)

> jdbc-url is required to be passed in HMS mode
> -
>
> Key: HUDI-2499
> URL: https://issues.apache.org/jira/browse/HUDI-2499
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Vinay
>Assignee: Vinay
>Priority: Major
>
> When trying out HMS mode, the command fails if jdbc-url is not passed. This is 
> not a required property for HMS mode. 
> {code:java}
> Exception in thread "main" com.beust.jcommander.ParameterException: The 
> following option is required: [--jdbc-url]
> at com.beust.jcommander.JCommander.validateOptions(JCommander.java:381)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2499) jdbc-url is required to be passed in HMS mode

2021-09-29 Thread Vinay (Jira)
Vinay created HUDI-2499:
---

 Summary: jdbc-url is required to be passed in HMS mode
 Key: HUDI-2499
 URL: https://issues.apache.org/jira/browse/HUDI-2499
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Vinay
Assignee: Vinay


When trying out HMS mode, the command fails if jdbc-url is not passed. This is 
not a required property for HMS mode. 
{code:java}
Exception in thread "main" com.beust.jcommander.ParameterException: The 
following option is required: [--jdbc-url]
at com.beust.jcommander.JCommander.validateOptions(JCommander.java:381)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2498) Support Hive sync to work with s3

2021-09-28 Thread Vinay (Jira)
Vinay created HUDI-2498:
---

 Summary: Support Hive sync to work with s3
 Key: HUDI-2498
 URL: https://issues.apache.org/jira/browse/HUDI-2498
 Project: Apache Hudi
  Issue Type: New Feature
  Components: Hive Integration
Reporter: Vinay
Assignee: Vinay


Currently, Hive sync does not work with S3 out of the box; we have to add 
dependencies explicitly to the run_hive_sync script to make it work. 

It works fine on EMR but does not work on standalone clusters.
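For illustration only, the kind of change implied (jar names, versions, and the 
variable name are placeholders; versions must match the cluster's Hadoop):

{code:java}
# Illustrative placeholders only: extra jars appended to the sync script's
# classpath so the S3 filesystem and AWS credentials classes resolve on a
# standalone cluster.
HADOOP_AWS_JAR=/path/to/hadoop-aws-<hadoop.version>.jar
AWS_SDK_JAR=/path/to/aws-java-sdk-bundle-<sdk.version>.jar
HIVE_SYNC_CLASSPATH=$HIVE_SYNC_CLASSPATH:$HADOOP_AWS_JAR:$AWS_SDK_JAR
{code}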



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2257) Add a note to set keygenerator class while deleting data

2021-07-30 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2257:

Status: In Progress  (was: Open)

> Add a note to set keygenerator class while deleting data
> 
>
> Key: HUDI-2257
> URL: https://issues.apache.org/jira/browse/HUDI-2257
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>
> Copying examples from this blog 
> [https://hudi.apache.org/blog/delete-support-in-hudi/] does not work as-is 
> for a non-partitioned table; the user has to explicitly set the following 
> option for the delete to work:
> {code:java}
> option("hoodie.datasource.write.keygenerator.class","org.apache.hudi.keygen.NonpartitionedKeyGenerator")
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2257) Add a note to set keygenerator class while deleting data

2021-07-30 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2257:

Description: 
Copying examples from this blog 
[https://hudi.apache.org/blog/delete-support-in-hudi/] does not work as-is 
for a non-partitioned table; the user has to explicitly set the following 
option for the delete to work:
{code:java}
option("hoodie.datasource.write.keygenerator.class","org.apache.hudi.keygen.NonpartitionedKeyGenerator")
{code}

  was:Copying examples from this blog 
[https://hudi.apache.org/blog/delete-support-in-hudi/] , does not work as is 
for Non-Partitioned table, user have to explic


> Add a note to set keygenerator class while deleting data
> 
>
> Key: HUDI-2257
> URL: https://issues.apache.org/jira/browse/HUDI-2257
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>
> Copying examples from this blog 
> [https://hudi.apache.org/blog/delete-support-in-hudi/] does not work as-is 
> for a non-partitioned table; the user has to explicitly set the following 
> option for the delete to work:
> {code:java}
> option("hoodie.datasource.write.keygenerator.class","org.apache.hudi.keygen.NonpartitionedKeyGenerator")
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2257) Add a note to set keygenerator class while deleting data

2021-07-30 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2257:

Description: Copying examples from this blog 
[https://hudi.apache.org/blog/delete-support-in-hudi/] , does not work as is 
for Non-Partitioned table, user have to explic

> Add a note to set keygenerator class while deleting data
> 
>
> Key: HUDI-2257
> URL: https://issues.apache.org/jira/browse/HUDI-2257
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>
> Copying examples from this blog 
> [https://hudi.apache.org/blog/delete-support-in-hudi/] , does not work as is 
> for Non-Partitioned table, user have to explic



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2257) Add a note to set keygenerator class while deleting data

2021-07-30 Thread Vinay (Jira)
Vinay created HUDI-2257:
---

 Summary: Add a note to set keygenerator class while deleting data
 Key: HUDI-2257
 URL: https://issues.apache.org/jira/browse/HUDI-2257
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Docs
Reporter: Vinay
Assignee: Vinay






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2192) Clean up Multiple versions of scala libraries detected Warning

2021-07-21 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HUDI-2192.
-
Resolution: Fixed

Fixed - 5a94b6bf54b18739da55ebde10adf93f133e3204

> Clean up Multiple versions of scala libraries detected Warning
> --
>
> Key: HUDI-2192
> URL: https://issues.apache.org/jira/browse/HUDI-2192
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Building from source results in the following warning: 
>  
> {code:java}
> [INFO] --- scala-maven-plugin:3.3.1:testCompile (scala-test-compile) @ 
> hudi-cli ---
> [WARNING]  Expected all dependencies to require Scala version: 2.11.12
> [WARNING]  org.apache.hudi:hudi-cli:0.9.0-SNAPSHOT requires scala version: 
> 2.11.12
> [WARNING]  org.apache.hudi:hudi-spark-client:0.9.0-SNAPSHOT requires scala 
> version: 2.11.12
> [WARNING]  org.apache.hudi:hudi-spark_2.11:0.9.0-SNAPSHOT requires scala 
> version: 2.11.12
> [WARNING]  org.apache.hudi:hudi-spark-client:0.9.0-SNAPSHOT requires scala 
> version: 2.11.12
> [WARNING]  org.apache.hudi:hudi-spark-common_2.11:0.9.0-SNAPSHOT requires 
> scala version: 2.11.12
> [WARNING]  org.apache.hudi:hudi-spark_2.11:0.9.0-SNAPSHOT requires scala 
> version: 2.11.12
> [WARNING]  com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7.1 
> requires scala version: 2.11.8
> [WARNING] Multiple versions of scala libraries detected!
> {code}
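One conventional way to silence this, sketched from the coordinates in the 
warning above (not necessarily the fix that landed in 5a94b6bf): keep 
jackson-module-scala_2.11 from dragging in its own scala-library 2.11.8.

{code:java}
<!-- Hedged Maven sketch: exclude the transitive scala-library so the 2.11.12
     version managed by the build wins everywhere. -->
<dependency>
  <groupId>com.fasterxml.jackson.module</groupId>
  <artifactId>jackson-module-scala_2.11</artifactId>
  <version>2.6.7.1</version>
  <exclusions>
    <exclusion>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}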



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2192) Clean up Multiple versions of scala libraries detected Warning

2021-07-18 Thread Vinay (Jira)
Vinay created HUDI-2192:
---

 Summary: Clean up Multiple versions of scala libraries detected 
Warning
 Key: HUDI-2192
 URL: https://issues.apache.org/jira/browse/HUDI-2192
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Code Cleanup
Reporter: Vinay
Assignee: Vinay
 Fix For: 0.9.0


Building from source results in the following warning: 

 
{code:java}
[INFO] --- scala-maven-plugin:3.3.1:testCompile (scala-test-compile) @ hudi-cli 
---
[WARNING]  Expected all dependencies to require Scala version: 2.11.12
[WARNING]  org.apache.hudi:hudi-cli:0.9.0-SNAPSHOT requires scala version: 
2.11.12
[WARNING]  org.apache.hudi:hudi-spark-client:0.9.0-SNAPSHOT requires scala 
version: 2.11.12
[WARNING]  org.apache.hudi:hudi-spark_2.11:0.9.0-SNAPSHOT requires scala 
version: 2.11.12
[WARNING]  org.apache.hudi:hudi-spark-client:0.9.0-SNAPSHOT requires scala 
version: 2.11.12
[WARNING]  org.apache.hudi:hudi-spark-common_2.11:0.9.0-SNAPSHOT requires scala 
version: 2.11.12
[WARNING]  org.apache.hudi:hudi-spark_2.11:0.9.0-SNAPSHOT requires scala 
version: 2.11.12
[WARNING]  com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7.1 
requires scala version: 2.11.8
[WARNING] Multiple versions of scala libraries detected!

{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2192) Clean up Multiple versions of scala libraries detected Warning

2021-07-18 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2192:

Status: In Progress  (was: Open)

> Clean up Multiple versions of scala libraries detected Warning
> --
>
> Key: HUDI-2192
> URL: https://issues.apache.org/jira/browse/HUDI-2192
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
> Fix For: 0.9.0
>
>
> Building from source results in the following warning: 
>  
> {code:java}
> [INFO] --- scala-maven-plugin:3.3.1:testCompile (scala-test-compile) @ 
> hudi-cli ---
> [WARNING]  Expected all dependencies to require Scala version: 2.11.12
> [WARNING]  org.apache.hudi:hudi-cli:0.9.0-SNAPSHOT requires scala version: 
> 2.11.12
> [WARNING]  org.apache.hudi:hudi-spark-client:0.9.0-SNAPSHOT requires scala 
> version: 2.11.12
> [WARNING]  org.apache.hudi:hudi-spark_2.11:0.9.0-SNAPSHOT requires scala 
> version: 2.11.12
> [WARNING]  org.apache.hudi:hudi-spark-client:0.9.0-SNAPSHOT requires scala 
> version: 2.11.12
> [WARNING]  org.apache.hudi:hudi-spark-common_2.11:0.9.0-SNAPSHOT requires 
> scala version: 2.11.12
> [WARNING]  org.apache.hudi:hudi-spark_2.11:0.9.0-SNAPSHOT requires scala 
> version: 2.11.12
> [WARNING]  com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7.1 
> requires scala version: 2.11.8
> [WARNING] Multiple versions of scala libraries detected!
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2168) AccessControlException for anonymous user

2021-07-13 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HUDI-2168.
-
Resolution: Fixed

> AccessControlException for anonymous user
> -
>
> Key: HUDI-2168
> URL: https://issues.apache.org/jira/browse/HUDI-2168
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Testing
>Reporter: Vinay
>Assignee: Vinay
>Priority: Trivial
>  Labels: pull-request-available
>
> Users are facing the following exception while executing test cases that 
> depend on starting the Hive service:
>  
> {code:java}
> Got exception: org.apache.hadoop.security.AccessControlException Permission 
> denied: user=anonymous, access=WRITE
> {code}
> This happens specifically when clearing the Hive DB:
> {code:java}
> client.updateHiveSQL("drop database if exists " + 
> hiveSyncConfig.databaseName);
> {code}
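A hedged sketch of one common workaround in test setup (paths are illustrative; 
whether this matches the merged fix is not confirmed here): relax the Hive 
scratch dir on the test mini-cluster so the anonymous user can WRITE during 
cleanup.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class HivePermissionsSketch {
  public static void openUpScratchDir(Configuration hadoopConf) throws Exception {
    FileSystem fs = FileSystem.get(hadoopConf);
    Path scratch = new Path("/tmp/hive"); // illustrative scratch/warehouse dir
    fs.mkdirs(scratch);
    fs.setPermission(scratch, new FsPermission((short) 0777));
  }
}
{code}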



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2168) AccessControlException for anonymous user

2021-07-12 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2168:

Status: In Progress  (was: Open)

> AccessControlException for anonymous user
> -
>
> Key: HUDI-2168
> URL: https://issues.apache.org/jira/browse/HUDI-2168
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Testing
>Reporter: Vinay
>Assignee: Vinay
>Priority: Trivial
>
> Users are facing the following exception while executing test cases that 
> depend on starting the Hive service:
>  
> {code:java}
> Got exception: org.apache.hadoop.security.AccessControlException Permission 
> denied: user=anonymous, access=WRITE
> {code}
> This happens specifically when clearing the Hive DB:
> {code:java}
> client.updateHiveSQL("drop database if exists " + 
> hiveSyncConfig.databaseName);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2168) AccessControlException for anonymous user

2021-07-12 Thread Vinay (Jira)
Vinay created HUDI-2168:
---

 Summary: AccessControlException for anonymous user
 Key: HUDI-2168
 URL: https://issues.apache.org/jira/browse/HUDI-2168
 Project: Apache Hudi
  Issue Type: Task
  Components: Testing
Reporter: Vinay
Assignee: Vinay


Users are facing the following exception while executing test cases that depend 
on starting the Hive service:

{code:java}
Got exception: org.apache.hadoop.security.AccessControlException Permission 
denied: user=anonymous, access=WRITE
{code}
This happens specifically when clearing the Hive DB:
{code:java}
client.updateHiveSQL("drop database if exists " + hiveSyncConfig.databaseName);
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2083) Hudi CLI does not work with S3

2021-07-03 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2083:

Status: In Progress  (was: Open)

> Hudi CLI does not work with S3
> --
>
> Key: HUDI-2083
> URL: https://issues.apache.org/jira/browse/HUDI-2083
> Project: Apache Hudi
>  Issue Type: Task
>  Components: CLI
>Reporter: Vinay
>Assignee: Vinay
>Priority: Major
>
> Hudi CLI throws an exception when trying to connect to an S3 path:
> {code:java}
> create --path s3://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 
> --tableType MERGE_ON_READ
> Failed to get instance of org.apache.hadoop.fs.FileSystem
> org.apache.hudi.exception.HoodieIOException: Failed to get instance of 
> org.apache.hadoop.fs.FileSystem
> at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:98)
> =
> create --path s3a://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 
> --tableType MERGE_ON_READ
> Command failed java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
> java.lang.ClassNotFoundException: Class 
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
> at 
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
> {code}
> This could be because the target/lib folder does not contain the hadoop-aws or 
> aws-s3 dependency.
>  
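A hedged sketch of the wiring that is missing (standard s3a settings; the 
credentials provider shown is just one common choice): once hadoop-aws and the 
matching AWS SDK bundle jars are actually on hudi-cli's classpath (e.g. under 
target/lib), configuration like the following lets the scheme resolve.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class S3aConfSketch {
  public static Configuration s3aConf() {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
    // One common credentials choice; any standard AWSCredentialsProvider works.
    conf.set("fs.s3a.aws.credentials.provider",
        "com.amazonaws.auth.DefaultAWSCredentialsProviderChain");
    return conf;
  }
}
{code}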



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer

2021-06-30 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HUDI-1910.
-
Resolution: Implemented

> Supporting Kafka based checkpointing for HoodieDeltaStreamer
> 
>
> Key: HUDI-1910
> URL: https://issues.apache.org/jira/browse/HUDI-1910
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
>  Labels: pull-request-available, sev:normal, triaged
>
> HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some 
> users have requested support for Kafka based checkpoints for freshness 
> auditing purposes. This ticket tracks any implementation for that. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer

2021-06-30 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1910:

Status: In Progress  (was: Open)

> Supporting Kafka based checkpointing for HoodieDeltaStreamer
> 
>
> Key: HUDI-1910
> URL: https://issues.apache.org/jira/browse/HUDI-1910
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
>  Labels: pull-request-available, sev:normal, triaged
>
> HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some 
> users have requested support for Kafka based checkpoints for freshness 
> auditing purposes. This ticket tracks any implementation for that. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1944) Support Hudi to read from Kafka Consumer Group Offset

2021-06-28 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1944:

Status: In Progress  (was: Open)

> Support Hudi to read from Kafka Consumer Group Offset
> -
>
> Key: HUDI-1944
> URL: https://issues.apache.org/jira/browse/HUDI-1944
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: DeltaStreamer
>Reporter: Vinay
>Assignee: Vinay
>Priority: Major
>
> Currently, Hudi provides options to read from the latest or earliest offset. 
> We should also give users an option to read from the consumer group offset.
> This change will be in `KafkaOffsetGen`, where we can add a method to support 
> this functionality (a sketch follows below).
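A minimal sketch of the lookup involved (where this would land in KafkaOffsetGen 
and the config name for the new choice are assumptions; consumer setup omitted):

{code:java}
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class GroupOffsetSketch {
  // Kafka 2.4+ batch API; partitions without a committed offset map to null
  // and would need an earliest/latest fallback.
  public static Map<TopicPartition, OffsetAndMetadata> committedOffsets(
      KafkaConsumer<?, ?> consumer, Set<TopicPartition> partitions) {
    return consumer.committed(partitions);
  }
}
{code}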



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer

2021-06-28 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370650#comment-17370650
 ] 

Vinay commented on HUDI-1910:
-

Done - 039aeb6dcee0a8eb4372c079ec07b8fc2582e41f

> Supporting Kafka based checkpointing for HoodieDeltaStreamer
> 
>
> Key: HUDI-1910
> URL: https://issues.apache.org/jira/browse/HUDI-1910
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
>  Labels: pull-request-available, sev:normal, triaged
>
> HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some 
> users have requested support for Kafka based checkpoints for freshness 
> auditing purposes. This ticket tracks any implementation for that. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2083) Hudi CLI does not work with S3

2021-06-28 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2083:

Description: 
Hudi CLI throws an exception when trying to connect to an S3 path:
{code:java}
create --path s3://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 
--tableType MERGE_ON_READ

Failed to get instance of org.apache.hadoop.fs.FileSystem
org.apache.hudi.exception.HoodieIOException: Failed to get instance of 
org.apache.hadoop.fs.FileSystem
at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:98)

=

create --path s3a://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 
--tableType MERGE_ON_READ

Command failed java.lang.RuntimeException: java.lang.ClassNotFoundException: 
Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem 
not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)

{code}
This could be because the target/lib folder does not contain the hadoop-aws or 
aws-s3 dependency.

 

  was:
Hudi CLI gives exception when trying to connect to s3 path
{code:java}
create --path s3://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 
--tableType MERGE_ON_READ

Failed to get instance of org.apache.hadoop.fs.FileSystem
org.apache.hudi.exception.HoodieIOException: Failed to get instance of 
org.apache.hadoop.fs.FileSystem
at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:98)

=

create --path s3a://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 
--tableType MERGE_ON_READ

Command failed java.lang.RuntimeException: java.lang.ClassNotFoundException: 
Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem 
not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)

{code}
This could be because target/lib folder does not contain hadoop-aws or was-s3 
dependency.

 


> Hudi CLI does not work with S3
> --
>
> Key: HUDI-2083
> URL: https://issues.apache.org/jira/browse/HUDI-2083
> Project: Apache Hudi
>  Issue Type: Task
>  Components: CLI
>Reporter: Vinay
>Assignee: Vinay
>Priority: Major
>
> Hudi CLI throws an exception when trying to connect to an S3 path:
> {code:java}
> create --path s3://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 
> --tableType MERGE_ON_READ
> Failed to get instance of org.apache.hadoop.fs.FileSystem
> org.apache.hudi.exception.HoodieIOException: Failed to get instance of 
> org.apache.hadoop.fs.FileSystem
> at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:98)
> =
> create --path s3a://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 
> --tableType MERGE_ON_READ
> Command failed java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
> java.lang.ClassNotFoundException: Class 
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
> at 
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
> {code}
> This could be because the target/lib folder does not contain the hadoop-aws or 
> aws-s3 dependency.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2083) Hudi CLI does not with S3

2021-06-27 Thread Vinay (Jira)
Vinay created HUDI-2083:
---

 Summary: Hudi CLI does not with S3
 Key: HUDI-2083
 URL: https://issues.apache.org/jira/browse/HUDI-2083
 Project: Apache Hudi
  Issue Type: Task
  Components: CLI
Reporter: Vinay
Assignee: Vinay


Hudi CLI throws an exception when trying to connect to an S3 path:
{code:java}
create --path s3://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 
--tableType MERGE_ON_READ

Failed to get instance of org.apache.hadoop.fs.FileSystem
org.apache.hudi.exception.HoodieIOException: Failed to get instance of 
org.apache.hadoop.fs.FileSystem
at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:98)

=

create --path s3a://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 
--tableType MERGE_ON_READ

Command failed java.lang.RuntimeException: java.lang.ClassNotFoundException: 
Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem 
not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)

{code}
This could be because target/lib folder does not contain hadoop-aws or was-s3 
dependency.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2083) Hudi CLI does not work with S3

2021-06-27 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2083:

Summary: Hudi CLI does not work with S3  (was: Hudi CLI does not with S3)

> Hudi CLI does not work with S3
> --
>
> Key: HUDI-2083
> URL: https://issues.apache.org/jira/browse/HUDI-2083
> Project: Apache Hudi
>  Issue Type: Task
>  Components: CLI
>Reporter: Vinay
>Assignee: Vinay
>Priority: Major
>
> Hudi CLI throws an exception when trying to connect to an S3 path:
> {code:java}
> create --path s3://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 
> --tableType MERGE_ON_READ
> Failed to get instance of org.apache.hadoop.fs.FileSystem
> org.apache.hudi.exception.HoodieIOException: Failed to get instance of 
> org.apache.hadoop.fs.FileSystem
> at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:98)
> =
> create --path s3a://some-bucket/tmp/hudi/test_mor --tableName test_mor_s3 
> --tableType MERGE_ON_READ
> Command failed java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
> java.lang.ClassNotFoundException: Class 
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
> at 
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
> {code}
> This could be because target/lib folder does not contain hadoop-aws or was-s3 
> dependency.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability

2021-06-27 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1976:

Status: In Progress  (was: Open)

> Upgrade hive, jackson, log4j, hadoop to remove vulnerability
> 
>
> Key: HUDI-1976
> URL: https://issues.apache.org/jira/browse/HUDI-1976
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> [https://github.com/apache/hudi/issues/2827]
> [https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2824]
> [https://github.com/apache/hudi/issues/2823]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability

2021-06-27 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-1976:
---

Assignee: Vinay

> Upgrade hive, jackson, log4j, hadoop to remove vulnerability
> 
>
> Key: HUDI-1976
> URL: https://issues.apache.org/jira/browse/HUDI-1976
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> [https://github.com/apache/hudi/issues/2827]
> [https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2824]
> [https://github.com/apache/hudi/issues/2823]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2082) Provide option to choose Spark or Flink Delta Streamer

2021-06-27 Thread Vinay (Jira)
Vinay created HUDI-2082:
---

 Summary: Provide option to choose Spark or Flink Delta Streamer
 Key: HUDI-2082
 URL: https://issues.apache.org/jira/browse/HUDI-2082
 Project: Apache Hudi
  Issue Type: New Feature
  Components: DeltaStreamer
Reporter: Vinay
Assignee: Vinay


Currently, Hudi supports the Flink as well as the Spark engine, and there are 
two different DeltaStreamer classes:
1. HoodieDeltaStreamer
2. HoodieFlinkStreamer

We should have a provision to pass a flag like --runner to choose between 
Flink and Spark, and have a single entry-point class which takes all the 
common configs. 

Based on the runner flag, we can call HoodieDeltaStreamer or HoodieFlinkStreamer 
(a sketch follows below).

This also takes care of making DeltaStreamer engine-agnostic.
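A hedged sketch of such an entry point (class name and flag handling are 
illustrative; the Flink class's package follows the Flink module at the time):

{code:java}
public class UnifiedStreamerMain {
  public static void main(String[] args) throws Exception {
    String runner = "spark"; // default when no --runner flag is supplied
    for (int i = 0; i < args.length - 1; i++) {
      if ("--runner".equals(args[i])) {
        runner = args[i + 1];
      }
    }
    // In practice the --runner pair would be stripped from args before delegating.
    if ("flink".equalsIgnoreCase(runner)) {
      org.apache.hudi.streamer.HoodieFlinkStreamer.main(args);
    } else {
      org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(args);
    }
  }
}
{code}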



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x

2021-06-25 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369811#comment-17369811
 ] 

Vinay commented on HUDI-1975:
-

[~nishith29] I have tried the 1st option; the build is passing locally. Asked 
the reporter on [https://github.com/apache/hudi/issues/2774] to try out this 
change - [https://github.com/apache/hudi/pull/3160] 

> Upgrade java-prometheus-client from 3.1.2 to 4.x
> 
>
> Key: HUDI-1975
> URL: https://issues.apache.org/jira/browse/HUDI-1975
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Find more details here -> https://github.com/apache/hudi/issues/2774



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x

2021-06-25 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1975:

Status: In Progress  (was: Open)

> Upgrade java-prometheus-client from 3.1.2 to 4.x
> 
>
> Key: HUDI-1975
> URL: https://issues.apache.org/jira/browse/HUDI-1975
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Find more details here -> https://github.com/apache/hudi/issues/2774



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x

2021-06-25 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-1975:
---

Assignee: Vinay

> Upgrade java-prometheus-client from 3.1.2 to 4.x
> 
>
> Key: HUDI-1975
> URL: https://issues.apache.org/jira/browse/HUDI-1975
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Find more details here -> https://github.com/apache/hudi/issues/2774



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2060) Create Tests for KafkaOffsetGen

2021-06-25 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HUDI-2060.
-
Resolution: Done

Done - ed1a5daa9a15e9123aa7fdba5ce8262d1cae0704

> Create Tests for KafkaOffsetGen
> ---
>
> Key: HUDI-2060
> URL: https://issues.apache.org/jira/browse/HUDI-2060
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Testing
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>  Labels: pull-request-available
>
> We do not have tests for KafkaOffsetGen; there are important functions like 
> `getNextOffsetRanges` which should be tested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-2067) Sync all the options of FlinkOptions to FlinkStreamerConfig

2021-06-23 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-2067:
---

Assignee: Vinay

> Sync all the options of FlinkOptions to FlinkStreamerConfig
> ---
>
> Key: HUDI-2067
> URL: https://issues.apache.org/jira/browse/HUDI-2067
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Vinay
>Priority: Major
> Fix For: 0.9.0
>
>
> Sync the options so that the {{HoodieFlinkStreamer}} can have more config 
> options.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2060) Create Tests for KafkaOffsetGen

2021-06-23 Thread Vinay (Jira)
Vinay created HUDI-2060:
---

 Summary: Create Tests for KafkaOffsetGen
 Key: HUDI-2060
 URL: https://issues.apache.org/jira/browse/HUDI-2060
 Project: Apache Hudi
  Issue Type: Test
  Components: Testing
Reporter: Vinay
Assignee: Vinay


We do not have tests for KafkaOffsetGen; there are important functions like 
`getNextOffsetRanges` which should be tested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2060) Create Tests for KafkaOffsetGen

2021-06-23 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2060:

Status: In Progress  (was: Open)

> Create Tests for KafkaOffsetGen
> ---
>
> Key: HUDI-2060
> URL: https://issues.apache.org/jira/browse/HUDI-2060
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Testing
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>
> We do not have tests for KafkaOffsetGen; there are important functions like 
> `getNextOffsetRanges` which should be tested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-2020) Add Concurrency based configs to Write Configs

2021-06-22 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay closed HUDI-2020.
---
Resolution: Not A Problem

> Add Concurrency based configs to Write Configs
> --
>
> Key: HUDI-2020
> URL: https://issues.apache.org/jira/browse/HUDI-2020
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Vinay
>Priority: Minor
>
> Some of the configs mentioned here - 
> [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing]
>  are not present in any class. 
>  
> We should add these to the HoodieWriteConfig class. The following configs are 
> to be added:
> {code:java}
> hoodie.write.lock.provider
> hoodie.write.lock.zookeeper.url
> hoodie.write.lock.zookeeper.port
> hoodie.write.lock.zookeeper.lock_key
> hoodie.write.lock.zookeeper.base_path
> hoodie.write.lock.hivemetastore.database
> hoodie.write.lock.hivemetastore.table
> {code}
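For reference, an illustrative snippet of how these keys might be supplied in 
the writer's properties file (values are examples; per the later comment on 
this issue, the keys live in the LockConfiguration class):

{code:java}
# Illustrative values only.
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=zk-host
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.lock_key=my_table
hoodie.write.lock.zookeeper.base_path=/hudi/locks
{code}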



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2020) Add Concurrency based configs to Write Configs

2021-06-22 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2020:

Status: New  (was: Open)

> Add Concurrency based configs to Write Configs
> --
>
> Key: HUDI-2020
> URL: https://issues.apache.org/jira/browse/HUDI-2020
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Vinay
>Priority: Minor
>
> Some of the configs mentioned here - 
> [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing]
>  are not present in any class. 
>  
> We should add these to the HoodieWriteConfig class. The following configs are 
> to be added:
> {code:java}
> hoodie.write.lock.provider
> hoodie.write.lock.zookeeper.url
> hoodie.write.lock.zookeeper.port
> hoodie.write.lock.zookeeper.lock_key
> hoodie.write.lock.zookeeper.base_path
> hoodie.write.lock.hivemetastore.database
> hoodie.write.lock.hivemetastore.table
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-2020) Add Concurrency based configs to Write Configs

2021-06-22 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-2020:
---

Assignee: (was: Vinay)

> Add Concurrency based configs to Write Configs
> --
>
> Key: HUDI-2020
> URL: https://issues.apache.org/jira/browse/HUDI-2020
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Vinay
>Priority: Minor
>
> Some of the configs mentioned here - 
> [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing]
>  are not present in any class. 
>  
> We should add these to the HoodieWriteConfig class. The following configs are 
> to be added:
> {code:java}
> hoodie.write.lock.provider
> hoodie.write.lock.zookeeper.url
> hoodie.write.lock.zookeeper.port
> hoodie.write.lock.zookeeper.lock_key
> hoodie.write.lock.zookeeper.base_path
> hoodie.write.lock.hivemetastore.database
> hoodie.write.lock.hivemetastore.table
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2020) Add Concurrency based configs to Write Configs

2021-06-22 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2020:

Status: Open  (was: New)

> Add Concurrency based configs to Write Configs
> --
>
> Key: HUDI-2020
> URL: https://issues.apache.org/jira/browse/HUDI-2020
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Vinay
>Priority: Minor
>
> Some of the configs mentioned here - 
> [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing]
>  are not present in any class. 
>  
> We should add these to the HoodieWriteConfig class. The following configs are 
> to be added:
> {code:java}
> hoodie.write.lock.provider
> hoodie.write.lock.zookeeper.url
> hoodie.write.lock.zookeeper.port
> hoodie.write.lock.zookeeper.lock_key
> hoodie.write.lock.zookeeper.base_path
> hoodie.write.lock.hivemetastore.database
> hoodie.write.lock.hivemetastore.table
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2020) Add Concurrency based configs to Write Configs

2021-06-22 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367224#comment-17367224
 ] 

Vinay commented on HUDI-2020:
-

Not an issue; these configs are provided in the LockConfiguration class.

> Add Concurrency based configs to Write Configs
> --
>
> Key: HUDI-2020
> URL: https://issues.apache.org/jira/browse/HUDI-2020
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>
> Some of the configs mentioned here - 
> [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing]
>  are not present in any class. 
>  
> We should add these to the HoodieWriteConfig class. The following configs are 
> to be added:
> {code:java}
> hoodie.write.lock.provider
> hoodie.write.lock.zookeeper.url
> hoodie.write.lock.zookeeper.port
> hoodie.write.lock.zookeeper.lock_key
> hoodie.write.lock.zookeeper.base_path
> hoodie.write.lock.hivemetastore.database
> hoodie.write.lock.hivemetastore.table
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2020) Add Concurrency based configs to Write Configs

2021-06-22 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2020:

Status: In Progress  (was: Open)

> Add Concurrency based configs to Write Configs
> --
>
> Key: HUDI-2020
> URL: https://issues.apache.org/jira/browse/HUDI-2020
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>
> Some of the configs mentioned here - 
> [https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing]
>  are not present in any class. 
>  
> We should add these to the HoodieWriteConfig class. The following configs are 
> to be added:
> {code:java}
> hoodie.write.lock.provider
> hoodie.write.lock.zookeeper.url
> hoodie.write.lock.zookeeper.port
> hoodie.write.lock.zookeeper.lock_key
> hoodie.write.lock.zookeeper.base_path
> hoodie.write.lock.hivemetastore.database
> hoodie.write.lock.hivemetastore.table
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2035) Create document for PrometheusReporter

2021-06-16 Thread Vinay (Jira)
Vinay created HUDI-2035:
---

 Summary: Create document for PrometheusReporter
 Key: HUDI-2035
 URL: https://issues.apache.org/jira/browse/HUDI-2035
 Project: Apache Hudi
  Issue Type: Task
  Components: Docs
Reporter: Vinay


Although PrometheusReporter has been released, there is no documentation for it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1872) Move HoodieFlinkStreamer into hudi-utilities module

2021-06-16 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364250#comment-17364250
 ] 

Vinay commented on HUDI-1872:
-

Assigning to me as per the discussion on the mailing list; also, the PR is in a 
closed state - [https://github.com/apache/hudi/pull/2922] 

cc [~danny0405]

> Move HoodieFlinkStreamer into hudi-utilities module
> ---
>
> Key: HUDI-1872
> URL: https://issues.apache.org/jira/browse/HUDI-1872
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: 谢波
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1872) Move HoodieFlinkStreamer into hudi-utilities module

2021-06-16 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-1872:
---

Assignee: Vinay  (was: 谢波)

> Move HoodieFlinkStreamer into hudi-utilities module
> ---
>
> Key: HUDI-1872
> URL: https://issues.apache.org/jira/browse/HUDI-1872
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Vinay
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1872) Move HoodieFlinkStreamer into hudi-utilities module

2021-06-16 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1872:

Status: In Progress  (was: Open)

> Move HoodieFlinkStreamer into hudi-utilities module
> ---
>
> Key: HUDI-1872
> URL: https://issues.apache.org/jira/browse/HUDI-1872
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Vinay
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2020) Add Concurrency based configs to Write Configs

2021-06-15 Thread Vinay (Jira)
Vinay created HUDI-2020:
---

 Summary: Add Concurrency based configs to Write Configs
 Key: HUDI-2020
 URL: https://issues.apache.org/jira/browse/HUDI-2020
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Writer Core
Reporter: Vinay
Assignee: Vinay


Some of the configs mentioned here - 
[https://hudi.apache.org/docs/concurrency_control.html#enabling-multi-writing] 
are not present in any class. 

 

We should add these to the HoodieWriteConfig class. The following configs are to be added:
{code:java}
hoodie.write.lock.provider
hoodie.write.lock.zookeeper.url
hoodie.write.lock.zookeeper.port
hoodie.write.lock.zookeeper.lock_key
hoodie.write.lock.zookeeper.base_path

hoodie.write.lock.hivemetastore.database
hoodie.write.lock.hivemetastore.table
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2003) Auto Compute Compression

2021-06-14 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362925#comment-17362925
 ] 

Vinay commented on HUDI-2003:
-

[~nishith29] Please do update the description if I have missed anything here

> Auto Compute Compression
> 
>
> Key: HUDI-2003
> URL: https://issues.apache.org/jira/browse/HUDI-2003
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Vinay
>Priority: Major
>
> Context : 
> Submitted a Spark job to read 3-4B ORC records and write them out in Hudi format. 
> The following table captures all the runs that I carried out with 
> different options:
>  
> ||CONFIG ||Number of Files Created||Size of each file||
> |PARQUET_FILE_MAX_BYTES=DEFAULT|30K|21MB|
> |PARQUET_FILE_MAX_BYTES=1GB|3700|178MB|
> |PARQUET_FILE_MAX_BYTES=1GB
> COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE=110|Same as before|Same as before|
> |PARQUET_FILE_MAX_BYTES=1GB
> BULKINSERT_PARALLELISM=100|Same as before|Same as before|
> |PARQUET_FILE_MAX_BYTES=4GB|1600|675MB|
> |PARQUET_FILE_MAX_BYTES=6GB|669|1012MB|
> Based on these runs, it appears that the compression ratio is off. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2004) Move KafkaOffsetGen.CheckpointUtils test cases to independent class and improve coverage

2021-06-14 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HUDI-2004.
-
Resolution: Done

Done - 769dd2d7c98558146eb4accb75b6d8e339ae6e0f

> Move KafkaOffsetGen.CheckpointUtils test cases to independent class and 
> improve coverage
> 
>
> Key: HUDI-2004
> URL: https://issues.apache.org/jira/browse/HUDI-2004
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>  Labels: pull-request-available
>
> Currently the KafkaOffsetGen.CheckpointUtils test cases are present in 
> TestKafkaSource, which starts up hdfs, hive, and zk services locally. This is not 
> required for the CheckpointUtils test cases, hence they should be moved to an 
> independent test class of their own.
>  
> Also, CheckpointUtils.strToOffsets and CheckpointUtils.offsetsToStr are not 
> unit tested currently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer

2021-06-14 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362688#comment-17362688
 ] 

Vinay edited comment on HUDI-1910 at 6/14/21, 9:09 AM:
---

[~nishith29] Makes sense, so you are suggesting to include the 
COMMIT_OFFSET_TO_KAFKA config in the KafkaOffsetGen.Config class so that users can 
include it in the properties file, like we pass the topic name.

And then use it here - 
[https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L474]
 and call the commitOffsetToKafka function. Is that correct?

 

If this approach looks good, I can test this change out and create a PR


was (Author: vinaypatil18):
[~nishith29] Make sense, so you suggesting to include COMMIT_OFFSET_TO_KAFKA 
config in KafkaOffsetGen.Config class so that users can include it in property 
file like we pass topic name.

And then use it here -  
[https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L474]
 and call commitOffsetToKafka function.

 

If this approach looks good, I can test this change out and create a PR

> Supporting Kafka based checkpointing for HoodieDeltaStreamer
> 
>
> Key: HUDI-1910
> URL: https://issues.apache.org/jira/browse/HUDI-1910
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
>  Labels: sev:normal, triaged
>
> HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some 
> users have requested support for Kafka based checkpoints for freshness 
> auditing purposes. This ticket tracks any implementation for that. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x

2021-06-14 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362691#comment-17362691
 ] 

Vinay edited comment on HUDI-1975 at 6/14/21, 7:54 AM:
---

[~nishith29] Updated the metrics.version in the pom to 3.1.2, and the build fails with
{code:java}
/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java:[128,49]
 cannot find symbol
{code}
MetricsRegistry does not have a gauge method in version 3.1.2; this is part of the 
metrics-core dependency. There is a workaround for this described here - 
[https://github.com/eclipse/microprofile-metrics/issues/244] 


was (Author: vinaypatil18):
Updated the metrics.version in pom to 3.1.2 , the build fails with
{code:java}
/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java:[128,49]
 cannot find symbol
{code}
MetricsRegistry does not gauge method in 3.1.2 version, this is part of 
metrics-core dependency. There is a workaround of doing so here - 
[https://github.com/eclipse/microprofile-metrics/issues/244] 

> Upgrade java-prometheus-client from 3.1.2 to 4.x
> 
>
> Key: HUDI-1975
> URL: https://issues.apache.org/jira/browse/HUDI-1975
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Nishith Agarwal
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Find more details here -> https://github.com/apache/hudi/issues/2774



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x

2021-06-13 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362691#comment-17362691
 ] 

Vinay commented on HUDI-1975:
-

Updated the metrics.version in the pom to 3.1.2, and the build fails with
{code:java}
/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java:[128,49]
 cannot find symbol
{code}
MetricsRegistry does not have a gauge method in version 3.1.2; this is part of the 
metrics-core dependency. There is a workaround for this described here - 
[https://github.com/eclipse/microprofile-metrics/issues/244] 

> Upgrade java-prometheus-client from 3.1.2 to 4.x
> 
>
> Key: HUDI-1975
> URL: https://issues.apache.org/jira/browse/HUDI-1975
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Nishith Agarwal
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Find more details here -> https://github.com/apache/hudi/issues/2774



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer

2021-06-13 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362688#comment-17362688
 ] 

Vinay commented on HUDI-1910:
-

[~nishith29] Makes sense, so you are suggesting to include the COMMIT_OFFSET_TO_KAFKA 
config in the KafkaOffsetGen.Config class so that users can include it in the 
properties file, like we pass the topic name.

And then use it here - 
[https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L474]
 and call the commitOffsetToKafka function.

 

If this approach looks good, I can test this change out and create a PR

> Supporting Kafka based checkpointing for HoodieDeltaStreamer
> 
>
> Key: HUDI-1910
> URL: https://issues.apache.org/jira/browse/HUDI-1910
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
>  Labels: sev:normal, triaged
>
> HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some 
> users have requested support for Kafka based checkpoints for freshness 
> auditing purposes. This ticket tracks any implementation for that. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer

2021-06-13 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362564#comment-17362564
 ] 

Vinay commented on HUDI-1910:
-

[~nishith29] Instead of updating the HoodieWriteCommitCallbackMessage and asking 
users to enable the callback config to commit offsets to Kafka, I have another 
approach in mind. Should we just take the flag as a delta streamer config, such as 
--commit-offset-to-kafka ? 

We already get the checkpointStr which contains the end offset of each 
partition here - 
[https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L265]

 

If the commit is successful and commit-offset-to-kafka is true - 
[https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L474]
 - we can commit the offsets back to Kafka as well.

 
{code:java}
private void commitOffsetToKafka(String checkpointStr) {
  // checkpointStr => hoodie_test,0:30,1:35
  // offsetMap => {hoodie_test-0=30, hoodie_test-1=35}
  Map<TopicPartition, Long> offsetMap =
      KafkaOffsetGen.CheckpointUtils.strToOffsets(checkpointStr);
  // forward only the Kafka client settings, i.e. every non-"hoodie." property
  Map<String, Object> kafkaParams = new HashMap<>();
  props.keySet().stream()
      .filter(prop -> !prop.toString().startsWith("hoodie."))
      .forEach(prop -> kafkaParams.put(prop.toString(), props.get(prop.toString())));
  Map<TopicPartition, OffsetAndMetadata> offsetAndMetadataMap = new HashMap<>(offsetMap.size());
  offsetMap.forEach((key, value) -> offsetAndMetadataMap.put(key, new OffsetAndMetadata(value)));
  try (KafkaConsumer<?, ?> consumer = new KafkaConsumer<>(kafkaParams)) {
    consumer.commitAsync(offsetAndMetadataMap, new OffsetCommitCallback() {
      @Override
      public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
        LOG.info("Offsets committed to Kafka successfully " + offsets.toString());
      }
    });
  }
}
{code}
What do you think of this approach?

 

> Supporting Kafka based checkpointing for HoodieDeltaStreamer
> 
>
> Key: HUDI-1910
> URL: https://issues.apache.org/jira/browse/HUDI-1910
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
>  Labels: sev:normal, triaged
>
> HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some 
> users have requested support for Kafka based checkpoints for freshness 
> auditing purposes. This ticket tracks any implementation for that. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer

2021-06-13 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1910:

Status: In Progress  (was: Open)

> Supporting Kafka based checkpointing for HoodieDeltaStreamer
> 
>
> Key: HUDI-1910
> URL: https://issues.apache.org/jira/browse/HUDI-1910
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
>  Labels: sev:normal, triaged
>
> HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some 
> users have requested support for Kafka based checkpoints for freshness 
> auditing purposes. This ticket tracks any implementation for that. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation

2021-06-13 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reopened HUDI-1997:
-

> Fix hoodie.datasource.hive_sync.auto_create_database documentation 
> ---
>
> Key: HUDI-1997
> URL: https://issues.apache.org/jira/browse/HUDI-1997
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
> Fix For: 0.9.0
>
>
> hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting 
> to true according to docs but actually defaults to false for 0.7 & 0.8



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation

2021-06-13 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HUDI-1997.
-
Resolution: Fixed

> Fix hoodie.datasource.hive_sync.auto_create_database documentation 
> ---
>
> Key: HUDI-1997
> URL: https://issues.apache.org/jira/browse/HUDI-1997
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
> Fix For: 0.9.0
>
>
> hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting 
> to true according to docs but actually defaults to false for 0.7 & 0.8



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation

2021-06-13 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362550#comment-17362550
 ] 

Vinay commented on HUDI-1997:
-

Fixed - 64a8f53b25dd21fb737f783a5cf5d316fc0ae56d

> Fix hoodie.datasource.hive_sync.auto_create_database documentation 
> ---
>
> Key: HUDI-1997
> URL: https://issues.apache.org/jira/browse/HUDI-1997
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
> Fix For: 0.9.0
>
>
> hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting 
> to true according to docs but actually defaults to false for 0.7 & 0.8



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation

2021-06-13 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1997:

Status: Closed  (was: Patch Available)

> Fix hoodie.datasource.hive_sync.auto_create_database documentation 
> ---
>
> Key: HUDI-1997
> URL: https://issues.apache.org/jira/browse/HUDI-1997
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
> Fix For: 0.9.0
>
>
> hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting 
> to true according to docs but actually defaults to false for 0.7 & 0.8



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2004) Move KafkaOffsetGen.CheckpointUtils test cases to independent class and improve coverage

2021-06-13 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362549#comment-17362549
 ] 

Vinay commented on HUDI-2004:
-

Created PR - https://github.com/apache/hudi/pull/3072

> Move KafkaOffsetGen.CheckpointUtils test cases to independent class and 
> improve coverage
> 
>
> Key: HUDI-2004
> URL: https://issues.apache.org/jira/browse/HUDI-2004
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>  Labels: pull-request-available
>
> Currently the KafkaOffsetGen.CheckpointUtils test cases are present in 
> TestKafkaSource, which starts up hdfs, hive, and zk services locally. This is not 
> required for the CheckpointUtils test cases, hence they should be moved to an 
> independent test class of their own.
>  
> Also, CheckpointUtils.strToOffsets and CheckpointUtils.offsetsToStr are not 
> unit tested currently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2004) Move KafkaOffsetGen.CheckpointUtils test cases to independent class and improve coverage

2021-06-13 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2004:

Labels: pull-request-available  (was: )

> Move KafkaOffsetGen.CheckpointUtils test cases to independent class and 
> improve coverage
> 
>
> Key: HUDI-2004
> URL: https://issues.apache.org/jira/browse/HUDI-2004
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>  Labels: pull-request-available
>
> Currently the KafkaOffsetGen.CheckpointUtils test cases are present in 
> TestKafkaSource, which starts up hdfs, hive, and zk services locally. This is not 
> required for the CheckpointUtils test cases, hence they should be moved to an 
> independent test class of their own.
>  
> Also, CheckpointUtils.strToOffsets and CheckpointUtils.offsetsToStr are not 
> unit tested currently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation

2021-06-12 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362465#comment-17362465
 ] 

Vinay commented on HUDI-1997:
-

Created PR - https://github.com/apache/hudi/pull/3066

> Fix hoodie.datasource.hive_sync.auto_create_database documentation 
> ---
>
> Key: HUDI-1997
> URL: https://issues.apache.org/jira/browse/HUDI-1997
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
> Fix For: 0.9.0
>
>
> hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting 
> to true according to docs but actually defaults to false for 0.7 & 0.8



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation

2021-06-12 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1997:

Status: Patch Available  (was: In Progress)

> Fix hoodie.datasource.hive_sync.auto_create_database documentation 
> ---
>
> Key: HUDI-1997
> URL: https://issues.apache.org/jira/browse/HUDI-1997
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
> Fix For: 0.9.0
>
>
> hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting 
> to true according to docs but actually defaults to false for 0.7 & 0.8



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2004) Move KafkaOffsetGen.CheckpointUtils test cases to independent class and improve coverage

2021-06-12 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2004:

Status: In Progress  (was: Open)

> Move KafkaOffsetGen.CheckpointUtils test cases to independent class and 
> improve coverage
> 
>
> Key: HUDI-2004
> URL: https://issues.apache.org/jira/browse/HUDI-2004
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Vinay
>Assignee: Vinay
>Priority: Minor
>
> Currently the KafkaOffsetGen.CheckpointUtils test cases are present in 
> TestKafkaSource, which starts up hdfs, hive, and zk services locally. This is not 
> required for the CheckpointUtils test cases, hence they should be moved to an 
> independent test class of their own.
>  
> Also, CheckpointUtils.strToOffsets and CheckpointUtils.offsetsToStr are not 
> unit tested currently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1148) Revisit log messages seen when writing or reading through Hudi

2021-06-12 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay closed HUDI-1148.
---

> Revisit log messages seen when writing or reading through Hudi
> ---
>
> Key: HUDI-1148
> URL: https://issues.apache.org/jira/browse/HUDI-1148
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Vinay
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> [https://github.com/apache/hudi/issues/1906]
>  
> Some of these log messages can be made debug level. We generally need to review the 
> verbosity of log messages when running hudi operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1148) Revisit log messages seen when writing or reading through Hudi

2021-06-12 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362376#comment-17362376
 ] 

Vinay commented on HUDI-1148:
-

Fixed - f3d7b49bfea0630dcf488c087755485d8d088270

> Revisit log messages seen when writing or reading through Hudi
> ---
>
> Key: HUDI-1148
> URL: https://issues.apache.org/jira/browse/HUDI-1148
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Vinay
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> [https://github.com/apache/hudi/issues/1906]
>  
> Some of these log messages can be made debug level. We generally need to review the 
> verbosity of log messages when running hudi operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x

2021-06-12 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362375#comment-17362375
 ] 

Vinay commented on HUDI-1975:
-

[~nishith29] Currently in pom.xml the version is specified as
{code:java}
0.8.0
{code}
Where do we have to upgrade the version to 4.x? Also, can you please link the GH 
issue explaining why we are doing this?

> Upgrade java-prometheus-client from 3.1.2 to 4.x
> 
>
> Key: HUDI-1975
> URL: https://issues.apache.org/jira/browse/HUDI-1975
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Nishith Agarwal
>Priority: Blocker
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability

2021-06-12 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362372#comment-17362372
 ] 

Vinay commented on HUDI-1976:
-

WIP PR - https://github.com/apache/hudi/pull/3071

> Upgrade hive, jackson, log4j, hadoop to remove vulnerability
> 
>
> Key: HUDI-1976
> URL: https://issues.apache.org/jira/browse/HUDI-1976
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Priority: Blocker
> Fix For: 0.9.0
>
>
> [https://github.com/apache/hudi/issues/2827]
> [https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2824]
> [https://github.com/apache/hudi/issues/2823]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2003) Auto Compute Compression

2021-06-12 Thread Vinay (Jira)
Vinay created HUDI-2003:
---

 Summary: Auto Compute Compression
 Key: HUDI-2003
 URL: https://issues.apache.org/jira/browse/HUDI-2003
 Project: Apache Hudi
  Issue Type: Bug
  Components: Writer Core
Reporter: Vinay


Context : 

Submitted a Spark job to read 3-4B ORC records and write them out in Hudi format. 
The following table captures all the runs that I carried out with different 
options:

 
||CONFIG ||Number of Files Created||Size of each file||
|PARQUET_FILE_MAX_BYTES=DEFAULT|30K|21MB|
|PARQUET_FILE_MAX_BYTES=1GB|3700|178MB|
|PARQUET_FILE_MAX_BYTES=1GB
COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE=110|Same as before|Same as before|
|PARQUET_FILE_MAX_BYTES=1GB
BULKINSERT_PARALLELISM=100|Same as before|Same as before|
|PARQUET_FILE_MAX_BYTES=4GB|1600|675MB|
|PARQUET_FILE_MAX_BYTES=6GB|669|1012MB|

Based on these runs, it appears that the compression ratio is off. 
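As a back-of-the-envelope check (assuming file sizing is driven by an estimated 
compression ratio, e.g. a hoodie.parquet.compression.ratio-style setting that 
defaults to 0.1): with PARQUET_FILE_MAX_BYTES=1GB the observed files are ~178MB, 
i.e. 178 / 1024 ≈ 0.17 of the configured max, so the size estimate appears to 
overshoot the actual on-disk footprint by roughly 6x. That matches the last two 
rows, where the configured max has to be raised to ~6GB before files approach 
1GB (1012 / 6144 ≈ 0.16).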

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation

2021-06-11 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1997:

Status: In Progress  (was: Open)

> Fix hoodie.datasource.hive_sync.auto_create_database documentation 
> ---
>
> Key: HUDI-1997
> URL: https://issues.apache.org/jira/browse/HUDI-1997
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
> Fix For: 0.9.0
>
>
> hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting 
> to true according to docs but actually defaults to false for 0.7 & 0.8



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1997) Fix hoodie.datasource.hive_sync.auto_create_database documentation

2021-06-11 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-1997:
---

Assignee: Vinay

> Fix hoodie.datasource.hive_sync.auto_create_database documentation 
> ---
>
> Key: HUDI-1997
> URL: https://issues.apache.org/jira/browse/HUDI-1997
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
> Fix For: 0.9.0
>
>
> hoodie.datasource.hive_sync.auto_create_database is supposed to be defaulting 
> to true according to docs but actually defaults to false for 0.7 & 0.8



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1942) HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi synchronizes Hive

2021-06-10 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HUDI-1942.
-
Fix Version/s: (was: 0.8.0)
   Resolution: Fixed

Fixed - 
[2a7e1e0|https://github.com/apache/hudi/commit/2a7e1e091e69c53acc0a19e3d792ca15a3d7db62]

> HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi 
> synchronizes Hive
> -
>
> Key: HUDI-1942
> URL: https://issues.apache.org/jira/browse/HUDI-1942
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: newbie
>Reporter: yao.zhou
>Assignee: Vinay
>Priority: Major
>  Labels: easy-fix, pull-request-available
> Fix For: 0.9.0
>
>
> HIVE_AUTO_CREATE_DATABASE_OPT_KEY = 
> "hoodie.datasource.hive_sync.auto_create_database"
> DEFAULT_HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "true"
> in HoodieSparkSqlWriter.buildSyncConfig 
> hiveSyncConfig.autoCreateDatabase = 
> parameters.get(HIVE_AUTO_CREATE_DATABASE_OPT_KEY).exists(r => r.toBoolean)
>  * This method sets the parameter to false



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1892) NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at hudi 0.9.0

2021-06-09 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HUDI-1892.
-
Resolution: Fixed

> NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at 
> hudi 0.9.0
> ---
>
> Key: HUDI-1892
> URL: https://issues.apache.org/jira/browse/HUDI-1892
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: shenbing
>Assignee: Vinay
>Priority: Major
>  Labels: pull-request-available
>
> Using compiled hudi 0.9.0 with hadoop 3.0.0 and hive 3.1.1, after resolving 
> dependency conflicts, 
> I imported hudi-spark-bundle_2.11-0.9.0-SNAPSHOT.jar into my project. 
> When I use OverwriteNonDefaultsWithLatestAvroPayload to update a field with 
> a new value, I get the error below.
> {code:java}
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new 
> record with old value in storage, for new record {HoodieRecord{key=HoodieKey 
> { recordKey=1 partitionPath=date=1}, currentLocation='HoodieRecordLocation 
> {instantTime=20210510160355, fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', 
> newLocation='HoodieRecordLocation {instantTime=20210510160400, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value 
> {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": 
> "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": 
> "date=1", "_hoodie_file_name": 
> "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", 
> "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}}
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
> ... 34 more
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to 
> combine/merge new record with old value in storage, for new record 
> {HoodieRecord{key=HoodieKey { recordKey=1 partitionPath=date=1}, 
> currentLocation='HoodieRecordLocation {instantTime=20210510160355, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', 
> newLocation='HoodieRecordLocation {instantTime=20210510160400, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value 
> {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": 
> "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": 
> "date=1", "_hoodie_file_name": 
> "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", 
> "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}}
> at 
> org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:290)
> at 
> org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122)
> at 
> org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ... 3 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.overwriteField(OverwriteWithLatestAvroPayload.java:97)
> at 
> org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.lambda$combineAndGetUpdateValue$0(OverwriteNonDefaultsWithLatestAvroPayload.java:67)
> at java.util.ArrayList.forEach(ArrayList.java:1259)
> at 
> org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.combineAndGetUpdateValue(OverwriteNonDefaultsWithLatestAvroPayload.java:64)
> at 
> org.apache.hudi.common.model.HoodieRecordPayload.combineAndGetUpdateValue(HoodieRecordPayload.java:81)
> at 
> org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:276)
> ... 8 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1892) NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at hudi 0.9.0

2021-06-09 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360060#comment-17360060
 ] 

Vinay commented on HUDI-1892:
-

Fixed - 11360f707e969747e1a30791acb23857cc376589

> NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at 
> hudi 0.9.0
> ---
>
> Key: HUDI-1892
> URL: https://issues.apache.org/jira/browse/HUDI-1892
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: shenbing
>Assignee: Vinay
>Priority: Major
>  Labels: pull-request-available
>
> Using compiled hudi 0.9.0 with hadoop 3.0.0 and hive 3.1.1, after resolving 
> dependency conflicts, 
> I imported hudi-spark-bundle_2.11-0.9.0-SNAPSHOT.jar into my project. 
> When I use OverwriteNonDefaultsWithLatestAvroPayload to update a field with 
> a new value, I get the error below.
> {code:java}
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new 
> record with old value in storage, for new record {HoodieRecord{key=HoodieKey 
> { recordKey=1 partitionPath=date=1}, currentLocation='HoodieRecordLocation 
> {instantTime=20210510160355, fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', 
> newLocation='HoodieRecordLocation {instantTime=20210510160400, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value 
> {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": 
> "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": 
> "date=1", "_hoodie_file_name": 
> "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", 
> "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}}
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
> ... 34 more
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to 
> combine/merge new record with old value in storage, for new record 
> {HoodieRecord{key=HoodieKey { recordKey=1 partitionPath=date=1}, 
> currentLocation='HoodieRecordLocation {instantTime=20210510160355, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', 
> newLocation='HoodieRecordLocation {instantTime=20210510160400, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value 
> {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": 
> "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": 
> "date=1", "_hoodie_file_name": 
> "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", 
> "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}}
> at 
> org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:290)
> at 
> org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122)
> at 
> org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ... 3 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.overwriteField(OverwriteWithLatestAvroPayload.java:97)
> at 
> org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.lambda$combineAndGetUpdateValue$0(OverwriteNonDefaultsWithLatestAvroPayload.java:67)
> at java.util.ArrayList.forEach(ArrayList.java:1259)
> at 
> org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.combineAndGetUpdateValue(OverwriteNonDefaultsWithLatestAvroPayload.java:64)
> at 
> org.apache.hudi.common.model.HoodieRecordPayload.combineAndGetUpdateValue(HoodieRecordPayload.java:81)
> at 
> org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:276)
> ... 8 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1892) NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at hudi 0.9.0

2021-06-08 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359328#comment-17359328
 ] 

Vinay commented on HUDI-1892:
-

Looking at the code, this issue could arise because the defaultValue is not null 
but the actual value of the field is null (this can be the case if the field 
schema is a union type like "null,string").

Creating a PR to use String.valueOf(value) in the below function to handle this 
edge case:
{code:java}
public Boolean overwriteField(Object value, Object defaultValue) {
  return defaultValue == null ? value == null
      : defaultValue.toString().equals(value.toString());
}
{code}
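A minimal sketch of the proposed change; note that String.valueOf(null) returns 
the literal string "null" instead of throwing, so a null field value no longer 
crashes the comparison when defaultValue is non-null:
{code:java}
// sketch of the null-safe variant proposed above, not the merged fix;
// String.valueOf renders null as "null" rather than throwing an NPE
public Boolean overwriteField(Object value, Object defaultValue) {
  return defaultValue == null ? value == null
      : String.valueOf(defaultValue).equals(String.valueOf(value));
}
{code}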

> NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at 
> hudi 0.9.0
> ---
>
> Key: HUDI-1892
> URL: https://issues.apache.org/jira/browse/HUDI-1892
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: shenbing
>Assignee: Vinay
>Priority: Major
>
> Using compiled hudi 0.9.0 with hadoop 3.0.0 and hive 3.1.1, after resolving 
> dependency conflicts, 
> I imported hudi-spark-bundle_2.11-0.9.0-SNAPSHOT.jar into my project. 
> When I use OverwriteNonDefaultsWithLatestAvroPayload to update a field with 
> a new value, I get the error below.
> {code:java}
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new 
> record with old value in storage, for new record {HoodieRecord{key=HoodieKey 
> { recordKey=1 partitionPath=date=1}, currentLocation='HoodieRecordLocation 
> {instantTime=20210510160355, fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', 
> newLocation='HoodieRecordLocation {instantTime=20210510160400, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value 
> {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": 
> "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": 
> "date=1", "_hoodie_file_name": 
> "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", 
> "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}}
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
> ... 34 more
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to 
> combine/merge new record with old value in storage, for new record 
> {HoodieRecord{key=HoodieKey { recordKey=1 partitionPath=date=1}, 
> currentLocation='HoodieRecordLocation {instantTime=20210510160355, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', 
> newLocation='HoodieRecordLocation {instantTime=20210510160400, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value 
> {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": 
> "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": 
> "date=1", "_hoodie_file_name": 
> "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", 
> "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}}
> at 
> org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:290)
> at 
> org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122)
> at 
> org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ... 3 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.overwriteField(OverwriteWithLatestAvroPayload.java:97)
> at 
> org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.lambda$combineAndGetUpdateValue$0(OverwriteNonDefaultsWithLatestAvroPayload.java:67)
> at java.util.ArrayList.forEach(ArrayList.java:1259)
> at 
> org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.combineAndGetUpdateValue(OverwriteNonDefaultsWithLatestAvroPayload.java:64)
> at 
> org.apache.hudi.common.model.HoodieRecordPayload.combineAndGetUpdateValue(HoodieRecordPayload.java:81)
> at 
> org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:276)
> ... 8 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1892) NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at hudi 0.9.0

2021-06-08 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-1892:
---

Assignee: Vinay

> NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at 
> hudi 0.9.0
> ---
>
> Key: HUDI-1892
> URL: https://issues.apache.org/jira/browse/HUDI-1892
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: shenbing
>Assignee: Vinay
>Priority: Major
>
> Using compiled hudi 0.9.0 with hadoop 3.0.0 and hive 3.1.1, after resolving 
> dependency conflicts, 
> I imported hudi-spark-bundle_2.11-0.9.0-SNAPSHOT.jar into my project. 
> When I use OverwriteNonDefaultsWithLatestAvroPayload to update a field with 
> a new value, I get the error below.
> {code:java}
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new 
> record with old value in storage, for new record {HoodieRecord{key=HoodieKey 
> { recordKey=1 partitionPath=date=1}, currentLocation='HoodieRecordLocation 
> {instantTime=20210510160355, fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', 
> newLocation='HoodieRecordLocation {instantTime=20210510160400, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value 
> {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": 
> "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": 
> "date=1", "_hoodie_file_name": 
> "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", 
> "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}}
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
> ... 34 more
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to 
> combine/merge new record with old value in storage, for new record 
> {HoodieRecord{key=HoodieKey { recordKey=1 partitionPath=date=1}, 
> currentLocation='HoodieRecordLocation {instantTime=20210510160355, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', 
> newLocation='HoodieRecordLocation {instantTime=20210510160400, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value 
> {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": 
> "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": 
> "date=1", "_hoodie_file_name": 
> "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", 
> "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}}
> at 
> org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:290)
> at 
> org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122)
> at 
> org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ... 3 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.overwriteField(OverwriteWithLatestAvroPayload.java:97)
> at 
> org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.lambda$combineAndGetUpdateValue$0(OverwriteNonDefaultsWithLatestAvroPayload.java:67)
> at java.util.ArrayList.forEach(ArrayList.java:1259)
> at 
> org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.combineAndGetUpdateValue(OverwriteNonDefaultsWithLatestAvroPayload.java:64)
> at 
> org.apache.hudi.common.model.HoodieRecordPayload.combineAndGetUpdateValue(HoodieRecordPayload.java:81)
> at 
> org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:276)
> ... 8 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1892) NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at hudi 0.9.0

2021-06-08 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1892:

Status: In Progress  (was: Open)

> NullPointerException when using OverwriteNonDefaultsWithLatestAvroPayload at 
> hudi 0.9.0
> ---
>
> Key: HUDI-1892
> URL: https://issues.apache.org/jira/browse/HUDI-1892
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: shenbing
>Assignee: Vinay
>Priority: Major
>
> Using compiled hudi 0.9.0 with hadoop 3.0.0 and hive 3.1.1, after resolving 
> dependency conflicts, 
> I imported hudi-spark-bundle_2.11-0.9.0-SNAPSHOT.jar into my project. 
> When I use OverwriteNonDefaultsWithLatestAvroPayload to update a field with 
> a new value, I get the error below.
> {code:java}
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new 
> record with old value in storage, for new record {HoodieRecord{key=HoodieKey 
> { recordKey=1 partitionPath=date=1}, currentLocation='HoodieRecordLocation 
> {instantTime=20210510160355, fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', 
> newLocation='HoodieRecordLocation {instantTime=20210510160400, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value 
> {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": 
> "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": 
> "date=1", "_hoodie_file_name": 
> "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", 
> "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}}
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
> ... 34 more
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to 
> combine/merge new record with old value in storage, for new record 
> {HoodieRecord{key=HoodieKey { recordKey=1 partitionPath=date=1}, 
> currentLocation='HoodieRecordLocation {instantTime=20210510160355, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}', 
> newLocation='HoodieRecordLocation {instantTime=20210510160400, 
> fileId=9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0}'}}, old value 
> {{"_hoodie_commit_time": "20210510160355", "_hoodie_commit_seqno": 
> "20210510160355_0_50", "_hoodie_record_key": "1", "_hoodie_partition_path": 
> "date=1", "_hoodie_file_name": 
> "9a0fcb8e-8cd9-4c9c-bea8-46bbf509035e-0_0-1502-1519_20210510160355.parquet", 
> "uuid": "1", "name": "jerry", "age": 10, "date": "1", "update_time": "1"}}
> at 
> org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:290)
> at 
> org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122)
> at 
> org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
> at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ... 3 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.overwriteField(OverwriteWithLatestAvroPayload.java:97)
> at 
> org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.lambda$combineAndGetUpdateValue$0(OverwriteNonDefaultsWithLatestAvroPayload.java:67)
> at java.util.ArrayList.forEach(ArrayList.java:1259)
> at 
> org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload.combineAndGetUpdateValue(OverwriteNonDefaultsWithLatestAvroPayload.java:64)
> at 
> org.apache.hudi.common.model.HoodieRecordPayload.combineAndGetUpdateValue(HoodieRecordPayload.java:81)
> at 
> org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:276)
> ... 8 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1909) Skip the commits with empty files for flink streaming reader

2021-06-07 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1909:

Status: In Progress  (was: Open)

> Skip the commits with empty files for flink streaming reader
> 
>
> Key: HUDI-1909
> URL: https://issues.apache.org/jira/browse/HUDI-1909
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Vinay
>Priority: Major
> Fix For: 0.9.0
>
>
> Log warnings instead of throwing to make the reader more robust.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1909) Skip the commits with empty files for flink streaming reader

2021-06-07 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-1909:
---

Assignee: Vinay  (was: Danny Chen)

> Skip the commits with empty files for flink streaming reader
> 
>
> Key: HUDI-1909
> URL: https://issues.apache.org/jira/browse/HUDI-1909
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Vinay
>Priority: Major
> Fix For: 0.9.0
>
>
> Log warnings instead of throwing to make the reader more robust.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1847) Add ability to decouple configs for scheduling inline and running async

2021-06-05 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-1847:
---

Assignee: Vinay

> Add ability to decouple configs for scheduling inline and running async
> ---
>
> Key: HUDI-1847
> URL: https://issues.apache.org/jira/browse/HUDI-1847
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Major
>  Labels: sev:high
>
> Currently, there are 2 ways to enable compaction:
>  
>  # Inline - This will schedule compaction inline and execute inline
>  # Async - This option is only available for HoodieDeltaStreamer based jobs. 
> This turns on scheduling inline and running async as part of the same spark 
> job.
>  
> Users need a config to be able to schedule only inline while having an 
> ability to execute in their own spark job



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1847) Add ability to decouple configs for scheduling inline and running async

2021-06-05 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357787#comment-17357787
 ] 

Vinay commented on HUDI-1847:
-

[~nishith29] Thank you for laying out all the steps clearly. I would like to 
start working on this issue and will let you know if I run into any problems.

> Add ability to decouple configs for scheduling inline and running async
> ---
>
> Key: HUDI-1847
> URL: https://issues.apache.org/jira/browse/HUDI-1847
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Nishith Agarwal
>Priority: Major
>  Labels: sev:high
>
> Currently, there are 2 ways to enable compaction:
>  
>  # Inline - This will schedule compaction inline and execute inline
>  # Async - This option is only available for HoodieDeltaStreamer based jobs. 
> This turns on scheduling inline and running async as part of the same spark 
> job.
>  
> Users need a config to be able to schedule only inline while having an 
> ability to execute in their own spark job



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1942) HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi synchronizes Hive

2021-06-05 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357786#comment-17357786
 ] 

Vinay commented on HUDI-1942:
-

[~yao.z...@yuanxi.onaliyun.com] yes, I have already added that in the PR; can 
you please review - [GitHub Pull Request 
#3036|https://github.com/apache/hudi/pull/3036]

> HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi 
> synchronizes Hive
> -
>
> Key: HUDI-1942
> URL: https://issues.apache.org/jira/browse/HUDI-1942
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: newbie
>Reporter: yao.zhou
>Assignee: Vinay
>Priority: Major
>  Labels: easy-fix, pull-request-available
> Fix For: 0.8.0, 0.9.0
>
>
> HIVE_AUTO_CREATE_DATABASE_OPT_KEY = 
> "hoodie.datasource.hive_sync.auto_create_database"
> DEFAULT_HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "true"
> in HoodieSparkSqlWriter.buildSyncConfig 
> hiveSyncConfig.autoCreateDatabase = 
> parameters.get(HIVE_AUTO_CREATE_DATABASE_OPT_KEY).exists(r => r.toBoolean)
>  * This method sets the parameter to false



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1942) HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi synchronizes Hive

2021-06-05 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-1942:
---

Assignee: Vinay

> HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi 
> synchronizes Hive
> -
>
> Key: HUDI-1942
> URL: https://issues.apache.org/jira/browse/HUDI-1942
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: newbie
>Reporter: yao.zhou
>Assignee: Vinay
>Priority: Major
>  Labels: easy-fix
> Fix For: 0.8.0, 0.9.0
>
>
> HIVE_AUTO_CREATE_DATABASE_OPT_KEY = 
> "hoodie.datasource.hive_sync.auto_create_database"
> DEFAULT_HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "true"
> in HoodieSparkSqlWriter.buildSyncConfig 
> hiveSyncConfig.autoCreateDatabase = 
> parameters.get(HIVE_AUTO_CREATE_DATABASE_OPT_KEY).exists(r => r.toBoolean)
>  * This method sets the parameter to false



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1942) HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi synchronizes Hive

2021-06-05 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1942:

Status: In Progress  (was: Open)

> HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi 
> synchronizes Hive
> -
>
> Key: HUDI-1942
> URL: https://issues.apache.org/jira/browse/HUDI-1942
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: newbie
>Reporter: yao.zhou
>Assignee: Vinay
>Priority: Major
>  Labels: easy-fix
> Fix For: 0.8.0, 0.9.0
>
>
> HIVE_AUTO_CREATE_DATABASE_OPT_KEY = 
> "hoodie.datasource.hive_sync.auto_create_database"
> DEFAULT_HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "true"
> in HoodieSparkSqlWriter.buildSyncConfig 
> hiveSyncConfig.autoCreateDatabase = 
> parameters.get(HIVE_AUTO_CREATE_DATABASE_OPT_KEY).exists(r => r.toBoolean)
>  * This method sets the parameter to false



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1942) HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi synchronizes Hive

2021-06-05 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357776#comment-17357776
 ] 

Vinay commented on HUDI-1942:
-

Tried to test the code snippet locally:

 
{code:java}
scala> var parameters = Map("hoodie.datasource.hive_sync.auto_create_database" -> "true")
scala> val a = parameters.get("hoodie.datasource.hive_sync.auto_create_database").exists(r => r.toBoolean)
scala> print(a)
true
{code}
This means the only way we will get a false value is if the parameters map 
does not contain HIVE_AUTO_CREATE_DATABASE_OPT_KEY. One fix is to use 
parameters.getOrElse, as we already do in the 
`DataSourceUtils.buildHiveSyncConfig` method - 
https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java#L271
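
To make the difference concrete, here is a minimal, self-contained sketch 
(plain Java with an illustrative class name; the real constants live in the 
Hudi datasource options) of why an exists-style lookup loses the documented 
default while a getOrElse/getOrDefault-style lookup preserves it:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class AutoCreateDefaultSketch {
  public static void main(String[] args) {
    String key = "hoodie.datasource.hive_sync.auto_create_database";
    Map<String, String> parameters = new HashMap<>(); // the caller never set the key

    // Current behaviour: a missing key yields false, so the documented
    // default of "true" is silently lost.
    boolean current = parameters.containsKey(key) && Boolean.parseBoolean(parameters.get(key));

    // Proposed behaviour: fall back to the default when the key is absent,
    // mirroring the getOrElse lookup in DataSourceUtils.buildHiveSyncConfig.
    boolean proposed = Boolean.parseBoolean(parameters.getOrDefault(key, "true"));

    System.out.println("current=" + current + ", proposed=" + proposed);
    // prints: current=false, proposed=true
  }
}
{code}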

 

 

> HIVE_AUTO_CREATE_DATABASE_OPT_KEY This should default to true when Hudi 
> synchronizes Hive
> -
>
> Key: HUDI-1942
> URL: https://issues.apache.org/jira/browse/HUDI-1942
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: newbie
>Reporter: yao.zhou
>Assignee: Vinay
>Priority: Major
>  Labels: easy-fix
> Fix For: 0.8.0, 0.9.0
>
>
> HIVE_AUTO_CREATE_DATABASE_OPT_KEY = 
> "hoodie.datasource.hive_sync.auto_create_database"
> DEFAULT_HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "true"
> in HoodieSparkSqlWriter.buildSyncConfig 
> hiveSyncConfig.autoCreateDatabase = 
> parameters.get(HIVE_AUTO_CREATE_DATABASE_OPT_KEY).exists(r => r.toBoolean)
>  * This method sets the parameter to false



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry

2021-06-04 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357761#comment-17357761
 ] 

Vinay commented on HUDI-1281:
-

Fixed via commit cf90f17732e313cf71248a8baaf10307463d9b6e.

> deltacommit is not part of ActionType used in HoodieArchivedMetaEntry 
> --
>
> Key: HUDI-1281
> URL: https://issues.apache.org/jira/browse/HUDI-1281
> Project: Apache Hudi
>  Issue Type: Wish
>  Components: Code Cleanup
>Reporter: satish
>Assignee: Vinay
>Priority: Minor
>  Labels: code-cleanup, pull-request-available
>
> incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java
>   does not include deltacommit 
> Both commit/deltacommit use 'commit' command which can be confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry

2021-06-04 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HUDI-1281.
-
Resolution: Fixed

> deltacommit is not part of ActionType used in HoodieArchivedMetaEntry 
> --
>
> Key: HUDI-1281
> URL: https://issues.apache.org/jira/browse/HUDI-1281
> Project: Apache Hudi
>  Issue Type: Wish
>  Components: Code Cleanup
>Reporter: satish
>Assignee: Vinay
>Priority: Minor
>  Labels: code-cleanup, pull-request-available
>
> incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java
>   does not include deltacommit 
> Both commit/deltacommit use 'commit' command which can be confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry

2021-06-04 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay closed HUDI-1281.
---

> deltacommit is not part of ActionType used in HoodieArchivedMetaEntry 
> --
>
> Key: HUDI-1281
> URL: https://issues.apache.org/jira/browse/HUDI-1281
> Project: Apache Hudi
>  Issue Type: Wish
>  Components: Code Cleanup
>Reporter: satish
>Assignee: Vinay
>Priority: Minor
>  Labels: code-cleanup, pull-request-available
>
> incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java
>   does not include deltacommit 
> Both commit/deltacommit use 'commit' command which can be confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1148) Revisit log messages seen when writing or reading through Hudi

2021-06-02 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356174#comment-17356174
 ] 

Vinay commented on HUDI-1148:
-

[~vbalaji] I can take a look at this. Looking at the code, we are printing the 
entire Hadoop conf:

```
LOG.info(String.format("Hadoop Configuration: fs.defaultFS: [%s], Config:[%s], FileSystem: [%s]",
    conf.getRaw("fs.defaultFS"), conf.toString(), fs.toString()));
```

Will make this a debug log. Also, I will check the logs while running 
HoodieDeltaStreamer and writing to a CoW/MoR table.
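
As a sketch of the intended change (the class and method names here are 
hypothetical; the real call site is the LOG.info shown above), the dump moves 
to DEBUG and is guarded so the large conf.toString() is only built when debug 
logging is enabled:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.log4j.Logger;

class HadoopConfLoggingSketch {
  private static final Logger LOG = Logger.getLogger(HadoopConfLoggingSketch.class);

  // Hypothetical helper wrapping the call site quoted above.
  static void logHadoopConf(Configuration conf, FileSystem fs) {
    // Guard with isDebugEnabled() so the potentially huge conf.toString()
    // is not materialized when debug logging is off.
    if (LOG.isDebugEnabled()) {
      LOG.debug(String.format("Hadoop Configuration: fs.defaultFS: [%s], Config:[%s], FileSystem: [%s]",
          conf.getRaw("fs.defaultFS"), conf.toString(), fs.toString()));
    }
  }
}
{code}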

> Revisit log messages seen when writing or reading through Hudi
> ---
>
> Key: HUDI-1148
> URL: https://issues.apache.org/jira/browse/HUDI-1148
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Vinay
>Priority: Minor
> Fix For: 0.9.0
>
>
> [https://github.com/apache/hudi/issues/1906]
>  
> Some of these log messages can be made debug-level. We should generally 
> review the verbosity of log messages when running Hudi operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1148) Revisit log messages seen when writing or reading through Hudi

2021-06-02 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1148:

Status: In Progress  (was: Open)

> Revisit log messages seen when writing or reading through Hudi
> ---
>
> Key: HUDI-1148
> URL: https://issues.apache.org/jira/browse/HUDI-1148
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Vinay
>Priority: Minor
> Fix For: 0.9.0
>
>
> [https://github.com/apache/hudi/issues/1906]
>  
> Some of these log messages can be made debug-level. We should generally 
> review the verbosity of log messages when running Hudi operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1148) Revisit log messages seen when writing or reading through Hudi

2021-06-02 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-1148:
---

Assignee: Vinay

> Revisit log messages seen when writing or reading through Hudi
> ---
>
> Key: HUDI-1148
> URL: https://issues.apache.org/jira/browse/HUDI-1148
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Vinay
>Priority: Minor
> Fix For: 0.9.0
>
>
> [https://github.com/apache/hudi/issues/1906]
>  
> Some of these Log messages can be made debug. We need to generally see the 
> verbosity of log messages when running hudi operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry

2021-05-31 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1281:

Status: Open  (was: New)

> deltacommit is not part of ActionType used in HoodieArchivedMetaEntry 
> --
>
> Key: HUDI-1281
> URL: https://issues.apache.org/jira/browse/HUDI-1281
> Project: Apache Hudi
>  Issue Type: Wish
>  Components: Code Cleanup
>Reporter: satish
>Assignee: Vinay
>Priority: Minor
>  Labels: code-cleanup
>
> incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java
>   does not include deltacommit 
> Both commit/deltacommit use 'commit' command which can be confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry

2021-05-31 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-1281:

Status: In Progress  (was: Open)

> deltacommit is not part of ActionType used in HoodieArchivedMetaEntry 
> --
>
> Key: HUDI-1281
> URL: https://issues.apache.org/jira/browse/HUDI-1281
> Project: Apache Hudi
>  Issue Type: Wish
>  Components: Code Cleanup
>Reporter: satish
>Assignee: Vinay
>Priority: Minor
>  Labels: code-cleanup
>
> incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java
>   does not include deltacommit 
> Both commit/deltacommit use 'commit' command which can be confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry

2021-05-31 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354433#comment-17354433
 ] 

Vinay edited comment on HUDI-1281 at 5/31/21, 12:29 PM:


[~shivnarayan] Since there is no activity on this ticket, I can pick this up. 
I see we have not yet added `deltacommit` to the ActionType enum. 
{code:java}
public enum ActionType {
  //TODO HUDI-1281 make deltacommit part of this
  commit, savepoint, compaction, clean, rollback, replacecommit
}
{code}
 


was (Author: vinaypatil18):
[~shivnarayan] Since there is no activity on this ticket, I can pick this up, I 
see we have not yet added `deltacommit` to ActionType enum. 

```

public enum ActionType {
 //TODO HUDI-1281 make deltacommit part of this
 commit, savepoint, compaction, clean, rollback, replacecommit
}
```

 

> deltacommit is not part of ActionType used in HoodieArchivedMetaEntry 
> --
>
> Key: HUDI-1281
> URL: https://issues.apache.org/jira/browse/HUDI-1281
> Project: Apache Hudi
>  Issue Type: Wish
>  Components: Code Cleanup
>Reporter: satish
>Assignee: Vinay
>Priority: Minor
>  Labels: code-cleanup
>
> incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java
>   does not include deltacommit 
> Both commit/deltacommit use 'commit' command which can be confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry

2021-05-31 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HUDI-1281:
---

Assignee: Vinay

> deltacommit is not part of ActionType used in HoodieArchivedMetaEntry 
> --
>
> Key: HUDI-1281
> URL: https://issues.apache.org/jira/browse/HUDI-1281
> Project: Apache Hudi
>  Issue Type: Wish
>  Components: Code Cleanup
>Reporter: satish
>Assignee: Vinay
>Priority: Minor
>  Labels: code-cleanup
>
> incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java
>   does not include deltacommit 
> Both commit/deltacommit use 'commit' command which can be confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1281) deltacommit is not part of ActionType used in HoodieArchivedMetaEntry

2021-05-31 Thread Vinay (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354433#comment-17354433
 ] 

Vinay commented on HUDI-1281:
-

[~shivnarayan] Since there is no activity on this ticket, I can pick this up. 
I see we have not yet added `deltacommit` to the ActionType enum. 

```

public enum ActionType {
 //TODO HUDI-1281 make deltacommit part of this
 commit, savepoint, compaction, clean, rollback, replacecommit
}
```
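
For reference, a sketch of what the enum could look like after the proposed 
cleanup (the member order is illustrative; the authoritative definition is in 
ActionType.java):

{code:java}
// deltacommit becomes a first-class action type alongside commit, so archived
// metadata no longer has to fold delta commits into 'commit'.
public enum ActionType {
  commit, deltacommit, savepoint, compaction, clean, rollback, replacecommit
}
{code}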

 

> deltacommit is not part of ActionType used in HoodieArchivedMetaEntry 
> --
>
> Key: HUDI-1281
> URL: https://issues.apache.org/jira/browse/HUDI-1281
> Project: Apache Hudi
>  Issue Type: Wish
>  Components: Code Cleanup
>Reporter: satish
>Assignee: Vinay
>Priority: Minor
>  Labels: code-cleanup
>
> incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/model/ActionType.java
>   does not include deltacommit 
> Both commit/deltacommit use 'commit' command which can be confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

