[jira] [Updated] (HUDI-1930) Bootstrap support configure KeyGenerator by type

2021-07-03 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-1930:
---
Fix Version/s: 0.9.0

> Bootstrap support configure KeyGenerator by type
> 
>
> Key: HUDI-1930
> URL: https://issues.apache.org/jira/browse/HUDI-1930
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1930) Bootstrap support configure KeyGenerator by type

2021-07-03 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-1930.
--
Resolution: Implemented

> Bootstrap support configure KeyGenerator by type
> 
>
> Key: HUDI-1930
> URL: https://issues.apache.org/jira/browse/HUDI-1930
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Commented] (HUDI-2132) Make coordinator events as POJO for efficient serialization

2021-07-05 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375078#comment-17375078
 ] 

vinoyang commented on HUDI-2132:


32bd8ce088e0f1d82577575ac048e1a44d44e380

> Make coordinator events as POJO for efficient serialization
> ---
>
> Key: HUDI-2132
> URL: https://issues.apache.org/jira/browse/HUDI-2132
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Assigned] (HUDI-2106) Fix flink batch compaction bug while user don't set compaction tasks

2021-07-05 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-2106:
--

Assignee: Zheng yunhong

> Fix flink batch compaction bug while user don't set compaction tasks
> 
>
> Key: HUDI-2106
> URL: https://issues.apache.org/jira/browse/HUDI-2106
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Assignee: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> There is a bug in Flink batch compaction: when the number of compaction 
> tasks is not set, it always takes the default value instead of the size of 
> the compactionPlan operations.
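
A minimal sketch of the behavior the description asks for, written in Python for brevity rather than Hudi's Java. The helper name `resolve_compaction_tasks` and its parameters are hypothetical and do not correspond to any Hudi or Flink API. It assumes an unset option is represented as `None`, and the floor of one task is likewise an assumption.

```python
from typing import Optional, Sequence


def resolve_compaction_tasks(configured_tasks: Optional[int],
                             plan_operations: Sequence[object]) -> int:
    """Pick the compaction parallelism (hypothetical helper, not a Hudi API).

    If the user set a task count explicitly, use it. Otherwise fall back to
    the number of operations in the compaction plan, never going below 1
    (the floor is an assumption made for this sketch).
    """
    if configured_tasks is not None:
        return configured_tasks
    return max(1, len(plan_operations))
```

The point of the sketch is only the fallback order: an explicit user setting wins, and the plan size replaces a fixed default when nothing is set.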





[jira] [Closed] (HUDI-2106) Fix flink batch compaction bug while user don't set compaction tasks

2021-07-05 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2106.
--
Resolution: Fixed

bc313727e3e89640edad85364022e057c9864ee9

> Fix flink batch compaction bug while user don't set compaction tasks
> 
>
> Key: HUDI-2106
> URL: https://issues.apache.org/jira/browse/HUDI-2106
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Assignee: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> There is a bug in Flink batch compaction: when the number of compaction 
> tasks is not set, it always takes the default value instead of the size of 
> the compactionPlan operations.





[jira] [Closed] (HUDI-2111) Update docs about bootstrap support configure KeyGenerator by type

2021-07-05 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2111.
--
Resolution: Done

0a6e48dd23c73a1ef34852396291cfec388bb0ca

> Update docs about bootstrap support configure KeyGenerator by type
> --
>
> Key: HUDI-2111
> URL: https://issues.apache.org/jira/browse/HUDI-2111
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Assigned] (HUDI-2136) Fix packet conflict when flink-sql-connector-hive and hudi-flink-bundle both in flink lib

2021-07-08 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-2136:
--

Assignee: Zheng yunhong

> Fix packet conflict when flink-sql-connector-hive and hudi-flink-bundle both 
> in flink lib
> -
>
> Key: HUDI-2136
> URL: https://issues.apache.org/jira/browse/HUDI-2136
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Assignee: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Fix packet conflict when flink-sql-connector-hive and hudi-flink-bundle both 
> in flink lib.





[jira] [Closed] (HUDI-2136) Fix packet conflict when flink-sql-connector-hive and hudi-flink-bundle both in flink lib

2021-07-08 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2136.
--
Resolution: Fixed

047d956e01b6d7c92320686d8321b2bbe9d2188e

> Fix packet conflict when flink-sql-connector-hive and hudi-flink-bundle both 
> in flink lib
> -
>
> Key: HUDI-2136
> URL: https://issues.apache.org/jira/browse/HUDI-2136
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Assignee: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Fix packet conflict when flink-sql-connector-hive and hudi-flink-bundle both 
> in flink lib.





[jira] [Closed] (HUDI-2087) Support Append only in Flink stream

2021-07-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2087.
--
Fix Version/s: 0.9.0
   Resolution: Done

371526789d663dee85041eb31c27c52c81ef87ef

> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: image-2021-07-08-22-04-30-039.png, 
> image-2021-07-08-22-04-40-018.png
>
>
> It is necessary to support append mode in Flink streaming, as the data lake 
> should be able to write log-type data as Parquet with high performance, 
> without merging.





[jira] [Closed] (HUDI-2147) Remove unused class AvroConvertor in hudi-flink

2021-07-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2147.
--
Resolution: Done

3b2a4f2b6b49e13997292ecafa9accdd3e7b9efd

> Remove unused class AvroConvertor in hudi-flink
> ---
>
> Key: HUDI-2147
> URL: https://issues.apache.org/jira/browse/HUDI-2147
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Updated] (HUDI-2147) Remove unused class AvroConvertor in hudi-flink

2021-07-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2147:
---
Fix Version/s: 0.9.0

> Remove unused class AvroConvertor in hudi-flink
> ---
>
> Key: HUDI-2147
> URL: https://issues.apache.org/jira/browse/HUDI-2147
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Closed] (HUDI-2143) Tweak the default compaction target IO to 500GB when flink async compaction is off

2021-07-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2143.
--
Resolution: Done

942a024e74af52e09cabbfe967f5da0ef108bdbb

> Tweak the default compaction target IO to 500GB when flink async compaction 
> is off
> --
>
> Key: HUDI-2143
> URL: https://issues.apache.org/jira/browse/HUDI-2143
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Closed] (HUDI-2142) Support setting bucket assign parallelism for flink write task

2021-07-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2142.
--
Resolution: Implemented

9b01d2a04520db6230cd16ef2b29013c013b1944

> Support setting bucket assign parallelism for flink write task
> --
>
> Key: HUDI-2142
> URL: https://issues.apache.org/jira/browse/HUDI-2142
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Assignee: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Support setting bucket assign parallelism for flink write task.





[jira] [Assigned] (HUDI-2142) Support setting bucket assign parallelism for flink write task

2021-07-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-2142:
--

Assignee: Zheng yunhong

> Support setting bucket assign parallelism for flink write task
> --
>
> Key: HUDI-2142
> URL: https://issues.apache.org/jira/browse/HUDI-2142
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Assignee: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Support setting bucket assign parallelism for flink write task.





[jira] [Created] (HUDI-2165) Support Transformer for HoodieFlinkStreamer

2021-07-12 Thread vinoyang (Jira)
vinoyang created HUDI-2165:
--

 Summary: Support Transformer for HoodieFlinkStreamer
 Key: HUDI-2165
 URL: https://issues.apache.org/jira/browse/HUDI-2165
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: vinoyang


Hoodie's delta streamer supports {{Transformer}}; we can also provide this 
feature for {{HoodieFlinkStreamer}}.





[jira] [Assigned] (HUDI-2165) Support Transformer for HoodieFlinkStreamer

2021-07-13 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-2165:
--

Assignee: vinoyang

> Support Transformer for HoodieFlinkStreamer
> ---
>
> Key: HUDI-2165
> URL: https://issues.apache.org/jira/browse/HUDI-2165
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Hoodie's delta streamer supports {{Transformer}}; we can also provide this 
> feature for {{HoodieFlinkStreamer}}.





[jira] [Updated] (HUDI-2165) Support Transformer for HoodieFlinkStreamer

2021-07-14 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2165:
---
Fix Version/s: 0.9.0

> Support Transformer for HoodieFlinkStreamer
> ---
>
> Key: HUDI-2165
> URL: https://issues.apache.org/jira/browse/HUDI-2165
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Hoodie's delta streamer supports {{Transformer}}; we can also provide this 
> feature for {{HoodieFlinkStreamer}}.





[jira] [Updated] (HUDI-2165) Support Transformer for HoodieFlinkStreamer

2021-07-14 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2165:
---
Issue Type: New Feature  (was: Improvement)

> Support Transformer for HoodieFlinkStreamer
> ---
>
> Key: HUDI-2165
> URL: https://issues.apache.org/jira/browse/HUDI-2165
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Hoodie's delta streamer supports {{Transformer}}; we can also provide this 
> feature for {{HoodieFlinkStreamer}}.





[jira] [Closed] (HUDI-2165) Support Transformer for HoodieFlinkStreamer

2021-07-14 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2165.
--
Resolution: Implemented

52524b659d2cb64403e8ba87d2fefe6d536156e9

> Support Transformer for HoodieFlinkStreamer
> ---
>
> Key: HUDI-2165
> URL: https://issues.apache.org/jira/browse/HUDI-2165
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Hoodie's delta streamer supports {{Transformer}}; we can also provide this 
> feature for {{HoodieFlinkStreamer}}.





[jira] [Assigned] (HUDI-409) Replace Log Magic header with a secure hash to avoid clashes with data

2020-02-18 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-409:
-

Assignee: Nishith Agarwal

> Replace Log Magic header with a secure hash to avoid clashes with data
> --
>
> Key: HUDI-409
> URL: https://issues.apache.org/jira/browse/HUDI-409
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
> Fix For: 0.5.2
>
>






[jira] [Updated] (HUDI-409) Replace Log Magic header with a secure hash to avoid clashes with data

2020-02-18 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-409:
--
Issue Type: Improvement  (was: Bug)

> Replace Log Magic header with a secure hash to avoid clashes with data
> --
>
> Key: HUDI-409
> URL: https://issues.apache.org/jira/browse/HUDI-409
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
> Fix For: 0.5.2
>
>






[jira] [Updated] (HUDI-622) Remove VisibleForTesting annotation and import from code

2020-02-19 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-622:
--
Status: Closed  (was: Patch Available)

> Remove VisibleForTesting annotation and import from code
> 
>
> Key: HUDI-622
> URL: https://issues.apache.org/jira/browse/HUDI-622
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: patch, pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove Guava VisibleForTesting annotation from codebase - part of the code 
> change for HUDI-479





[jira] [Reopened] (HUDI-622) Remove VisibleForTesting annotation and import from code

2020-02-19 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reopened HUDI-622:
---

> Remove VisibleForTesting annotation and import from code
> 
>
> Key: HUDI-622
> URL: https://issues.apache.org/jira/browse/HUDI-622
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: patch, pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove Guava VisibleForTesting annotation from codebase - part of the code 
> change for HUDI-479





[jira] [Resolved] (HUDI-622) Remove VisibleForTesting annotation and import from code

2020-02-19 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang resolved HUDI-622.
---
Resolution: Done

Done via master branch: f9d2f66dc16540e3e5c1cb1f7f23b4fca7c656c3

> Remove VisibleForTesting annotation and import from code
> 
>
> Key: HUDI-622
> URL: https://issues.apache.org/jira/browse/HUDI-622
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: patch, pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove Guava VisibleForTesting annotation from codebase - part of the code 
> change for HUDI-479





[jira] [Created] (HUDI-623) Remove UpgradePayloadFromUberToApache

2020-02-19 Thread vinoyang (Jira)
vinoyang created HUDI-623:
-

 Summary: Remove UpgradePayloadFromUberToApache
 Key: HUDI-623
 URL: https://issues.apache.org/jira/browse/HUDI-623
 Project: Apache Hudi (incubating)
  Issue Type: Wish
  Components: Code Cleanup
Reporter: vinoyang
Assignee: wangxianghu
 Fix For: 0.5.2


{{UpgradePayloadFromUberToApache}} is used to convert the package names from the 
pattern {{com.uber.hoodie}} to {{org.apache.hudi}}. It is a one-shot task. Since 
this work is done, IMO we can remove this class.
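
To illustrate the kind of one-shot conversion described above, here is a minimal sketch in Python. It is not the implementation of {{UpgradePayloadFromUberToApache}}: the function name `upgrade_class_name` is hypothetical, and the sketch only assumes that the conversion amounts to rewriting a package prefix on a fully qualified class name.

```python
OLD_PREFIX = "com.uber.hoodie"
NEW_PREFIX = "org.apache.hudi"


def upgrade_class_name(name: str) -> str:
    """Rewrite a legacy package prefix to the Apache one (illustrative only).

    Only names that are exactly the old package, or live underneath it, are
    rewritten; unrelated names are returned unchanged.
    """
    if name == OLD_PREFIX or name.startswith(OLD_PREFIX + "."):
        return NEW_PREFIX + name[len(OLD_PREFIX):]
    return name
```

Matching on the prefix followed by a dot avoids rewriting unrelated packages that merely share the same leading characters.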





[jira] [Updated] (HUDI-623) Remove UpgradePayloadFromUberToApache

2020-02-19 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-623:
--
Priority: Trivial  (was: Major)

> Remove UpgradePayloadFromUberToApache
> -
>
> Key: HUDI-623
> URL: https://issues.apache.org/jira/browse/HUDI-623
> Project: Apache Hudi (incubating)
>  Issue Type: Wish
>  Components: Code Cleanup
>Reporter: vinoyang
>Assignee: wangxianghu
>Priority: Trivial
> Fix For: 0.5.2
>
>
> {{UpgradePayloadFromUberToApache}} is used to convert the package names from 
> the pattern {{com.uber.hoodie}} to {{org.apache.hudi}}. It is a one-shot task. 
> Since this work is done, IMO we can remove this class.





[jira] [Commented] (HUDI-623) Remove UpgradePayloadFromUberToApache

2020-02-19 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040710#comment-17040710
 ] 

vinoyang commented on HUDI-623:
---

[~vinoth] WDYT?

> Remove UpgradePayloadFromUberToApache
> -
>
> Key: HUDI-623
> URL: https://issues.apache.org/jira/browse/HUDI-623
> Project: Apache Hudi (incubating)
>  Issue Type: Wish
>  Components: Code Cleanup
>Reporter: vinoyang
>Assignee: wangxianghu
>Priority: Trivial
> Fix For: 0.5.2
>
>
> {{UpgradePayloadFromUberToApache}} is used to convert the package names from 
> the pattern {{com.uber.hoodie}} to {{org.apache.hudi}}. It is a one-shot task. 
> Since this work is done, IMO we can remove this class.





[jira] [Commented] (HUDI-623) Remove UpgradePayloadFromUberToApache

2020-02-20 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041440#comment-17041440
 ] 

vinoyang commented on HUDI-623:
---

OK, let's wait for more release cycles.

> Remove UpgradePayloadFromUberToApache
> -
>
> Key: HUDI-623
> URL: https://issues.apache.org/jira/browse/HUDI-623
> Project: Apache Hudi (incubating)
>  Issue Type: Wish
>  Components: Code Cleanup
>Reporter: vinoyang
>Assignee: wangxianghu
>Priority: Trivial
> Fix For: 0.5.2
>
>
> {{UpgradePayloadFromUberToApache}} is used to convert the package names from 
> the pattern {{com.uber.hoodie}} to {{org.apache.hudi}}. It is a one-shot task. 
> Since this work is done, IMO we can remove this class.





[jira] [Commented] (HUDI-624) Split some of the code from PR for HUDI-479

2020-02-20 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041584#comment-17041584
 ] 

vinoyang commented on HUDI-624:
---

Done via master branch: 8f6035de4a0486e996647e1246334123aed0c9d6

> Split some of the code from PR for HUDI-479 
> 
>
> Key: HUDI-624
> URL: https://issues.apache.org/jira/browse/HUDI-624
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: patch, pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This Jira is to reduce the size of the code base in PR# 1159 for HUDI-479, 
> making it easier to review.





[jira] [Updated] (HUDI-624) Split some of the code from PR for HUDI-479

2020-02-20 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-624:
--
Status: Closed  (was: Patch Available)

> Split some of the code from PR for HUDI-479 
> 
>
> Key: HUDI-624
> URL: https://issues.apache.org/jira/browse/HUDI-624
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: patch, pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This Jira is to reduce the size of the code base in PR# 1159 for HUDI-479, 
> making it easier to review.





[jira] [Updated] (HUDI-289) Implement a test suite to support long running test for Hudi writing and querying end-end

2020-02-23 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-289:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Implement a test suite to support long running test for Hudi writing and 
> querying end-end
> -
>
> Key: HUDI-289
> URL: https://issues.apache.org/jira/browse/HUDI-289
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Usability
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Major
> Fix For: 0.6.0
>
>
> We would need an equivalent of an end-end test which runs some workload for 
> a few hours at least, triggers various actions like commit, deltacommit, 
> rollback, compaction and ensures correctness of code before every release.
> P.S: Learn from all the CSS issues managing compaction.
> The feature branch is here: 
> [https://github.com/apache/incubator-hudi/tree/hudi_test_suite_refactor]





[jira] [Updated] (HUDI-581) NOTICE need more work as it missing content form included 3rd party ALv2 licensed NOTICE files

2020-02-23 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-581:
--
Priority: Blocker  (was: Major)

> NOTICE need more work as it missing content form included 3rd party ALv2 
> licensed NOTICE files
> --
>
> Key: HUDI-581
> URL: https://issues.apache.org/jira/browse/HUDI-581
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: Suneel Marthi
>Priority: Blocker
> Fix For: 0.5.2
>
>
> Issues pointed out in general@incubator ML, more context here: 
> [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]
>  
> Would get it fixed before next release.





[jira] [Updated] (HUDI-580) Incorrect license header in docker/hoodie/hadoop/base/entrypoint.sh

2020-02-24 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-580:
--
Priority: Blocker  (was: Major)

> Incorrect license header in docker/hoodie/hadoop/base/entrypoint.sh
> ---
>
> Key: HUDI-580
> URL: https://issues.apache.org/jira/browse/HUDI-580
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: newbie
>Reporter: leesf
>Assignee: lamber-ken
>Priority: Blocker
>  Labels: compliance
> Fix For: 0.5.2
>
>
> Issues pointed out in general@incubator ML, more context here: 
> [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]
>  
> Would get it fixed before next release.





[jira] [Commented] (HUDI-580) Incorrect license header in docker/hoodie/hadoop/base/entrypoint.sh

2020-02-24 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044067#comment-17044067
 ] 

vinoyang commented on HUDI-580:
---

[~lamber-ken] I have marked this issue as a blocker.  We need to fix it ASAP.

> Incorrect license header in docker/hoodie/hadoop/base/entrypoint.sh
> ---
>
> Key: HUDI-580
> URL: https://issues.apache.org/jira/browse/HUDI-580
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: newbie
>Reporter: leesf
>Assignee: lamber-ken
>Priority: Blocker
>  Labels: compliance
> Fix For: 0.5.2
>
>
> Issues pointed out in general@incubator ML, more context here: 
> [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]
>  
> Would get it fixed before next release.





[jira] [Resolved] (HUDI-580) Fix incorrect license header in files

2020-02-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang resolved HUDI-580.
---
Resolution: Fixed

Fixed via master branch: 159cb060f0502bb01ec4cefaa743d3678b711254

> Fix incorrect license header in files
> -
>
> Key: HUDI-580
> URL: https://issues.apache.org/jira/browse/HUDI-580
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: newbie
>Reporter: leesf
>Assignee: lamber-ken
>Priority: Blocker
>  Labels: compliance, pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Issues pointed out in general@incubator ML, more context here: 
> [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]
>  
> Would get it fixed before next release.





[jira] [Created] (HUDI-638) Fix the asf compliant issues based on the maturity model

2020-02-26 Thread vinoyang (Jira)
vinoyang created HUDI-638:
-

 Summary: Fix the asf compliant issues based on the maturity model
 Key: HUDI-638
 URL: https://issues.apache.org/jira/browse/HUDI-638
 Project: Apache Hudi (incubating)
  Issue Type: Task
  Components: Release & Administrative
Reporter: vinoyang
 Fix For: 0.5.2


Hudi's maturity model is here: 
https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Maturity+Matrix

We should fix all of the compliance issues ASAP before releasing Hudi 0.5.2.





[jira] [Created] (HUDI-639) QU20: The project puts a very high priority on producing secure software.

2020-02-26 Thread vinoyang (Jira)
vinoyang created HUDI-639:
-

 Summary: QU20: The project puts a very high priority on producing 
secure software.
 Key: HUDI-639
 URL: https://issues.apache.org/jira/browse/HUDI-639
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang








[jira] [Updated] (HUDI-638) Fix the asf compliant issues based on the maturity model

2020-02-26 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-638:
--
Priority: Blocker  (was: Major)

> Fix the asf compliant issues based on the maturity model
> 
>
> Key: HUDI-638
> URL: https://issues.apache.org/jira/browse/HUDI-638
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release & Administrative
>Reporter: vinoyang
>Priority: Blocker
> Fix For: 0.5.2
>
>
> Hudi's maturity model is here: 
> https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Maturity+Matrix
> We should fix all of the compliance issues ASAP before releasing Hudi 0.5.2.





[jira] [Created] (HUDI-640) QU30: The project provides a well-documented channel to report security issues, along with a documented way of responding to them.

2020-02-26 Thread vinoyang (Jira)
vinoyang created HUDI-640:
-

 Summary: QU30: The project provides a well-documented channel to 
report security issues, along with a documented way of responding to them.  
  
 Key: HUDI-640
 URL: https://issues.apache.org/jira/browse/HUDI-640
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang








[jira] [Created] (HUDI-641) CO60: The community operates based on the consensus of its members (see CS10) who have decision power. Dictators, benevolent or not, are not welcome in Apache projects.

2020-02-26 Thread vinoyang (Jira)
vinoyang created HUDI-641:
-

 Summary: CO60: The community operates based on the consensus of 
its members (see CS10) who have decision power. Dictators, benevolent or not, 
are not welcome in Apache projects.
 Key: HUDI-641
 URL: https://issues.apache.org/jira/browse/HUDI-641
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang








[jira] [Created] (HUDI-642) CO70: The project strives to answer user questions in a timely manner.

2020-02-26 Thread vinoyang (Jira)
vinoyang created HUDI-642:
-

 Summary: CO70: The project strives to answer user questions in a 
timely manner.
 Key: HUDI-642
 URL: https://issues.apache.org/jira/browse/HUDI-642
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang








[jira] [Created] (HUDI-643) Check and write comment of all the rule items

2020-02-26 Thread vinoyang (Jira)
vinoyang created HUDI-643:
-

 Summary: Check and write comment of all the rule items
 Key: HUDI-643
 URL: https://issues.apache.org/jira/browse/HUDI-643
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang


Some rule items do not contain a "comment"; we should check each one and write it.





[jira] [Updated] (HUDI-643) Check and write comment for all the rule items

2020-02-26 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-643:
--
Summary: Check and write comment for all the rule items  (was: Check and 
write comment of all the rule items)

> Check and write comment for all the rule items
> --
>
> Key: HUDI-643
> URL: https://issues.apache.org/jira/browse/HUDI-643
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Priority: Blocker
>
> Some rule items do not contain a "comment"; we should check each one and write it.





[jira] [Created] (HUDI-645) Provide a statement page to describe how to report security issues

2020-02-27 Thread vinoyang (Jira)
vinoyang created HUDI-645:
-

 Summary: Provide a statement page to describe how to report 
security issues
 Key: HUDI-645
 URL: https://issues.apache.org/jira/browse/HUDI-645
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Docs
Reporter: vinoyang
Assignee: vinoyang


This issue points to QU30 of Apache Hudi's maturity model.





[jira] [Updated] (HUDI-645) Provide a statement page to describe how to report security issues

2020-02-27 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-645:
--
Status: Open  (was: New)

> Provide a statement page to describe how to report security issues
> --
>
> Key: HUDI-645
> URL: https://issues.apache.org/jira/browse/HUDI-645
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This issue points to QU30 of Apache Hudi's maturity model.





[jira] [Resolved] (HUDI-645) Provide a statement page to describe how to report security issues

2020-02-27 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang resolved HUDI-645.
---
Resolution: Done

Done via asf-site branch: 6d844cefaaab3c732e2440f880c2c44a6aeaa474

> Provide a statement page to describe how to report security issues
> --
>
> Key: HUDI-645
> URL: https://issues.apache.org/jira/browse/HUDI-645
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This issue points to QU30 of Apache Hudi's maturity model.





[jira] [Created] (HUDI-652) Decouple HoodieReadClient and AbstractHoodieClient to brean the inheritance chain

2020-03-03 Thread vinoyang (Jira)
vinoyang created HUDI-652:
-

 Summary: Decouple HoodieReadClient and AbstractHoodieClient to 
brean the inheritance chain
 Key: HUDI-652
 URL: https://issues.apache.org/jira/browse/HUDI-652
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang
Assignee: vinoyang


We have decided to restructure the {{hudi-client}} module so that the 
write-specific classes can be moved to {{hudi-write-common}}. Currently, 
{{HoodieReadClient}} and {{HoodieWriteClient}} share the same superclass, 
{{AbstractHoodieClient}}. To make that move possible, we should decouple 
{{HoodieReadClient}} from {{AbstractHoodieClient}}. From the source code, I 
found that {{HoodieReadClient}} does not depend deeply on 
{{AbstractHoodieClient}}.





[jira] [Commented] (HUDI-652) Decouple HoodieReadClient and AbstractHoodieClient to brean the inheritance chain

2020-03-03 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050259#comment-17050259
 ] 

vinoyang commented on HUDI-652:
---

[~vinoth] WDYT?

> Decouple HoodieReadClient and AbstractHoodieClient to brean the inheritance 
> chain
> -
>
> Key: HUDI-652
> URL: https://issues.apache.org/jira/browse/HUDI-652
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> We have decided to restructure the {{hudi-client}} module so that the 
> write-specific classes can be moved to {{hudi-write-common}}. Currently, 
> {{HoodieReadClient}} and {{HoodieWriteClient}} share the same superclass, 
> {{AbstractHoodieClient}}. To make that move possible, we should decouple 
> {{HoodieReadClient}} from {{AbstractHoodieClient}}. From the source code, I 
> found that {{HoodieReadClient}} does not depend deeply on 
> {{AbstractHoodieClient}}.





[jira] [Updated] (HUDI-652) Decouple HoodieReadClient and AbstractHoodieClient to break the inheritance chain

2020-03-04 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-652:
--
Summary: Decouple HoodieReadClient and AbstractHoodieClient to break the 
inheritance chain  (was: Decouple HoodieReadClient and AbstractHoodieClient to 
brean the inheritance chain)

> Decouple HoodieReadClient and AbstractHoodieClient to break the inheritance 
> chain
> -
>
> Key: HUDI-652
> URL: https://issues.apache.org/jira/browse/HUDI-652
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> We have decided to restructure the {{hudi-client}} module so that the 
> write-specific classes can be moved to {{hudi-write-common}}. Currently, 
> {{HoodieReadClient}} and {{HoodieWriteClient}} share the same superclass, 
> {{AbstractHoodieClient}}. To make that move possible, we should decouple 
> {{HoodieReadClient}} from {{AbstractHoodieClient}}. From the source code, I 
> found that {{HoodieReadClient}} does not depend deeply on 
> {{AbstractHoodieClient}}.





[jira] [Created] (HUDI-658) Make ClientUtils spark-free

2020-03-05 Thread vinoyang (Jira)
vinoyang created HUDI-658:
-

 Summary: Make ClientUtils spark-free
 Key: HUDI-658
 URL: https://issues.apache.org/jira/browse/HUDI-658
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang
Assignee: vinoyang


{{ClientUtils#createMetaClient}} requires {{JavaSparkContext}} only to get 
the Hadoop configuration object. We can pass the {{Configuration}} object 
directly so that we can make {{ClientUtils}} spark-free.
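
A minimal sketch of the idea, not the actual Hudi code. It assumes two stand-in types: ConfStandIn plays the role of Hadoop's {{Configuration}} and MetaClientStandIn the role of the meta client. Neither is a real Hudi or Hadoop class, and the real {{createMetaClient}} signature may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.apache.hadoop.conf.Configuration (hypothetical, illustration only).
final class ConfStandIn {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    String get(String key) {
        return values.get(key);
    }
}

// Stand-in for the meta client that the utility builds (hypothetical).
final class MetaClientStandIn {
    final ConfStandIn conf;
    final String basePath;

    MetaClientStandIn(ConfStandIn conf, String basePath) {
        this.conf = conf;
        this.basePath = basePath;
    }
}

final class ClientUtilsSketch {
    // Takes the configuration directly; no Spark type appears in the signature.
    static MetaClientStandIn createMetaClient(ConfStandIn conf, String basePath) {
        return new MetaClientStandIn(conf, basePath);
    }

    public static void main(String[] args) {
        ConfStandIn conf = new ConfStandIn();
        conf.set("fs.defaultFS", "file:///");
        MetaClientStandIn client = createMetaClient(conf, "/tmp/hudi_table");
        System.out.println(client.basePath + " " + client.conf.get("fs.defaultFS"));
    }
}
```

A Spark-side caller would still own the {{JavaSparkContext}} and could hand over only the result of its hadoopConfiguration() call, so the utility itself never imports Spark.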





[jira] [Commented] (HUDI-658) Make ClientUtils spark-free

2020-03-05 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051899#comment-17051899
 ] 

vinoyang commented on HUDI-658:
---

[~vinoth] WDYT?

> Make ClientUtils spark-free
> ---
>
> Key: HUDI-658
> URL: https://issues.apache.org/jira/browse/HUDI-658
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> {{ClientUtils#createMetaClient}} requires {{JavaSparkContext}} only to get 
> the Hadoop configuration object. We can pass the {{Configuration}} 
> object directly so that we can make {{ClientUtils}} spark-free.





[jira] [Updated] (HUDI-659) Make HoodieCommitArchiveLog spark free

2020-03-05 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-659:
--
Description: Currently, {{HoodieCommitArchiveLog}} depends on 
{{JavaSparkContext}} in two of its methods: {{archiveIfRequired}} and 
{{getInstantsToArchive}}. Both methods pass {{JavaSparkContext}} along to get a 
{{HoodieTable}} object. After diving into the call chain, I found we can 
replace {{JavaSparkContext}} with {{Configuration}}, plus some other cleanup 
(e.g. HUDI-658). After that, we can make {{HoodieCommitArchiveLog}} spark free.  
(was: Currently, {{HoodieCommitArchiveLog}} depends on {{JavaSparkContext}} in 
two of its methods: {{archiveIfRequired}} and {{getInstantsToArchive}}. Both 
methods pass {{JavaSparkContext}} along to get a {{HoodieTable}} object. After 
diving into the call chain, I found we can replace {{JavaSparkContext}} with 
{{Configuration}}, plus some other cleanup. After that, we can make 
{{HoodieCommitArchiveLog}} spark free.)

> Make HoodieCommitArchiveLog spark free
> --
>
> Key: HUDI-659
> URL: https://issues.apache.org/jira/browse/HUDI-659
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Currently, {{HoodieCommitArchiveLog}} depends on {{JavaSparkContext}} in two 
> of its methods: {{archiveIfRequired}} and {{getInstantsToArchive}}. Both 
> methods pass {{JavaSparkContext}} along to get a {{HoodieTable}} object. After 
> diving into the call chain, I found we can replace {{JavaSparkContext}} with 
> {{Configuration}}, plus some other cleanup (e.g. HUDI-658). After that, we can 
> make {{HoodieCommitArchiveLog}} spark free.





[jira] [Created] (HUDI-659) Make HoodieCommitArchiveLog spark free

2020-03-05 Thread vinoyang (Jira)
vinoyang created HUDI-659:
-

 Summary: Make HoodieCommitArchiveLog spark free
 Key: HUDI-659
 URL: https://issues.apache.org/jira/browse/HUDI-659
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang
Assignee: vinoyang


Currently, {{HoodieCommitArchiveLog}} depends on {{JavaSparkContext}} in two 
of its methods: {{archiveIfRequired}} and {{getInstantsToArchive}}. Both 
methods pass {{JavaSparkContext}} along to get a {{HoodieTable}} object. After 
diving into the call chain, I found we can replace {{JavaSparkContext}} with 
{{Configuration}}, plus some other cleanup. After that, we can make 
{{HoodieCommitArchiveLog}} spark free.





[jira] [Created] (HUDI-660) Make HoodieWriteConfig spark free

2020-03-05 Thread vinoyang (Jira)
vinoyang created HUDI-660:
-

 Summary: Make HoodieWriteConfig spark free
 Key: HUDI-660
 URL: https://issues.apache.org/jira/browse/HUDI-660
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang
Assignee: vinoyang


Currently, {{HoodieWriteConfig}} depends on 
{{org.apache.spark.storage.StorageLevel}}. Since {{StorageLevel}} provides 
the {{StorageLevel.fromString}} method, IMO we can just store the string 
representation. Furthermore, we can split some Spark-job-specific config 
options into a sub config class, e.g. {{HoodieSparkWriteConfig}}.
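
A minimal sketch of this proposal, assuming hypothetical class names and property keys (WriteConfigSketch, SparkWriteConfigSketch, and the "example." keys are invented for illustration and are not Hudi's real config API). The base class keeps the storage level as a string; only Spark-side code would convert it.

```java
import java.util.Properties;

// Engine-neutral config: keeps the storage level as a plain string, so this
// class needs no import from org.apache.spark. All names here are hypothetical.
class WriteConfigSketch {
    static final String STORAGE_LEVEL_KEY = "example.write.storage.level";
    static final String DEFAULT_STORAGE_LEVEL = "MEMORY_AND_DISK_SER";

    protected final Properties props;

    WriteConfigSketch(Properties props) {
        this.props = props;
    }

    // Returns the string form only; conversion is left to Spark-side code.
    String getStorageLevelName() {
        return props.getProperty(STORAGE_LEVEL_KEY, DEFAULT_STORAGE_LEVEL);
    }
}

// Sub config for Spark-job-specific options (a stand-in for the proposed
// HoodieSparkWriteConfig). In a real Spark module, a getter here could call
// StorageLevel.fromString(getStorageLevelName()).
class SparkWriteConfigSketch extends WriteConfigSketch {
    static final String PARALLELISM_KEY = "example.spark.parallelism"; // hypothetical key

    SparkWriteConfigSketch(Properties props) {
        super(props);
    }

    int getParallelism() {
        return Integer.parseInt(props.getProperty(PARALLELISM_KEY, "2"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(STORAGE_LEVEL_KEY, "DISK_ONLY");
        SparkWriteConfigSketch config = new SparkWriteConfigSketch(props);
        System.out.println(config.getStorageLevelName() + " " + config.getParallelism());
    }
}
```

The design choice is that the engine-neutral class never names a Spark type, so it could live in a module without a Spark dependency.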





[jira] [Updated] (HUDI-538) Restructuring hudi client module for multi engine support

2020-03-05 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-538:
--
Description: Hudi is currently tightly coupled with the Spark framework. This 
makes integration with other computing engines more difficult. We plan to 
decouple it from Spark. This umbrella issue is used to track this work.  (was: 
Hudi is currently tightly coupled with the Spark framework. This makes 
integration with other computing engines more difficult. We plan to decouple it 
from Spark. This umbrella issue is used to track this work.

Some thoughts are written here: 
https://docs.google.com/document/d/1Q9w_4K6xzGbUrtTS0gAlzNYOmRXjzNUdbbe0q59PX9w/edit?usp=sharing

The feature branch is {{restructure-hudi-client}}.)

> Restructuring hudi client module for multi engine support
> -
>
> Key: HUDI-538
> URL: https://issues.apache.org/jira/browse/HUDI-538
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: vinoyang
>Priority: Major
> Fix For: 0.6.0
>
>
> Hudi is currently tightly coupled with the Spark framework. This makes 
> integration with other computing engines more difficult. We plan to decouple 
> it from Spark. This umbrella issue is used to track this work.





[jira] [Created] (HUDI-661) Make EmbeddedTimelineService spark free

2020-03-05 Thread vinoyang (Jira)
vinoyang created HUDI-661:
-

 Summary: Make EmbeddedTimelineService spark free
 Key: HUDI-661
 URL: https://issues.apache.org/jira/browse/HUDI-661
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang
Assignee: vinoyang


Currently, {{EmbeddedTimelineService}} holds a {{SparkConf}} only to read 
{{spark.driver.host}}. The value is a string, so we can pass it in from the 
outside instead of depending on {{SparkConf}}.
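
A minimal sketch of passing the host in from the outside. TimelineServiceSketch is a hypothetical stand-in, not the real {{EmbeddedTimelineService}}, and the fallback to "localhost" is an assumption made only for this example.

```java
// Stand-in for the timeline service (not the real EmbeddedTimelineService).
final class TimelineServiceSketch {
    private final String hostAddr;

    // The host arrives as a plain string; no SparkConf in the signature.
    TimelineServiceSketch(String hostAddr) {
        // Falling back to "localhost" is an assumption made for this sketch.
        this.hostAddr = (hostAddr == null || hostAddr.isEmpty()) ? "localhost" : hostAddr;
    }

    String getHostAddr() {
        return hostAddr;
    }

    public static void main(String[] args) {
        // A Spark-side caller would read the value itself, e.g.
        // sparkConf.get("spark.driver.host", null), and pass it in.
        TimelineServiceSketch service = new TimelineServiceSketch("10.0.0.5");
        System.out.println(service.getHostAddr());
    }
}
```

With this shape, only the caller that already runs inside Spark needs to know about {{SparkConf}}.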





[jira] [Updated] (HUDI-401) Remove unnecessary use of spark in savepoint timeline

2020-03-06 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-401:
--
Parent: HUDI-538
Issue Type: Sub-task  (was: Improvement)

> Remove unnecessary use of spark in savepoint timeline
> -
>
> Key: HUDI-401
> URL: https://issues.apache.org/jira/browse/HUDI-401
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: CLI, Writer Core
>Reporter: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, a JavaSparkContext is initialized when a savepoint is created, but 
> this is not necessary. The JavaSparkContext's only job there is to provide the 
> Hadoop config, yet initializing it costs time and resources. 
> So we can use the Hadoop config instead of the jsc.





[jira] [Commented] (HUDI-643) Check and write comment for all the rule items

2020-03-06 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053931#comment-17053931
 ] 

vinoyang commented on HUDI-643:
---

[~smarthi] Do you think the "comment" of every rule in the maturity model table 
should have content? If not, I will close this ticket.

> Check and write comment for all the rule items
> --
>
> Key: HUDI-643
> URL: https://issues.apache.org/jira/browse/HUDI-643
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Priority: Blocker
>
> Some rule items do not contain a "comment"; we should check each one and write it.





[jira] [Commented] (HUDI-639) QU20: The project puts a very high priority on producing secure software.

2020-03-06 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053932#comment-17053932
 ] 

vinoyang commented on HUDI-639:
---

[~smarthi] After finishing HUDI-640, we will close this ticket.

> QU20: The project puts a very high priority on producing secure software.
> -
>
> Key: HUDI-639
> URL: https://issues.apache.org/jira/browse/HUDI-639
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Priority: Blocker
>






[jira] [Commented] (HUDI-640) QU30: The project provides a well-documented channel to report security issues, along with a documented way of responding to them.

2020-03-07 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053933#comment-17053933
 ] 

vinoyang commented on HUDI-640:
---

After [~vinoth] republishes the website, we will have a link describing how to 
report security issues. Then, [~smarthi], you can fix the QU30 rule (put the 
link in the "comment" field), and we can close this issue.

> QU30: The project provides a well-documented channel to report security 
> issues, along with a documented way of responding to them.
> ---
>
> Key: HUDI-640
> URL: https://issues.apache.org/jira/browse/HUDI-640
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Priority: Blocker
>






[jira] [Commented] (HUDI-641) CO60: The community operates based on the consensus of its members (see CS10) who have decision power. Dictators, benevolent or not, are not welcome in Apache projects.

2020-03-07 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053938#comment-17053938
 ] 

vinoyang commented on HUDI-641:
---

[~smarthi] If you think we do not need to do anything about CO60, we can 
close it directly.

> CO60: The community operates based on the consensus of its members (see CS10) 
> who have decision power. Dictators, benevolent or not, are not welcome in 
> Apache projects.
> 
>
> Key: HUDI-641
> URL: https://issues.apache.org/jira/browse/HUDI-641
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Priority: Blocker
>






[jira] [Updated] (HUDI-676) Address issues towards removing use of WIP Disclaimer

2020-03-07 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-676:
--
Status: Open  (was: New)

> Address issues towards removing use of WIP Disclaimer
> -
>
> Key: HUDI-676
> URL: https://issues.apache.org/jira/browse/HUDI-676
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://incubator.apache.org/guides/releasemanagement.html#choice_of_disclaimers
> Currently we use the
> https://github.com/apache/incubator-hudi/blob/master/DISCLAIMER-WIP 
> Addressing gaps to remove this is also a good step in the direction of being 
> compliant 





[jira] [Updated] (HUDI-676) Address issues towards removing use of WIP Disclaimer

2020-03-07 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-676:
--
Status: In Progress  (was: Open)

> Address issues towards removing use of WIP Disclaimer
> -
>
> Key: HUDI-676
> URL: https://issues.apache.org/jira/browse/HUDI-676
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://incubator.apache.org/guides/releasemanagement.html#choice_of_disclaimers
> Currently we use the
> https://github.com/apache/incubator-hudi/blob/master/DISCLAIMER-WIP 
> Addressing gaps to remove this is also a good step in the direction of being 
> compliant 





[jira] [Commented] (HUDI-677) Abstract/Refactor all transaction management logic into a set of classes from HoodieWriteClient

2020-03-08 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054339#comment-17054339
 ] 

vinoyang commented on HUDI-677:
---

[~vinoth], [~hongdongdong] wants to give this issue a try; I will discuss it 
with him.

> Abstract/Refactor all transaction management logic into a set of classes from 
> HoodieWriteClient
> ---
>
> Key: HUDI-677
> URL: https://issues.apache.org/jira/browse/HUDI-677
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Created] (HUDI-681) Remove the dependency of EmbeddedTimelineService from HoodieReadClient

2020-03-08 Thread vinoyang (Jira)
vinoyang created HUDI-681:
-

 Summary: Remove the dependency of EmbeddedTimelineService from 
HoodieReadClient
 Key: HUDI-681
 URL: https://issues.apache.org/jira/browse/HUDI-681
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Code Cleanup
Reporter: vinoyang
Assignee: hong dongdong


After decoupling {{HoodieReadClient}} and {{AbstractHoodieClient}}, we can 
remove the {{EmbeddedTimelineService}} from {{HoodieReadClient}} so that we can 
move {{HoodieReadClient}} into the hudi-spark module.





[jira] [Created] (HUDI-682) Move HoodieReadClient into hudi-spark module

2020-03-08 Thread vinoyang (Jira)
vinoyang created HUDI-682:
-

 Summary: Move HoodieReadClient into hudi-spark module
 Key: HUDI-682
 URL: https://issues.apache.org/jira/browse/HUDI-682
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Code Cleanup
Reporter: vinoyang
Assignee: vinoyang








[jira] [Commented] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-08 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054622#comment-17054622
 ] 

vinoyang commented on HUDI-662:
---

+1 too

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>
> https://www.apache.org/legal/resolved.html is the comprehensive guide here.
> http://www.apache.org/dev/licensing-howto.html is the comprehensive guide 
> here.
> Previously, we asked about some specific dependencies here
> https://issues.apache.org/jira/browse/LEGAL-461





[jira] [Closed] (HUDI-681) Remove the dependency of EmbeddedTimelineService from HoodieReadClient

2020-03-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-681.
-
Resolution: Done

Done via master branch: f93e64fee413ed1b774156e688794ee7937cc01a

> Remove the dependency of EmbeddedTimelineService from HoodieReadClient
> --
>
> Key: HUDI-681
> URL: https://issues.apache.org/jira/browse/HUDI-681
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: vinoyang
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> After decoupling {{HoodieReadClient}} and {{AbstractHoodieClient}}, we can 
> remove the {{EmbeddedTimelineService}} from {{HoodieReadClient}} so that we 
> can move {{HoodieReadClient}} into the hudi-spark module.





[jira] [Updated] (HUDI-681) Remove the dependency of EmbeddedTimelineService from HoodieReadClient

2020-03-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-681:
--
Status: Open  (was: New)

> Remove the dependency of EmbeddedTimelineService from HoodieReadClient
> --
>
> Key: HUDI-681
> URL: https://issues.apache.org/jira/browse/HUDI-681
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: vinoyang
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> After decoupling {{HoodieReadClient}} and {{AbstractHoodieClient}}, we can 
> remove the {{EmbeddedTimelineService}} from {{HoodieReadClient}} so that we 
> can move {{HoodieReadClient}} into the hudi-spark module.





[jira] [Commented] (HUDI-682) Move HoodieReadClient into hudi-spark module

2020-03-09 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054841#comment-17054841
 ] 

vinoyang commented on HUDI-682:
---

[~vinoth] Do you think we can do this work right now?

> Move HoodieReadClient into hudi-spark module
> 
>
> Key: HUDI-682
> URL: https://issues.apache.org/jira/browse/HUDI-682
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>






[jira] [Updated] (HUDI-590) Cut a new Doc version 0.5.1 explicitly

2020-03-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-590:
--
Fix Version/s: 0.5.2

> Cut a new Doc version 0.5.1 explicitly
> --
>
> Key: HUDI-590
> URL: https://issues.apache.org/jira/browse/HUDI-590
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs, Release & Administrative
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The latest version of docs needs to be tagged as 0.5.1 explicitly in the 
> site. Follow instructions in 
> [https://github.com/apache/incubator-hudi/blob/asf-site/README.md#updating-site]
>  to create a new dir 0.5.1 under docs/_docs/ 





[jira] [Updated] (HUDI-430) Design Inline FileSystem which supports embedding any file format (parquet/avro/etc)

2020-03-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-430:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Design Inline FileSystem which supports embedding any file format 
> (parquet/avro/etc) 
> -
>
> Key: HUDI-430
> URL: https://issues.apache.org/jira/browse/HUDI-430
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Storage Management
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Basically the log file should be capable of embedding any file format. In 
> other words, if parquet is embedded, direct parquet reader should work on 
> reading the content directly. 





[jira] [Updated] (HUDI-434) Design and develop HFile based Index using InlineFS

2020-03-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-434:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Design and develop HFile based Index using InlineFS
> ---
>
> Key: HUDI-434
> URL: https://issues.apache.org/jira/browse/HUDI-434
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Storage Management
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Updated] (HUDI-251) JDBC incremental load to HUDI with DeltaStreamer

2020-03-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-251:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> JDBC incremental load to HUDI with DeltaStreamer
> 
>
> Key: HUDI-251
> URL: https://issues.apache.org/jira/browse/HUDI-251
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Taher Koitawala
>Assignee: Purushotham Pushpavanthar
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Mirroring RDBMS to HUDI is one of the most basic use cases of HUDI. Hence, 
> for such use cases, DeltaStreamer should provide inbuilt support.
> DeltaStreamer should accept something like jdbc-source.properties where users 
> can define the RDBMS connection properties along with a timestamp column and 
> an interval which allows users to express how frequently HUDI should check 
> with RDBMS data source for new inserts or updates.
> Details are documented in RFC-14
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller





[jira] [Updated] (HUDI-431) Design and develop parquet logging in Log file

2020-03-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-431:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Design and develop parquet logging in Log file
> --
>
> Key: HUDI-431
> URL: https://issues.apache.org/jira/browse/HUDI-431
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Storage Management
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.6.0
>
>
> Once Inline FS is available, enable parquet logging support with 
> HoodieLogFile. LogFile can expose a writer (essentially ParquetWriter) and 
> users can write records as though writing to parquet files. Similarly on the 
> read path, a reader (parquetReader) will be exposed which the user can use to 
> read data out of it. 





[jira] [Updated] (HUDI-669) HoodieDeltaStreamer offset not handled correctly when using LATEST offset reset strategy

2020-03-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-669:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> HoodieDeltaStreamer offset not handled correctly when using LATEST offset 
> reset strategy
> 
>
> Key: HUDI-669
> URL: https://issues.apache.org/jira/browse/HUDI-669
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Context : [https://github.com/apache/incubator-hudi/issues/1375]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-646) Re-enable TestUpdateSchemaEvolution after triaging weird CI issue

2020-03-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-646:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Re-enable TestUpdateSchemaEvolution after triaging weird CI issue
> -
>
> Key: HUDI-646
> URL: https://issues.apache.org/jira/browse/HUDI-646
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Testing
>Reporter: Vinoth Chandar
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/incubator-hudi/pull/1346/commits/5b20891619380a66e2a62c9e57fb28c4f5ed948b
>  undo this
> {code}
> Job aborted due to stage failure: Task 7 in stage 1.0 failed 1 times, most 
> recent failure: Lost task 7.0 in stage 1.0 (TID 15, localhost, executor 
> driver): org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file 
> file:/tmp/junit3406952253616234024/2016/01/31/f1-0_7-0-7_100.parquet
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
>   at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
>   at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
>   at 
> org.apache.hudi.common.util.ParquetUtils.readAvroRecords(ParquetUtils.java:190)
>   at 
> org.apache.hudi.client.TestUpdateSchemaEvolution.lambda$testSchemaEvolutionOnUpdate$dfb2f24e$1(TestUpdateSchemaEvolution.java:123)
>   at 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1334)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
>   at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.UnsupportedOperationException: Byte-buffer read 
> unsupported by input stream
>   at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:146)
>   at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143)
>   at 
> org.apache.parquet.hadoop.util.H2SeekableInputStream$H2Reader.read(H2SeekableInputStream.java:81)
>   at 
> org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:90)
>   at 
> org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:75)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1174)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:127)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
>   ... 29 more
> {code}
> Only happens on travis. Locally succeeded over 5000 

[jira] [Updated] (HUDI-437) Support user-defined index

2020-03-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-437:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Support user-defined index
> --
>
> Key: HUDI-437
> URL: https://issues.apache.org/jira/browse/HUDI-437
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Index, newbie, Writer Core
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.6.0
>
>
> Currently, Hudi does not support user-defined indexes, and throws an exception 
> if any index type other than HBASE/INMEMORY/BLOOM/GLOBAL_BLOOM is configured.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-436) Integrate HFile and Compaction

2020-03-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-436:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Integrate HFile and Compaction 
> ---
>
> Key: HUDI-436
> URL: https://issues.apache.org/jira/browse/HUDI-436
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Storage Management
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-435) Make async compaction extensible to be available in other components.

2020-03-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-435:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Make async compaction extensible to be available in other components. 
> --
>
> Key: HUDI-435
> URL: https://issues.apache.org/jira/browse/HUDI-435
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Compaction, Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.6.0
>
>
> Once HFile based index is available, next step is to make compaction 
> extensible to be available for all components.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-617) Add support for data types convertible to String in TimestampBasedKeyGenerator

2020-03-09 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055560#comment-17055560
 ] 

vinoyang commented on HUDI-617:
---

[~amitds1997] Did you apply for contribution permission? Why can't I assign 
this ticket to you?

> Add support for data types convertible to String in TimestampBasedKeyGenerator
> --
>
> Key: HUDI-617
> URL: https://issues.apache.org/jira/browse/HUDI-617
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Amit Singh
>Priority: Minor
>  Labels: easyfix, pull-request-available
> Fix For: 0.5.2
>
> Attachments: test_data.json, test_schema.avsc
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, TimestampBasedKeyGenerator only supports 4 data types for the 
> partition key. They are  Double, Long, Float and String. However, if the 
> `avro.java.string` is not specified in the schema provided, Hudi throws the 
> following error:
>  org.apache.hudi.exception.HoodieNotSupportedException: Unexpected type for 
> partition field: org.apache.avro.util.Utf8
>  at 
> org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator.getKey(TimestampBasedKeyGenerator.java:111)
>  at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$f92c188c$1(DeltaSync.java:338)
> 
>  It would be better if the support were generalised to include data 
> types that can be converted to String, such as `Utf8`, since 
> these types implement the `CharSequence` interface.
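
The generalisation asked for above could look roughly like the sketch below. It is not Hudi's actual TimestampBasedKeyGenerator code: the class and method names are hypothetical, and IllegalArgumentException stands in for HoodieNotSupportedException so that the example compiles on its own.

```java
/** Hypothetical helper, not Hudi's actual key generator implementation. */
public class PartitionValueSketch {

    // Normalizes a partition field value to a Long (for numeric inputs) or a
    // String, accepting any CharSequence rather than only java.lang.String.
    public static Object normalize(Object partitionVal) {
        if (partitionVal == null) {
            throw new IllegalArgumentException("Partition field value is null");
        }
        if (partitionVal instanceof Double) {
            return ((Double) partitionVal).longValue();
        } else if (partitionVal instanceof Float) {
            return ((Float) partitionVal).longValue();
        } else if (partitionVal instanceof Long) {
            return partitionVal;
        } else if (partitionVal instanceof CharSequence) {
            // Covers String and StringBuilder, and also Avro's Utf8,
            // which implements CharSequence.
            return partitionVal.toString();
        }
        throw new IllegalArgumentException(
            "Unexpected type for partition field: " + partitionVal.getClass().getName());
    }
}
```

Checking for CharSequence instead of String means a schema without `avro.java.string` no longer hits the unsupported-type branch, because the Utf8 value is converted with toString().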



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-587) Jacoco coverage report is not generated

2020-03-09 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055561#comment-17055561
 ] 

vinoyang commented on HUDI-587:
---

[~pwason] Did you apply for contribution permission? I cannot assign this 
ticket to you.

> Jacoco coverage report is not generated
> ---
>
> Key: HUDI-587
> URL: https://issues.apache.org/jira/browse/HUDI-587
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>   Original Estimate: 1h
>  Time Spent: 20m
>  Remaining Estimate: 40m
>
> When running tests, the jacoco coverage report is not generated. The jacoco 
> plugin is loaded and sets the correct Java Agent line, but it fails to find 
> the execution data file after tests complete.
> Example:
> mvn test -Dtest=TestHoodieActiveTimeline
> ...
> 22:42:40 [INFO] — jacoco-maven-plugin:0.7.8:prepare-agent (pre-unit-test) @ 
> hudi-common —
>  22:42:40 [INFO] *surefireArgLine set to 
> javaagent:/home/pwason/.m2/repository/org/jacoco/org.jacoco.agent/0.7.8/org.jacoco.agent-0.7.8-runtime.jar=destfile=/home/pwason/work/java/incubator-hudi/hudi-common/target/coverage-reports/jacocout.exec*
> *...*
> 22:42:49 [INFO] — jacoco-maven-plugin:0.7.8:report (post-unit-test) @ 
> hudi-common —
>  22:42:49 [INFO] *Skipping JaCoCo execution due to missing execution data 
> file.*
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-09 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055615#comment-17055615
 ] 

vinoyang commented on HUDI-662:
---

{quote}Apache Singa [https://github.com/apache/singa] follows same model, only 
does source releases
Apache Dubbo [http://dubbo.apache.org/en-us/blog/download.html] also only puts 
out source releases

Their LICENSE and NOTICE follow same principles as ours.
{quote}

In short, did we reference unsuitable projects? Should we instead reference 
projects that only release source distributions?

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.2
>
>
> [https://www.apache.org/legal/resolved.html] is the comprehensive guide here.
>  [http://www.apache.org/dev/licensing-howto.html] is the comprehensive guide 
> here.
> [http://www.apache.org/legal/src-headers.html] also 
>  
> Previously, we asked about some specific dependencies here
>  https://issues.apache.org/jira/browse/LEGAL-461



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-634) Document breaking changes for 0.5.2 release

2020-03-09 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055623#comment-17055623
 ] 

vinoyang commented on HUDI-634:
---

[~vinoth] I am browsing the commit history and Jira issues involved in 
v0.5.2. Also, where should we document the breaking changes? Is the release 
blog a suitable place? I am also preparing the release blog.

> Document breaking changes for 0.5.2 release
> ---
>
> Key: HUDI-634
> URL: https://issues.apache.org/jira/browse/HUDI-634
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Blocker
> Fix For: 0.5.2
>
>
> * Write Client restructuring has moved classes around (HUDI-554) 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-09 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055628#comment-17055628
 ] 

vinoyang commented on HUDI-688:
---

It would be very good if we could confirm that a recently graduated project 
also validates our approach.

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-09 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055639#comment-17055639
 ] 

vinoyang commented on HUDI-688:
---

[~vinoth] IMO, we can follow Druid's solution, since its graduation date is so 
recent. It should have been checked and verified before graduating.

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-634) Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-634:
--
Summary: Write release blog and document breaking changes for 0.5.2 release 
 (was: Document breaking changes for 0.5.2 release)

> Write release blog and document breaking changes for 0.5.2 release
> --
>
> Key: HUDI-634
> URL: https://issues.apache.org/jira/browse/HUDI-634
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Blocker
> Fix For: 0.5.2
>
>
> * Write Client restructuring has moved classes around (HUDI-554) 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-680) Update Jackson databind to 2.6.7.3

2020-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-680:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Update Jackson databind to 2.6.7.3
> --
>
> Key: HUDI-680
> URL: https://issues.apache.org/jira/browse/HUDI-680
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Aki Tanaka
>Assignee: Aki Tanaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I would like to update Jackson databind to 2.6.7.3, because this version is 
> the latest jackson-databind of the 2.6.7.x line and it has all CVE fixes up to 
> 2.9.10.
> https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.6.7.x



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-554) Restructure code/packages to move more code back into hudi-writer-common

2020-03-11 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-554:
--
Fix Version/s: (was: 0.6.0)
   0.5.2

> Restructure code/packages  to move more code back into hudi-writer-common
> -
>
> Key: HUDI-554
> URL: https://issues.apache.org/jira/browse/HUDI-554
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-634) Cut 0.5.2 documentation and write release note

2020-03-11 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-634:
--
Summary: Cut 0.5.2 documentation and write release note  (was: Write 
release blog and document breaking changes for 0.5.2 release)

> Cut 0.5.2 documentation and write release note
> --
>
> Key: HUDI-634
> URL: https://issues.apache.org/jira/browse/HUDI-634
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Write Client restructuring has moved classes around (HUDI-554) 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-11 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-688:
--
Status: In Progress  (was: Open)

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-617) Add support for data types convertible to String in TimestampBasedKeyGenerator

2020-03-14 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059267#comment-17059267
 ] 

vinoyang commented on HUDI-617:
---

[~amitds1997] If you want, [~vinoth] can give you permission.

> Add support for data types convertible to String in TimestampBasedKeyGenerator
> --
>
> Key: HUDI-617
> URL: https://issues.apache.org/jira/browse/HUDI-617
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Amit Singh
>Priority: Minor
>  Labels: easyfix, pull-request-available
> Fix For: 0.5.2
>
> Attachments: test_data.json, test_schema.avsc
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, TimestampBasedKeyGenerator only supports 4 data types for the 
> partition key. They are  Double, Long, Float and String. However, if the 
> `avro.java.string` is not specified in the schema provided, Hudi throws the 
> following error:
>  org.apache.hudi.exception.HoodieNotSupportedException: Unexpected type for 
> partition field: org.apache.avro.util.Utf8
>  at 
> org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator.getKey(TimestampBasedKeyGenerator.java:111)
>  at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$f92c188c$1(DeltaSync.java:338)
> 
>  It would be better if the support were generalised to include data 
> types that can be converted to String, such as `Utf8`, since 
> these types implement the `CharSequence` interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-480) Support a querying delete data methond in incremental view

2020-03-14 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059275#comment-17059275
 ] 

vinoyang commented on HUDI-480:
---

[~vinoth] IMO, supporting queries for deleted data in the incremental view is a key 
feature. Compared with MySQL binlog CDC for incremental processing, we had 
better expose deleted data for downstream use. In our company's data 
warehouse scenarios, we have a strong business requirement for this.

> Support a querying delete data methond in incremental view
> --
>
> Key: HUDI-480
> URL: https://issues.apache.org/jira/browse/HUDI-480
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Incremental Pull
>Reporter: cdmikechen
>Priority: Minor
>
> As we know, Hudi supports many methods for querying data in Spark, Hive, 
> and Presto. It also provides a very good timeline concept for tracing changes 
> in data, which can be used to query incremental data in the incremental view.
> Previously, we only had insert and update functions to upsert data; now 
> we have added new functions to delete existing data.
> *[HUDI-328] Adding delete api to HoodieWriteClient* 
> https://github.com/apache/incubator-hudi/pull/1004
> *[HUDI-377] Adding Delete() support to DeltaStreamer* 
> https://github.com/apache/incubator-hudi/pull/1073
> So I think that if we have a delete API, should we also add a method to get deleted 
> data in the incremental view?
> I've looked at the methods for generating new parquet files. I think the main 
> idea is to combine old and new data, and then filter out the data that needs to 
> be deleted, so that the deleted data does not exist in the new dataset. 
> However, this way the deleted data is not retained in the new 
> dataset, so only inserted or modified data can be found via 
> the existing timestamp field when tracing data in the incremental view.
> If we do this, I see two ideas to consider:
> 1. Trace the dataset in the same file at different checkpoints 
> according to the timeline, compare the two datasets by key, and 
> filter out the deleted data. This method adds no extra cost when writing, 
> but it needs to run the comparison for each actual request 
> at query time, which is expensive.
> 2. When writing data, record any deleted data in a file 
> named like *.delete_filename_version_timestamp*, so that we can immediately 
> answer queries by time. This requires additional processing 
> at write time.
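
The first idea above, comparing the keys present at two points on the timeline, can be sketched as a plain set difference. This is a minimal illustration under the assumption that the record keys of both snapshots fit in memory; the class and method names are hypothetical and none of this is an existing Hudi API.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of idea 1: find keys that disappeared between two snapshots. */
public class DeletedKeysSketch {

    // Returns the keys present at the earlier checkpoint but absent at the later one.
    // The inputs are left unmodified; the result is sorted for stable output.
    public static Set<String> deletedKeys(Set<String> keysAtEarlier, Set<String> keysAtLater) {
        Set<String> deleted = new TreeSet<>(keysAtEarlier);
        deleted.removeAll(keysAtLater);
        return deleted;
    }
}
```

This shows why the first idea is cheap at write time but costly at query time: every request has to read both key sets before it can compute the difference.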
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-710) Fixing failure in Staging Validation Script

2020-03-15 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-710:
-

Assignee: Balaji Varadarajan

> Fixing failure in Staging Validation Script
> ---
>
> Key: HUDI-710
> URL: https://issues.apache.org/jira/browse/HUDI-710
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The script was wrongly failing with the below message
> There were non-text files in source release. Please check below
>  ./docker/demo/data/batch_2.json: application/json; charset=us-ascii
> ./docker/demo/data/batch_1.json: application/json; charset=us-ascii
> ./hudi-common/src/test/resources/sample.data: application/json; 
> charset=us-ascii
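
The failure above comes from treating files such as JSON as non-text even though their reported charset is plain ASCII. One way to express a more lenient check is sketched below. It is an illustration only, not the fix that was committed: the class and method names are hypothetical, and the input is assumed to be a line in the `path: mime/type; charset=...` format shown in the message.

```java
import java.util.Locale;

/** Hypothetical sketch: decide from a MIME description line whether a file is text. */
public class MimeCheckSketch {

    // mimeLine is assumed to look like
    // "./path/file.json: application/json; charset=us-ascii".
    public static boolean isTextFile(String mimeLine) {
        int sep = mimeLine.lastIndexOf(": ");
        if (sep < 0) {
            return false; // unrecognized format: be conservative
        }
        String mime = mimeLine.substring(sep + 2).trim().toLowerCase(Locale.ROOT);
        if (mime.startsWith("text/")) {
            return true;
        }
        // Accept other types (for example application/json) when a
        // non-binary charset is reported.
        return mime.contains("charset=") && !mime.contains("charset=binary");
    }
}
```

With this rule, the three files listed in the message would be accepted, while an entry reported with `charset=binary` would still be flagged.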



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-710) Fixing failure in Staging Validation Script

2020-03-15 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-710:
--
Status: In Progress  (was: Open)

> Fixing failure in Staging Validation Script
> ---
>
> Key: HUDI-710
> URL: https://issues.apache.org/jira/browse/HUDI-710
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The script was wrongly failing with the below message
> There were non-text files in source release. Please check below
>  ./docker/demo/data/batch_2.json: application/json; charset=us-ascii
> ./docker/demo/data/batch_1.json: application/json; charset=us-ascii
> ./hudi-common/src/test/resources/sample.data: application/json; 
> charset=us-ascii



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-710) Fixing failure in Staging Validation Script

2020-03-15 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-710.
-
Fix Version/s: 0.6.0
   Resolution: Fixed

Fixed via master branch: 23afe7a4872fca66d9aeb36d209c6538a17d81f1

> Fixing failure in Staging Validation Script
> ---
>
> Key: HUDI-710
> URL: https://issues.apache.org/jira/browse/HUDI-710
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The script was wrongly failing with the below message
> There were non-text files in source release. Please check below
>  ./docker/demo/data/batch_2.json: application/json; charset=us-ascii
> ./docker/demo/data/batch_1.json: application/json; charset=us-ascii
> ./hudi-common/src/test/resources/sample.data: application/json; 
> charset=us-ascii



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-694) Add unit test for SparkEnvCommand

2020-03-15 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-694:
--
Status: Open  (was: New)

> Add unit test for SparkEnvCommand
> -
>
> Key: HUDI-694
> URL: https://issues.apache.org/jira/browse/HUDI-694
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: CLI, Testing
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add unit test for SparkEnvCommand in hudi-cli



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-694) Add unit test for SparkEnvCommand

2020-03-15 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-694.
-
Fix Version/s: 0.6.0
   Resolution: Implemented

Implemented via master branch: 55e6d348155f63eb128cd208687d02206bad66a5

> Add unit test for SparkEnvCommand
> -
>
> Key: HUDI-694
> URL: https://issues.apache.org/jira/browse/HUDI-694
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: CLI, Testing
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add unit test for SparkEnvCommand in hudi-cli



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-715) Fix duplicate name in TableCommand

2020-03-16 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-715.
-
Fix Version/s: 0.6.0
   Resolution: Fixed

Fixed via master branch: 3ef9e885cacc064fc316c61c7c826f3a1cb96da0

> Fix duplicate name in TableCommand
> --
>
> Key: HUDI-715
> URL: https://issues.apache.org/jira/browse/HUDI-715
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> connect command has duplicate key name maxCheckIntervalMs, fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-715) Fix duplicate name in TableCommand

2020-03-16 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-715:
--
Status: Open  (was: New)

> Fix duplicate name in TableCommand
> --
>
> Key: HUDI-715
> URL: https://issues.apache.org/jira/browse/HUDI-715
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> connect command has duplicate key name maxCheckIntervalMs, fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

