[jira] [Closed] (HUDI-358) Add Java-doc and importOrder checkstyle rule

2019-11-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-358.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 212282c8aaf623f451e3f72674ed4d3ed550602d

> Add Java-doc and importOrder checkstyle rule
> 
>
> Key: HUDI-358
> URL: https://issues.apache.org/jira/browse/HUDI-358
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Common Core
>Reporter: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 1. Add Java-doc and importOrder checkstyle rules.
> 2. Keep severity at info level until the issue is finished.
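The rule additions described above might be configured along these lines (a sketch using standard Checkstyle module names, ImportOrder and JavadocType; the exact rules and properties in the Hudi change may differ), with severity kept at the info level:

```xml
<module name="Checker">
  <module name="TreeWalker">
    <!-- import ordering; groups/ordering properties would be project-specific -->
    <module name="ImportOrder">
      <property name="severity" value="info"/>
    </module>
    <!-- require Javadoc on classes and interfaces -->
    <module name="JavadocType">
      <property name="severity" value="info"/>
    </module>
  </module>
</module>
```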



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-359) Add hudi-env for hudi-cli module

2019-11-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-359.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: a7e07cd910425b5cfe9886677e780bfb2ae96c52

> Add hudi-env for hudi-cli module
> 
>
> Key: HUDI-359
> URL: https://issues.apache.org/jira/browse/HUDI-359
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add hudi-env.sh for hudi-cli module to set running environments.





[jira] [Closed] (HUDI-362) Adds a check for the existence of field

2019-11-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-362.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 44823041a37601fed8163502272a8fcb7a5be45d

> Adds a check for the existence of field
> ---
>
> Key: HUDI-362
> URL: https://issues.apache.org/jira/browse/HUDI-362
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
> Attachments: image-2019-11-25-15-32-14-057.png, 
> image-2019-11-25-15-33-21-610.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Use command
> {code:java}
> commits show --sortBy "Total Bytes Written" --desc true --limit 10{code}
> when the sortBy field is not in the columns, it throws 
> !image-2019-11-25-15-32-14-057.png!
> It would be better to give a friendly hint such as: !image-2019-11-25-15-33-21-610.png!
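A guard like the following (a sketch with hypothetical names, not the actual CLI code) would turn the stack trace into the friendly hint suggested:

```java
import java.util.Set;

public class SortFieldCheck {
    // Validate the --sortBy field against the table's columns before sorting,
    // so the user gets an actionable message instead of an opaque exception.
    static void checkSortField(Set<String> columns, String sortBy) {
        if (!columns.contains(sortBy)) {
            throw new IllegalArgumentException(
                "Field [" + sortBy + "] is not a valid column; available columns: " + columns);
        }
    }
}
```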





[jira] [Commented] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-11-28 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984368#comment-16984368
 ] 

leesf commented on HUDI-288:


Thanks for sharing. It looks very comprehensive. 
I have some thoughts. Regarding point 6, the target path was designed to be 
_//_; as 
discussed above with vinoth, is it reasonable to use _ 
`/`_ ? Regarding point 7, would we get rid of 
oozie, as introducing it to hudi might not be very reasonable? And are there any 
other considerations for not supporting continuous mode currently? Also, the wrapper 
seems to be able to replace the current DeltaStreamer? 


> Add support for ingesting multiple kafka streams in a single DeltaStreamer 
> deployment
> -
>
> Key: HUDI-288
> URL: https://issues.apache.org/jira/browse/HUDI-288
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@
>  has all the context





[jira] [Closed] (HUDI-209) Implement JMX metrics reporter

2019-11-28 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-209.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 0b52ae3ac2685c5afa7821d663854b526b5a1cff

> Implement JMX metrics reporter
> --
>
> Key: HUDI-209
> URL: https://issues.apache.org/jira/browse/HUDI-209
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, there are only two reporters, {{MetricsGraphiteReporter}} and 
> {{InMemoryMetricsReporter}}, and {{InMemoryMetricsReporter}} is used only for 
> testing, so we actually have just one metrics reporter. Since JMX is a standard 
> for monitoring on the JVM platform, I propose to provide a JMX metrics reporter. 
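As a rough illustration of the standard-MBean pattern such a reporter would build on (the names here are hypothetical, not the reporter's actual API), a metric can be exposed to any JMX client by registering it with the platform MBean server:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxMetricsSketch {
    // Standard-MBean convention: the interface name is the class name + "MBean".
    public interface CommitCounterMBean {
        long getCommits();
    }

    public static class CommitCounter implements CommitCounterMBean {
        private long commits;
        public long getCommits() { return commits; }
        public void increment() { commits++; }
    }

    // Register the counter with the platform MBean server so any JMX client
    // (e.g. jconsole) can read its "Commits" attribute.
    public static CommitCounter register(String name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        CommitCounter counter = new CommitCounter();
        server.registerMBean(counter, new ObjectName(name));
        return counter;
    }
}
```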





[jira] [Closed] (HUDI-277) Translate Documentation -> Performance page

2019-11-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-277.
--
Resolution: Fixed

Fixed via asf-site: 747a1d4e21dd7900085b8cc0f695daa147727241

> Translate Documentation -> Performance page
> ---
>
> Key: HUDI-277
> URL: https://issues.apache.org/jira/browse/HUDI-277
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Translate this page into Chinese:
>  
> [http://hudi.apache.org/performance.html]





[jira] [Closed] (HUDI-255) Translate Talks & Powered By page

2019-09-22 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-255.
--
Resolution: Fixed

Fixed via asf-site: 8cfe93700bba8cb3025babe9182e9ad63a7e1035

> Translate Talks & Powered By page
> -
>
> Key: HUDI-255
> URL: https://issues.apache.org/jira/browse/HUDI-255
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The online HTML web page: [https://hudi.apache.org/powered_by.html]





[jira] [Commented] (HUDI-232) Implement sealing/unsealing for HoodieRecord class

2019-09-25 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937477#comment-16937477
 ] 

leesf commented on HUDI-232:


How about adding seal and unseal methods to HoodieRecord? An error would be 
thrown if HoodieRecord is modified after being sealed, and modification would be 
allowed again after it is unsealed. 
cc [~vinoth]
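A minimal sketch of that proposal (illustrative names, not Hudi's actual API):

```java
public class SealableRecord {
    private boolean sealed = false;
    private String currentLocation;

    public void seal() { sealed = true; }
    public void unseal() { sealed = false; }

    // Mutation is rejected while the record is sealed.
    public void setCurrentLocation(String location) {
        if (sealed) {
            throw new UnsupportedOperationException("Attempt to modify a sealed record");
        }
        this.currentLocation = location;
    }

    public String getCurrentLocation() { return currentLocation; }
}
```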

> Implement sealing/unsealing for HoodieRecord class
> --
>
> Key: HUDI-232
> URL: https://issues.apache.org/jira/browse/HUDI-232
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Write Client
>Affects Versions: 0.5.0
>Reporter: Vinoth Chandar
>Priority: Major
>
> The HoodieRecord class is sometimes modified to set the record location. We can 
> get into issues like HUDI-170 if the modification is misplaced. We need a 
> mechanism to seal the class and unseal it for modification explicitly. Trying to 
> modify it in the sealed state should throw an error.





[jira] [Created] (HUDI-278) Translate Administering page

2019-09-24 Thread leesf (Jira)
leesf created HUDI-278:
--

 Summary: Translate Administering page
 Key: HUDI-278
 URL: https://issues.apache.org/jira/browse/HUDI-278
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: docs-chinese
Reporter: leesf
Assignee: leesf
 Fix For: 0.5.1


The online HTML web page: [http://hudi.apache.org/admin_guide.html]





[jira] [Assigned] (HUDI-296) Explore use of spotless to auto fix formatting errors

2019-10-08 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-296:
--

Assignee: leesf

> Explore use of spotless to auto fix formatting errors 
> --
>
> Key: HUDI-296
> URL: https://issues.apache.org/jira/browse/HUDI-296
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> https://github.com/diffplug/spotless/tree/master/plugin-maven





[jira] [Commented] (HUDI-292) Consume more entries from kafka than specified sourceLimit.

2019-10-07 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946383#comment-16946383
 ] 

leesf commented on HUDI-292:


Using `long toOffset = Math.min(toOffsetMax, range.untilOffset() + 
eventsPerPartition);` to compute the offset works well, but we should handle the 
case in which remainingEvents is less than `toOffset - range.untilOffset()`. 
Also, consuming a few more entries from some partitions may not matter much, 
but we had better fix it. I would like to open a PR to fix 
it. CC [~vinoth]

> Consume more entries from kafka than specified sourceLimit.
> ---
>
> Key: HUDI-292
> URL: https://issues.apache.org/jira/browse/HUDI-292
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> When using _CheckpointUtils#computeOffsetRanges_ for consuming Kafka messages, 
> given 
> topic = "test",
> fromOffsets(partition -> offset pair) = (0 -> 0), (1 -> 0), (2 -> 0), (3 -> 
> 0), (4 -> 0),
> toOffsets = (0, 100), (1, 1000), (2, 1000), (3, 1000), (4, 1000),
> numEvents = 1001.
> the output of _CheckpointUtils#computeOffsetRanges_ is:
> OffsetRange(topic: 'test', partition: 0, range: [0 -> 100])
> OffsetRange(topic: 'test', partition: 1, range: [0 -> 226])
> OffsetRange(topic: 'test', partition: 2, range: [0 -> 226])
> OffsetRange(topic: 'test', partition: 3, range: [0 -> 226])
> OffsetRange(topic: 'test', partition: 4, range: [0 -> 226])
> Total count is 1004 (100 + 226 * 4), more than 1001, and thus it consumes more 
> entries from Kafka than the specified 1001.
> CC [~vinoth]





[jira] [Commented] (HUDI-295) Do one-time cleanup of Hudi git history

2019-10-07 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946399#comment-16946399
 ] 

leesf commented on HUDI-295:


In order to clean up git history, it seems that we need to rebase and force push 
the master branch. Do others (contributors) have write access to the master 
branch? If not, I think only committers and PMC members, who have access to the 
master branch, could take the ticket and help clean up the git history.

> Do one-time cleanup of Hudi git history
> ---
>
> Key: HUDI-295
> URL: https://issues.apache.org/jira/browse/HUDI-295
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://lists.apache.org/thread.html/dc6eb516e248088dac1a2b5c9690383dfe2eb3912f76bbe9dd763c2b@





[jira] [Updated] (HUDI-300) Explore use of spotbugs to find bugs

2019-10-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-300:
---
Description: 
https://spotbugs.github.io/
https://github.com/apache/incubator-hudi/pull/945

  was:https://spotbugs.github.io/


> Explore use of spotbugs to find bugs
> 
>
> Key: HUDI-300
> URL: https://issues.apache.org/jira/browse/HUDI-300
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>    Reporter: leesf
>Assignee: leesf
>Priority: Major
>
> https://spotbugs.github.io/
> https://github.com/apache/incubator-hudi/pull/945





[jira] [Closed] (HUDI-265) Failed to delete tmp dirs created in unit tests

2019-10-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-265.
--
Resolution: Fixed

Fixed via master: 3dedc7e5fdd5f885915e81e47e110b845a905dbf

> Failed to delete tmp dirs created in unit tests
> ---
>
> Key: HUDI-265
> URL: https://issues.apache.org/jira/browse/HUDI-265
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Testing
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Some unit tests, such as TestHoodieSnapshotCopier and TestUpdateMapFunction, 
> fail to delete the tmp dir created in _init (with the before annotation)_ 
> after clean (with the after annotation), which will leave too many folders 
> in /tmp. We need to delete these dirs after the UTs finish.
> I will go through all the unit tests that did not properly delete the tmp dir 
> and send a patch.
>  
> cc [~vinoth] [~vbalaji]
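The cleanup the issue asks for can be sketched as follows (plain java.nio, independent of the actual test classes): delete the tree bottom-up so files are removed before their parent directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class TmpDirCleanup {
    // Mirrors an @Before-style init: create a fresh tmp dir per test.
    static Path init() throws IOException {
        return Files.createTempDirectory("hoodie-ut-");
    }

    // Mirrors an @After-style clean: walk depth-first (children before
    // parents) and delete everything, so the dir does not linger in /tmp.
    static void clean(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }
}
```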





[jira] [Created] (HUDI-292) Consume more entries from kafka than specified sourceLimit.

2019-10-04 Thread leesf (Jira)
leesf created HUDI-292:
--

 Summary: Consume more entries from kafka than specified 
sourceLimit.
 Key: HUDI-292
 URL: https://issues.apache.org/jira/browse/HUDI-292
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Utilities
Reporter: leesf
Assignee: leesf
 Fix For: 0.5.1


When using _CheckpointUtils#computeOffsetRanges_ for consuming Kafka messages:

Given 
topic = "test",
fromOffsets(partition -> offset pair) = (0 -> 0), (1 -> 0), (2 -> 0), (3 -> 0), 
(4 -> 0),
toOffsets = (0, 100), (1, 1000), (2, 1000), (3, 1000), (4, 1000),
numEvents = 1001.

The output of _CheckpointUtils#computeOffsetRanges_ is:

OffsetRange(topic: 'test', partition: 0, range: [0 -> 100])
OffsetRange(topic: 'test', partition: 1, range: [0 -> 226])
OffsetRange(topic: 'test', partition: 2, range: [0 -> 226])
OffsetRange(topic: 'test', partition: 3, range: [0 -> 226])
OffsetRange(topic: 'test', partition: 4, range: [0 -> 226])

Total count is 1004 (100 + 226 * 4), more than 1001, and thus it consumes more 
entries from Kafka than the specified 1001.

CC [~vinoth]
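The overshoot can be reproduced with a simplified model of the allocation (a sketch, not the actual _computeOffsetRanges_ code): an even first split, then the leftover redistributed with a ceiling division over the partitions that still have headroom.

```java
public class OffsetRangeSketch {
    static long[] computeEndOffsets(long[] from, long[] to, long numEvents) {
        int n = from.length;
        long[] end = new long[n];
        long per = numEvents / n;                 // 1001 / 5 = 200
        long allocated = 0;
        for (int i = 0; i < n; i++) {
            end[i] = Math.min(to[i], from[i] + per);
            allocated += end[i] - from[i];        // 100 + 4 * 200 = 900
        }
        long remaining = numEvents - allocated;   // 101 still to hand out
        int open = 0;
        for (int i = 0; i < n; i++) {
            if (end[i] < to[i]) open++;           // 4 partitions with headroom
        }
        if (remaining > 0 && open > 0) {
            long extra = (remaining + open - 1) / open; // ceil(101 / 4) = 26
            for (int i = 0; i < n; i++) {
                if (end[i] < to[i]) end[i] = Math.min(to[i], end[i] + extra); // 200 + 26 = 226
            }
        }
        return end;
    }
}
```

Summing the resulting ranges gives 1004 events, three more than the requested 1001, because the ceiling division hands each open partition 26 extra events instead of distributing exactly 101.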





[jira] [Closed] (HUDI-285) Implement HoodieStorageWriter based on actual file type

2019-10-04 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-285.
--
Resolution: Fixed

Fixed via master: 7dd9c74b1bb28c3a934e46d560abbb4c5b6d4586

> Implement HoodieStorageWriter based on actual file type
> ---
>
> Key: HUDI-285
> URL: https://issues.apache.org/jira/browse/HUDI-285
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Write Client
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the _getStorageWriter_ method in HoodieStorageWriterFactory is 
> hard-coded to return HoodieParquetWriter, since only parquet is currently 
> supported for HoodieStorageWriter. However, it would be better to choose the 
> HoodieStorageWriter based on the actual file type, for extensibility.
> cc [~vinoth] [~vbalaji]
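One way to remove the hard-coding (a sketch with stand-in types, not the actual factory) is to dispatch on the file extension:

```java
public class StorageWriterFactory {
    interface StorageWriter {
        String format();
    }

    static class ParquetWriterStub implements StorageWriter {
        public String format() { return "parquet"; }
    }

    // Choose the writer from the actual file type instead of always
    // returning the parquet writer; new formats only add a branch here.
    static StorageWriter getStorageWriter(String path) {
        if (path.endsWith(".parquet")) {
            return new ParquetWriterStub();
        }
        throw new UnsupportedOperationException("No storage writer for file: " + path);
    }
}
```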





[jira] [Commented] (HUDI-290) Normalize Test class name of HoodieWriteConfigTest

2019-10-03 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943558#comment-16943558
 ] 

leesf commented on HUDI-290:


+1 for renaming to TestHoodieWriteConfig, and I see many UT names already start 
with Test... Also please check for other UT names in the project that do not 
start with Test. Thanks.

> Normalize Test class name of HoodieWriteConfigTest
> --
>
> Key: HUDI-290
> URL: https://issues.apache.org/jira/browse/HUDI-290
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> In general, a test case name starts with {{Test}}. It would be better to 
> rename {{HoodieWriteConfigTest}} to {{TestHoodieWriteConfig}}.





[jira] [Assigned] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-10-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-288:
--

Assignee: leesf

> Add support for ingesting multiple kafka streams in a single DeltaStreamer 
> deployment
> -
>
> Key: HUDI-288
> URL: https://issues.apache.org/jira/browse/HUDI-288
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@
>  has all the context





[jira] [Comment Edited] (HUDI-290) Normalize Test class name of HoodieWriteConfigTest

2019-10-03 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943558#comment-16943558
 ] 

leesf edited comment on HUDI-290 at 10/3/19 2:53 PM:
-

+1 for renaming to TestHoodieWriteConfig, and I see many UT names already start 
with Test... Also please check for other UT names in the project that do not 
start with Test. Thanks.


was (Author: xleesf):
+1 rename to TestHoodieWriteConfig, and i see many UTs start already start with 
Test... Also please check other UTs name not started with Test in the project.  
Thanks.

> Normalize Test class name of HoodieWriteConfigTest
> --
>
> Key: HUDI-290
> URL: https://issues.apache.org/jira/browse/HUDI-290
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> In general, a test case name starts with {{Test}}. It would be better to 
> rename {{HoodieWriteConfigTest}} to {{TestHoodieWriteConfig}}.





[jira] [Closed] (HUDI-292) Consume more entries from kafka than specified sourceLimit.

2019-10-11 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-292.
--
Resolution: Fixed

Fixed via master: e10e06918e4758917513c55f9bc02c35dad99128

> Consume more entries from kafka than specified sourceLimit.
> ---
>
> Key: HUDI-292
> URL: https://issues.apache.org/jira/browse/HUDI-292
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using _CheckpointUtils#computeOffsetRanges_ for consuming Kafka messages, 
> given 
> topic = "test",
> fromOffsets(partition -> offset pair) = (0 -> 0), (1 -> 0), (2 -> 0), (3 -> 
> 0), (4 -> 0),
> toOffsets = (0, 100), (1, 1000), (2, 1000), (3, 1000), (4, 1000),
> numEvents = 1001.
> the output of _CheckpointUtils#computeOffsetRanges_ is:
> OffsetRange(topic: 'test', partition: 0, range: [0 -> 100])
> OffsetRange(topic: 'test', partition: 1, range: [0 -> 226])
> OffsetRange(topic: 'test', partition: 2, range: [0 -> 226])
> OffsetRange(topic: 'test', partition: 3, range: [0 -> 226])
> OffsetRange(topic: 'test', partition: 4, range: [0 -> 226])
> Total count is 1004 (100 + 226 * 4), more than 1001, and thus it consumes more 
> entries from Kafka than the specified 1001.
> CC [~vinoth]





[jira] [Created] (HUDI-437) Support user-defined index

2019-12-17 Thread leesf (Jira)
leesf created HUDI-437:
--

 Summary: Support user-defined index
 Key: HUDI-437
 URL: https://issues.apache.org/jira/browse/HUDI-437
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: leesf
Assignee: leesf
 Fix For: 0.5.2


Currently, Hudi does not support user-defined indexes, and will throw an exception 
if any index type other than HBASE/INMEMORY/BLOOM/GLOBAL_BLOOM is configured.





[jira] [Closed] (HUDI-386) Refactor hudi scala checkstyle rules

2019-12-21 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-386.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: b284091783af44341f20af11825ea9b6e3ba23da

> Refactor hudi scala checkstyle rules
> 
>
> Key: HUDI-386
> URL: https://issues.apache.org/jira/browse/HUDI-386
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Refactor hudi scala checkstyle rules





[jira] [Updated] (HUDI-415) HoodieSparkSqlWriter Commit time not representing the Spark job starting time

2019-12-20 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-415:
---
Fix Version/s: 0.5.1

> HoodieSparkSqlWriter Commit time not representing the Spark job starting time
> -
>
> Key: HUDI-415
> URL: https://issues.apache.org/jira/browse/HUDI-415
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hudi records the commit time after the first action completes. If there is a 
> heavy transformation before isEmpty(), then the commit time could be 
> inaccurate.
> {code:java}
> if (hoodieRecords.isEmpty()) { 
> log.info("new batch has no new records, skipping...") 
> return (true, common.util.Option.empty()) 
> } 
> commitTime = client.startCommit() 
> writeStatuses = DataSourceUtils.doWriteOperation(client, hoodieRecords, 
> commitTime, operation)
> {code}
> For example, I start the Spark job at 20190101, but *isEmpty()* ran for 2 
> hours, so the commit time in the .hoodie folder will be 201901010*2*00. If 
> I use the commit time to ingest data starting from 201901010200 (from HDFS, 
> not using deltastreamer), then I will miss 2 hours of data.
> Is this setup intended? Can we move the commit time before isEmpty()?





[jira] [Assigned] (HUDI-93) Enforce semantics on HoodieRecordPayload to allow for a consistent instantiation of custom payloads via reflection

2019-12-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-93:
-

Assignee: leesf

> Enforce semantics on HoodieRecordPayload to allow for a consistent 
> instantiation of custom payloads via reflection
> --
>
> Key: HUDI-93
> URL: https://issues.apache.org/jira/browse/HUDI-93
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: Nishith Agarwal
>Assignee: leesf
>Priority: Major
>
> At the moment, the expectation is that any implementation of 
> HoodieRecordPayload needs to have a constructor with Optional. 
> But this is not enforced in the HoodieRecordPayload interface. We need a 
> mechanism to enforce this semantic consistently.
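The enforcement could look roughly like this (an illustrative sketch using a plain Optional-taking constructor; the real payload constructor signature may differ): fail fast at load time if the class lacks the expected constructor.

```java
import java.lang.reflect.Constructor;
import java.util.Optional;

public class PayloadLoader {
    // Hypothetical payload used to demonstrate the expected constructor shape.
    public static class SamplePayload {
        final Optional<String> value;
        public SamplePayload(Optional<String> value) { this.value = value; }
    }

    // Instantiate a payload class via reflection, rejecting classes that
    // do not expose the constructor the contract requires.
    static Object instantiate(String className, Optional<String> value) throws Exception {
        Class<?> clazz = Class.forName(className);
        Constructor<?> ctor;
        try {
            ctor = clazz.getConstructor(Optional.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(
                className + " must provide a public constructor taking an Optional", e);
        }
        return ctor.newInstance(value);
    }
}
```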





[jira] [Assigned] (HUDI-248) CLI doesn't allow rolling back a Delta commit

2019-12-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-248:
--

Assignee: leesf

> CLI doesn't allow rolling back a Delta commit
> -
>
> Key: HUDI-248
> URL: https://issues.apache.org/jira/browse/HUDI-248
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI, Usability
>Reporter: Rahul Bhartia
>Assignee: leesf
>Priority: Minor
>  Labels: aws-emr
> Fix For: 0.5.1
>
>
> [https://github.com/apache/incubator-hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L128]
>  
> When trying to find a match for the passed-in commit value, the "commit rollback" 
> command always defaults to using HoodieTimeline.COMMIT_ACTION - and hence 
> doesn't allow rolling back delta commits.
> Note: Delta commits can be rolled back using a HoodieWriteClient, so it seems 
> like it's just a matter of matching against both COMMIT_ACTION and 
> DELTA_COMMIT_ACTION in the CLI.
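The suggested match could be sketched like this (the string constants mirror the values of HoodieTimeline's COMMIT_ACTION and DELTA_COMMIT_ACTION; the actual CLI change may differ):

```java
public class RollbackActionCheck {
    static final String COMMIT_ACTION = "commit";
    static final String DELTA_COMMIT_ACTION = "deltacommit";

    // Accept both commit types when matching the instant to roll back,
    // instead of defaulting to COMMIT_ACTION only.
    static boolean canRollback(String action) {
        return COMMIT_ACTION.equals(action) || DELTA_COMMIT_ACTION.equals(action);
    }
}
```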





[jira] [Closed] (HUDI-211) Maintain Chinese docs for Hudi

2019-12-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-211.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

> Maintain Chinese docs for Hudi
> --
>
> Key: HUDI-211
> URL: https://issues.apache.org/jira/browse/HUDI-211
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: docs-chinese
>Reporter: vinoyang
>Priority: Major
> Fix For: 0.5.1
>
>
> All the translation of docs should be held under this umbrella issue. The 
> best practice would be *one doc page one subtask*. Before releasing a new 
> version, we will align with the English docs.
> The doc and website are held in {{asf-site}} branch, more details please see: 
> [https://hudi.apache.org/contributing.html#website]
> The Chinese docs are supported by the jekyll-multiple-languages plugin. For more 
> details about this plugin, please see: [http://jekyll-langs-sample.liaohuqiu.net/]
> Generally speaking, two basic steps:
>  * create a subtask issue of this umbrella issue for the page you want to 
> translate;
>  * copy the English markdown page and rename it to {{*.cn.md}} then translate 
> it to Chinese





[jira] [Closed] (HUDI-333) Improve page navigation using TOC

2019-12-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-333.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via asf-site: 1fd0439a84c65ae12e025a64b6e0a0087aa7295e

> Improve page navigation using TOC
> -
>
> Key: HUDI-333
> URL: https://issues.apache.org/jira/browse/HUDI-333
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Bhavani Sudha Saktheeswaran
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add Table of Contents to all pages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-380) Update IDE set up documentation for IDE related errors

2019-12-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-380.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via asf-site: 9e30add249bdadb6b94cf0ff0090c4eaac625d68

> Update IDE set up documentation for IDE related errors
> --
>
> Key: HUDI-380
> URL: https://issues.apache.org/jira/browse/HUDI-380
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs, newbie, Usability
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The errors are generally caused by jetty version conflicts. 
>  
> Sample issues -> 
> [https://github.com/apache/incubator-hudi/issues/894]
>  
>  





[jira] [Assigned] (HUDI-67) Tool to convert sequence file based archived commits to log format #224

2019-12-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-67:
-

Assignee: leesf

> Tool to convert sequence file based archived commits to log format #224
> ---
>
> Key: HUDI-67
> URL: https://issues.apache.org/jira/browse/HUDI-67
> Project: Apache Hudi (incubating)
>  Issue Type: Wish
>  Components: CLI, Write Client
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> https://github.com/uber/hudi/issues/224





[jira] [Closed] (HUDI-213) Add gem dependencies installation step for building doc description

2019-12-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-213.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via asf-site: 09525c1fb4e066e981047957e85bbefbd3b3ae91

> Add gem dependencies installation step for building doc description
> ---
>
> Key: HUDI-213
> URL: https://issues.apache.org/jira/browse/HUDI-213
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: vinoyang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the asf-site branch, following the doc-building steps 
> [here|https://github.com/apache/incubator-hudi/tree/asf-site/docs#host-os] 
> under the "docs" folder, if we just invoke this command:
> {code:java}
> bundle exec jekyll serve
> {code}
> We will get an error:
> {code:java}
> Could not find concurrent-ruby-1.1.4 in any of the sources
> Run `bundle install` to install missing gems.
> {code}
> The reason is that the gem dependencies have not been installed.
>  
>  





[jira] [Assigned] (HUDI-427) Implement CLI support for performing bootstrap

2019-12-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-427:
--

Assignee: leesf

> Implement CLI support for performing bootstrap
> --
>
> Key: HUDI-427
> URL: https://issues.apache.org/jira/browse/HUDI-427
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> Need CLI to perform bootstrap as described in 
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi]





[jira] [Closed] (HUDI-416) Improve hint information for Cli

2019-12-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-416.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 8affdf8bcbb4c7b236283e97c3afad186d5b6a3e

> Improve hint information for Cli
> 
>
> Key: HUDI-416
> URL: https://issues.apache.org/jira/browse/HUDI-416
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: hong dongdong
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now, the CLI always gives this error message: 
> {code:java}
> Command 'desc' was found but is not currently available (type 'help' then 
> ENTER to learn about this command)
> {code}
> but it is confusing to users. We could give a clearer hint, such as:
> {code:java}
> Command failed java.lang.NullPointerException: There is no hudi dataset. 
> Please use connect command to set dataset first
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-585) Optimize the steps of building with scala-2.12

2020-02-09 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-585.

Fix Version/s: 0.5.2
   Resolution: Fixed

Fixed via master: 425e3e6c78b9be00fc3fecfc335c94e05a1c70e5

> Optimize the steps of building with scala-2.12 
> ---
>
> Key: HUDI-585
> URL: https://issues.apache.org/jira/browse/HUDI-585
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Optimize the steps of building with scala-2.12.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-596) KafkaConsumer need to be closed

2020-02-09 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-596.

Fix Version/s: 0.5.2
   Resolution: Fixed

Fixed via master: 347e297ac19ed55172e84e13075e19ce060954c6

> KafkaConsumer need to be closed
> ---
>
> Key: HUDI-596
> URL: https://issues.apache.org/jira/browse/HUDI-596
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Utilities
>Reporter: dengziming
>Assignee: dengziming
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> `offsetGen.getNextOffsetRanges` is called periodically in the DeltaStreamer 
> application, and it creates a `new KafkaConsumer(kafkaParams)` without closing 
> it, so an exception is thrown after a while.
> ```
> java.net.SocketException: Too many open files
>   at sun.nio.ch.Net.socket0(Native Method)
>   at sun.nio.ch.Net.socket(Net.java:411)
>   at sun.nio.ch.Net.socket(Net.java:404)
>   at sun.nio.ch.SocketChannelImpl.(SocketChannelImpl.java:105)
>   at 
> sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
>   at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
>   at org.apache.kafka.common.network.Selector.connect(Selector.java:211)
>   at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:864)
>   at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:485)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
>   at 
> org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1774)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1742)
>   at 
> org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen.getNextOffsetRanges(KafkaOffsetGen.java:177)
>   at 
> org.apache.hudi.utilities.sources.JsonKafkaSource.fetchNewData(JsonKafkaSource.java:56)
>   at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:73)
>   at 
> org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:107)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:288)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
> ```
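
The fix pattern for the leak above is to close the consumer after each periodic fetch; in Java, try-with-resources guarantees the close even when an exception is thrown. A minimal, self-contained sketch — `OffsetClient` and the counter are hypothetical stand-ins for `KafkaConsumer` (which likewise implements `AutoCloseable`) and its underlying socket:

```java
// Sketch of the resource leak and its fix. OffsetClient is a hypothetical
// stand-in for KafkaConsumer; openSockets stands in for OS file handles.
public class ConsumerCloseSketch {
    static int openSockets = 0;

    static class OffsetClient implements AutoCloseable {
        OffsetClient() { openSockets++; }            // e.g. opens a socket
        long latestOffset() { return 42L; }          // e.g. partitionsFor(...)
        @Override public void close() { openSockets--; }
    }

    // Leaky version: each periodic call leaks one socket, eventually
    // hitting "Too many open files".
    static long getNextOffsetLeaky() {
        return new OffsetClient().latestOffset();
    }

    // Fixed version: try-with-resources closes the client on every path.
    static long getNextOffsetFixed() {
        try (OffsetClient client = new OffsetClient()) {
            return client.latestOffset();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) getNextOffsetLeaky();
        System.out.println("leaked=" + openSockets);   // prints leaked=100
        openSockets = 0;
        for (int i = 0; i < 100; i++) getNextOffsetFixed();
        System.out.println("leaked=" + openSockets);   // prints leaked=0
    }
}
```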



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-596) KafkaConsumer need to be closed

2020-02-09 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-596:
---
Status: Open  (was: New)

> KafkaConsumer need to be closed
> ---
>
> Key: HUDI-596
> URL: https://issues.apache.org/jira/browse/HUDI-596
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Utilities
>Reporter: dengziming
>Assignee: dengziming
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> `offsetGen.getNextOffsetRanges` is called periodically in the DeltaStreamer 
> application, and it creates a `new KafkaConsumer(kafkaParams)` without closing 
> it, so an exception is thrown after a while.
> ```
> java.net.SocketException: Too many open files
>   at sun.nio.ch.Net.socket0(Native Method)
>   at sun.nio.ch.Net.socket(Net.java:411)
>   at sun.nio.ch.Net.socket(Net.java:404)
>   at sun.nio.ch.SocketChannelImpl.(SocketChannelImpl.java:105)
>   at 
> sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
>   at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
>   at org.apache.kafka.common.network.Selector.connect(Selector.java:211)
>   at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:864)
>   at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:485)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
>   at 
> org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1774)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1742)
>   at 
> org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen.getNextOffsetRanges(KafkaOffsetGen.java:177)
>   at 
> org.apache.hudi.utilities.sources.JsonKafkaSource.fetchNewData(JsonKafkaSource.java:56)
>   at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:73)
>   at 
> org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:107)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:288)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-617) Add support for data types convertible to String in TimestampBasedKeyGenerator

2020-02-23 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-617.

Fix Version/s: 0.5.2
   Resolution: Fixed

Fixed via master: c2b08cdfc9b762801a63fee988f1c24cc17df4ce

> Add support for data types convertible to String in TimestampBasedKeyGenerator
> --
>
> Key: HUDI-617
> URL: https://issues.apache.org/jira/browse/HUDI-617
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Amit Singh
>Priority: Minor
>  Labels: easyfix, pull-request-available
> Fix For: 0.5.2
>
> Attachments: test_data.json, test_schema.avsc
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, TimestampBasedKeyGenerator only supports 4 data types for the 
> partition key. They are  Double, Long, Float and String. However, if the 
> `avro.java.string` is not specified in the schema provided, Hudi throws the 
> following error:
>  org.apache.hudi.exception.HoodieNotSupportedException: Unexpected type for 
> partition field: org.apache.avro.util.Utf8
>  at 
> org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator.getKey(TimestampBasedKeyGenerator.java:111)
>  at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$f92c188c$1(DeltaSync.java:338)
> 
>  It would be better if the support were generalised to include data types 
> that can be converted to String, such as `Utf8`, since all these types 
> implement the `CharSequence` interface.
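
One way to generalise the type handling as proposed — a sketch only, with illustrative names, not Hudi's actual implementation — is to check the numeric types first and then accept any `CharSequence`, which covers both `String` and Avro's `Utf8`:

```java
// Sketch of generalised partition-field handling: any CharSequence
// (String, org.apache.avro.util.Utf8, StringBuilder, ...) is accepted,
// mirroring the four types the issue lists. Names are illustrative.
public class PartitionFieldSketch {
    static String toPartitionString(Object partitionVal) {
        if (partitionVal instanceof Double || partitionVal instanceof Float
                || partitionVal instanceof Long) {
            return String.valueOf(partitionVal);
        } else if (partitionVal instanceof CharSequence) {
            // Covers String as well as Utf8, which implements CharSequence,
            // so no avro.java.string hint is needed in the schema.
            return partitionVal.toString();
        }
        throw new IllegalArgumentException(
            "Unexpected type for partition field: " + partitionVal.getClass());
    }

    public static void main(String[] args) {
        // A StringBuilder stands in for Utf8 here: both are CharSequences.
        System.out.println(toPartitionString(new StringBuilder("1574297893839")));
        System.out.println(toPartitionString(1574297893839L));
    }
}
```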



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-617) Add support for data types convertible to String in TimestampBasedKeyGenerator

2020-02-23 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-617:
---
Status: Open  (was: New)

> Add support for data types convertible to String in TimestampBasedKeyGenerator
> --
>
> Key: HUDI-617
> URL: https://issues.apache.org/jira/browse/HUDI-617
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Amit Singh
>Priority: Minor
>  Labels: easyfix, pull-request-available
> Attachments: test_data.json, test_schema.avsc
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, TimestampBasedKeyGenerator only supports 4 data types for the 
> partition key. They are  Double, Long, Float and String. However, if the 
> `avro.java.string` is not specified in the schema provided, Hudi throws the 
> following error:
>  org.apache.hudi.exception.HoodieNotSupportedException: Unexpected type for 
> partition field: org.apache.avro.util.Utf8
>  at 
> org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator.getKey(TimestampBasedKeyGenerator.java:111)
>  at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$f92c188c$1(DeltaSync.java:338)
> 
>  It would be better if the support were generalised to include data types 
> that can be converted to String, such as `Utf8`, since all these types 
> implement the `CharSequence` interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-636) Fix could not get sources warnings while compiling

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-636.
--

> Fix could not get sources warnings while compiling 
> ---
>
> Key: HUDI-636
> URL: https://issues.apache.org/jira/browse/HUDI-636
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During the voting process for the 0.5.1-incubating rc1 release, Justin pointed 
> out that the mvn log displayed "could not get sources" warnings
>  
> [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]
>  
> {code:java}
> [INFO] --- maven-shade-plugin:3.1.1:shade (default) @ hudi-hadoop-mr-bundle 
> ---
> [INFO] Including org.apache.hudi:hudi-common:jar:0.5.2-SNAPSHOT in the shaded 
> jar.
> Downloading from aliyun: 
> http://maven.aliyun.com/nexus/content/groups/public/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from cloudera: 
> https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from confluent: 
> https://packages.confluent.io/maven/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from libs-milestone: 
> https://repo.spring.io/libs-milestone/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from libs-release: 
> https://repo.spring.io/libs-release/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from apache.snapshots: 
> https://repository.apache.org/snapshots/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> [WARNING] Could not get sources for 
> org.apache.hudi:hudi-common:jar:0.5.2-SNAPSHOT:compile
> [INFO] Excluding com.fasterxml.jackson.core:jackson-annotations:jar:2.6.7 
> from the shaded jar.
> [INFO] Excluding com.fasterxml.jackson.core:jackson-databind:jar:2.6.7.1 from 
> the shaded jar.
> [INFO] Excluding com.fasterxml.jackson.core:jackson-core:jar:2.6.7 from the 
> shaded jar.
> [INFO] Excluding org.apache.httpcomponents:fluent-hc:jar:4.3.2 from the 
> shaded jar.
> [INFO] Excluding commons-logging:commons-logging:jar:1.1.3 from the shaded 
> jar.
> [INFO] Excluding org.apache.httpcomponents:httpclient:jar:4.3.6 from the 
> shaded jar.
> [INFO] Excluding org.apache.httpcomponents:httpcore:jar:4.3.2 from the shaded 
> jar.
> [INFO] Excluding commons-codec:commons-codec:jar:1.6 from the shaded jar.
> [INFO] Excluding org.rocksdb:rocksdbjni:jar:5.17.2 from the shaded jar.
> [INFO] Including com.esotericsoftware:kryo-shaded:jar:4.0.2 in the shaded jar.
> [INFO] Including com.esotericsoftware:minlog:jar:1.3.0 in the shaded jar.
> [INFO] Including org.objenesis:objenesis:jar:2.5.1 in the shaded jar.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-636) Fix could not get sources warnings while compiling

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-636.

Fix Version/s: 0.5.2
   Resolution: Fixed

Fixed via master: cacd9a33222d28c905891362312545230b6d30b9

> Fix could not get sources warnings while compiling 
> ---
>
> Key: HUDI-636
> URL: https://issues.apache.org/jira/browse/HUDI-636
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During the voting process for the 0.5.1-incubating rc1 release, Justin pointed 
> out that the mvn log displayed "could not get sources" warnings
>  
> [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]
>  
> {code:java}
> [INFO] --- maven-shade-plugin:3.1.1:shade (default) @ hudi-hadoop-mr-bundle 
> ---
> [INFO] Including org.apache.hudi:hudi-common:jar:0.5.2-SNAPSHOT in the shaded 
> jar.
> Downloading from aliyun: 
> http://maven.aliyun.com/nexus/content/groups/public/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from cloudera: 
> https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from confluent: 
> https://packages.confluent.io/maven/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from libs-milestone: 
> https://repo.spring.io/libs-milestone/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from libs-release: 
> https://repo.spring.io/libs-release/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from apache.snapshots: 
> https://repository.apache.org/snapshots/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> [WARNING] Could not get sources for 
> org.apache.hudi:hudi-common:jar:0.5.2-SNAPSHOT:compile
> [INFO] Excluding com.fasterxml.jackson.core:jackson-annotations:jar:2.6.7 
> from the shaded jar.
> [INFO] Excluding com.fasterxml.jackson.core:jackson-databind:jar:2.6.7.1 from 
> the shaded jar.
> [INFO] Excluding com.fasterxml.jackson.core:jackson-core:jar:2.6.7 from the 
> shaded jar.
> [INFO] Excluding org.apache.httpcomponents:fluent-hc:jar:4.3.2 from the 
> shaded jar.
> [INFO] Excluding commons-logging:commons-logging:jar:1.1.3 from the shaded 
> jar.
> [INFO] Excluding org.apache.httpcomponents:httpclient:jar:4.3.6 from the 
> shaded jar.
> [INFO] Excluding org.apache.httpcomponents:httpcore:jar:4.3.2 from the shaded 
> jar.
> [INFO] Excluding commons-codec:commons-codec:jar:1.6 from the shaded jar.
> [INFO] Excluding org.rocksdb:rocksdbjni:jar:5.17.2 from the shaded jar.
> [INFO] Including com.esotericsoftware:kryo-shaded:jar:4.0.2 in the shaded jar.
> [INFO] Including com.esotericsoftware:minlog:jar:1.3.0 in the shaded jar.
> [INFO] Including org.objenesis:objenesis:jar:2.5.1 in the shaded jar.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-597) Enable incremental pulling from defined partitions

2020-03-01 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048544#comment-17048544
 ] 

leesf commented on HUDI-597:


[~garyli1019] I think we could update the DOC after cutting the 0.5.1 docs and 
merge it to 0.5.2 docs, FYI: [~bhasudha]

> Enable incremental pulling from defined partitions
> --
>
> Key: HUDI-597
> URL: https://issues.apache.org/jira/browse/HUDI-597
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the use case where I only need to pull the incremental part of certain 
> partitions, I currently have to do the incremental pull from the entire 
> dataset first and then filter in Spark.
> If we could use the folder partitions directly as part of the input path, it 
> could run faster by loading only the relevant parquet files.
> Example:
>  
> {code:java}
> spark.read.format("org.apache.hudi")
> .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY,DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL)
> .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "000")
> .load(path, "year=2020/*/*/*")
>  
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-627) Publish coverage to codecov.io

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-627:
---
Fix Version/s: 0.5.2

> Publish coverage to codecov.io
> --
>
> Key: HUDI-627
> URL: https://issues.apache.org/jira/browse/HUDI-627
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Ramachandran M S
>Assignee: Ramachandran M S
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Publish the coverage to codecov.io on every build
>  * Fix code coverage to pickup cross module testing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-627) Publish coverage to codecov.io

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-627.
--

Fixed via master: acf359c834bc1d9b9c4ea64d362ea20c7410c70a

> Publish coverage to codecov.io
> --
>
> Key: HUDI-627
> URL: https://issues.apache.org/jira/browse/HUDI-627
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Ramachandran M S
>Assignee: Ramachandran M S
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Publish the coverage to codecov.io on every build
>  * Fix code coverage to pickup cross module testing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-554) Restructure code/packages to move more code back into hudi-writer-common

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-554.

Resolution: Fixed

Fixed via master: 71170fafe77e11ea1a458a38e3395a471d94a047

> Restructure code/packages  to move more code back into hudi-writer-common
> -
>
> Key: HUDI-554
> URL: https://issues.apache.org/jira/browse/HUDI-554
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-618) Improve unit test coverage for org.apache.hudi.common.table.view. PriorityBasedFileSystemView

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-618:
---
Fix Version/s: 0.5.2

> Improve unit test coverage for org.apache.hudi.common.table.view. 
> PriorityBasedFileSystemView
> -
>
> Key: HUDI-618
> URL: https://issues.apache.org/jira/browse/HUDI-618
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Ramachandran M S
>Assignee: Ramachandran M S
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add unit tests for all methods



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-618) Improve unit test coverage for org.apache.hudi.common.table.view. PriorityBasedFileSystemView

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-618.
--

> Improve unit test coverage for org.apache.hudi.common.table.view. 
> PriorityBasedFileSystemView
> -
>
> Key: HUDI-618
> URL: https://issues.apache.org/jira/browse/HUDI-618
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Ramachandran M S
>Assignee: Ramachandran M S
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add unit tests for all methods



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build

2020-02-29 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-599:
---
Status: Open  (was: New)

> Update release guide & release scripts due to the change of scala 2.12 build
> 
>
> Key: HUDI-599
> URL: https://issues.apache.org/jira/browse/HUDI-599
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release  Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update release guide due to the change of scala 2.12 build, PR link below
> [https://github.com/apache/incubator-hudi/pull/1293]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build

2020-02-29 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-599.
--

> Update release guide & release scripts due to the change of scala 2.12 build
> 
>
> Key: HUDI-599
> URL: https://issues.apache.org/jira/browse/HUDI-599
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release  Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update release guide due to the change of scala 2.12 build, PR link below
> [https://github.com/apache/incubator-hudi/pull/1293]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build

2020-02-29 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-599.

Resolution: Fixed

Fixed via master: 0cde27e63c2cf9b70f24f0ae6b63fad9259b28d3

and updated the release guide accordingly.

> Update release guide & release scripts due to the change of scala 2.12 build
> 
>
> Key: HUDI-599
> URL: https://issues.apache.org/jira/browse/HUDI-599
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release  Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update release guide due to the change of scala 2.12 build, PR link below
> [https://github.com/apache/incubator-hudi/pull/1293]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-666) sync updated docs to chinese.

2020-03-06 Thread leesf (Jira)
leesf created HUDI-666:
--

 Summary: sync updated docs to chinese.
 Key: HUDI-666
 URL: https://issues.apache.org/jira/browse/HUDI-666
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: docs-chinese
Reporter: leesf
Assignee: vinoyang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-578) Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator

2020-01-26 Thread leesf (Jira)
leesf created HUDI-578:
--

 Summary: Trim recordKeyFields and partitionPathFields in 
ComplexKeyGenerator
 Key: HUDI-578
 URL: https://issues.apache.org/jira/browse/HUDI-578
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: leesf
Assignee: leesf
 Fix For: 0.5.2


When using ComplexKeyGenerator with the options below:
{code:java}
option("hoodie.datasource.write.recordkey.field", "name, age").
option("hoodie.datasource.write.keygenerator.class", 
ComplexKeyGenerator.class.getName()).
option("hoodie.datasource.write.partitionpath.field", "location, age").
{code}

and the data is 

{code:java}
"{ \"name\": \"name1\", \"ts\": 1574297893839, \"age\": 15, \"location\": 
\"latitude\", \"sex\":\"male\"}"
{code}

the result is incorrect: age = null in the record key and age = default in the 
partition path, because the configured field names contain leading spaces.

We should trim the partition path and record key fields in ComplexKeyGenerator.
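
The fix amounts to trimming whitespace around each configured field name when splitting the comma-separated list, so `"name, age"` is parsed as `["name", "age"]`. A sketch under that assumption (the helper name is illustrative, not Hudi's API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the fix: split the configured comma-separated field list and
// trim each name, so "name, age" yields ["name", "age"] rather than
// ["name", " age"] (the untrimmed " age" fails the record-schema lookup,
// producing the null/default values described above).
public class FieldTrimSketch {
    static List<String> parseFields(String config) {
        return Arrays.stream(config.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parseFields("name, age"));      // [name, age]
        System.out.println(parseFields("location, age"));  // [location, age]
    }
}
```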



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-587) Jacoco coverage report is not generated

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-587.

Fix Version/s: 0.5.2
   Resolution: Fixed

Fixed via master: d26dc0b229043afa5aefca239e72f40d80446917

> Jacoco coverage report is not generated
> ---
>
> Key: HUDI-587
> URL: https://issues.apache.org/jira/browse/HUDI-587
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>   Original Estimate: 1h
>  Time Spent: 20m
>  Remaining Estimate: 40m
>
> When running tests, the jacoco coverage report is not generated. The jacoco 
> plugin is loaded and it sets the correct Java agent line, but it fails to find 
> the execution data file after the tests complete.
> Example:
> mvn test -Dtest=TestHoodieActiveTimeline
> ...
> 22:42:40 [INFO] — jacoco-maven-plugin:0.7.8:prepare-agent (pre-unit-test) @ 
> hudi-common —
>  22:42:40 [INFO] *surefireArgLine set to 
> javaagent:/home/pwason/.m2/repository/org/jacoco/org.jacoco.agent/0.7.8/org.jacoco.agent-0.7.8-runtime.jar=destfile=/home/pwason/work/java/incubator-hudi/hudi-common/target/coverage-reports/jacocout.exec*
> *...*
> 22:42:49 [INFO] — jacoco-maven-plugin:0.7.8:report (post-unit-test) @ 
> hudi-common —
>  22:42:49 [INFO] *Skipping JaCoCo execution due to missing execution data 
> file.*
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-570) Improve unit test coverage FSUtils.java

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-570:
---
Status: Open  (was: New)

> Improve unit test coverage FSUtils.java
> ---
>
> Key: HUDI-570
> URL: https://issues.apache.org/jira/browse/HUDI-570
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Balajee Nagasubramaniam
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add test cases for 
> - deleteOlderRollbackMetaFiles()
> - deleteOlderCleanMetaFiles()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-570) Improve unit test coverage FSUtils.java

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-570.

Fix Version/s: 0.5.2
   Resolution: Fixed

Fixed via master: 1fb0b001a38ddc940995e45f5cd53701d0110c3b

> Improve unit test coverage FSUtils.java
> ---
>
> Key: HUDI-570
> URL: https://issues.apache.org/jira/browse/HUDI-570
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Balajee Nagasubramaniam
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add test cases for 
> - deleteOlderRollbackMetaFiles()
> - deleteOlderCleanMetaFiles()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-571) Modify Hudi CLI to show archived commits

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-571:
---
Status: Closed  (was: Patch Available)

> Modify Hudi CLI to show archived commits
> 
>
> Key: HUDI-571
> URL: https://issues.apache.org/jira/browse/HUDI-571
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: satish
>Assignee: satish
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hudi CLI has a 'show archived commits' command, which is not very helpful
>  
> {code:java}
> ->show archived commits
> ===> Showing only 10 archived commits <===
>     
>     | CommitTime    | CommitType|
>     |===|
>     | 2019033304| commit    |
>     | 20190323220154| commit    |
>     | 20190323220154| commit    |
>     | 20190323224004| commit    |
>     | 20190323224013| commit    |
>     | 20190323224229| commit    |
>     | 20190323224229| commit    |
>     | 20190323232849| commit    |
>     | 20190323233109| commit    |
>     | 20190323233109| commit    |
>  {code}
> Modify or introduce new command to make it easy to debug
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-571) Modify Hudi CLI to show archived commits

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-571:
---
Fix Version/s: 0.5.2

> Modify Hudi CLI to show archived commits
> 
>
> Key: HUDI-571
> URL: https://issues.apache.org/jira/browse/HUDI-571
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: satish
>Assignee: satish
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hudi CLI has a 'show archived commits' command, which is not very helpful
>  
> {code:java}
> ->show archived commits
> ===> Showing only 10 archived commits <===
>     
>     | CommitTime    | CommitType|
>     |===|
>     | 2019033304| commit    |
>     | 20190323220154| commit    |
>     | 20190323220154| commit    |
>     | 20190323224004| commit    |
>     | 20190323224013| commit    |
>     | 20190323224229| commit    |
>     | 20190323224229| commit    |
>     | 20190323232849| commit    |
>     | 20190323233109| commit    |
>     | 20190323233109| commit    |
>  {code}
> Modify or introduce a new command to make it easier to debug
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-564) Improve unit test coverage for org.apache.hudi.common.table.log.HoodieLogFormatVersion

2020-01-30 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-564.

Fix Version/s: 0.5.2
   Resolution: Fixed

Fixed via master: f27c7a16c6d437efaa83e50a7117b83e5201ac49

> Improve unit test coverage for 
> org.apache.hudi.common.table.log.HoodieLogFormatVersion
> --
>
> Key: HUDI-564
> URL: https://issues.apache.org/jira/browse/HUDI-564
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-578) Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator

2020-01-30 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-578.

Resolution: Fixed

Fixed via master: 652224edc882c083ac46cff095324975e2457004

> Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator
> ---
>
> Key: HUDI-578
> URL: https://issues.apache.org/jira/browse/HUDI-578
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>    Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using ComplexKeyGenerator with the options below:
> {code:java}
> option("hoodie.datasource.write.recordkey.field", "name, age").
> option("hoodie.datasource.write.keygenerator.class", 
> ComplexKeyGenerator.class.getName()).
> option("hoodie.datasource.write.partitionpath.field", "location, age").
> {code}
> and the data is 
> {code:java}
> "{ \"name\": \"name1\", \"ts\": 1574297893839, \"age\": 15, \"location\": 
> \"latitude\", \"sex\":\"male\"}"
> {code}
> the result is incorrect: age = null in the record key, and age = default in the 
> partition path.
> We should trim the partition path fields and record key fields in 
> ComplexKeyGenerator.
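The trimming described above can be sketched as follows; this is a minimal, hypothetical helper for illustration only, not Hudi's actual ComplexKeyGenerator code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not Hudi's actual code) showing why trimming matters:
// with "name, age" the untrimmed second field is " age", so the record is
// probed for a field literally named " age" and resolves to null/default.
public class KeyFieldTrimmer {

    public static List<String> trimFields(String commaSeparatedFields) {
        return Arrays.stream(commaSeparatedFields.split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(trimFields("name, age"));      // [name, age]
        System.out.println(trimFields("location, age"));  // [location, age]
    }
}
```

With the whitespace stripped, both "name, age" and "location, age" resolve to the intended field names before the record is probed.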



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-578) Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator

2020-01-30 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-578:
---
Status: Open  (was: New)

> Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator
> ---
>
> Key: HUDI-578
> URL: https://issues.apache.org/jira/browse/HUDI-578
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>    Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using ComplexKeyGenerator with the options below:
> {code:java}
> option("hoodie.datasource.write.recordkey.field", "name, age").
> option("hoodie.datasource.write.keygenerator.class", 
> ComplexKeyGenerator.class.getName()).
> option("hoodie.datasource.write.partitionpath.field", "location, age").
> {code}
> and the data is 
> {code:java}
> "{ \"name\": \"name1\", \"ts\": 1574297893839, \"age\": 15, \"location\": 
> \"latitude\", \"sex\":\"male\"}"
> {code}
> the result is incorrect: age = null in the record key, and age = default in the 
> partition path.
> We should trim the partition path fields and record key fields in 
> ComplexKeyGenerator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-550) Add to Release Notes : Configuration Value change for Kafka Reset Offset Strategies

2020-01-30 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027253#comment-17027253
 ] 

leesf commented on HUDI-550:


Fixed via asf-site: 20ede76c4c79c0804518a4fe148b8fcd48391f5c

> Add to Release Notes : Configuration Value change for Kafka Reset Offset 
> Strategies
> ---
>
> Key: HUDI-550
> URL: https://issues.apache.org/jira/browse/HUDI-550
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Blocker
> Fix For: 0.5.1
>
>
> Enum Values are changed for configuring kafka reset offset strategies in 
> deltastreamer
>    LARGEST -> LATEST
>   SMALLEST -> EARLIEST
>  
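In practice the renamed values line up with the standard Kafka consumer setting `auto.offset.reset`; a hedged sketch of the post-change consumer configuration (how the property file is wired into DeltaStreamer is assumed here, not taken from the release notes):

```properties
# Kafka consumer properties supplied to DeltaStreamer (post-change values)
auto.offset.reset=earliest   # formerly SMALLEST
# or:
# auto.offset.reset=latest   # formerly LARGEST
```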



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-550) Add to Release Notes : Configuration Value change for Kafka Reset Offset Strategies

2020-01-30 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-550:
---
Status: Closed  (was: Patch Available)

> Add to Release Notes : Configuration Value change for Kafka Reset Offset 
> Strategies
> ---
>
> Key: HUDI-550
> URL: https://issues.apache.org/jira/browse/HUDI-550
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Blocker
> Fix For: 0.5.1
>
>
> Enum Values are changed for configuring kafka reset offset strategies in 
> deltastreamer
>    LARGEST -> LATEST
>   SMALLEST -> EARLIEST
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-588) Sync latest docs to cn docs

2020-01-30 Thread leesf (Jira)
leesf created HUDI-588:
--

 Summary: Sync latest docs to cn docs
 Key: HUDI-588
 URL: https://issues.apache.org/jira/browse/HUDI-588
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: docs-chinese
Reporter: leesf
Assignee: vinoyang


Sync latest website docs to cn docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-543) Carefully draft release notes for 0.5.1 with all breaking/user impacting changes

2020-01-30 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-543.

Resolution: Fixed

Fixed via asf-site: 20ede76c4c79c0804518a4fe148b8fcd48391f5c

> Carefully draft release notes for 0.5.1 with all breaking/user impacting 
> changes
> 
>
> Key: HUDI-543
> URL: https://issues.apache.org/jira/browse/HUDI-543
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Call out all breaking changes : 
>  * Spark 2.4 support drop, avro version change etc.  "Hudi 0.5.1+ above needs 
> Spark 2.4+"
>  * Need for shading custom Payloads 
>  * --packages for spark-shell 
>  * key generator changes 
>  * _ro suffix for read optimized views.. 
>  * Delta streamer command line changes
>  * Scala version changes.. packages names now have _2.11
>  
> Also need to call out major release highlights (quoting docs/blogs as 
> available)
>  * better delete support
>  * dynamic bloom filters
>  * DMS support
>  
>  
> I am also linking the different jiras as subtaks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-547) Call out changes in package names due to scala cross compiling support

2020-01-30 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027254#comment-17027254
 ] 

leesf commented on HUDI-547:


Fixed via asf-site: 20ede76c4c79c0804518a4fe148b8fcd48391f5c

> Call out changes in package names due to scala cross compiling support
> --
>
> Key: HUDI-547
> URL: https://issues.apache.org/jira/browse/HUDI-547
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Blocker
> Fix For: 0.5.1
>
>
> Two versions of each of the below packages need to be built:
> hudi-spark becomes hudi-spark_2.11 and hudi-spark_2.12
> hudi-utilities becomes hudi-utilities_2.11 and hudi-utilities_2.12
> hudi-spark-bundle becomes hudi-spark-bundle_2.11 and hudi-spark-bundle_2.12
> hudi-utilities-bundle becomes hudi-utilities-bundle_2.11 and 
> hudi-utilities-bundle_2.12
>  
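Downstream builds then have to reference the suffixed artifactIds; a sketch of a consuming pom entry (the version shown is the 0.5.1 release this ticket targets; treat the coordinates as illustrative):

```xml
<!-- Scala 2.11 build; use the _2.12 suffix for Scala 2.12 builds -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark-bundle_2.11</artifactId>
  <version>0.5.1</version>
</dependency>
```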



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-547) Call out changes in package names due to scala cross compiling support

2020-01-30 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-547:
---
Status: Closed  (was: Patch Available)

> Call out changes in package names due to scala cross compiling support
> --
>
> Key: HUDI-547
> URL: https://issues.apache.org/jira/browse/HUDI-547
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Blocker
> Fix For: 0.5.1
>
>
> Two versions of each of the below packages need to be built:
> hudi-spark becomes hudi-spark_2.11 and hudi-spark_2.12
> hudi-utilities becomes hudi-utilities_2.11 and hudi-utilities_2.12
> hudi-spark-bundle becomes hudi-spark-bundle_2.11 and hudi-spark-bundle_2.12
> hudi-utilities-bundle becomes hudi-utilities-bundle_2.11 and 
> hudi-utilities-bundle_2.12
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-586) Revisit the release guide

2020-01-30 Thread leesf (Jira)
leesf created HUDI-586:
--

 Summary: Revisit the release guide
 Key: HUDI-586
 URL: https://issues.apache.org/jira/browse/HUDI-586
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Release & Administrative
Reporter: leesf
 Fix For: 0.5.2


Currently, the release guide is not very standard, mainly in the finalize-the-release 
step. We would refer to the Flink guide 
[https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release]; the 
main change might be not adding rc-\{RC_NUM} to the pom.xml.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-586) Revisit the release guide

2020-01-30 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027150#comment-17027150
 ] 

leesf commented on HUDI-586:


[~vinoth] [~vbalaji] please chime in to standardize the release guide.

> Revisit the release guide
> -
>
> Key: HUDI-586
> URL: https://issues.apache.org/jira/browse/HUDI-586
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: leesf
>Priority: Major
> Fix For: 0.5.2
>
>
> Currently, the release guide is not very standard, mainly in the 
> finalize-the-release step. We would refer to the Flink guide 
> [https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release]; 
> the main change might be not adding rc-\{RC_NUM} to the pom.xml.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-583) cleanup legacy code

2020-02-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-583:
---
Status: Open  (was: New)

> cleanup legacy code 
> 
>
> Key: HUDI-583
> URL: https://issues.apache.org/jira/browse/HUDI-583
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Cleaner
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See [https://github.com/apache/incubator-hudi/pull/1237]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-583) cleanup legacy code

2020-02-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-583.

Resolution: Fixed

Fixed via master: 5b7bb142dc6712c41fd8ada208ab3186369431f9

> cleanup legacy code 
> 
>
> Key: HUDI-583
> URL: https://issues.apache.org/jira/browse/HUDI-583
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Cleaner
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See [https://github.com/apache/incubator-hudi/pull/1237]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-238) Make separate release for hudi spark/scala based packages for scala 2.12

2020-02-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-238:
---
Fix Version/s: (was: 0.5.2)
   0.5.1

> Make separate release for hudi spark/scala based packages for scala 2.12 
> -
>
> Key: HUDI-238
> URL: https://issues.apache.org/jira/browse/HUDI-238
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative, Usability
>Reporter: Balaji Varadarajan
>Assignee: Tadas Sugintas
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/issues/881#issuecomment-528700749]
> Suspects: 
> h3. Hudi utilities package 
> bringing in spark-streaming-kafka-0.8* 
> {code:java}
> [INFO] Scanning for projects...
> [INFO] 
> [INFO] ---< org.apache.hudi:hudi-utilities 
> >---
> [INFO] Building hudi-utilities 0.5.0-SNAPSHOT
> [INFO] [ jar 
> ]-
> [INFO] 
> [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ hudi-utilities 
> ---
> [INFO] org.apache.hudi:hudi-utilities:jar:0.5.0-SNAPSHOT
> [INFO] ...
> [INFO] +- org.apache.hudi:hudi-client:jar:0.5.0-SNAPSHOT:compile
>...
> [INFO] 
> [INFO] +- org.apache.hudi:hudi-spark:jar:0.5.0-SNAPSHOT:compile
> [INFO] |  \- org.scala-lang:scala-library:jar:2.11.8:compile
> [INFO] +- log4j:log4j:jar:1.2.17:compile
>...
> [INFO] +- org.apache.spark:spark-core_2.11:jar:2.1.0:provided
> [INFO] |  +- org.apache.avro:avro-mapred:jar:hadoop2:1.7.7:provided
> [INFO] |  |  +- org.apache.avro:avro-ipc:jar:1.7.7:provided
> [INFO] |  |  \- org.apache.avro:avro-ipc:jar:tests:1.7.7:provided
> [INFO] |  +- com.twitter:chill_2.11:jar:0.8.0:provided
> [INFO] |  +- com.twitter:chill-java:jar:0.8.0:provided
> [INFO] |  +- org.apache.xbean:xbean-asm5-shaded:jar:4.4:provided
> [INFO] |  +- org.apache.spark:spark-launcher_2.11:jar:2.1.0:provided
> [INFO] |  +- org.apache.spark:spark-network-common_2.11:jar:2.1.0:provided
> [INFO] |  +- org.apache.spark:spark-network-shuffle_2.11:jar:2.1.0:provided
> [INFO] |  +- org.apache.spark:spark-unsafe_2.11:jar:2.1.0:provided
> [INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.7.1:provided
> [INFO] |  +- org.apache.curator:curator-recipes:jar:2.4.0:provided
> [INFO] |  +- org.apache.commons:commons-lang3:jar:3.5:provided
> [INFO] |  +- org.apache.commons:commons-math3:jar:3.4.1:provided
> [INFO] |  +- com.google.code.findbugs:jsr305:jar:1.3.9:provided
> [INFO] |  +- org.slf4j:slf4j-api:jar:1.7.16:compile
> [INFO] |  +- org.slf4j:jul-to-slf4j:jar:1.7.16:provided
> [INFO] |  +- org.slf4j:jcl-over-slf4j:jar:1.7.16:provided
> [INFO] |  +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile
> [INFO] |  +- com.ning:compress-lzf:jar:1.0.3:provided
> [INFO] |  +- org.xerial.snappy:snappy-java:jar:1.1.2.6:compile
> [INFO] |  +- net.jpountz.lz4:lz4:jar:1.3.0:compile
> [INFO] |  +- org.roaringbitmap:RoaringBitmap:jar:0.5.11:provided
> [INFO] |  +- commons-net:commons-net:jar:2.2:provided
>
> [INFO] +- org.apache.spark:spark-sql_2.11:jar:2.1.0:provided
> [INFO] |  +- com.univocity:univocity-parsers:jar:2.2.1:provided
> [INFO] |  +- org.apache.spark:spark-sketch_2.11:jar:2.1.0:provided
> [INFO] |  \- org.apache.spark:spark-catalyst_2.11:jar:2.1.0:provided
> [INFO] | +- org.codehaus.janino:janino:jar:3.0.0:provided
> [INFO] | +- org.codehaus.janino:commons-compiler:jar:3.0.0:provided
> [INFO] | \- org.antlr:antlr4-runtime:jar:4.5.3:provided
> [INFO] +- com.databricks:spark-avro_2.11:jar:4.0.0:provided
> [INFO] +- org.apache.spark:spark-streaming_2.11:jar:2.1.0:compile
> [INFO] +- org.apache.spark:spark-streaming-kafka-0-8_2.11:jar:2.1.0:compile
> [INFO] |  \- org.apache.kafka:kafka_2.11:jar:0.8.2.1:compile
> [INFO] | +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.2:compile
> [INFO] | +- 
> org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.2:compile
> [INFO] | \- org.apache.kafka:kafka-clients:jar:0.8.2.1:compile
> [INFO] +- io.dropwizard.metrics:metrics-core:jar:4.0.2:compile
> [INFO] +- org.antlr:stringtemplate:jar:4.0.2:compile
> [INFO] |  \- org.antlr:antlr-runtime:jar:3.3:compile
> [INFO] +- com.beust:jcommander:jar:1.72:compile
> [INFO] +- com.twitter:bijection-avro_2.11:jar:0.9.2:compile
> [INFO] |  \- com.twitter:bijection-core_2.11:jar:0.9.2:compile
> [INFO] +- io.confluent:ka

[jira] [Commented] (HUDI-590) Cut a new Doc version 0.5.1 explicitly

2020-02-07 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032262#comment-17032262
 ] 

leesf commented on HUDI-590:


[~bhavanisudha] Thanks.

> Cut a new Doc version 0.5.1 explicitly
> --
>
> Key: HUDI-590
> URL: https://issues.apache.org/jira/browse/HUDI-590
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs, Release & Administrative
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
>
> The latest version of docs needs to be tagged as 0.5.1 explicitly in the 
> site. Follow instructions in 
> [https://github.com/apache/incubator-hudi/blob/asf-site/README.md#updating-site]
>  to create a new dir 0.5.1 under docs/_docs/ 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-549) Update documentation to reflect changes in package names due to scala cross compiling support

2020-01-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-549.

Resolution: Fixed

Fixed via master: 1e79cbc259b92f75e5fd387c0271b163532aebb9

> Update documentation to reflect changes in package names due to scala cross 
> compiling support
> -
>
> Key: HUDI-549
> URL: https://issues.apache.org/jira/browse/HUDI-549
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Two versions of each of the below packages will be built. Please note the 
> change in package names and update documentation. 
> hudi-spark becomes hudi-spark_2.11 and hudi-spark_2.12
> hudi-utilities becomes hudi-utilities_2.11 and hudi-utilities_2.12
> hudi-spark-bundle becomes hudi-spark-bundle_2.11 and hudi-spark-bundle_2.12
> hudi-utilities-bundle becomes hudi-utilities-bundle_2.11 and 
> hudi-utilities-bundle_2.12
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-550) Add to Release Notes : Configuration Value change for Kafka Reset Offset Strategies

2020-01-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-550:
---
Status: Patch Available  (was: In Progress)

> Add to Release Notes : Configuration Value change for Kafka Reset Offset 
> Strategies
> ---
>
> Key: HUDI-550
> URL: https://issues.apache.org/jira/browse/HUDI-550
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Blocker
> Fix For: 0.5.1
>
>
> Enum Values are changed for configuring kafka reset offset strategies in 
> deltastreamer
>    LARGEST -> LATEST
>   SMALLEST -> EARLIEST
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-403) Publish a deployment guide talking about deployment options, upgrading etc

2020-01-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-403.

Resolution: Fixed

Fixed via asf-site: 41754bb31bb8656d0570371ba2283c987f9a8c22

> Publish a deployment guide talking about deployment options, upgrading etc
> --
>
> Key: HUDI-403
> URL: https://issues.apache.org/jira/browse/HUDI-403
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Docs
>Reporter: Vinoth Chandar
>Assignee: Balaji Varadarajan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Things to cover 
>  # Upgrade readers first, Upgrade writers next, Principles of compatibility 
> followed
>  # DeltaStreamer Deployment models
>  # Scheduling Compactions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-547) Call out changes in package names due to scala cross compiling support

2020-01-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-547:
---
Status: Patch Available  (was: In Progress)

> Call out changes in package names due to scala cross compiling support
> --
>
> Key: HUDI-547
> URL: https://issues.apache.org/jira/browse/HUDI-547
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Blocker
> Fix For: 0.5.1
>
>
> Two versions of each of the below packages need to be built:
> hudi-spark becomes hudi-spark_2.11 and hudi-spark_2.12
> hudi-utilities becomes hudi-utilities_2.11 and hudi-utilities_2.12
> hudi-spark-bundle becomes hudi-spark-bundle_2.11 and hudi-spark-bundle_2.12
> hudi-utilities-bundle becomes hudi-utilities-bundle_2.11 and 
> hudi-utilities-bundle_2.12
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-536) Update release notes to include KeyGenerator package changes

2020-01-23 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-536:
---
Status: In Progress  (was: Open)

> Update release notes to include KeyGenerator package changes
> 
>
> Key: HUDI-536
> URL: https://issues.apache.org/jira/browse/HUDI-536
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Brandon Scheller
>Priority: Major
> Fix For: 0.5.1
>
>
> The change introduced here:
>  [https://github.com/apache/incubator-hudi/pull/1194]
> Refactors hudi keygenerators into their own package.
> We need to make this a backwards compatible change or update the release 
> notes to address this.
> Specifically:
> org.apache.hudi.ComplexKeyGenerator -> 
> org.apache.hudi.keygen.ComplexKeyGenerator



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-580) Incorrect license header in docker/hoodie/hadoop/base/entrypoint.sh

2020-01-28 Thread leesf (Jira)
leesf created HUDI-580:
--

 Summary: Incorrect license header in 
docker/hoodie/hadoop/base/entrypoint.sh
 Key: HUDI-580
 URL: https://issues.apache.org/jira/browse/HUDI-580
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: newbie
Reporter: leesf
 Fix For: 0.5.2


Issues pointed out in general@incubator ML, more context here: 
[https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-580) Incorrect license header in docker/hoodie/hadoop/base/entrypoint.sh

2020-01-28 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-580:
---
Description: 
Issues pointed out in general@incubator ML, more context here: 
[https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]

 

We would get it fixed before the next release.

  was:Issues pointed out in general@incubator ML, more context here: 
[https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]


> Incorrect license header in docker/hoodie/hadoop/base/entrypoint.sh
> ---
>
> Key: HUDI-580
> URL: https://issues.apache.org/jira/browse/HUDI-580
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: newbie
>Reporter: leesf
>Priority: Major
> Fix For: 0.5.2
>
>
> Issues pointed out in general@incubator ML, more context here: 
> [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]
>  
> We would get it fixed before the next release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-582) NOTICE year is incorrect

2020-01-28 Thread leesf (Jira)
leesf created HUDI-582:
--

 Summary: NOTICE year is incorrect
 Key: HUDI-582
 URL: https://issues.apache.org/jira/browse/HUDI-582
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: newbie
Reporter: leesf
 Fix For: 0.5.2


Issues pointed out in general@incubator ML, more context here: 
[https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]

 

We would get it fixed before the next release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-581) NOTICE needs more work as it is missing content from included 3rd party ALv2 licensed NOTICE files

2020-01-28 Thread leesf (Jira)
leesf created HUDI-581:
--

 Summary: NOTICE needs more work as it is missing content from included 
3rd party ALv2 licensed NOTICE files
 Key: HUDI-581
 URL: https://issues.apache.org/jira/browse/HUDI-581
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: leesf


Issues pointed out in general@incubator ML, more context here: 
[https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]

 

We would get it fixed before the next release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-579) Add border to table on hudi website

2020-01-28 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-579.

Resolution: Fixed

Fixed via asf-site: 4670c026010b61d5bd591119902a19d64d2b8889

> Add border to table on hudi website
> ---
>
> Key: HUDI-579
> URL: https://issues.apache.org/jira/browse/HUDI-579
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add border to table which on hudi website



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-595) code cleanup

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-595:
---
Status: Open  (was: New)

> code cleanup 
> -
>
> Key: HUDI-595
> URL: https://issues.apache.org/jira/browse/HUDI-595
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Moving out the cleanup code from PR# 1159 into a separate PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-595) code cleanup

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-595.

Resolution: Fixed

Fixed via master: 594da28fbf64fb20432e718a409577fd10516c4a

> code cleanup 
> -
>
> Key: HUDI-595
> URL: https://issues.apache.org/jira/browse/HUDI-595
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Moving out the cleanup code from PR# 1159 into a separate PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-586) Revisit the release guide

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-586:
--

Assignee: leesf

> Revisit the release guide
> -
>
> Key: HUDI-586
> URL: https://issues.apache.org/jira/browse/HUDI-586
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.6.0
>
>
> Currently, the release guide is not very standard, mainly in the 
> finalize-the-release step. We would refer to the Flink guide 
> [https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release]; 
> the main change might be not adding rc-\{RC_NUM} to the pom.xml.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-595) code cleanup

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-595.
--

> code cleanup 
> -
>
> Key: HUDI-595
> URL: https://issues.apache.org/jira/browse/HUDI-595
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Moving out the cleanup code from PR# 1159 into a separate PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-599) Update release guide/release scripts due to the change of scala 2.12 build

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-599:
---
Summary: Update release guide/release scripts due to the change of scala 
2.12 build  (was: Update Release guide due to the change of scala 2.12 build)

> Update release guide/release scripts due to the change of scala 2.12 build
> --
>
> Key: HUDI-599
> URL: https://issues.apache.org/jira/browse/HUDI-599
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.2
>
>
> Update release guide due to the change of scala 2.12 build, PR link below
> [https://github.com/apache/incubator-hudi/pull/1293]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-599:
---
Summary: Update release guide & release scripts due to the change of scala 
2.12 build  (was: Update release guide/release scripts due to the change of 
scala 2.12 build)

> Update release guide & release scripts due to the change of scala 2.12 build
> 
>
> Key: HUDI-599
> URL: https://issues.apache.org/jira/browse/HUDI-599
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.2
>
>
> Update release guide due to the change of scala 2.12 build, PR link below
> [https://github.com/apache/incubator-hudi/pull/1293]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-590) Cut a new Doc version 0.5.1 explicitly

2020-02-05 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030656#comment-17030656
 ] 

leesf commented on HUDI-590:


[~bhavanisudha] It would be better to create the 0.5.1 version sooner since 
there are some docs updates in existing PRs. WDYT?

> Cut a new Doc version 0.5.1 explicitly
> --
>
> Key: HUDI-590
> URL: https://issues.apache.org/jira/browse/HUDI-590
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs, Release & Administrative
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
>
> The latest version of docs needs to be tagged as 0.5.1 explicitly in the 
> site. Follow instructions in 
> [https://github.com/apache/incubator-hudi/blob/asf-site/README.md#updating-site]
>  to create a new dir 0.5.1 under docs/_docs/ 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-599) Update Release guide due to the change of scala 2.12 build

2020-02-05 Thread leesf (Jira)
leesf created HUDI-599:
--

 Summary: Update Release guide due to the change of scala 2.12 build
 Key: HUDI-599
 URL: https://issues.apache.org/jira/browse/HUDI-599
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Release & Administrative
Reporter: leesf
Assignee: leesf
 Fix For: 0.5.2


Update release guide due to the change of scala 2.12 build, PR link below

[https://github.com/apache/incubator-hudi/pull/1293]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-238) Make separate release for hudi spark/scala based packages for scala 2.12

2020-02-02 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028623#comment-17028623
 ] 

leesf commented on HUDI-238:


Fixed via master: 292c1e2ff436a711cbbb53ad9b1f6232121d53ec

> Make separate release for hudi spark/scala based packages for scala 2.12 
> -
>
> Key: HUDI-238
> URL: https://issues.apache.org/jira/browse/HUDI-238
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative, Usability
>Reporter: Balaji Varadarajan
>Assignee: Tadas Sugintas
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/issues/881#issuecomment-528700749]
> Suspects: 
> h3. Hudi utilities package 
> bringing in spark-streaming-kafka-0.8* 
> {code:java}
> [INFO] Scanning for projects...
> [INFO] 
> [INFO] ---< org.apache.hudi:hudi-utilities 
> >---
> [INFO] Building hudi-utilities 0.5.0-SNAPSHOT
> [INFO] [ jar 
> ]-
> [INFO] 
> [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ hudi-utilities 
> ---
> [INFO] org.apache.hudi:hudi-utilities:jar:0.5.0-SNAPSHOT
> [INFO] ...
> [INFO] +- org.apache.hudi:hudi-client:jar:0.5.0-SNAPSHOT:compile
>...
> [INFO] 
> [INFO] +- org.apache.hudi:hudi-spark:jar:0.5.0-SNAPSHOT:compile
> [INFO] |  \- org.scala-lang:scala-library:jar:2.11.8:compile
> [INFO] +- log4j:log4j:jar:1.2.17:compile
>...
> [INFO] +- org.apache.spark:spark-core_2.11:jar:2.1.0:provided
> [INFO] |  +- org.apache.avro:avro-mapred:jar:hadoop2:1.7.7:provided
> [INFO] |  |  +- org.apache.avro:avro-ipc:jar:1.7.7:provided
> [INFO] |  |  \- org.apache.avro:avro-ipc:jar:tests:1.7.7:provided
> [INFO] |  +- com.twitter:chill_2.11:jar:0.8.0:provided
> [INFO] |  +- com.twitter:chill-java:jar:0.8.0:provided
> [INFO] |  +- org.apache.xbean:xbean-asm5-shaded:jar:4.4:provided
> [INFO] |  +- org.apache.spark:spark-launcher_2.11:jar:2.1.0:provided
> [INFO] |  +- org.apache.spark:spark-network-common_2.11:jar:2.1.0:provided
> [INFO] |  +- org.apache.spark:spark-network-shuffle_2.11:jar:2.1.0:provided
> [INFO] |  +- org.apache.spark:spark-unsafe_2.11:jar:2.1.0:provided
> [INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.7.1:provided
> [INFO] |  +- org.apache.curator:curator-recipes:jar:2.4.0:provided
> [INFO] |  +- org.apache.commons:commons-lang3:jar:3.5:provided
> [INFO] |  +- org.apache.commons:commons-math3:jar:3.4.1:provided
> [INFO] |  +- com.google.code.findbugs:jsr305:jar:1.3.9:provided
> [INFO] |  +- org.slf4j:slf4j-api:jar:1.7.16:compile
> [INFO] |  +- org.slf4j:jul-to-slf4j:jar:1.7.16:provided
> [INFO] |  +- org.slf4j:jcl-over-slf4j:jar:1.7.16:provided
> [INFO] |  +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile
> [INFO] |  +- com.ning:compress-lzf:jar:1.0.3:provided
> [INFO] |  +- org.xerial.snappy:snappy-java:jar:1.1.2.6:compile
> [INFO] |  +- net.jpountz.lz4:lz4:jar:1.3.0:compile
> [INFO] |  +- org.roaringbitmap:RoaringBitmap:jar:0.5.11:provided
> [INFO] |  +- commons-net:commons-net:jar:2.2:provided
>
> [INFO] +- org.apache.spark:spark-sql_2.11:jar:2.1.0:provided
> [INFO] |  +- com.univocity:univocity-parsers:jar:2.2.1:provided
> [INFO] |  +- org.apache.spark:spark-sketch_2.11:jar:2.1.0:provided
> [INFO] |  \- org.apache.spark:spark-catalyst_2.11:jar:2.1.0:provided
> [INFO] | +- org.codehaus.janino:janino:jar:3.0.0:provided
> [INFO] | +- org.codehaus.janino:commons-compiler:jar:3.0.0:provided
> [INFO] | \- org.antlr:antlr4-runtime:jar:4.5.3:provided
> [INFO] +- com.databricks:spark-avro_2.11:jar:4.0.0:provided
> [INFO] +- org.apache.spark:spark-streaming_2.11:jar:2.1.0:compile
> [INFO] +- org.apache.spark:spark-streaming-kafka-0-8_2.11:jar:2.1.0:compile
> [INFO] |  \- org.apache.kafka:kafka_2.11:jar:0.8.2.1:compile
> [INFO] | +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.2:compile
> [INFO] | +- 
> org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.2:compile
> [INFO] | \- org.apache.kafka:kafka-clients:jar:0.8.2.1:compile
> [INFO] +- io.dropwizard.metrics:metrics-core:jar:4.0.2:compile
> [INFO] +- org.antlr:stringtemplate:jar:4.0.2:compile
> [INFO] |  \- org.antlr:antlr-runtime:jar:3.3:compile
> [INFO] +- com.beust:jcommander:jar:1.72:compile
> [INFO] +- com.twitter:bijection-avro_2.11:jar:0.9.2:compile
> [INFO] |  \- com.twitter:bijection-core_2.11:jar:0.9

[jira] [Updated] (HUDI-550) Add to Release Notes : Configuration Value change for Kafka Reset Offset Strategies

2020-02-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-550:
---
Fix Version/s: (was: 0.5.2)
   0.5.1

> Add to Release Notes : Configuration Value change for Kafka Reset Offset 
> Strategies
> ---
>
> Key: HUDI-550
> URL: https://issues.apache.org/jira/browse/HUDI-550
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Blocker
> Fix For: 0.5.1
>
>
> Enum values are changed for configuring Kafka reset offset strategies in 
> DeltaStreamer:
>    LARGEST -> LATEST
>   SMALLEST -> EARLIEST
>  
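The rename above can be sketched as a tiny migration helper for users upgrading old configuration values; the class and method names here are illustrative, not part of Hudi or DeltaStreamer:

```java
import java.util.Map;

public class KafkaResetConfigMigration {

    // Hypothetical mapping (not Hudi API): old DeltaStreamer reset-strategy
    // names on the left, the renamed values from this ticket on the right.
    static final Map<String, String> RENAMES = Map.of(
            "LARGEST", "LATEST",
            "SMALLEST", "EARLIEST");

    // Returns the new enum name for an old config value, passing through
    // values that were not renamed.
    static String migrate(String oldValue) {
        return RENAMES.getOrDefault(oldValue.toUpperCase(), oldValue);
    }
}
```

A migration pass like this would let existing property files keep working while logging a deprecation warning for the old names.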



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-547) Call out changes in package names due to scala cross compiling support

2020-02-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-547:
---
Fix Version/s: (was: 0.5.2)
   0.5.1

> Call out changes in package names due to scala cross compiling support
> --
>
> Key: HUDI-547
> URL: https://issues.apache.org/jira/browse/HUDI-547
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Blocker
> Fix For: 0.5.1
>
>
> Two versions of each of the packages below need to be built:
> hudi-spark as hudi-spark_2.11 and hudi-spark_2.12
> hudi-utilities as hudi-utilities_2.11 and hudi-utilities_2.12
> hudi-spark-bundle as hudi-spark-bundle_2.11 and hudi-spark-bundle_2.12
> hudi-utilities-bundle is hudi-utilities-bundle_2.11 and 
> hudi-utilities-bundle_2.12
>  
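For downstream users, the visible effect of the cross-compiling change is that the Maven coordinates gain a Scala-version suffix. A hypothetical consumer pom fragment (the version string is assumed for illustration) might look like:

```xml
<!-- Illustrative dependency declaration after the scala cross-compile change;
     the artifactId now carries the Scala binary-version suffix. -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark-bundle_2.12</artifactId>
  <version>0.5.1-incubating</version>
</dependency>
```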



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-12) Upgrade Hudi to Spark 2.4

2020-02-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-12:
--
Fix Version/s: (was: 0.5.2)
   0.5.1

> Upgrade Hudi to Spark 2.4
> -
>
> Key: HUDI-12
> URL: https://issues.apache.org/jira/browse/HUDI-12
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Usability, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/uber/hudi/issues/549



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-343) Create a DOAP File for Hudi

2020-02-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-343:
---
Fix Version/s: (was: 0.5.2)
   0.5.1

> Create a DOAP File for Hudi
> ---
>
> Key: HUDI-343
> URL: https://issues.apache.org/jira/browse/HUDI-343
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> But please create a DOAP file for Hudi, where you can also list the
> release: https://projects.apache.org/create.html
> <https://projects.apache.org/project.html?incubator-hudi>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-377) Add Delete() support to HoodieDeltaStreamer

2020-02-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-377:
---
Fix Version/s: (was: 0.5.2)
   0.5.1

> Add Delete() support to HoodieDeltaStreamer
> ---
>
> Key: HUDI-377
> URL: https://issues.apache.org/jira/browse/HUDI-377
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>   Original Estimate: 72h
>  Time Spent: 20m
>  Remaining Estimate: 71h 40m
>
> Add Delete() support to HoodieDeltaStreamer



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-389) Updates sent to diff partition for a given key with Global Index

2020-02-02 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028630#comment-17028630
 ] 

leesf commented on HUDI-389:


Fixed via master: 9c4217a3e1b9b728690282c914db2067117f4cfb

> Updates sent to diff partition for a given key with Global Index 
> -
>
> Key: HUDI-389
> URL: https://issues.apache.org/jira/browse/HUDI-389
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Index
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>   Original Estimate: 48h
>  Time Spent: 20m
>  Remaining Estimate: 47h 40m
>
> Updates sent to a diff partition for a given key with Global Index should 
> succeed by updating the record under the original partition. As of now, it 
> throws an exception. 
> [https://github.com/apache/incubator-hudi/issues/1021] 
>  
>  
> error log:
> {code:java}
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.timeline.HoodieActiveTimeline - Loaded instants 
> java.util.stream.ReferencePipeline$Head@d02b1c7
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Building file 
> system view for partition (2016/04/15)
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - #files found 
> in partition (2016/04/15) =0, Time taken =0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - 
> addFilesToView: NumFiles=0, FileGroupsCreationTime=0, StoreTimeTaken=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.HoodieTableFileSystemView - Adding 
> file-groups for partition :2016/04/15, #FileGroups=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Time to load 
> partition (2016/04/15) =0
>  14754 [Executor task launch worker-0] ERROR 
> com.uber.hoodie.table.HoodieCopyOnWriteTable - Error upserting bucketType 
> UPDATE for partition :0
>  java.util.NoSuchElementException: No value present
>  at com.uber.hoodie.common.util.Option.get(Option.java:112)
>  at com.uber.hoodie.io.HoodieMergeHandle.(HoodieMergeHandle.java:71)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:226)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:180)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263)
>  at 
> com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:973)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
>  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
>  at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
>  at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
>  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apac

[jira] [Updated] (HUDI-443) Add slides for Hadoop summit 2019, Bangalore to powered-by page

2020-02-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-443:
---
Fix Version/s: (was: 0.5.2)
   0.5.1

> Add slides for Hadoop summit 2019, Bangalore to powered-by page
> ---
>
> Key: HUDI-443
> URL: https://issues.apache.org/jira/browse/HUDI-443
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs, newbie
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add slides for the talk on Apache Hudi and debezium at Hadoop summit 2019, 
> Bangalore to powered-by page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-389) Updates sent to diff partition for a given key with Global Index

2020-02-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-389:
---
Fix Version/s: (was: 0.5.2)
   0.5.1

> Updates sent to diff partition for a given key with Global Index 
> -
>
> Key: HUDI-389
> URL: https://issues.apache.org/jira/browse/HUDI-389
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Index
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>   Original Estimate: 48h
>  Time Spent: 20m
>  Remaining Estimate: 47h 40m
>
> Updates sent to a diff partition for a given key with Global Index should 
> succeed by updating the record under the original partition. As of now, it 
> throws an exception. 
> [https://github.com/apache/incubator-hudi/issues/1021] 
>  
>  
> error log:
> {code:java}
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.timeline.HoodieActiveTimeline - Loaded instants 
> java.util.stream.ReferencePipeline$Head@d02b1c7
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Building file 
> system view for partition (2016/04/15)
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - #files found 
> in partition (2016/04/15) =0, Time taken =0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - 
> addFilesToView: NumFiles=0, FileGroupsCreationTime=0, StoreTimeTaken=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.HoodieTableFileSystemView - Adding 
> file-groups for partition :2016/04/15, #FileGroups=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Time to load 
> partition (2016/04/15) =0
>  14754 [Executor task launch worker-0] ERROR 
> com.uber.hoodie.table.HoodieCopyOnWriteTable - Error upserting bucketType 
> UPDATE for partition :0
>  java.util.NoSuchElementException: No value present
>  at com.uber.hoodie.common.util.Option.get(Option.java:112)
>  at com.uber.hoodie.io.HoodieMergeHandle.(HoodieMergeHandle.java:71)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:226)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:180)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263)
>  at 
> com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:973)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
>  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
>  at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
>  at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
>  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(

[jira] [Commented] (HUDI-311) Support AWS DMS source on DeltaStreamer

2020-02-02 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028629#comment-17028629
 ] 

leesf commented on HUDI-311:


Fixed via master: 350b0ecb4d137411c6231a1568add585c6d7b7d5

> Support AWS DMS source on DeltaStreamer
> ---
>
> Key: HUDI-311
> URL: https://issues.apache.org/jira/browse/HUDI-311
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://aws.amazon.com/dms/ seems like a one-stop shop for database change 
> logs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-415) HoodieSparkSqlWriter Commit time not representing the Spark job starting time

2020-02-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-415:
---
Fix Version/s: (was: 0.5.2)
   0.5.1

> HoodieSparkSqlWriter Commit time not representing the Spark job starting time
> -
>
> Key: HUDI-415
> URL: https://issues.apache.org/jira/browse/HUDI-415
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hudi records the commit time after the first action completes. If there is a 
> heavy transformation before isEmpty(), then the commit time could be 
> inaccurate.
> {code:java}
> if (hoodieRecords.isEmpty()) { 
> log.info("new batch has no new records, skipping...") 
> return (true, common.util.Option.empty()) 
> } 
> commitTime = client.startCommit() 
> writeStatuses = DataSourceUtils.doWriteOperation(client, hoodieRecords, 
> commitTime, operation)
> {code}
> For example, I start the spark job at 20190101, but *isEmpty()* ran for 2 
> hours, then the commit time in the .hoodie folder will be 201901010*2*00. If 
> I use the commit time to ingest data starting from 201901010200 (from HDFS, 
> not using deltastreamer), then I will miss 2 hours of data.
> Is this setup intended? Can we move the commit time before isEmpty()?
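The reordering the reporter asks for can be sketched as follows; writeBatch, its parameters, and the timestamp format are illustrative stand-ins, not Hudi's actual API:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class CommitTimeOrdering {

    // Assumed commit-time format (yyyyMMddHHmmss), for illustration only.
    static final DateTimeFormatter COMMIT_FMT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Sketch of the proposed fix: fix the commit timestamp *before* any
    // expensive Spark action such as isEmpty(), so it reflects the job start
    // rather than the end of a long transformation.
    static String writeBatch(boolean batchIsEmpty) {
        // Equivalent of client.startCommit() happening first.
        String commitTime = LocalDateTime.now().format(COMMIT_FMT);
        if (batchIsEmpty) {
            // Nothing to write; the pending commit could be rolled back here.
            return null;
        }
        // DataSourceUtils.doWriteOperation(client, records, commitTime, operation)
        // would run at this point in the real writer.
        return commitTime;
    }
}
```

With this ordering, a two-hour isEmpty() no longer shifts the recorded commit time, so incremental consumers keyed off commit times would not miss data.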



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-106) Dynamically tune bloom filter entries

2020-02-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-106:
---
Fix Version/s: (was: 0.5.2)
   0.5.1

> Dynamically tune bloom filter entries
> -
>
> Key: HUDI-106
> URL: https://issues.apache.org/jira/browse/HUDI-106
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Index
>Reporter: Vinoth Chandar
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available, realtime-data-lakes
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Tuning bloom filters is currently based on a configuration that can be 
> cumbersome to tune per dataset to obtain good indexing performance. Let's add 
> support for Dynamic Bloom Filters, which can automatically achieve a 
> configured false positive ratio depending on the number of entries. 
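As a sketch of the sizing math such a dynamic filter relies on (these are the standard bloom-filter formulas, not Hudi's implementation; class and method names are illustrative):

```java
public class BloomFilterSizing {

    // Required bit count m = ceil(-n * ln(p) / (ln 2)^2) for n entries
    // at target false-positive probability p.
    static long numBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Optimal hash-function count k = round(m / n * ln 2), at least 1.
    static int numHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
}
```

A dynamic filter can recompute these from the observed record count instead of a fixed per-dataset config, which is the gist of this improvement.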



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

