[jira] [Created] (HUDI-240) Translate Use Cases page

2019-09-09 Thread leesf (Jira)
leesf created HUDI-240:
--

 Summary: Translate Use Cases page
 Key: HUDI-240
 URL: https://issues.apache.org/jira/browse/HUDI-240
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: docs-chinese
Reporter: leesf
Assignee: leesf


The online HTML web page: [https://hudi.apache.org/use_cases.html]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HUDI-215) Update documentation for joining slack group

2019-09-17 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932079#comment-16932079
 ] 

leesf commented on HUDI-215:


Fixed via asf-site: cb57c912bf3d9ceb8b81a512b17d61cbd7ad1af9

> Update documentation for joining slack group
> 
>
> Key: HUDI-215
> URL: https://issues.apache.org/jira/browse/HUDI-215
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: documentation, newbie, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we have a list of pre-approved mail domains for joining the apache-hudi 
> slack group. Anyone whose mail-id is not present in that list has to go through 
> the github issue 
> [https://github.com/apache/incubator-hudi/issues/143] to join the group. 
> However, there is a documentation gap, as this issue is not mentioned in the 
> documentation. This Jira is about updating the documentation to mention 
> this github issue on the community.html page.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-257) Unit tests intermittently failing

2019-09-18 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932412#comment-16932412
 ] 

leesf commented on HUDI-257:


Fixed via master: 2c6da09d9d17f33ebc025c9ec9fa949605288bb7

> Unit tests intermittently failing 
> --
>
> Key: HUDI-257
> URL: https://issues.apache.org/jira/browse/HUDI-257
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: BALAJI VARADARAJAN
>Assignee: BALAJI VARADARAJAN
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestCopyOnWriteTable.testUpdateRecords:170 » HoodieIO Failed to read footer 
> fo...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-244) Hive Sync Should escape partition column name

2019-09-15 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930220#comment-16930220
 ] 

leesf commented on HUDI-244:


Fixed via master: b1446be2b4d50e60b54146c66a6e6412d41b3a17

> Hive Sync Should escape partition column name
> -
>
> Key: HUDI-244
> URL: https://issues.apache.org/jira/browse/HUDI-244
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Closed] (HUDI-240) Translate Use Cases page

2019-09-15 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-240.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

Fixed via asf-site: 175a9d81fe4ed5447a5779fe5fbab9987bd93b83

> Translate Use Cases page
> 
>
> Key: HUDI-240
> URL: https://issues.apache.org/jira/browse/HUDI-240
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The online HTML web page: [https://hudi.apache.org/use_cases.html]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HUDI-250) Ensure Hudi CLI wrapper works with non snapshot jars too

2019-09-15 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930222#comment-16930222
 ] 

leesf commented on HUDI-250:


Fixed via master: 3ee16b5439837f6bb03052a7d57edd8bc67db7d7

> Ensure Hudi CLI wrapper works with non snapshot jars too
> 
>
> Key: HUDI-250
> URL: https://issues.apache.org/jira/browse/HUDI-250
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: BALAJI VARADARAJAN
>Assignee: BALAJI VARADARAJAN
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> hudi-cli.sh expects jars to have SNAPSHOT suffix in them. It should work for 
> both master and release branches



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HUDI-237) Translate quickstart page

2019-09-05 Thread leesf (Jira)
leesf created HUDI-237:
--

 Summary: Translate quickstart page
 Key: HUDI-237
 URL: https://issues.apache.org/jira/browse/HUDI-237
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: docs-chinese
Reporter: leesf
Assignee: leesf


The online HTML web page: https://hudi.apache.org/quickstart.html



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HUDI-265) Failed to delete tmp dirs created in unit tests

2019-09-19 Thread leesf (Jira)
leesf created HUDI-265:
--

 Summary: Failed to delete tmp dirs created in unit tests
 Key: HUDI-265
 URL: https://issues.apache.org/jira/browse/HUDI-265
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Testing
Reporter: leesf
Assignee: leesf


In some unit tests, such as TestHoodieSnapshotCopier and TestUpdateMapFunction, 
the tmp dir created in _init (with the @Before annotation)_ is not deleted in 
_clean (with the @After annotation)_, which leaves too many folders in /tmp. We 
need to delete these dirs once the unit tests finish.

I will go through all the unit tests that do not properly delete the tmp dir 
and send a patch.

 

cc [~vinoth] [~vbalaji]
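A minimal sketch of the cleanup this issue asks for, in plain Java; the class and method names below are illustrative, not actual Hudi test code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class TempDirCleanup {

    // Delete a directory tree bottom-up: Files.delete refuses to remove a
    // non-empty directory, so children are removed before their parents.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to clean up
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate what a @After/clean() method should do for the tmp dir
        // that the @Before/init() method created.
        Path tmp = Files.createTempDirectory("hudi-ut-");
        Files.createFile(tmp.resolve("part-0000.parquet"));
        deleteRecursively(tmp);
        System.out.println(Files.exists(tmp)); // false
    }
}
```

In a JUnit test this helper would be invoked from the @After method, mirroring the directory created in @Before.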



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-265) Failed to delete tmp dirs created in unit tests

2019-09-19 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-265:
---
Fix Version/s: 0.5.1
   Issue Type: Test  (was: Bug)

> Failed to delete tmp dirs created in unit tests
> ---
>
> Key: HUDI-265
> URL: https://issues.apache.org/jira/browse/HUDI-265
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Testing
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> In some unit tests, such as TestHoodieSnapshotCopier and TestUpdateMapFunction, 
> the tmp dir created in _init (with the @Before annotation)_ is not deleted in 
> _clean (with the @After annotation)_, which leaves too many folders in /tmp. We 
> need to delete these dirs once the unit tests finish.
> I will go through all the unit tests that do not properly delete the tmp dir 
> and send a patch.
>  
> cc [~vinoth] [~vbalaji]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-279) Regression in Schema Evolution due to PR-755

2019-09-26 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-279.
--
Resolution: Fixed

Fixed via master: 2ea8b0c3f1eeb19f4dc1e9946331c8fd93e6daab

> Regression in Schema Evolution due to PR-755
> 
>
> Key: HUDI-279
> URL: https://issues.apache.org/jira/browse/HUDI-279
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Reported by Alex:
> [https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L200]
> this sets an Avro Schema on the config
>  
> but I see that AvroReadSupport.init is getting a different config instance, 
> with avro schema set to null and falls back to what is in parquet. Which 
> breaks during the old/new data merge. I’m pretty sure it worked before as we 
> had successful schema evolutions. Any idea why it might be happening? 
>  
> Caused by changes in :
> [https://github.com/apache/incubator-hudi/pull/755]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-285) Implement HoodieStorageWriter based on the metadata

2019-09-26 Thread leesf (Jira)
leesf created HUDI-285:
--

 Summary: Implement HoodieStorageWriter based on the metadata
 Key: HUDI-285
 URL: https://issues.apache.org/jira/browse/HUDI-285
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Write Client
Reporter: leesf
Assignee: leesf
 Fix For: 0.5.1


Currently the _getStorageWriter_ method in HoodieStorageWriterFactory is 
hard-coded to return HoodieParquetWriter, since parquet is the only format 
currently supported for HoodieStorageWriter. It would be better to choose the 
HoodieStorageWriter based on the metadata, for extensibility. If 
_StorageWriterType_ is empty in the metadata, the default HoodieParquetWriter is 
returned so the current logic is unaffected.

cc [~vinoth] [~vbalaji]
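As a rough sketch of the proposed dispatch (the enum and helper below are hypothetical, not Hudi's actual HoodieFileFormat or factory API), selection with a Parquet fallback could look like:

```java
public class StorageWriterSelector {

    enum FileFormat { PARQUET, HFILE }

    // Illustrative dispatch on the file extension instead of a hard-coded
    // Parquet writer; unknown or missing extensions fall back to Parquet,
    // preserving the existing behaviour.
    static FileFormat formatFor(String path) {
        int dot = path.lastIndexOf('.');
        String ext = (dot < 0) ? "" : path.substring(dot + 1).toLowerCase();
        if (ext.equals("hfile")) {
            return FileFormat.HFILE;
        }
        return FileFormat.PARQUET; // default: current behaviour
    }

    public static void main(String[] args) {
        System.out.println(formatFor("/base/2019/09/26/abc123.parquet")); // PARQUET
        System.out.println(formatFor("/base/2019/09/26/abc123.hfile"));   // HFILE
        System.out.println(formatFor("/base/2019/09/26/noext"));          // PARQUET
    }
}
```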



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-232) Implement sealing/unsealing for HoodieRecord class

2019-09-26 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-232:
--

Assignee: leesf

> Implement sealing/unsealing for HoodieRecord class
> --
>
> Key: HUDI-232
> URL: https://issues.apache.org/jira/browse/HUDI-232
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Write Client
>Affects Versions: 0.5.0
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> The HoodieRecord class is sometimes modified to set the record location. We can 
> run into issues like HUDI-170 if the modification is misplaced. We need a 
> mechanism to seal the class and explicitly unseal it for modification. 
> Attempting to modify it in the sealed state should throw an error.
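The seal/unseal mechanism described above can be sketched as follows; the class and method names are hypothetical, not the actual HoodieRecord API:

```java
public class SealableRecord {

    private boolean sealed = false;
    private String location; // stands in for the record location being set

    public void seal()   { this.sealed = true; }
    public void unseal() { this.sealed = false; }

    // Every mutation goes through a guard that throws while sealed, so a
    // misplaced modification fails loudly instead of corrupting state.
    public void setLocation(String location) {
        if (sealed) {
            throw new UnsupportedOperationException(
                "record is sealed; call unseal() before modifying");
        }
        this.location = location;
    }

    public String getLocation() { return location; }
}
```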



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-278) Translate Administering page

2019-09-27 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-278.
--
Resolution: Fixed

Fixed via asf-site: 7ff0fe2a0754771fdf27d8280212a452f4e9a269

> Translate Administering page
> 
>
> Key: HUDI-278
> URL: https://issues.apache.org/jira/browse/HUDI-278
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The online HTML web page: [http://hudi.apache.org/admin_guide.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-233) Redo log statements using SLF4J

2019-09-24 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936670#comment-16936670
 ] 

leesf commented on HUDI-233:


Could I start on this? For simplicity, I would like to create an umbrella 
ticket and split the redo work into sub-tasks per module, such as:

Redo hudi-cli log statements using SLF4J.
Redo hudi-client log statements using SLF4J.
Redo hudi-common log statements using SLF4J.
...

Or should the whole redo be completed in this single ticket?

What do you think? cc [~vinoth]
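For context, the saving from parameterized logging comes from deferring message construction until the log level is known to be enabled. The idea can be demonstrated in plain Java (SLF4J itself is a third-party dependency, so this sketch uses a Supplier to stand in for its "{}" placeholders):

```java
import java.util.function.Supplier;

public class DeferredLogging {

    static boolean debugEnabled = false; // pretend DEBUG is turned off
    static int messagesBuilt = 0;        // counts how many messages were constructed

    static String buildMessage(String detail) {
        messagesBuilt++;
        return "Detail: " + detail;
    }

    // Eager style: the argument is evaluated (string built) before the
    // call, even when the level is disabled and the text is discarded.
    static void logEager(String message) {
        if (debugEnabled) {
            System.out.println(message);
        }
    }

    // Deferred style: the Supplier only runs if the level is enabled,
    // which is the saving SLF4J's parameterized messages provide.
    static void logDeferred(Supplier<String> message) {
        if (debugEnabled) {
            System.out.println(message.get());
        }
    }

    public static void main(String[] args) {
        logEager(buildMessage("x"));          // message built, then thrown away
        logDeferred(() -> buildMessage("y")); // message never built
        System.out.println(messagesBuilt);    // 1
    }
}
```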

> Redo log statements using SLF4J 
> 
>
> Key: HUDI-233
> URL: https://issues.apache.org/jira/browse/HUDI-233
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: newbie, Performance
>Affects Versions: 0.5.0
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> Currently we are not employing variable substitution aggressively in the 
> project, à la
> {code:java}
> LogManager.getLogger(SomeName.class.getName()).info("Message: {}, Detail: {}",
>     message, detail);
> {code}
> This can improve performance, since the string concatenation is deferred until 
> the logging is actually in effect.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-220) Translate root index page

2019-09-24 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-220.
--
Resolution: Fixed

Fixed via asf-site: f92566d2bf75077b4b1a7d9004ddb025122c2141

> Translate root index page
> -
>
> Key: HUDI-220
> URL: https://issues.apache.org/jira/browse/HUDI-220
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The online HTML page is : [https://hudi.apache.org/index.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-285) Implement HoodieStorageWriter based on actual file type

2019-09-29 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-285:
---
Description: 
Currently the _getStorageWriter_ method in HoodieStorageWriterFactory is 
hard-coded to return HoodieParquetWriter, since parquet is the only format 
currently supported for HoodieStorageWriter. It would be better to choose the 
HoodieStorageWriter based on the actual file type, for extensibility.

cc [~vinoth] [~vbalaji]

  was:
Currently the _getStorageWriter_ method in HoodieStorageWriterFactory is 
hard-coded to return HoodieParquetWriter, since parquet is the only format 
currently supported for HoodieStorageWriter. It would be better to choose the 
HoodieStorageWriter based on the metadata, for extensibility. If 
_StorageWriterType_ is empty in the metadata, the default HoodieParquetWriter is 
returned so the current logic is unaffected.

cc [~vinoth] [~vbalaji]


> Implement HoodieStorageWriter based on actual file type
> ---
>
> Key: HUDI-285
> URL: https://issues.apache.org/jira/browse/HUDI-285
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Write Client
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> Currently the _getStorageWriter_ method in HoodieStorageWriterFactory is 
> hard-coded to return HoodieParquetWriter, since parquet is the only format 
> currently supported for HoodieStorageWriter. It would be better to choose the 
> HoodieStorageWriter based on the actual file type, for extensibility.
> cc [~vinoth] [~vbalaji]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-285) Implement HoodieStorageWriter based on actual file type

2019-09-29 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-285:
---
Summary: Implement HoodieStorageWriter based on actual file type  (was: 
Implement HoodieStorageWriter based on the metadata)

> Implement HoodieStorageWriter based on actual file type
> ---
>
> Key: HUDI-285
> URL: https://issues.apache.org/jira/browse/HUDI-285
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Write Client
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> Currently the _getStorageWriter_ method in HoodieStorageWriterFactory is 
> hard-coded to return HoodieParquetWriter, since parquet is the only format 
> currently supported for HoodieStorageWriter. It would be better to choose the 
> HoodieStorageWriter based on the metadata, for extensibility. If 
> _StorageWriterType_ is empty in the metadata, the default HoodieParquetWriter 
> is returned so the current logic is unaffected.
> cc [~vinoth] [~vbalaji]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-256) Translate Comparison page

2019-09-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-256.
--
Resolution: Fixed

Fixed via asf-site: cef57691228de09429cd8794117dee6fc8f729d2

> Translate Comparison page
> -
>
> Key: HUDI-256
> URL: https://issues.apache.org/jira/browse/HUDI-256
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The online HTML web page: [https://hudi.apache.org/comparison.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-285) Implement HoodieStorageWriter based on the metadata

2019-09-27 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939810#comment-16939810
 ] 

leesf commented on HUDI-285:


Yes, we can get the actual type of the file from the path parameter, and then 
reuse the HoodieFileFormat.

> Implement HoodieStorageWriter based on the metadata
> ---
>
> Key: HUDI-285
> URL: https://issues.apache.org/jira/browse/HUDI-285
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Write Client
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> Currently the _getStorageWriter_ method in HoodieStorageWriterFactory is 
> hard-coded to return HoodieParquetWriter, since parquet is the only format 
> currently supported for HoodieStorageWriter. It would be better to choose the 
> HoodieStorageWriter based on the metadata, for extensibility. If 
> _StorageWriterType_ is empty in the metadata, the default HoodieParquetWriter 
> is returned so the current logic is unaffected.
> cc [~vinoth] [~vbalaji]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-324) TimestampKeyGenerator should support milliseconds

2019-11-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-324.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 71ac2c0d5e8bffdbb11f7789a7805575736049c1

> TimestampKeyGenerator should support milliseconds
> -
>
> Key: HUDI-324
> URL: https://issues.apache.org/jira/browse/HUDI-324
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: Gurudatt Kulkarni
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-221) Translate concept page

2019-10-31 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-221.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via asf-site: 4fd3c7f737a2cf2a5d506896ea641e4d62d103ce

> Translate concept page
> --
>
> Key: HUDI-221
> URL: https://issues.apache.org/jira/browse/HUDI-221
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: vinoyang
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The online HTML web page: [https://hudi.apache.org/concepts.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-275) Translate Documentation -> Querying Data page

2019-10-31 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-275.
--
Resolution: Fixed

Fixed via asf-site: 4b3b197b8a6e983f20067ed3ef00694e19edf9f9

> Translate Documentation -> Querying Data page
> -
>
> Key: HUDI-275
> URL: https://issues.apache.org/jira/browse/HUDI-275
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Translate this page into Chinese:
>  
> [http://hudi.apache.org/querying_data.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-299) Refactoring Hoodie#getFileName

2019-10-30 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-299.
--
Resolution: Not A Problem

> Refactoring Hoodie#getFileName
> --
>
> Key: HUDI-299
> URL: https://issues.apache.org/jira/browse/HUDI-299
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: leesf
>Assignee: leesf
>Priority: Minor
> Fix For: 0.5.1
>
>
> Currently, the code style used in HoodieInstance#getFileName is below.
> {code:java}
> if (xxx) {
> return;
> } else if (xxx) {
> return;
> } else if (xxx) {
> return;
> }
> throw new IllegalArgumentException("xxx");
> {code}
> However, it could be refactored into a simpler and more readable code style.
> {code:java}
> if (xxx) {
> return;
> } 
> if (xxx) {
> return;
> } 
> if (xxx) {
> return;
> }
> throw new IllegalArgumentException("xxx");
> {code}
> CC [~vbalaji] [~vinoth]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-299) Refactoring Hoodie#getFileName

2019-10-30 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963558#comment-16963558
 ] 

leesf commented on HUDI-299:


Closing the issue since there were no other views.

> Refactoring Hoodie#getFileName
> --
>
> Key: HUDI-299
> URL: https://issues.apache.org/jira/browse/HUDI-299
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: leesf
>Assignee: leesf
>Priority: Minor
> Fix For: 0.5.1
>
>
> Currently, the code style used in HoodieInstance#getFileName is below.
> {code:java}
> if (xxx) {
> return;
> } else if (xxx) {
> return;
> } else if (xxx) {
> return;
> }
> throw new IllegalArgumentException("xxx");
> {code}
> However, it could be refactored into a simpler and more readable code style.
> {code:java}
> if (xxx) {
> return;
> } 
> if (xxx) {
> return;
> } 
> if (xxx) {
> return;
> }
> throw new IllegalArgumentException("xxx");
> {code}
> CC [~vbalaji] [~vinoth]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-319) Create online javadocs based on the jar

2019-11-01 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964777#comment-16964777
 ] 

leesf edited comment on HUDI-319 at 11/1/19 11:21 AM:
--

+1. This will be very practical and convenient.


was (Author: xleesf):
+1.

> Create online javadocs based on the jar
> ---
>
> Key: HUDI-319
> URL: https://issues.apache.org/jira/browse/HUDI-319
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Docs
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Minor
>  Labels: Documentation
>
> It makes the development easier to have the online javadocs on the side and 
> understand the public APIs provided by Hudi when necessary, instead of always 
> going into the source code.
>  
> Example of Spark online javadocs: 
> [https://spark.apache.org/docs/latest/api/java/index.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-319) Create online javadocs based on the jar

2019-11-01 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964777#comment-16964777
 ] 

leesf commented on HUDI-319:


+1.

> Create online javadocs based on the jar
> ---
>
> Key: HUDI-319
> URL: https://issues.apache.org/jira/browse/HUDI-319
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Docs
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Minor
>  Labels: Documentation
>
> It makes the development easier to have the online javadocs on the side and 
> understand the public APIs provided by Hudi when necessary, instead of always 
> going into the source code.
>  
> Example of Spark online javadocs: 
> [https://spark.apache.org/docs/latest/api/java/index.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-330) add EmptyStatement java checkstyle rule

2019-11-13 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-330.
--
Resolution: Fixed

Fixed via master: 045fa87a3db5d9954d1ca88c8a2ef28c11214330

> add EmptyStatement java checkstyle rule
> ---
>
> Key: HUDI-330
> URL: https://issues.apache.org/jira/browse/HUDI-330
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Write Client
>Reporter: lamber-ken
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Detects empty statements (standalone ';').



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-335) Improvements to DiskBasedMap

2019-11-17 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975969#comment-16975969
 ] 

leesf commented on HUDI-335:


Looks promising. Would you please send a PR? [~balajeeUber]

> Improvements to DiskBasedMap
> 
>
> Key: HUDI-335
> URL: https://issues.apache.org/jira/browse/HUDI-335
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Balajee Nagasubramaniam
>Priority: Major
>  Labels: Hoodie
> Fix For: 0.5.1
>
> Attachments: Screen Shot 2019-11-11 at 1.22.44 PM.png, Screen Shot 
> 2019-11-13 at 2.56.53 PM.png
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> DiskBasedMap is used by ExternalSpillableMap for writing (K,V) pairs to a file,
> keeping (K, fileMetadata) in memory, to reduce the footprint of the record on 
> disk.
> This change improves the performance of record get/read operations against the 
> disk by using a BufferedInputStream to cache the data.
> Results from the POC are promising. Before the write performance improvement, 
> spilling/writing 1 million records (record size ~350 bytes) to the file took 
> about 104 seconds. 
> After the improvement, the same operation completes in under 5 seconds.
> Similarly, before the read performance improvement, reading 1 million records 
> (size ~350 bytes) from the spill file took about 23 seconds. After the 
> improvement, the same operation completes in under 4 seconds.
> {{without read/write performance improvements:
> RecordsHandled: 1    totalTestTime: 3145    writeTime: 1176    readTime: 255
> RecordsHandled: 5    totalTestTime: 5775    writeTime: 4187    readTime: 1175
> RecordsHandled: 10   totalTestTime: 10570   writeTime: 7718    readTime: 2203
> RecordsHandled: 50   totalTestTime: 59723   writeTime: 45618   readTime: 11093
> RecordsHandled: 100  totalTestTime: 120022  writeTime: 87918   readTime: 22355
> RecordsHandled: 200  totalTestTime: 258627  writeTime: 187185  readTime: 56431}}
> {{With write improvement:
> RecordsHandled: 1    totalTestTime: 2013    writeTime: 700     readTime: 503
> RecordsHandled: 5    totalTestTime: 2525    writeTime: 390     readTime: 1247
> RecordsHandled: 10   totalTestTime: 3583    writeTime: 464     readTime: 2352
> RecordsHandled: 50   totalTestTime: 22934   writeTime: 3731    readTime: 15778
> RecordsHandled: 100  totalTestTime: 42415   writeTime: 4816    readTime: 30332
> RecordsHandled: 200  totalTestTime: 74158   writeTime: 10192   readTime: 53195}}
> {{With read improvements:
> RecordsHandled: 1    totalTestTime: 2473    writeTime: 1562    readTime: 87
> RecordsHandled: 5    totalTestTime: 6169    writeTime: 5151    readTime: 438
> RecordsHandled: 10   totalTestTime: 9967    writeTime: 8636    readTime: 252
> RecordsHandled: 50   totalTestTime: 50889   writeTime: 46766   readTime: 1014
> RecordsHandled: 100  totalTestTime: 114482  writeTime: 104353  readTime: 3776
> RecordsHandled: 200  totalTestTime: 239251  writeTime: 219041  readTime: 8127}}
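The buffering change described above amounts to wrapping the raw file stream so that many small record reads are served from an in-memory buffer instead of one file-system call each. A minimal sketch (illustrative names, not the actual DiskBasedMap code):

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferedSpillFileRead {

    // Read exactly `size` bytes for one record. Behind a BufferedInputStream,
    // most of these small reads hit the in-memory buffer rather than the disk.
    static byte[] readRecord(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        int off = 0;
        while (off < size) {
            int n = in.read(buf, off, size - off);
            if (n < 0) {
                throw new EOFException("unexpected end of spill file");
            }
            off += n;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        Path spill = Files.createTempFile("spill", ".data");
        Files.write(spill, new byte[3500]); // room for ten ~350-byte records
        try (InputStream in =
                 new BufferedInputStream(Files.newInputStream(spill), 64 * 1024)) {
            for (int i = 0; i < 10; i++) {
                byte[] record = readRecord(in, 350);
                System.out.println(record.length); // each record would be deserialized here
            }
        } finally {
            Files.delete(spill);
        }
    }
}
```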



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-337) Merge on Read incremental pull has in-consistent results

2019-11-17 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-337.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: f82e58994e046d92916d45f7ec921e2fb6ba26ef

> Merge on Read incremental pull has in-consistent results
> 
>
> Key: HUDI-337
> URL: https://issues.apache.org/jira/browse/HUDI-337
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Incremental Pull
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Consider the following example:
> 1. deltacommit
> 2. inflight
> 3. rollback <- because the rollback was initiated after startCommit
> For the logReader, we pass the latest allowed instant as 
> getCompletedInstants(COMMIT, DELTACOMMIT, ROLLBACK).
> Since the latest instant from the call above is "3", the inflight 
> log blocks will be returned, leading to invalid values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-11-18 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976500#comment-16976500
 ] 

leesf edited comment on HUDI-288 at 11/18/19 12:37 PM:
---

[~vinoth] Sorry for the late feedback. After a closer look at the code paths, I 
prefer the second solution: write a new tool that wraps the current 
DeltaStreamer, uses the kafka topic regex to identify all topics that need 
to be ingested, and creates one delta streamer per topic within a SINGLE 
spark application. This solution is easier than the first one.

A few questions. 
First, if the topics to be ingested do not fit a regex pattern, should we also 
allow users to list all topics explicitly? Second, in the current data flow the 
relationship of kafka topic to _targetBasePath_ is one-to-one; should we allow 
users to specify multiple targetBasePaths while consuming many topics? I think 
a single targetBasePath is simpler, but does it make sense? The same question 
applies to the _targetTableName_ config for hive.


was (Author: xleesf):
[~vinoth] Sorry for the late feedback. After a closer look at the code paths, I 
prefer the second solution: write a new tool that wraps the current 
DeltaStreamer, uses a Kafka topic regex to identify all the topics that need to 
be ingested, and creates one delta streamer per topic within a SINGLE Spark 
application. This solution is easier than the first one.

A few questions. 
First, if the topics to be ingested do not match the regex pattern, should we 
also allow users to list all topics explicitly? 
Second, in the current data flow the relationship of a Kafka topic to 
_targetBasePath_ is one-to-one; should we allow users to specify multiple 
targetBasePath values while consuming many topics? I think a single 
targetBasePath is simpler, but does it make sense? The same question applies to 
the _targetTableName_ config in Hive.

> Add support for ingesting multiple kafka streams in a single DeltaStreamer 
> deployment
> -
>
> Key: HUDI-288
> URL: https://issues.apache.org/jira/browse/HUDI-288
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@
>  has all the context



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-11-18 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976500#comment-16976500
 ] 

leesf commented on HUDI-288:


[~vinoth] Sorry for the late feedback. After a closer look at the code paths, I 
prefer the second solution: write a new tool that wraps the current 
DeltaStreamer, uses a Kafka topic regex to identify all the topics that need to 
be ingested, and creates one delta streamer per topic within a SINGLE Spark 
application. This solution is easier than the first one.

Two questions. First, if the topics to be ingested do not match the regex 
pattern, should we also allow users to list all topics explicitly? 
Second, in the current data flow the relationship of a Kafka topic to 
_targetBasePath_ is one-to-one; should we allow users to specify multiple 
targetBasePath values while consuming many topics? The same question applies 
to the _targetTableName_ config in Hive.
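The wrapper idea can be sketched roughly as follows. `MultiTopicPlanner` and its names are hypothetical, not the DeltaStreamer API, and deriving one target path per topic under a single base path is just one possible layout:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MultiTopicPlanner {
    // Match the known topics against the user-supplied regex and derive a
    // per-topic target base path under one shared root.
    static List<String> plan(List<String> allTopics, String topicRegex, String basePath) {
        Pattern p = Pattern.compile(topicRegex);
        return allTopics.stream()
                .filter(t -> p.matcher(t).matches())
                .map(t -> basePath + "/" + t) // one target path per topic
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> topics = List.of("orders.v1", "users.v1", "audit");
        // Only the topics matching the regex are planned for ingestion.
        System.out.println(plan(topics, ".*\\.v1", "/data/hudi"));
    }
}
```

A real wrapper would then start one delta streamer per planned (topic, path) pair inside the same Spark application; an explicit topic list could simply bypass the regex step.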

> Add support for ingesting multiple kafka streams in a single DeltaStreamer 
> deployment
> -
>
> Key: HUDI-288
> URL: https://issues.apache.org/jira/browse/HUDI-288
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@
>  has all the context



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-11-18 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976500#comment-16976500
 ] 

leesf edited comment on HUDI-288 at 11/18/19 12:36 PM:
---

[~vinoth] Sorry for the late feedback. After a closer look at the code paths, I 
prefer the second solution: write a new tool that wraps the current 
DeltaStreamer, uses a Kafka topic regex to identify all the topics that need to 
be ingested, and creates one delta streamer per topic within a SINGLE Spark 
application. This solution is easier than the first one.

A few questions. 
First, if the topics to be ingested do not match the regex pattern, should we 
also allow users to list all topics explicitly? 
Second, in the current data flow the relationship of a Kafka topic to 
_targetBasePath_ is one-to-one; should we allow users to specify multiple 
targetBasePath values while consuming many topics? I think a single 
targetBasePath is simpler, but does it make sense? The same question applies to 
the _targetTableName_ config in Hive.


was (Author: xleesf):
[~vinoth] Sorry for the late feedback. After a closer look at the code paths, I 
prefer the second solution: write a new tool that wraps the current 
DeltaStreamer, uses a Kafka topic regex to identify all the topics that need to 
be ingested, and creates one delta streamer per topic within a SINGLE Spark 
application. This solution is easier than the first one.

Two questions. First, if the topics to be ingested do not match the regex 
pattern, should we also allow users to list all topics explicitly? 
Second, in the current data flow the relationship of a Kafka topic to 
_targetBasePath_ is one-to-one; should we allow users to specify multiple 
targetBasePath values while consuming many topics? The same question applies 
to the _targetTableName_ config in Hive.

> Add support for ingesting multiple kafka streams in a single DeltaStreamer 
> deployment
> -
>
> Key: HUDI-288
> URL: https://issues.apache.org/jira/browse/HUDI-288
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@
>  has all the context



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-245) Refactor code references that call HoodieTimeline.getInstants() and reverse to directly use method HoodieTimeline.getReverseOrderedInstants

2019-11-07 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-245.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 0863b1cfd947402c66221afa8d1f18fd2bd8273b

> Refactor code references that call HoodieTimeline.getInstants() and reverse 
> to directly use method HoodieTimeline.getReverseOrderedInstants 
> 
>
> Key: HUDI-245
> URL: https://issues.apache.org/jira/browse/HUDI-245
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: newbie
>Reporter: Bhavani Sudha Saktheeswaran
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-354) Introduce stricter comment and code style validation rules

2019-11-21 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979225#comment-16979225
 ] 

leesf commented on HUDI-354:


I verified this in my local dev environment. If the _severity_ property is set 
to _info_, checkstyle has no effect; it works once I change the severity to 
_error_. Any other concerns about setting it to _info_, [~lamber-ken]? Also, if 
we do not add the severity property, it defaults to error. CC [~yanghua]

> Introduce stricter comment and code style validation rules
> --
>
> Key: HUDI-354
> URL: https://issues.apache.org/jira/browse/HUDI-354
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: vinoyang
>Priority: Major
>
> This is an umbrella issue to track applying stricter comment and code style 
> validation rules across the whole project. The rules are listed below:
>  # All public classes must have class-level comments;
>  # All comments must end with a ".";
>  # In a class's import statements, clearly separate (by blank lines) Java SE 
> imports from non-Java SE imports. At least two projects (Spark and Flink) 
> implement this rule, and Flink's rules are stricter than Spark's: imports are 
> divided into several blocks from top to bottom (owner imports -> non-owner, 
> non-Java SE imports -> Java SE imports -> static imports), and each block is 
> sorted alphabetically;
>  # Reconfirm that each method and its comment are consistent.
> Each project sub-module maps to one subtask.
> How to find all the invalidated points?
>  * Add the XML code snippet into {{PROJECT_ROOT/style/checkstyle.xml}} : 
> {code:java}
> [XML snippet mangled by the mailing-list archive. It defined Checkstyle
> import checks whose messages included "Import {0} appears after other
> imports that it should precede" and "Redundant import {0}."]
> {code}
>  * Make sure you have installed the CheckStyle-IDEA plugin and activated it 
> for the project.
>  * Scan the project module you want to refactor and fix all the issues one by 
> one.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-350) Update javadocs in HoodieCleanHelper to reflect correct defaults for retained commits

2019-11-21 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reopened HUDI-350:


> Update javadocs in HoodieCleanHelper to reflect correct defaults for retained 
> commits
> -
>
> Key: HUDI-350
> URL: https://issues.apache.org/jira/browse/HUDI-350
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Cleaner, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Context in the thread : 
> [https://lists.apache.org/thread.html/e834d1f5df4341596884b476b6433bf609fe70aca19dcd3ac2242845@%3Cdev.hudi.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-350) Update javadocs in HoodieCleanHelper to reflect correct defaults for retained commits

2019-11-21 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-350.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

> Update javadocs in HoodieCleanHelper to reflect correct defaults for retained 
> commits
> -
>
> Key: HUDI-350
> URL: https://issues.apache.org/jira/browse/HUDI-350
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Cleaner, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Context in the thread : 
> [https://lists.apache.org/thread.html/e834d1f5df4341596884b476b6433bf609fe70aca19dcd3ac2242845@%3Cdev.hudi.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-350) Update javadocs in HoodieCleanHelper to reflect correct defaults for retained commits

2019-11-21 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979258#comment-16979258
 ] 

leesf edited comment on HUDI-350 at 11/21/19 1:05 PM:
--

Fixed (reopened and closed). [~Pratyaksh], thanks for your contribution.


was (Author: xleesf):
Fixed. [~Pratyaksh], thanks for your contribution.

> Update javadocs in HoodieCleanHelper to reflect correct defaults for retained 
> commits
> -
>
> Key: HUDI-350
> URL: https://issues.apache.org/jira/browse/HUDI-350
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Cleaner, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Context in the thread : 
> [https://lists.apache.org/thread.html/e834d1f5df4341596884b476b6433bf609fe70aca19dcd3ac2242845@%3Cdev.hudi.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-350) Update javadocs in HoodieCleanHelper to reflect correct defaults for retained commits

2019-11-21 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979258#comment-16979258
 ] 

leesf commented on HUDI-350:


Fixed. [~Pratyaksh], thanks for your contribution.

> Update javadocs in HoodieCleanHelper to reflect correct defaults for retained 
> commits
> -
>
> Key: HUDI-350
> URL: https://issues.apache.org/jira/browse/HUDI-350
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Cleaner, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Context in the thread : 
> [https://lists.apache.org/thread.html/e834d1f5df4341596884b476b6433bf609fe70aca19dcd3ac2242845@%3Cdev.hudi.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-354) Introduce stricter comment and code style validation rules

2019-11-21 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979227#comment-16979227
 ] 

leesf commented on HUDI-354:


Also, while fixing the comment style errors, should we fix semantic errors in 
the comments as well?

> Introduce stricter comment and code style validation rules
> --
>
> Key: HUDI-354
> URL: https://issues.apache.org/jira/browse/HUDI-354
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: vinoyang
>Priority: Major
>
> This is an umbrella issue to track applying stricter comment and code style 
> validation rules across the whole project. The rules are listed below:
>  # All public classes must have class-level comments;
>  # All comments must end with a ".";
>  # In a class's import statements, clearly separate (by blank lines) Java SE 
> imports from non-Java SE imports. At least two projects (Spark and Flink) 
> implement this rule, and Flink's rules are stricter than Spark's: imports are 
> divided into several blocks from top to bottom (owner imports -> non-owner, 
> non-Java SE imports -> Java SE imports -> static imports), and each block is 
> sorted alphabetically;
>  # Reconfirm that each method and its comment are consistent.
> Each project sub-module maps to one subtask.
> How to find all the invalidated points?
>  * Add the XML code snippet into {{PROJECT_ROOT/style/checkstyle.xml}} : 
> {code:java}
> [XML snippet mangled by the mailing-list archive. It defined Checkstyle
> import checks whose messages included "Import {0} appears after other
> imports that it should precede" and "Redundant import {0}."]
> {code}
>  * Make sure you have installed the CheckStyle-IDEA plugin and activated it 
> for the project.
>  * Scan the project module you want to refactor and fix all the issues one by 
> one.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-350) Update javadocs in HoodieCleanHelper to reflect correct defaults for retained commits

2019-11-21 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979243#comment-16979243
 ] 

leesf commented on HUDI-350:


Hi [~Pratyaksh] [~vbalaji], I think we had better mark the Jira as Resolved 
instead of Unresolved.

> Update javadocs in HoodieCleanHelper to reflect correct defaults for retained 
> commits
> -
>
> Key: HUDI-350
> URL: https://issues.apache.org/jira/browse/HUDI-350
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Cleaner, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Context in the thread : 
> [https://lists.apache.org/thread.html/e834d1f5df4341596884b476b6433bf609fe70aca19dcd3ac2242845@%3Cdev.hudi.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-12-02 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986680#comment-16986680
 ] 

leesf commented on HUDI-288:


> Thank you for letting me drive this work. I was thinking if we should add 
> documentation for this tool as well, that will help a lot of users quickly 
> adopt Hudi for data ingestion. 

It would be nice to have some docs. I've assigned it to you.

> Add support for ingesting multiple kafka streams in a single DeltaStreamer 
> deployment
> -
>
> Key: HUDI-288
> URL: https://issues.apache.org/jira/browse/HUDI-288
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@
>  has all the context



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-233) Redo log statements using SLF4J

2019-12-03 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986863#comment-16986863
 ] 

leesf commented on HUDI-233:


Is it time to start the work, [~vinoth]?

> Redo log statements using SLF4J 
> 
>
> Key: HUDI-233
> URL: https://issues.apache.org/jira/browse/HUDI-233
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: newbie, Performance
>Affects Versions: 0.5.0
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> Currently we are not employing variable substitution aggressively in the 
> project, à la 
> {code:java}
> LogManager.getLogger(SomeName.class.getName()).info("Message: {}, Detail: 
> {}", message, detail);
> {code}
> This can improve performance, since the string concatenation can be deferred 
> until the logging is actually in effect.
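A minimal stand-in for the deferred-formatting idea (not the actual SLF4J API; SLF4J uses "{}" placeholders, while this sketch reuses java.text.MessageFormat's "{0}" style):

```java
import java.text.MessageFormat;

public class LazyFormatDemo {
    // Formats the message only when the level is enabled, so a disabled log
    // statement pays no string-concatenation cost; returns null when disabled.
    static String logIfEnabled(boolean enabled, String pattern, Object... args) {
        return enabled ? MessageFormat.format(pattern, args) : null;
    }

    public static void main(String[] args) {
        // Enabled level: the message is formatted.
        System.out.println(logIfEnabled(true, "Message: {0}, Detail: {1}", "m", "d"));
        // Disabled level: the pattern is never formatted at all.
        System.out.println(logIfEnabled(false, "never formatted {0}", "x"));
    }
}
```

With plain string concatenation ("Message: " + message), the cost is paid even when the log level is off; the parameterized form defers it, which is exactly the benefit described above.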



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-355) Refactor hudi-common based on new comment and code style rules

2019-12-08 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-355.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 84602c888298e70ea5e64029c14e862172d32f99

> Refactor hudi-common based on new comment and code style rules
> --
>
> Key: HUDI-355
> URL: https://issues.apache.org/jira/browse/HUDI-355
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This issue is used to track refactoring the hudi-common module based on the 
> new comment and code style rules.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-395) hudi does not support scheme s3n when writing to S3

2019-12-09 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992040#comment-16992040
 ] 

leesf commented on HUDI-395:


Hi, thanks for reporting this. Right now s3n is not supported; s3 and s3a are. 
You can check the supported schemes here: 
https://github.com/apache/incubator-hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/storage/StorageSchemes.java
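The kind of check StorageSchemes performs can be sketched as a simple scheme whitelist; the set below is illustrative, not the project's actual list:

```java
import java.net.URI;
import java.util.Set;

public class SchemeCheck {
    // Illustrative whitelist of supported filesystem schemes (an assumption,
    // not the real StorageSchemes contents).
    private static final Set<String> SUPPORTED = Set.of("file", "hdfs", "s3", "s3a");

    // Extracts the scheme from a path URI and tests it against the whitelist.
    static boolean isSupported(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme != null && SUPPORTED.contains(scheme);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("s3a://bucket/table"));  // supported
        System.out.println(isSupported("s3n://bucket/table"));  // rejected
    }
}
```

Under this model, switching the write path from s3n:// to s3a:// (and configuring fs.s3a.impl accordingly) avoids the "does not support scheme s3n" error reported below.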

> hudi does not support scheme s3n when writing to S3
> ---
>
> Key: HUDI-395
> URL: https://issues.apache.org/jira/browse/HUDI-395
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Spark datasource
> Environment: spark-2.4.4-bin-hadoop2.7
>Reporter: rui feng
>Priority: Major
>
> When I use Hudi to create a Hudi table and write it to S3, I used the Maven 
> snippet below, which is recommended by [https://hudi.apache.org/s3_hoodie.html]
> <dependency>
>   <groupId>org.apache.hudi</groupId>
>   <artifactId>hudi-spark-bundle</artifactId>
>   <version>0.5.0-incubating</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-aws</artifactId>
>   <version>2.7.3</version>
> </dependency>
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.10.34</version>
> </dependency>
> and add the below configuration:
> sc.hadoopConfiguration.set("fs.defaultFS", "s3://niketest1")
>  sc.hadoopConfiguration.set("fs.s3.impl", 
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>  sc.hadoopConfiguration.set("fs.s3n.impl", 
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>  sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "xx")
>  sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "x")
>  sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xx")
>  sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "x")
>  
> My Spark version is spark-2.4.4-bin-hadoop2.7, and I run the following:
> df.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath)
> val hudiOptions = Map[String,String](
>  HoodieWriteConfig.TABLE_NAME -> "hudi12",
>  DataSourceWriteOptions.OPERATION_OPT_KEY -> 
> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
>  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "rider",
>  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> 
> DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
> val hudiTablePath = "s3://niketest1/hudi_test/hudi12"
> the exception occurs:
> java.lang.IllegalArgumentException: 
> BlockAlignedAvroParquetWriter does not support scheme s3n
>  at 
> org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.getHoodieScheme(HoodieWrapperFileSystem.java:109)
>  at 
> org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.convertToHoodiePath(HoodieWrapperFileSystem.java:85)
>  at 
> org.apache.hudi.io.storage.HoodieParquetWriter.(HoodieParquetWriter.java:57)
>  at 
> org.apache.hudi.io.storage.HoodieStorageWriterFactory.newParquetStorageWriter(HoodieStorageWriterFactory.java:60)
>  at 
> org.apache.hudi.io.storage.HoodieStorageWriterFactory.getStorageWriter(HoodieStorageWriterFactory.java:44)
>  at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:70)
>  at 
> org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:137)
>  at 
> org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:125)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
>  
> Can anyone tell me what causes this exception? I tried to use 
> org.apache.hadoop.fs.s3.S3FileSystem in place of 
> org.apache.hadoop.fs.s3native.NativeS3FileSystem for the conf "fs.s3.impl", 
> but another exception occurred; it seems org.apache.hadoop.fs.s3.S3FileSystem 
> fits Hadoop 2.6.
>  
> Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-368) Code clean up in TestAsyncCompaction class

2019-12-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-368.
--
Resolution: Fixed

Fixed via master: 3790b75e059a06e6f5467c8b8d549ef38cd6b98a

> Code clean up in TestAsyncCompaction class
> --
>
> Key: HUDI-368
> URL: https://issues.apache.org/jira/browse/HUDI-368
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Compaction, Testing
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The TestAsyncCompaction class has many redundant method calls and lambda 
> functions that can be simplified further. There are also a few unused 
> variables being defined that can be removed. 
>  
> For example, 
> assertFalse("Verify all file-slices have no log-files",
>  fileSliceList.stream().filter(fs -> fs.getLogFiles().count() > 
> 0).findAny().isPresent());
> can be simplified as - 
> assertFalse("Verify all file-slices have no log-files",
>  fileSliceList.stream().anyMatch(fs -> fs.getLogFiles().count() > 0));
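The suggested simplification can be checked in isolation; hypothetical log-file counts stand in for FileSlice objects here:

```java
import java.util.List;

public class AnyMatchDemo {
    // Original form: filter, then test whether any element survived.
    static boolean hasLogFilesOld(List<Integer> logFileCounts) {
        return logFileCounts.stream().filter(c -> c > 0).findAny().isPresent();
    }

    // Simplified form: anyMatch expresses the same predicate directly.
    static boolean hasLogFilesNew(List<Integer> logFileCounts) {
        return logFileCounts.stream().anyMatch(c -> c > 0);
    }

    public static void main(String[] args) {
        List<Integer> noLogs = List.of(0, 0, 0);
        List<Integer> someLogs = List.of(0, 2, 0);
        // Both forms agree on every input; anyMatch is shorter and can
        // short-circuit as soon as a match is found.
        System.out.println(hasLogFilesOld(noLogs) == hasLogFilesNew(noLogs));
        System.out.println(hasLogFilesOld(someLogs) == hasLogFilesNew(someLogs));
    }
}
```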



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-390) Hive Sync should support keywords as table names

2019-12-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-390.
--
Resolution: Fixed

Fixed via master: 8df4b83017f74173b8289ab50b0f723a38e8eebe

> Hive Sync should support keywords as table names
> -
>
> Key: HUDI-390
> URL: https://issues.apache.org/jira/browse/HUDI-390
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/issues/1084]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-410) HFile inlining into the log file

2019-12-13 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996218#comment-16996218
 ] 

leesf commented on HUDI-410:


Hi [~nagarwal], any more context here?

> HFile inlining into the log file
> 
>
> Key: HUDI-410
> URL: https://issues.apache.org/jira/browse/HUDI-410
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Index
>Reporter: Nishith Agarwal
>Priority: Major
> Fix For: 0.5.2
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-409) Replace Log Magic header with a secure hash to avoid clashes with data

2019-12-13 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996221#comment-16996221
 ] 

leesf commented on HUDI-409:


A hash code for all files? How about generating the hash with "Apache HUDI"?

> Replace Log Magic header with a secure hash to avoid clashes with data
> --
>
> Key: HUDI-409
> URL: https://issues.apache.org/jira/browse/HUDI-409
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Nishith Agarwal
>Priority: Major
> Fix For: 0.5.2
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-398) Add set env for spark launcher

2019-12-13 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-398:
---
Status: Closed  (was: Patch Available)

> Add set env for spark launcher
> --
>
> Key: HUDI-398
> URL: https://issues.apache.org/jira/browse/HUDI-398
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: CLI
>Reporter: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
> Attachments: image-2019-12-11-14-44-55-064.png, 
> image-2019-12-11-14-45-27-764.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It always throws the exception 'SPARK_HOME not found' when SPARK_HOME is not 
> set, so we have to quit and set it.
> !image-2019-12-11-14-45-27-764.png!
> After adding this function to the CLI, we can set SPARK_HOME and other conf in 
> hudi-CLI.
> !image-2019-12-11-14-44-55-064.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-398) Add set env for spark launcher

2019-12-13 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-398.
--
Resolution: Fixed

Fixed via master: 8963a68e6a3f4875de6787dfc206543e9ca824d9

> Add set env for spark launcher
> --
>
> Key: HUDI-398
> URL: https://issues.apache.org/jira/browse/HUDI-398
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: CLI
>Reporter: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
> Attachments: image-2019-12-11-14-44-55-064.png, 
> image-2019-12-11-14-45-27-764.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It always throws the exception 'SPARK_HOME not found' when SPARK_HOME is not 
> set, so we have to quit and set it.
> !image-2019-12-11-14-45-27-764.png!
> After adding this function to the CLI, we can set SPARK_HOME and other conf in 
> hudi-CLI.
> !image-2019-12-11-14-44-55-064.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-398) Add set env for spark launcher

2019-12-13 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reopened HUDI-398:


> Add set env for spark launcher
> --
>
> Key: HUDI-398
> URL: https://issues.apache.org/jira/browse/HUDI-398
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: CLI
>Reporter: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
> Attachments: image-2019-12-11-14-44-55-064.png, 
> image-2019-12-11-14-45-27-764.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It always throws the exception 'SPARK_HOME not found' when SPARK_HOME is not 
> set, so we have to quit and set it.
> !image-2019-12-11-14-45-27-764.png!
> After adding this function to the CLI, we can set SPARK_HOME and other conf in 
> hudi-CLI.
> !image-2019-12-11-14-44-55-064.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-331) Fix java docs for all public apis (HoodieWriteClient)

2019-12-16 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997800#comment-16997800
 ] 

leesf commented on HUDI-331:


[~hongdongdong] Thanks.

> Fix java docs for all public apis (HoodieWriteClient)
> -
>
> Key: HUDI-331
> URL: https://issues.apache.org/jira/browse/HUDI-331
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: newbie
> Fix For: 0.5.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Some public apis in HoodieWriteClient need to be fixed with sufficient info. 
> Creating this ticket to get it fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-400) Add more checks to TestCompactionUtils#testUpgradeDowngrade

2019-12-11 Thread leesf (Jira)
leesf created HUDI-400:
--

 Summary: Add more checks to 
TestCompactionUtils#testUpgradeDowngrade
 Key: HUDI-400
 URL: https://issues.apache.org/jira/browse/HUDI-400
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: newbie, Testing
Reporter: leesf


Currently, TestCompactionUtils#testUpgradeDowngrade does not check the upgrade 
from the old plan to the new plan; it would be good to add some checks.
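The missing check could take the shape of a round-trip assertion; `OldPlan` and `NewPlan` below are hypothetical stand-ins for the real compaction plan versions, not Hudi classes:

```java
import java.util.Objects;

public class UpgradeDowngradeCheck {
    // Hypothetical plan versions: only the fields needed for the sketch.
    record OldPlan(String fileId) {}
    record NewPlan(String fileId, int version) {}

    static NewPlan upgrade(OldPlan p)   { return new NewPlan(p.fileId(), 2); }
    static OldPlan downgrade(NewPlan p) { return new OldPlan(p.fileId()); }

    public static void main(String[] args) {
        OldPlan original = new OldPlan("file-001");
        // Upgrading and then downgrading should reproduce the original plan;
        // this is the direction the test currently does not exercise.
        OldPlan roundTrip = downgrade(upgrade(original));
        System.out.println(Objects.equals(original, roundTrip));
    }
}
```

A test along these lines would assert both directions: old -> new preserves every field the new format carries over, and new -> old -> new is lossless.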



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-322) DeltaSteamer should pick checkpoints off only deltacommits for MOR tables

2019-12-11 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993922#comment-16993922
 ] 

leesf commented on HUDI-322:


You are welcome. :) Just go for it.

> DeltaSteamer should pick checkpoints off only deltacommits for MOR tables
> -
>
> Key: HUDI-322
> URL: https://issues.apache.org/jira/browse/HUDI-322
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: Shahida Khan
>Priority: Major
> Fix For: 0.5.1
>
>
> When using DeltaStreamer with MOR, the checkpoints would be written out to 
> .deltacommit files (and not .commit files). We need to confirm the behavior 
> and change the code such that it reads from the correct metadata file.





[jira] [Assigned] (HUDI-383) Introduce TransactionHandle abstraction to manage state transitions in hudi clients

2019-12-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-383:
--

Assignee: leesf

> Introduce TransactionHandle abstraction to manage state transitions in hudi 
> clients
> ---
>
> Key: HUDI-383
> URL: https://issues.apache.org/jira/browse/HUDI-383
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Cleaner, Compaction, Write Client
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Minor
>
> Came up in review comment. 
> https://github.com/apache/incubator-hudi/pull/1009/files#r347705820





[jira] [Closed] (HUDI-381) Refactor hudi-client based on new comment and code style rules

2019-12-06 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-381.
--
Resolution: Duplicate

> Refactor hudi-client based on new comment and code style rules
> --
>
> Key: HUDI-381
> URL: https://issues.apache.org/jira/browse/HUDI-381
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This issue used to refactor hudi-client module based on new comment and code 
> style rules.





[jira] [Closed] (HUDI-294) Delete Paths written in Cleaner plan needs to be relative to partition-path

2019-12-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-294.
--
Resolution: Fixed

Fixed via master: 98ab33bb6e1637d18c8fab7a9ddd50daeaf56962

> Delete Paths written in Cleaner plan needs to be relative to partition-path
> ---
>
> Key: HUDI-294
> URL: https://issues.apache.org/jira/browse/HUDI-294
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Cleaner
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The deleted file paths stored in the Clean metadata are all absolute. They need 
> to be changed to relative paths.
> The challenge would be to handle cases when both versions of the cleaner metadata 
> are present and need to be processed (backwards compatibility).
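A minimal sketch of the path conversion described above, using only java.nio (names are illustrative, not Hudi's actual cleaner API); leaving already-relative paths untouched is one way to handle the backwards-compatibility case:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RelativeCleanPath {
    // Convert an absolute delete path to one relative to its partition path.
    // If the stored path is already relative (newer metadata version), keep it
    // as-is so both versions of cleaner metadata can be processed.
    static String toRelative(String partitionPath, String deletePath) {
        Path p = Paths.get(deletePath);
        if (!p.isAbsolute()) {
            return deletePath; // already relative, nothing to do
        }
        return Paths.get(partitionPath).relativize(p).toString();
    }

    public static void main(String[] args) {
        System.out.println(toRelative("/data/tbl/2019/12/03", "/data/tbl/2019/12/03/f1.parquet"));
        System.out.println(toRelative("/data/tbl/2019/12/03", "f1.parquet"));
    }
}
```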





[jira] [Assigned] (HUDI-414) Refactor handling of layout version filters in active timeline

2019-12-15 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-414:
--

Assignee: leesf

> Refactor handling of layout version filters in active timeline
> --
>
> Key: HUDI-414
> URL: https://issues.apache.org/jira/browse/HUDI-414
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Minor
>
> Per code-review comment : 
> [https://github.com/apache/incubator-hudi/pull/1009#discussion_r357181383]
> One idea is to introduce factory methods with name explicitly suggesting if 
> instants are filtered or not ? 





[jira] [Comment Edited] (HUDI-395) hudi does not support scheme s3n when writing to S3

2019-12-09 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992040#comment-16992040
 ] 

leesf edited comment on HUDI-395 at 12/10/19 1:39 AM:
--

Hi, thanks for reporting this. Right now s3n is not supported yet; s3 and s3a 
are supported. You can check it here: 
https://github.com/apache/incubator-hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/storage/StorageSchemes.java
Maybe you could send a PR to support it.


was (Author: xleesf):
Hi, thanks for reporting this, right now, s3n is not supported yet, s3 and s3a 
is supported. and you would check it here 
https://github.com/apache/incubator-hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/storage/StorageSchemes.java
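A hedged sketch of the kind of scheme registry the linked StorageSchemes class implements; the list below is a small illustrative subset (not Hudi's actual class or full list), and supporting s3n would amount to adding one entry:

```java
import java.util.Arrays;
import java.util.List;

public class SchemeCheck {
    // Illustrative subset of supported filesystem schemes; the real
    // StorageSchemes class enumerates many more. s3n is absent, which is
    // why writes to s3n:// paths fail the scheme check.
    private static final List<String> SUPPORTED = Arrays.asList("file", "hdfs", "s3", "s3a");

    static boolean isSchemeSupported(String scheme) {
        return SUPPORTED.contains(scheme);
    }

    public static void main(String[] args) {
        System.out.println("s3a supported: " + isSchemeSupported("s3a"));
        System.out.println("s3n supported: " + isSchemeSupported("s3n"));
    }
}
```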

> hudi does not support scheme s3n when writing to S3
> ---
>
> Key: HUDI-395
> URL: https://issues.apache.org/jira/browse/HUDI-395
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Spark datasource
> Environment: spark-2.4.4-bin-hadoop2.7
>Reporter: rui feng
>Priority: Major
>
> When I use Hudi to create a hudi table and then write to S3, I used the below 
> Maven snippet, which is recommended by [https://hudi.apache.org/s3_hoodie.html]:
> <dependency>
>   <groupId>org.apache.hudi</groupId>
>   <artifactId>hudi-spark-bundle</artifactId>
>   <version>0.5.0-incubating</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-aws</artifactId>
>   <version>2.7.3</version>
> </dependency>
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.10.34</version>
> </dependency>
> and add the below configuration:
> sc.hadoopConfiguration.set("fs.defaultFS", "s3://niketest1")
>  sc.hadoopConfiguration.set("fs.s3.impl", 
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>  sc.hadoopConfiguration.set("fs.s3n.impl", 
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>  sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "xx")
>  sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "x")
>  sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xx")
>  sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "x")
>  
> my spark version is spark-2.4.4-bin-hadoop2.7 and when I run below
> df.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath).
> val hudiOptions = Map[String,String](
>  HoodieWriteConfig.TABLE_NAME -> "hudi12",
>  DataSourceWriteOptions.OPERATION_OPT_KEY -> 
> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
>  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "rider",
>  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> 
> DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
> val hudiTablePath = "s3://niketest1/hudi_test/hudi12"
> the exception occur:
> java.lang.IllegalArgumentException: 
> BlockAlignedAvroParquetWriter does not support scheme s3n
>  at 
> org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.getHoodieScheme(HoodieWrapperFileSystem.java:109)
>  at 
> org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.convertToHoodiePath(HoodieWrapperFileSystem.java:85)
>  at 
> org.apache.hudi.io.storage.HoodieParquetWriter.(HoodieParquetWriter.java:57)
>  at 
> org.apache.hudi.io.storage.HoodieStorageWriterFactory.newParquetStorageWriter(HoodieStorageWriterFactory.java:60)
>  at 
> org.apache.hudi.io.storage.HoodieStorageWriterFactory.getStorageWriter(HoodieStorageWriterFactory.java:44)
>  at org.apache.hudi.io.HoodieCreateHandle.(HoodieCreateHandle.java:70)
>  at 
> org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:137)
>  at 
> org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:125)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
>  
> Can anyone tell me what causes this exception? I tried to use 
> org.apache.hadoop.fs.s3.S3FileSystem to replace 
> org.apache.hadoop.fs.s3native.NativeS3FileSystem for the conf "fs.s3.impl", 
> but another exception occurred, and it seems org.apache.hadoop.fs.s3.S3FileSystem 
> fits hadoop 2.6.
>  
> Thanks in advance.





[jira] [Assigned] (HUDI-343) Create a DOAP File for Hudi

2019-12-11 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-343:
--

Assignee: leesf

> Create a DOAP File for Hudi
> ---
>
> Key: HUDI-343
> URL: https://issues.apache.org/jira/browse/HUDI-343
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: release
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> But please create a DOAP file for Hudi, where you can also list the
> release: https://projects.apache.org/create.html
> <https://projects.apache.org/project.html?incubator-hudi>





[jira] [Commented] (HUDI-248) CLI doesn't allow rolling back a Delta commit

2019-12-11 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994055#comment-16994055
 ] 

leesf commented on HUDI-248:


Hi [~rbhartia], are you interested in working on it?

> CLI doesn't allow rolling back a Delta commit
> -
>
> Key: HUDI-248
> URL: https://issues.apache.org/jira/browse/HUDI-248
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: Rahul Bhartia
>Priority: Minor
>  Labels: aws-emr
> Fix For: 0.5.1
>
>
> [https://github.com/apache/incubator-hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L128]
>  
> When trying to find a match for the passed-in commit value, the "commit rollback" 
> command always defaults to using HoodieTimeline.COMMIT_ACTION, and hence 
> doesn't allow rolling back delta commits.
> Note: Delta commits can be rolled back using a HoodieWriteClient, so it seems 
> like it's just a matter of matching against both COMMIT_ACTION and 
> DELTA_COMMIT_ACTION in the CLI.
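The fix suggested in the note above can be sketched as follows; Instant and findRollbackTarget are simplified stand-ins for Hudi's HoodieInstant and the CLI's lookup, not the real API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class RollbackMatch {
    static final String COMMIT_ACTION = "commit";
    static final String DELTA_COMMIT_ACTION = "deltacommit";

    // A (timestamp, action) pair standing in for Hudi's HoodieInstant.
    static class Instant {
        final String timestamp;
        final String action;
        Instant(String timestamp, String action) {
            this.timestamp = timestamp;
            this.action = action;
        }
    }

    // Resolve the instant to roll back: match the timestamp against BOTH
    // commit and deltacommit actions, instead of only COMMIT_ACTION.
    static Optional<Instant> findRollbackTarget(List<Instant> timeline, String commitTime) {
        return timeline.stream()
                .filter(i -> i.timestamp.equals(commitTime))
                .filter(i -> COMMIT_ACTION.equals(i.action) || DELTA_COMMIT_ACTION.equals(i.action))
                .findFirst();
    }

    public static void main(String[] args) {
        List<Instant> timeline = Arrays.asList(
                new Instant("001", COMMIT_ACTION),
                new Instant("002", DELTA_COMMIT_ACTION));
        // With the combined match, a delta commit on an MOR table can be found too.
        System.out.println(findRollbackTarget(timeline, "002").isPresent());
    }
}
```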





[jira] [Commented] (HUDI-343) Create a DOAP File for Hudi

2019-12-11 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994056#comment-16994056
 ] 

leesf commented on HUDI-343:


[~vinoth] I can help work on it together with you.

> Create a DOAP File for Hudi
> ---
>
> Key: HUDI-343
> URL: https://issues.apache.org/jira/browse/HUDI-343
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: release
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
>
> But please create a DOAP file for Hudi, where you can also list the
> release: https://projects.apache.org/create.html
> <https://projects.apache.org/project.html?incubator-hudi>





[jira] [Commented] (HUDI-331) Fix java docs for all public apis (HoodieWriteClient)

2019-12-11 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994059#comment-16994059
 ] 

leesf commented on HUDI-331:


Hi [~shivnarayan], are you interested in working on it?

> Fix java docs for all public apis (HoodieWriteClient)
> -
>
> Key: HUDI-331
> URL: https://issues.apache.org/jira/browse/HUDI-331
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: newbie
> Fix For: 0.5.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Some public apis in HoodieWriteClient need to be fixed with sufficient info. 
> Creating this ticket to get it fixed.





[jira] [Commented] (HUDI-376) AWS Glue dependency issue for EMR 5.28.0

2019-12-11 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994048#comment-16994048
 ] 

leesf commented on HUDI-376:


[~XingXPan] Would you please send a PR to fix it?

> AWS Glue dependency issue for EMR 5.28.0
> 
>
> Key: HUDI-376
> URL: https://issues.apache.org/jira/browse/HUDI-376
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Xing Pan
>Priority: Minor
> Fix For: 0.5.1
>
>
> Hi hudi team, it's really encouraging that Hudi is finally an officially 
> supported application on AWS EMR. Great job!
> I found a *ClassNotFound* exception when using:
> {code:java}
> /usr/lib/hudi/bin/run_sync_tool.sh
> {code}
> on the EMR master.
> I think it is due to the AWS Glue Data Catalog SDK dependency (I use AWS 
> Glue as the Hive metastore).
> So I added a line to run_sync_tool.sh as a quick fix:
> {code:java}
> HIVE_JARS=$HIVE_JARS:/usr/lib/hive/auxlib/aws-glue-datacatalog-hive2-client.jar:/usr/share/aws/emr/emr-metrics-collector/lib/aws-java-sdk-glue-1.11.475.jar{code}
> Not sure if any more jars are needed, but these two jars fixed my problem.
>  
> I think it would be great to take Glue into consideration in the EMR scripts.





[jira] [Commented] (HUDI-304) Bring back spotless plugin

2019-12-11 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994051#comment-16994051
 ] 

leesf commented on HUDI-304:


[~vbalaji] Sorry for the late response; I may have missed the notification. 
Will do it.

> Bring back spotless plugin 
> ---
>
> Key: HUDI-304
> URL: https://issues.apache.org/jira/browse/HUDI-304
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Developer Productivity
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> spotless plugin has been turned off as the eclipse style format it was 
> referencing was removed due to compliance reasons. 
> We use google style eclipse format with some changes
> 90c90
> < 
> ---
> > 
> 242c242
> <  value="100"/>
> ---
> >  > value="120"/>
>  
> The eclipse style sheet was originally obtained from 
> [https://github.com/google/styleguide] which CC -By 3.0 license which is not 
> compatible for source distribution (See 
> [https://www.apache.org/legal/resolved.html#cc-by]) 
>  
> We need to figure out a way to bring this back
>  
>  





[jira] [Assigned] (HUDI-304) Bring back spotless plugin

2019-12-11 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-304:
--

Assignee: leesf

> Bring back spotless plugin 
> ---
>
> Key: HUDI-304
> URL: https://issues.apache.org/jira/browse/HUDI-304
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Developer Productivity
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> spotless plugin has been turned off as the eclipse style format it was 
> referencing was removed due to compliance reasons. 
> We use google style eclipse format with some changes
> 90c90
> < 
> ---
> > 
> 242c242
> <  value="100"/>
> ---
> >  > value="120"/>
>  
> The eclipse style sheet was originally obtained from 
> [https://github.com/google/styleguide] which CC -By 3.0 license which is not 
> compatible for source distribution (See 
> [https://www.apache.org/legal/resolved.html#cc-by]) 
>  
> We need to figure out a way to bring this back
>  
>  





[jira] [Commented] (HUDI-312) Investigate recent flaky CI runs

2019-10-28 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961014#comment-16961014
 ] 

leesf commented on HUDI-312:


For VM crashes, the following information is useful:
https://stackoverflow.com/questions/23260057/the-forked-vm-terminated-without-saying-properly-goodbye-vm-crash-or-system-exi.
I added -Xmx1024m -XX:MaxPermSize=256m to the surefire plugin 
configuration (https://github.com/leesf/incubator-hudi/commit/d2b26650fc921666761d816b721cea22307bf884)
 and built it many times in Travis 
(https://travis-ci.org/leesf/incubator-hudi/builds/603810854); the VM did not 
crash again. [~vinoth]
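The surefire tweak described in the comment would look roughly like this in the parent pom.xml (a sketch: plugin placement and any existing surefire configuration in Hudi's build are assumptions; the JVM flags are the ones quoted above):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Extra heap / PermGen for forked test JVMs, to avoid the VM crash -->
    <argLine>-Xmx1024m -XX:MaxPermSize=256m</argLine>
  </configuration>
</plugin>
```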

> Investigate recent flaky CI runs
> 
>
> Key: HUDI-312
> URL: https://issues.apache.org/jira/browse/HUDI-312
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Builds - apache_incubator-hudi - Travis CI.pdf
>
>
> master used to be solid green. noticing that nowadays PRs and even some 
> master merges fail with 
> - No output received for 10m
> - Exceeded runtime of 50m 
> - VM exit crash 
> We saw this earlier in the year as well. It was due to the apache org queue 
> in travis being busy/stressed. I think we should shadow azure CI or circle CI 
> parallely and weed out code vs environment issue.





[jira] [Commented] (HUDI-312) Investigate recent flaky CI runs

2019-10-28 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961205#comment-16961205
 ] 

leesf commented on HUDI-312:


Will send a PR later today.

> Investigate recent flaky CI runs
> 
>
> Key: HUDI-312
> URL: https://issues.apache.org/jira/browse/HUDI-312
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Builds - apache_incubator-hudi - Travis CI.pdf
>
>
> master used to be solid green. noticing that nowadays PRs and even some 
> master merges fail with 
> - No output received for 10m
> - Exceeded runtime of 50m 
> - VM exit crash 
> We saw this earlier in the year as well. It was due to the apache org queue 
> in travis being busy/stressed. I think we should shadow azure CI or circle CI 
> parallely and weed out code vs environment issue.





[jira] [Closed] (HUDI-283) Look at spark-shell and ensure that auto-tune for memory for spillable map has sane defaults

2019-10-21 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-283.
--
Resolution: Fixed

Fixed via master: dfdc0e40e1f85c49e580b31204621758e3d76ac5

> Look at spark-shell and ensure that auto-tune for memory for spillable map 
> has sane defaults
> 
>
> Key: HUDI-283
> URL: https://issues.apache.org/jira/browse/HUDI-283
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Nishith Agarwal
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (HUDI-304) Bring back spotless plugin

2019-10-16 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952639#comment-16952639
 ] 

leesf edited comment on HUDI-304 at 10/16/19 9:19 AM:
--

[~vbalaji] got it. 
PS: I notice that 
avro (https://github.com/apache/avro/blob/master/lang/java/eclipse-java-formatter.xml)
 uses the Apache license. CC [~vinoth]


was (Author: xleesf):
[~vbalaji] got it.

> Bring back spotless plugin 
> ---
>
> Key: HUDI-304
> URL: https://issues.apache.org/jira/browse/HUDI-304
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Developer Productivity
>Reporter: Balaji Varadarajan
>Priority: Major
>
> spotless plugin has been turned off as the eclipse style format it was 
> referencing was removed due to compliance reasons. 
> We use google style eclipse format with some changes
> 90c90
> < 
> ---
> > 
> 242c242
> <  value="100"/>
> ---
> >  > value="120"/>
>  
> The eclipse style sheet was originally obtained from 
> [https://github.com/google/styleguide] which CC -By 3.0 license which is not 
> compatible for source distribution (See 
> [https://www.apache.org/legal/resolved.html#cc-by]) 
>  
> We need to figure out a way to bring this back
>  
>  





[jira] [Commented] (HUDI-304) Bring back spotless plugin

2019-10-16 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952639#comment-16952639
 ] 

leesf commented on HUDI-304:


[~vbalaji] got it.

> Bring back spotless plugin 
> ---
>
> Key: HUDI-304
> URL: https://issues.apache.org/jira/browse/HUDI-304
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Developer Productivity
>Reporter: Balaji Varadarajan
>Priority: Major
>
> spotless plugin has been turned off as the eclipse style format it was 
> referencing was removed due to compliance reasons. 
> We use google style eclipse format with some changes
> 90c90
> < 
> ---
> > 
> 242c242
> <  value="100"/>
> ---
> >  > value="120"/>
>  
> The eclipse style sheet was originally obtained from 
> [https://github.com/google/styleguide] which CC -By 3.0 license which is not 
> compatible for source distribution (See 
> [https://www.apache.org/legal/resolved.html#cc-by]) 
>  
> We need to figure out a way to bring this back
>  
>  





[jira] [Commented] (HUDI-221) Translate concept page

2019-10-14 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951479#comment-16951479
 ] 

leesf commented on HUDI-221:


OK, please assign the ticket to me.

> Translate concept page
> --
>
> Key: HUDI-221
> URL: https://issues.apache.org/jira/browse/HUDI-221
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> The online HTML web page: [https://hudi.apache.org/concepts.html]





[jira] [Reopened] (HUDI-232) Implement sealing/unsealing for HoodieRecord class

2019-10-16 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reopened HUDI-232:


> Implement sealing/unsealing for HoodieRecord class
> --
>
> Key: HUDI-232
> URL: https://issues.apache.org/jira/browse/HUDI-232
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Write Client
>Affects Versions: 0.5.0
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The HoodieRecord class is sometimes modified to set the record location. We can 
> get into issues like HUDI-170 if the modification is misplaced. We need a 
> mechanism to seal the class and unseal it for modification explicitly. Trying to 
> modify a record in the sealed state should throw an error.





[jira] [Closed] (HUDI-232) Implement sealing/unsealing for HoodieRecord class

2019-10-16 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-232.
--
Resolution: Fixed

> Implement sealing/unsealing for HoodieRecord class
> --
>
> Key: HUDI-232
> URL: https://issues.apache.org/jira/browse/HUDI-232
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Write Client
>Affects Versions: 0.5.0
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The HoodieRecord class is sometimes modified to set the record location. We can 
> get into issues like HUDI-170 if the modification is misplaced. We need a 
> mechanism to seal the class and unseal it for modification explicitly. Trying to 
> modify a record in the sealed state should throw an error.
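A minimal sketch of the seal/unseal mechanism the ticket asks for, with illustrative names (not HoodieRecord's actual fields): mutation is only allowed after unseal(), and mutating while sealed throws:

```java
public class SealableRecord {
    // Records start out sealed; callers must explicitly unseal before mutating.
    private boolean sealed = true;
    private String currentLocation;

    public void unseal() { sealed = false; }
    public void seal()   { sealed = true; }

    public void setCurrentLocation(String location) {
        if (sealed) {
            // A misplaced modification fails loudly instead of silently corrupting state.
            throw new UnsupportedOperationException("Record is sealed; call unseal() before modifying");
        }
        this.currentLocation = location;
    }

    public String getCurrentLocation() { return currentLocation; }
}
```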





[jira] [Commented] (HUDI-312) Investigate recent flaky CI runs

2019-10-24 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959125#comment-16959125
 ] 

leesf commented on HUDI-312:


I will investigate the frequent VM exit crash error when I get some cycles.

> Investigate recent flaky CI runs
> 
>
> Key: HUDI-312
> URL: https://issues.apache.org/jira/browse/HUDI-312
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Builds - apache_incubator-hudi - Travis CI.pdf
>
>
> master used to be solid green. noticing that nowadays PRs and even some 
> master merges fail with 
> - No output received for 10m
> - Exceeded runtime of 50m 
> - VM exit crash 
> We saw this earlier in the year as well. It was due to the apache org queue 
> in travis being busy/stressed. I think we should shadow azure CI or circle CI 
> parallely and weed out code vs environment issue.





[jira] [Assigned] (HUDI-312) Investigate recent flaky CI runs

2019-10-29 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-312:
--

Assignee: leesf

> Investigate recent flaky CI runs
> 
>
> Key: HUDI-312
> URL: https://issues.apache.org/jira/browse/HUDI-312
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Builds - apache_incubator-hudi - Travis CI.pdf
>
>
> master used to be solid green. noticing that nowadays PRs and even some 
> master merges fail with 
> - No output received for 10m
> - Exceeded runtime of 50m 
> - VM exit crash 
> We saw this earlier in the year as well. It was due to the apache org queue 
> in travis being busy/stressed. I think we should shadow azure CI or circle CI 
> parallely and weed out code vs environment issue.





[jira] [Commented] (HUDI-312) Investigate recent flaky CI runs

2019-10-29 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961772#comment-16961772
 ] 

leesf commented on HUDI-312:


Regarding the no-output failure, I changed the 
[travis.yaml|https://github.com/leesf/incubator-hudi/commit/dcc7e6a13b41dcf8e8df30eb8d5b64367b8feb06],
 borrowing from 
[here|https://github.com/cyberFund/cybernode-archive/commit/0dbb14c5169144b7535cbf3dc474916b93a64a5e],
 and ran it more than 5 times in 
[travis|https://travis-ci.org/leesf/incubator-hudi/jobs/604252022]; the no-output 
failure did not occur again. I will keep restarting the Travis build and observe 
the result before sending a PR.

> Investigate recent flaky CI runs
> 
>
> Key: HUDI-312
> URL: https://issues.apache.org/jira/browse/HUDI-312
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Builds - apache_incubator-hudi - Travis CI.pdf
>
>
> master used to be solid green. noticing that nowadays PRs and even some 
> master merges fail with 
> - No output received for 10m
> - Exceeded runtime of 50m 
> - VM exit crash 
> We saw this earlier in the year as well. It was due to the apache org queue 
> in travis being busy/stressed. I think we should shadow azure CI or circle CI 
> parallely and weed out code vs environment issue.





[jira] [Comment Edited] (HUDI-312) Investigate recent flaky CI runs

2019-10-29 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961772#comment-16961772
 ] 

leesf edited comment on HUDI-312 at 10/29/19 7:50 AM:
--

Regarding the no-output failure, I changed the 
[travis.yaml|https://github.com/leesf/incubator-hudi/commit/dcc7e6a13b41dcf8e8df30eb8d5b64367b8feb06],
 borrowing from 
[here|https://github.com/cyberFund/cybernode-archive/commit/0dbb14c5169144b7535cbf3dc474916b93a64a5e],
 and ran it more than 5 times in 
[travis|https://travis-ci.org/leesf/incubator-hudi/jobs/604252022]; the no-output 
failure did not occur again. I will keep restarting the Travis build and observe 
the result before sending a PR.


was (Author: xleesf):
On no output, I changed the 
[travis.yaml|https://github.com/leesf/incubator-hudi/commit/dcc7e6a13b41dcf8e8df30eb8d5b64367b8feb06],
 borrow from 
[here|https://github.com/cyberFund/cybernode-archive/commit/0dbb14c5169144b7535cbf3dc474916b93a64a5e]
 and run more than 5 times in 
[travis|https://travis-ci.org/leesf/incubator-hudi/jobs/604252022], no output 
occurs again. I will continue restart the travis and observe the result before 
send a PR.

> Investigate recent flaky CI runs
> 
>
> Key: HUDI-312
> URL: https://issues.apache.org/jira/browse/HUDI-312
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Builds - apache_incubator-hudi - Travis CI.pdf
>
>
> master used to be solid green. noticing that nowadays PRs and even some 
> master merges fail with 
> - No output received for 10m
> - Exceeded runtime of 50m 
> - VM exit crash 
> We saw this earlier in the year as well. It was due to the apache org queue 
> in travis being busy/stressed. I think we should shadow azure CI or circle CI 
> parallely and weed out code vs environment issue.





[jira] [Commented] (HUDI-312) Investigate recent flaky CI runs

2019-10-28 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961627#comment-16961627
 ] 

leesf commented on HUDI-312:


another build error (No output has been received in the last 10m0s, this 
potentially indicates a stalled build or something wrong with the build 
itself.) https://travis-ci.org/leesf/incubator-hudi/jobs/604224429

> Investigate recent flaky CI runs
> 
>
> Key: HUDI-312
> URL: https://issues.apache.org/jira/browse/HUDI-312
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Builds - apache_incubator-hudi - Travis CI.pdf
>
>
> master used to be solid green. noticing that nowadays PRs and even some 
> master merges fail with 
> - No output received for 10m
> - Exceeded runtime of 50m 
> - VM exit crash 
> We saw this earlier in the year as well. It was due to the apache org queue 
> in travis being busy/stressed. I think we should shadow azure CI or circle CI 
> parallely and weed out code vs environment issue.





[jira] [Commented] (HUDI-312) Investigate recent flaky CI runs

2019-10-29 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962467#comment-16962467
 ] 

leesf commented on HUDI-312:


Will provide a PR soon.
Of course it is suitable to open a new issue to track the integ test hanging. And I 
could not reproduce the integ test hanging locally yet. :(

> Investigate recent flaky CI runs
> 
>
> Key: HUDI-312
> URL: https://issues.apache.org/jira/browse/HUDI-312
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Builds - apache_incubator-hudi - Travis CI.pdf
>
>
> Master used to be solid green. Noticing that nowadays PRs and even some 
> master merges fail with:
> - No output received for 10m
> - Exceeded runtime of 50m
> - VM exit crash
> We saw this earlier in the year as well. It was due to the apache org queue 
> in Travis being busy/stressed. I think we should shadow Azure CI or Circle CI 
> in parallel and weed out code vs environment issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-290) Normalize test class name of all test classes

2019-10-23 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-290.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: e4c91ed13f16e1d0d55d9d690395bb62ab2b4fa0

> Normalize test class name of all test classes
> -
>
> Key: HUDI-290
> URL: https://issues.apache.org/jira/browse/HUDI-290
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In general, a test case name starts with {{Test}}. For example, it would be 
> better to rename {{HoodieWriteConfigTest}} to {{TestHoodieWriteConfig}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-233) Redo log statements using SLF4J

2019-10-23 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958408#comment-16958408
 ] 

leesf edited comment on HUDI-233 at 10/23/19 11:51 PM:
---

[~shivnarayan] Thanks for your interest. You could take some tickets once we 
start the work. 
Also https://jira.apache.org/jira/projects/HUDI/issues/HUDI-302 is a starter if 
you are interested.


was (Author: xleesf):
[~shivnarayan] Thanks for your interest. You could pick some tickets once we 
start the work. 
Also https://jira.apache.org/jira/projects/HUDI/issues/HUDI-302 is a starter if 
you are interested.

> Redo log statements using SLF4J 
> 
>
> Key: HUDI-233
> URL: https://issues.apache.org/jira/browse/HUDI-233
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: newbie, Performance
>Affects Versions: 0.5.0
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> Currently we are not employing variable substitution aggressively in the 
> project, a la:
> {code:java}
> LogManager.getLogger(SomeName.class.getName()).info("Message: {}, Detail: 
> {}", message, detail);
> {code}
> This can improve performance since the string concatenation is deferred to 
> when the logging is actually in effect.  
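To illustrate the deferral the description refers to, here is a minimal, self-contained sketch of SLF4J-style parameterized logging. The `Slf4jStyleLogger` class below is hypothetical, standing in for SLF4J's real `Logger.info(String, Object...)`; the point is only that the `{}` placeholders are substituted lazily, so a disabled log level pays no formatting cost:

```java
// Minimal sketch of SLF4J-style parameterized logging (hypothetical helper,
// not the real SLF4J API): the "{}" placeholders are only substituted when
// the target level is actually enabled, so disabled log calls skip the
// string concatenation entirely.
public class Slf4jStyleLogger {
    private final boolean infoEnabled;

    public Slf4jStyleLogger(boolean infoEnabled) {
        this.infoEnabled = infoEnabled;
    }

    // Replace each "{}" with the next argument, left to right.
    static String format(String template, Object... args) {
        StringBuilder sb = new StringBuilder();
        int cursor = 0;
        for (Object arg : args) {
            int idx = template.indexOf("{}", cursor);
            if (idx < 0) {
                break; // more args than placeholders: ignore the extras
            }
            sb.append(template, cursor, idx).append(arg);
            cursor = idx + 2;
        }
        sb.append(template.substring(cursor));
        return sb.toString();
    }

    // Returns the formatted message, or null when the level is disabled
    // (a real logger would write it to an appender instead of returning it).
    public String info(String template, Object... args) {
        if (!infoEnabled) {
            return null; // no formatting work happens at all
        }
        return format(template, args);
    }
}
```

With SLF4J itself, the call site would simply be `LOG.info("Message: {}, Detail: {}", message, detail)`; the library performs the same enabled-level check before formatting.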



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-304) Bring back spotless plugin

2019-10-16 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953401#comment-16953401
 ] 

leesf edited comment on HUDI-304 at 10/17/19 4:47 AM:
--

Yeah, of course, wait for the release.


was (Author: xleesf):
Of course, wait for the release.

> Bring back spotless plugin 
> ---
>
> Key: HUDI-304
> URL: https://issues.apache.org/jira/browse/HUDI-304
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Developer Productivity
>Reporter: Balaji Varadarajan
>Priority: Major
>
> spotless plugin has been turned off as the eclipse style format it was 
> referencing was removed due to compliance reasons. 
> We use google style eclipse format with some changes
> 90c90
> < 
> ---
> > 
> 242c242
> <  value="100"/>
> ---
> >  > value="120"/>
>  
> The eclipse style sheet was originally obtained from 
> [https://github.com/google/styleguide], which has a CC-BY 3.0 license that is not 
> compatible with source distribution (see 
> [https://www.apache.org/legal/resolved.html#cc-by]). 
>  
> We need to figure out a way to bring this back
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-304) Bring back spotless plugin

2019-10-16 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953401#comment-16953401
 ] 

leesf commented on HUDI-304:


Of course, wait for the release.

> Bring back spotless plugin 
> ---
>
> Key: HUDI-304
> URL: https://issues.apache.org/jira/browse/HUDI-304
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Developer Productivity
>Reporter: Balaji Varadarajan
>Priority: Major
>
> spotless plugin has been turned off as the eclipse style format it was 
> referencing was removed due to compliance reasons. 
> We use google style eclipse format with some changes
> 90c90
> < 
> ---
> > 
> 242c242
> <  value="100"/>
> ---
> >  > value="120"/>
>  
> The eclipse style sheet was originally obtained from 
> [https://github.com/google/styleguide], which has a CC-BY 3.0 license that is not 
> compatible with source distribution (see 
> [https://www.apache.org/legal/resolved.html#cc-by]). 
>  
> We need to figure out a way to bring this back
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-273) Translate Documentation -> Writing Data page

2019-10-17 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-273.
--
Resolution: Fixed

Fixed via asf-site: 9cd67ba81875d66c05a630e2948073ed33ec0548

> Translate Documentation -> Writing Data page
> 
>
> Key: HUDI-273
> URL: https://issues.apache.org/jira/browse/HUDI-273
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Translate this page into Chinese:
>  
> [http://hudi.apache.org/writing_data.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-342) add pull request template for hudi project

2019-11-18 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-342.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 66492498f702e9224da838d1869e9376111939ce

> add pull request template for hudi project
> --
>
> Key: HUDI-342
> URL: https://issues.apache.org/jira/browse/HUDI-342
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add a pull request template for the Hudi project, so that reviewers will 
> understand the meaning of a PR quickly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-11-20 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978324#comment-16978324
 ] 

leesf commented on HUDI-288:


> I think we can stick to the same whitelist/blacklist that Kafka itself uses? 

It makes sense.

> IIUC, even now, we can specify multiple topics as source but they get 
>written as a single Hudi dataset.

Taking a look at the current code, I find that the config 
_hoodie.deltastreamer.source.kafka.topic_ identifies the topic to ingest, and 
I think it does not support multiple topics, so currently we only support 
configuring one topic to ingest. Please correct me if I missed anything.

> we want to ingest kafka topics are separate Hudi datasets.  1-1 mapping 
>between a kafka topic and a hudi dataset.. I think the tool can take a 
>`--base-path-prefix` and place each hudi dataset under 
>`/`

It makes sense.

> Also we could allow topic level overrides as needed.. for delta streamer/hudi 
>properties.. Our DFSPropertiesConfiguration class already supports includes as 
>well. 

Sorry, I did not understand that correctly. Could you please share more details?

> Are you targetting this for 0.5.1 next release? Or do you think we can pick 
>up some things already labelled for that release.

I would like to get it ready for 0.5.1.


> Add support for ingesting multiple kafka streams in a single DeltaStreamer 
> deployment
> -
>
> Key: HUDI-288
> URL: https://issues.apache.org/jira/browse/HUDI-288
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@
>  has all the context



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-346) Set allowMultipleEmptyLines property false for EmptyLineSeparator rule

2019-11-19 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-346.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 804e348d0e8176ceced046fa9e87963907aecc38

> Set allowMultipleEmptyLines property false for EmptyLineSeparator rule
> --
>
> Key: HUDI-346
> URL: https://issues.apache.org/jira/browse/HUDI-346
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Set allowMultipleEmptyLines property false for EmptyLineSeparator rule.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-340) Increase Default max events to read from kafka source

2019-11-26 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-340.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 2a4cfb47c76a0c8800d4998453ec356711807c83

> Increase Default max events to read from kafka source
> -
>
> Key: HUDI-340
> URL: https://issues.apache.org/jira/browse/HUDI-340
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, DEFAULT_MAX_EVENTS_TO_READ is set to 1M for the kafka source in 
> the KafkaOffsetGen.java class. DeltaStreamer can handle many more incoming 
> records than this. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-327) Introduce "null" supporting KeyGenerator

2019-11-26 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-327.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 60fed21dc7e4cb66b154ae9be77dfada0f3071a5

> Introduce "null" supporting KeyGenerator
> 
>
> Key: HUDI-327
> URL: https://issues.apache.org/jira/browse/HUDI-327
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Brandon Scheller
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Customers have been running into issues where they would like to use a 
> record_key from columns that can contain null values. Currently, this will 
> cause Hudi to crash and throw a cryptic exception. (Improving error messaging 
> is a separate but related issue.)
> We would like to propose a new KeyGenerator based on ComplexKeyGenerator that 
> allows for null record_keys.
> At a basic level, using the key generator without any options would 
> essentially allow a null record_key to be accepted. (It can be replaced with 
> an empty string, null, or some predefined "null" string representation.)
> This comes with the negative side effect that all records with a null 
> record_key would then be associated together. To work around this, you would 
> be able to specify a secondary record_key to be used in the case that the 
> first one is null. You would specify this in the same way that you do for the 
> ComplexKeyGenerator, as a comma-separated list of record_keys. In this case, 
> when the first key is seen as null, the second key will be used instead. 
> We could support any arbitrary limit of record_keys here.
> While we are aware there are many alternatives to avoid using a null 
> record_key, we believe this will act as a usability improvement so that new 
> users are not forced to clean/update their data in order to use Hudi.
> We are hoping to get some feedback on the idea.
>  
>  
>  
>  
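The fallback idea in the proposal above can be sketched as follows. This is an illustrative, self-contained sketch under the proposal's assumptions, not Hudi's actual ComplexKeyGenerator API; the field names and the `NULL_RECORDKEY` placeholder string are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the proposed null-tolerant key generation: walk the
// comma-separated record_key candidates in order and use the first non-null
// field value; fall back to a placeholder string when every candidate is null.
// (Hypothetical behaviour, per the proposal above; not Hudi code.)
public class NullFallbackKeySketch {
    static final String NULL_RECORDKEY = "__null__";

    static String recordKey(Map<String, String> record, String recordKeyFields) {
        List<String> candidates = Arrays.asList(recordKeyFields.split(","));
        for (String field : candidates) {
            String value = record.get(field.trim());
            if (value != null) {
                return value; // first non-null candidate wins
            }
        }
        return NULL_RECORDKEY; // all candidates null: use the placeholder
    }
}
```

For example, with `recordKeyFields = "primary_id,secondary_id"`, a record missing `primary_id` would be keyed by its `secondary_id` instead of failing.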



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-11-26 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982492#comment-16982492
 ] 

leesf commented on HUDI-288:


[~Pratyaksh] Of course, and glad that you have implemented a similar wrapper; 
it will save a lot of time. Would it be convenient for you to share the PoC you 
have implemented, so that we can see whether it satisfies the needs discussed 
above?

> Add support for ingesting multiple kafka streams in a single DeltaStreamer 
> deployment
> -
>
> Key: HUDI-288
> URL: https://issues.apache.org/jira/browse/HUDI-288
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@
>  has all the context



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-348) Add Issue template for the project

2019-11-24 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-348.
--
Resolution: Fixed

Fixed via master: 17eaf41c54411344916fdf4ace80163438e9bee2

> Add Issue template for the project
> --
>
> Key: HUDI-348
> URL: https://issues.apache.org/jira/browse/HUDI-348
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Developer Productivity
>Reporter: Gurudatt Kulkarni
>Assignee: Gurudatt Kulkarni
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add an issue template for a convenient way to file issues. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-328) Add support for Delete api in HoodieWriteClient

2019-11-24 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-328.
--
Resolution: Fixed

Fixed via master: c3355109b1fa8f3055a2ba57d6e2b49679581db5

> Add support for Delete api in HoodieWriteClient
> ---
>
> Key: HUDI-328
> URL: https://issues.apache.org/jira/browse/HUDI-328
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Write Client
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>   Original Estimate: 72h
>  Time Spent: 10m
>  Remaining Estimate: 71h 50m
>
> This ticket is specifically for HoodieWriteClient. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-345) Fix used deprecated function

2019-11-24 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-345.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 7bc08cbfdce337ad980bb544ec9fc3dbdf9c

> Fix used deprecated function
> 
>
> Key: HUDI-345
> URL: https://issues.apache.org/jira/browse/HUDI-345
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Fix used deprecated function to be compatible with higher version of hadoop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-294) Delete Paths written in Cleaner plan needs to be relative to partition-path

2019-11-26 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983187#comment-16983187
 ] 

leesf commented on HUDI-294:


yeah, still working.

> Delete Paths written in Cleaner plan needs to be relative to partition-path
> ---
>
> Key: HUDI-294
> URL: https://issues.apache.org/jira/browse/HUDI-294
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Cleaner
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> The deleted file paths stored in Clean metadata are all absolute. They need 
> to be changed to relative paths.
> The challenge would be to handle cases where both versions of cleaner metadata 
> are present and need to be processed (backwards compatibility).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-357) Refactor hudi-cli based on new comment and code style rules

2019-12-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-357.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 75132c139f0faf9ef68bb461b3a551238a377455

> Refactor hudi-cli based on new comment and code style rules
> ---
>
> Key: HUDI-357
> URL: https://issues.apache.org/jira/browse/HUDI-357
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Gurudatt Kulkarni
>Assignee: Gurudatt Kulkarni
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This issue is used to refactor the hudi-cli module based on the new comment 
> and code style rules.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-374) Unable to generateUpdates in QuickstartUtils

2019-12-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-374.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: b65a897856259e7872e39a9e3e68661926592d7b

> Unable to generateUpdates in QuickstartUtils
> 
>
> Key: HUDI-374
> URL: https://issues.apache.org/jira/browse/HUDI-374
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Spark datasource
>Reporter: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> scala> convertToStringList(dataGen.generateInserts(1))
> res0: java.util.List[String] = [{"ts": 0.0, "uuid": 
> "78956d3a-c13b-4871-8b14-596b2a7e11d9", "rider": "rider-213", "driver": 
> "driver-213", "begin_lat": 0.4726905879569653, "begin_lon": 
> 0.46157858450465483, "end_lat": 0.754803407008858, "end_lon": 
> 0.9671159942018241, "fare": 34.158284716382845, "partitionpath": 
> "americas/brazil/sao_paulo"}]
> scala> convertToStringList(dataGen.generateUpdates(1))
> java.lang.IllegalArgumentException: bound must be positive
>   at java.util.Random.nextInt(Random.java:388)
>   at 
> org.apache.hudi.QuickstartUtils$DataGenerator.generateUpdates(QuickstartUtils.java:163)
>   ... 73 elided
> {code}
> When `numExistingKeys = 1`, `rand.nextInt(numExistingKeys - 1)` is 
> equivalent to `rand.nextInt(0)`, and the bound of nextInt() must be positive. On 
> the other hand, since the range of nextInt(n) is [0, n), 
> `rand.nextInt(numExistingKeys)` is correct here.
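The bound behaviour behind the fix can be checked directly with `java.util.Random` (a minimal reproduction of the reasoning above, not Hudi code):

```java
import java.util.Random;

// Reproduces the bound behaviour behind the fix: nextInt(0) throws
// IllegalArgumentException ("bound must be positive"), while nextInt(n)
// already returns values in [0, n), so subtracting 1 from the key count is
// both unnecessary and fatal when only one key exists.
public class NextIntBoundDemo {
    static boolean boundRejected(int bound) {
        try {
            new Random().nextInt(bound);
            return false;
        } catch (IllegalArgumentException e) {
            return true; // thrown when bound <= 0
        }
    }

    static int pickExistingKeyIndex(int numExistingKeys) {
        // The fixed form: the range [0, numExistingKeys) covers every existing key.
        return new Random().nextInt(numExistingKeys);
    }
}
```

With `numExistingKeys = 1`, the buggy form `nextInt(numExistingKeys - 1)` hits the rejected `nextInt(0)` case, while the fixed form always returns index 0.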



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-370) Refactor hudi-common based on new ImportOrder code style rule

2019-12-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-370.
--
Resolution: Fixed

Fixed via master: 784e3ad0b65ce8f7e0de2e2935032fb2573d5a26

> Refactor hudi-common based on new ImportOrder code style rule
> -
>
> Key: HUDI-370
> URL: https://issues.apache.org/jira/browse/HUDI-370
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Refactor hudi-common based on new ImportOrder code style rule



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-12-02 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986433#comment-16986433
 ] 

leesf commented on HUDI-288:


Hi [~vinoth]. Since [~Pratyaksh] has completed most of the code, I would like 
to assign it to him, and I will help review the code.

> Add support for ingesting multiple kafka streams in a single DeltaStreamer 
> deployment
> -
>
> Key: HUDI-288
> URL: https://issues.apache.org/jira/browse/HUDI-288
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>
> https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@
>  has all the context



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-356) Sync translation and code in quickstart.cn and admin_guide.cn pages

2019-11-21 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-356.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via asf-site: 46a1546259fcfb629a26508e829d3d3e4b9cee21

> Sync translation and code in quickstart.cn and admin_guide.cn pages
> ---
>
> Key: HUDI-356
> URL: https://issues.apache.org/jira/browse/HUDI-356
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

