[jira] [Commented] (HUDI-860) Ability to do small file handling without need for caching

2021-10-06 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425284#comment-17425284 ] Ethan Guo commented on HUDI-860: Cool, I'll take a look. > Ability to do small file handling without need

[jira] [Commented] (HUDI-860) Ability to do small file handling without need for caching

2021-10-06 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425003#comment-17425003 ] Vinoth Chandar commented on HUDI-860: [~guoyihua] this is a good one to get started on the

[jira] [Commented] (HUDI-860) Ability to do small file handling without need for caching

2020-07-16 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159651#comment-17159651 ] Vinoth Chandar commented on HUDI-860: >and will return the info to driver. Driver will collect all

[jira] [Commented] (HUDI-860) Ability to do small file handling without need for caching

2020-07-16 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159649#comment-17159649 ] sivabalan narayanan commented on HUDI-860: To determine total parallelism, driver needs to

[jira] [Commented] (HUDI-860) Ability to do small file handling without need for caching

2020-07-16 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159648#comment-17159648 ] sivabalan narayanan commented on HUDI-860: We don't need to build global workload stats. with

[jira] [Commented] (HUDI-860) Ability to do small file handling without need for caching

2020-07-16 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159647#comment-17159647 ] Vinoth Chandar commented on HUDI-860: What I mean is, we need to read the RDD again if we convert to

[jira] [Commented] (HUDI-860) Ability to do small file handling without need for caching

2020-07-16 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159646#comment-17159646 ] Vinoth Chandar commented on HUDI-860: >MapPartitions() with one per hudi partition to collect all

[jira] [Commented] (HUDI-860) Ability to do small file handling without need for caching

2020-07-16 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159133#comment-17159133 ] sivabalan narayanan commented on HUDI-860: [~vinoth]: Did you get a chance to look at this

[jira] [Commented] (HUDI-860) Ability to do small file handling without need for caching

2020-07-06 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152012#comment-17152012 ] sivabalan narayanan commented on HUDI-860: Here is a proposal. High level idea is to avoid doing