[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-08-04 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-668667100 Closing this ticket as it was answered. This is an automated message from the Apache Git Service. To respond to the

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-07-25 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-663816363 @ssomuah : Looking at the commit metadata, it is the case where your updates are spread across a large number of files. For example, in the latest commit, 334 files see updates whereas

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-07-24 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-663427905 What do you mean by "runs serially with ingestion"? My understanding was that inline compaction happened in the same flow as writing so an inline compaction would simply slow down

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-07-23 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-663178646 @ssomuah : Such a large number of log files indicates your compaction frequency (INLINE_COMPACT_NUM_DELTA_COMMITS_PROP) is conservative. Many of these log files could also be
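
The comment above ties the number of log files to the inline compaction frequency. As a rough sketch only, the options below show how that setting might be passed to a Hudi writer. The key names are assumed to be the string forms behind INLINE_COMPACT_PROP and INLINE_COMPACT_NUM_DELTA_COMMITS_PROP, and the value is purely illustrative, so check both against the Hudi version in use.

```python
# Assumption: these keys correspond to INLINE_COMPACT_PROP and
# INLINE_COMPACT_NUM_DELTA_COMMITS_PROP; verify them for your Hudi release.
hudi_compaction_options = {
    # Run compaction inline, in the same flow as the write.
    "hoodie.compact.inline": "true",
    # Compact after this many delta commits. A smaller number means more
    # frequent compaction and fewer accumulated log files.
    # The value "2" is illustrative, not a recommendation from the thread.
    "hoodie.compact.inline.max.delta.commits": "2",
}

# Print the options in key=value form, e.g. for a properties file.
for key, value in sorted(hudi_compaction_options.items()):
    print(f"{key}={value}")
```

With Spark, a dictionary like this would typically be handed to the DataFrame writer through `.options(**hudi_compaction_options)`.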

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-07-23 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-663121167 @ssomuah : Regarding the patch, it is meant to ensure all pending compactions are completed. Regarding the slowness, we are working on general and S3 specific performance

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-07-22 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-662638790 Ended up creating a new Jira: https://issues.apache.org/jira/browse/HUDI-1119 as this has a different cause. This is

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-07-22 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-662632930 We have a Jira: https://issues.apache.org/jira/browse/HUDI-1015 to improve/avoid listing. I have added this case to the Jira.

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-07-22 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-662630342 Sorry, I did not realize that. Let me check and get back.

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-07-21 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-662177092 MacBook-Pro:hudi balaji.varadarajan$ grep -c '\.clean.requested' ~/Downloads/dot_hoodie_folder.txt 16 MacBook-Pro:hudi balaji.varadarajan$ grep -c '\.deltacommit.requested'
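
The `grep -c` commands above count how many entries of each kind appear in a listing of the `.hoodie` folder. A minimal Python sketch of the same counting follows. The helper name `grep_count` and the sample file names are hypothetical and not taken from the thread.

```python
import re


def grep_count(pattern, lines):
    """Count the lines that contain a match for `pattern`, like `grep -c`."""
    rx = re.compile(pattern)
    return sum(1 for line in lines if rx.search(line))


# Made-up listing entries, used only to illustrate the counting.
sample_listing = [
    "20200721000000.clean.requested",
    "20200721000000.clean.inflight",
    "20200721000500.deltacommit.requested",
    "20200721000500.deltacommit",
]

# Same patterns as in the comment above.
for pattern in (r"\.clean.requested", r"\.deltacommit.requested"):
    print(pattern, grep_count(pattern, sample_listing))
```

To run it against a real listing, pass the lines of the saved text file instead of `sample_listing`.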

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-07-21 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-661692328 ``` And looking at the thread dump of the executors they are almost always spending their time listing files. ``` This looks surprising to me. File listing for finding