bvaradar commented on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-668667100
Closing this ticket as it was answered.
This is an automated message from the Apache Git Service.
To respond to the
bvaradar commented on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-663816363
@ssomuah : Looking at the commit metadata, your updates are spread across
a large number of files. For example, in the latest commit, 334
files see updates whereas
bvaradar commented on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-663427905
What do you mean by "runs serially with ingestion"? My understanding was
that inline compaction happened in the same flow as writing so an inline
compaction would simply slow down
bvaradar commented on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-663178646
@ssomuah :
Such a large number of log files indicates your compaction frequency
(INLINE_COMPACT_NUM_DELTA_COMMITS_PROP) is conservative. Many of these log
files could also be
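
The compaction frequency mentioned in this comment can be sketched as a pair of writer options. This is a minimal illustration, assuming the option keys `hoodie.compact.inline` and `hoodie.compact.inline.max.delta.commits` (the latter being the key behind INLINE_COMPACT_NUM_DELTA_COMMITS_PROP). The helper name and the value 2 are hypothetical and are not a recommendation from the thread.

```python
def inline_compaction_options(max_delta_commits):
    """Build Hudi writer options that enable inline compaction.

    max_delta_commits is the number of delta commits after which an
    inline compaction is scheduled; a smaller value compacts more often.
    The helper itself is hypothetical, not part of Hudi.
    """
    if max_delta_commits < 1:
        raise ValueError("max_delta_commits must be at least 1")
    return {
        "hoodie.compact.inline": "true",
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
    }


# Illustrative value only; the right setting depends on the workload.
options = inline_compaction_options(2)
```

A dictionary like this would typically be passed to the writer alongside the other Hudi options. Compacting more often trades extra write-time work for fewer log files per file group.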
bvaradar commented on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-663121167
@ssomuah : Regarding the patch, it is meant to ensure all pending
compactions are completed. Regarding the slowness, we are working on general
and S3-specific performance
bvaradar commented on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-662638790
Ended up creating a new jira :
https://issues.apache.org/jira/browse/HUDI-1119 as this has a different cause.
This is
bvaradar commented on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-662632930
We have a jira : https://issues.apache.org/jira/browse/HUDI-1015 to
improve/avoid listing. I have added this case to the jira.
bvaradar commented on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-662630342
Sorry, I did not realize that. Let me check and get back
bvaradar commented on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-662177092
MacBook-Pro:hudi balaji.varadarajan$ grep -c '\.clean.requested'
~/Downloads/dot_hoodie_folder.txt
16
MacBook-Pro:hudi balaji.varadarajan$ grep -c '\.deltacommit.requested'
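
The `grep -c` checks above count how many lines of a `.hoodie` folder listing match each timeline file suffix. The same tally can be sketched in Python for several suffixes at once. This is only an illustration: the function name and the sample file names are hypothetical, and the patterns are copied from the commands in the comment.

```python
import re


def count_timeline_entries(lines, patterns):
    """Count lines matching each regex, like running `grep -c` per pattern."""
    counts = {name: 0 for name in patterns}
    for line in lines:
        for name, pattern in patterns.items():
            if re.search(pattern, line):
                counts[name] += 1
    return counts


# Hypothetical listing lines; a real run would read the saved listing file.
sample_listing = [
    "20200721010101.clean.requested",
    "20200721010101.clean.inflight",
    "20200721020202.deltacommit.requested",
    "20200721030303.deltacommit.requested",
    "20200721030303.deltacommit",
]

patterns = {
    "clean.requested": r"\.clean.requested",
    "deltacommit.requested": r"\.deltacommit.requested",
}

counts = count_timeline_entries(sample_listing, patterns)
```

Each line is counted at most once per pattern, which matches how `grep -c` reports matching lines rather than total matches.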
bvaradar commented on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-661692328
```
And looking at the thread dump of the executors they are almost always
spending their time listing files.
```
This looks surprising to me. File listing for finding