[GitHub] [hudi] bvaradar commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2020-07-17 Thread GitBox
bvaradar commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-659899321 Thanks @zuyanton for the updates. IIUC, the S3-optimized committer was meant to optimize writes by reducing the number of renames. I might be wrong, but I am generally curious about EMR optimizations

[GitHub] [hudi] bvaradar commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2020-07-15 Thread GitBox
bvaradar commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-659036363 @zuyanton: This sounds like a general Spark/HMS query integration issue. Are we seeing similar behavior when running the same query over a non-Hudi table?

[GitHub] [hudi] bvaradar commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2020-07-15 Thread GitBox
bvaradar commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-658902047 @zuyanton: HoodieParquetInputFormat relies on the hadoop-mapreduce FileInputFormat listing implementation to perform listing. There is a knob in the base FileInputFormat to tune listing
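
The comment above is cut off before it names the knob. One listing-related property that Hadoop's FileInputFormat does read is `mapreduce.input.fileinputformat.list-status.num-threads` (default 1); that this is the setting the comment had in mind is an assumption. A minimal sketch of how it might be passed through Spark, using the `spark.hadoop.` prefix that Spark forwards into the Hadoop configuration. The helper names `listing_conf` and `to_submit_args` are hypothetical and exist only for illustration:

```python
# Hadoop's FileInputFormat reads this property to decide how many threads
# list the input directories (the default is 1, i.e. sequential listing).
# Assumption: this is the knob the truncated comment refers to.
LIST_STATUS_NUM_THREADS = "mapreduce.input.fileinputformat.list-status.num-threads"


def listing_conf(num_threads: int) -> dict:
    """Return Spark conf entries that forward the listing thread count to Hadoop."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    # Spark copies any "spark.hadoop.*" property into the Hadoop Configuration.
    return {"spark.hadoop." + LIST_STATUS_NUM_THREADS: str(num_threads)}


def to_submit_args(conf: dict) -> list:
    """Render conf entries as spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Example: ask for 16 listing threads; the right value depends on the workload.
    print(" ".join(to_submit_args(listing_conf(16))))
```

Whether more listing threads actually help depends on the partition count and on how the query engine reaches the input format, so any value is worth measuring against the sequential default.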