[GitHub] [hudi] garyli1019 commented on issue #1786: [SUPPORT] Bulk insert slow on MOR

2020-07-07 Thread GitBox
garyli1019 commented on issue #1786: URL: https://github.com/apache/hudi/issues/1786#issuecomment-655021966 @rvd8345 OK, 100 wouldn't be much different from 64. During Stage 5 (`count xxx`), Hudi is actually writing the files to the filesystem. Even if we reduce the parallelism number

[GitHub] [hudi] garyli1019 commented on issue #1786: [SUPPORT] Bulk insert slow on MOR

2020-07-05 Thread GitBox
garyli1019 commented on issue #1786: URL: https://github.com/apache/hudi/issues/1786#issuecomment-653989028 Hi @rvd8345, by `shuffle parallelism` do you mean `spark.shuffle.partition` or the Hudi parallelism? For bulk insert, the Hudi parallelism seems too large for 9.7 GB of data. With