[GitHub] [hudi] bvaradar commented on issue #1830: [SUPPORT] Processing time gradually increases while using Spark Streaming

2020-07-20 Thread GitBox

bvaradar commented on issue #1830: URL: https://github.com/apache/hudi/issues/1830#issuecomment-660840191 We spent time over the weekend setting up a local test bed with Kafka and Structured Streaming to reproduce this behavior. Here are the steps I followed, with code:

[GitHub] [hudi] bvaradar commented on issue #1830: [SUPPORT] Processing time gradually increases while using Spark Streaming

2020-07-15 Thread GitBox

bvaradar commented on issue #1830: URL: https://github.com/apache/hudi/issues/1830#issuecomment-659163337 @umehrot2 @srsteinmetz: Thanks for the information. I have not seen similar issues, but looking at the trend (increase) in the number of file groups and partitions is a good angle to

[GitHub] [hudi] bvaradar commented on issue #1830: [SUPPORT] Processing time gradually increases while using Spark Streaming

2020-07-15 Thread GitBox

bvaradar commented on issue #1830: URL: https://github.com/apache/hudi/issues/1830#issuecomment-659044105 From the highlighted section in your Spark UI image, it looks like there is an increase during index lookup. Between 2 runs, there is an increase of 10 sec (around 4%). Is this the