[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-21 Thread GitBox
tverdokhlebd commented on issue #1491: URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-617163716 @lamber-ken, I have tested file "2.csv" (5M records) with 4GB driver memory (local[12]): 1. bulk_insert with *.parallelism 80 and without

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-17 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-615141491 Hi @lamber-ken Yes, I have resolved this problem. Firstly, the problem was in SBT memory that

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-10 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-611920831 > I run those operations with local[2] and 6GB driver memory, still worked fine. How did you set memory? I

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-09 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-611636241 @lamber-ken, I have run it on my local machine (Windows, WSL) without Docker. But I also have Jenkins (Linux) and

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-09 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-611462195 Hmm, this is strange :( I have tried with your parameters (parallelism and memory), but it does not help me

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-08 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-611161702 @lamber-ken, can you try with those params? This

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-08 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-611158024 @lamber-ken, did you run "bulk insert" on a partition and then "upsert" to the same partition? Did you

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-08 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610980530 > upsert (use the same CSV dataset) ? Yes, use the same CSV dataset.

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-08 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610880742 > .option("hoodie.write.buffer.limit.bytes", "131072") //128KB I have tried it, but it doesn't help me (local[3]

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610580236 > Can you give this a shot on a cluster? Do you mean access to the cluster? Those steps were also reproduced on

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610564674 Code: ` sparkSession .read .jdbc( url = jdbcConfig.url, table = table,

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610307143 So, the process took 2h 40m and threw "java.lang.OutOfMemoryError: GC overhead limit exceeded". Log