tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-617163716
@lamber-ken, I have tested with 4GB of driver memory (local[12]) on file "2.csv" (5M records):
1. bulk_insert with *.parallelism 80 and without
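For reference, the parallelism setting being compared here is passed as a Hudi write option. A minimal sketch of such a bulk_insert, assuming a Spark session with the Hudi bundle on the classpath; the table name, record-key field, and output path are hypothetical, not taken from the thread:

```scala
// Sketch: bulk_insert with explicit shuffle parallelism.
// df is a DataFrame loaded from the CSV; names/paths below are hypothetical.
df.write
  .format("hudi")
  .option("hoodie.table.name", "test_table")               // hypothetical
  .option("hoodie.datasource.write.operation", "bulk_insert")
  .option("hoodie.datasource.write.recordkey.field", "id") // hypothetical
  .option("hoodie.bulkinsert.shuffle.parallelism", "80")   // the value under test
  .mode("append")
  .save("/tmp/hudi/test_table")                            // hypothetical
```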
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-615141491
Hi @lamber-ken,
Yes, I have resolved this problem.
First, the problem was with the SBT memory, which
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-611920831
> I run those operations with local[2] and 6GB driver memory, still worked fine.

How did you set the memory? I
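The question of how the memory was set matters in local mode: the driver JVM is already running by the time a programmatic SparkConf is read, so `spark.driver.memory` set in code has no effect there and must be supplied at launch (e.g. `spark-submit --driver-memory 6g`). A hedged sketch, with a hypothetical app name:

```scala
// In local mode, driver memory must be fixed before the JVM starts, e.g.:
//   spark-submit --master "local[2]" --driver-memory 6g app.jar
// The .config() call below is honored only when the session is created by a
// launcher that reads it (spark-shell/spark-submit), not from a plain
// `java -jar` run where the heap is already fixed.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .config("spark.driver.memory", "6g") // ignored if the JVM heap is already set
  .appName("hudi-oom-test")            // hypothetical
  .getOrCreate()
```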
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-611636241
@lamber-ken, I ran it on my local machine (Windows, WSL) without Docker. But I also have Jenkins (Linux) and
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-611462195
Hmm, this is strange :(
I have tried your parameters (parallelism and memory), but they do not help.
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-611161702
@lamber-ken, can you try with those params?
This
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-611158024
@lamber-ken , did you do "bulk insert" on a partition and then "upsert" to
the same partition, yes?
Did you
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610980530
> upsert (use the same CSV dataset) ?
Yes, use the same CSV dataset.
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610880742
> .option("hoodie.write.buffer.limit.bytes", "131072") //128MB
I have tried but it doesn't help me (local[3]
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610580236
> Can you give this a shot on a cluster?

Do you mean access to a cluster? Those steps were also reproduced on
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610564674
Code:

```scala
sparkSession
  .read
  .jdbc(
    url = jdbcConfig.url,
    table = table,
```
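The snippet above is cut off by the archive. A typical continuation of such a partitioned JDBC read looks like the sketch below; the connection properties, partition column, and bounds are assumptions for illustration, not recovered from the thread (the upper bound merely echoes the ~53M record count from the issue title):

```scala
import java.util.Properties

// Hypothetical reconstruction of a partitioned JDBC read; the real
// jdbcConfig fields and bounds are not recoverable from the archive.
val props = new Properties()
props.setProperty("user", jdbcConfig.user)         // assumed field
props.setProperty("password", jdbcConfig.password) // assumed field

val df = sparkSession.read
  .jdbc(
    url = jdbcConfig.url,
    table = table,
    columnName = "id",       // assumed numeric partition column
    lowerBound = 0L,
    upperBound = 53000000L,  // ~53M records, per the issue title
    numPartitions = 80,
    connectionProperties = props)
```

Reading through this partitioned overload (rather than the single-partition `jdbc(url, table, props)`) keeps any one task's fetch small, which is relevant to the OOM being discussed.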
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610307143
So, the process took 2h 40m and threw "java.lang.OutOfMemoryError: GC overhead limit exceeded".
Log