GitHub user ZunwenYou commented on the issue:
https://github.com/apache/spark/pull/17000
Hi, @hhbyyh
In our experiment, the class **_MultivariateOnlineSummarizer_** holds 8
arrays; when the feature dimension reaches 20 million, a single
MultivariateOnlineSummarizer takes about 1280 MB of memory (8 bytes × 20M elements × 8 arrays).
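For reference, a minimal sketch of that estimate. The count of 8 array fields (currMean, currM2n, currM2, currL1, weightSum, nnz, currMax, currMin) comes from MultivariateOnlineSummarizer's implementation; the 8-byte element size is the JVM size of a Double/Long:

```scala
// Back-of-the-envelope memory estimate for one MultivariateOnlineSummarizer.
object SummarizerMemoryEstimate {
  def main(args: Array[String]): Unit = {
    val dim = 20000000L       // 20 million features
    val bytesPerElement = 8L  // Double / Long on the JVM
    val numArrays = 8L        // array fields held by the summarizer
    val totalBytes = dim * bytesPerElement * numArrays
    println(s"~${totalBytes / 1000000} MB per summarizer") // ~1280 MB
  }
}
```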
The experiment configuration is as follows:
spark.driver.maxResultSize 6g
spark.kryoserializer.buffer.max 2047m
driver-memory 20g
num-executors 100
executor-cores 2
executor-memory 15g
RDD and aggregate parameters (a sketch of the corresponding call follows the list):
RDD partition number 300
treeAggregate depth 5
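A minimal sketch of the kind of aggregation being described, assuming the summarizer is built with `treeAggregate` as in MLlib's `Statistics.colStats`; the name `data` is illustrative, not from the PR:

```scala
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
import org.apache.spark.rdd.RDD

// Hypothetical reproduction of the setup above:
// 300 partitions, treeAggregate with depth 5, one summarizer per task.
def summarize(data: RDD[Vector]): MultivariateOnlineSummarizer =
  data.repartition(300).treeAggregate(new MultivariateOnlineSummarizer)(
    seqOp = (summarizer, vector) => summarizer.add(vector),
    combOp = (s1, s2) => s1.merge(s2),
    depth = 5
  )
```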
With this configuration, treeAggregate runs in four stages, with 300, 75,
18, and 4 tasks respectively (see the sketch below).
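Those task counts match treeAggregate's level-by-level partition scale-down; a short sketch of the arithmetic, mirroring the logic in `RDD.treeAggregate`:

```scala
// treeAggregate derives a scale factor from the partition count and depth,
// then divides the partition count by it at each level (integer division).
val numPartitions = 300
val depth = 5
val scale = math.max(math.ceil(math.pow(numPartitions, 1.0 / depth)).toInt, 2)
// scale = 4, so the levels run 300 -> 75 -> 18 -> 4 tasks
```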
At the last stage of treeAggregate, tasks are killed because the executors
throw _**java.lang.OutOfMemoryError: Requested array size exceeds VM
limit**_.
I also tried treeAggregate depth=7 with executor-memory=30g, and the last
stage still failed. (That is consistent with the error message: this OOM is
raised when a single array allocation exceeds the JVM's maximum array size,
so adding heap or tree depth alone does not help.)