GitHub user ZunwenYou commented on the issue:
https://github.com/apache/spark/pull/17000
Hi, @hhbyyh
In our experiment, the class **_MultivariateOnlineSummarizer_** holds 8
arrays; when the feature dimension reaches 20 million, a single
MultivariateOnlineSummarizer takes about 1280 MB of memory (8 bytes × 20M elements × 8 arrays).
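For reference, a minimal sketch of that estimate. The count of 8 array fields (currMean, currM2n, currM2, currL1, weightSum, nnz, currMax, currMin) comes from MultivariateOnlineSummarizer's implementation; the 8-byte element size is the JVM size of a Double/Long:

```scala
// Back-of-the-envelope memory estimate for one MultivariateOnlineSummarizer.
object SummarizerMemoryEstimate {
  def main(args: Array[String]): Unit = {
    val dim = 20000000L       // 20 million features
    val bytesPerElement = 8L  // Double / Long on the JVM
    val numArrays = 8L        // array fields held by the summarizer
    val totalBytes = dim * bytesPerElement * numArrays
    println(s"~${totalBytes / 1000000} MB per summarizer") // ~1280 MB
  }
}
```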
The experiment configuration is as follows:
spark.driver.maxResultSize 6g
spark.kryoserializer.buffer.max 2047m
driver-memory 20g
num-executors 100
executor-cores 2
executor-memory 15g
RDD and aggregate parameters (a sketch of the corresponding call follows the list):
RDD partition number 300
treeAggregate depth 5
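A minimal sketch of the kind of aggregation being described, assuming the summarizer is built with `treeAggregate` as in MLlib's `Statistics.colStats`; the name `data` is illustrative, not from the PR:

```scala
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
import org.apache.spark.rdd.RDD

// Hypothetical reproduction of the setup above:
// 300 partitions, treeAggregate with depth 5, one summarizer per task.
def summarize(data: RDD[Vector]): MultivariateOnlineSummarizer =
  data.repartition(300).treeAggregate(new MultivariateOnlineSummarizer)(
    seqOp = (summarizer, vector) => summarizer.add(vector),
    combOp = (s1, s2) => s1.merge(s2),
    depth = 5
  )
```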
With this configuration, treeAggregate runs in four stages, with 300, 75,
18, and 4 tasks respectively (see the sketch below).
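Those task counts match treeAggregate's level-by-level partition scale-down; a short sketch of the arithmetic, mirroring the logic in `RDD.treeAggregate`:

```scala
// treeAggregate derives a scale factor from the partition count and depth,
// then divides the partition count by it at each level (integer division).
val numPartitions = 300
val depth = 5
val scale = math.max(math.ceil(math.pow(numPartitions, 1.0 / depth)).toInt, 2)
// scale = 4, so the levels run 300 -> 75 -> 18 -> 4 tasks
```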
At the last stage of treeAggregate, tasks are killed because the executors
throw _**java.lang.OutOfMemoryError: Requested array size exceeds VM
limit**_.
I also tried treeAggregate depth=7 with executor-memory=30g, and the last
stage still failed. (That is consistent with the error message: this OOM is
raised when a single array allocation exceeds the JVM's maximum array size,
so adding heap or tree depth alone does not help.)