[GitHub] spark pull request: [WIP][SPARK-3517]mapPartitions is not correct ...

witgo Fri, 12 Sep 2014 22:40:34 -0700

Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/2376#issuecomment-55482720
  
    @rxin  Code like this: 
    ```
    val topicModel = "Big object"
    val broadcastModel = data.context.broadcast(topicModel)
    corpus = corpus.mapPartitions { docs =>
      val topicModel = broadcastModel.value
      .....
    }
    ```
    The serialized corpus RDD and the serialized topicModel broadcast are almost the same size.
    `cat spark.log | grep 'stored as values in memory'` =>
    ```
    14/09/13 00:49:21 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 197.5 MB, free 2.6 GB)
    14/09/13 00:49:24 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 197.7 MB, free 2.3 GB)
    ```
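
Two blocks of nearly identical size are what one would expect if the function passed to `mapPartitions` still carried a reference to the large object when it was serialized. The following is only a minimal sketch of that effect using plain Java serialization, outside Spark. `ClosureSizeSketch`, `serializedSize`, `compare`, and `bigModel` are hypothetical names introduced for illustration; this is not Spark's closure cleaner or its size estimator.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object ClosureSizeSketch {
  // Java-serialize an object and return the number of bytes produced.
  def serializedSize(obj: AnyRef): Int = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.size()
  }

  // Returns (size of a closure that captures the model,
  //          size of a closure that captures nothing).
  def compare(modelLength: Int): (Int, Int) = {
    // Hypothetical stand-in for the "Big object" in the comment above.
    val bigModel = Array.fill(modelLength)(1.0)
    val capturing: Int => Double = i => bigModel(i)
    val nonCapturing: Int => Double = i => i.toDouble
    (serializedSize(capturing), serializedSize(nonCapturing))
  }

  def main(args: Array[String]): Unit = {
    val (withModel, withoutModel) = compare(100000)
    println(s"capturing closure:     $withModel bytes")
    println(s"non-capturing closure: $withoutModel bytes")
  }
}
```

The capturing closure serializes to roughly the size of the array it references, while the non-capturing one stays small. Whether this is what happens inside the `mapPartitions` call above is an assumption based on the pull request title.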


