[GitHub] spark pull request: [SPARK-1201] Do not fully materialize partitio...

andrewor14 Mon, 16 Jun 2014 13:51:07 -0700

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1083#issuecomment-46235253
  
    It's worth noting that the special handling of the memory serialized 
storage level actually introduce a regression. In particular, we first 
serialize the values, put it in block manager, and then deserialize the bytes 
to return the original values to the user. The tradeoff is that this adds an 
extra step to deserialize the bytes in the end, which could be slow for large 
partitions.
    
    This special handling will most likely be superseded by a more general 
solution for [SPARK-1777](https://issues.apache.org/jira/browse/SPARK-1777), 
which avoids unrolling an entire partition if there is not enough space for it, 
regardless of the storage level. For now, I will put this PR on hold.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1201] Do not fully materialize partitio...

Reply via email to