GitHub user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-83410489
  
    @shivaram @srowen Tuple2(Int, Int) gets specialized to the Tuple2$mcII$sp 
class, but Tuple2$mcII$sp is a subclass of Tuple2, so in our implementation the 
specialized class picks up two additional object references (_1 and _2 from the 
superclass Tuple2). As a result, for Tuple2(Int, Int), SizeEstimator reports 32 
bytes rather than 24 bytes. In theory, the field layout of Tuple2(1, 2) should 
look something like this:
    ```
    scala.Tuple2$mcII$sp object internals:
     OFFSET  SIZE    TYPE  DESCRIPTION                 VALUE
          0     4          (object header)             01 00 00 00 (0000 0001 0000 0000 0000 0000 0000 0000)
          4     4          (object header)             00 00 00 00 (0000 0000 0000 0000 0000 0000 0000 0000)
          8     4          (object header)             05 c3 00 f8 (0000 0101 1100 0011 0000 0000 1111 1000)
         12     4  Object  Tuple2._1                   null
         16     4  Object  Tuple2._2                   null
         20     4     int  Tuple2$mcII$sp._1$mcI$sp    1
         24     4     int  Tuple2$mcII$sp._2$mcI$sp    2
         28     4          (loss due to the next object alignment)
    Instance size: 32 bytes (reported by Instrumentation API)
    Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
    ```
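
    The dump above is in the format printed by the OpenJDK JOL (Java Object 
Layout) tool. A minimal sketch to reproduce it, assuming jol-core 
(org.openjdk.jol:jol-core) is on the classpath:
    ```scala
    // Prints the field layout of the runtime class of (1, 2), i.e. the
    // specialized Tuple2$mcII$sp, including the Object-typed _1/_2 fields
    // it inherits from its Tuple2 superclass.
    import org.openjdk.jol.info.ClassLayout

    object TupleLayout {
      def main(args: Array[String]): Unit = {
        println(ClassLayout.parseInstance((1, 2)).toPrintable)
      }
    }
    ```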
    
    But in practice, the size of Tuple2(1, 2) is 24 bytes. Is there a Scala 
expert we can ping? I would really like to know why Tuple2(1, 2) can be 24 
bytes when the specialized version is involved.
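
    For what it's worth, a rough way to cross-check both numbers is sketched 
below. It assumes jol-core is on the classpath and that 
org.apache.spark.util.SizeEstimator is reachable from the calling code (it is 
private[spark] in some versions, so the check may need to live under an 
org.apache.spark package or in Spark's own test sources):
    ```scala
    import org.apache.spark.util.SizeEstimator
    import org.openjdk.jol.info.GraphLayout

    object TupleSizeCheck {
      def main(args: Array[String]): Unit = {
        val t = (1, 2) // specialized Tuple2$mcII$sp at runtime
        // What our SizeEstimator thinks the tuple occupies.
        println(s"SizeEstimator.estimate : ${SizeEstimator.estimate(t)} bytes")
        // What the JVM actually lays out, as measured by JOL.
        println(s"JOL measured size      : ${GraphLayout.parseInstance(t).totalSize()} bytes")
      }
    }
    ```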


