Github user advancedxy commented on the pull request:
https://github.com/apache/spark/pull/4783#issuecomment-83878165
@rxin as I mentioned in the previous comment, I used the method from this
article:
http://www.javaworld.com/article/2077496/testing-debugging/java-tip-130--do-you-know-your-data-size-.html
The related code is listed below. The Scala library must be on the
classpath.
```
// ----------------------------------------------------------------------------
/**
* A simple class to experiment with your JVM's garbage collector
* and memory sizes for various data types.
*
* @author <a href="mailto:[email protected]">Vladimir Roubtsov</a>
*/
import scala.Tuple2;
import scala.Tuple2$mcII$sp;
public class Sizeof
{
public static class DummyClass1 {}
public static class DummyClass2 extends DummyClass1 {
public boolean x;
}
public static class DummyClass3 extends DummyClass2 {
public boolean y;
}
public static void main (String [] args) throws Exception
{
// "warm up" all classes/methods that we are going to use:
runGC ();
usedMemory ();
// array to keep strong references to allocated objects:
final int count = 10000; // 10000 or so is enough for small objects
Object [] objects = new Object [count];
long heap1 = 0;
// allocate count+1 objects, discard the first one:
for (int i = -1; i < count; ++ i)
{
Object object;
// INSTANTIATE YOUR DATA HERE AND ASSIGN IT TO 'object':
object = new Tuple2(1, 2); // This gives 24 bytes. 64 bit vm
// object = new Tuple2$mcII$sp(1, 2); // This gives 32 bytes. 64 bit vm
// This gives 56 bytes per object.
// 24B(Tuple2) + 16B(Integer)+ 16B(Integer) = 56B
// object = new Tuple2(2 * i + 1, 2 * i + 2);
if (i >= 0)
objects [i] = object;
else
{
object = null; // discard the "warmup" object
runGC ();
heap1 = usedMemory (); // take a "before" heap snapshot
}
}
runGC ();
long heap2 = usedMemory (); // take an "after" heap snapshot:
final int size = Math.round (((float)(heap2 - heap1))/count);
System.out.println ("'before' heap: " + heap1 +
", 'after' heap: " + heap2);
System.out.println ("heap delta: " + (heap2 - heap1) +
", {" + objects [0].getClass () + "} size = " + size + " bytes");
}
// a helper method for creating Strings of desired length
// and avoiding getting tricked by String interning:
public static String createString (final int length)
{
final char [] result = new char [length];
for (int i = 0; i < length; ++ i) result [i] = (char) i;
return new String (result);
}
// this is our way of requesting garbage collection to be run:
// [how aggressive it is depends on the JVM to a large degree, but
// it is almost always better than a single Runtime.gc() call]
private static void runGC () throws Exception
{
// for whatever reason it helps to call Runtime.gc()
// using several method calls:
for (int r = 0; r < 4; ++ r) _runGC ();
}
private static void _runGC () throws Exception
{
long usedMem1 = usedMemory (), usedMem2 = Long.MAX_VALUE;
for (int i = 0; (usedMem1 < usedMem2) && (i < 1000); ++ i)
{
s_runtime.runFinalization ();
s_runtime.gc ();
Thread.yield ();
usedMem2 = usedMem1;
usedMem1 = usedMemory ();
}
}
private static long usedMemory ()
{
return s_runtime.totalMemory () - s_runtime.freeMemory ();
}
private static final Runtime s_runtime = Runtime.getRuntime ();
} // end of class
// ----------------------------------------------------------------------------
```
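The gap between the 24-byte and 56-byte figures in the code comments comes down to boxing. Here is a minimal sketch, assuming a stock JVM where autoboxing goes through `Integer.valueOf` and small values (-128 to 127) are served from a shared cache. The class name `BoxingCacheDemo` and the helper `sharesBox` are only for illustration.

```java
public class BoxingCacheDemo {
    // Returns true when two boxings of the same int yield the same object.
    static boolean sharesBox(int value) {
        Integer a = value; // autoboxing compiles to Integer.valueOf(value)
        Integer b = value;
        return a == b;     // reference comparison on purpose
    }

    public static void main(String[] args) {
        // 1 and 2 fall inside the cached range, so every Tuple2(1, 2) points
        // at the same two shared Integer objects and only the tuple itself
        // contributes to the heap delta.
        System.out.println("1 shares box: " + sharesBox(1));
        System.out.println("2 shares box: " + sharesBox(2));
        // Values outside the cached range are usually boxed into fresh
        // objects, which is why Tuple2(2 * i + 1, 2 * i + 2) costs more.
        System.out.println("20000 shares box: " + sharesBox(20000));
    }
}
```

Only the results for 1 and 2 are guaranteed by the language. The result for larger values depends on JVM settings, so it is best read from an actual run.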
I took another look at the result. I think I was fooled by the result for
Tuple2(1, 2): it gives 24 bytes, but javac doesn't pick the specialized
class. Tuple2$mcII$sp(1, 2) does give 32 bytes. So (1, 2) in Scala on a
64-bit JVM does take 32 bytes. We get that right.
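As a rough cross-check of these numbers, the sizes can be reproduced with simple layout arithmetic. This is only a sketch under assumptions that were not measured here: a 64-bit HotSpot JVM with compressed oops (12-byte object header, 4-byte references, 8-byte alignment), and a specialized tuple that keeps the two inherited reference fields and adds two int fields. The class `LayoutEstimate` and its helpers are hypothetical.

```java
public class LayoutEstimate {
    // Assumed layout constants for 64-bit HotSpot with compressed oops.
    static final int HEADER = 12; // mark word + compressed class pointer
    static final int REF = 4;     // compressed reference
    static final int INT = 4;     // primitive int

    // Round a size up to the next multiple of 8 bytes.
    static int alignTo8(int bytes) {
        return (bytes + 7) & ~7;
    }

    // Estimated shallow size of an object with the given field counts.
    static int objectSize(int refFields, int intFields) {
        return alignTo8(HEADER + refFields * REF + intFields * INT);
    }

    public static void main(String[] args) {
        int tuple2 = objectSize(2, 0);      // _1 and _2 as references
        int boxedInt = objectSize(0, 1);    // java.lang.Integer holds one int
        int specialized = objectSize(2, 2); // assumed: 2 inherited refs + 2 ints
        System.out.println("Tuple2 (shallow): " + tuple2);
        System.out.println("Tuple2$mcII$sp: " + specialized);
        System.out.println("Tuple2 + 2 boxed Integers: " + (tuple2 + 2 * boxedInt));
    }
}
```

Under those assumptions the estimates agree with the measured 24, 32, and 56 bytes. A JVM running without compressed oops would give different numbers.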
The last thing left is to figure out why the ExternalSort keeps failing
tests.