Github user cxzl25 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21311#discussion_r190146533
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala
---
@@ -254,6 +254,30 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
map.free()
}
+ test("LongToUnsafeRowMap with big values") {
+ val taskMemoryManager = new TaskMemoryManager(
+ new StaticMemoryManager(
+ new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
+ Long.MaxValue,
+ Long.MaxValue,
+ 1),
+ 0)
+ val unsafeProj = UnsafeProjection.create(Array[DataType](StringType))
+ val map = new LongToUnsafeRowMap(taskMemoryManager, 1)
+
+ val key = 0L
+ // the page array is initialized with length 1 << 17 (1M bytes),
+ // so here we need a value larger than 1 << 18 (2M bytes), to trigger the bug
+ val bigStr = UTF8String.fromString("x" * (1 << 22))
--- End diff --
LongToUnsafeRowMap#getRow
resultRow=UnsafeRow#pointTo(page(1<<18), baseOffset(16), sizeInBytes(1<<21+16))
UTF8String#getBytes
copyMemory(base(page), offset, bytes, BYTE_ARRAY_OFFSET, numBytes(1<<21+16));
When the sizes are similar, the original value can sometimes still be read back.
With SPARK-10399 in place, UnsafeRow#getUTF8String checks the size at this
point.
If we pick (1 << 18) + 1, the bug reproduces 100% of the time.
But when this patch is not applied, a size difference that is too small
sometimes does not trigger the bug.
So I chose a larger value.
My understanding may be wrong. Please advise. Thank you.
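For reference, the sizes under discussion can be worked out with a small sketch. It assumes the page is a `long[]` (8 bytes per element), which is what the byte counts in the test comment imply; the class and helper names are hypothetical and not part of Spark.

```java
// Hypothetical helper class for illustration only; not Spark code.
class PageSizes {
  // Bytes occupied by the elements of a long[] of the given length.
  static long longArrayBytes(long length) {
    return length * Long.BYTES;
  }

  public static void main(String[] args) {
    System.out.println(longArrayBytes(1L << 17)); // 1048576 (1 MiB initial page)
    System.out.println(longArrayBytes(1L << 18)); // 2097152 (2 MiB threshold)
    System.out.println(1L << 22);                 // 4194304 (bytes in the ASCII test string)
    // '+' binds tighter than '<<', so the parentheses matter in real code.
    System.out.println((1 << 18) + 1);            // 262145
    System.out.println(1 << 18 + 1);              // 524288, i.e. 1 << 19
  }
}
```

The last two lines are only a caution: "1 << 18 + 1" in the text above is meant as one past 1 << 18, which needs parentheses if it is ever typed into a test.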
```java
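import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class CopyMemoryOverrun {
  public static void main(String[] args) throws Exception {
    // Obtain the Unsafe singleton reflectively; fail fast instead of continuing with null.
    Field unsafeField = Unsafe.class.getDeclaredField("theUnsafe");
    unsafeField.setAccessible(true);
    Unsafe unsafe = (Unsafe) unsafeField.get(null);

    // Offset of element 0 in a byte[]; typically 16 on 64-bit HotSpot with compressed class pointers.
    long base = Unsafe.ARRAY_BYTE_BASE_OFFSET;
    byte[] src = "xxxxx".getBytes();
    byte[] dst = new byte[3];    // deliberately smaller than src
    byte[] newDst = new byte[5];
    // Deliberately writes 5 bytes into a 3-byte array: an unchecked, out-of-bounds write.
    unsafe.copyMemory(src, base, dst, base, src.length);
    // Reads 5 bytes starting at dst, including the 2 bytes past its end.
    unsafe.copyMemory(dst, base, newDst, base, src.length);
    System.out.println("dst:" + new String(dst));
    System.out.println("newDst:" + new String(newDst));
  }
}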
```
output:
>dst:xxx
>newDst:xxxxx
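
For contrast, here is a minimal sketch of what a size check before the copy buys us: the overrun becomes a deterministic failure instead of depending on what happens to sit after the array. It uses plain arrays and `System.arraycopy`; `checkedCopy` is a hypothetical helper for illustration, not the check that Spark performs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Spark code.
class CheckedCopy {
  // Copies numBytes from src to dst, rejecting any length that exceeds either array.
  static void checkedCopy(byte[] src, byte[] dst, int numBytes) {
    if (numBytes < 0 || numBytes > src.length || numBytes > dst.length) {
      throw new IllegalArgumentException(
          "numBytes=" + numBytes + " exceeds src=" + src.length + " or dst=" + dst.length);
    }
    System.arraycopy(src, 0, dst, 0, numBytes);
  }

  public static void main(String[] args) {
    byte[] src = "xxxxx".getBytes(StandardCharsets.US_ASCII);
    byte[] dst = new byte[3];
    try {
      // Same shape as the Unsafe example: 5 bytes into a 3-byte array.
      checkedCopy(src, dst, src.length);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With a check like this, even a one-byte difference fails every time, which matches the observation that (1 << 18) + 1 reproduces reliably once the size is validated.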