GitHub user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7789#discussion_r35904128
--- Diff: unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -141,7 +154,24 @@ public int numChars() {
* Returns a 64-bit integer that can be used as the prefix used in sorting.
*/
public long getPrefix() {
- long p = PlatformDependent.UNSAFE.getLong(base, offset);
+ // Since JVMs are either 4-byte aligned or 8-byte aligned, we check the size of the string.
+ // If size is 0, just return 0.
+ // If size is between 0 and 4 (inclusive), assume data is 4-byte aligned under the hood and
+ // use a getInt to fetch the prefix.
+ // If size is greater than 4, assume we have at least 8 bytes of data to fetch.
+ // After getting the data, we use a mask to mask out data that is not part of the string.
+ long p;
+ if (numBytes >= 8) {
--- End diff --
Yup that's a good idea.
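
For readers following the thread, the strategy described in the diff comment can be sketched outside of Spark. This is only an illustration, not the code in the pull request. It assumes a plain byte array read through java.nio.ByteBuffer instead of PlatformDependent.UNSAFE, and it reads in big-endian order so that an unsigned comparison of two prefixes agrees with byte-wise order. The names PrefixSketch and prefix are hypothetical. As in the comment, it assumes at least 4 readable bytes for strings of 1 to 4 bytes and at least 8 readable bytes for longer ones.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; not Spark's UTF8String.getPrefix().
class PrefixSketch {

    // Returns a 64-bit sort prefix for the numBytes bytes starting at offset.
    // Assumption: buf has at least 4 readable bytes at offset when
    // 1 <= numBytes <= 4, and at least 8 readable bytes when numBytes > 4.
    static long prefix(byte[] buf, int offset, int numBytes) {
        if (numBytes == 0) {
            return 0L;
        }
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.BIG_ENDIAN);
        if (numBytes >= 8) {
            // All 8 bytes belong to the string, so no mask is needed.
            return bb.getLong(offset);
        }
        long p;
        if (numBytes > 4) {
            // Read a full word; the trailing bytes may not be string data.
            p = bb.getLong(offset);
        } else {
            // Read 4 bytes and move them into the high half of the prefix.
            p = ((long) bb.getInt(offset)) << 32;
        }
        // Keep only the top numBytes bytes and zero out everything else.
        long mask = -1L << (8 * (8 - numBytes));
        return p & mask;
    }

    public static void main(String[] args) {
        // "abc" followed by bytes that are not part of the string.
        byte[] buf = {'a', 'b', 'c', (byte) 0xFF, (byte) 0xFF,
                      (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        // prints 6162630000000000
        System.out.println(Long.toHexString(prefix(buf, 0, 3)));
    }
}
```

Prefixes produced this way would need to be compared as unsigned values, for example with Long.compareUnsigned, for the ordering to match the byte order of the strings.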