yaooqinn commented on a change in pull request #31960:
URL: https://github.com/apache/spark/pull/31960#discussion_r601571505



##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetDictionary.java
##########
@@ -61,6 +63,14 @@ public double decodeToDouble(int id) {
 
   @Override
   public byte[] decodeToBinary(int id) {
-    return dictionary.decodeToBinary(id).getBytes();
+    if (needTransform) {
+      // For unsigned int64, it stores as dictionary encoded signed int64 in Parquet
+      // whenever dictionary is available.
+      // Here we lazily decode it to the original signed long value then convert to decimal(20, 0).
+      long signed = dictionary.decodeToLong(id);
+      return new BigInteger(Long.toUnsignedString(signed)).toByteArray();

Review comment:
       Use `BigInteger` on the caller side instead; see:
    
https://github.com/apache/spark/blob/a418548dad57775fbb10b4ea690610bad1a8bfb0/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java#L371-375
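For reference, a minimal standalone sketch (not code from the PR) of the conversion the diff performs: an unsigned int64 that Parquet returns as a signed `long` is widened to a `BigInteger`, and `toByteArray()` yields its big-endian two's-complement bytes. The class and method names are hypothetical. The sketch also shows an arithmetic alternative to the `Long.toUnsignedString` round trip (add 2^64 when the signed value is negative); both produce the same value. The largest unsigned int64, 2^64 - 1 = 18446744073709551615, has 20 decimal digits, which is why decimal(20, 0) is wide enough.

```java
import java.math.BigInteger;

public class UnsignedLongToDecimalBytes {
  // 2^64: adding it to a negative signed long recovers the unsigned value.
  private static final BigInteger TWO_TO_64 = BigInteger.ONE.shiftLeft(64);

  // Same approach as the diff: format as an unsigned decimal string, then parse.
  public static BigInteger viaString(long signed) {
    return new BigInteger(Long.toUnsignedString(signed));
  }

  // Hypothetical alternative that avoids allocating an intermediate String.
  public static BigInteger viaArithmetic(long signed) {
    BigInteger value = BigInteger.valueOf(signed);
    return signed >= 0 ? value : value.add(TWO_TO_64);
  }

  public static void main(String[] args) {
    long[] samples = {0L, 1L, Long.MAX_VALUE, Long.MIN_VALUE, -1L};
    for (long s : samples) {
      BigInteger a = viaString(s);
      BigInteger b = viaArithmetic(s);
      // toByteArray() returns the minimal two's-complement encoding, so values
      // >= 2^63 need 9 bytes (a leading 0x00 sign byte is added).
      System.out.println(s + " -> " + a
          + " (" + a.toByteArray().length + " bytes), same=" + a.equals(b));
    }
  }
}
```

If the caller builds the `BigInteger` itself, as suggested above, the dictionary side would only need `decodeToLong`; how that is wired into the column vector is not shown here.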




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]