Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r200181749
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -82,6 +120,30 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith:
      (n: String, v: Any) => FilterApi.eq(
        intColumn(n),
        Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull)
+
+    case ParquetSchemaType(DECIMAL, INT32, decimal) if pushDownDecimal =>
+      (n: String, v: Any) => FilterApi.eq(
+        intColumn(n),
+        Option(v).map(_.asInstanceOf[JBigDecimal].unscaledValue().intValue()
+          .asInstanceOf[Integer]).orNull)
+    case ParquetSchemaType(DECIMAL, INT64, decimal) if pushDownDecimal =>
+      (n: String, v: Any) => FilterApi.eq(
+        longColumn(n),
+        Option(v).map(_.asInstanceOf[JBigDecimal].unscaledValue().longValue()
+          .asInstanceOf[java.lang.Long]).orNull)
+    // Legacy DecimalType
+    case ParquetSchemaType(DECIMAL, FIXED_LEN_BYTE_ARRAY, decimal) if pushDownDecimal &&
--- End diff ---
The binary used for the legacy type and for fixed-length storage should be the
same, so I don't understand why there are two different conversion methods.
Also, because this is using the Parquet schema now, there's no need to base the
length of this binary on what older versions of Spark did -- in other words, if
the underlying Parquet type is fixed-length, then just convert the decimal to a
binary of that fixed size without worrying about legacy types.

I think this should pass in the fixed array's length and convert the BigDecimal
value to an array of that length in all cases. That works no matter what the
file contains.
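
For illustration, here is a minimal Scala sketch of that single conversion path. The helper name `decimalToFixedLenByteArray` and its wiring are hypothetical, not code from this PR; it only shows the idea of sign-extending the decimal's unscaled bytes to the length declared by the Parquet FIXED_LEN_BYTE_ARRAY type:

```scala
import java.math.{BigDecimal => JBigDecimal}

import org.apache.parquet.io.api.Binary

object DecimalBinarySketch {
  // Convert a decimal's unscaled value into a big-endian, two's-complement
  // byte array of exactly `numBytes` bytes, where `numBytes` comes from the
  // file's FIXED_LEN_BYTE_ARRAY type length.
  def decimalToFixedLenByteArray(decimal: JBigDecimal, numBytes: Int): Binary = {
    val unscaled = decimal.unscaledValue().toByteArray
    require(unscaled.length <= numBytes,
      s"Decimal $decimal does not fit in $numBytes bytes")
    val fixed = new Array[Byte](numBytes)
    // Sign-extend: pad the leading bytes with 0xFF for negative values, 0x00 otherwise.
    val signByte: Byte = if (unscaled(0) < 0) -1 else 0
    java.util.Arrays.fill(fixed, 0, numBytes - unscaled.length, signByte)
    System.arraycopy(unscaled, 0, fixed, numBytes - unscaled.length, unscaled.length)
    Binary.fromConstantByteArray(fixed)
  }
}
```

Because the byte representation is the same either way, a single helper like this could serve both the legacy decimal layout and the standard fixed-length layout, which is the point of the comment above.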
---