LuciferYang opened a new pull request, #55890: URL: https://github.com/apache/spark/pull/55890
### What changes were proposed in this pull request?

`DowncastLongUpdater` (selected for reading INT64 DECIMAL columns into a Spark target whose precision is `<= 9`) targets a 32-bit decimal column vector, which is backed by `intData[]`; `longData[]` is unallocated. Its `decodeSingleDictionaryId` previously called `values.putLong(...)`, which threw a `NullPointerException` as soon as that path was actually exercised.

The fix narrows the dictionary's long value to int with the same `(int) longValue` cast already used by `readValue` and `readValues`:

```java
values.putInt(offset, (int) dictionary.decodeToLong(dictionaryIds.getDictId(offset)));
```

### Why are the changes needed?

This is a latent bug going back to SPARK-35640 (Jun 2021). It went undetected because the path is only reachable when all of the following hold:

1. The Parquet column is stored as **INT64** with logical type **DECIMAL(precision <= 9)**. Spark's own writer never produces this form (it emits INT32 for `DECIMAL(p<=9)`); only external writers (Hive, Impala, ...) emit it.
2. The Spark read schema targets a **DecimalType with precision <= 9**, so the factory routes to `DowncastLongUpdater`.
3. The vectorized reader has to **eagerly drain** dictionary IDs, for example when parquet-mr starts dictionary-encoded and then falls back to PLAIN mid-column.

The normal lazy-dictionary path (where decoding happens at row read time via `ParquetDictionary`) bypasses this updater method entirely, which is why everyday workloads never hit it.

### Does this PR introduce _any_ user-facing change?

Yes. Reads that previously failed with a `NullPointerException` now succeed and return the correct values.

### How was this patch tested?

Added a regression test in `ParquetIOSuite` that writes INT64 DECIMAL(9, 2) via parquet-mr's low-level writer with a mixed-cardinality pattern (80% from a 4-value pool, 20% unique-per-row, 5000 rows). This forces the dictionary-to-PLAIN fallback that triggers the eager-decode path.
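
For readers who want a concrete picture of the mixed-cardinality pattern, here is a rough sketch of how such a column of unscaled values could be generated. It is not the code from the regression test: the class name, the pool values, the unique-value offset, and the choice of making every fifth row unique are all assumptions made for illustration. Whether data like this actually triggers the dictionary-to-PLAIN fallback also depends on writer settings (such as the dictionary page size), which the sketch does not model.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a hypothetical generator, not the code in ParquetIOSuite.
final class MixedCardinalitySketch {
    static final int ROWS = 5000;

    // Hypothetical 4-value pool of unscaled DECIMAL(9, 2) values (1.00, 2.00, ...).
    static final long[] POOL = {100L, 200L, 300L, 400L};

    // Returns ROWS unscaled values: 4 out of every 5 rows come from POOL,
    // and every fifth row carries a value that appears nowhere else.
    static long[] generate() {
        long[] out = new long[ROWS];
        for (int i = 0; i < ROWS; i++) {
            if (i % 5 == 4) {
                // Unique per row; stays far below 999_999_999, so it fits precision 9.
                out[i] = 1_000_000L + i;
            } else {
                out[i] = POOL[i % POOL.length];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = generate();
        Set<Long> distinct = new HashSet<>();
        for (long v : values) {
            distinct.add(v);
        }
        System.out.println("rows=" + values.length + ", distinct=" + distinct.size());
    }
}
```

The repeated pool values give the writer a reason to start with dictionary encoding, while the steady stream of unique values keeps growing the dictionary.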
The test NPE'd on master without this fix and now passes.

```
build/sbt 'sql/testOnly *ParquetIOSuite'
...
[info] Tests: succeeded 92, failed 0, canceled 0, ignored 0, pending 0
```

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code
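
The failure mode described under "What changes were proposed" can be reproduced in miniature. The sketch below uses a toy vector class, not Spark's actual column vector or updater; `ToyDecimal32Vector`, `DowncastSketch`, and the dictionary contents are hypothetical names and values chosen for illustration. It only assumes what the description states: the 32-bit decimal vector has int storage allocated and long storage left null.

```java
import java.math.BigDecimal;

// Toy stand-in for a 32-bit decimal column vector; NOT Spark's real class.
final class ToyDecimal32Vector {
    private final int[] intData;           // allocated: this is where values live
    private final long[] longData = null;  // deliberately unallocated

    ToyDecimal32Vector(int capacity) {
        intData = new int[capacity];
    }

    void putInt(int rowId, int value) {
        intData[rowId] = value;
    }

    void putLong(int rowId, long value) {
        longData[rowId] = value;  // throws NullPointerException: longData is null
    }

    int getInt(int rowId) {
        return intData[rowId];
    }
}

final class DowncastSketch {
    // Hypothetical dictionary of unscaled DECIMAL(9, 2) values stored as INT64.
    static final long[] DICTIONARY = {12345L, -999999999L, 0L, 999999999L};

    // Mirrors the pre-fix call shape: write a long into a vector with no long storage.
    static boolean buggyDecodeThrows(ToyDecimal32Vector values, int offset, int dictId) {
        try {
            values.putLong(offset, DICTIONARY[dictId]);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Mirrors the fix: narrow to int. Unscaled values of precision <= 9 have
    // magnitude at most 999,999,999, which is within the int range.
    static void fixedDecode(ToyDecimal32Vector values, int offset, int dictId) {
        values.putInt(offset, (int) DICTIONARY[dictId]);
    }

    public static void main(String[] args) {
        ToyDecimal32Vector values = new ToyDecimal32Vector(DICTIONARY.length);
        System.out.println("putLong throws NPE: " + buggyDecodeThrows(values, 0, 0));
        for (int i = 0; i < DICTIONARY.length; i++) {
            fixedDecode(values, i, i);
        }
        // Reinterpret the stored int as a decimal with scale 2.
        System.out.println(BigDecimal.valueOf(values.getInt(0), 2));
    }
}
```

The cast is safe here only because the target precision bounds the value range; the same cast on an arbitrary long would silently truncate.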
