Hi,

I tested it with the versions you mentioned (Hadoop 3.3.4 and Spark 3.5.7 with Scala 2.12), and Spark could read the file locally without issues.
For the native libs (libhadoop.so and libzstd), I used the ones from CDP 7.1.9 (probably older than yours). I also tried it with the CDP versions (Hadoop 3.3.1 and Spark 3.3.2) and it worked fine too.

Cheers,
Ángel

On Wed, Dec 3, 2025 at 4:36, FengYu Cao (<[email protected]>) wrote:

> Hi all,
>
> I would like to ask whether there are any known or potential workarounds
> on the Spark side for a reproducible failure in Hadoop's native ZSTD
> decompression. The issue appears to be triggered specifically when the
> original (uncompressed) file size is smaller than 129 KiB.
>
> Environment:
> - Apache Spark 3.5.7 (Scala 2.12) with Hadoop 3.3.4
> - libhadoop.so from Apache Hadoop 3.3.6
> - libzstd 1.5.4
>
> Summary of the problem:
> When Spark reads a ZSTD-compressed file through Hadoop's native
> ZStandardDecompressor, the following errors can be reproduced reliably:
>
> 1. For files whose original size is < 129 KiB:
>    java.lang.InternalError: Src size is incorrect
>
> 2. Under a slightly different sequence of reads:
>    java.lang.InternalError: Restored data doesn't match checksum
>
> These errors occur even though the ZSTD files are valid and can be
> decompressed normally with the `zstd` CLI tools.
>
> Reproduction procedure:
> 1. `yes a | head -n 65536 > file_128KiB.txt` (128 KiB)
> 2. `zstd file_128KiB.txt`
> 3. Validate with `zstd -lv` and `zstdcat`.
> 4. In PySpark:
>    `spark.read.text("hdfs://dhome/camepr42/test_zstd/file_128KiB.txt.zst").show()`
> 5. The executor raises `InternalError: Src size is incorrect`.
>
> A second sequence involving both 129 KiB and 128 KiB files can reproduce
> `InternalError: Restored data doesn't match checksum`.
>
> Details, including stack traces and command steps, are in my comment on
> the Hadoop JIRA: https://issues.apache.org/jira/browse/HADOop-18799
>
> Thanks
> --
> *camper42*
> Douban, Inc.
>
> E-mail: [email protected]
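For anyone double-checking the reported boundary, here is a quick sanity check of the file-size arithmetic in the reproduction steps (my own sketch, not part of the original report): `yes a | head -n 65536` emits 65536 lines of "a\n", two bytes each, which lands exactly on 128 KiB, just under the reported 129 KiB failure threshold.

```python
# Sketch: verify the test file from the repro steps falls in the failing range.
lines = 65536
original_size = lines * 2          # each line is "a\n" = 2 bytes -> 131072 bytes
threshold = 129 * 1024             # reported failure boundary: original size < 129 KiB

print(original_size)               # 131072
print(original_size // 1024)       # 128 (KiB exactly)
print(original_size < threshold)   # True: below the reported boundary
```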
