jiangjiguang commented on PR #40646: URL: https://github.com/apache/spark/pull/40646#issuecomment-1495204917
> > Add configuration spark.sql.parquet.vector512.read.enabled. If true and CPU contains avx512vbmi & avx512_vbmi2 instruction set, parquet decodes using Java Vector API. For Intel CPU, Ice Lake or newer contains the required instruction set.
>
> hmm... what happens if machines that support AVX 512 are prohibited from using AVX 512?

@LuciferYang this is a good question. As far as I know, Cascade Lake is currently used by most users. Compared with Ice Lake, Cascade Lake is Intel's previous-generation CPU, and it does not contain the avx512vbmi & avx512_vbmi2 instruction sets. If the parquet vector optimization runs on Cascade Lake, it becomes very slow (0.01x), but there is a JDK [patch](https://bugs.openjdk.org/browse/JDK-8290322) that fixes this problem; with the patch, it is 1.7x. Under normal conditions, it is 5.5x on Ice Lake with Java 17. The patch has been merged into Java 17.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
