[GitHub] [spark] jiangjiguang commented on pull request #40646: [WIP][SPARK-42696]Speed up parquet reading with Java Vector API

via GitHub Mon, 03 Apr 2023 18:17:10 -0700


jiangjiguang commented on PR #40646:
URL: https://github.com/apache/spark/pull/40646#issuecomment-1495204917


   > > Add configuration spark.sql.parquet.vector512.read.enabled. If true and 
the CPU supports the avx512vbmi and avx512_vbmi2 instruction sets, Parquet 
decoding uses the Java Vector API. On Intel CPUs, Ice Lake or newer provides 
the required instruction sets.
   > 
   > hmm... what happens if machines that support AVX 512 are prohibited from 
using AVX 512?
   
   @LuciferYang this is a good question. As far as I know, Cascade Lake is 
what most users currently run. Cascade Lake is Intel's previous generation 
relative to Ice Lake, and it does not contain the avx512vbmi and avx512_vbmi2 
instruction sets. If the Parquet vector optimization runs on Cascade Lake, 
it becomes very slow (0.01x). There is a JDK 
[patch](https://bugs.openjdk.org/browse/JDK-8290322) that fixes this problem; 
with the patch, it reaches 1.7x. Under normal conditions, it reaches 5.5x on 
Ice Lake with Java 17. The patch has been merged into Java 17.
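
   To make the gating condition concrete, here is a minimal sketch of the check described in the PR summary: use the vectorized path only when the configuration is enabled and both required CPU flags are present. The class and method names (`Vector512Gate`, `useVector512`) are hypothetical and are not taken from the PR, and the flag strings in `main` are shortened illustrations rather than real `/proc/cpuinfo` output. The sketch assumes the Linux spelling of the flags (`avx512vbmi`, `avx512_vbmi2`) and requires Java 9 or newer for `Set.of`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Spark or of the PR.
final class Vector512Gate {
    // Flag names as Linux reports them on the "flags" line of /proc/cpuinfo.
    private static final Set<String> REQUIRED = Set.of("avx512vbmi", "avx512_vbmi2");

    // True only if every required flag appears as a whole token in the flags line.
    static boolean cpuHasRequiredFlags(String cpuFlagsLine) {
        Set<String> flags = new HashSet<>(Arrays.asList(cpuFlagsLine.trim().split("\\s+")));
        return flags.containsAll(REQUIRED);
    }

    // The vectorized reader is used only when the config is on AND the CPU qualifies.
    static boolean useVector512(boolean configEnabled, String cpuFlagsLine) {
        return configEnabled && cpuHasRequiredFlags(cpuFlagsLine);
    }

    public static void main(String[] args) {
        // Illustrative, shortened flag lists (assumptions, not real machine output).
        String iceLakeLike = "fpu sse2 avx2 avx512f avx512vbmi avx512_vbmi2";
        String cascadeLakeLike = "fpu sse2 avx2 avx512f avx512_vnni";

        System.out.println(useVector512(true, iceLakeLike));      // true
        System.out.println(useVector512(true, cascadeLakeLike));  // false
        System.out.println(useVector512(false, iceLakeLike));     // false
    }
}
```

   With a check of this shape, a Cascade Lake machine would fall back to the existing reader even when the configuration is set to true, so the slow path described above would not be taken by default.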

