Ingo,

We have a `compiler.parallelism` parameter that controls how many cores are used 
for query execution.
See 
https://ci.apache.org/projects/asterixdb/sqlpp/manual.html#Parallelism_parameter
You can either set it per query (e.g., SET `compiler.parallelism` "-1";),
or globally in the cluster configuration:
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/main/resources/cc2.conf#L57
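To make that concrete, a minimal sketch of the per-query form (the dataset name `ds` is just a placeholder; the SET statement is prepended to the query itself):

    SET `compiler.parallelism` "-1";
    SELECT COUNT(*) FROM ds;

and the equivalent cluster-wide setting in the configuration file (section placement follows the linked cc2.conf example; adjust to your setup):

    [common]
    compiler.parallelism = -1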

Thanks,
-- Dmitry


From: Müller Ingo <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, August 9, 2021 at 10:05 AM
To: "[email protected]" <[email protected]>
Subject: Increasing degree of parallelism when reading Parquet files

Dear AsterixDB devs,

I am currently trying out the new support for Parquet files on S3 (still in the 
context of my High-energy Physics use case [1]). This works great so far and 
has generally decent performance. However, I realized that it does not use more 
than 16 cores, even though 96 logical cores are available and even though I run 
long-running queries (several minutes) on large data sets with a large number 
of files (I tried 128 files of 17 GB each). Is this an arbitrary/artificial 
limitation that can be changed somehow (potentially with a small 
patch+recompiling) or is there more serious development required to lift it? 
FYI, I am currently using 03fd6d0f, which should include all S3/Parquet commits 
on master.

Cheers,
Ingo


[1] https://arxiv.org/abs/2104.12615
