Do I hear 30? (And ~1.6 mins?)
On 1/29/18 12:28 AM, Rana Alotaibi wrote:
Thanks Murtadha for your informative email. I now have 15 partitions
(~15 cores were utilized as well), and it helped reduce the
execution time. The query execution time is now ~3.2 mins :).
--Rana
On Sun, Jan 28, 2018 at 8:29 PM, Murtadha Hubail
<[email protected]> wrote:
If reloading the data isn't too much trouble, the first thing I
would do is recreate the instance with more partitions (e.g., one
partition per core or per 2 cores) and check the core
utilization. If this is the same dataset as the one in your
previous email, you mentioned that it was about 10GB per
partition. In that case, you might want to allocate at least 40GB
for the buffer cache, and you can reduce
storage.memorycomponent.globalbudget to get enough memory to
execute the job (depending on the number of partitions you
create). After recreating with a higher number of partitions, don't
use "SET `compiler.parallelism` "39"". The query will automatically
use the number of partitions you create.
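[For reference, a minimal sketch of what such a setup might look like in cc.conf. The section names and key names follow the 0.9.x ncservice docs; the node name and device paths are illustrative assumptions, the 40GB buffer cache follows the suggestion above, and the 8GB global budget is an arbitrary placeholder, not a value from this thread:]

```ini
; Illustrative cc.conf fragment -- paths and the global budget are assumptions.
[nc/asterix_nc1]
; one iodevice per storage partition (here 4, e.g. one per 2 cores on an 8-core box)
iodevices=/mnt/data/p0,/mnt/data/p1,/mnt/data/p2,/mnt/data/p3

[common]
storage.buffercache.size=40GB
storage.memorycomponent.globalbudget=8GB
```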
Regarding the metrics time, it includes the result-printing time,
so if you want to see whether that has any impact, try adding
"LIMIT 1" at the end of your query, or change it to select
count(*) instead of subject_id.
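[For instance, a count(*) sketch of the query from later in this thread. The predicates are unchanged; only the projection changes, so the result is a single row and printing time becomes negligible:]

```sql
USE mimiciii;

SELECT COUNT(*) AS cnt
FROM LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
      E.FLAG = 'abnormal' AND
      I.FLUID = 'Blood' AND
      I.LABEL = 'Haptoglobin';
```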
Cheers,
Murtadha
*From: *Rana Alotaibi <[email protected]>
*Date: *Monday, 29 January 2018 at 6:48 AM
*To: *<[email protected]>
*Cc: *<[email protected]>, <[email protected]>
*Subject: *Re: Hyracks Job Requirement Configuration
*- Do you see all cores being fully utilized during the query
execution? *
I noticed that only 6 cores were utilized.
*- How much time does the query take right now and how do you
measure the query execution time? Do you wait for the result to be
printed somewhere (e.g. in the browser)?*
I'm using the HTTP API. The response is a JSON object that
includes the query execution time:
{
  "status": "success",
  "metrics": {
    "elapsedTime": "434.627299814s",
    "executionTime": "434.626137977s",
    "resultCount": 4943,
    "resultSize": 132293,
    "processedObjects": 46875
  }
}
I ran the query 10 times and took the average, which is ~6 mins.
*- You mentioned that you have 4 partitions, how many physical
hard drives are they mapped to?*
One physical hard drive.
*- Also, increasing the sort/join memory doesn’t necessarily lead
to a better performance. Have you tried changing these values to
something smaller and seeing the effects?*
Yes, I tried the following settings:
1) sort memory: 32MB, join memory: 64MB
2) sort memory: 64MB, join memory: 128MB
3) sort memory: 128MB, join memory: 265MB
The execution time remained on average ~6 to 6.5 mins; I didn't
see any improvement. The configuration that I have now:
- compiler.parallelism: 39 // only 6 cores were utilized
- storage.buffercache.size: 20GB
- storage.buffercache.pagesize: 1MB
Thanks,
Rana
On Sun, Jan 28, 2018 at 6:41 PM, Murtadha Hubail
<[email protected]> wrote:
I have a few questions, if you don't mind:
Do you see all cores being fully utilized during the query
execution?
How much time does the query take right now and how do you
measure the query execution time? Do you wait for the result
to be printed somewhere (e.g. in the browser)?
You mentioned that you have 4 partitions, how many physical
hard drives are they mapped to?
Also, increasing the sort/join memory doesn’t necessarily lead
to a better performance. Have you tried changing these values
to something smaller and seeing the effects?
Cheers,
Murtadha
*From: *Rana Alotaibi <[email protected]>
*Date: *Monday, 29 January 2018 at 5:21 AM
*To: *<[email protected]>
*Cc: *<[email protected]>, <[email protected]>
*Subject: *Re: Hyracks Job Requirement Configuration
Thanks Murtadha! The problem is solved. However, increasing the
number of cores didn't improve the performance of that
query.
On Sun, Jan 28, 2018 at 5:05 PM, Murtadha Hubail
<[email protected]> wrote:
Hi Rana,
The memory used for query processing is automatically
calculated as follows:
JVM Max Memory - storage.buffercache.size -
storage.memorycomponent.globalbudget
The defaults documented for these parameters are
outdated. The default value of storage.buffercache.size
is (JVM Max Memory / 4), and the same holds for
storage.memorycomponent.globalbudget. Since your dataset
is already loaded, you could reduce the budget of
storage.memorycomponent.globalbudget. In addition, if I
recall correctly, your dataset size is way smaller than
what's allocated for the buffer cache, so you might want
to reduce the buffer cache budget. That should give you
more than enough memory to execute on 39 cores.
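[As a worked example of the formula above -- the 100GB max heap here is an illustrative assumption, not a value from this thread:]

```
query processing memory = JVM Max Memory
                          - storage.buffercache.size
                          - storage.memorycomponent.globalbudget

e.g. with a 100GB max heap and the (JVM Max Memory / 4) defaults:
     100GB - 25GB - 25GB = 50GB available for query processing
```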
Cheers,
Murtadha
On 01/29/2018, 3:30 AM, "Mike Carey"
<[email protected]> wrote:
+ dev
On 1/28/18 3:37 PM, Rana Alotaibi wrote:
> Hi all,
>
> I would like to make AsterixDB utilize all 39 available
> CPU cores that I have for the following query:
>
> USE mimiciii;
> SET `compiler.parallelism` "39";
> SET `compiler.sortmemory` "128MB";
> SET `compiler.joinmemory` "265MB";
> SELECT P.SUBJECT_ID
> FROM LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
> WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
> E.FLAG = 'abnormal' AND
> I.FLUID='Blood' AND
> I.LABEL='Haptoglobin'
>
>
> The total memory size that I have is 125GB (57GB for the
> AsterixDB buffer cache). By running the above query, I got
> the following error:
>
> "msg": "HYR0009: Job requirement (memory: 10705403904 bytes,
> CPU cores: 39) exceeds capacity (memory: 3258744832 bytes,
> CPU cores: 39)"
>
> How can I change this default capacity configuration? I'm
> looking into this page:
> https://asterixdb.apache.org/docs/0.9.2/ncservice.html.
> Could you please point me to the appropriate configuration
> parameter?
>
> Thanks
> -- Rana