Re: Hyracks Job Requirement Configuration

Michael Carey Mon, 29 Jan 2018 08:24:04 -0800

Do I hear 30?  (And ~1.6 mins?)


On 1/29/18 12:28 AM, Rana Alotaibi wrote:

Thanks, Murtadha, for your informative email. I now have 15 partitions (~15 cores were utilized as well), and that helped reduce the execution time. The query execution time is now ~3.2 mins :).


--Rana
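Carey's "~1.6 mins" for 30 partitions is the ideal linear-scaling estimate: doubling the partition count from 15 to 30 halves the 3.2 min runtime. The following is a minimal Python sketch of that back-of-the-envelope arithmetic. `ideal_time` is a hypothetical helper. The actual speedup may be lower, for example because all partitions here share one physical hard drive.

```python
def ideal_time(t_old: float, p_old: int, p_new: int) -> float:
    """Predict execution time assuming it scales inversely with partition count.

    Hypothetical helper: this is the ideal (perfectly linear) speedup model,
    not a measured or guaranteed AsterixDB behaviour.
    """
    if p_old <= 0 or p_new <= 0:
        raise ValueError("partition counts must be positive")
    return t_old * p_old / p_new


if __name__ == "__main__":
    # 15 partitions took ~3.2 min; ideal scaling to 30 partitions:
    print(f"{ideal_time(3.2, 15, 30):.1f} min")  # 1.6 min
```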

On Sun, Jan 28, 2018 at 8:29 PM, Murtadha Hubail <[email protected]> wrote:


    If reloading the data isn’t too much trouble, the first thing I
    would do is recreate the instance with more partitions (e.g. one
    partition per core or one partition per 2 cores) and check the
    core utilization. If this is the same dataset as the one in your
    previous email, you mentioned that it was about 10GB per
    partition; in that case, you might want to allocate at least 40GB
    for the buffer cache, and you can reduce
    storage.memorycomponent.globalbudget to get enough memory to
    execute the job (depending on the number of partitions you
    create). After recreating with a higher number of partitions,
    don’t use “SET `compiler.parallelism` "39"”. The query will
    automatically use the number of partitions you create.
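The sizing advice above reduces to two small calculations. The first is a partition count of one per core or one per two cores. The second is a buffer cache large enough to hold the dataset, presumably so it stays in memory (4 partitions × ~10GB = 40GB). The following is a minimal Python sketch under those assumptions. The helper names are hypothetical, and the code only restates the rule of thumb from this email. It does not call any AsterixDB API.

```python
def partition_options(cores: int) -> dict:
    """Candidate partition counts: one per core, or one per two cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return {"per_core": cores, "per_2_cores": max(1, cores // 2)}


def min_buffer_cache_gb(gb_per_partition: float, partitions: int) -> float:
    """Buffer cache size (GB) needed to hold the whole dataset in memory."""
    return gb_per_partition * partitions


if __name__ == "__main__":
    print(partition_options(39))       # {'per_core': 39, 'per_2_cores': 19}
    print(min_buffer_cache_gb(10, 4))  # 40
```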

    Regarding the time reported in the metrics: it includes the time
    to print the results, so if you want to see whether that has any
    impact, try adding “LIMIT 1” at the end of your query or change it
    to SELECT COUNT(*) instead of selecting subject_id.

    Cheers,

    Murtadha

    From: Rana Alotaibi <[email protected]>
    Date: Monday, 29 January 2018 at 6:48 AM
    To: <[email protected]>
    Cc: <[email protected]>, <[email protected]>
    Subject: Re: Hyracks Job Requirement Configuration

    - Do you see all cores being fully utilized during the query
    execution?

    I noticed that only 6 cores were utilized.

    - How much time does the query take right now, and how do you
    measure the query execution time? Do you wait for the result to be
    printed somewhere (e.g. in the browser)?

    I'm using the HTTP API. The response is a JSON object that
    includes the query execution time:

       {
         "status": "success",
         "metrics": {
           "elapsedTime": "434.627299814s",
           "executionTime": "434.626137977s",
           "resultCount": 4943,
           "resultSize": 132293,
           "processedObjects": 46875
         }
       }

    I ran the query 10 times and took the average, which is ~6 mins.
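Rana's measurement method can be scripted: run the query 10 times and average the reported `executionTime`. The following is a minimal Python sketch. It assumes three things:

- The query service listens at http://localhost:19002/query/service and accepts a form-encoded `statement` parameter (adjust host and port for your cluster).
- Durations are strings with a unit suffix such as "434.627299814s" or "31.8ms".
- The helper names are hypothetical.

As noted earlier in the thread, this time includes printing the results.

```python
import json
import statistics
import urllib.parse
import urllib.request

# Assumed unit suffixes for duration strings; longer suffixes are checked
# first so that "ms" is not mistaken for "s".
_UNITS = [("ns", 1e-9), ("us", 1e-6), ("µs", 1e-6), ("ms", 1e-3), ("s", 1.0)]


def parse_duration(value: str) -> float:
    """Convert a duration string such as '434.627299814s' to seconds."""
    for suffix, factor in _UNITS:
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"unrecognized duration: {value!r}")


def execution_seconds(response_text: str) -> float:
    """Extract metrics.executionTime (in seconds) from a JSON response."""
    metrics = json.loads(response_text)["metrics"]
    return parse_duration(metrics["executionTime"])


def run_query(statement: str,
              url: str = "http://localhost:19002/query/service") -> str:
    """POST one statement to the (assumed) query service URL; return raw JSON."""
    data = urllib.parse.urlencode({"statement": statement}).encode("utf-8")
    with urllib.request.urlopen(url, data=data) as resp:
        return resp.read().decode("utf-8")


def average_execution_seconds(statement: str, runs: int = 10,
                              fetch=run_query) -> float:
    """Run the statement `runs` times and average the reported execution time."""
    return statistics.mean(
        execution_seconds(fetch(statement)) for _ in range(runs))


if __name__ == "__main__":
    sample = ('{"status": "success", "metrics": '
              '{"elapsedTime": "434.627299814s", '
              '"executionTime": "434.626137977s"}}')
    print(execution_seconds(sample))  # 434.626137977
```

The `fetch` parameter lets the averaging logic be exercised without a running cluster, for example with canned responses.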

    - You mentioned that you have 4 partitions; how many physical
    hard drives are they mapped to?

    One physical hard drive.

    - Also, increasing the sort/join memory doesn’t necessarily lead
    to better performance. Have you tried changing these values to
    something smaller and seeing the effects?

      Yes, I tried the following settings:

      1) sort-memory: 32MB, join-memory: 64MB

      2) sort-memory: 64MB, join-memory: 128MB

      3) sort-memory: 128MB, join-memory: 265MB

    The execution time remains ~6-6.5 mins on average; I didn't see
    any improvement. My current configuration is:

    - compiler.parallelism: 39 (only 6 cores were utilized)

    - storage.buffercache.size: 20GB

    - storage.buffercache.pagesize: 1MB

    Thanks,

    Rana

    On Sun, Jan 28, 2018 at 6:41 PM, Murtadha Hubail
    <[email protected]> wrote:

        I have a few questions if you don’t mind:

        Do you see all cores being fully utilized during the query
        execution?

        How much time does the query take right now and how do you
        measure the query execution time? Do you wait for the result
        to be printed somewhere (e.g. in the browser)?

        You mentioned that you have 4 partitions, how many physical
        hard drives are they mapped to?

        Also, increasing the sort/join memory doesn’t necessarily lead
        to better performance. Have you tried changing these values
        to something smaller and seeing the effects?

        Cheers,

        Murtadha

        From: Rana Alotaibi <[email protected]>
        Date: Monday, 29 January 2018 at 5:21 AM
        To: <[email protected]>
        Cc: <[email protected]>, <[email protected]>
        Subject: Re: Hyracks Job Requirement Configuration

        Thanks, Murtadha! That solved the problem. However, increasing
        the number of cores didn't improve the performance of that
        query.

        On Sun, Jan 28, 2018 at 5:05 PM, Murtadha Hubail
        <[email protected]> wrote:

            Hi Rana,

            The memory used for query processing is automatically
            calculated as follows:
            JVM Max Memory - storage.buffercache.size -
            storage.memorycomponent.globalbudget

            The defaults given in the documentation for these
            parameters are outdated. The default value of
            storage.buffercache.size is (JVM Max Memory / 4), and the
            same holds for storage.memorycomponent.globalbudget. Since
            your dataset is already loaded, you could reduce
            storage.memorycomponent.globalbudget. In addition, if I
            recall correctly, your dataset is much smaller than what's
            allocated for the buffer cache, so you might want to reduce
            the buffer cache budget. That should give you more than
            enough memory to execute on 39 cores.
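The memory formula above can be checked with a few lines of Python. The following is a minimal sketch. It assumes all three quantities are per node-controller JVM and expressed in bytes. `query_memory` is a hypothetical helper. The 100 GiB heap in the example is illustrative only and is not Rana's actual setting.

```python
from typing import Optional

GiB = 1024 ** 3


def query_memory(jvm_max: int,
                 buffer_cache: Optional[int] = None,
                 global_budget: Optional[int] = None) -> int:
    """Memory left for query processing, in bytes.

    Implements: JVM Max Memory - storage.buffercache.size
                - storage.memorycomponent.globalbudget,
    where an unset parameter defaults to JVM Max Memory / 4.
    """
    if buffer_cache is None:
        buffer_cache = jvm_max // 4
    if global_budget is None:
        global_budget = jvm_max // 4
    remaining = jvm_max - buffer_cache - global_budget
    if remaining <= 0:
        raise ValueError("storage budgets leave no memory for queries")
    return remaining


if __name__ == "__main__":
    heap = 100 * GiB  # illustrative heap size only
    print(query_memory(heap) / GiB)                      # 50.0 with defaults
    print(query_memory(heap, 20 * GiB, 10 * GiB) / GiB)  # 70.0 after shrinking
```

With the defaults, only half the heap is left for query processing. Shrinking either storage budget returns that memory to query execution.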

            Cheers,
            Murtadha


            On 01/29/2018, 3:30 AM, "Mike Carey" <[email protected]> wrote:

                + dev


                On 1/28/18 3:37 PM, Rana Alotaibi wrote:
                > Hi all,
                >
                > I would like to make AsterixDB utilize all available CPU
                > cores (39) that I have for the following query:
                >
                > USE mimiciii;
                > SET `compiler.parallelism` "39";
                > SET `compiler.sortmemory` "128MB";
                > SET `compiler.joinmemory` "265MB";
                > SELECT P.SUBJECT_ID
                > FROM  LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
                > WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
                >       E.FLAG = 'abnormal' AND
                >       I.FLUID='Blood' AND
                >       I.LABEL='Haptoglobin'
                >
                > The total memory size that I have is 125GB (57GB for the
                > AsterixDB buffer cache). Running the above query, I got the
                > following error:
                >
                > "msg": "HYR0009: Job requirement (memory: 10705403904 bytes, CPU cores: 39) exceeds capacity (memory: 3258744832 bytes, CPU cores: 39)"
                >
                > How can I change this default capacity configuration? I'm
                > looking at this page:
                > https://asterixdb.apache.org/docs/0.9.2/ncservice.html
                > Could you please point me to the appropriate configuration
                > parameter?
                >
                > Thanks
                > -- Rana
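The HYR0009 error compares the job's requirement with the node's capacity on two axes, memory and CPU cores, and rejects the job if either is exceeded. The following is a minimal Python sketch of that comparison, using the numbers from the error message. The names `Resources`, `shortfall`, and `admits` are hypothetical. The sketch mirrors what the message reports and is not Hyracks' actual admission-control code.

```python
from typing import NamedTuple


class Resources(NamedTuple):
    memory_bytes: int
    cpu_cores: int


def shortfall(required: Resources, capacity: Resources) -> Resources:
    """Amount of each resource the job needs beyond capacity (0 where it fits)."""
    return Resources(
        memory_bytes=max(0, required.memory_bytes - capacity.memory_bytes),
        cpu_cores=max(0, required.cpu_cores - capacity.cpu_cores),
    )


def admits(required: Resources, capacity: Resources) -> bool:
    """True if the job fits within capacity on every axis."""
    return shortfall(required, capacity) == Resources(0, 0)


if __name__ == "__main__":
    required = Resources(memory_bytes=10_705_403_904, cpu_cores=39)
    capacity = Resources(memory_bytes=3_258_744_832, cpu_cores=39)
    gap = shortfall(required, capacity)
    print(admits(required, capacity))                      # False
    print(f"{gap.memory_bytes / 1024**3:.2f} GiB short")   # 6.94 GiB short
```

Here the core counts match (39 vs. 39), and only memory is short, by about 6.9 GiB. The thread's fix therefore freed JVM memory by shrinking the buffer cache and the memory-component budget rather than changing the core count.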
