Hi Rana,

I think the performance issue might related to the access of nested fields,
since the rest performance hot spots (index search, hash join etc looks
normal to me), and I assume " I.FLUID='Blood' AND I.LABEL='Haptoglobin'"
should be very selective. Since MongoDB supports array index, did you build
an index on L.FLAG using MongoDB?

@Wail, do you have any clue on nested fields access?

Best regards,
Chen Luo

On Fri, Jan 26, 2018 at 1:47 AM, Rana Alotaibi <[email protected]>
wrote:

> Hi Taewoo,
>
>    -
>
>    Can you paste the optimized plan? -- Attached the plan (Plan_01.txt)
>
>    -
>
>    Can you create an index on LABEVENTS.FLAG? -- I couldn't create an index 
> on LABEVENTS.FLAG since LABEVENTS is of type array. I got this message when I 
> tried to create the index : "msg": "ASX0001: Field type array can't be 
> promoted to type object"
>
>    - Can you switch the predicate order? -- It seems for me that the plan
>    remains the same even if I changed the order of the predicates. (Attached
>    the plan after changing the order of the predicates Plan_02.txt)
>
> Thanks
>
> Rana
> On Thu, Jan 25, 2018 at 11:24 PM, Rana Alotaibi <[email protected]>
> wrote:
>
>> Hi Chen,
>>
>> *How did you import data into the dataset? using "load" or "feed"?*
>> I used "LOAD" (i.e USE mimiciii; LOAD DATASET PATIENTS USING localfs
>> ((\"path\"=\"127.0.0.1:///data/ralotaib/patients.json\"),
>> (\"format\"=\"json\"))).
>>
>>
>> *Which version of AsterixDB are you using?*
>> AsterixDB Master (0.9.3-SNAPSHOT)
>>
>> Thanks!
>>
>>
>>
>>
>>
>> On Thu, Jan 25, 2018 at 10:39 PM, Chen Luo <[email protected]> wrote:
>>
>>> Hi Rana,
>>>
>>> Nice to see you again! You may post to [email protected] as well
>>> to get more feedbacks from our developers.
>>>
>>> Just clarify two things: how did you import data into the dataset? using
>>> "load" or "feed"? And which version of AsterixDB are you using? But any way
>>> in your case it seems the join takes a lot of time, and your data is pretty
>>> much cached into the memory...
>>>
>>> Best regards,
>>> Chen Luo
>>>
>>> On Thu, Jan 25, 2018 at 8:46 PM, Rana Alotaibi <[email protected]>
>>> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I have a query that takes ~12.7mins on average (I have excluded the
>>>> warm-up time which was 30mins)!, and I would like to make sure that I
>>>> didn't miss any performance tuning parameters ( I have run the same query
>>>> on MongoDB, and it took ~2mins).
>>>>
>>>> The query asks to find all patients that have 'abnormal' haptoglobin
>>>> blood test result. (The query result can have duplicate values).
>>>>
>>>> *Query:*
>>>> USE mimiciii;
>>>> SET `compiler.parallelism` "5";
>>>> SET `compiler.sortmemory` "128MB";
>>>> SET `compiler.joinmemory` "265MB";
>>>> SELECT P.SUBJECT_ID
>>>> FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
>>>> WHERE I.ITEMID=E.ITEMID AND
>>>>              E.FLAG = 'abnormal' AND
>>>>              I.FLUID='Blood' AND
>>>>              I.LABEL='Haptoglobin'
>>>>
>>>> *Datasets Schema:*
>>>> - PATIENTS  and LABITEMS datasets have an open schema.
>>>> - LABITEMS's  primary key is ITEMID
>>>> - PATIENTS 's primary key is SUBJECT_ID
>>>> - The JSON schema for both datasets is attached.
>>>> - The DDL for both datasets is attached
>>>>
>>>> *Performance Tuning Parameters:*
>>>> - 4 partitions (iodevices)
>>>> - The total memory size is : 125GB, and I have assigned ~ 57GB to the
>>>> buffercache (storage.buffercache.size).
>>>> - As you can see from the query, I set the parallelism to 5,
>>>> sort-memory to 128MB, join-memory to 265MB.
>>>> - The data size is 7GB
>>>>
>>>> Your feedback is highly appreciated!
>>>>
>>>> --Rana
>>>>
>>>
>>>
>>
>

Reply via email to