Hi Rana,

Nice to see you again! You may post to [email protected] as well to
get more feedbacks from our developers.

Just clarify two things: how did you import data into the dataset? using
"load" or "feed"? And which version of AsterixDB are you using? But any way
in your case it seems the join takes a lot of time, and your data is pretty
much cached into the memory...

Best regards,
Chen Luo

On Thu, Jan 25, 2018 at 8:46 PM, Rana Alotaibi <[email protected]>
wrote:

> Hi there,
>
> I have a query that takes ~12.7mins on average (I have excluded the
> warm-up time which was 30mins)!, and I would like to make sure that I
> didn't miss any performance tuning parameters ( I have run the same query
> on MongoDB, and it took ~2mins).
>
> The query asks to find all patients that have 'abnormal' haptoglobin blood
> test result. (The query result can have duplicate values).
>
> *Query:*
> USE mimiciii;
> SET `compiler.parallelism` "5";
> SET `compiler.sortmemory` "128MB";
> SET `compiler.joinmemory` "265MB";
> SELECT P.SUBJECT_ID
> FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
> WHERE I.ITEMID=E.ITEMID AND
>              E.FLAG = 'abnormal' AND
>              I.FLUID='Blood' AND
>              I.LABEL='Haptoglobin'
>
> *Datasets Schema:*
> - PATIENTS  and LABITEMS datasets have an open schema.
> - LABITEMS's  primary key is ITEMID
> - PATIENTS 's primary key is SUBJECT_ID
> - The JSON schema for both datasets is attached.
> - The DDL for both datasets is attached
>
> *Performance Tuning Parameters:*
> - 4 partitions (iodevices)
> - The total memory size is : 125GB, and I have assigned ~ 57GB to the
> buffercache (storage.buffercache.size).
> - As you can see from the query, I set the parallelism to 5,  sort-memory
> to 128MB, join-memory to 265MB.
> - The data size is 7GB
>
> Your feedback is highly appreciated!
>
> --Rana
>

Reply via email to