Hi Chen,

*How did you import data into the dataset? using "load" or "feed"?*
I used "LOAD" (i.e USE mimiciii; LOAD DATASET PATIENTS USING localfs
((\"path\"=\"127.0.0.1:///data/ralotaib/patients.json\"),
(\"format\"=\"json\"))).


*Which version of AsterixDB are you using?*
AsterixDB Master (0.9.3-SNAPSHOT)

Thanks!





On Thu, Jan 25, 2018 at 10:39 PM, Chen Luo <[email protected]> wrote:

> Hi Rana,
>
> Nice to see you again! You may post to [email protected] as well
> to get more feedbacks from our developers.
>
> Just clarify two things: how did you import data into the dataset? using
> "load" or "feed"? And which version of AsterixDB are you using? But any way
> in your case it seems the join takes a lot of time, and your data is pretty
> much cached into the memory...
>
> Best regards,
> Chen Luo
>
> On Thu, Jan 25, 2018 at 8:46 PM, Rana Alotaibi <[email protected]>
> wrote:
>
>> Hi there,
>>
>> I have a query that takes ~12.7mins on average (I have excluded the
>> warm-up time which was 30mins)!, and I would like to make sure that I
>> didn't miss any performance tuning parameters ( I have run the same query
>> on MongoDB, and it took ~2mins).
>>
>> The query asks to find all patients that have 'abnormal' haptoglobin
>> blood test result. (The query result can have duplicate values).
>>
>> *Query:*
>> USE mimiciii;
>> SET `compiler.parallelism` "5";
>> SET `compiler.sortmemory` "128MB";
>> SET `compiler.joinmemory` "265MB";
>> SELECT P.SUBJECT_ID
>> FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
>> WHERE I.ITEMID=E.ITEMID AND
>>              E.FLAG = 'abnormal' AND
>>              I.FLUID='Blood' AND
>>              I.LABEL='Haptoglobin'
>>
>> *Datasets Schema:*
>> - PATIENTS  and LABITEMS datasets have an open schema.
>> - LABITEMS's  primary key is ITEMID
>> - PATIENTS 's primary key is SUBJECT_ID
>> - The JSON schema for both datasets is attached.
>> - The DDL for both datasets is attached
>>
>> *Performance Tuning Parameters:*
>> - 4 partitions (iodevices)
>> - The total memory size is : 125GB, and I have assigned ~ 57GB to the
>> buffercache (storage.buffercache.size).
>> - As you can see from the query, I set the parallelism to 5,  sort-memory
>> to 128MB, join-memory to 265MB.
>> - The data size is 7GB
>>
>> Your feedback is highly appreciated!
>>
>> --Rana
>>
>
>

Reply via email to