Hi Chen, *How did you import data into the dataset? using "load" or "feed"?* I used "LOAD" (i.e USE mimiciii; LOAD DATASET PATIENTS USING localfs ((\"path\"=\"127.0.0.1:///data/ralotaib/patients.json\"), (\"format\"=\"json\"))).
*Which version of AsterixDB are you using?* AsterixDB Master (0.9.3-SNAPSHOT) Thanks! On Thu, Jan 25, 2018 at 10:39 PM, Chen Luo <[email protected]> wrote: > Hi Rana, > > Nice to see you again! You may post to [email protected] as well > to get more feedbacks from our developers. > > Just clarify two things: how did you import data into the dataset? using > "load" or "feed"? And which version of AsterixDB are you using? But any way > in your case it seems the join takes a lot of time, and your data is pretty > much cached into the memory... > > Best regards, > Chen Luo > > On Thu, Jan 25, 2018 at 8:46 PM, Rana Alotaibi <[email protected]> > wrote: > >> Hi there, >> >> I have a query that takes ~12.7mins on average (I have excluded the >> warm-up time which was 30mins)!, and I would like to make sure that I >> didn't miss any performance tuning parameters ( I have run the same query >> on MongoDB, and it took ~2mins). >> >> The query asks to find all patients that have 'abnormal' haptoglobin >> blood test result. (The query result can have duplicate values). >> >> *Query:* >> USE mimiciii; >> SET `compiler.parallelism` "5"; >> SET `compiler.sortmemory` "128MB"; >> SET `compiler.joinmemory` "265MB"; >> SELECT P.SUBJECT_ID >> FROM LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E >> WHERE I.ITEMID=E.ITEMID AND >> E.FLAG = 'abnormal' AND >> I.FLUID='Blood' AND >> I.LABEL='Haptoglobin' >> >> *Datasets Schema:* >> - PATIENTS and LABITEMS datasets have an open schema. >> - LABITEMS's primary key is ITEMID >> - PATIENTS 's primary key is SUBJECT_ID >> - The JSON schema for both datasets is attached. >> - The DDL for both datasets is attached >> >> *Performance Tuning Parameters:* >> - 4 partitions (iodevices) >> - The total memory size is : 125GB, and I have assigned ~ 57GB to the >> buffercache (storage.buffercache.size). >> - As you can see from the query, I set the parallelism to 5, sort-memory >> to 128MB, join-memory to 265MB. >> - The data size is 7GB >> >> Your feedback is highly appreciated! >> >> --Rana >> > >
