Hi there,

I have a query that takes ~12.7mins on average (I have excluded the warm-up
time which was 30mins)!, and I would like to make sure that I didn't miss
any performance tuning parameters ( I have run the same query on MongoDB,
and it took ~2mins).

The query asks to find all patients that have 'abnormal' haptoglobin blood
test result. (The query result can have duplicate values).

*Query:*
USE mimiciii;
SET `compiler.parallelism` "5";
SET `compiler.sortmemory` "128MB";
SET `compiler.joinmemory` "265MB";
SELECT P.SUBJECT_ID
FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
WHERE I.ITEMID=E.ITEMID AND
             E.FLAG = 'abnormal' AND
             I.FLUID='Blood' AND
             I.LABEL='Haptoglobin'

*Datasets Schema:*
- PATIENTS  and LABITEMS datasets have an open schema.
- LABITEMS's  primary key is ITEMID
- PATIENTS 's primary key is SUBJECT_ID
- The JSON schema for both datasets is attached.
- The DDL for both datasets is attached

*Performance Tuning Parameters:*
- 4 partitions (iodevices)
- The total memory size is : 125GB, and I have assigned ~ 57GB to the
buffercache (storage.buffercache.size).
- As you can see from the query, I set the parallelism to 5,  sort-memory
to 128MB, join-memory to 265MB.
- The data size is 7GB

Your feedback is highly appreciated!

--Rana
--LABITEMS
USE mimiciii; 
CREATE TYPE lType AS { ITEMID: int };  
CREATE DATASET LABITEMS(lType) PRIMARY KEY ITEMID;

--PATIENTS
USE mimiciii; 
CREATE TYPE pType AS { SUBJECT_ID: int };  
CREATE DATASET PATIENTS(pType) 
PRIMARY KEY SUBJECT_ID;

--Indexes
USE mimiciii; 
CREATE INDEX FLUIDIndx on LABITEMS (FLUID: string?) enforced;
CREATE INDEX LABLEIndx on LABITEMS (LABEL: string?) enforced;

Attachment: MIMIC-SchemaJSON.json
Description: application/json

Reply via email to