Hi Taewoo,
-
Can you paste the optimized plan? -- Attached the plan (Plan_01.txt)
-
Can you create an index on LABEVENTS.FLAG? -- I couldn't create an
index on LABEVENTS.FLAG since LABEVENTS is of type array. I got this
message when I tried to create the index : "msg": "ASX0001: Field type
array can't be promoted to type object"
- Can you switch the predicate order? -- It seems for me that the plan
remains the same even if I changed the order of the predicates. (Attached
the plan after changing the order of the predicates Plan_02.txt)
Thanks
Rana
On Thu, Jan 25, 2018 at 11:24 PM, Rana Alotaibi <[email protected]>
wrote:
> Hi Chen,
>
> *How did you import data into the dataset? using "load" or "feed"?*
> I used "LOAD" (i.e USE mimiciii; LOAD DATASET PATIENTS USING localfs
> ((\"path\"=\"127.0.0.1:///data/ralotaib/patients.json\"),
> (\"format\"=\"json\"))).
>
>
> *Which version of AsterixDB are you using?*
> AsterixDB Master (0.9.3-SNAPSHOT)
>
> Thanks!
>
>
>
>
>
> On Thu, Jan 25, 2018 at 10:39 PM, Chen Luo <[email protected]> wrote:
>
>> Hi Rana,
>>
>> Nice to see you again! You may post to [email protected] as well
>> to get more feedbacks from our developers.
>>
>> Just clarify two things: how did you import data into the dataset? using
>> "load" or "feed"? And which version of AsterixDB are you using? But any way
>> in your case it seems the join takes a lot of time, and your data is pretty
>> much cached into the memory...
>>
>> Best regards,
>> Chen Luo
>>
>> On Thu, Jan 25, 2018 at 8:46 PM, Rana Alotaibi <[email protected]>
>> wrote:
>>
>>> Hi there,
>>>
>>> I have a query that takes ~12.7mins on average (I have excluded the
>>> warm-up time which was 30mins)!, and I would like to make sure that I
>>> didn't miss any performance tuning parameters ( I have run the same query
>>> on MongoDB, and it took ~2mins).
>>>
>>> The query asks to find all patients that have 'abnormal' haptoglobin
>>> blood test result. (The query result can have duplicate values).
>>>
>>> *Query:*
>>> USE mimiciii;
>>> SET `compiler.parallelism` "5";
>>> SET `compiler.sortmemory` "128MB";
>>> SET `compiler.joinmemory` "265MB";
>>> SELECT P.SUBJECT_ID
>>> FROM LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
>>> WHERE I.ITEMID=E.ITEMID AND
>>> E.FLAG = 'abnormal' AND
>>> I.FLUID='Blood' AND
>>> I.LABEL='Haptoglobin'
>>>
>>> *Datasets Schema:*
>>> - PATIENTS and LABITEMS datasets have an open schema.
>>> - LABITEMS's primary key is ITEMID
>>> - PATIENTS 's primary key is SUBJECT_ID
>>> - The JSON schema for both datasets is attached.
>>> - The DDL for both datasets is attached
>>>
>>> *Performance Tuning Parameters:*
>>> - 4 partitions (iodevices)
>>> - The total memory size is : 125GB, and I have assigned ~ 57GB to the
>>> buffercache (storage.buffercache.size).
>>> - As you can see from the query, I set the parallelism to 5,
>>> sort-memory to 128MB, join-memory to 265MB.
>>> - The data size is 7GB
>>>
>>> Your feedback is highly appreciated!
>>>
>>> --Rana
>>>
>>
>>
>
distribute result [$$49]
-- DISTRIBUTE_RESULT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$49])
-- STREAM_PROJECT |PARTITIONED|
assign [$$49] <- [{"SUBJECT_ID": $$51}]
-- ASSIGN |PARTITIONED|
project ([$$51])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
join (eq($$50, $$53))
-- HYBRID_HASH_JOIN [$$50][$$53] |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$50] |PARTITIONED|
project ([$$50])
-- STREAM_PROJECT |PARTITIONED|
select (and(eq($$I.getField("FLUID"), "Blood"),
eq($$I.getField("LABEL"), "Haptoglobin")))
-- STREAM_SELECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
unnest-map [$$50, $$I] <- index-search("LABITEMS", 0,
"mimiciii", "LABITEMS", FALSE, FALSE, 1, $$63, 1, $$63, TRUE, TRUE, TRUE)
-- BTREE_SEARCH |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
intersect ([$$63] <- [[$$63], [$$67]])
-- INTERSECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
order (ASC, $$63)
-- STABLE_SORT [$$63(ASC)] |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$63])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
unnest-map [$$62, $$63] <-
index-search("LABLEIndx", 0, "mimiciii", "LABITEMS", FALSE, FALSE, 1, $$60, 1,
$$61, TRUE, TRUE, TRUE)
-- BTREE_SEARCH |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
assign [$$60, $$61] <-
["Haptoglobin", "Haptoglobin"]
-- ASSIGN |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
order (ASC, $$67)
-- STABLE_SORT [$$67(ASC)] |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$67])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
unnest-map [$$66, $$67] <-
index-search("FLUIDIndx", 0, "mimiciii", "LABITEMS", FALSE, FALSE, 1, $$64, 1,
$$65, TRUE, TRUE, TRUE)
-- BTREE_SEARCH |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
assign [$$64, $$65] <- ["Blood",
"Blood"]
-- ASSIGN |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$53] |PARTITIONED|
project ([$$51, $$53])
-- STREAM_PROJECT |PARTITIONED|
select (eq($$E.getField("FLAG"), "abnormal"))
-- STREAM_SELECT |PARTITIONED|
assign [$$53] <- [$$E.getField("ITEMID")]
-- ASSIGN |PARTITIONED|
project ([$$51, $$E])
-- STREAM_PROJECT |PARTITIONED|
unnest $$E <- scan-collection($$57)
-- UNNEST |PARTITIONED|
project ([$$51, $$57])
-- STREAM_PROJECT |PARTITIONED|
assign [$$57] <- [$$A.getField("LABEVENTS")]
-- ASSIGN |PARTITIONED|
project ([$$51, $$A])
-- STREAM_PROJECT |PARTITIONED|
unnest $$A <- scan-collection($$56)
-- UNNEST |PARTITIONED|
project ([$$51, $$56])
-- STREAM_PROJECT |PARTITIONED|
assign [$$56] <-
[$$P.getField("ADMISSIONS")]
-- ASSIGN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
data-scan []<-[$$51, $$P] <-
mimiciii.PATIENTS
-- DATASOURCE_SCAN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
distribute result [$$49]
-- DISTRIBUTE_RESULT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$49])
-- STREAM_PROJECT |PARTITIONED|
assign [$$49] <- [{"SUBJECT_ID": $$51}]
-- ASSIGN |PARTITIONED|
project ([$$51])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
join (eq($$50, $$53))
-- HYBRID_HASH_JOIN [$$50][$$53] |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$50] |PARTITIONED|
project ([$$50])
-- STREAM_PROJECT |PARTITIONED|
select (and(eq($$I.getField("LABEL"), "Haptoglobin"),
eq($$I.getField("FLUID"), "Blood")))
-- STREAM_SELECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
unnest-map [$$50, $$I] <- index-search("LABITEMS", 0,
"mimiciii", "LABITEMS", FALSE, FALSE, 1, $$63, 1, $$63, TRUE, TRUE, TRUE)
-- BTREE_SEARCH |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
intersect ([$$63] <- [[$$63], [$$67]])
-- INTERSECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
order (ASC, $$63)
-- STABLE_SORT [$$63(ASC)] |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$63])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
unnest-map [$$62, $$63] <-
index-search("LABLEIndx", 0, "mimiciii", "LABITEMS", FALSE, FALSE, 1, $$60, 1,
$$61, TRUE, TRUE, TRUE)
-- BTREE_SEARCH |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
assign [$$60, $$61] <-
["Haptoglobin", "Haptoglobin"]
-- ASSIGN |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
order (ASC, $$67)
-- STABLE_SORT [$$67(ASC)] |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$67])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
unnest-map [$$66, $$67] <-
index-search("FLUIDIndx", 0, "mimiciii", "LABITEMS", FALSE, FALSE, 1, $$64, 1,
$$65, TRUE, TRUE, TRUE)
-- BTREE_SEARCH |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
assign [$$64, $$65] <- ["Blood",
"Blood"]
-- ASSIGN |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$53] |PARTITIONED|
project ([$$51, $$53])
-- STREAM_PROJECT |PARTITIONED|
select (eq($$E.getField("FLAG"), "abnormal"))
-- STREAM_SELECT |PARTITIONED|
assign [$$53] <- [$$E.getField("ITEMID")]
-- ASSIGN |PARTITIONED|
project ([$$51, $$E])
-- STREAM_PROJECT |PARTITIONED|
unnest $$E <- scan-collection($$57)
-- UNNEST |PARTITIONED|
project ([$$51, $$57])
-- STREAM_PROJECT |PARTITIONED|
assign [$$57] <- [$$A.getField("LABVENTS")]
-- ASSIGN |PARTITIONED|
project ([$$51, $$A])
-- STREAM_PROJECT |PARTITIONED|
unnest $$A <- scan-collection($$56)
-- UNNEST |PARTITIONED|
project ([$$51, $$56])
-- STREAM_PROJECT |PARTITIONED|
assign [$$56] <-
[$$P.getField("ADMISSIONS")]
-- ASSIGN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
data-scan []<-[$$51, $$P] <-
mimiciii.PAITENTS
-- DATASOURCE_SCAN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|