Hi Sungwook, Can you upload the profile file to a public place? The user mailing list does not allow to attach files.
On Tue, Aug 25, 2015 at 10:58 AM, Sungwook Yoon <[email protected]> wrote: > > Jacques, > > Here is the physical profile for this query, > I attached the 1000 lines of the tail of the drillbit.log from the foreman. > Also, attached is the picture of the filter minorfragment information. > > We have plenty of memory and Drill is not using them all. > > Let me know your thoughts. > > Thanks, > > Sungwook > > > > 00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, > cumulative cost = {1.452325068635E10 rows, 7.525684447485E10 cpu, 0.0 io, > 4096.0 network, 0.0 memory}, id = 228 > 00-01 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = > RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = > {1.452325068625E10 rows, 7.525684447475E10 cpu, 0.0 io, 4096.0 network, 0.0 > memory}, id = 227 > 00-02 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount > = 1.0, cumulative cost = {1.452325068525E10 rows, 7.525684446275E10 cpu, > 0.0 io, 4096.0 network, 0.0 memory}, id = 226 > 01-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = > RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = > {1.452325068425E10 rows, 7.525684445475E10 cpu, 0.0 io, 0.0 network, 0.0 > memory}, id = 225 > 01-02 Project($f0=[1]) : rowType = RecordType(INTEGER $f0): > rowcount = 1.32029551675E9, cumulative cost = {1.32029551675E10 rows, > 5.941329825375E10 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 224 > 01-03 SelectionVectorRemover : rowType = RecordType(ANY DX1, > ANY DX2, ANY DX3, ANY DX4, ANY DX5): rowcount = 1.32029551675E9, cumulative > cost = {1.188265965075E10 rows, 5.413211618675E10 cpu, 0.0 io, 0.0 network, > 0.0 memory}, id = 223 > 01-04 Filter(condition=[OR(=($0, '39891'), =($0, '4280'), > =($0, '4281'), =($0, '42820'), =($0, '42821'), =($0, '42822'), =($0, > '42823'), =($0, '42830'), =($0, '42831'), =($0, '42832'), =($0, '42833'), > =($0, '42840'), =($0, '42841'), =($0, '42842'), =($0, '42843'), =($0, > '4289'), =($1, '39891'), =($1, '4280'), =($1, '4281'), =($1, '42820'), > =($1, '42821'), =($1, '42822'), =($1, '42823'), =($1, '42830'), =($1, > '42831'), =($1, '42832'), =($1, '42833'), =($1, '42840'), =($1, '42841'), > =($1, '42842'), =($1, '42843'), =($1, '4289'), =($2, '39891'), =($2, > '4280'), =($2, '4281'), =($2, '42820'), =($2, '42821'), =($2, '42822'), > =($2, '42823'), =($2, '42830'), =($2, '42831'), =($2, '42832'), =($2, > '42833'), =($2, '42840'), =($2, '42841'), =($2, '42842'), =($2, '42843'), > =($2, '4289'), =($3, '39891'), =($3, '4280'), =($3, '4281'), =($3, > '42820'), =($3, '42821'), =($3, '42822'), =($3, '42823'), =($3, '42830'), > =($3, '42831'), =($3, '42832'), =($3, '42833'), =($3, '42840'), =($3, > '42841'), =($3, '42842'), =($3, '42843'), =($3, '4289'), =($4, '39891'), > =($4, '4280'), =($4, '4281'), =($4, '42820'), =($4, '42821'), =($4, > '42822'), =($4, '42823'), =($4, '42830'), =($4, '42831'), =($4, '42832'), > =($4, '42833'), =($4, '42840'), =($4, '42841'), =($4, '42842'), =($4, > '42843'), =($4, '4289'))]) : rowType = RecordType(ANY DX1, ANY DX2, ANY > DX3, ANY DX4, ANY DX5): rowcount = 1.32029551675E9, cumulative cost = > {1.0562364134E10 rows, 5.281182067E10 cpu, 0.0 io, 0.0 network, 0.0 > memory}, id = 222 > 01-05 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath > [path=maprfs:/user/hive/warehouse/mktscan_o/year=2009/000114_0], > ReadEntryWithPath > [path=maprfs:/user/hive/warehouse/mktscan_o/year=2009/000570_0], > ReadEntryWithPath > [path=maprfs:/user/hive/warehouse/mktscan_o/year=2009/000117_0], > ReadEntryWithPath > [path=maprfs:/user/hive/warehouse/mktscan_o/year=2009/000792_0], > ReadEntryWithPath > [path=maprfs:/user/hive/warehouse/mktscan_o/year=2009/000357_0], > ReadEntryWithPath > [path=maprfs:/user/hive/warehouse/mktscan_o/year=2009/000545_0], > ReadEntryWithPath > [path=maprfs:/user/hive/warehouse/mktscan_o/year=2009/000452_0], > ReadEntryWithPath > [path=maprfs:/user/hive/warehouse/mktscan_o/year=2009/000189_0], > ReadEntryWithPath > [path=maprfs:/user/hive/warehouse/mktscan_o/year=2009/000709_0], > ReadEntryWithPath > [path=maprfs:/user/hive/warehouse/mktscan_o/year=2009/000761_0], Re > > > On Tue, Aug 25, 2015 at 10:50 AM, Jacques Nadeau <[email protected]> > wrote: > >> I still think a profile analysis would be good. Can you post the >> following >> on gist: >> >> - profile for this query >> - source code for the slow filter operation (turn on debug logging to get >> this) >> >> thanks, >> Jacques >> >> -- >> Jacques Nadeau >> CTO and Co-Founder, Dremio >> >> On Tue, Aug 25, 2015 at 8:16 AM, Sungwook Yoon <[email protected]> >> wrote: >> >> > Understood,.. >> > I can see Jacques and Jinfeng's point of view. >> > >> > I would not bother think about this query .... but some claim that this >> > query ran orders of magnitude faster on some other sql engine... >> > >> > I don't know how.. they got that performance from this query.. >> > >> > Sungwook >> > >> > >> > >> > On Tue, Aug 25, 2015 at 10:04 AM, Jacques Nadeau <[email protected]> >> > wrote: >> > >> > > CONVERT_FROMUTF8 is a trivial operation. I would be surprised if that >> > was >> > > the bottleneck. See here [1] for the implementation. I'm guessing >> that >> > > the problem is simply that the best alternative on this structure >> would >> > be >> > > a non-equality join. The way I look at this query is you have data >> set >> > > like the following: >> > > >> > > dx1 dx2 dx3 dx4 dx5 >> > > # # # # # >> > > # # # # # >> > > # # # # # >> > > # # # # # >> > > # # # # # >> > > >> > > And you are doing a join: >> > > >> > > select * from baseTable t >> > > join dxTable dx on (dx1 = t.c1 OR dx2 = t.c2 OR dx3 = t.c3 OR dx4 = >> t.c4 >> > OR >> > > dx5 = t.c5) >> > > >> > > That isn't a pretty algorithm in general. I'm guessing that the best >> > > opportunity is to convert the data to something where you can start >> > taking >> > > advantage of hashing. For example, put all the values in a single >> list >> > > with each base table row key. For each value, prefix by column id. >> Then >> > > do a join on the new compound key and then get to distinct rows per >> base >> > > table row key. It isn't pretty but it would be much better than the >> > > current approach. >> > > >> > > >> > > [1] >> > > >> > > >> > >> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/conv/UTF8ConvertFrom.java >> > > >> > > >> > > -- >> > > Jacques Nadeau >> > > CTO and Co-Founder, Dremio >> > > >> > > On Tue, Aug 25, 2015 at 7:48 AM, Jinfeng Ni <[email protected]> >> > wrote: >> > > >> > > > I looked at the two physical plans you posted. Seems they are >> > different : >> > > > the first one has multiple CONVERT_FROMUTF8() call, while the second >> > does >> > > > not have. >> > > > >> > > > 1. >> > > > >> > > > Filter(condition=[AND(>=(CAST($0):INTEGER, 2009), >> <=(CAST($0):INTEGER, >> > > > 2013), OR(=(CONVERT_FROMUTF8($1), '39891'), =(CONVERT_FROMUTF8($1), >> > > > '4280'), =(CONVERT_FROMUTF8($1), '4281'), =(CONVERT_FROMUTF8($1), >> > > '42820'), >> > > > =(CONVERT_FROMUTF8($1), '42821'), =(CONVERT_FROMUTF8($1), '42822'), >> > > > =(CONVERT_FROMUTF8($1), '42823'), =(CONVERT_FROMUTF8($1), '42830'), >> > > > =(CONVERT_FROMUTF8($1), '42831'), =(CONVERT_FROMUTF8($1), '42832'), >> > > > =(CONVERT_FROMUTF8($1), '42833'), =(CONVERT_FROMUTF8($1), '42840'), >> > > > =(CONVERT_FROMUTF8($1), '42841'), =(CONVERT_FROMUTF8($1), '42842'), >> > > > =(CONVERT_FROMUTF8($1), '42843'), =(CONVERT_FROMUTF8($1), '4289'), >> > > > =(CONVERT_FROMUTF8($2), '39891'), =(CONVERT_FROMUTF8($2), '4280'), >> > > > =(CONVERT_FROMUTF8($2), '4281'), =(CONVERT_FROMUTF8($2), '42820'), >> > > > =(CONVERT_FROMUTF8($2), '42821'), =(CONVERT_FROMUTF8($2), '42822'), >> > > > =(CONVERT_FROMUTF8($2), '42823'), =(CONVERT_FROMUTF8($2), '42830'), >> > > > =(CONVERT_FROMUTF8($2), '42831'), =(CONVERT_FROMUTF8($2), '42832'), >> > > > =(CONVERT_FROMUTF8($2), '42833'), =(CONVERT_FROMUTF8($2), '42840'), >> > > > =(CONVERT_FROMUTF8($2), '42841'), =(CONVERT_FROMUTF8($2), '42842'), >> > > > =(CONVERT_FROMUTF8($2), '42843'), =(CONVERT_FROMUTF8($2), >> '4289')))]) : >> > > > >> > > > 2. >> > > > Filter(condition=[AND(>=($0, 2009), <=($0, 2013), >> > > > OR(=($1, '39891'), =($1, '4280'), =($1, '4281'), =($1, '42820'), >> =($1, >> > > > '42821'), =($1, '42822'), =($1, '42823'), =($1, '42830'), =($1, >> > '42831'), >> > > > =($1, '42832'), =($1, '42833'), =($1, '42840'), =($1, '42841'), >> =($1, >> > > > '42842'), =($1, '42843'), =($1, '4289'), =($2, '39891'), =($2, >> '4280'), >> > > > =($2, '4281'), =($2, '42820'), =($2, '42821'), =($2, '42822'), =($2, >> > > > '42823'), =($2, '42830'), =($2, '42831'), =($2, '42832'), =($2, >> > '42833'), >> > > > =($2, '42840'), =($2, '42841'), =($2, '42842'), =($2, '42843'), >> =($2, >> > > > '4289'), =($3, '39891'), =($3, '4280'), =($3, '4281'), =($3, >> '42820'), >> > > > =($3, '42821'), =($3, '42822'), =($3, '42823'), =($3, '42830'), >> =($3, >> > > > '42831'), =($3, '42832'), =($3, '42833'), =($3, '42840'), =($3, >> > '42841'), >> > > > =($3, '42842'), =($3, '42843'), =($3, '4289'), =($4, '39891'), =($4, >> > > > '4280'), =($4, '4281'), =($4, '42820'), =($4, '42821'), =($4, >> '42822'), >> > > > =($4, '42823'), =($4, '42830'), =($4, '42831'), =($4, '42832'), >> =($4, >> > > > '42833'), =($4, '42840'), =($4, '42841'), =($4, '42842'), =($4, >> > '42843'), >> > > > =($4, '4289'), =($5, '39891'), =($5, '4280'), =($5, '4281'), =($5, >> > > > '42820'), =($5, '42821'), =($5, '42822'), =($5, '42823'), =($5, >> > '42830'), >> > > > =($5, '42831'), =($5, '42832'), =($5, '42833'), =($5, '42840'), >> =($5, >> > > > '42841'), =($5, '42842'), =($5, '42843'), =($5, '4289')))]) >> > > > >> > > > Can you please post the query PLUS the corresponding plan together? >> > > > >> > > > For the first plan, there are room for improvement, since Drill >> should >> > > push >> > > > the common expression : CONVERT_FROMUTF8($2) down into a project, >> and >> > > refer >> > > > the result of expression directly in Filter, which will avoid the >> > > repeated >> > > > evaluation in Filter. >> > > > >> > > > For the second plan, I do not see room for improvement. >> > > > >> > > > >> > > > >> > > > >> > > > On Tue, Aug 25, 2015 at 2:38 AM, Sungwook Yoon <[email protected]> >> > > wrote: >> > > > >> > > > > Aman, >> > > > > >> > > > > Sorry for slow reply. >> > > > > >> > > > > Here is the SQL query. >> > > > > >> > > > > SELECT count(1) FROM `o` a WHERE (a.DX1 IN ('39891', '4280', >> '4281', >> > > > > '42820', '42821', '42822', '42823', '42830', '42831', '42832', >> > '42833', >> > > > > '42840', '42841', '42842', '42843', '4289') OR a.DX2 IN ('39891', >> > > '4280', >> > > > > '4281', '42820', '42821', '42822', '42823', '42830', '42831', >> > '42832', >> > > > > '42833', '42840', '42841', '42842', '42843', '4289') OR a.DX3 IN >> > > > ('39891', >> > > > > '4280', '4281', '42820', '42821', '42822', '42823', '42830', >> '42831', >> > > > > '42832', '42833', '42840', '42841', '42842', '42843', '4289') OR >> > a.DX4 >> > > IN >> > > > > ('39891', '4280', '4281', '42820', '42821', '42822', '42823', >> > '42830', >> > > > > '42831', '42832', '42833', '42840', '42841', '42842', '42843', >> > '4289') >> > > OR >> > > > > a.DX5 IN ('39891', '4280', '4281', '42820', '42821', '42822', >> > '42823', >> > > > > '42830', '42831', '42832', '42833', '42840', '42841', '42842', >> > '42843', >> > > > > '4289')) AND a.`year` BETWEEN 2009 AND 2013; >> > > > > >> > > > > For the view definition, I tried to convert_from to UTF and cast >> as >> > > > > Varchar. >> > > > > For integer type, I casted. >> > > > > All resulted in similar slow performance. >> > > > > >> > > > > Let me know if you have any insights. >> > > > > >> > > > > Thanks, >> > > > > >> > > > > Sungwook >> > > > > >> > > > > >> > > > > On Mon, Aug 24, 2015 at 7:20 PM, Aman Sinha <[email protected] >> > >> > > > wrote: >> > > > > >> > > > > > I was about to say that for IN lists of size 20 or more, Drill >> > uses a >> > > > > more >> > > > > > efficient Values operator instead of OR conditions but then >> > realized >> > > > the >> > > > > OR >> > > > > > filter is referencing 4 different columns : $1..$4 and each of >> > those >> > > > > > individual lists is less than 20. Sungwook, can you please >> > provide >> > > > the >> > > > > > SQL query and any view definitions or anything that goes with >> it ? >> > > It >> > > > is >> > > > > > difficult to figure out things without the full picture. >> > > > > > thanks, >> > > > > > Aman >> > > > > > >> > > > > > On Mon, Aug 24, 2015 at 5:10 PM, Ted Dunning < >> > [email protected]> >> > > > > > wrote: >> > > > > > >> > > > > > > On Mon, Aug 24, 2015 at 4:50 PM, Sungwook Yoon < >> > [email protected] >> > > > >> > > > > > wrote: >> > > > > > > >> > > > > > > > Still, the performance drop down due to OR filtering is just >> > > > > > > astounding... >> > > > > > > > >> > > > > > > >> > > > > > > That is what query optimizers are for and why getting them to >> > work >> > > > well >> > > > > > is >> > > > > > > important. >> > > > > > > >> > > > > > > The difference in performance that you are observing is not >> > > > surprising >> > > > > > > given the redundant work that you are seeing. Using the OR >> > operator >> > > > > > > prevents any significant short-circuiting and the repeated >> > > conversion >> > > > > > > operations that are happening make the evaluation much more >> > > expensive >> > > > > > than >> > > > > > > it would otherwise be (a dozen extra copies where only one is >> > > > needed). >> > > > > > > >> > > > > > > Other queries that can be subject to similar problems include >> > > common >> > > > > > table >> > > > > > > expressions that read the same (large) input file many >> times. So >> > > > far, >> > > > > > > Drill doesn't optimize all such expressions well. >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> > >
