Expanding the query using the definition of the view runs quickly even with the CONVERT, apparently, so I think that there is something going on in the view that is causing the optimizer to lose sight of the pushdown.
On Sun, Feb 7, 2016 at 11:11 AM, Zelaine Fong <[email protected]> wrote: > Oops, missed the "not" in one of my sentences. The corrected sentence > should be: > > You can tell that's the case because in the case of the view query, the > explain plan has a Filter operation, whereas in the case of the non-view > query you do NOT. > > -- Zelaine > > On Sun, Feb 7, 2016 at 11:09 AM, Zelaine Fong <[email protected]> wrote: > > > It looks like in the case of the non-view query, the WHERE clause is > being > > pushed down to MapR-DB. The pushdown isn't happening in the case of the > > view. You can tell that's the case because in the case of the view > query, > > the explain plan has a Filter operation, whereas in the case of the > > non-view query you do. As you noted, in the case of the non-view, the > Scan > > has a startRow/stopRow/filter, which corresponds to the pushdown of the > > filter. > > > > I'm wondering if the problem is related to the CONVERT_FROM() in the > WHERE > > clause, or if all filters on views aren't being pushed down? > > > > -- Zelaine > > > > On Sat, Feb 6, 2016 at 2:02 PM, John Omernik <[email protected]> wrote: > > > >> Hey all, I was running some queries on a MapR-DB Table I have. I > created > >> a > >> view to avoid forcing users to write queries that always included the > >> CONVERT_FROM statements. (I am a huge advocate of making things easy for > >> the the users and writing queries with CONVERT_FROM statements isn't > >> easy). > >> > >> I ran a query the other day on one of these views and noticed that a > query > >> that took 30 seconds really shouldn't take 30 seconds. What do I mean? > >> well I wanted to get part of a record by looking up the MapR-DB Row key > >> (equiv. to HBASE row key) That should be an instant lookup. Sure > enough, > >> when I tried it in the hbase shell that returns instantly. So why did > >> Drill take 30 seconds? I shot an email to Ted and Jim at MapR to ask > this > >> very question. Ted suggested that I try the query without a view. Sure > >> enough, If I use the convert_from in a direct query, it's an instant > (sub > >> second) return. Thus it appears something in the view is not allowing > the > >> query to short circuit the read. > >> > >> Ted suggests I post here (I am curious if anyone who has HBASE setup is > >> seeing this same issue with views) but also include the EXPLAIN plan. > >> Basically, using my very limited ability to read EXPLAIN plans (If > someone > >> has a pointer to a blog post or docs on how to read EXPLAIN I would love > >> that!) it looks like in the view the startRow and stopRow in the > >> hbaseScanSpec are not set, seeming to cause a scan. Is there any away > to > >> assist the planner when running this through a view so that we can get > the > >> performance of the query without the view but with the easy of > >> use/readability of using the view? > >> > >> Thanks!!! > >> > >> John > >> > >> View Creation > >> > >> CREATE VIEW view_testpaste as > >> SELECT > >> CONVERT_FROM(row_key, 'UTF8') AS pasteid, > >> CONVERT_FROM(pastes.pdata.lang, 'UTF8') AS lang, > >> CONVERT_FROM(pastes.raw.paste, 'UTF8') AS paste > >> FROM dfs.`pastes`.`/pastes` pastes; > >> > >> > >> Select from view takes 32 seconds (seems to be a scan) > >> > >> > select paste from view_testpaste where pasteid = 'djHEHcPM' > >> > >> 1 row selected (32.302 seconds) > >> > >> > >> Just a direct select returns very fast (0.486 seconds) > >> > >> > select CONVERT_FROM(pastes.raw.paste, 'UTF8') AS paste > >> FROM dfs.`pastes`.`/pastes` pastes where > >> CONVERT_FROM(row_key, 'UTF8') = 'djHEHcPM'; > >> > >> 1 row selected (0.486 seconds) > >> > >> > >> > >> > >> EXPLAIN PLAN FOR select paste from view_testpaste where pasteid = > >> 'djHEHcPM' > >> > >> +------+------+ > >> | text | json | > >> +------+------+ > >> | 00-00 Screen > >> 00-01 UnionExchange > >> 01-01 Project(paste=[CONVERT_FROMUTF8($1)]) > >> 01-02 SelectionVectorRemover > >> 01-03 Filter(condition=[=(CONVERT_FROMUTF8($0), 'djHEHcPM')]) > >> 01-04 Project(row_key=[$1], ITEM=[ITEM($0, 'paste')]) > >> 01-05 Scan(groupscan=[MapRDBGroupScan > >> [HBaseScanSpec=HBaseScanSpec > [tableName=maprfs:///data/pastebiner/pastes, > >> startRow=null, stopRow=null, filter=null], columns=[`row_key`, > >> `raw`.`paste`]]]) > >> | { > >> "head" : { > >> "version" : 1, > >> "generator" : { > >> "type" : "ExplainHandler", > >> "info" : "" > >> }, > >> "type" : "APACHE_DRILL_PHYSICAL", > >> "options" : [ ], > >> "queue" : 0, > >> "resultMode" : "EXEC" > >> }, > >> "graph" : [ { > >> "pop" : "maprdb-scan", > >> "@id" : 65541, > >> "userName" : "darkness", > >> "hbaseScanSpec" : { > >> "tableName" : "maprfs:///data/pastebiner/pastes", > >> "startRow" : "", > >> "stopRow" : "", > >> "serializedFilter" : null > >> }, > >> "storage" : { > >> "type" : "file", > >> "enabled" : true, > >> "connection" : "maprfs:///", > >> "workspaces" : { > >> "root" : { > >> "location" : "/", > >> "writable" : false, > >> "defaultInputFormat" : null > >> }, > >> "pastes" : { > >> "location" : "/data/pastebiner", > >> "writable" : true, > >> "defaultInputFormat" : null > >> }, > >> "dev" : { > >> "location" : "/data/dev", > >> "writable" : true, > >> "defaultInputFormat" : null > >> }, > >> "hive" : { > >> "location" : "/user/hive", > >> "writable" : true, > >> "defaultInputFormat" : null > >> }, > >> "tmp" : { > >> "location" : "/tmp", > >> "writable" : true, > >> "defaultInputFormat" : null > >> } > >> }, > >> "formats" : { > >> "psv" : { > >> "type" : "text", > >> "extensions" : [ "tbl" ], > >> "delimiter" : "|" > >> }, > >> "csv" : { > >> "type" : "text", > >> "extensions" : [ "csv" ], > >> "escape" : "`", > >> "delimiter" : "," > >> }, > >> "tsv" : { > >> "type" : "text", > >> "extensions" : [ "tsv" ], > >> "delimiter" : "\t" > >> }, > >> "parquet" : { > >> "type" : "parquet" > >> }, > >> "json" : { > >> "type" : "json" > >> }, > >> "maprdb" : { > >> "type" : "maprdb" > >> } > >> } > >> }, > >> "columns" : [ "`row_key`", "`raw`.`paste`" ], > >> "cost" : 573950.0 > >> }, { > >> "pop" : "project", > >> "@id" : 65540, > >> "exprs" : [ { > >> "ref" : "`row_key`", > >> "expr" : "`row_key`" > >> }, { > >> "ref" : "`ITEM`", > >> "expr" : "`raw`.`paste`" > >> } ], > >> "child" : 65541, > >> "initialAllocation" : 1000000, > >> "maxAllocation" : 10000000000, > >> "cost" : 573950.0 > >> }, { > >> "pop" : "filter", > >> "@id" : 65539, > >> "child" : 65540, > >> "expr" : "equal(convert_fromutf8(`row_key`) , 'djHEHcPM') ", > >> "initialAllocation" : 1000000, > >> "maxAllocation" : 10000000000, > >> "cost" : 86092.5 > >> }, { > >> "pop" : "selection-vector-remover", > >> "@id" : 65538, > >> "child" : 65539, > >> "initialAllocation" : 1000000, > >> "maxAllocation" : 10000000000, > >> "cost" : 86092.5 > >> }, { > >> "pop" : "project", > >> "@id" : 65537, > >> "exprs" : [ { > >> "ref" : "`paste`", > >> "expr" : "convert_fromutf8(`ITEM`) " > >> } ], > >> "child" : 65538, > >> "initialAllocation" : 1000000, > >> "maxAllocation" : 10000000000, > >> "cost" : 86092.5 > >> }, { > >> "pop" : "union-exchange", > >> "@id" : 1, > >> "child" : 65537, > >> "initialAllocation" : 1000000, > >> "maxAllocation" : 10000000000, > >> "cost" : 86092.5 > >> }, { > >> "pop" : "screen", > >> "@id" : 0, > >> "child" : 1, > >> "initialAllocation" : 1000000, > >> "maxAllocation" : 10000000000, > >> "cost" : 86092.5 > >> } ] > >> } | > >> +------+------+ > >> 1 row selected (0.42 seconds) > >> > >> > >> > >> EXPLAIN PLAN FOR select CONVERT_FROM(pastes.raw.paste, 'UTF8') AS paste > >> FROM dfs.`pastes`.`/pastes` pastes where > >> CONVERT_FROM(row_key, 'UTF8') = 'djHEHcPM'; > >> > >> +------+------+ > >> | text | json | > >> +------+------+ > >> | 00-00 Screen > >> 00-01 Project(paste=[CONVERT_FROMUTF8($1)]) > >> 00-02 Project(row_key=[$1], ITEM=[ITEM($0, 'paste')]) > >> 00-03 Scan(groupscan=[MapRDBGroupScan > >> [HBaseScanSpec=HBaseScanSpec > >> [tableName=maprfs:///data/pastebiner/pastes, startRow=djHEHcPM, > >> stopRow=djHEHcPM\x00, filter=RowFilter (EQUAL, djHEHcPM)], > >> columns=[`row_key`, `raw`.`paste`]]]) > >> | { > >> "head" : { > >> "version" : 1, > >> "generator" : { > >> "type" : "ExplainHandler", > >> "info" : "" > >> }, > >> "type" : "APACHE_DRILL_PHYSICAL", > >> "options" : [ ], > >> "queue" : 0, > >> "resultMode" : "EXEC" > >> }, > >> "graph" : [ { > >> "pop" : "maprdb-scan", > >> "@id" : 3, > >> "userName" : "darkness", > >> "hbaseScanSpec" : { > >> "tableName" : "maprfs:///data/pastebiner/pastes", > >> "startRow" : "ZGpIRUhjUE0=", > >> "stopRow" : "ZGpIRUhjUE0A", > >> "serializedFilter" : > >> > >> > "CihvcmcuYXBhY2hlLmhhZG9vcC5oYmFzZS5maWx0ZXIuUm93RmlsdGVyEkUKQwgCEj8KL29yZy5hcGFjaGUuaGFkb29wLmhiYXNlLmZpbHRlci5CaW5hcnlDb21wYXJhdG9yEgwKCgoIZGpIRUhjUE0=" > >> }, > >> "storage" : { > >> "type" : "file", > >> "enabled" : true, > >> "connection" : "maprfs:///", > >> "workspaces" : { > >> "root" : { > >> "location" : "/", > >> "writable" : false, > >> "defaultInputFormat" : null > >> }, > >> "pastes" : { > >> "location" : "/data/pastebiner", > >> "writable" : true, > >> "defaultInputFormat" : null > >> }, > >> "dev" : { > >> "location" : "/data/dev", > >> "writable" : true, > >> "defaultInputFormat" : null > >> }, > >> "hive" : { > >> "location" : "/user/hive", > >> "writable" : true, > >> "defaultInputFormat" : null > >> }, > >> "tmp" : { > >> "location" : "/tmp", > >> "writable" : true, > >> "defaultInputFormat" : null > >> } > >> }, > >> "formats" : { > >> "psv" : { > >> "type" : "text", > >> "extensions" : [ "tbl" ], > >> "delimiter" : "|" > >> }, > >> "csv" : { > >> "type" : "text", > >> "extensions" : [ "csv" ], > >> "escape" : "`", > >> "delimiter" : "," > >> }, > >> "tsv" : { > >> "type" : "text", > >> "extensions" : [ "tsv" ], > >> "delimiter" : "\t" > >> }, > >> "parquet" : { > >> "type" : "parquet" > >> }, > >> "json" : { > >> "type" : "json" > >> }, > >> "maprdb" : { > >> "type" : "maprdb" > >> } > >> } > >> }, > >> "columns" : [ "`row_key`", "`raw`.`paste`" ], > >> "cost" : 286975.0 > >> }, { > >> "pop" : "project", > >> "@id" : 2, > >> "exprs" : [ { > >> "ref" : "`row_key`", > >> "expr" : "`row_key`" > >> }, { > >> "ref" : "`ITEM`", > >> "expr" : "`raw`.`paste`" > >> } ], > >> "child" : 3, > >> "initialAllocation" : 1000000, > >> "maxAllocation" : 10000000000, > >> "cost" : 286975.0 > >> }, { > >> "pop" : "project", > >> "@id" : 1, > >> "exprs" : [ { > >> "ref" : "`paste`", > >> "expr" : "convert_fromutf8(`ITEM`) " > >> } ], > >> "child" : 2, > >> "initialAllocation" : 1000000, > >> "maxAllocation" : 10000000000, > >> "cost" : 286975.0 > >> }, { > >> "pop" : "screen", > >> "@id" : 0, > >> "child" : 1, > >> "initialAllocation" : 1000000, > >> "maxAllocation" : 10000000000, > >> "cost" : 286975.0 > >> } ] > >> } | > >> > > > > >
