I know that historically there have been bugs with pushing past too many
projects. I wonder if the intermediate
John, can you turn on some additional logging and then run the queries
again? If you enable DEBUG logging on the following logger:
"org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler" we will get
more information on what is happening. Once you do so, run each query again
and report back what we see for each of the following plans: [Optiq
Logical, Drill Logical, Drill Physical].
If you haven't done so before, you can enable logging of a new logger by
modifying the logback.xml configuration file. For example, add the
following block to get the information into the log file:
<logger
name="org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler"
additivity="false">
<level value="info" />
<appender-ref ref="FILE" />
</logger>
Thanks,
Jacques
--
Jacques Nadeau
CTO and Co-Founder, Dremio
On Sun, Feb 7, 2016 at 11:48 AM, Ted Dunning <[email protected]> wrote:
> Expanding the query using the definition of the view runs quickly even with
> the CONVERT, apparently, so I think that there is something going on in the
> view that is causing the optimizer to lose sight of the pushdown.
>
>
>
> On Sun, Feb 7, 2016 at 11:11 AM, Zelaine Fong <[email protected]> wrote:
>
> > Oops, missed the "not" in one of my sentences. The corrected sentence
> > should be:
> >
> > You can tell that's the case because in the case of the view query, the
> > explain plan has a Filter operation, whereas in the case of the non-view
> > query you do NOT.
> >
> > -- Zelaine
> >
> > On Sun, Feb 7, 2016 at 11:09 AM, Zelaine Fong <[email protected]>
> wrote:
> >
> > > It looks like in the case of the non-view query, the WHERE clause is
> > being
> > > pushed down to MapR-DB. The pushdown isn't happening in the case of
> the
> > > view. You can tell that's the case because in the case of the view
> > query,
> > > the explain plan has a Filter operation, whereas in the case of the
> > > non-view query you do. As you noted, in the case of the non-view, the
> > Scan
> > > has a startRow/stopRow/filter, which corresponds to the pushdown of the
> > > filter.
> > >
> > > I'm wondering if the problem is related to the CONVERT_FROM() in the
> > WHERE
> > > clause, or if all filters on views aren't being pushed down?
> > >
> > > -- Zelaine
> > >
> > > On Sat, Feb 6, 2016 at 2:02 PM, John Omernik <[email protected]> wrote:
> > >
> > >> Hey all, I was running some queries on a MapR-DB Table I have. I
> > created
> > >> a
> > >> view to avoid forcing users to write queries that always included the
> > >> CONVERT_FROM statements. (I am a huge advocate of making things easy
> for
> > >> the the users and writing queries with CONVERT_FROM statements isn't
> > >> easy).
> > >>
> > >> I ran a query the other day on one of these views and noticed that a
> > query
> > >> that took 30 seconds really shouldn't take 30 seconds. What do I
> mean?
> > >> well I wanted to get part of a record by looking up the MapR-DB Row
> key
> > >> (equiv. to HBASE row key) That should be an instant lookup. Sure
> > enough,
> > >> when I tried it in the hbase shell that returns instantly. So why did
> > >> Drill take 30 seconds? I shot an email to Ted and Jim at MapR to ask
> > this
> > >> very question. Ted suggested that I try the query without a view.
> Sure
> > >> enough, If I use the convert_from in a direct query, it's an instant
> > (sub
> > >> second) return. Thus it appears something in the view is not allowing
> > the
> > >> query to short circuit the read.
> > >>
> > >> Ted suggests I post here (I am curious if anyone who has HBASE setup
> is
> > >> seeing this same issue with views) but also include the EXPLAIN plan.
> > >> Basically, using my very limited ability to read EXPLAIN plans (If
> > someone
> > >> has a pointer to a blog post or docs on how to read EXPLAIN I would
> love
> > >> that!) it looks like in the view the startRow and stopRow in the
> > >> hbaseScanSpec are not set, seeming to cause a scan. Is there any away
> > to
> > >> assist the planner when running this through a view so that we can get
> > the
> > >> performance of the query without the view but with the easy of
> > >> use/readability of using the view?
> > >>
> > >> Thanks!!!
> > >>
> > >> John
> > >>
> > >> View Creation
> > >>
> > >> CREATE VIEW view_testpaste as
> > >> SELECT
> > >> CONVERT_FROM(row_key, 'UTF8') AS pasteid,
> > >> CONVERT_FROM(pastes.pdata.lang, 'UTF8') AS lang,
> > >> CONVERT_FROM(pastes.raw.paste, 'UTF8') AS paste
> > >> FROM dfs.`pastes`.`/pastes` pastes;
> > >>
> > >>
> > >> Select from view takes 32 seconds (seems to be a scan)
> > >>
> > >> > select paste from view_testpaste where pasteid = 'djHEHcPM'
> > >>
> > >> 1 row selected (32.302 seconds)
> > >>
> > >>
> > >> Just a direct select returns very fast (0.486 seconds)
> > >>
> > >> > select CONVERT_FROM(pastes.raw.paste, 'UTF8') AS paste
> > >> FROM dfs.`pastes`.`/pastes` pastes where
> > >> CONVERT_FROM(row_key, 'UTF8') = 'djHEHcPM';
> > >>
> > >> 1 row selected (0.486 seconds)
> > >>
> > >>
> > >>
> > >>
> > >> EXPLAIN PLAN FOR select paste from view_testpaste where pasteid =
> > >> 'djHEHcPM'
> > >>
> > >> +------+------+
> > >> | text | json |
> > >> +------+------+
> > >> | 00-00 Screen
> > >> 00-01 UnionExchange
> > >> 01-01 Project(paste=[CONVERT_FROMUTF8($1)])
> > >> 01-02 SelectionVectorRemover
> > >> 01-03 Filter(condition=[=(CONVERT_FROMUTF8($0),
> 'djHEHcPM')])
> > >> 01-04 Project(row_key=[$1], ITEM=[ITEM($0, 'paste')])
> > >> 01-05 Scan(groupscan=[MapRDBGroupScan
> > >> [HBaseScanSpec=HBaseScanSpec
> > [tableName=maprfs:///data/pastebiner/pastes,
> > >> startRow=null, stopRow=null, filter=null], columns=[`row_key`,
> > >> `raw`.`paste`]]])
> > >> | {
> > >> "head" : {
> > >> "version" : 1,
> > >> "generator" : {
> > >> "type" : "ExplainHandler",
> > >> "info" : ""
> > >> },
> > >> "type" : "APACHE_DRILL_PHYSICAL",
> > >> "options" : [ ],
> > >> "queue" : 0,
> > >> "resultMode" : "EXEC"
> > >> },
> > >> "graph" : [ {
> > >> "pop" : "maprdb-scan",
> > >> "@id" : 65541,
> > >> "userName" : "darkness",
> > >> "hbaseScanSpec" : {
> > >> "tableName" : "maprfs:///data/pastebiner/pastes",
> > >> "startRow" : "",
> > >> "stopRow" : "",
> > >> "serializedFilter" : null
> > >> },
> > >> "storage" : {
> > >> "type" : "file",
> > >> "enabled" : true,
> > >> "connection" : "maprfs:///",
> > >> "workspaces" : {
> > >> "root" : {
> > >> "location" : "/",
> > >> "writable" : false,
> > >> "defaultInputFormat" : null
> > >> },
> > >> "pastes" : {
> > >> "location" : "/data/pastebiner",
> > >> "writable" : true,
> > >> "defaultInputFormat" : null
> > >> },
> > >> "dev" : {
> > >> "location" : "/data/dev",
> > >> "writable" : true,
> > >> "defaultInputFormat" : null
> > >> },
> > >> "hive" : {
> > >> "location" : "/user/hive",
> > >> "writable" : true,
> > >> "defaultInputFormat" : null
> > >> },
> > >> "tmp" : {
> > >> "location" : "/tmp",
> > >> "writable" : true,
> > >> "defaultInputFormat" : null
> > >> }
> > >> },
> > >> "formats" : {
> > >> "psv" : {
> > >> "type" : "text",
> > >> "extensions" : [ "tbl" ],
> > >> "delimiter" : "|"
> > >> },
> > >> "csv" : {
> > >> "type" : "text",
> > >> "extensions" : [ "csv" ],
> > >> "escape" : "`",
> > >> "delimiter" : ","
> > >> },
> > >> "tsv" : {
> > >> "type" : "text",
> > >> "extensions" : [ "tsv" ],
> > >> "delimiter" : "\t"
> > >> },
> > >> "parquet" : {
> > >> "type" : "parquet"
> > >> },
> > >> "json" : {
> > >> "type" : "json"
> > >> },
> > >> "maprdb" : {
> > >> "type" : "maprdb"
> > >> }
> > >> }
> > >> },
> > >> "columns" : [ "`row_key`", "`raw`.`paste`" ],
> > >> "cost" : 573950.0
> > >> }, {
> > >> "pop" : "project",
> > >> "@id" : 65540,
> > >> "exprs" : [ {
> > >> "ref" : "`row_key`",
> > >> "expr" : "`row_key`"
> > >> }, {
> > >> "ref" : "`ITEM`",
> > >> "expr" : "`raw`.`paste`"
> > >> } ],
> > >> "child" : 65541,
> > >> "initialAllocation" : 1000000,
> > >> "maxAllocation" : 10000000000,
> > >> "cost" : 573950.0
> > >> }, {
> > >> "pop" : "filter",
> > >> "@id" : 65539,
> > >> "child" : 65540,
> > >> "expr" : "equal(convert_fromutf8(`row_key`) , 'djHEHcPM') ",
> > >> "initialAllocation" : 1000000,
> > >> "maxAllocation" : 10000000000,
> > >> "cost" : 86092.5
> > >> }, {
> > >> "pop" : "selection-vector-remover",
> > >> "@id" : 65538,
> > >> "child" : 65539,
> > >> "initialAllocation" : 1000000,
> > >> "maxAllocation" : 10000000000,
> > >> "cost" : 86092.5
> > >> }, {
> > >> "pop" : "project",
> > >> "@id" : 65537,
> > >> "exprs" : [ {
> > >> "ref" : "`paste`",
> > >> "expr" : "convert_fromutf8(`ITEM`) "
> > >> } ],
> > >> "child" : 65538,
> > >> "initialAllocation" : 1000000,
> > >> "maxAllocation" : 10000000000,
> > >> "cost" : 86092.5
> > >> }, {
> > >> "pop" : "union-exchange",
> > >> "@id" : 1,
> > >> "child" : 65537,
> > >> "initialAllocation" : 1000000,
> > >> "maxAllocation" : 10000000000,
> > >> "cost" : 86092.5
> > >> }, {
> > >> "pop" : "screen",
> > >> "@id" : 0,
> > >> "child" : 1,
> > >> "initialAllocation" : 1000000,
> > >> "maxAllocation" : 10000000000,
> > >> "cost" : 86092.5
> > >> } ]
> > >> } |
> > >> +------+------+
> > >> 1 row selected (0.42 seconds)
> > >>
> > >>
> > >>
> > >> EXPLAIN PLAN FOR select CONVERT_FROM(pastes.raw.paste, 'UTF8') AS
> paste
> > >> FROM dfs.`pastes`.`/pastes` pastes where
> > >> CONVERT_FROM(row_key, 'UTF8') = 'djHEHcPM';
> > >>
> > >> +------+------+
> > >> | text | json |
> > >> +------+------+
> > >> | 00-00 Screen
> > >> 00-01 Project(paste=[CONVERT_FROMUTF8($1)])
> > >> 00-02 Project(row_key=[$1], ITEM=[ITEM($0, 'paste')])
> > >> 00-03 Scan(groupscan=[MapRDBGroupScan
> > >> [HBaseScanSpec=HBaseScanSpec
> > >> [tableName=maprfs:///data/pastebiner/pastes, startRow=djHEHcPM,
> > >> stopRow=djHEHcPM\x00, filter=RowFilter (EQUAL, djHEHcPM)],
> > >> columns=[`row_key`, `raw`.`paste`]]])
> > >> | {
> > >> "head" : {
> > >> "version" : 1,
> > >> "generator" : {
> > >> "type" : "ExplainHandler",
> > >> "info" : ""
> > >> },
> > >> "type" : "APACHE_DRILL_PHYSICAL",
> > >> "options" : [ ],
> > >> "queue" : 0,
> > >> "resultMode" : "EXEC"
> > >> },
> > >> "graph" : [ {
> > >> "pop" : "maprdb-scan",
> > >> "@id" : 3,
> > >> "userName" : "darkness",
> > >> "hbaseScanSpec" : {
> > >> "tableName" : "maprfs:///data/pastebiner/pastes",
> > >> "startRow" : "ZGpIRUhjUE0=",
> > >> "stopRow" : "ZGpIRUhjUE0A",
> > >> "serializedFilter" :
> > >>
> > >>
> >
> "CihvcmcuYXBhY2hlLmhhZG9vcC5oYmFzZS5maWx0ZXIuUm93RmlsdGVyEkUKQwgCEj8KL29yZy5hcGFjaGUuaGFkb29wLmhiYXNlLmZpbHRlci5CaW5hcnlDb21wYXJhdG9yEgwKCgoIZGpIRUhjUE0="
> > >> },
> > >> "storage" : {
> > >> "type" : "file",
> > >> "enabled" : true,
> > >> "connection" : "maprfs:///",
> > >> "workspaces" : {
> > >> "root" : {
> > >> "location" : "/",
> > >> "writable" : false,
> > >> "defaultInputFormat" : null
> > >> },
> > >> "pastes" : {
> > >> "location" : "/data/pastebiner",
> > >> "writable" : true,
> > >> "defaultInputFormat" : null
> > >> },
> > >> "dev" : {
> > >> "location" : "/data/dev",
> > >> "writable" : true,
> > >> "defaultInputFormat" : null
> > >> },
> > >> "hive" : {
> > >> "location" : "/user/hive",
> > >> "writable" : true,
> > >> "defaultInputFormat" : null
> > >> },
> > >> "tmp" : {
> > >> "location" : "/tmp",
> > >> "writable" : true,
> > >> "defaultInputFormat" : null
> > >> }
> > >> },
> > >> "formats" : {
> > >> "psv" : {
> > >> "type" : "text",
> > >> "extensions" : [ "tbl" ],
> > >> "delimiter" : "|"
> > >> },
> > >> "csv" : {
> > >> "type" : "text",
> > >> "extensions" : [ "csv" ],
> > >> "escape" : "`",
> > >> "delimiter" : ","
> > >> },
> > >> "tsv" : {
> > >> "type" : "text",
> > >> "extensions" : [ "tsv" ],
> > >> "delimiter" : "\t"
> > >> },
> > >> "parquet" : {
> > >> "type" : "parquet"
> > >> },
> > >> "json" : {
> > >> "type" : "json"
> > >> },
> > >> "maprdb" : {
> > >> "type" : "maprdb"
> > >> }
> > >> }
> > >> },
> > >> "columns" : [ "`row_key`", "`raw`.`paste`" ],
> > >> "cost" : 286975.0
> > >> }, {
> > >> "pop" : "project",
> > >> "@id" : 2,
> > >> "exprs" : [ {
> > >> "ref" : "`row_key`",
> > >> "expr" : "`row_key`"
> > >> }, {
> > >> "ref" : "`ITEM`",
> > >> "expr" : "`raw`.`paste`"
> > >> } ],
> > >> "child" : 3,
> > >> "initialAllocation" : 1000000,
> > >> "maxAllocation" : 10000000000,
> > >> "cost" : 286975.0
> > >> }, {
> > >> "pop" : "project",
> > >> "@id" : 1,
> > >> "exprs" : [ {
> > >> "ref" : "`paste`",
> > >> "expr" : "convert_fromutf8(`ITEM`) "
> > >> } ],
> > >> "child" : 2,
> > >> "initialAllocation" : 1000000,
> > >> "maxAllocation" : 10000000000,
> > >> "cost" : 286975.0
> > >> }, {
> > >> "pop" : "screen",
> > >> "@id" : 0,
> > >> "child" : 1,
> > >> "initialAllocation" : 1000000,
> > >> "maxAllocation" : 10000000000,
> > >> "cost" : 286975.0
> > >> } ]
> > >> } |
> > >>
> > >
> > >
> >
>