date:20181009

[jira] [Commented] (DRILL-6763) Codegen optimization of SQL functions with constant values

2018-10-09 ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644387#comment-16644387
 ] 

ASF GitHub Bot commented on DRILL-6763:
---

lushuifeng commented on issue #1481: DRILL-6763: Codegen optimization of SQL 
functions with constant values
URL: https://github.com/apache/drill/pull/1481#issuecomment-428425133
 
 
   @vvysotskyi thanks for your testing. I'm working on this; it may take some 
time.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Codegen optimization of SQL functions with constant values
> --
>
> Key: DRILL-6763
> URL: https://issues.apache.org/jira/browse/DRILL-6763
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Affects Versions: 1.14.0
>Reporter: shuifeng lu
>Assignee: shuifeng lu
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: Query1.java, Query2.java, code_compare.png, 
> compilation_time.png
>
>
> Codegen class compilation takes tens to hundreds of milliseconds, and the class 
> cache is hit only when the generifiedCode of the code generator is exactly the same.
>  This works well when a UDF takes only columns or symbols, but it is inefficient when 
> one or more UDF parameters are constants that differ from query to query.
>  Take face recognition, for example: the face images are almost always distinct from 
> each other because of lighting, facial expressions, and details.
>  It is important to reduce redundant class compilation, especially for 
> low-latency queries.
>  Eliminating the redundant classes can also reduce the cache miss rate and 
> metaspace GC.
> Here is a query that gets the persons whose last name is Brunner and who were hired 
> on or after 1 Jan 1990:
>  SELECT full_name, hire_date FROM cp.`employee.json` where last_name = 
> 'Brunner' and hire_date >= '1990-01-01 00:00:00.0';
>  Now get the persons whose last name is Bernard and who were hired on or after 1 Jan 1990:
>  SELECT full_name, hire_date FROM cp.`employee.json` where last_name = 
> 'Bernard' and hire_date >= '1990-01-01 00:00:00.0';
> Figure !compilation_time.png! shows the compilation time of the code generated 
> for the above queries in FilterRecordBatch on my laptop.
>  Figure !code_compare.png! shows that the only difference between the generated code 
> in the attachments is the last_name value at line 156.
>  The redundant class compilation can therefore be eliminated 
> by making string12 a member of the class and setting its value when the 
> instance is created.
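A toy Java model of the idea, assuming a class cache keyed by the generated source text. The class and method names are hypothetical and are not Drill's CodeGenerator API; a "compiled class" is modeled as a factory that builds a filter instance from the runtime constant. Baking the last_name literal into the source yields a new cache key per value, while hoisting it into a member field such as string12 keeps the source identical, so only the first query pays for compilation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Toy model of a codegen class cache keyed by generated source text.
// Nothing here is Drill's real CodeGenerator API.
class ConstantHoistingSketch {
  private final Map<String, Function<String, Predicate<String>>> classCache = new HashMap<>();
  private int compilations = 0;

  // Cache lookup by source text; a miss stands in for an expensive compilation.
  private Function<String, Predicate<String>> compile(String source) {
    return classCache.computeIfAbsent(source, src -> {
      compilations++;
      return constant -> lastName -> lastName.equals(constant);
    });
  }

  // Constant baked into the source as a literal: each new value is a new cache key.
  Predicate<String> inlinedFilter(String value) {
    String source = "return last_name.equals(\"" + value + "\");";
    return compile(source).apply(value);
  }

  // Constant hoisted into a member field (like string12) and set per instance:
  // the source text is identical for every value, so it is compiled only once.
  Predicate<String> hoistedFilter(String value) {
    String source = "return last_name.equals(this.string12);";
    return compile(source).apply(value);
  }

  int compilationCount() {
    return compilations;
  }

  public static void main(String[] args) {
    ConstantHoistingSketch inlined = new ConstantHoistingSketch();
    inlined.inlinedFilter("Brunner");
    inlined.inlinedFilter("Bernard");

    ConstantHoistingSketch hoisted = new ConstantHoistingSketch();
    hoisted.hoistedFilter("Brunner");
    Predicate<String> bernard = hoisted.hoistedFilter("Bernard");

    System.out.println(inlined.compilationCount() + " vs " + hoisted.compilationCount()); // 2 vs 1
    System.out.println(bernard.test("Bernard")); // true
  }
}
```

The hoisted variant reads the constant from the instance instead of from the bytecode, which is the trade the description proposes: a field read per row in exchange for skipping repeated compilation.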



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (DRILL-6787) Update Spnego webpage

2018-10-09 Robert Hou (JIRA)

Robert Hou created DRILL-6787:
-

 Summary: Update Spnego webpage
 Key: DRILL-6787
 URL: https://issues.apache.org/jira/browse/DRILL-6787
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Bridget Bevens
 Fix For: 1.15.0


A few things should be updated on this webpage:
https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/

When configuring drillbits in drill-override.conf, the principal and keytab 
should be corrected.  There are two places where this should be corrected.
{noformat}
drill.exec.http: {
  auth.spnego.principal:"HTTP/hostname@realm",
  auth.spnego.keytab:"path/to/keytab",
  auth.mechanisms: ["SPNEGO"]
}
{noformat}
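Kerberos service principals have the form service/host@REALM, and browsers doing SPNEGO request tickets for the HTTP service. As a hedged illustration of what belongs in auth.spnego.principal, here is a hypothetical Java helper (not part of Drill) that assembles that string, assuming the common conventions of a lower-case fully qualified host name and an upper-case realm:

```java
import java.util.Locale;

// Hypothetical helper for filling in auth.spnego.principal; not part of Drill.
class SpnegoPrincipalSketch {
  // Service principal of the form HTTP/host@REALM. Lower-case host and
  // upper-case realm are conventions assumed here, not enforced by Kerberos.
  static String httpPrincipal(String fqdn, String realm) {
    if (fqdn == null || fqdn.isEmpty() || realm == null || realm.isEmpty()) {
      throw new IllegalArgumentException("host and realm are required");
    }
    return "HTTP/" + fqdn.toLowerCase(Locale.ROOT) + "@" + realm.toUpperCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    // Prints the line to paste into drill-override.conf for a made-up host and realm.
    System.out.println("auth.spnego.principal:\"" + httpPrincipal("drill1.example.com", "example.com") + "\",");
  }
}
```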
For the section on Chrome, we should change "hostname/domain" to "domain" or to 
"hostname@domain".  Also, the two blanks around the "=" should be removed.
{noformat}
google-chrome --auth-server-whitelist="hostname/domain"
{noformat}
Also, for the section on Chrome, the "domain" should match the URL given to 
Chrome to access the Web UI.

Also, Linux and Mac should be treated in separate paragraphs.  These should be 
the directions for Mac:
{noformat}
cd "/Applications/Google Chrome.app/Contents/MacOS"
./"Google Chrome" --auth-server-whitelist="example.com"
{noformat}




[jira] [Updated] (DRILL-6739) Update Kafka libs to 2.0.0 version

2018-10-09 Pritesh Maker (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-6739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6739:
-
Fix Version/s: (was: Future)
   1.16.0

> Update Kafka libs to 2.0.0 version
> --
>
> Key: DRILL-6739
> URL: https://issues.apache.org/jira/browse/DRILL-6739
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Kafka
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Priority: Minor
> Fix For: 1.16.0
>
>
> The current version of the Kafka libs is 0.11.0.1.
>  The latest version, as of September 2018, is 2.0.0: 
> https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients
> It looks like the only changes needed are:
>  * replacing the {{serverConfig()}} method with {{staticServerConfig()}} in the Drill 
> {{EmbeddedKafkaCluster}} class
>  * replacing the deprecated {{AdminUtils}} with {{kafka.zk.AdminZkClient}} 
> [https://github.com/apache/kafka/blob/3cdc78e6bb1f83973a14ce1550fe3874f7348b05/core/src/main/scala/kafka/admin/AdminUtils.scala#L35]
>  https://issues.apache.org/jira/browse/KAFKA-6545
> The initial work: https://github.com/vdiravka/drill/commits/DRILL-6739




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644152#comment-16644152
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223878408
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java
 ##
 @@ -517,6 +388,15 @@ private static FieldPath getFieldPathForProjection(SchemaPath column) {
 return new FieldPath(child);
   }
 
+  public static boolean includesIdField(Collection projected) {
+return Iterables.tryFind(projected, new Predicate() {
 
 Review comment:
   @vdiravka could you elaborate on this a bit? I haven't done much with 
lambda expressions in Java 8, but if you show how to rewrite this statement, and 
assuming it has an advantage over the existing implementation, I would be happy 
to. Thanks.
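One possible Java 8 rewrite, assuming the anonymous Predicate only tests each projected path and the Optional returned by tryFind is checked with isPresent() (the diff is truncated, so the real predicate body is not shown). Plain strings stand in for SchemaPath, and "_id" is an assumed name for the ID field:

```java
import java.util.Arrays;
import java.util.Collection;

// Sketch only: strings stand in for Drill's SchemaPath, and "_id" is an
// assumed name for the row-key field.
class IncludesIdFieldSketch {
  static final String ID_FIELD = "_id";

  // Java 8 form: anyMatch stops at the first match, which is what
  // Iterables.tryFind(projected, predicate).isPresent() computes.
  static boolean includesIdField(Collection<String> projected) {
    return projected.stream().anyMatch(ID_FIELD::equals);
  }

  public static void main(String[] args) {
    System.out.println(includesIdField(Arrays.asList("name", "_id"))); // true
    System.out.println(includesIdField(Arrays.asList("name", "age"))); // false
  }
}
```

If Guava should stay, its Predicate has a single abstract method, so a lambda works there too, e.g. Iterables.any(projected, p -> ...), which also avoids the intermediate Optional. The main advantage is brevity and typed generics in place of the raw Collection and Predicate.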




> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning  
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage them during planning and execution in order 
> to improve query performance.  
> On the planning side, the Drill planner should be enhanced to provide an 
> abstraction layer that expresses the index metadata and statistics.  Further, 
> cost-based index selection is needed to decide which index(es) are 
> suitable.  
> On the execution side, appropriate operator enhancements are needed to 
> handle different categories of indexes, such as covering and non-covering 
> indexes, taking into consideration that the index data may not be co-located with 
> the primary table, i.e. a global index.
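To make cost-based index selection concrete, here is a toy model that assumes a made-up cost formula: matched rows for a covering index, plus a per-row primary-table lookup penalty for a non-covering one, with a full table scan costed at one unit per row. None of these names or numbers come from Drill's planner:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Toy cost model for choosing among candidate indexes; the formula and the
// lookupCostPerRow parameter are illustrative assumptions, not Drill's planner.
class IndexSelectionSketch {
  static final class IndexCandidate {
    final String name;
    final double selectivity; // estimated fraction of rows matching the filter
    final boolean covering;   // true if the index holds every projected column

    IndexCandidate(String name, double selectivity, boolean covering) {
      this.name = name;
      this.selectivity = selectivity;
      this.covering = covering;
    }
  }

  // A covering index reads only matching index entries; a non-covering one also
  // fetches each matching row from the primary table (costlier for a global index).
  static double cost(IndexCandidate idx, long tableRows, double lookupCostPerRow) {
    double matchedRows = idx.selectivity * tableRows;
    return idx.covering ? matchedRows : matchedRows * (1 + lookupCostPerRow);
  }

  // Cheapest index that beats a full scan (modeled as tableRows); empty means "scan the table".
  static Optional<IndexCandidate> choose(List<IndexCandidate> candidates, long tableRows, double lookupCostPerRow) {
    return candidates.stream()
        .filter(c -> cost(c, tableRows, lookupCostPerRow) < tableRows)
        .min(Comparator.comparingDouble(c -> cost(c, tableRows, lookupCostPerRow)));
  }

  public static void main(String[] args) {
    List<IndexCandidate> candidates = Arrays.asList(
        new IndexCandidate("idx_last_name", 0.01, false), // 10,000 rows * 5 = 50,000
        new IndexCandidate("idx_name_date", 0.03, true),  // 30,000
        new IndexCandidate("idx_gender", 0.5, false));    // 2,500,000, worse than a full scan
    choose(candidates, 1_000_000L, 4.0)
        .ifPresent(c -> System.out.println(c.name)); // idx_name_date
  }
}
```

The example shows why covering versus non-covering matters: a less selective covering index can still win once the lookup penalty is counted.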




[jira] [Updated] (DRILL-6739) Update Kafka libs to 2.0.0 version

2018-10-09 Kunal Khatua (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-6739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6739:

Component/s: (was: Storage - Other)
 Storage - Kafka

> Update Kafka libs to 2.0.0 version
> --
>
> Key: DRILL-6739
> URL: https://issues.apache.org/jira/browse/DRILL-6739
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Kafka
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Priority: Minor
> Fix For: Future
>
>
> The current version of the Kafka libs is 0.11.0.1.
>  The latest version, as of September 2018, is 2.0.0: 
> https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients
> It looks like the only changes needed are:
>  * replacing the {{serverConfig()}} method with {{staticServerConfig()}} in the Drill 
> {{EmbeddedKafkaCluster}} class
>  * replacing the deprecated {{AdminUtils}} with {{kafka.zk.AdminZkClient}} 
> [https://github.com/apache/kafka/blob/3cdc78e6bb1f83973a14ce1550fe3874f7348b05/core/src/main/scala/kafka/admin/AdminUtils.scala#L35]
>  https://issues.apache.org/jira/browse/KAFKA-6545
> The initial work: https://github.com/vdiravka/drill/commits/DRILL-6739




[jira] [Commented] (DRILL-6731) JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter

2018-10-09 ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644082#comment-16644082
 ] 

ASF GitHub Bot commented on DRILL-6731:
---

sohami commented on issue #1459: DRILL-6731: Move the BFs aggregating work from 
the Foreman to the RuntimeFi…
URL: https://github.com/apache/drill/pull/1459#issuecomment-428349411
 
 
   +1 for last commit as well. LGTM




> JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter
> --
>
> Key: DRILL-6731
> URL: https://issues.apache.org/jira/browse/DRILL-6731
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.15.0
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> This PR moves the BloomFilter aggregation work from the Foreman to the 
> RuntimeFilter. Through this change, the RuntimeFilter can apply the incoming 
> BF as soon as possible.
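Bloom filters built with the same size and hash functions can be merged with a bitwise OR, which is presumably what makes it practical to aggregate them in the RuntimeFilter as they arrive rather than in the Foreman. A minimal sketch with a hypothetical SimpleBloomFilter (not Drill's BloomFilter class):

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Minimal Bloom filter used only to show how per-fragment filters combine;
// Drill's own BloomFilter and RuntimeFilter classes are not reproduced here.
class BloomFilterMergeSketch {
  static final class SimpleBloomFilter {
    private final int numBits;
    private final BitSet bits;

    SimpleBloomFilter(int numBits) {
      this.numBits = numBits;
      this.bits = new BitSet(numBits);
    }

    // Two illustrative bit positions derived from hashCode(); real filters use better hashes.
    private int[] positions(Object key) {
      int h = key.hashCode();
      return new int[] {Math.floorMod(h, numBits), Math.floorMod(h * 31 + 17, numBits)};
    }

    void add(Object key) {
      for (int p : positions(key)) {
        bits.set(p);
      }
    }

    // False positives are possible; false negatives are not.
    boolean mightContain(Object key) {
      for (int p : positions(key)) {
        if (!bits.get(p)) {
          return false;
        }
      }
      return true;
    }

    // Filters with the same size and hash functions merge by bitwise OR.
    void mergeFrom(SimpleBloomFilter other) {
      if (other.numBits != numBits) {
        throw new IllegalArgumentException("Bloom filters must have the same size to merge");
      }
      bits.or(other.bits);
    }
  }

  // OR together the filters produced by each fragment.
  static SimpleBloomFilter aggregate(List<SimpleBloomFilter> parts, int numBits) {
    SimpleBloomFilter merged = new SimpleBloomFilter(numBits);
    for (SimpleBloomFilter part : parts) {
      merged.mergeFrom(part);
    }
    return merged;
  }

  public static void main(String[] args) {
    SimpleBloomFilter fragment1 = new SimpleBloomFilter(1024);
    fragment1.add("Brunner");
    SimpleBloomFilter fragment2 = new SimpleBloomFilter(1024);
    fragment2.add("Bernard");
    SimpleBloomFilter merged = aggregate(Arrays.asList(fragment1, fragment2), 1024);
    System.out.println(merged.mightContain("Brunner") && merged.mightContain("Bernard")); // true
  }
}
```

Because OR is associative and commutative, the receiver can fold in each incoming filter in any order without waiting for a central coordinator.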




[jira] [Updated] (DRILL-6786) Work with servers where you do not have full privileges

2018-10-09 Dobes Vandermeer (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dobes Vandermeer updated DRILL-6786:

Description: 
We have a mongo database hosted with mLab.  When trying to connect to this 
database, a couple of issues show up due to the lack of admin privileges on 
that database.

Assuming I specify the full connection URL including the name of a database, 
the connection fails with this in the logs:

 
{code:java}
2018-10-09 19:42:52,788 [2442fb44-f73b-1344-0359-125fcc386645:foreman] WARN 
o.a.d.e.s.m.s.MongoSchemaFactory - Failure while loading databases in Mongo. 
Command failed with error 13: 'not authorized on admin to execute command { 
listDatabases: 1, $db: "admin" }' on server 
ds063555-a1.vvn97.fleet.mlab.com:63555. The full response is { "operationTime" 
: { "$timestamp" :
{ "t" : 1539114172, "i" : 230 }
}, "ok" : 0.0, "errmsg" : "not authorized on admin to execute command { 
listDatabases: 1, $db: \"admin\" }", "code" : 13, "codeName" : "Unauthorized", 
"$clusterTime" : { "clusterTime" : { "$timestamp" :
{ "t" : 1539114172, "i" : 230 }
}, "signature" : { "hash" :
{ "$binary" : "r3SzOhbP8Zgu9BSyWvGPBlPmrt8=", "$type" : "0" }
, "keyId" : { "$numberLong" : "6608592120234115184" } } } }
 
{code}
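One way Drill could degrade gracefully here, sketched under stated assumptions: if listDatabases fails with error 13 ("Unauthorized", as in the log above), fall back to the single database named in the connection URL. CommandFailedException is a stand-in for the MongoDB driver's MongoCommandException, and databasesToExpose is a hypothetical helper, not code from MongoSchemaFactory:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical fallback for users without the listDatabases privilege.
// Not the code in Drill's MongoSchemaFactory.
class MongoDatabaseListingSketch {
  static final int UNAUTHORIZED = 13; // error code shown in the log above

  // Stand-in for the driver's command exception, carrying the server error code.
  static final class CommandFailedException extends RuntimeException {
    final int code;

    CommandFailedException(int code, String message) {
      super(message);
      this.code = code;
    }
  }

  // Try listDatabases; if it is not authorized, expose only the database named in the URL.
  static List<String> databasesToExpose(Supplier<List<String>> listDatabases, String databaseFromUrl) {
    try {
      return listDatabases.get();
    } catch (CommandFailedException e) {
      if (e.code == UNAUTHORIZED && databaseFromUrl != null) {
        return Collections.singletonList(databaseFromUrl);
      }
      throw e; // other failures, or no database in the URL: surface the error
    }
  }

  public static void main(String[] args) {
    Supplier<List<String>> denied = () -> {
      throw new CommandFailedException(UNAUTHORIZED, "not authorized on admin to execute command");
    };
    System.out.println(databasesToExpose(denied, "formative")); // [formative]
    System.out.println(databasesToExpose(() -> Arrays.asList("a", "b"), null)); // [a, b]
  }
}
```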
 

If I don't specify the database name on the connection string, I get:

 
{code:java}
2018-10-09 19:22:26,144 [2443000d-43b5-7cf3-e914-c27a7349fbf7:foreman] INFO 
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
2443000d-43b5-7cf3-e914-c27a7349fbf7: SHOW DATABASES 2018-10-09 19:22:56,147 
[2443000d-43b5-7cf3-e914-c27a7349fbf7:foreman] WARN 
o.a.d.e.s.m.s.MongoSchemaFactory - Failure while loading databases in Mongo. 
Timed out after 3 ms while waiting for a server that matches 
ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster 
state is {type=REPLICA_SET, 
servers=[{address=ds063555-a0.vvn97.fleet.mlab.com:63555, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSecurityException: Exception 
authenticating MongoCredential{mechanism=null, userName='dobes', 
source='admin', password=, mechanismProperties={}}}, caused by 
{com.mongodb.MongoCommandException: Command failed with error 18: 
'Authentication failed.' on server ds063555-a0.vvn97.fleet.mlab.com:63555. The 
full response is { "ok" : 0.0, "errmsg" : "Authentication failed.", "code" : 
18, "codeName" : "AuthenticationFailed", "operationTime" : { "$timestamp" :
{ "t" : 1539112946, "i" : 166 }
}, "$clusterTime" : { "clusterTime" : { "$timestamp" :
{ "t" : 1539112946, "i" : 166 }
}, "signature" : { "hash" :
{ "$binary" : "fSW8oqdPrR41ffVTL/Lv9/uZz6M=", "$type" : "0" }
, "keyId" : { "$numberLong" : "6608592120234115184" } } } }}}, 
{address=ds063555-a1.vvn97.fleet.mlab.com:63555, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSecurityException: Exception 
authenticating MongoCredential{mechanism=null, userName='dobes', 
source='admin', password=, mechanismProperties={}}}, caused by 
{com.mongodb.MongoCommandException: Command failed with error 18: 
'Authentication failed.' on server ds063555-a1.vvn97.fleet.mlab.com:63555. The 
full response is { "ok" : 0.0, "errmsg" : "Authentication failed.", "code" : 
18, "codeName" : "AuthenticationFailed", "operationTime" : { "$timestamp" :
{ "t" : 1539112946, "i" : 166 }
}, "$clusterTime" : { "clusterTime" : { "$timestamp" :
{ "t" : 1539112946, "i" : 166 }
}, "signature" : { "hash" :
{ "$binary" : "fSW8oqdPrR41ffVTL/Lv9/uZz6M=", "$type" : "0" }
, "keyId" : { "$numberLong" : "6608592120234115184" } } } }}}] 2018-10-09 
19:23:26,149 [2443000d-43b5-7cf3-e914-c27a7349fbf7:foreman] WARN 
o.a.d.e.s.m.s.MongoSchemaFactory - Failure while getting collection names from 
'formative'. Timed out after 3 ms while waiting for a server that matches 
ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster 
state is {type=REPLICA_SET, 
servers=[{address=ds063555-a0.vvn97.fleet.mlab.com:63555, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSecurityException: Exception 
authenticating MongoCredential{mechanism=null, userName='dobes', 
source='admin', password=, mechanismProperties={}}}, caused by 
{com.mongodb.MongoCommandException: Command failed with error 18: 
'Authentication failed.' on server ds063555-a0.vvn97.fleet.mlab.com:63555. The 
full response is { "ok" : 0.0, "errmsg" : "Authentication failed.", "code" : 
18, "codeName" : "AuthenticationFailed", "operationTime" : { "$timestamp" :
{ "t" : 1539112946, "i" : 166 }
}, "$clusterTime" : { "clusterTime" : { "$timestamp" :
{ "t" : 1539112946, "i" : 166 }
}, "signature" : { "hash" :
{ "$binary" : "fSW8oqdPrR41ffVTL/Lv9/uZz6M=", "$type" : "0" }
, "keyId" : { "$numberLong" : "6608592120234115184" } } } }}}, 
{address=ds063555-a1.vvn97.fleet.mlab.com:63555, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSecurityException: Exception 
authenticating


[jira] [Created] (DRILL-6786) Work with servers where you do not have full privileges

2018-10-09 Dobes Vandermeer (JIRA)

Dobes Vandermeer created DRILL-6786:
---

 Summary: Work with servers where you do not have full privileges
 Key: DRILL-6786
 URL: https://issues.apache.org/jira/browse/DRILL-6786
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - MongoDB
Affects Versions: 1.14.0
Reporter: Dobes Vandermeer


We have a mongo database hosted with mLab.  When trying to connect to this 
database, a couple of issues show up due to the lack of admin privileges on 
that database.

Assuming I specify the full connection URL including the name of a database, 
the connection fails with this in the logs:

2018-10-09 19:42:52,788 [2442fb44-f73b-1344-0359-125fcc386645:foreman] WARN 
o.a.d.e.s.m.s.MongoSchemaFactory - Failure while loading databases in Mongo. 
Command failed with error 13: 'not authorized on admin to execute command { 
listDatabases: 1, $db: "admin" }' on server 
ds063555-a1.vvn97.fleet.mlab.com:63555. The full response is { "operationTime" 
: { "$timestamp" : { "t" : 1539114172, "i" : 230 } }, "ok" : 0.0, "errmsg" : 
"not authorized on admin to execute command { listDatabases: 1, $db: \"admin\" 
}", "code" : 13, "codeName" : "Unauthorized", "$clusterTime" : { "clusterTime" 
: { "$timestamp" : { "t" : 1539114172, "i" : 230 } }, "signature" : { "hash" : 
{ "$binary" : "r3SzOhbP8Zgu9BSyWvGPBlPmrt8=", "$type" : "0" }, "keyId" : { 
"$numberLong" : "6608592120234115184" } } } }

 

If I don't specify the database name on the connection string, I get:

2018-10-09 19:22:26,144 [2443000d-43b5-7cf3-e914-c27a7349fbf7:foreman] INFO 
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
2443000d-43b5-7cf3-e914-c27a7349fbf7: SHOW DATABASES 2018-10-09 19:22:56,147 
[2443000d-43b5-7cf3-e914-c27a7349fbf7:foreman] WARN 
o.a.d.e.s.m.s.MongoSchemaFactory - Failure while loading databases in Mongo. 
Timed out after 3 ms while waiting for a server that matches 
ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster 
state is {type=REPLICA_SET, 
servers=[{address=ds063555-a0.vvn97.fleet.mlab.com:63555, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSecurityException: Exception 
authenticating MongoCredential{mechanism=null, userName='dobes', 
source='admin', password=, mechanismProperties={}}}, caused by 
{com.mongodb.MongoCommandException: Command failed with error 18: 
'Authentication failed.' on server ds063555-a0.vvn97.fleet.mlab.com:63555. The 
full response is { "ok" : 0.0, "errmsg" : "Authentication failed.", "code" : 
18, "codeName" : "AuthenticationFailed", "operationTime" : { "$timestamp" : { 
"t" : 1539112946, "i" : 166 } }, "$clusterTime" : { "clusterTime" : { 
"$timestamp" : { "t" : 1539112946, "i" : 166 } }, "signature" : { "hash" : { 
"$binary" : "fSW8oqdPrR41ffVTL/Lv9/uZz6M=", "$type" : "0" }, "keyId" : { 
"$numberLong" : "6608592120234115184" } } } }}}, 
{address=ds063555-a1.vvn97.fleet.mlab.com:63555, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSecurityException: Exception 
authenticating MongoCredential{mechanism=null, userName='dobes', 
source='admin', password=, mechanismProperties={}}}, caused by 
{com.mongodb.MongoCommandException: Command failed with error 18: 
'Authentication failed.' on server ds063555-a1.vvn97.fleet.mlab.com:63555. The 
full response is { "ok" : 0.0, "errmsg" : "Authentication failed.", "code" : 
18, "codeName" : "AuthenticationFailed", "operationTime" : { "$timestamp" : { 
"t" : 1539112946, "i" : 166 } }, "$clusterTime" : { "clusterTime" : { 
"$timestamp" : { "t" : 1539112946, "i" : 166 } }, "signature" : { "hash" : { 
"$binary" : "fSW8oqdPrR41ffVTL/Lv9/uZz6M=", "$type" : "0" }, "keyId" : { 
"$numberLong" : "6608592120234115184" } } } }}}] 2018-10-09 19:23:26,149 
[2443000d-43b5-7cf3-e914-c27a7349fbf7:foreman] WARN 
o.a.d.e.s.m.s.MongoSchemaFactory - Failure while getting collection names from 
'formative'. Timed out after 3 ms while waiting for a server that matches 
ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster 
state is {type=REPLICA_SET, 
servers=[{address=ds063555-a0.vvn97.fleet.mlab.com:63555, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSecurityException: Exception 
authenticating MongoCredential{mechanism=null, userName='dobes', 
source='admin', password=, mechanismProperties={}}}, caused by 
{com.mongodb.MongoCommandException: Command failed with error 18: 
'Authentication failed.' on server ds063555-a0.vvn97.fleet.mlab.com:63555. The 
full response is { "ok" : 0.0, "errmsg" : "Authentication failed.", "code" : 
18, "codeName" : "AuthenticationFailed", "operationTime" : { "$timestamp" : { 
"t" : 1539112946, "i" : 166 } }, "$clusterTime" : { "clusterTime" : { 
"$timestamp" : { "t" : 1539112946, "i" : 166 } }, "signature" : { "hash" : { 
"$binary" : "fSW8oqdPrR41ffVTL/Lv9/uZz6M=", "$type" : "0" }, "keyId" : {

[jira] [Commented] (DRILL-3988) Create a sys.functions table to expose available Drill functions

2018-10-09 ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643976#comment-16643976
 ] 

ASF GitHub Bot commented on DRILL-3988:
---

kkhatua commented on issue #1483: DRILL-3988: Expose Drill built-in functions & 
UDFs  in a system table
URL: https://github.com/apache/drill/pull/1483#issuecomment-428318959
 
 
   @arina-ielchiieva  
   Rebased on top of master, which carries the DRILL-6762 fix (#1484), and 
removed the reference to DRILL-6084.




> Create a sys.functions table to expose available Drill functions
> 
>
> Key: DRILL-3988
> URL: https://issues.apache.org/jira/browse/DRILL-3988
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Jacques Nadeau
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> Create a new sys.functions table that returns a list of all available 
> functions.
> Key considerations: 
> - one row per name or one row per argument set. I'm inclined toward the latter so people 
> can use queries to get to the data.
> - we need to delineate user functions from internal 
> functions and show only user functions. 'CastInt' isn't something the user 
> should be able to see (or run).
> - should we add a description annotation whose text could be included in the 
> sys.functions table?
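As a hypothetical illustration of the first two considerations, the sketch below flattens a made-up registry into one row per argument set and hides functions flagged as internal, such as 'CastInt'. The Signature type, the my_udf name, and the row format are assumptions, not Drill's function registry API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a function registry; not Drill's API.
class FunctionsTableSketch {
  static final class Signature {
    final String name;
    final List<String> argTypes;
    final String returnType;
    final boolean internal; // true for functions users should not see, e.g. 'CastInt'

    Signature(String name, List<String> argTypes, String returnType, boolean internal) {
      this.name = name;
      this.argTypes = argTypes;
      this.returnType = returnType;
      this.internal = internal;
    }
  }

  // Emit one row per (name, argument list) pair, skipping internal functions.
  static List<String> rows(List<Signature> registry) {
    List<String> out = new ArrayList<>();
    for (Signature s : registry) {
      if (!s.internal) {
        out.add(s.name + "(" + String.join(", ", s.argTypes) + ") -> " + s.returnType);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Signature> registry = Arrays.asList(
        new Signature("my_udf", Arrays.asList("INT"), "INT", false),
        new Signature("my_udf", Arrays.asList("VARCHAR"), "VARCHAR", false),
        new Signature("CastInt", Arrays.asList("VARCHAR"), "INT", true));
    // Prints two rows for my_udf; CastInt is hidden.
    rows(registry).forEach(System.out::println);
  }
}
```

With one row per argument set, a user can filter on argument or return types in SQL; one row per name would need a nested or concatenated column for the same information.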




[jira] [Updated] (DRILL-5571) Unable to cancel running queries from Web UI

2018-10-09 Kunal Khatua (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-5571:

Labels: javascript newbie  (was: )

> Unable to cancel running queries from Web UI
> 
>
> Key: DRILL-5571
> URL: https://issues.apache.org/jira/browse/DRILL-5571
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Kedar Sankar Behera
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: javascript, newbie
> Fix For: 1.15.0
>
>
> We are unable to access the profiles of some running queries and hit the following 
> error on the Web UI:
> {code}
> {
>   "errorMessage" : "VALIDATION ERROR: No profile with given query id 
> '26c90b95-928b-15e3-bedc-bfb4a046cc8b' exists. Please verify the query 
> id.\n\n\n[Error Id: e6896a23-6932-469d-9968-d315fdd06dd4 ]"
> }
> {code}
> We also cannot cancel running queries whose profile pages can be accessed:
> {code}
> Failure attempting to cancel query 26c90b33-cf7e-0495-8f76-55220f71f809.  
> Unable to find information about where query is actively running.
> {code}




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643915#comment-16643915
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223672779
 
 

 ##
 File path: pom.xml
 ##
 @@ -53,8 +53,8 @@
 2.9.5
 2.9.5
 3.4.12
-5.2.1-mapr
-1.1
+6.0.1-mapr
 
 Review comment:
   Use version 6.1.0-mapr instead; it will be introduced once the following PR is merged:
   
https://github.com/apache/drill/pull/1489/files#diff-600376dffeb79835ede4a0b285078036R56








[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643913#comment-16643913
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223667043
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdSelectivity.java
 ##
 @@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.cost;
+
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMdSelectivity;
+import org.apache.calcite.rel.metadata.RelMdUtil;
+import org.apache.calcite.rel.metadata.RelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+
+import java.util.List;
+
+public class DrillRelMdSelectivity extends RelMdSelectivity {
+  private static final DrillRelMdSelectivity INSTANCE = new DrillRelMdSelectivity();
+
+  public static final RelMetadataProvider SOURCE = ReflectiveRelMetadataProvider.reflectiveSource(BuiltInMethod.SELECTIVITY.method, INSTANCE);
+
+
+  public Double getSelectivity(RelNode rel, RexNode predicate) {
 
 Review comment:
   Why can't the super methods be used instead? Is it necessary to improve them in Calcite's RelMdSelectivity.java?
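The question above is about overriding versus relying on the inherited behavior. A common middle ground is to override only the cases where the subclass has better information and delegate everything else to `super`. The sketch below shows that pattern with hypothetical stand-in classes rather than Calcite's `RelMdSelectivity` API; the fixed 0.25 default and the statistics map are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a generic metadata handler that only has a default guess.
class DefaultSelectivity {
  double getSelectivity(String table, String predicate) {
    return 0.25; // illustrative fixed guess when nothing is known about the data
  }
}

// Hypothetical subclass: uses collected statistics when present, otherwise defers to super.
class StatsAwareSelectivity extends DefaultSelectivity {
  private final Map<String, Double> knownSelectivity = new HashMap<>();

  void recordStatistic(String table, String predicate, double selectivity) {
    knownSelectivity.put(table + "|" + predicate, selectivity);
  }

  @Override
  double getSelectivity(String table, String predicate) {
    Double fromStats = knownSelectivity.get(table + "|" + predicate);
    // Only the statistics-backed case is overridden; everything else keeps the inherited behavior.
    return fromStats != null ? fromStats : super.getSelectivity(table, predicate);
  }
}
```

Keeping the override this narrow means that any improvement made to the parent implementation is picked up automatically for the cases the subclass does not handle.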



[jira] [Updated] (DRILL-6775) The schema for empty output is not shown in Drill Web UI

2018-10-09 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6775:

Labels: ready-to-commit  (was: )

> The schema for empty output is not shown in Drill Web UI
> 
>
> Key: DRILL-6775
> URL: https://issues.apache.org/jira/browse/DRILL-6775
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP, Web Server
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: Anton Gozhiy
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: image-2018-10-05-16-16-45-389.png
>
>
> The query in SqlLine:
> {code}
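> 0: jdbc:drill:zk=local> SELECT employee_id, full_name, first_name, last_name FROM cp.`employee.json` LIMIT 0;
> +-------------+-----------+------------+-----------+
> | employee_id | full_name | first_name | last_name |
> +-------------+-----------+------------+-----------+
> +-------------+-----------+------------+-----------+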
> No rows selected (0.118 seconds)
> {code}
> But the same query in the Drill Web UI shows nothing; see the attachment.
>  !image-2018-10-05-16-16-45-389.png!




[jira] [Updated] (DRILL-6775) The schema for empty output is not shown in Drill Web UI

2018-10-09 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6775:

Reviewer: Arina Ielchiieva


[jira] [Commented] (DRILL-6775) The schema for empty output is not shown in Drill Web UI

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643706#comment-16643706
 ] 

ASF GitHub Bot commented on DRILL-6775:
---

arina-ielchiieva commented on issue #1498: DRILL-6775: The schema for empty 
output is not shown in Drill Web UI
URL: https://github.com/apache/drill/pull/1498#issuecomment-428260765
 
 
   +1



[jira] [Commented] (DRILL-6775) The schema for empty output is not shown in Drill Web UI

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643703#comment-16643703
 ] 

ASF GitHub Bot commented on DRILL-6775:
---

agozhiy commented on issue #1498: DRILL-6775: The schema for empty output is 
not shown in Drill Web UI
URL: https://github.com/apache/drill/pull/1498#issuecomment-428260087
 
 
   Now it returns a table with no data:
   ![screenshot from 2018-10-09 15-38-06](https://user-images.githubusercontent.com/31588230/46683685-7c72e580-cbf9-11e8-8a4c-338f5d0321e4.png)
   



[jira] [Commented] (DRILL-6775) The schema for empty output is not shown in Drill Web UI

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643678#comment-16643678
 ] 

ASF GitHub Bot commented on DRILL-6775:
---

agozhiy opened a new pull request #1498: DRILL-6775: The schema for empty 
output is not shown in Drill Web UI
URL: https://github.com/apache/drill/pull/1498
 
 
   Removed an unnecessary check for an empty result that prevented the column names from being retrieved.
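The fix described above amounts to not gating schema retrieval on whether any rows came back. A minimal before/after sketch, assuming a hypothetical in-memory result holder (`QueryResult` and `ResultView` are illustrative names, not Drill's Web UI classes):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical result holder: the schema (column names) is known even when there are no rows.
final class QueryResult {
  final List<String> columns;
  final List<List<Object>> rows;

  QueryResult(List<String> columns, List<List<Object>> rows) {
    this.columns = columns;
    this.rows = rows;
  }
}

final class ResultView {
  // Buggy shape: the emptiness check also hides the column names, so LIMIT 0 renders nothing.
  static List<String> headersWithEmptinessCheck(QueryResult result) {
    if (result.rows.isEmpty()) {
      return Collections.emptyList();
    }
    return new ArrayList<>(result.columns);
  }

  // Fixed shape: always return the column names; an empty row list just yields an empty table body.
  static List<String> headers(QueryResult result) {
    return new ArrayList<>(result.columns);
  }
}
```

With the fixed shape, a `LIMIT 0` query still produces a header row, matching what SqlLine prints.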



[jira] [Commented] (DRILL-6410) Memory leak in Parquet Reader during cancellation

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643669#comment-16643669
 ] 

ASF GitHub Bot commented on DRILL-6410:
---

sachouche commented on a change in pull request #1497: DRILL-6410: Fixed memory 
leak in flat Parquet reader
URL: https://github.com/apache/drill/pull/1497#discussion_r223762506
 
 

 ##
 File path: exec/jdbc-all/pom.xml
 ##
 @@ -511,7 +511,7 @@
   This is likely due to you adding new dependencies to a 
java-exec and not updating the excludes in this module. This is important as it 
minimizes the size of the dependency of Drill application users.
 
   
-  3900
+  3950
 
 Review comment:
   @vvysotskyi,
   - I rebased several times and didn't see the fix; I saw it only after publishing the PR
   - Talked internally to the team and was advised to update the pom
   
   BTW - The current way of packaging the jdbc-all jar is unsustainable (unless we accept that this jar may keep growing in size indefinitely). The right fix is to divide the java-exec project into multiple projects: common, client, and server. This way, the jdbc-all artifact would only depend on common and client.




> Memory leak in Parquet Reader during cancellation
> -
>
> Key: DRILL-6410
> URL: https://issues.apache.org/jira/browse/DRILL-6410
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.15.0
>
>
> Occasionally, a memory leak is observed within the flat Parquet reader when 
> query cancellation is invoked.




[jira] [Commented] (DRILL-6770) Queries on MapR-DB JSON tables fail with UnsupportedOperationException: Getting number of rows for tablet not supported

2018-10-09 Thread Anton Gozhiy (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643489#comment-16643489
 ] 

Anton Gozhiy commented on DRILL-6770:
-

[~agirish], these are different failures.

> Queries on MapR-DB JSON tables fail with UnsupportedOperationException: 
> Getting number of rows for tablet not supported
> ---
>
> Key: DRILL-6770
> URL: https://issues.apache.org/jira/browse/DRILL-6770
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - MapRDB
>Affects Versions: 1.15.0
> Environment: MapR 6.1.0
> Drill 1.15.0
>Reporter: Abhishek Girish
>Assignee: Gautam Kumar Parai
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Create a simple MapR-DB JSON table
> {code}
> $ mapr dbshell 
> MapR-DB Shell
> maprdb root:> create /tmp/t1
> Table /tmp/t1 created.
> maprdb root:> insert /tmp/t1 --id '1' --v '{"a":1}'
> Document with id: "1" inserted.
> maprdb root:> find /tmp/t1
> {"_id":"1","a":1}
> 1 document(s) found.
> {code}
> Querying this from Drill fails:
> {code}
> > select * from mfs.`/tmp/t1`;
> Error: SYSTEM ERROR: UnsupportedOperationException: Getting number of rows 
> for tablet not supported
> {code}
> Stack Trace:
> {code}
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: Error while applying rule DrillTableRule, 
> args [rel#1400499:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[mfs, 
> /tmp/t1])]
> org.apache.drill.exec.work.foreman.Foreman.run():300
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
>   Caused By (java.lang.RuntimeException) Error while applying rule 
> DrillTableRule, args 
> [rel#1400499:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[mfs, /tmp/t1])]
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():236
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():648
> org.apache.calcite.tools.Programs$RuleSetProgram.run():339
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():425
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():365
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel():252
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():314
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():179
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
> org.apache.drill.exec.work.foreman.Foreman.runSQL():584
> org.apache.drill.exec.work.foreman.Foreman.run():272
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
>   Caused By (org.apache.drill.common.exceptions.DrillRuntimeException) Error 
> getting region info for table: maprfs:///tmp/t1
> org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan.init():161
> org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan.<init>():83
> org.apache.drill.exec.store.mapr.db.MapRDBFormatPlugin.getGroupScan():81
> org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan():170
> org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan():117
> org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan():112
> org.apache.drill.exec.planner.logical.DrillTable.getGroupScan():99
> org.apache.drill.exec.planner.logical.DrillScanRel.<init>():90
> org.apache.drill.exec.planner.logical.DrillScanRel.<init>():70
> org.apache.drill.exec.planner.logical.DrillScanRel.<init>():63
> org.apache.drill.exec.planner.logical.DrillScanRule.onMatch():38
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():212
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():648
> org.apache.calcite.tools.Programs$RuleSetProgram.run():339
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():425
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():365
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel():252
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():314
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():179
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
>

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643338#comment-16643338
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223675289
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 ##
 @@ -114,6 +114,28 @@
   public static final String UNIONALL_DISTRIBUTE_KEY = "planner.enable_unionall_distribute";
   public static final BooleanValidator UNIONALL_DISTRIBUTE = new BooleanValidator(UNIONALL_DISTRIBUTE_KEY, null);
 
+  // --- Index planning related options BEGIN --
+  public static final String USE_SIMPLE_OPTIMIZER_KEY = "planner.use_simple_optimizer";
+  public static final BooleanValidator USE_SIMPLE_OPTIMIZER = new BooleanValidator(USE_SIMPLE_OPTIMIZER_KEY, null);
+  public static final BooleanValidator INDEX_PLANNING = new BooleanValidator("planner.enable_index_planning", null);
+  public static final BooleanValidator ENABLE_STATS = new BooleanValidator("planner.enable_statistics", null);
 
 Review comment:
   Is it used for obtaining MapR-DB and Parquet table statistics?
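Each option in the diff pairs a string key (for example `planner.enable_index_planning`) with a validator, and planner code reads the effective value at planning time. A minimal sketch of that key-plus-default lookup, assuming a hypothetical registry where session overrides win over defaults; `BoolOption` and `SimpleOptions` are illustrative names and do not reproduce Drill's `OptionManager` or `BooleanValidator` API, and the default value used in the usage check is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical boolean option: a key plus a default value.
final class BoolOption {
  final String key;
  final boolean defaultValue;

  BoolOption(String key, boolean defaultValue) {
    this.key = key;
    this.defaultValue = defaultValue;
  }
}

// Hypothetical option lookup: explicitly set values win over the option's default.
final class SimpleOptions {
  private final Map<String, Boolean> overrides = new HashMap<>();

  void set(BoolOption option, boolean value) {
    overrides.put(option.key, value);
  }

  boolean get(BoolOption option) {
    return overrides.getOrDefault(option.key, option.defaultValue);
  }
}
```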



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643334#comment-16643334
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223670746
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexPlanUtils.java
 ##
 @@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.index;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.DrillProjectRelBase;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+import org.apache.drill.exec.planner.fragment.DistributionAffinity;
+import org.apache.drill.exec.planner.logical.DrillOptiq;
+import org.apache.drill.exec.planner.logical.DrillParseContext;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.common.OrderedRel;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+
+public class IndexPlanUtils {
+
+  public enum ConditionIndexed {
+    NONE,
+    PARTIAL,
+    FULL}
+
+  /**
+   * Check if any of the fields of the index are present in a list of LogicalExpressions supplied
+   * as part of IndexableExprMarker
+   * @param exprMarker, the marker that has analyzed original index condition on top of original scan
+   * @param indexDesc
+   * @return ConditionIndexed.FULL, PARTIAL or NONE depending on whether all, some or no columns
+   * of the indexDesc are present in the list of LogicalExpressions supplied as part of exprMarker
+   *
+   */
+  static public ConditionIndexed conditionIndexed(IndexableExprMarker exprMarker, IndexDescriptor indexDesc) {
+    Map mapRexExpr = exprMarker.getIndexableExpression();
+    List infoCols = Lists.newArrayList();
+    infoCols.addAll(mapRexExpr.values());
+    if (indexDesc.allColumnsIndexed(infoCols)) {
+      return ConditionIndexed.FULL;
+    } else if (indexDesc.someColumnsIndexed(infoCols)) {
+      return ConditionIndexed.PARTIAL;
+    } else {
+      return ConditionIndexed.NONE;
+    }
+  }
+
+  /**
+   * check if we want to apply index rules on this scan,
+   * if group scan is not instance of DbGroupScan, or this DbGroupScan instance does not support secondary index, or
+   *this scan is already an index scan or Restricted Scan, do not

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643321#comment-16643321
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223652706
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java
 ##
 @@ -517,6 +388,15 @@ private static FieldPath getFieldPathForProjection(SchemaPath column) {
     return new FieldPath(child);
   }
 
+  public static boolean includesIdField(Collection projected) {
+    return Iterables.tryFind(projected, new Predicate() {
 
 Review comment:
   replace with lambda
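The review asks for the anonymous `Predicate` class to become a lambda. Since the diff is cut off, the exact predicate body is not shown; the sketch below assumes it tests whether a projected path is the MapR-DB document key `_id`, and uses plain strings instead of `SchemaPath`. With Guava, the same lambda could be passed directly to `Iterables.tryFind(...)`; with only the JDK, `Stream.anyMatch` expresses the check. `ProjectionCheck` is a hypothetical name.

```java
import java.util.Collection;

final class ProjectionCheck {
  // Name of the document key field in MapR-DB JSON tables.
  static final String ID_FIELD = "_id";

  // Lambda-based replacement for an anonymous Predicate class:
  // true if any projected column is the id field.
  static boolean includesIdField(Collection<String> projected) {
    return projected.stream().anyMatch(path -> ID_FIELD.equals(path));
  }
}
```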



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643317#comment-16643317
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223654960
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/udf/mapr/db/DecodeFieldPath.java
 ##
 @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udf.mapr.db;
+
+import javax.inject.Inject;
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import io.netty.buffer.DrillBuf;
+
+@FunctionTemplate(name = "maprdb_decode_fieldpath", scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
+public class DecodeFieldPath implements DrillSimpleFunc {
+  @Param  VarCharHolder input;
+  @Output VarCharHolder   out;
+
+  @Inject DrillBuf buffer;
+
+  @Override
+  public void setup() {
+  }
+
+  @Override
+  public void eval() {
+    String[] encodedPaths = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.
+        toStringFromUTF8(input.start, input.end, input.buffer).split(",");
+    String[] decodedPaths = org.apache.drill.exec.util.EncodedSchemaPathSet.decode(encodedPaths);
+    java.util.Arrays.sort(decodedPaths);
+
+    StringBuilder sb = new StringBuilder();
+    for(String decodedPath : decodedPaths) {
 
 Review comment:
   space (missing after `for`)
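The visible part of `eval()` follows a split, decode, sort, join pattern. Because the loop body is cut off and `EncodedSchemaPathSet.decode` is not shown, this sketch takes the decoder as a parameter and assumes the sorted paths are re-joined with commas; both are assumptions, and `FieldPathList` is a hypothetical name. It also uses the `for (` spacing the reviewer asks for.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

final class FieldPathList {
  // Splits a comma-separated list, decodes each entry, sorts, and re-joins with commas.
  // The decoder is passed in because the real decoding logic is not shown in the diff.
  static String decodeAndSort(String encoded, UnaryOperator<String> decoder) {
    String[] paths = encoded.split(",");
    for (int i = 0; i < paths.length; i++) {
      paths[i] = decoder.apply(paths[i]);
    }
    Arrays.sort(paths);
    return String.join(",", paths);
  }
}
```

For instance, with `String::trim` standing in for the decoder, `decodeAndSort("c, a,b", String::trim)` yields `"a,b,c"`.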


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning  
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, Drill planner should be enhanced to provide an 
> abstraction layer which express the index metadata and statistics.  Further, 
> a cost-based index selection is needed to decide which index(es) are 
> suitable.  
> On the execution side, appropriate operator enhancements would be needed to 
> handle different categories of indexes such as covering, non-covering 
> indexes, taking into consideration the index data may not be co-located with 
> the primary table, i.e a global index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
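The DRILL-6381 description above asks for cost-based selection among covering indexes, non-covering indexes and a full table scan. The Java sketch below illustrates that trade-off with a deliberately simplified, hypothetical cost model. A covering index pays only for the matching index entries. A non-covering index also pays a per-row lookup back into the primary table; the `lookupFactor` parameter is an assumption. Non-covering candidates are skipped when their selectivity exceeds a threshold, in the spirit of the `planner.index.noncovering_selectivity_threshold` option that the PR's tests set. The class and method names are illustrative only and are not Drill's planner API.

```java
import java.util.List;

public class IndexChoiceSketch {
  // Hypothetical candidate description; not Drill's IndexDescriptor.
  static final class Candidate {
    final String name;
    final boolean covering;
    final double selectivity; // fraction of rows matching the predicate

    Candidate(String name, boolean covering, double selectivity) {
      this.name = name;
      this.covering = covering;
      this.selectivity = selectivity;
    }
  }

  // Toy cost model (assumption): number of rows touched.
  static double cost(Candidate c, long rows, double lookupFactor) {
    double matched = c.selectivity * rows;
    // A non-covering index must also fetch each matching row from the primary table.
    return c.covering ? matched : matched * (1.0 + lookupFactor);
  }

  /** Returns the chosen index name, or "FULL_TABLE_SCAN" if no index is cheaper. */
  static String choose(List<Candidate> candidates, long rows,
                       double lookupFactor, double nonCoveringThreshold) {
    String best = "FULL_TABLE_SCAN";
    double bestCost = rows; // a full scan touches every row
    for (Candidate c : candidates) {
      if (!c.covering && c.selectivity > nonCoveringThreshold) {
        continue; // not selective enough to justify lookups into the primary table
      }
      double candidateCost = cost(c, rows, lookupFactor);
      if (candidateCost < bestCost) {
        bestCost = candidateCost;
        best = c.name;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<Candidate> candidates = List.of(
        new Candidate("i_ssn", false, 0.0001),
        new Candidate("i_state_city", true, 0.05));
    System.out.println(choose(candidates, 10_000, 10.0, 0.025)); // prints i_ssn
  }
}
```

A real planner would derive selectivity from statistics and weigh I/O, CPU and network costs. The single-number model here only shows why a highly selective non-covering index can still beat a covering one.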

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643308#comment-16643308
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223367759
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelTrait;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.rules.ProjectRemoveRule;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil.ProjectPushInfo;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.List;
+
+public abstract class MapRDBPushProjectIntoScan extends StoragePluginOptimizerRule {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapRDBPushProjectIntoScan.class);
+
+  private MapRDBPushProjectIntoScan(RelOptRuleOperand operand, String description) {
+super(operand, description);
+  }
+
+  public static final StoragePluginOptimizerRule PROJECT_ON_SCAN = new MapRDBPushProjectIntoScan(
+  RelOptHelper.some(ProjectPrel.class, RelOptHelper.any(ScanPrel.class)), "MapRDBPushProjIntoScan:Proj_On_Scan") {
+@Override
+public void onMatch(RelOptRuleCall call) {
+  final ScanPrel scan = (ScanPrel) call.rel(1);
+  final ProjectPrel project = (ProjectPrel) call.rel(0);
+  if (!(scan.getGroupScan() instanceof MapRDBGroupScan)) {
+return;
+  }
+  doPushProjectIntoGroupScan(call, project, scan, (MapRDBGroupScan) scan.getGroupScan());
+  if (scan.getGroupScan() instanceof BinaryTableGroupScan) {
+BinaryTableGroupScan groupScan = (BinaryTableGroupScan) scan.getGroupScan();
+
+  } else {
+assert (scan.getGroupScan() instanceof JsonTableGroupScan);
+JsonTableGroupScan groupScan = (JsonTableGroupScan) scan.getGroupScan();
+
+doPushProjectIntoGroupScan(call, project, scan, groupScan);
+  }
+}
+
+@Override
+public boolean matches(RelOptRuleCall call) {
+  final ScanPrel scan = (ScanPrel) call.rel(1);
+  if (scan.getGroupScan() instanceof BinaryTableGroupScan ||
+  scan.getGroupScan() instanceof JsonTableGroupScan) {
+return super.matches(call);
+  }
+  return false;
+}
+  };
+
+  protected void doPushProjectIntoGroupScan(RelOptRuleCall call,
+  ProjectPrel project, ScanPrel scan, MapRDBGroupScan groupScan) {
+try {
+
+  DrillRelOptUtil.ProjectPushInfo columnInfo =
+  DrillRelOptUtil.getFieldsInformation(scan.getRowType(), project.getProjects());
+  if (columnInfo == null || Utilities.isStarQuery(columnInfo.getFields()) //
+  || !groupScan.canPushdownProjects(columnInfo.getFields())) {
+return;
+  }
+  RelTraitSet newTraits = call.getPlanner().emptyTraitSet();
+  // Clear out collation trait
+  for (RelTrait trait : scan.getTraitSet()) {
+

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643307#comment-16643307
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223651821
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableRangePartitionFunction.java
 ##
 @@ -0,0 +1,237 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db.json;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.exec.planner.physical.AbstractRangePartitionFunction;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.store.mapr.db.MapRDBFormatPlugin;
+import org.apache.drill.exec.vector.ValueVector;
+import org.ojai.store.QueryCondition;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import com.mapr.db.Table;
+import com.mapr.db.impl.ConditionImpl;
+import com.mapr.db.impl.IdCodec;
+import com.mapr.db.impl.ConditionNode.RowkeyRange;
+import com.mapr.db.scan.ScanRange;
+import com.mapr.fs.jni.MapRConstants;
+import com.mapr.org.apache.hadoop.hbase.util.Bytes;
+
+@JsonTypeName("jsontable-range-partition-function")
+public class JsonTableRangePartitionFunction extends AbstractRangePartitionFunction {
+
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(JsonTableRangePartitionFunction.class);
+
+  @JsonProperty("refList")
+  protected List refList;
+
+  @JsonProperty("tableName")
+  protected String tableName;
+
+  @JsonIgnore
+  protected String userName;
+
+  @JsonIgnore
+  protected ValueVector partitionKeyVector = null;
+
+  // List of start keys of the scan ranges for the table.
+  @JsonProperty
+  protected List startKeys = null;
+
+  // List of stop keys of the scan ranges for the table.
+  @JsonProperty
+  protected List stopKeys = null;
+
+  @JsonCreator
+  public JsonTableRangePartitionFunction(
+  @JsonProperty("refList") List refList,
+  @JsonProperty("tableName") String tableName,
+  @JsonProperty("startKeys") List startKeys,
+  @JsonProperty("stopKeys") List stopKeys) {
+this.refList = refList;
+this.tableName = tableName;
+this.startKeys = startKeys;
+this.stopKeys = stopKeys;
+  }
+
+  public JsonTableRangePartitionFunction(List refList,
+  String tableName, String userName, MapRDBFormatPlugin formatPlugin) {
+this.refList = refList;
+this.tableName = tableName;
+this.userName = userName;
+initialize(formatPlugin);
+  }
+
+  @JsonProperty("refList")
+  @Override
+  public List getPartitionRefList() {
+return refList;
+  }
+
+  @Override
+  public void setup(List> partitionKeys) {
+if (partitionKeys.size() != 1) {
+  throw new UnsupportedOperationException(
+  "Range partitioning function supports exactly one partition column; encountered " + partitionKeys.size());
+}
+
+VectorWrapper v = partitionKeys.get(0);
+
+partitionKeyVector = v.getValueVector();
+
+Preconditions.checkArgument(partitionKeyVector != null, "Found null partitionKeyVector.");
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+if (this == obj) {
+  return true;
+}
+if (obj instanceof JsonTableRangePartitionFunction) {
+  JsonTableRangePartitionFunction rpf = (JsonTableRangePartitionFunction) obj;
+  List thisPartRefList = this.getPartitionRefList();
+  List otherPartRefList = rpf.getPartitionRefList();
+  if (thisPartRefList.size() != otherPartRefList.size()) {
+return false;
+  }
+  for (int refIdx=0; refIdx
> Add capability to do index based planning and execution
> ---
>
>
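The `JsonTableRangePartitionFunction` hunk above stores the start and stop keys of the table's scan ranges. The core idea of range partitioning is to route each row to the range whose `[startKey, stopKey)` interval contains its row key. The following is a minimal Java sketch of that lookup, not the class's actual implementation, which the hunk does not show. It assumes the ranges are sorted and non-overlapping, that keys compare as unsigned byte strings, and that an empty stop key means "unbounded". `partitionFor` and `k` are hypothetical helpers.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class RangeLookupSketch {
  // Unsigned lexicographic comparison, the usual ordering for row keys.
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  /** Returns the index i such that startKeys[i] <= key < stopKeys[i], or -1 if none. */
  static int partitionFor(byte[] key, List<byte[]> startKeys, List<byte[]> stopKeys) {
    int lo = 0;
    int hi = startKeys.size() - 1;
    int found = -1;
    // Binary search for the last range whose start key is <= key.
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (compare(startKeys.get(mid), key) <= 0) {
        found = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    if (found < 0) {
      return -1;
    }
    byte[] stop = stopKeys.get(found);
    // Assumption: an empty stop key marks the last, unbounded range.
    return (stop.length == 0 || compare(key, stop) < 0) ? found : -1;
  }

  static byte[] k(String s) {
    return s.getBytes(StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    List<byte[]> starts = List.of(k(""), k("g"), k("p"));
    List<byte[]> stops = List.of(k("g"), k("p"), k(""));
    System.out.println(partitionFor(k("apple"), starts, stops)); // 0
    System.out.println(partitionFor(k("grape"), starts, stops)); // 1
    System.out.println(partitionFor(k("zebra"), starts, stops)); // 2
  }
}
```

Binary search keeps the per-row cost at O(log n) in the number of ranges. This matters because the function is evaluated for every incoming row.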

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643340#comment-16643340
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223668844
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexConditionInfo.java
 ##
 @@ -0,0 +1,250 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.logical.partition.RewriteCombineBinaryOperators;
+import org.apache.calcite.rel.RelNode;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class IndexConditionInfo {
+  public final RexNode indexCondition;
+  public final RexNode remainderCondition;
+  public final boolean hasIndexCol;
+
+  public IndexConditionInfo(RexNode indexCondition, RexNode remainderCondition, boolean hasIndexCol) {
+this.indexCondition = indexCondition;
+this.remainderCondition = remainderCondition;
+this.hasIndexCol = hasIndexCol;
+  }
+
+  public static Builder newBuilder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan) {
+return new Builder(condition, indexes, builder, scan);
+  }
+
+  public static class Builder {
+final RexBuilder builder;
+final RelNode scan;
+final Iterable indexes;
+private RexNode condition;
+
+public Builder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan
+) {
+  this.condition = condition;
+  this.builder = builder;
+  this.scan = scan;
+  this.indexes = indexes;
+}
+
+public Builder(RexNode condition,
+   IndexDescriptor index,
+   RexBuilder builder,
+   DrillScanRel scan
+) {
+  this.condition = condition;
+  this.builder = builder;
+  this.scan = scan;
+  this.indexes = Lists.newArrayList(index);
+}
+
+/**
+ * Get a single IndexConditionInfo in which indexCondition has field on all indexes in this.indexes
+ * @return
+ */
+public IndexConditionInfo getCollectiveInfo(IndexLogicalPlanCallContext indexContext) {
+  Set paths = Sets.newLinkedHashSet();
+  for ( IndexDescriptor index : indexes ) {
+paths.addAll(index.getIndexColumns());
+//paths.addAll(index.getNonIndexColumns());
+  }
+  return indexConditionRelatedToFields(Lists.newArrayList(paths), condition);
+}
+
+/*
+ * A utility function to check whether the given index hint is valid.
+ */
+public boolean isValidIndexHint(IndexLogicalPlanCallContext indexContext) {
+  if (indexContext.indexHint.equals("")) { return false; }
+
+  for ( IndexDescriptor index: indexes ) {
 
 Review comment:
   formatting
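In the hunk above, `getCollectiveInfo` merges the index columns of every candidate index into a `Sets.newLinkedHashSet()` before it derives the index condition. A minimal plain-JDK sketch of that merge step follows. `Index` is a hypothetical stand-in for `IndexDescriptor`, and the column paths are plain strings rather than Drill schema paths. The loop uses conventional `for (Type x : xs)` spacing. A `LinkedHashSet` drops columns shared by several indexes while keeping first-seen order, so the resulting field list is deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CollectiveColumnsSketch {
  // Hypothetical stand-in for IndexDescriptor: only the indexed column paths matter here.
  static final class Index {
    final String name;
    final List<String> indexColumns;

    Index(String name, List<String> indexColumns) {
      this.name = name;
      this.indexColumns = indexColumns;
    }
  }

  /** Union of all index columns, without duplicates, in first-seen order. */
  static List<String> collectiveColumns(Iterable<Index> indexes) {
    Set<String> paths = new LinkedHashSet<>();
    for (Index index : indexes) {
      paths.addAll(index.indexColumns);
    }
    return new ArrayList<>(paths);
  }

  public static void main(String[] args) {
    // Index and column names are illustrative only.
    List<Index> indexes = List.of(
        new Index("i_state_city", List.of("address.state", "address.city")),
        new Index("i_state_age_phone", List.of("address.state", "personal.age", "contact.phone")));
    System.out.println(collectiveColumns(indexes));
    // prints [address.state, address.city, personal.age, contact.phone]
  }
}
```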




> Add capability to do index based planning and execution
> ---
>
>

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643364#comment-16643364
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223656111
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/IndexHintPlanTest.java
 ##
 @@ -0,0 +1,171 @@
+package com.mapr.drill.maprdb.tests.index;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import com.mapr.drill.maprdb.tests.json.BaseJsonTest;
+import com.mapr.tests.annotations.ClusterTest;
+import org.apache.drill.PlanTestBase;
+import org.junit.experimental.categories.Category;
+import org.junit.runners.MethodSorters;
+import org.junit.FixMethodOrder;
+import org.junit.Test;
+
+@FixMethodOrder(MethodSorters.NAME_ASCENDING)
+@Category(ClusterTest.class)
+public class IndexHintPlanTest extends IndexPlanTest {
+
+private static final String defaultHavingIndexPlan = "alter session reset `planner.enable_index_planning`";
+
+@Test
+// A simple testcase with index hint on a table which has only one index for a column t.id.ssn;
+// This should pick i_ssn index for the query
+public void testSimpleIndexHint() throws Exception {
+String hintquery = "SELECT  t.id.ssn as ssn FROM table(hbase.`index_test_primary`(type => 'maprdb', index => 'i_ssn')) as t " +
+" where t.id.ssn = '17423'";
+
+String query = "SELECT t.id.ssn as ssn FROM hbase.`index_test_primary` as t where t.id.ssn = '17423'";
+test(defaultHavingIndexPlan);
+PlanTestBase.testPlanMatchingPatterns(hintquery,
+new String[] {".*JsonTableGroupScan.*tableName=.*index_test_primary.*indexName=i_ssn"},
+new String[]{"RowKeyJoin"}
+);
+
+//default plan picked by optimizer.
+PlanTestBase.testPlanMatchingPatterns(query,
+new String[] {".*JsonTableGroupScan.*tableName=.*index_test_primary.*indexName=i_ssn"},
+new String[]{"RowKeyJoin"}
+);
+testBuilder()
+.sqlQuery(hintquery)
+.ordered()
+.baselineColumns("ssn").baselineValues("17423")
+.go();
+
+}
+
+
+@Test
+// A testcase where there are multiple indexes to pick from, but only the index provided as hint is picked.
+// A valid index is provided as hint and it is useful during the index selection process, hence it will be selected.
+public void testHintCaseWithMultipleIndexes_1() throws Exception {
+
+String hintquery = "SELECT t.`address`.`state` AS `state` FROM table(hbase.`index_test_primary`(type => 'maprdb', index => 'i_state_city')) as t " +
+" where t.address.state = 'pc'";
+
+String query = "SELECT t.`address`.`state` AS `state` FROM hbase.`index_test_primary` as t where t.address.state = 'pc'";
+test(defaultHavingIndexPlan);
+PlanTestBase.testPlanMatchingPatterns(hintquery,
+new String[] {".*JsonTableGroupScan.*tableName=.*index_test_primary.*indexName=i_state_city"},
+new String[]{"RowKeyJoin"}
+);
+
+//default plan picked by optimizer
+PlanTestBase.testPlanMatchingPatterns(query,
+new String[] {".*JsonTableGroupScan.*tableName=.*index_test_primary.*indexName=(i_state_city|i_state_age_phone)"},
+new String[]{"RowKeyJoin"}
+);
+
+return;
+}
+
+@Test
+// A testcase where there are multiple indexes to pick from, but only the index provided as hint is picked.
+// A valid index is provided as hint and it is useful during the index selection process, hence it will be selected.
+// The difference between this testcase and the previous one is that the index name is switched. This shows that the index hint
+// makes sure to select only the one valid index specified as hint.
+public void testHintCaseWithMultipleIndexes_2() throws Exception {
+
+
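The hint tests above expect the planner to use only the hinted index when the hint names a valid index. The `isValidIndexHint` method in `IndexConditionInfo` rejects an empty hint outright. Below is a minimal sketch of that narrowing step. The hunks do not confirm two of its behaviors, so both are assumptions: a hint counts as valid when it matches the name of a candidate index, and an invalid hint is simply ignored. The class, the method names and the use of plain strings for indexes are hypothetical.

```java
import java.util.List;

public class IndexHintSketch {
  /** A hint is treated as valid only if it is non-empty and names a candidate index (assumption). */
  static boolean isValidIndexHint(String hint, List<String> indexNames) {
    if (hint.isEmpty()) {
      return false;
    }
    for (String name : indexNames) {
      if (name.equals(hint)) {
        return true;
      }
    }
    return false;
  }

  /** Narrows the candidates to the hinted index; an invalid hint leaves all candidates (assumption). */
  static List<String> candidatesFor(String hint, List<String> indexNames) {
    return isValidIndexHint(hint, indexNames) ? List.of(hint) : indexNames;
  }

  public static void main(String[] args) {
    List<String> indexes = List.of("i_ssn", "i_state_city", "i_state_age_phone");
    System.out.println(candidatesFor("i_state_city", indexes)); // [i_state_city]
    System.out.println(candidatesFor("", indexes));             // all three indexes
    System.out.println(candidatesFor("i_missing", indexes));    // all three indexes
  }
}
```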

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643318#comment-16643318
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223369746
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
 ##
 @@ -0,0 +1,145 @@
+public abstract class MapRDBPushProjectIntoScan extends StoragePluginOptimizerRule {
 
 Review comment:
   Not sure why we need a separate rule for pushing Project into Scan for `MapRDBGroupScan`.
   Why is `DrillPushProjectIntoScanRule` not suitable?
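For context on the question above, consider `doPushProjectIntoGroupScan` in the earlier `MapRDBPushProjectIntoScan` hunk. It collects the fields referenced by the project and gives up in three cases: the field information is missing, the query is a star query, or the group scan reports that it cannot push down those projects. The Java sketch below isolates that gating decision using plain strings. `columnsToPush` and the `canPushdown` predicate are hypothetical stand-ins for `DrillRelOptUtil.getFieldsInformation`, `Utilities.isStarQuery` and `canPushdownProjects`, and do not reproduce their real signatures.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class ProjectPushdownSketch {
  /**
   * Returns the columns to push into the scan, or null when the project
   * should stay above the scan (mirrors the early returns in the rule).
   */
  static Set<String> columnsToPush(List<String> projectedColumns,
                                   Predicate<Set<String>> canPushdown) {
    if (projectedColumns == null) {
      return null; // no field information available
    }
    Set<String> fields = new LinkedHashSet<>(projectedColumns);
    if (fields.contains("*") || !canPushdown.test(fields)) {
      return null; // star query, or the scan cannot handle these fields
    }
    return fields;
  }

  public static void main(String[] args) {
    Predicate<Set<String>> anything = fields -> true;
    System.out.println(columnsToPush(List.of("id.ssn", "address.state"), anything)); // [id.ssn, address.state]
    System.out.println(columnsToPush(List.of("*"), anything));                       // null
  }
}
```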



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643355#comment-16643355
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223659073
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/IndexPlanTest.java
 ##
 @@ -0,0 +1,1715 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import com.mapr.db.Admin;
+import com.mapr.drill.maprdb.tests.MaprDBTestsSuite;
+import com.mapr.drill.maprdb.tests.json.BaseJsonTest;
+import com.mapr.tests.annotations.ClusterTest;
+import org.apache.drill.PlanTestBase;
+import org.joda.time.DateTime;
+import org.joda.time.format.DateTimeFormat;
+import org.apache.drill.common.config.DrillConfig;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.FixMethodOrder;
+import org.junit.Ignore;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runners.MethodSorters;
+import java.util.Properties;
+
+
+@FixMethodOrder(MethodSorters.NAME_ASCENDING)
+@Category(ClusterTest.class)
+public class IndexPlanTest extends BaseJsonTest {
+
+  final static String PRIMARY_TABLE_NAME = "/tmp/index_test_primary";
+
+  final static int PRIMARY_TABLE_SIZE = 1;
+  private static final String sliceTargetSmall = "alter session set `planner.slice_target` = 1";
+  private static final String sliceTargetDefault = "alter session reset `planner.slice_target`";
+  private static final String noIndexPlan = "alter session set `planner.enable_index_planning` = false";
+  private static final String defaultHavingIndexPlan = "alter session reset `planner.enable_index_planning`";
+  private static final String disableHashAgg = "alter session set `planner.enable_hashagg` = false";
+  private static final String enableHashAgg =  "alter session set `planner.enable_hashagg` = true";
+  private static final String defaultnonCoveringSelectivityThreshold = "alter session set `planner.index.noncovering_selectivity_threshold` = 0.025";
+  private static final String incrnonCoveringSelectivityThreshold = "alter session set `planner.index.noncovering_selectivity_threshold` = 0.25";
+  private static final String disableFTS = "alter session set `planner.disable_full_table_scan` = true";
+  private static final String enableFTS = "alter session reset `planner.disable_full_table_scan`";
+  private static final String preferIntersectPlans = "alter session set `planner.index.prefer_intersect_plans` = true";
+  private static final String defaultIntersectPlans = "alter session reset `planner.index.prefer_intersect_plans`";
+  private static final String lowRowKeyJoinBackIOFactor
+  = "alter session set `planner.index.rowkeyjoin_cost_factor` = 0.01";
+  private static final String defaultRowKeyJoinBackIOFactor
+  = "alter session reset `planner.index.rowkeyjoin_cost_factor`";
+
+  /**
+   *  A sample row of this 10K table:
+   --+-++
+   | 1012  | {"city":"pfrrs","state":"pc"}  | {"email":"kffzkuz...@gmail.com","phone":"655471"}  |
+   {"ssn":"17423"}  | {"fname":"KfFzK","lname":"UZwNk"}  | {"age":53.0,"income":45.0}  | 1012   |
+   *
+   * This test suite generates random content to fill all the rows. Since the random function always starts from
+   * the same seed on every run, the data in the table will always be the same as long as the row count is not changed,
+   * so the query results can be predicted and verified.
+   */
+
+  @BeforeClass
+  public static void setupTableIndexes() throws Exception {
+
+Properties overrideProps = new Properties();
+overrideProps.setProperty("format-maprdb.json.useNumRegionsForDistribution", "true");
+updateTestCluster(1, DrillConfig.create(overrideProps));
+
+MaprDBTestsSuite.setupTests();
+MaprDBTestsSuite.createPluginAndGetConf(getDrillbitContext());
+
+
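The Javadoc in the `IndexPlanTest` hunk above relies on the data generator starting from the same seed on every run. As a result, the same row count always yields the same table contents, and the tests can assert exact query results. A minimal sketch of that property with `java.util.Random` follows. The seed, the row format and the `generateRows` helper are hypothetical and are not taken from the test suite.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededDataSketch {
  /** Generates rowCount five-digit values from a fixed seed (illustrative format). */
  static String[] generateRows(long seed, int rowCount) {
    Random random = new Random(seed); // same seed => same sequence on every run
    String[] rows = new String[rowCount];
    for (int i = 0; i < rowCount; i++) {
      rows[i] = String.valueOf(10000 + random.nextInt(90000)); // 10000..99999
    }
    return rows;
  }

  public static void main(String[] args) {
    String[] firstRun = generateRows(42L, 5);
    String[] secondRun = generateRows(42L, 5);
    System.out.println(Arrays.equals(firstRun, secondRun)); // true
  }
}
```

Determinism here depends on consuming random values in the same order on every run. Any change to the generation loop, not just the row count, can change the data the assertions expect.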

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643361#comment-16643361
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223657717
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/IndexPlanTest.java
 ##
 @@ -0,0 +1,1715 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import com.mapr.db.Admin;
+import com.mapr.drill.maprdb.tests.MaprDBTestsSuite;
+import com.mapr.drill.maprdb.tests.json.BaseJsonTest;
+import com.mapr.tests.annotations.ClusterTest;
+import org.apache.drill.PlanTestBase;
+import org.joda.time.DateTime;
+import org.joda.time.format.DateTimeFormat;
+import org.apache.drill.common.config.DrillConfig;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.FixMethodOrder;
+import org.junit.Ignore;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runners.MethodSorters;
+import java.util.Properties;
+
+
+@FixMethodOrder(MethodSorters.NAME_ASCENDING)
+@Category(ClusterTest.class)
+public class IndexPlanTest extends BaseJsonTest {
+
+  final static String PRIMARY_TABLE_NAME = "/tmp/index_test_primary";
+
+  final static int PRIMARY_TABLE_SIZE = 1;
+  private static final String sliceTargetSmall = "alter session set `planner.slice_target` = 1";
+  private static final String sliceTargetDefault = "alter session reset `planner.slice_target`";
+  private static final String noIndexPlan = "alter session set `planner.enable_index_planning` = false";
+  private static final String defaultHavingIndexPlan = "alter session reset `planner.enable_index_planning`";
+  private static final String disableHashAgg = "alter session set `planner.enable_hashagg` = false";
+  private static final String enableHashAgg = "alter session set `planner.enable_hashagg` = true";
+  private static final String defaultnonCoveringSelectivityThreshold = "alter session set `planner.index.noncovering_selectivity_threshold` = 0.025";
+  private static final String incrnonCoveringSelectivityThreshold = "alter session set `planner.index.noncovering_selectivity_threshold` = 0.25";
+  private static final String disableFTS = "alter session set `planner.disable_full_table_scan` = true";
+  private static final String enableFTS = "alter session reset `planner.disable_full_table_scan`";
+  private static final String preferIntersectPlans = "alter session set `planner.index.prefer_intersect_plans` = true";
+  private static final String defaultIntersectPlans = "alter session reset `planner.index.prefer_intersect_plans`";
+  private static final String lowRowKeyJoinBackIOFactor
+      = "alter session set `planner.index.rowkeyjoin_cost_factor` = 0.01";
+  private static final String defaultRowKeyJoinBackIOFactor
+      = "alter session reset `planner.index.rowkeyjoin_cost_factor`";
+
+  /**
+   * A sample row of this 10K table:
+   * | 1012 | {"city":"pfrrs","state":"pc"} | {"email":"kffzkuz...@gmail.com","phone":"655471"} |
+   *   {"ssn":"17423"} | {"fname":"KfFzK","lname":"UZwNk"} | {"age":53.0,"income":45.0} | 1012 |
+   *
+   * This test suite generates random content to fill all the rows. Since the random function always starts
+   * from the same seed, the data in the table stays the same across runs as long as the row count does not
+   * change, so the query results can be predicted and verified.
+   */
+
+  @BeforeClass
+  public static void setupTableIndexes() throws Exception {
+
+    Properties overrideProps = new Properties();
+    overrideProps.setProperty("format-maprdb.json.useNumRegionsForDistribution", "true");
+    updateTestCluster(1, DrillConfig.create(overrideProps));
+
+    MaprDBTestsSuite.setupTests();
+    MaprDBTestsSuite.createPluginAndGetConf(getDrillbitContext());
+
+
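The Javadoc in this test relies on seeded pseudo-randomness to get reproducible table contents. A minimal sketch of that property, assuming a `java.util.Random`-style generator; the test suite's actual generator and row layout are not shown in this excerpt, and `SeededRows` and `generate` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SeededRows {
  // Hypothetical generator: builds `count` rows of the form "<id>:<5 random lowercase letters>".
  // A fixed seed makes the sequence of random values, and therefore the rows, identical on every run.
  static List<String> generate(long seed, int count) {
    Random rnd = new Random(seed);
    List<String> rows = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      StringBuilder name = new StringBuilder();
      for (int j = 0; j < 5; j++) {
        name.append((char) ('a' + rnd.nextInt(26)));
      }
      rows.add(i + ":" + name);
    }
    return rows;
  }

  public static void main(String[] args) {
    List<String> firstRun = generate(42L, 1000);
    List<String> secondRun = generate(42L, 1000);
    // Same seed and same row count give identical data, so expected query results can be hard-coded.
    System.out.println(firstRun.equals(secondRun)); // true
  }
}
```

In this sketch rows are produced sequentially from one generator; whether the real test data behaves the same way when the row count changes is not stated in the source.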

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643359#comment-16643359
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223353853
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelTrait;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.rules.ProjectRemoveRule;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil.ProjectPushInfo;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.List;
+
+public abstract class MapRDBPushProjectIntoScan extends StoragePluginOptimizerRule {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapRDBPushProjectIntoScan.class);
+
+  private MapRDBPushProjectIntoScan(RelOptRuleOperand operand, String description) {
+    super(operand, description);
+  }
+
+  public static final StoragePluginOptimizerRule PROJECT_ON_SCAN = new MapRDBPushProjectIntoScan(
+      RelOptHelper.some(ProjectPrel.class, RelOptHelper.any(ScanPrel.class)), "MapRDBPushProjIntoScan:Proj_On_Scan") {
+    @Override
+    public void onMatch(RelOptRuleCall call) {
+      final ScanPrel scan = (ScanPrel) call.rel(1);
+      final ProjectPrel project = (ProjectPrel) call.rel(0);
+      if (!(scan.getGroupScan() instanceof MapRDBGroupScan)) {
+        return;
+      }
+      doPushProjectIntoGroupScan(call, project, scan, (MapRDBGroupScan) scan.getGroupScan());
+      if (scan.getGroupScan() instanceof BinaryTableGroupScan) {
 
 Review comment:
   Why do we need this whole `if` block at all?
   The `if` block contains no useful logic, and the `else` block executes `doPushProjectIntoGroupScan()` a second time.
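If the `if` branch is indeed a no-op, as the review comment says, the tail of `onMatch` could collapse to a type guard plus a single push. A self-contained sketch of that shape, using hypothetical stand-in classes rather than Drill's real `GroupScan` hierarchy (the body of the original `if` branch is not visible in this excerpt):

```java
public class PushProjectSketch {
  // Hypothetical stand-ins for Drill's group-scan classes; not the real API.
  static class GroupScan {}
  static class MapRDBGroupScan extends GroupScan {}
  static class BinaryTableGroupScan extends MapRDBGroupScan {}
  static class JsonTableGroupScan extends MapRDBGroupScan {}

  // Counts how many times the stand-in push-down runs.
  static int pushCount = 0;

  static void doPushProjectIntoGroupScan(MapRDBGroupScan groupScan) {
    pushCount++;
  }

  // Guard on the scan type, then push exactly once; no trailing if/else on the subtype.
  static void onMatch(GroupScan groupScan) {
    if (!(groupScan instanceof MapRDBGroupScan)) {
      return;
    }
    doPushProjectIntoGroupScan((MapRDBGroupScan) groupScan);
  }

  public static void main(String[] args) {
    onMatch(new BinaryTableGroupScan());
    onMatch(new JsonTableGroupScan());
    onMatch(new GroupScan()); // not a MapR-DB scan, so it is skipped
    System.out.println(pushCount); // 2
  }
}
```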


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, Drill planner should be enhanced to provide an 
> abstraction layer which

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643356#comment-16643356
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223680969
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java
 ##
 @@ -166,4 +169,25 @@ public void onMatch(RelOptRuleCall call) {
 return list;
   }
 
+  public static Project replace(Project topProject, Project bottomProject) {
+    final List newProjects =
+        RelOptUtil.pushPastProject(topProject.getProjects(), bottomProject);
+
+    // replace the two projects with a combined projection
+    if(topProject instanceof DrillProjectRel) {
 
 Review comment:
   space




> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, Drill planner should be enhanced to provide an
> abstraction layer which expresses the index metadata and statistics.  Further,
> a cost-based index selection is needed to decide which index(es) are
> suitable.
> On the execution side, appropriate operator enhancements would be needed to
> handle different categories of indexes such as covering and non-covering
> indexes, taking into consideration that the index data may not be co-located
> with the primary table, i.e. a global index.
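To make the covering vs. non-covering distinction concrete: a covering index stores every column a query references (indexed columns plus any included, non-indexed columns), so the query never has to read the primary table. A simplified sketch using plain column-name strings, loosely modeled on the `isCoveringIndex` and `getIndexColumnOrdinal` logic of the `DrillIndexDefinition` class reviewed in this thread (the real class works on `LogicalExpression` objects; `CoveringIndexSketch` is a hypothetical name):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CoveringIndexSketch {
  // Indexed (key) columns; order matters, so an index on {a, b} differs from one on {b, a}.
  final List<String> indexColumns;
  // Key columns plus included ("covering") non-key columns.
  final Set<String> allIndexColumns;

  CoveringIndexSketch(List<String> indexColumns, List<String> nonIndexColumns) {
    this.indexColumns = indexColumns;
    this.allIndexColumns = new HashSet<>(indexColumns);
    this.allIndexColumns.addAll(nonIndexColumns);
  }

  // Covering: every column the query needs is stored in the index, so the primary table is not read.
  boolean isCoveringIndex(List<String> queryColumns) {
    return allIndexColumns.containsAll(queryColumns);
  }

  // Position of a column within the index key, or -1 if it is not a key column.
  int getIndexColumnOrdinal(String column) {
    return indexColumns.indexOf(column);
  }

  public static void main(String[] args) {
    CoveringIndexSketch idx = new CoveringIndexSketch(
        Arrays.asList("lname", "fname"), Arrays.asList("age"));
    System.out.println(idx.isCoveringIndex(Arrays.asList("lname", "age")));    // true
    System.out.println(idx.isCoveringIndex(Arrays.asList("lname", "income"))); // false: needs the primary table
    System.out.println(idx.getIndexColumnOrdinal("fname"));                   // 1
  }
}
```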



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643354#comment-16643354
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223667409
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/DrillIndexDefinition.java
 ##
 @@ -0,0 +1,278 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelFieldCollation.NullDirection;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class DrillIndexDefinition implements IndexDefinition {
+  /**
+   * The indexColumns is the list of column(s) on which this index is created. If there is more than 1 column,
+   * the order of the columns is important: index on {a, b} is not the same as index on {b, a}
+   * NOTE: the indexed column could be of type columnfamily.column
+   */
+  @JsonProperty
+  protected final List indexColumns;
+
+  /**
+   * nonIndexColumns: the list of columns that are included in the index as 'covering'
+   * columns but are not themselves indexed.  These are useful for covering indexes where the
+   * query request can be satisfied directly by the index and avoid accessing the table altogether.
+   */
+  @JsonProperty
+  protected final List nonIndexColumns;
+
+  @JsonIgnore
+  protected final Set allIndexColumns;
+
+  @JsonProperty
+  protected final List rowKeyColumns;
+
+  @JsonProperty
+  protected final CollationContext indexCollationContext;
+
+  /**
+   * indexName: name of the index that should be unique within the scope of a table
+   */
+  @JsonProperty
+  protected final String indexName;
+
+  protected final String tableName;
+
+  @JsonProperty
+  protected final IndexDescriptor.IndexType indexType;
+
+  @JsonProperty
+  protected final NullDirection nullsDirection;
+
+  public DrillIndexDefinition(List indexCols,
+                              CollationContext indexCollationContext,
+                              List nonIndexCols,
+                              List rowKeyColumns,
+                              String indexName,
+                              String tableName,
+                              IndexType type,
+                              NullDirection nullsDirection) {
+    this.indexColumns = indexCols;
+    this.nonIndexColumns = nonIndexCols;
+    this.rowKeyColumns = rowKeyColumns;
+    this.indexName = indexName;
+    this.tableName = tableName;
+    this.indexType = type;
+    this.allIndexColumns = Sets.newHashSet(indexColumns);
+    this.allIndexColumns.addAll(nonIndexColumns);
+    this.indexCollationContext = indexCollationContext;
+    this.nullsDirection = nullsDirection;
+
+  }
+
+  @Override
+  public int getIndexColumnOrdinal(LogicalExpression path) {
+    int id = indexColumns.indexOf(path);
+    return id;
+  }
+
+  @Override
+  public boolean isCoveringIndex(List columns) {
+    return allIndexColumns.containsAll(columns);
+  }
+
+  @Override
+  public boolean allColumnsIndexed(Collection columns) {
+    return columnsInIndexFields(columns, indexColumns);
+  }
+
+  @Override
+  public boolean someColumnsIndexed(Collection columns) {
+    return someColumnsInIndexFields(columns, indexColumns);
+  }
+
+  public boolean pathExactIn(SchemaPath path, Collection exprs) {
+    for (LogicalExpression expr : exprs) {
+      if (expr instanceof SchemaPath) {
+        if (((SchemaPath)
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643349#comment-16643349
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223677718
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/SimpleRexRemap.java
 ##
 @@ -0,0 +1,300 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexCorrelVariable;
+import org.apache.calcite.rex.RexDynamicParam;
+import org.apache.calcite.rex.RexFieldAccess;
+
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexLocalRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexOver;
+import org.apache.calcite.rex.RexRangeRef;
+import org.apache.calcite.rex.RexShuttle;
+import org.apache.calcite.rex.RexVisitorImpl;
+
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.calcite.util.NlsString;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.PathSegment;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Rewrite RexNode with these policies:
+ * 1) field renamed. The input field was named differently in index table,
+ * 2) field is in different position of underlying rowtype
+ *
+ * TODO: 3) certain operator needs rewriting. e.g. CAST function
+ * This class for now applies to only filter on scan, for filter-on-project-on-scan. A stack of
+ * rowType is required.
+ */
+public class SimpleRexRemap {
+  final RelNode origRel;
+  final RelDataType origRowType;
+  final RelDataType newRowType;
+
+  private RexBuilder builder;
+  private Map destExprMap;
+
+  public SimpleRexRemap(RelNode origRel,
+                        RelDataType newRowType, RexBuilder builder) {
+    super();
+    this.origRel = origRel;
+    this.origRowType = origRel.getRowType();
+    this.newRowType = newRowType;
+    this.builder = builder;
+    this.destExprMap = Maps.newHashMap();
+  }
+
+  /**
+   * Set the map of src expression to target expression, expressions not in the map do not have assigned destinations
+   * @param exprMap
+   * @return
+   */
+  public SimpleRexRemap setExpressionMap(Map  exprMap) {
+    destExprMap.putAll(exprMap);
+    return this;
+  }
+
+  public RexNode rewriteEqualOnCharToLike(RexNode expr,
+                                          Map equalOnCastCharExprs) {
+    Map srcToReplace = Maps.newIdentityHashMap();
+    for(Map.Entry entry: equalOnCastCharExprs.entrySet()) {
+      RexNode equalOp = entry.getKey();
+      LogicalExpression opInput = entry.getValue();
+
+      final List operands = ((RexCall)equalOp).getOperands();
+      RexLiteral newLiteral = null;
+      RexNode input = null;
+      if(operands.size() == 2 ) {
+        RexLiteral oplit = null;
+        if (operands.get(0) instanceof RexLiteral) {
+          oplit = (RexLiteral) operands.get(0);
+          if(oplit.getTypeName() == SqlTypeName.CHAR) {
+            newLiteral = builder.makeLiteral(((NlsString) oplit.getValue()).getValue() + "%");
+            input = operands.get(1);
+          }
+        }
+        else if (operands.get(1) instanceof RexLiteral) {
+          oplit = (RexLiteral) operands.get(1);
+          if(oplit.getTypeName() == SqlTypeName.CHAR) {
 
 Review comment:
   space 
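The excerpt above takes a CHAR literal, appends `%` to its value, and pairs the new literal with the other operand, whichever side the literal is on. Judging by the method name `rewriteEqualOnCharToLike`, the result is presumably a `LIKE` prefix match; the rest of the method is cut off here. A toy sketch of that rewrite, using hypothetical stand-in expression classes instead of Calcite's `RexNode` API:

```java
public class EqualToLikeSketch {
  // Hypothetical stand-ins for Calcite's RexNode hierarchy; not the real API.
  static abstract class Expr {}

  static final class Column extends Expr {
    final String name;
    Column(String name) { this.name = name; }
    @Override public String toString() { return name; }
  }

  static final class CharLiteral extends Expr {
    final String value;
    CharLiteral(String value) { this.value = value; }
    @Override public String toString() { return "'" + value + "'"; }
  }

  static final class Call extends Expr {
    final String op;
    final Expr left;
    final Expr right;
    Call(String op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    @Override public String toString() { return left + " " + op + " " + right; }
  }

  // Rewrites `input = 'lit'` or `'lit' = input` into `input LIKE 'lit%'`; anything else is returned unchanged.
  static Expr rewriteEqualOnCharToLike(Expr expr) {
    if (!(expr instanceof Call) || !((Call) expr).op.equals("=")) {
      return expr;
    }
    Call eq = (Call) expr;
    CharLiteral lit = null;
    Expr input = null;
    if (eq.left instanceof CharLiteral) {          // literal on the left
      lit = (CharLiteral) eq.left;
      input = eq.right;
    } else if (eq.right instanceof CharLiteral) {  // literal on the right
      lit = (CharLiteral) eq.right;
      input = eq.left;
    }
    if (lit == null) {
      return expr;
    }
    return new Call("LIKE", input, new CharLiteral(lit.value + "%"));
  }

  public static void main(String[] args) {
    Expr e1 = new Call("=", new Column("name"), new CharLiteral("Bru"));
    Expr e2 = new Call("=", new CharLiteral("Bru"), new Column("name"));
    System.out.println(rewriteEqualOnCharToLike(e1)); // name LIKE 'Bru%'
    System.out.println(rewriteEqualOnCharToLike(e2)); // name LIKE 'Bru%'
  }
}
```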



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643348#comment-16643348
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223683219
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java
 ##
 @@ -166,4 +169,25 @@ public void onMatch(RelOptRuleCall call) {
 return list;
   }
 
+  public static Project replace(Project topProject, Project bottomProject) {
 
 Review comment:
   Can `ProjectAllowDup` and `Project` rels be merged?
   For instance:
   `00-01  ProjectAllowDup(MONTHID=[$0], NOFOCONTRACTS=[$1]) : rowType = RecordType(INTEGER MONTHID, INTEGER NOFOCONTRACTS)`
   `00-02    Project(MonthId=[CAST($1):INTEGER], No of Contracts=[CAST($2):INTEGER]) : rowType = RecordType(INTEGER MonthId, INTEGER No of Contracts)`
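Mechanically, merging two stacked projections means substituting each input reference in the top project with the corresponding expression of the bottom project; this is what Calcite's `RelOptUtil.pushPastProject` is used for in the `replace` method quoted earlier in this thread. For the plan in the comment, the top `ProjectAllowDup` only renames `$0` and `$1`, so the merged expressions are just the two `CAST`s. Whether the merge is *allowed* depends on `ProjectAllowDup`-specific semantics such as its output field names, which this sketch does not model. A string-based toy illustration (`MergeProjectSketch` and `merge` are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MergeProjectSketch {
  // Matches input references such as $0 or $12.
  private static final Pattern INPUT_REF = Pattern.compile("\\$(\\d+)");

  // Composes two stacked projections: every $i in a top expression is replaced by bottom expression i.
  // Substituted expressions are parenthesized unless the top expression is a bare reference,
  // so operator precedence is preserved.
  static List<String> merge(List<String> top, List<String> bottom) {
    List<String> merged = new ArrayList<>();
    for (String expr : top) {
      boolean bareRef = expr.matches("\\$\\d+");
      Matcher m = INPUT_REF.matcher(expr);
      StringBuffer sb = new StringBuffer();
      while (m.find()) {
        String sub = bottom.get(Integer.parseInt(m.group(1)));
        String replacement = bareRef ? sub : "(" + sub + ")";
        m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
      }
      m.appendTail(sb);
      merged.add(sb.toString());
    }
    return merged;
  }

  public static void main(String[] args) {
    // Expressions from the plan in the review comment: the top project only forwards its inputs.
    List<String> top = Arrays.asList("$0", "$1");
    List<String> bottom = Arrays.asList("CAST($1):INTEGER", "CAST($2):INTEGER");
    System.out.println(merge(top, bottom)); // [CAST($1):INTEGER, CAST($2):INTEGER]
    System.out.println(merge(Arrays.asList("$0 * 3"), Arrays.asList("b + 2"))); // [(b + 2) * 3]
  }
}
```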



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643353#comment-16643353
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223668629
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexConditionInfo.java
 ##
 @@ -0,0 +1,250 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.logical.partition.RewriteCombineBinaryOperators;
+import org.apache.calcite.rel.RelNode;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class IndexConditionInfo {
+  public final RexNode indexCondition;
+  public final RexNode remainderCondition;
+  public final boolean hasIndexCol;
+
+  public IndexConditionInfo(RexNode indexCondition, RexNode remainderCondition, boolean hasIndexCol) {
+    this.indexCondition = indexCondition;
+    this.remainderCondition = remainderCondition;
+    this.hasIndexCol = hasIndexCol;
+  }
+
+  public static Builder newBuilder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan) {
+return new Builder(condition, indexes, builder, scan);
+  }
+
+  public static class Builder {
+final RexBuilder builder;
+final RelNode scan;
+final Iterable indexes;
+private RexNode condition;
+
+public Builder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan
+) {
 
 Review comment:
   formatting
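`IndexConditionInfo` carries an `indexCondition`, a `remainderCondition`, and a `hasIndexCol` flag. A much-simplified sketch of that kind of split, assuming a purely conjunctive filter whose conjuncts each reference a single column; the real `Builder` works on Calcite `RexNode` trees and may handle more cases, and all names below are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IndexConditionSketch {
  // One conjunct of an AND-ed filter, tagged with the single column it references (hypothetical model).
  static final class Conjunct {
    final String column;
    final String text;
    Conjunct(String column, String text) {
      this.column = column;
      this.text = text;
    }
  }

  // Mirrors the three public fields of IndexConditionInfo, with conditions kept as lists of conjuncts.
  static final class ConditionInfo {
    final List<String> indexCondition = new ArrayList<>();
    final List<String> remainderCondition = new ArrayList<>();
    boolean hasIndexCol;
  }

  // Conjuncts on indexed columns go to the index condition; everything else is the remainder.
  static ConditionInfo split(List<Conjunct> conjuncts, Set<String> indexColumns) {
    ConditionInfo info = new ConditionInfo();
    for (Conjunct c : conjuncts) {
      if (indexColumns.contains(c.column)) {
        info.indexCondition.add(c.text);
      } else {
        info.remainderCondition.add(c.text);
      }
    }
    info.hasIndexCol = !info.indexCondition.isEmpty();
    return info;
  }

  public static void main(String[] args) {
    Set<String> indexCols = new HashSet<>(Arrays.asList("lname"));
    List<Conjunct> filter = Arrays.asList(
        new Conjunct("lname", "lname = 'Brunner'"),
        new Conjunct("age", "age > 30"));
    ConditionInfo info = split(filter, indexCols);
    System.out.println(info.indexCondition);     // [lname = 'Brunner']
    System.out.println(info.remainderCondition); // [age > 30]
    System.out.println(info.hasIndexCol);        // true
  }
}
```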



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643351#comment-16643351
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223655243
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/udf/mapr/db/NotTypeOfPlaceholder.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udf.mapr.db;
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import org.apache.drill.exec.expr.holders.BitHolder;
+import org.apache.drill.exec.expr.holders.IntHolder;
+
+/**
+ * This is a placeholder for the nottypeof() function.
+ *
+ * At this time, this function can only be used in predicates. The placeholder
+ * is here to prevent calcite from complaining; the function will get pushed down
+ * by the storage plug-in into DB. That process will go through JsonConditionBuilder.java,
+ * which will replace this function with the real OJAI equivalent to be pushed down.
+ * Therefore, there's no implementation here.
+ */
+@FunctionTemplate(
 
 Review comment:
   the same regarding `FunctionTemplate` formatting



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643350#comment-16643350
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223677301
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/InvalidIndexDefinitionException.java
 ##
 @@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+
+public class InvalidIndexDefinitionException extends DrillRuntimeException {
 
 Review comment:
   Consider adding some description to the exception
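One way to act on this suggestion is a class-level Javadoc plus constructors that carry a message, so each throw site explains what is wrong with the definition. A self-contained sketch, using a stand-in for `DrillRuntimeException` and a hypothetical `validate` helper; the conditions under which Drill actually throws this exception are not shown in this excerpt:

```java
import java.util.Collections;
import java.util.List;

public class InvalidIndexDefinitionSketch {
  // Stand-in for org.apache.drill.common.exceptions.DrillRuntimeException so the sketch compiles alone.
  static class DrillRuntimeException extends RuntimeException {
    DrillRuntimeException(String message) {
      super(message);
    }
    DrillRuntimeException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /**
   * Thrown when an index definition obtained from a storage plugin cannot be used for
   * index-based planning, for example because it declares no indexed columns.
   */
  static class InvalidIndexDefinitionException extends DrillRuntimeException {
    InvalidIndexDefinitionException(String message) {
      super(message);
    }
    InvalidIndexDefinitionException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Hypothetical validation step showing a descriptive message at the throw site.
  static void validate(String indexName, List<String> indexColumns) {
    if (indexColumns == null || indexColumns.isEmpty()) {
      throw new InvalidIndexDefinitionException(
          "Index '" + indexName + "' must declare at least one indexed column");
    }
  }

  public static void main(String[] args) {
    try {
      validate("idx_empty", Collections.emptyList());
    } catch (InvalidIndexDefinitionException e) {
      System.out.println(e.getMessage()); // Index 'idx_empty' must declare at least one indexed column
    }
  }
}
```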


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning  
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary
> indexes), Drill should leverage them during planning and execution to
> improve query performance.
> On the planning side, the Drill planner should be enhanced to provide an
> abstraction layer that expresses the index metadata and statistics. Further,
> cost-based index selection is needed to decide which index(es) are
> suitable.
> On the execution side, appropriate operator enhancements would be needed to
> handle different categories of indexes, such as covering and non-covering
> indexes, taking into consideration that the index data may not be co-located
> with the primary table, i.e. a global index.
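To make the covering vs. non-covering distinction concrete: an index is usually called covering for a query when every column the query references is available in the index, so the primary table need not be read. Otherwise, the matching row keys must be looked up in the primary table. The sketch below is a hypothetical helper built on that definition and uses plain column-name sets. The class and method names are not part of Drill's API.

```java
import java.util.Set;

/** Hypothetical helper; not part of Drill's index planning API. */
final class CoveringCheck {

  private CoveringCheck() {}

  /**
   * Returns true if the index stores every column the query references
   * (projected or filtered), so the primary table does not need to be read.
   * Otherwise the index is non-covering and row keys must be looked up
   * in the primary table.
   */
  static boolean isCovering(Set<String> indexColumns, Set<String> referencedColumns) {
    return indexColumns.containsAll(referencedColumns);
  }
}
```

For example, with an index on `name.lname` and `address.state`, a query that references only `name.lname` is covered, while one that also reads `contact.phone` is not.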



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643376#comment-16643376
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223379455
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushLimitIntoScan.java
 ##
 @@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.physical.LimitPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.RowKeyJoinPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.hbase.HBaseScanSpec;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.RestrictedJsonTableGroupScan;
+
+public abstract class MapRDBPushLimitIntoScan extends StoragePluginOptimizerRule {
 
 Review comment:
   `DrillPushLimitToScanRule` doesn't work for MapR-DB Scan, am I right?



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643374#comment-16643374
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223369746
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelTrait;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.rules.ProjectRemoveRule;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil.ProjectPushInfo;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.List;
+
+public abstract class MapRDBPushProjectIntoScan extends StoragePluginOptimizerRule {
 
 Review comment:
   Why is `DrillPushProjectIntoScanRule` not suitable for `MapRDBGroupScan`?



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643372#comment-16643372
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223368541
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelTrait;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.rules.ProjectRemoveRule;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil.ProjectPushInfo;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.List;
+
+public abstract class MapRDBPushProjectIntoScan extends StoragePluginOptimizerRule {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapRDBPushProjectIntoScan.class);
+
+  private MapRDBPushProjectIntoScan(RelOptRuleOperand operand, String description) {
+    super(operand, description);
+  }
+
+  public static final StoragePluginOptimizerRule PROJECT_ON_SCAN = new MapRDBPushProjectIntoScan(
+      RelOptHelper.some(ProjectPrel.class, RelOptHelper.any(ScanPrel.class)), "MapRDBPushProjIntoScan:Proj_On_Scan") {
+    @Override
+    public void onMatch(RelOptRuleCall call) {
+      final ScanPrel scan = (ScanPrel) call.rel(1);
+      final ProjectPrel project = (ProjectPrel) call.rel(0);
 
 Review comment:
   Explicit casts to `ScanPrel` and `ProjectPrel` are redundant here.
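The casts are redundant because, assuming Calcite's `RelOptRuleCall.rel(int)` is declared generically as `<T extends RelNode> T rel(int ordinal)`, the return type is inferred from the variable being assigned. The standalone sketch below reproduces that pattern with hypothetical stand-in classes (not the real Calcite or Drill types). It shows that no cast is needed at the call site.

```java
import java.util.List;

// Hypothetical stand-ins for the Calcite/Drill classes; not the real types.
class RelNode {}
class ScanPrel extends RelNode {}
class ProjectPrel extends RelNode {}

class RuleCall {
  private final List<RelNode> rels;

  RuleCall(List<RelNode> rels) {
    this.rels = rels;
  }

  // Generic accessor: T is inferred from the assignment target,
  // so callers do not need an explicit cast.
  @SuppressWarnings("unchecked")
  <T extends RelNode> T rel(int ordinal) {
    return (T) rels.get(ordinal);
  }
}

class CastDemo {
  static String describe(RuleCall call) {
    final ProjectPrel project = call.rel(0); // no (ProjectPrel) cast
    final ScanPrel scan = call.rel(1);       // no (ScanPrel) cast
    return project.getClass().getSimpleName() + " on " + scan.getClass().getSimpleName();
  }
}
```

A mismatched declaration would still fail with a `ClassCastException` at the assignment, because the compiler inserts the check there. Dropping the explicit cast therefore does not weaken runtime type checking.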



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643300#comment-16643300
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223368541
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelTrait;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.rules.ProjectRemoveRule;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil.ProjectPushInfo;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.List;
+
+public abstract class MapRDBPushProjectIntoScan extends StoragePluginOptimizerRule {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapRDBPushProjectIntoScan.class);
+
+  private MapRDBPushProjectIntoScan(RelOptRuleOperand operand, String description) {
+    super(operand, description);
+  }
+
+  public static final StoragePluginOptimizerRule PROJECT_ON_SCAN = new MapRDBPushProjectIntoScan(
+      RelOptHelper.some(ProjectPrel.class, RelOptHelper.any(ScanPrel.class)), "MapRDBPushProjIntoScan:Proj_On_Scan") {
+    @Override
+    public void onMatch(RelOptRuleCall call) {
+      final ScanPrel scan = (ScanPrel) call.rel(1);
+      final ProjectPrel project = (ProjectPrel) call.rel(0);
 
 Review comment:
   Explicit casts to `ScanPrel` and `ProjectPrel` are redundant, since the variables are already declared with those types.



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643302#comment-16643302
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r217477279
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBScanBatchCreator.java
 ##
 @@ -33,7 +33,9 @@
 
 import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
 
-public class MapRDBScanBatchCreator implements BatchCreator {
+public class MapRDBScanBatchCreator implements BatchCreator{
 
 Review comment:
   Add a space before the curly brace.



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643363#comment-16643363
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223680267
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/rules/AbstractMatchFunction.java
 ##
 @@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index.rules;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+
+public abstract class AbstractMatchFunction implements MatchFunction {
+  public boolean checkScan(DrillScanRel scanRel) {
+    GroupScan groupScan = scanRel.getGroupScan();
+    if (groupScan instanceof DbGroupScan) {
+      DbGroupScan dbscan = ((DbGroupScan) groupScan);
+      //if we already applied index convert rule, and this scan is indexScan or restricted scan already,
+      //no more trying index convert rule
+      return dbscan.supportsSecondaryIndex() && (!dbscan.isIndexScan()) && (!dbscan.isRestrictedScan());
+    }
+    return false;
+  }
+
+  public boolean checkScan(GroupScan groupScan) {
+    if (groupScan instanceof DbGroupScan) {
+      DbGroupScan dbscan = ((DbGroupScan) groupScan);
+      //if we already applied index convert rule, and this scan is indexScan or restricted scan already,
+      //no more trying index convert rule
+      return dbscan.supportsSecondaryIndex() &&
+          !dbscan.isRestrictedScan() &&
+          (!dbscan.isFilterPushedDown() || dbscan.isIndexScan()) &&
+          !containsStar(dbscan);
+    }
+    return false;
+  }
+
+  public static boolean containsStar(DbGroupScan dbscan) {
+    for (SchemaPath column : dbscan.getColumns()) {
+      if (column.getRootSegment().getPath().startsWith("*")) {
+        return true;
+      }
+    }
+    return false;
+  }
+}
 
 Review comment:
   Add a newline at the end of the file.
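The `checkScan(GroupScan)` overload in the quoted `AbstractMatchFunction` combines four conditions. The sketch below restates them as one predicate over a hypothetical `ScanInfo` interface, which stands in for the `DbGroupScan` methods the quoted code calls, so each condition can be exercised in isolation. It is an illustration, not Drill code.

```java
import java.util.List;

/** Hypothetical stand-in for the DbGroupScan methods used by checkScan. */
interface ScanInfo {
  boolean supportsSecondaryIndex();
  boolean isRestrictedScan();
  boolean isFilterPushedDown();
  boolean isIndexScan();
  List<String> columnRoots(); // root segment of each projected column
}

/** Simple value holder implementing ScanInfo, for trying out the predicate. */
final class SimpleScan implements ScanInfo {
  private final boolean secondaryIndex;
  private final boolean restricted;
  private final boolean filterPushedDown;
  private final boolean indexScan;
  private final List<String> roots;

  SimpleScan(boolean secondaryIndex, boolean restricted, boolean filterPushedDown,
             boolean indexScan, List<String> roots) {
    this.secondaryIndex = secondaryIndex;
    this.restricted = restricted;
    this.filterPushedDown = filterPushedDown;
    this.indexScan = indexScan;
    this.roots = roots;
  }

  @Override public boolean supportsSecondaryIndex() { return secondaryIndex; }
  @Override public boolean isRestrictedScan() { return restricted; }
  @Override public boolean isFilterPushedDown() { return filterPushedDown; }
  @Override public boolean isIndexScan() { return indexScan; }
  @Override public List<String> columnRoots() { return roots; }
}

final class IndexRuleGuard {
  private IndexRuleGuard() {}

  /** True if any projected column root starts with "*", e.g. SELECT *. */
  static boolean containsStar(ScanInfo scan) {
    for (String root : scan.columnRoots()) {
      if (root.startsWith("*")) {
        return true;
      }
    }
    return false;
  }

  /** Same conditions as the quoted checkScan(GroupScan) overload. */
  static boolean checkScan(ScanInfo scan) {
    return scan.supportsSecondaryIndex()
        && !scan.isRestrictedScan()                           // not already a restricted scan
        && (!scan.isFilterPushedDown() || scan.isIndexScan()) // filter not pushed yet, unless already an index scan
        && !containsStar(scan);                               // an explicit column list is required
  }
}
```

For instance, a scan projecting `*` is rejected even when secondary indexes are supported, and a scan whose filter was already pushed down passes only if it is itself an index scan.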



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643368#comment-16643368
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223679873
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/generators/IndexIntersectPlanGenerator.java
 ##
 @@ -0,0 +1,350 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index.generators;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.InvalidRelException;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.util.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.JoinControl;
+import org.apache.drill.exec.planner.index.IndexLogicalPlanCallContext;
+import org.apache.drill.exec.planner.index.IndexDescriptor;
+import org.apache.drill.exec.planner.index.FunctionalIndexInfo;
+import org.apache.drill.exec.planner.index.FunctionalIndexHelper;
+import org.apache.drill.exec.planner.index.IndexPlanUtils;
+import org.apache.drill.exec.planner.index.IndexConditionInfo;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.DrillDistributionTraitDef;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait.DistributionType;
+import org.apache.drill.exec.planner.physical.FilterPrel;
+import org.apache.drill.exec.planner.physical.HashJoinPrel;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.Prule;
+import org.apache.drill.exec.planner.physical.RowKeyJoinPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * IndexScanIntersectGenerator is to generate index plan against multiple index tables,
+ * the input indexes are assumed to be ranked by selectivity(low to high) already.
+ */
+public class IndexIntersectPlanGenerator extends AbstractIndexPlanGenerator {
+
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(IndexIntersectPlanGenerator.class);
+
+  final Map indexInfoMap;
+
+  public IndexIntersectPlanGenerator(IndexLogicalPlanCallContext indexContext,
+                                     Map indexInfoMap,
+                                     RexBuilder builder,
+                                     PlannerSettings settings) {
+    super(indexContext, null, null, builder, settings);
+    this.indexInfoMap = indexInfoMap;
+  }
+
+  public RelNode buildRowKeyJoin(RelNode left, RelNode right, boolean isRowKeyJoin, int htControl)
+      throws InvalidRelException {
+    final int leftRowKeyIdx = getRowKeyIndex(left.getRowType(), origScan);
+    final int rightRowKeyIdx = 0; // only rowkey field is being projected from right side
+
+    assert leftRowKeyIdx >= 0;
+
+    List leftJoinKeys = ImmutableList.of(leftRowKeyIdx);
+    List rightJoinKeys = ImmutableList.of(rightRowKeyIdx);
+
+    logger.trace(String.format(
+        "buildRowKeyJoin: leftIdx: %d, rightIdx: %d",
+
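The Javadoc of `IndexIntersectPlanGenerator` states that the input indexes are ranked by selectivity, low to high. A plain set-intersection sketch shows why that order helps. Starting from the most selective index keeps the running set of row keys small, and the loop can stop as soon as the set becomes empty. This is a hypothetical in-memory illustration only. The quoted generator builds a plan with row-key joins rather than intersecting sets directly.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; not how IndexIntersectPlanGenerator is implemented. */
final class RowKeyIntersect {
  private RowKeyIntersect() {}

  /**
   * Intersects the row keys matched by several indexes. The list is assumed to be
   * ordered by selectivity, low to high (fewest matching row keys first), so the
   * running result starts small and only shrinks.
   */
  static Set<String> intersect(List<Set<String>> rowKeysPerIndex) {
    if (rowKeysPerIndex.isEmpty()) {
      return new HashSet<>();
    }
    Set<String> result = new HashSet<>(rowKeysPerIndex.get(0));
    // Stop early once no row key can survive the remaining indexes.
    for (int i = 1; i < rowKeysPerIndex.size() && !result.isEmpty(); i++) {
      result.retainAll(rowKeysPerIndex.get(i));
    }
    return result;
  }
}
```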

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643315#comment-16643315
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223660320
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/LargeTableGen.java
 ##
 @@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import static com.mapr.drill.maprdb.tests.MaprDBTestsSuite.INDEX_FLUSH_TIMEOUT;
+
+import java.io.InputStream;
+import java.io.StringBufferInputStream;
+
+import org.apache.hadoop.fs.Path;
+import org.ojai.DocumentStream;
+import org.ojai.json.Json;
+
+import com.mapr.db.Admin;
+import com.mapr.db.Table;
+import com.mapr.db.TableDescriptor;
+import com.mapr.db.impl.MapRDBImpl;
+import com.mapr.db.impl.TableDescriptorImpl;
+import com.mapr.db.tests.utils.DBTests;
+import com.mapr.fs.utils.ssh.TestCluster;
+
+/**
+ * This class is to generate a MapR json table of this schema:
+ * {
+ *   "address" : {
+ *  "city":"wtj",
+ *  "state":"ho"
+ *   }
+ *   "contact" : {
+ *  "email":"vcfahj...@gmail.com",
+ *  "phone":"655583"
+ *   }
+ *   "id" : {
+ *  "ssn":"15461"
+ *   }
+ *   "name" : {
+ *  "fname":"VcFahj",
+ *  "lname":"RfM"
+ *   }
+ * }
+ *
+ */
+public class LargeTableGen extends LargeTableGenBase {
+
+  static final int SPLIT_SIZE = 5000;
+  private Admin admin;
+
+  public LargeTableGen(Admin dbadmin) {
+admin = dbadmin;
+  }
+
+  Table createOrGetTable(String tableName, int recordNum) {
+if (admin.tableExists(tableName)) {
+  return MapRDBImpl.getTable(tableName);
+  //admin.deleteTable(tableName);
+}
+else {
+  TableDescriptor desc = new TableDescriptorImpl(new Path(tableName));
+
+  int splits = (recordNum / SPLIT_SIZE) - (((recordNum % SPLIT_SIZE) > 1)? 0 : 1);
+
+  String[] splitsStr = new String[splits];
+  StringBuilder strBuilder = new StringBuilder("Splits:");
+  for(int i=0; i
> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, the Drill planner should be enhanced to provide an 
> abstraction layer that expresses the index metadata and statistics. Further, 
> a cost-based index selection is needed to decide which index(es) are 
> suitable.  
> On the execution side, appropriate operator enhancements would be needed to 
> handle different categories of indexes, such as covering and non-covering 
> indexes, taking into consideration that the index data may not be co-located 
> with the primary table, i.e., a global index.
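
The split-count expression in `LargeTableGen.createOrGetTable` (quoted in the diff above) can be checked in isolation. This sketch copies the expression verbatim into a standalone method, with `SPLIT_SIZE = 5000` as in the diff, and evaluates it for a few record counts; the class name `SplitCountCheck` is hypothetical.

```java
public class SplitCountCheck {
  // Same constant as LargeTableGen.SPLIT_SIZE in the diff.
  static final int SPLIT_SIZE = 5000;

  // Verbatim copy of the expression from createOrGetTable: one split point per full
  // SPLIT_SIZE block, minus one unless the remainder is greater than 1.
  static int splits(int recordNum) {
    return (recordNum / SPLIT_SIZE) - (((recordNum % SPLIT_SIZE) > 1) ? 0 : 1);
  }

  public static void main(String[] args) {
    int[] samples = {1, 5000, 10000, 10001, 10002, 12345};
    for (int n : samples) {
      System.out.println(n + " records -> " + splits(n) + " split(s)");
    }
  }
}
```

With this expression a remainder of exactly one record is treated like a remainder of zero (10000 and 10001 records both give one split), and a `recordNum` of 0 or 1 yields -1, which `new String[splits]` would reject with a `NegativeArraySizeException`.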



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643306#comment-16643306
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223639490
 
 

 ##
 File path: contrib/format-maprdb/pom.xml
 ##
 @@ -83,6 +83,41 @@
   
 
   
+  
+org.apache.maven.plugins
+maven-jar-plugin
+
+  
+
+**/core-site.xml
+**/logback.xml
+  
+
+  
+
+  
+  
+org.codehaus.mojo
+build-helper-maven-plugin
+1.9.1
 
 Review comment:
   Since `build-helper-maven-plugin` appears 3 times in the project, could you 
declare it in the `pluginManagement` section of the root POM? That way the 
plugin version is kept in one place.



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643344#comment-16643344
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223672779
 
 

 ##
 File path: pom.xml
 ##
 @@ -53,8 +53,8 @@
 2.9.5
 2.9.5
 3.4.12
-5.2.1-mapr
-1.1
+6.0.1-mapr
 
 Review comment:
   Use the 6.1.0-mapr version.
   It will be introduced once this PR is merged:
   
https://github.com/apache/drill/pull/1489/files#diff-600376dffeb79835ede4a0b285078036R56



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643366#comment-16643366
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223680822
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/rules/MatchFunction.java
 ##
 @@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index.rules;
+
+import org.apache.calcite.plan.RelOptRuleCall;
+
+public interface MatchFunction<T> {
+  boolean match(RelOptRuleCall call);
+  T onMatch(RelOptRuleCall call);
+}
 
 Review comment:
   Please add a new line at the end of the file.
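
The `MatchFunction` interface in the diff above splits rule matching into a cheap `match` predicate and an `onMatch` step that returns a typed result. Below is a minimal, self-contained sketch of that two-phase pattern. Because Calcite's `RelOptRuleCall` cannot be built outside a running planner, the sketch substitutes a hypothetical `Call` class and a hypothetical `FilterOnScanMatch` implementation; it illustrates the shape of the contract, not Drill's actual index rules.

```java
import java.util.Optional;

public class MatchFunctionSketch {
  // Hypothetical stand-in for Calcite's RelOptRuleCall: just the names of the matched operators.
  static final class Call {
    final String top;
    final String bottom;

    Call(String top, String bottom) {
      this.top = top;
      this.bottom = bottom;
    }
  }

  // Same shape as the interface in the diff, with the type parameter written out.
  interface MatchFunction<T> {
    boolean match(Call call);

    T onMatch(Call call);
  }

  // Hypothetical implementation: accepts Filter-over-Scan and returns a description of the match.
  static final class FilterOnScanMatch implements MatchFunction<String> {
    @Override
    public boolean match(Call call) {
      return "Filter".equals(call.top) && "Scan".equals(call.bottom);
    }

    @Override
    public String onMatch(Call call) {
      return call.top + " on " + call.bottom;
    }
  }

  // onMatch is only invoked after match() has accepted the call.
  static <T> Optional<T> apply(MatchFunction<T> fn, Call call) {
    return fn.match(call) ? Optional.ofNullable(fn.onMatch(call)) : Optional.empty();
  }

  public static void main(String[] args) {
    MatchFunction<String> fn = new FilterOnScanMatch();
    System.out.println(apply(fn, new Call("Filter", "Scan")));  // Optional[Filter on Scan]
    System.out.println(apply(fn, new Call("Project", "Scan"))); // Optional.empty
  }
}
```

Keeping `match` side-effect free lets a rule reject a call before paying for the work done in `onMatch`.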



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643305#comment-16643305
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223370889
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelTrait;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.rules.ProjectRemoveRule;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil.ProjectPushInfo;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.List;
+
+public abstract class MapRDBPushProjectIntoScan extends StoragePluginOptimizerRule {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapRDBPushProjectIntoScan.class);
+
+  private MapRDBPushProjectIntoScan(RelOptRuleOperand operand, String description) {
+super(operand, description);
+  }
+
+  public static final StoragePluginOptimizerRule PROJECT_ON_SCAN = new MapRDBPushProjectIntoScan(
+  RelOptHelper.some(ProjectPrel.class, RelOptHelper.any(ScanPrel.class)), "MapRDBPushProjIntoScan:Proj_On_Scan") {
+@Override
+public void onMatch(RelOptRuleCall call) {
+  final ScanPrel scan = (ScanPrel) call.rel(1);
+  final ProjectPrel project = (ProjectPrel) call.rel(0);
+  if (!(scan.getGroupScan() instanceof MapRDBGroupScan)) {
+return;
+  }
+  doPushProjectIntoGroupScan(call, project, scan, (MapRDBGroupScan) scan.getGroupScan());
+  if (scan.getGroupScan() instanceof BinaryTableGroupScan) {
+BinaryTableGroupScan groupScan = (BinaryTableGroupScan) scan.getGroupScan();
+
+  } else {
+assert (scan.getGroupScan() instanceof JsonTableGroupScan);
+JsonTableGroupScan groupScan = (JsonTableGroupScan) scan.getGroupScan();
+
+doPushProjectIntoGroupScan(call, project, scan, groupScan);
+  }
+}
+
+@Override
+public boolean matches(RelOptRuleCall call) {
+  final ScanPrel scan = (ScanPrel) call.rel(1);
+  if (scan.getGroupScan() instanceof BinaryTableGroupScan ||
 
 Review comment:
   It is possible just to check `if (scan.getGroupScan() instanceof 
MapRDBGroupScan) ...` here.
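
The suggestion relies on both `BinaryTableGroupScan` and `JsonTableGroupScan` being subclasses of `MapRDBGroupScan` (the `onMatch` body above already casts to `MapRDBGroupScan` after an `instanceof MapRDBGroupScan` check). The sketch below models that hierarchy with hypothetical empty classes to show that, assuming no other `MapRDBGroupScan` subclass needs to be excluded, a single check against the shared supertype accepts the same inputs as the two-branch disjunction.

```java
public class SupertypeCheckSketch {
  // Hypothetical stand-ins for the Drill/MapR-DB group scan hierarchy.
  static class GroupScan {}

  static abstract class MapRDBGroupScan extends GroupScan {}

  static class BinaryTableGroupScan extends MapRDBGroupScan {}

  static class JsonTableGroupScan extends MapRDBGroupScan {}

  static class OtherGroupScan extends GroupScan {}

  // Original style: enumerate every concrete subclass.
  static boolean matchesByDisjunction(GroupScan scan) {
    return scan instanceof BinaryTableGroupScan || scan instanceof JsonTableGroupScan;
  }

  // Suggested style: one check against the shared supertype.
  static boolean matchesBySupertype(GroupScan scan) {
    return scan instanceof MapRDBGroupScan;
  }

  public static void main(String[] args) {
    GroupScan[] scans = {new BinaryTableGroupScan(), new JsonTableGroupScan(), new OtherGroupScan()};
    for (GroupScan scan : scans) {
      System.out.println(scan.getClass().getSimpleName() + ": "
          + matchesByDisjunction(scan) + " / " + matchesBySupertype(scan));
    }
  }
}
```

A side benefit of the supertype check is that a future `MapRDBGroupScan` subclass is picked up without touching `matches`.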



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1664#comment-1664
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223670701
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexPlanUtils.java
 ##
 @@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.index;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.DrillProjectRelBase;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+import org.apache.drill.exec.planner.fragment.DistributionAffinity;
+import org.apache.drill.exec.planner.logical.DrillOptiq;
+import org.apache.drill.exec.planner.logical.DrillParseContext;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.common.OrderedRel;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+
+public class IndexPlanUtils {
+
+  public enum ConditionIndexed {
+NONE,
+PARTIAL,
+FULL}
+
+  /**
+   * Check if any of the fields of the index are present in a list of LogicalExpressions supplied
+   * as part of IndexableExprMarker
+   * @param exprMarker, the marker that has analyzed original index condition on top of original scan
+   * @param indexDesc
+   * @return ConditionIndexed.FULL, PARTIAL or NONE depending on whether all, some or no columns
+   * of the indexDesc are present in the list of LogicalExpressions supplied as part of exprMarker
+   *
+   */
+  static public ConditionIndexed conditionIndexed(IndexableExprMarker exprMarker, IndexDescriptor indexDesc) {
+Map mapRexExpr = exprMarker.getIndexableExpression();
+List infoCols = Lists.newArrayList();
+infoCols.addAll(mapRexExpr.values());
+if (indexDesc.allColumnsIndexed(infoCols)) {
+  return ConditionIndexed.FULL;
+} else if (indexDesc.someColumnsIndexed(infoCols)) {
+  return ConditionIndexed.PARTIAL;
+} else {
+  return ConditionIndexed.NONE;
+}
+  }
+
+  /**
+   * check if we want to apply index rules on this scan,
+   * if group scan is not instance of DbGroupScan, or this DbGroupScan instance does not support secondary index, or
+   *this scan is already an index scan or Restricted Scan, do not

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643339#comment-16643339
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223654400
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/OjaiFunctionsProcessor.java
 ##
 @@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db.json;
+
+import org.apache.commons.codec.binary.Base64;
+
+import org.apache.drill.common.expression.FunctionCall;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions.IntExpression;
+import org.apache.drill.common.expression.ValueExpressions.LongExpression;
+import org.apache.drill.common.expression.ValueExpressions.QuotedString;
+import org.apache.drill.common.expression.visitors.AbstractExprVisitor;
+
+import org.ojai.Value;
+import org.ojai.store.QueryCondition;
+
+import com.google.common.collect.ImmutableMap;
+import com.mapr.db.impl.ConditionImpl;
+import com.mapr.db.impl.MapRDBImpl;
+
+import java.nio.ByteBuffer;
+
+class OjaiFunctionsProcessor extends AbstractExprVisitor {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(OjaiFunctionsProcessor.class);
+  private QueryCondition queryCond;
+
+  private OjaiFunctionsProcessor() {
+  }
+
+  private static String getStackTrace() {
+final Throwable throwable = new Throwable();
+final StackTraceElement[] ste = throwable.getStackTrace();
+final StringBuilder sb = new StringBuilder();
+for(int i = 1; i < ste.length; ++i) {
 
 Review comment:
   Please add a space after `for`, i.e. `for (int i = 1; ...`.
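
The `getStackTrace` helper is cut off in the diff after the loop header. The following is a hedged, self-contained sketch of what such a helper can look like, written with the requested `for (` spacing: it captures the current stack with `new Throwable().getStackTrace()` and joins the frames starting at index 1, so the helper's own frame is skipped. The class name, the `callerOfHelper` method and the one-frame-per-line output format are assumptions, not the code from the pull request.

```java
public class StackTraceSketch {
  // Builds a printable stack trace of the caller, skipping frame 0 (this method itself).
  static String getStackTrace() {
    final Throwable throwable = new Throwable();
    final StackTraceElement[] ste = throwable.getStackTrace();
    final StringBuilder sb = new StringBuilder();
    for (int i = 1; i < ste.length; ++i) {
      sb.append("\tat ").append(ste[i]).append('\n');
    }
    return sb.toString();
  }

  // Hypothetical caller, so the first printed frame is easy to recognize.
  static String callerOfHelper() {
    return getStackTrace();
  }

  public static void main(String[] args) {
    System.out.print(callerOfHelper());
  }
}
```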



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643328#comment-16643328
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223661752
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/IndexGroupScan.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.base;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.planner.index.Statistics;
+
+
+import java.util.List;
+
+/**
+ * An IndexGroupScan operator represents the scan associated with an Index.
+ */
+public interface IndexGroupScan extends GroupScan {
+
+  /**
+   * Get the column ordinal of the rowkey column from the output schema of the IndexGroupScan
+   * @return
+   */
+  @JsonIgnore
+  public int getRowKeyOrdinal();
+
+  /**
+   * Set the artificial row count after applying the {@link RexNode} condition
+   * Mainly used for debugging
+   * @param condition
+   * @param count
+   * @param capRowCount
+   */
+  @JsonIgnore
+  public void setRowCount(RexNode condition, double count, double capRowCount);
+
+  /**
+   * Get the row count after applying the {@link RexNode} condition
+   * @param condition, filter to apply
+   * @return row count post filtering
+   */
+  @JsonIgnore
+  public double getRowCount(RexNode condition, RelNode scanRel);
+
+  /**
+   * Set the statistics for {@link IndexGroupScan}
+   * @param statistics
+   */
+  @JsonIgnore
+  public void setStatistics(Statistics statistics);
+
+  @JsonIgnore
+  public void setColumns(List<SchemaPath> columns);
+
+  @JsonIgnore
+  public List<SchemaPath> getColumns();
+
+  @JsonIgnore
+  public void setParallelizationWidth(int width);
+
+}
 
 Review comment:
   Please add a new line at the end of the file.
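
`IndexGroupScan.getRowKeyOrdinal()` is documented in the diff above as returning the position of the row-key column in the scan's output schema. A minimal sketch of one way an implementation might compute it, assuming the output schema is available as an ordered list of column names and that the row-key column is named `_id` (the row-key field of MapR-DB JSON tables); the `rowKeyOrdinal` helper and the sample column list are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class RowKeyOrdinalSketch {
  // Returns the 0-based position of rowKeyName in the output column list,
  // failing loudly if the index scan does not project the row key at all.
  static int rowKeyOrdinal(List<String> outputColumns, String rowKeyName) {
    int ordinal = outputColumns.indexOf(rowKeyName);
    if (ordinal < 0) {
      throw new IllegalStateException("Row key column " + rowKeyName + " not in " + outputColumns);
    }
    return ordinal;
  }

  public static void main(String[] args) {
    List<String> columns = Arrays.asList("name.lname", "_id", "address.city");
    System.out.println(rowKeyOrdinal(columns, "_id")); // prints 1
  }
}
```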



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643320#comment-16643320
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223652043
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java
 ##
 @@ -98,91 +114,181 @@
   private final boolean disableCountOptimization;
   private final boolean nonExistentColumnsProjection;
 
-  public MaprDBJsonRecordReader(MapRDBSubScanSpec subScanSpec,
-  MapRDBFormatPluginConfig formatPluginConfig,
-  List projectedColumns, FragmentContext context) {
+  protected final MapRDBSubScanSpec subScanSpec;
+  protected final MapRDBFormatPlugin formatPlugin;
+
+  protected OjaiValueWriter valueWriter;
+  protected DocumentReaderVectorWriter documentWriter;
+  protected int maxRecordsToRead = -1;
+
+  public MaprDBJsonRecordReader(MapRDBSubScanSpec subScanSpec, MapRDBFormatPlugin formatPlugin,
+List projectedColumns, FragmentContext context, int maxRecords) {
+this(subScanSpec, formatPlugin, projectedColumns, context);
+this.maxRecordsToRead = maxRecords;
+  }
+
+  protected MaprDBJsonRecordReader(MapRDBSubScanSpec subScanSpec, MapRDBFormatPlugin formatPlugin,
+List projectedColumns, FragmentContext context) {
 buffer = context.getManagedBuffer();
-projectedFields = null;
-tableName = Preconditions.checkNotNull(subScanSpec, "MapRDB reader needs a sub-scan spec").getTableName();
-documentReaderIterators = null;
-includeId = false;
-idOnly= false;
+final Path tablePath = new Path(Preconditions.checkNotNull(subScanSpec,
+  "MapRDB reader needs a sub-scan spec").getTableName());
+this.subScanSpec = subScanSpec;
+this.formatPlugin = formatPlugin;
+final IndexDesc indexDesc = subScanSpec.getIndexDesc();
 byte[] serializedFilter = subScanSpec.getSerializedFilter();
 condition = null;
 
 if (serializedFilter != null) {
   condition = com.mapr.db.impl.ConditionImpl.parseFrom(ByteBufs.wrap(serializedFilter));
 }
 
-disableCountOptimization = formatPluginConfig.disableCountOptimization();
+disableCountOptimization = formatPlugin.getConfig().disableCountOptimization();
+// Below call will set the scannedFields and includeId correctly
 setColumns(projectedColumns);
-unionEnabled = context.getOptions().getBoolean(ExecConstants.ENABLE_UNION_TYPE_KEY);
-readNumbersAsDouble = formatPluginConfig.isReadAllNumbersAsDouble();
-allTextMode = formatPluginConfig.isAllTextMode();
-ignoreSchemaChange = formatPluginConfig.isIgnoreSchemaChange();
-disablePushdown = !formatPluginConfig.isEnablePushdown();
-nonExistentColumnsProjection = formatPluginConfig.isNonExistentFieldSupport();
+unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+readNumbersAsDouble = formatPlugin.getConfig().isReadAllNumbersAsDouble();
+allTextMode = formatPlugin.getConfig().isAllTextMode();
+ignoreSchemaChange = formatPlugin.getConfig().isIgnoreSchemaChange();
+disablePushdown = !formatPlugin.getConfig().isEnablePushdown();
+nonExistentColumnsProjection = formatPlugin.getConfig().isNonExistentFieldSupport();
+
+// Do not use cached table handle for two reasons.
+// cached table handles default timeout is 60 min after which those handles will become stale.
+// Since execution can run for longer than 60 min, we want to get a new table handle and use it
+// instead of the one from cache.
+// Since we are setting some table options, we do not want to use shared handles.
+//
+// Call it here instead of setup since this will make sure it's called under correct UGI block when impersonation
+// is enabled and table is used with and without views.
+table = (indexDesc == null ? MapRDBImpl.getTable(tablePath) : MapRDBImpl.getIndexTable(indexDesc));
+
+if (condition != null) {
+  logger.debug("Created record reader with query condition {}", condition.toString());
+} else {
+  logger.debug("Created record reader with query condition NULL");
+}
   }
 
   @Override
   protected Collection transformColumns(Collection columns) {
 Set transformed = Sets.newLinkedHashSet();
+Set encodedSchemaPathSet = Sets.newLinkedHashSet();
+
 if (disablePushdown) {
   transformed.add(SchemaPath.STAR_COLUMN);
   includeId = true;
-  return transformed;
-}
+} else {
+  if (isStarQuery()) {
+transformed.add(SchemaPath.STAR_COLUMN);
+includeId = true;
+if (isSkipQuery() && !disableCountOptimization) {
+  // `SELECT

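The comment in the reader hunk gives two reasons for not reusing a cached table handle: cached handles expire after a 60-minute default timeout, and the reader sets table options it should not share. A minimal sketch of the first reason only, under assumptions: `TableHandle` and `HandleCache` are hypothetical stand-ins, not the MapR-DB client API, and expiry is modelled as the cache closing a handle the next time it is asked for it after the TTL.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a table handle; not the MapR-DB Table interface.
final class TableHandle {
  final String path;
  final Instant createdAt;
  private boolean closed;

  TableHandle(String path, Instant createdAt) {
    this.path = path;
    this.createdAt = createdAt;
  }

  void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

// Hypothetical shared-handle cache with a time-to-live (the diff comment cites a 60-minute default).
final class HandleCache {
  private final Duration ttl;
  private final Map<String, TableHandle> handles = new HashMap<>();

  HandleCache(Duration ttl) {
    this.ttl = ttl;
  }

  // Returns the shared handle for a path; an expired handle is closed and replaced,
  // even if a long-running reader still holds a reference to it.
  TableHandle getShared(String path, Instant now) {
    TableHandle h = handles.get(path);
    if (h != null && Duration.between(h.createdAt, now).compareTo(ttl) > 0) {
      h.close();
      h = null;
    }
    if (h == null) {
      h = new TableHandle(path, now);
      handles.put(path, h);
    }
    return h;
  }
}
```

In this model, a reader that took the shared handle at query start and is still scanning 90 minutes later holds a closed handle, while a handle the reader created for itself is never touched by the cache.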
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643346#comment-16643346
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223664534
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/RowKeyJoinBatch.java
 ##
 @@ -0,0 +1,284 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.join;
+
+
+import java.util.List;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.config.RowKeyJoinPOP;
+import org.apache.drill.exec.record.AbstractRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector2;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.Iterables;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+
+public class RowKeyJoinBatch extends AbstractRecordBatch implements RowKeyJoin {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RowKeyJoinBatch.class);
+
+  // primary table side record batch
+  private final RecordBatch left;
+
+  // index table side record batch
+  private final RecordBatch right;
+
+  private boolean hasRowKeyBatch;
+  private IterOutcome leftUpstream = IterOutcome.NONE;
+  private IterOutcome rightUpstream = IterOutcome.NONE;
+  private final List transfers = Lists.newArrayList();
+  private int recordCount = 0;
+  private SchemaChangeCallBack callBack = new SchemaChangeCallBack();
+  private RowKeyJoinState rkJoinState = RowKeyJoinState.INITIAL;
+
+  public RowKeyJoinBatch(RowKeyJoinPOP config, FragmentContext context, RecordBatch left, RecordBatch right)
+  throws OutOfMemoryException {
+super(config, context, true /* need to build schema */);
+this.left = left;
+this.right = right;
+this.hasRowKeyBatch = false;
+  }
+
+  @Override
+  public int getRecordCount() {
+if (state == BatchState.DONE) {
+  return 0;
+}
+return recordCount;
+  }
+
+  @Override
+  public SelectionVector2 getSelectionVector2() {
+throw new UnsupportedOperationException("RowKeyJoinBatch does not support selection vector");
+  }
+
+  @Override
+  public SelectionVector4 getSelectionVector4() {
+throw new UnsupportedOperationException("RowKeyJoinBatch does not support selection vector");
+  }
+
+  @Override
+  protected void buildSchema() throws SchemaChangeException {
+container.clear();
+
+rightUpstream = next(right);
+
+if (leftUpstream == IterOutcome.STOP || rightUpstream == IterOutcome.STOP) {
+  state = BatchState.STOP;
+  return;
+}
+
+if (right.getRecordCount() > 0) {
+  // set the hasRowKeyBatch flag such that calling next() on the left input
+  // would see the correct status
+  hasRowKeyBatch = true;
+}
+
+leftUpstream = next(left);
+
+if (leftUpstream == IterOutcome.OUT_OF_MEMORY || rightUpstream == IterOutcome.OUT_OF_MEMORY) {
+  state = BatchState.OUT_OF_MEMORY;
+  return;
+}
+
+for(final VectorWrapper v : left) {
+  final TransferPair pair = v.getValueVector().makeTransferPair(
+  container.addOrGet(v.getField(), callBack));
+  transfers.add(pair);
+}
+
+container.buildSchema(left.getSchema().getSelectionVectorMode());
+  }
+
+  @Override
+  public IterOutcome innerNext() {
+if (state == BatchState.DONE) {
+  return IterOutcome.NONE;
+}
+try {
+  if (state == BatchState.FIRST &&

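In `buildSchema()`, `RowKeyJoinBatch` pulls from the right (index table) input first and sets `hasRowKeyBatch` before calling `next()` on the left (primary table) input, so the left side can tell whether row keys are available. A conceptual sketch of that ordering with plain collections, not Drill's operator API: `RestrictedScan` and `RowKeyJoinSketch` are hypothetical names, and the primary table is assumed to be a map from row key to row.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical primary-table side: it can only serve lookups once row keys are available.
final class RestrictedScan {
  private final Map<String, String> primaryTable;
  private List<String> rowKeys = List.of();
  private boolean hasRowKeyBatch;

  RestrictedScan(Map<String, String> primaryTable) {
    this.primaryTable = primaryTable;
  }

  // Mirrors setting hasRowKeyBatch before next() is called on the left input.
  void setRowKeys(List<String> keys) {
    rowKeys = keys;
    hasRowKeyBatch = !keys.isEmpty();
  }

  List<String> next() {
    List<String> rows = new ArrayList<>();
    if (!hasRowKeyBatch) {
      return rows; // no row keys yet, nothing to look up
    }
    for (String key : rowKeys) {
      String row = primaryTable.get(key);
      if (row != null) {
        rows.add(row);
      }
    }
    hasRowKeyBatch = false; // this batch of keys has been consumed
    return rows;
  }
}

final class RowKeyJoinSketch {
  // Drains the index side batch by batch and collects the matching primary-table rows.
  static List<String> join(Iterator<List<String>> indexBatches, RestrictedScan left) {
    List<String> out = new ArrayList<>();
    while (indexBatches.hasNext()) {
      left.setRowKeys(indexBatches.next()); // read the right (index) side first
      out.addAll(left.next());              // then pull from the left (primary) side
    }
    return out;
  }
}
```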
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643304#comment-16643304
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r217477453
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBSubScanSpec.java
 ##
 @@ -19,32 +19,39 @@
 
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
+import com.mapr.db.index.IndexDesc;
 import com.mapr.fs.jni.MapRConstants;
 import com.mapr.org.apache.hadoop.hbase.util.Bytes;
 
-public class MapRDBSubScanSpec {
+public class MapRDBSubScanSpec implements Comparable{
 
 Review comment:
   space




> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning  
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, Drill planner should be enhanced to provide an 
> abstraction layer which expresses the index metadata and statistics.  Further, 
> a cost-based index selection is needed to decide which index(es) are 
> suitable.  
> On the execution side, appropriate operator enhancements would be needed to 
> handle different categories of indexes such as covering, non-covering 
> indexes, taking into consideration the index data may not be co-located with 
> the primary table, i.e., a global index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
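The description distinguishes covering from non-covering indexes. A minimal sketch, assuming the common definition: an index covers a query when every column the query references (filter and projection) is stored in the index, either as an indexed column or as a non-indexed (included) column; otherwise matching rows must be fetched from the primary table by row key. `IndexInfo`, `PlanShape` and `CoveringCheck` are hypothetical, and the "usable" test deliberately ignores leading-column rules and cost, which a cost-based selector would weigh.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

enum PlanShape { PRIMARY_SCAN, COVERING_INDEX_SCAN, NON_COVERING_INDEX_SCAN }

// Hypothetical index metadata: indexed columns plus non-indexed (included) columns.
final class IndexInfo {
  final List<String> indexColumns;
  final List<String> nonIndexColumns;

  IndexInfo(List<String> indexColumns, List<String> nonIndexColumns) {
    this.indexColumns = indexColumns;
    this.nonIndexColumns = nonIndexColumns;
  }
}

final class CoveringCheck {
  static PlanShape choose(IndexInfo index, Set<String> filterColumns, Set<String> projectedColumns) {
    // Simplification: the index is usable if the filter references any indexed column.
    boolean usable = filterColumns.stream().anyMatch(index.indexColumns::contains);
    if (!usable) {
      return PlanShape.PRIMARY_SCAN;
    }
    Set<String> stored = new HashSet<>(index.indexColumns);
    stored.addAll(index.nonIndexColumns);
    Set<String> referenced = new HashSet<>(filterColumns);
    referenced.addAll(projectedColumns);
    // Covering: every referenced column can be read from the index alone.
    return stored.containsAll(referenced)
        ? PlanShape.COVERING_INDEX_SCAN
        : PlanShape.NON_COVERING_INDEX_SCAN;
  }
}
```

A non-covering result is the case where a row-key join against the primary table becomes necessary, which matters most when the index is global and not co-located with the table.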

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643357#comment-16643357
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223668951
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexConditionInfo.java
 ##
 @@ -0,0 +1,250 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.logical.partition.RewriteCombineBinaryOperators;
+import org.apache.calcite.rel.RelNode;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class IndexConditionInfo {
+  public final RexNode indexCondition;
+  public final RexNode remainderCondition;
+  public final boolean hasIndexCol;
+
+  public IndexConditionInfo(RexNode indexCondition, RexNode remainderCondition, boolean hasIndexCol) {
+this.indexCondition = indexCondition;
+this.remainderCondition = remainderCondition;
+this.hasIndexCol = hasIndexCol;
+  }
+
+  public static Builder newBuilder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan) {
+return new Builder(condition, indexes, builder, scan);
+  }
+
+  public static class Builder {
+final RexBuilder builder;
+final RelNode scan;
+final Iterable indexes;
+private RexNode condition;
+
+public Builder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan
+) {
+  this.condition = condition;
+  this.builder = builder;
+  this.scan = scan;
+  this.indexes = indexes;
+}
+
+public Builder(RexNode condition,
+   IndexDescriptor index,
+   RexBuilder builder,
+   DrillScanRel scan
+) {
+  this.condition = condition;
+  this.builder = builder;
+  this.scan = scan;
+  this.indexes = Lists.newArrayList(index);
+}
+
+/**
+ * Get a single IndexConditionInfo in which indexCondition has field  on all indexes in this.indexes
+ * @return
+ */
+public IndexConditionInfo getCollectiveInfo(IndexLogicalPlanCallContext indexContext) {
+  Set paths = Sets.newLinkedHashSet();
+  for ( IndexDescriptor index : indexes ) {
+paths.addAll(index.getIndexColumns());
+//paths.addAll(index.getNonIndexColumns());
+  }
+  return indexConditionRelatedToFields(Lists.newArrayList(paths), condition);
+}
+
+/*
+ * A utility function to check whether the given index hint is valid.
+ */
+public boolean isValidIndexHint(IndexLogicalPlanCallContext indexContext) {
+  if (indexContext.indexHint.equals("")) { return false; }
+
+  for ( IndexDescriptor index: indexes ) {
+if ( indexContext.indexHint.equals(index.getIndexName())) {
+  return true;
+}
+  }
+  return false;
+}
+
+/**
+ * Get a map of Index=>IndexConditionInfo, each IndexConditionInfo has the separated condition and remainder condition.
+ * The map is ordered, so the last IndexDescriptor will have the final remainderCondition after separating conditions
+ * that are relevant to this.indexes. The conditions are separated on

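`IndexConditionInfo` pairs an `indexCondition` (the part of the filter the index can evaluate) with a `remainderCondition`, and `getCollectiveInfo` first gathers the index columns of every candidate index into a `LinkedHashSet`, which keeps first-seen order and drops duplicates. A minimal sketch of that split, assuming the filter is a flat AND of single-column predicates; the real code works on Calcite `RexNode` trees, and `Conjunct`, `SplitResult` and `ConditionSplitter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical single-column predicate, e.g. column "name.lname", text "name.lname = 'Brunner'".
final class Conjunct {
  final String column;
  final String text;

  Conjunct(String column, String text) {
    this.column = column;
    this.text = text;
  }
}

final class SplitResult {
  final List<Conjunct> indexCondition = new ArrayList<>();
  final List<Conjunct> remainderCondition = new ArrayList<>();

  boolean hasIndexCol() {
    return !indexCondition.isEmpty();
  }
}

final class ConditionSplitter {
  // Collects index columns across all indexes, keeping first-seen order without duplicates.
  static Set<String> collectIndexColumns(List<List<String>> indexes) {
    Set<String> paths = new LinkedHashSet<>();
    for (List<String> indexColumns : indexes) {
      paths.addAll(indexColumns);
    }
    return paths;
  }

  // Routes each conjunct of an AND-only filter to the index side or to the remainder.
  static SplitResult split(List<Conjunct> conjuncts, Set<String> indexColumns) {
    SplitResult result = new SplitResult();
    for (Conjunct c : conjuncts) {
      if (indexColumns.contains(c.column)) {
        result.indexCondition.add(c);
      } else {
        result.remainderCondition.add(c);
      }
    }
    return result;
  }
}
```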
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643311#comment-16643311
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223381370
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushFilterIntoScan.java
 ##
 @@ -137,11 +137,10 @@ protected void doPushFilterIntoJsonGroupScan(RelOptRuleCall call,
   return; //no filter pushdown ==> No transformation.
 }
 
-// clone the groupScan with the newScanSpec.
-final JsonTableGroupScan newGroupsScan = groupScan.clone(newScanSpec);
+final JsonTableGroupScan newGroupsScan = (JsonTableGroupScan) groupScan.clone(newScanSpec);
 
 Review comment:
   Could you remove redundant casts in the class? 
   Note: not on this line; I just can't leave a comment on the line above :)
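The cast in `(JsonTableGroupScan) groupScan.clone(newScanSpec)` is redundant only if the `clone` overload being called already declares `JsonTableGroupScan` as its return type. Java lets an overriding method narrow its return type (a covariant return), which removes the need for casts at call sites. A minimal sketch with hypothetical classes standing in for the group-scan hierarchy; the actual `clone(...)` signatures in this PR are not shown in the hunk.

```java
// Hypothetical scan spec and group-scan hierarchy, not Drill's classes.
final class ScanSpec {
  final String filter;

  ScanSpec(String filter) {
    this.filter = filter;
  }
}

abstract class BaseGroupScan {
  final ScanSpec spec;

  BaseGroupScan(ScanSpec spec) {
    this.spec = spec;
  }

  // The base class declares the general return type.
  abstract BaseGroupScan clone(ScanSpec newSpec);
}

final class JsonGroupScan extends BaseGroupScan {
  JsonGroupScan(ScanSpec spec) {
    super(spec);
  }

  // Covariant return: the override narrows the type, so callers need no cast.
  @Override
  JsonGroupScan clone(ScanSpec newSpec) {
    return new JsonGroupScan(newSpec);
  }
}
```

With this shape, `JsonGroupScan pushed = scan.clone(newSpec);` compiles without a cast whenever `scan` is statically typed as `JsonGroupScan`.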



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643326#comment-16643326
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223655155
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/udf/mapr/db/MatchesPlaceholder.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udf.mapr.db;
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import org.apache.drill.exec.expr.holders.BitHolder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+/**
+ * This is a placeholder for the matches() function.
+ *
+ * At this time, this function can only be used in predicates. The placeholder
+ * is here to prevent calcite from complaining; the function will get pushed down
+ * by the storage plug-in into DB. That process will go through JsonConditionBuilder.java,
+ * which will replace this function with the real OJAI equivalent to be pushed down.
+ * Therefore, there's no implementation here.
+ */
+@FunctionTemplate(
 
 Review comment:
   Make this `FunctionTemplate` formatting similar to others
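The Javadoc describes `matches()` as a placeholder: it exists so Calcite accepts the call in a predicate, and the storage plugin rewrites it into the native OJAI condition during filter pushdown, so there is no local implementation. A minimal sketch of that pattern with a toy expression model; `Call` and `PlaceholderFunctions` are hypothetical, and the condition string format is illustrative only, not what `JsonConditionBuilder` emits.

```java
import java.util.List;

// Hypothetical expression node: a function name applied to string arguments.
final class Call {
  final String function;
  final List<String> args;

  Call(String function, List<String> args) {
    this.function = function;
    this.args = args;
  }
}

final class PlaceholderFunctions {
  // Local evaluation is deliberately unsupported: the call is only meaningful once pushed down.
  static boolean evalLocally(Call call) {
    if (call.function.equals("matches")) {
      throw new UnsupportedOperationException("matches() is only supported when pushed down");
    }
    throw new IllegalArgumentException("unknown function " + call.function);
  }

  // Translates the placeholder into a storage-side condition string (illustrative format).
  static String pushDown(Call call) {
    if (call.function.equals("matches") && call.args.size() == 2) {
      return call.args.get(0) + " MATCHES '" + call.args.get(1) + "'";
    }
    throw new IllegalArgumentException("cannot push down " + call.function);
  }
}
```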



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643343#comment-16643343
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223670799
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexPlanUtils.java
 ##
 @@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.index;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.DrillProjectRelBase;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+import org.apache.drill.exec.planner.fragment.DistributionAffinity;
+import org.apache.drill.exec.planner.logical.DrillOptiq;
+import org.apache.drill.exec.planner.logical.DrillParseContext;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.common.OrderedRel;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+
+public class IndexPlanUtils {
+
+  public enum ConditionIndexed {
+NONE,
+PARTIAL,
+FULL}
+
+  /**
+   * Check if any of the fields of the index are present in a list of LogicalExpressions supplied
+   * as part of IndexableExprMarker
+   * @param exprMarker, the marker that has analyzed original index condition on top of original scan
+   * @param indexDesc
+   * @return ConditionIndexed.FULL, PARTIAL or NONE depending on whether all, some or no columns
+   * of the indexDesc are present in the list of LogicalExpressions supplied as part of exprMarker
+   *
+   */
+  static public ConditionIndexed conditionIndexed(IndexableExprMarker exprMarker, IndexDescriptor indexDesc) {
+Map mapRexExpr = exprMarker.getIndexableExpression();
+List infoCols = Lists.newArrayList();
+infoCols.addAll(mapRexExpr.values());
+if (indexDesc.allColumnsIndexed(infoCols)) {
+  return ConditionIndexed.FULL;
+} else if (indexDesc.someColumnsIndexed(infoCols)) {
+  return ConditionIndexed.PARTIAL;
+} else {
+  return ConditionIndexed.NONE;
+}
+  }
+
+  /**
+   * check if we want to apply index rules on this scan,
+   * if group scan is not instance of DbGroupScan, or this DbGroupScan instance does not support secondary index, or
+   *this scan is already an index scan or Restricted Scan, do not

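`conditionIndexed` classifies a filter against one index as FULL, PARTIAL or NONE by asking the descriptor whether all or some of the expressions collected by `IndexableExprMarker` are index columns. A minimal sketch over plain column names, assuming "all" means every collected expression is one of the index's columns, "some" means at least one is, and an empty list counts as NONE; Drill's descriptor methods operate on `LogicalExpression`, and `ConditionIndexedSketch` is a hypothetical name.

```java
import java.util.Collection;
import java.util.List;

enum ConditionIndexed { NONE, PARTIAL, FULL }

final class ConditionIndexedSketch {
  static ConditionIndexed classify(Collection<String> conditionColumns, List<String> indexColumns) {
    // Guard against the vacuous "all" on an empty collection.
    boolean all = !conditionColumns.isEmpty() && indexColumns.containsAll(conditionColumns);
    boolean some = conditionColumns.stream().anyMatch(indexColumns::contains);
    if (all) {
      return ConditionIndexed.FULL;
    }
    if (some) {
      return ConditionIndexed.PARTIAL;
    }
    return ConditionIndexed.NONE;
  }
}
```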
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643313#comment-16643313
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223659334
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/LargeTableGen.java
 ##
 @@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import static com.mapr.drill.maprdb.tests.MaprDBTestsSuite.INDEX_FLUSH_TIMEOUT;
+
+import java.io.InputStream;
+import java.io.StringBufferInputStream;
+
+import org.apache.hadoop.fs.Path;
+import org.ojai.DocumentStream;
+import org.ojai.json.Json;
+
+import com.mapr.db.Admin;
+import com.mapr.db.Table;
+import com.mapr.db.TableDescriptor;
+import com.mapr.db.impl.MapRDBImpl;
+import com.mapr.db.impl.TableDescriptorImpl;
+import com.mapr.db.tests.utils.DBTests;
+import com.mapr.fs.utils.ssh.TestCluster;
+
+/**
+ * This class is to generate a MapR json table of this schema:
+ * {
+ *   "address" : {
+ *  "city":"wtj",
+ *  "state":"ho"
+ *   }
+ *   "contact" : {
+ *  "email":"vcfahj...@gmail.com",
+ *  "phone":"655583"
+ *   }
+ *   "id" : {
+ *  "ssn":"15461"
+ *   }
+ *   "name" : {
+ *  "fname":"VcFahj",
+ *  "lname":"RfM"
+ *   }
+ * }
+ *
+ */
+public class LargeTableGen extends LargeTableGenBase {
+
+  static final int SPLIT_SIZE = 5000;
+  private Admin admin;
+
+  public LargeTableGen(Admin dbadmin) {
+admin = dbadmin;
+  }
+
+  Table createOrGetTable(String tableName, int recordNum) {
+if (admin.tableExists(tableName)) {
+  return MapRDBImpl.getTable(tableName);
+  //admin.deleteTable(tableName);
+}
+else {
+  TableDescriptor desc = new TableDescriptorImpl(new Path(tableName));
+
+  int splits = (recordNum / SPLIT_SIZE) - (((recordNum % SPLIT_SIZE) > 1)? 0 : 1);
+
+  String[] splitsStr = new String[splits];
+  StringBuilder strBuilder = new StringBuilder("Splits:");
+  for(int i=0; i

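`LargeTableGen` sizes its pre-split array with `int splits = (recordNum / SPLIT_SIZE) - (((recordNum % SPLIT_SIZE) > 1)? 0 : 1);` where `SPLIT_SIZE = 5000`. A worked check of that expression next to `ceil(recordNum / SPLIT_SIZE) - 1`, the number of boundaries between 5000-record regions; treating the ceiling form as the intended count is an assumption, not something the review states, and `SplitCount` is a hypothetical helper.

```java
final class SplitCount {
  static final int SPLIT_SIZE = 5000; // value from LargeTableGen

  // The expression as written in the diff.
  static int asWritten(int recordNum) {
    return (recordNum / SPLIT_SIZE) - (((recordNum % SPLIT_SIZE) > 1) ? 0 : 1);
  }

  // Boundaries between ceil(recordNum / SPLIT_SIZE) regions, via integer ceiling division (recordNum > 0).
  static int boundaries(int recordNum) {
    return (recordNum + SPLIT_SIZE - 1) / SPLIT_SIZE - 1;
  }
}
```

For 10000 and 10002 records both forms agree (1 and 2). They differ exactly when `recordNum % SPLIT_SIZE == 1`: for 10001 records the expression as written gives 1 while the ceiling form gives 2, and for a single record it gives -1, which `new String[splits]` would reject with a `NegativeArraySizeException`. Whether `> 1` was meant to be `> 0` is left open by the review.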
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643336#comment-16643336
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223667884
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/DrillIndexDescriptor.java
 ##
 @@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import org.apache.calcite.rel.RelFieldCollation.NullDirection;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.cost.PluginCost;
+import org.apache.drill.exec.planner.logical.DrillTable;
+
+import java.io.IOException;
+import java.util.List;
+
+public class DrillIndexDescriptor extends AbstractIndexDescriptor {
+
+  /**
+   * The name of Drill's Storage Plugin on which the Index was stored
+   */
+  private String storage;
+
+  private DrillTable table;
+
+  public DrillIndexDescriptor(List indexCols,
+  CollationContext indexCollationContext,
+  List nonIndexCols,
+  List rowKeyColumns,
+  String indexName,
+  String tableName,
+  IndexType type,
+  NullDirection nullsDirection) {
+super(indexCols, indexCollationContext, nonIndexCols, rowKeyColumns, indexName, tableName, type, nullsDirection);
+  }
+
+  public DrillIndexDescriptor(DrillIndexDefinition def) {
+this(def.indexColumns, def.indexCollationContext, def.nonIndexColumns, def.rowKeyColumns, def.indexName,
+def.getTableName(), def.getIndexType(), def.nullsDirection);
+  }
+
+  @Override
+  public double getRows(RelNode scan, RexNode indexCondition) {
+//TODO: real implementation is to use Drill's stats implementation. for now return fake value 1.0
+return 1.0;
+  }
+
+  @Override
+  public IndexGroupScan getIndexGroupScan() {
+try {
+  final DrillTable idxTable = getDrillTable();
+  GroupScan scan = idxTable.getGroupScan();
+
+  if (!(scan instanceof IndexGroupScan)){
+logger.error("The Groupscan from table {} is not an IndexGroupScan", idxTable.toString());
+return null;
+  }
+  return (IndexGroupScan)scan;
+}
+catch(IOException e) {
+  logger.error("Error in getIndexGroupScan ", e);
+}
+return null;
+  }
+
+  public void attach(String storageName, DrillTable inTable) {
 
 Review comment:
   Are these `@Override` methods? If not, add Javadoc for them.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, Drill planner should be enhanced to provide an 
> abstraction layer that expresses the index metadata and statistics.  Further, 
> a

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643330#comment-16643330
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r217475314
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Map;
+import java.util.Set;
+
+public class MapRDBFunctionalIndexInfo implements FunctionalIndexInfo {
+
+  final private IndexDescriptor indexDesc;
+
+  private boolean hasFunctionalField = false;
+
+  //when we scan schemaPath in groupscan's columns, we check if this column(schemaPath) should be rewritten to '$N',
+  //When there are more than two functions on the same column in index, CAST(a.b as INT), CAST(a.b as VARCHAR),
+  // then we should map SchemaPath a.b to a set of SchemaPath, e.g. $1, $2
+  private Map<SchemaPath, Set<SchemaPath>> columnToConvert;
+
+  // map of functional index expression to destination SchemaPath e.g. $N
+  private Map exprToConvert;
+
+  //map of SchemaPath involved in a functional field
+  private Map<LogicalExpression, Set<SchemaPath>> pathsInExpr;
+
+  private Set newPathsForIndexedFunction;
+
+  private Set allPathsInFunction;
+
+  public MapRDBFunctionalIndexInfo(IndexDescriptor indexDesc) {
+this.indexDesc = indexDesc;
+columnToConvert = Maps.newHashMap();
+exprToConvert = Maps.newHashMap();
+pathsInExpr = Maps.newHashMap();
+//keep the order of new paths, it may be related to the naming policy
+newPathsForIndexedFunction = Sets.newLinkedHashSet();
+allPathsInFunction = Sets.newHashSet();
+init();
+  }
+
+  private void init() {
+int count = 0;
+for(LogicalExpression indexedExpr : indexDesc.getIndexColumns()) {
+  if( !(indexedExpr instanceof SchemaPath) ) {
+hasFunctionalField = true;
+SchemaPath functionalFieldPath = SchemaPath.getSimplePath("$"+count);
+newPathsForIndexedFunction.add(functionalFieldPath);
+
+//now we handle only cast expression
+if(indexedExpr instanceof CastExpression) {
+  //We handle only CAST directly on SchemaPath for now.
+  SchemaPath pathBeingCasted = (SchemaPath)((CastExpression) indexedExpr).getInput();
+  addTargetPathForOriginalPath(pathBeingCasted, functionalFieldPath);
+  addPathInExpr(indexedExpr, pathBeingCasted);
+  exprToConvert.put(indexedExpr, functionalFieldPath);
+  allPathsInFunction.add(pathBeingCasted);
+}
+
+count++;
+  }
+}
+  }
+
+  private void addPathInExpr(LogicalExpression expr, SchemaPath path) {
+if (!pathsInExpr.containsKey(expr)) {
+  Set newSet = Sets.newHashSet();
+  newSet.add(path);
+  pathsInExpr.put(expr, newSet);
+}
+else {
+  pathsInExpr.get(expr).add(path);
+}
+  }
+
+  private void addTargetPathForOriginalPath(SchemaPath origPath, SchemaPath newPath) {
+if (!columnToConvert.containsKey(origPath)) {
+  Set newSet = Sets.newHashSet();
+  newSet.add(newPath);
+  columnToConvert.put(origPath, newSet);
+}
+else {
+  columnToConvert.get(origPath).add(newPath);
+}
+  }
+
+
+  public boolean hasFunctional() {
+return hasFunctionalField;
+  }
+
+  public IndexDescriptor getIndexDesc() {
+return indexDesc;
+  }
+
+  /**
+   * getNewPath: for an original path, return new rename '$N' path, notice there could be multiple renamed paths
+   * if the there are multiple functional indexes refer original path.
+   * @param path
+   * @return
+   */
+  public SchemaPath getNewPath(SchemaPath path) {
+if(columnToConvert.containsKey(path)) {
 
 Review

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643335#comment-16643335
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223667043
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdSelectivity.java
 ##
 @@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.cost;
+
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMdSelectivity;
+import org.apache.calcite.rel.metadata.RelMdUtil;
+import org.apache.calcite.rel.metadata.RelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+
+import java.util.List;
+
+public class DrillRelMdSelectivity extends RelMdSelectivity {
+  private static final DrillRelMdSelectivity INSTANCE = new DrillRelMdSelectivity();
+
+  public static final RelMetadataProvider SOURCE = ReflectiveRelMetadataProvider.reflectiveSource(BuiltInMethod.SELECTIVITY.method, INSTANCE);
+
+
+  public Double getSelectivity(RelNode rel, RexNode predicate) {
 
 Review comment:
   Why can't the super methods be used instead? Is it necessary to improve them in Calcite?
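A sketch of the alternative the reviewer is hinting at: the Drill subclass overrides only the case it can estimate better and hands every other case back to the inherited implementation through `super`. All types here (`RelNodeStub`, `StatsScanStub`, `BaseSelectivity`, `StatsAwareSelectivity`) are hypothetical stand-ins, not Calcite or Drill classes, and `DEFAULT_GUESS` is an arbitrary placeholder rather than Calcite's actual default.

```java
/** Hypothetical stand-in for a relational node. */
interface RelNodeStub {}

/** Hypothetical scan whose table can estimate selectivity from its own statistics. */
final class StatsScanStub implements RelNodeStub {
  private final double knownSelectivity;

  StatsScanStub(double knownSelectivity) {
    this.knownSelectivity = knownSelectivity;
  }

  double selectivityFromTableStats() {
    return knownSelectivity;
  }
}

/** Hypothetical stand-in for the generic selectivity handler being extended. */
class BaseSelectivity {
  static final double DEFAULT_GUESS = 0.5; // arbitrary placeholder value

  Double getSelectivity(RelNodeStub rel, String predicate) {
    // no predicate means every row qualifies
    return predicate == null ? 1.0 : DEFAULT_GUESS;
  }
}

/** Overrides only the case it can improve and defers everything else to super. */
class StatsAwareSelectivity extends BaseSelectivity {
  @Override
  Double getSelectivity(RelNodeStub rel, String predicate) {
    if (predicate != null && rel instanceof StatsScanStub) {
      return ((StatsScanStub) rel).selectivityFromTableStats();
    }
    return super.getSelectivity(rel, predicate);
  }
}
```

With this shape, only the special case needs new code; if the generic behaviour itself is wrong, that is an argument for fixing it upstream instead.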




> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, Drill planner should be enhanced to provide an 
> abstraction layer that expresses the index metadata and statistics.  Further, 
> a cost-based index selection is needed to decide which index(es) are 
> suitable.  
> On the execution side, appropriate operator enhancements would be needed to 
> handle different categories of indexes such as covering, non-covering 
> indexes, taking into consideration the index data may not be co-located with 
> the primary table, i.e. a global index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643369#comment-16643369
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223671209
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexPlanUtils.java
 ##
 @@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.index;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.DrillProjectRelBase;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+import org.apache.drill.exec.planner.fragment.DistributionAffinity;
+import org.apache.drill.exec.planner.logical.DrillOptiq;
+import org.apache.drill.exec.planner.logical.DrillParseContext;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.common.OrderedRel;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+
+public class IndexPlanUtils {
+
+  public enum ConditionIndexed {
+NONE,
+PARTIAL,
+FULL}
+
+  /**
+   * Check if any of the fields of the index are present in a list of LogicalExpressions supplied
+   * as part of IndexableExprMarker
+   * @param exprMarker, the marker that has analyzed original index condition on top of original scan
+   * @param indexDesc
+   * @return ConditionIndexed.FULL, PARTIAL or NONE depending on whether all, some or no columns
+   * of the indexDesc are present in the list of LogicalExpressions supplied as part of exprMarker
+   *
+   */
+  static public ConditionIndexed conditionIndexed(IndexableExprMarker exprMarker, IndexDescriptor indexDesc) {
+Map mapRexExpr = exprMarker.getIndexableExpression();
+List infoCols = Lists.newArrayList();
+infoCols.addAll(mapRexExpr.values());
+if (indexDesc.allColumnsIndexed(infoCols)) {
+  return ConditionIndexed.FULL;
+} else if (indexDesc.someColumnsIndexed(infoCols)) {
+  return ConditionIndexed.PARTIAL;
+} else {
+  return ConditionIndexed.NONE;
+}
+  }
+
+  /**
+   * check if we want to apply index rules on this scan,
+   * if group scan is not instance of DbGroupScan, or this DbGroupScan instance does not support secondary index, or
+   *this scan is already an index scan or Restricted Scan, do not

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643312#comment-16643312
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223373337
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushFilterIntoScan.java
 ##
 @@ -187,7 +186,7 @@ protected void doPushFilterIntoBinaryGroupScan(final RelOptRuleCall call,
 groupScan.getTableStats());
 newGroupsScan.setFilterPushedDown(true);
 
-final ScanPrel newScanPrel = ScanPrel.create(scan, filter.getTraitSet(), newGroupsScan, scan.getRowType());
+final ScanPrel newScanPrel = ScanPrel.create(scan, filter.getTraitSet(), newGroupsScan, scan.getRowType(), scan.getTable());
 
 Review comment:
   I have made changes in `ScanPrel`. Please use the constructor instead of the `create` method.



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643322#comment-16643322
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223664484
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/RowKeyJoinBatch.java
 ##
 @@ -0,0 +1,284 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.join;
+
+
+import java.util.List;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.config.RowKeyJoinPOP;
+import org.apache.drill.exec.record.AbstractRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector2;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.Iterables;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+
+public class RowKeyJoinBatch extends AbstractRecordBatch implements RowKeyJoin {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RowKeyJoinBatch.class);
+
+  // primary table side record batch
+  private final RecordBatch left;
+
+  // index table side record batch
+  private final RecordBatch right;
+
+  private boolean hasRowKeyBatch;
+  private IterOutcome leftUpstream = IterOutcome.NONE;
+  private IterOutcome rightUpstream = IterOutcome.NONE;
+  private final List transfers = Lists.newArrayList();
+  private int recordCount = 0;
+  private SchemaChangeCallBack callBack = new SchemaChangeCallBack();
+  private RowKeyJoinState rkJoinState = RowKeyJoinState.INITIAL;
+
+  public RowKeyJoinBatch(RowKeyJoinPOP config, FragmentContext context, RecordBatch left, RecordBatch right)
+  throws OutOfMemoryException {
+super(config, context, true /* need to build schema */);
+this.left = left;
+this.right = right;
+this.hasRowKeyBatch = false;
+  }
+
+  @Override
+  public int getRecordCount() {
+if (state == BatchState.DONE) {
+  return 0;
+}
+return recordCount;
+  }
+
+  @Override
+  public SelectionVector2 getSelectionVector2() {
+throw new UnsupportedOperationException("RowKeyJoinBatch does not support selection vector");
+  }
+
+  @Override
+  public SelectionVector4 getSelectionVector4() {
+throw new UnsupportedOperationException("RowKeyJoinBatch does not support selection vector");
+  }
+
+  @Override
+  protected void buildSchema() throws SchemaChangeException {
+container.clear();
+
+rightUpstream = next(right);
+
+if (leftUpstream == IterOutcome.STOP || rightUpstream == IterOutcome.STOP) {
+  state = BatchState.STOP;
+  return;
+}
+
+if (right.getRecordCount() > 0) {
+  // set the hasRowKeyBatch flag such that calling next() on the left input
+  // would see the correct status
+  hasRowKeyBatch = true;
+}
+
+leftUpstream = next(left);
+
+if (leftUpstream == IterOutcome.OUT_OF_MEMORY || rightUpstream == IterOutcome.OUT_OF_MEMORY) {
+  state = BatchState.OUT_OF_MEMORY;
+  return;
+}
+
+for(final VectorWrapper v : left) {
 
 Review comment:
   Please add a space after `for`.



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643309#comment-16643309
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223377345
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushLimitIntoScan.java
 ##
 @@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.physical.LimitPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.RowKeyJoinPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.hbase.HBaseScanSpec;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.RestrictedJsonTableGroupScan;
+
+public abstract class MapRDBPushLimitIntoScan extends StoragePluginOptimizerRule {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapRDBPushLimitIntoScan.class);
+
+  private MapRDBPushLimitIntoScan(RelOptRuleOperand operand, String description) {
+super(operand, description);
+  }
+
+  public static final StoragePluginOptimizerRule LIMIT_ON_SCAN =
+  new MapRDBPushLimitIntoScan(RelOptHelper.some(LimitPrel.class, RelOptHelper.any(ScanPrel.class)),
+  "MapRDBPushLimitIntoScan:Limit_On_Scan") {
+
+@Override
+public void onMatch(RelOptRuleCall call) {
+  final ScanPrel scan = call.rel(1);
 
 Review comment:
   Could you please swap lines 50 and 51? It is clearer to take the `relNodes` from `call` in order.
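The reviewer is asking for the matched nodes to be read in operand order: for the operand tree `LimitPrel(ScanPrel)`, `call.rel(0)` is the limit and `call.rel(1)` is the scan, since Calcite's `RelOptRuleCall.rel(int)` indexes the matched nodes in the order the operands are declared, root first. A self-contained sketch, with hypothetical stubs (`RuleCallStub`, `LimitStub`, `ScanStub`) in place of Calcite's `RelOptRuleCall` and Drill's `LimitPrel` and `ScanPrel`:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a matched rule call; rel(i) follows operand order. */
final class RuleCallStub {
  private final List<Object> rels;

  RuleCallStub(Object... rels) {
    this.rels = Arrays.asList(rels);
  }

  /** Returns the node matched by the i-th operand (0 is the root operand). */
  @SuppressWarnings("unchecked")
  <T> T rel(int ordinal) {
    return (T) rels.get(ordinal);
  }
}

final class LimitStub {
  final int fetch;
  LimitStub(int fetch) { this.fetch = fetch; }
}

final class ScanStub {
  final String table;
  ScanStub(String table) { this.table = table; }
}

final class LimitOnScanSketch {
  /** Reads the matched nodes in the same order as the operand tree Limit(Scan). */
  static String onMatch(RuleCallStub call) {
    final LimitStub limit = call.rel(0); // root operand: the limit
    final ScanStub scan = call.rel(1);   // its child operand: the scan
    return "push LIMIT " + limit.fetch + " into scan of " + scan.table;
  }
}
```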



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643298#comment-16643298
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r217475035
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Map;
+import java.util.Set;
+
+public class MapRDBFunctionalIndexInfo implements FunctionalIndexInfo {
+
+  final private IndexDescriptor indexDesc;
+
+  private boolean hasFunctionalField = false;
+
+  //when we scan schemaPath in groupscan's columns, we check if this column(schemaPath) should be rewritten to '$N',
+  //When there are more than two functions on the same column in index, CAST(a.b as INT), CAST(a.b as VARCHAR),
+  // then we should map SchemaPath a.b to a set of SchemaPath, e.g. $1, $2
+  private Map<SchemaPath, Set<SchemaPath>> columnToConvert;
+
+  // map of functional index expression to destination SchemaPath e.g. $N
+  private Map exprToConvert;
+
+  //map of SchemaPath involved in a functional field
+  private Map<LogicalExpression, Set<SchemaPath>> pathsInExpr;
+
+  private Set newPathsForIndexedFunction;
+
+  private Set allPathsInFunction;
+
+  public MapRDBFunctionalIndexInfo(IndexDescriptor indexDesc) {
+this.indexDesc = indexDesc;
+columnToConvert = Maps.newHashMap();
+exprToConvert = Maps.newHashMap();
+pathsInExpr = Maps.newHashMap();
+//keep the order of new paths, it may be related to the naming policy
+newPathsForIndexedFunction = Sets.newLinkedHashSet();
+allPathsInFunction = Sets.newHashSet();
+init();
+  }
+
+  private void init() {
+int count = 0;
+for(LogicalExpression indexedExpr : indexDesc.getIndexColumns()) {
+  if( !(indexedExpr instanceof SchemaPath) ) {
 
 Review comment:
   Please remove the extra spaces inside the parentheses.
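The `init()` loop quoted above gives every non-`SchemaPath` index column a synthetic `$N` path and, for `CAST` expressions, records which original column maps to which `$N`, so one column can map to several renamed paths (e.g. `CAST(a.b as INT)` and `CAST(a.b as VARCHAR)`). A self-contained sketch of that bookkeeping, using hypothetical `IndexColumn`, `ColPath`, and `CastExpr` types in place of Drill's `LogicalExpression`, `SchemaPath`, and `CastExpression`. As in the quoted loop, `N` counts functional columns only; plain paths do not advance it.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for Drill's LogicalExpression. */
interface IndexColumn {}

/** Hypothetical stand-in for SchemaPath; value equality so it can key a map. */
final class ColPath implements IndexColumn {
  final String name;

  ColPath(String name) { this.name = name; }

  @Override public boolean equals(Object o) {
    return o instanceof ColPath && ((ColPath) o).name.equals(name);
  }

  @Override public int hashCode() { return name.hashCode(); }

  @Override public String toString() { return name; }
}

/** Hypothetical stand-in for a CastExpression over a plain column. */
final class CastExpr implements IndexColumn {
  final ColPath input;
  final String type;

  CastExpr(ColPath input, String type) { this.input = input; this.type = type; }
}

/** Sketch of the "$N" renaming bookkeeping done by init() in the quoted class. */
final class FunctionalIndexSketch {
  // original column -> every "$N" path that a function over it was renamed to
  final Map<ColPath, Set<ColPath>> columnToConvert = new LinkedHashMap<>();
  // functional index expression -> its "$N" path
  final Map<IndexColumn, ColPath> exprToConvert = new LinkedHashMap<>();
  // all "$N" paths, in index-column order
  final Set<ColPath> newPaths = new LinkedHashSet<>();

  FunctionalIndexSketch(List<IndexColumn> indexColumns) {
    int count = 0;
    for (IndexColumn col : indexColumns) {
      if (col instanceof ColPath) {
        continue; // plain columns are not renamed and do not advance N
      }
      ColPath renamed = new ColPath("$" + count);
      newPaths.add(renamed);
      if (col instanceof CastExpr) { // only CAST directly over a column is mapped back
        ColPath original = ((CastExpr) col).input;
        columnToConvert.computeIfAbsent(original, k -> new LinkedHashSet<>()).add(renamed);
        exprToConvert.put(col, renamed);
      }
      count++;
    }
  }
}
```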



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643352#comment-16643352
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223665236
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/mergereceiver/MergingRecordBatch.java
 ##
 @@ -535,7 +535,10 @@ public FragmentContext getContext() {
 
   @Override
   public BatchSchema getSchema() {
-return outgoingContainer.getSchema();
+if (outgoingContainer.hasSchema()) {
+  return outgoingContainer.getSchema();
+}
+return null;
 
 Review comment:
   Is it a case of empty schemaless output?

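The following minimal Java sketch shows what the `hasSchema()` guard changes. It assumes, as a hedge, that the container's `getSchema()` fails when no schema has been built, for example when every incoming stream is empty. `SchemaGuardSketch`, `Container`, and `Schema` are hypothetical stand-ins, not Drill classes.

```java
// Hypothetical stand-ins for Drill's BatchSchema and outgoing container,
// used only to illustrate the hasSchema() guard from the diff above.
final class SchemaGuardSketch {

    static final class Schema {
        final String description;
        Schema(String description) { this.description = description; }
    }

    static final class Container {
        private Schema schema; // stays null until a schema is built

        boolean hasSchema() { return schema != null; }

        Schema getSchema() {
            // Assumption: fails fast when no schema exists yet.
            if (schema == null) {
                throw new IllegalStateException("Schema is currently null");
            }
            return schema;
        }

        void buildSchema(Schema s) { this.schema = s; }
    }

    // Mirrors the patched getSchema(): returns null instead of throwing.
    static Schema outgoingSchema(Container outgoing) {
        if (outgoing.hasSchema()) {
            return outgoing.getSchema();
        }
        return null;
    }
}
```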


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643332#comment-16643332
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223669319
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexConditionInfo.java
 ##
 @@ -0,0 +1,250 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.logical.partition.RewriteCombineBinaryOperators;
+import org.apache.calcite.rel.RelNode;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class IndexConditionInfo {
+  public final RexNode indexCondition;
+  public final RexNode remainderCondition;
+  public final boolean hasIndexCol;
+
+  public IndexConditionInfo(RexNode indexCondition, RexNode remainderCondition, boolean hasIndexCol) {
+this.indexCondition = indexCondition;
+this.remainderCondition = remainderCondition;
+this.hasIndexCol = hasIndexCol;
+  }
+
+  public static Builder newBuilder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan) {
+return new Builder(condition, indexes, builder, scan);
+  }
+
+  public static class Builder {
+final RexBuilder builder;
+final RelNode scan;
+final Iterable indexes;
+private RexNode condition;
+
+public Builder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan
+) {
+  this.condition = condition;
+  this.builder = builder;
+  this.scan = scan;
+  this.indexes = indexes;
+}
+
+public Builder(RexNode condition,
+   IndexDescriptor index,
+   RexBuilder builder,
+   DrillScanRel scan
+) {
+  this.condition = condition;
+  this.builder = builder;
+  this.scan = scan;
+  this.indexes = Lists.newArrayList(index);
+}
+
+/**
+ * Get a single IndexConditionInfo in which indexCondition has field  on all indexes in this.indexes
+ * @return
+ */
+public IndexConditionInfo getCollectiveInfo(IndexLogicalPlanCallContext indexContext) {
+  Set paths = Sets.newLinkedHashSet();
+  for ( IndexDescriptor index : indexes ) {
+paths.addAll(index.getIndexColumns());
+//paths.addAll(index.getNonIndexColumns());
+  }
+  return indexConditionRelatedToFields(Lists.newArrayList(paths), condition);
+}
+
+/*
+ * A utility function to check whether the given index hint is valid.
+ */
+public boolean isValidIndexHint(IndexLogicalPlanCallContext indexContext) {
+  if (indexContext.indexHint.equals("")) { return false; }
+
+  for ( IndexDescriptor index: indexes ) {
+if ( indexContext.indexHint.equals(index.getIndexName())) {
+  return true;
+}
+  }
+  return false;
+}
+
+/**
+ * Get a map of Index=>IndexConditionInfo, each IndexConditionInfo has the separated condition and remainder condition.
+ * The map is ordered, so the last IndexDescriptor will have the final remainderCondition after separating conditions
+ * that are relevant to this.indexes. The conditions are separated on

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643365#comment-16643365
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223671173
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexPlanUtils.java
 ##
 @@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.index;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.DrillProjectRelBase;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+import org.apache.drill.exec.planner.fragment.DistributionAffinity;
+import org.apache.drill.exec.planner.logical.DrillOptiq;
+import org.apache.drill.exec.planner.logical.DrillParseContext;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.common.OrderedRel;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+
+public class IndexPlanUtils {
+
+  public enum ConditionIndexed {
+NONE,
+PARTIAL,
+FULL}
+
+  /**
+   * Check if any of the fields of the index are present in a list of LogicalExpressions supplied
+   * as part of IndexableExprMarker
+   * @param exprMarker, the marker that has analyzed original index condition on top of original scan
+   * @param indexDesc
+   * @return ConditionIndexed.FULL, PARTIAL or NONE depending on whether all, some or no columns
+   * of the indexDesc are present in the list of LogicalExpressions supplied as part of exprMarker
+   *
+   */
+  static public ConditionIndexed conditionIndexed(IndexableExprMarker exprMarker, IndexDescriptor indexDesc) {
+Map mapRexExpr = exprMarker.getIndexableExpression();
+List infoCols = Lists.newArrayList();
+infoCols.addAll(mapRexExpr.values());
+if (indexDesc.allColumnsIndexed(infoCols)) {
+  return ConditionIndexed.FULL;
+} else if (indexDesc.someColumnsIndexed(infoCols)) {
+  return ConditionIndexed.PARTIAL;
+} else {
+  return ConditionIndexed.NONE;
+}
+  }
+
+  /**
+   * check if we want to apply index rules on this scan,
+   * if group scan is not instance of DbGroupScan, or this DbGroupScan instance does not support secondary index, or
+   *this scan is already an index scan or Restricted Scan, do not

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643358#comment-16643358
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223671338
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexSelector.java
 ##
 @@ -0,0 +1,766 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptPlanner;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.metadata.RelMdUtil;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.planner.common.DrillJoinRelBase;
+import org.apache.drill.exec.planner.cost.DrillCostBase;
+import org.apache.drill.exec.planner.cost.PluginCost;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+
+public class IndexSelector  {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(IndexSelector.class);
+  private static final double COVERING_TO_NONCOVERING_FACTOR = 100.0;
+  private RexNode indexCondition;   // filter condition on indexed columns
+  private RexNode otherRemainderCondition;  // remainder condition on all other columns
+  private double totalRows;
+  private Statistics stats; // a Statistics instance that will be used to get estimated rowcount for filter conditions
+  private IndexConditionInfo.Builder builder;
+  private List indexPropList;
+  private DrillScanRelBase primaryTableScan;
+  private IndexCallContext indexContext;
+  private RexBuilder rexBuilder;
+
+  public IndexSelector(RexNode indexCondition,
+  RexNode otherRemainderCondition,
+  IndexCallContext indexContext,
+  IndexCollection collection,
+  RexBuilder rexBuilder,
+  double totalRows) {
+this.indexCondition = indexCondition;
+this.otherRemainderCondition = otherRemainderCondition;
+this.indexContext = indexContext;
+this.totalRows = totalRows;
+this.stats = indexContext.getGroupScan().getStatistics();
+this.rexBuilder = rexBuilder;
+this.builder =
+IndexConditionInfo.newBuilder(indexCondition, collection, rexBuilder, indexContext.getScan());
+this.primaryTableScan = indexContext.getScan();
+this.indexPropList = Lists.newArrayList();
+  }
+
+  /**
+   * This constructor is to build selector for no index condition case (no filter)
+   * @param indexContext
+   */
+  public IndexSelector(IndexCallContext indexContext) {
+this.indexCondition = null;
+this.otherRemainderCondition = null;
+this.indexContext = indexContext;
+this.totalRows = Statistics.ROWCOUNT_UNKNOWN;
+this.stats = indexContext.getGroupScan().getStatistics();
+this.rexBuilder = indexContext.getScan().getCluster().getRexBuilder();
+this.builder = null;
+this.primaryTableScan = indexContext.getScan();
+this.indexPropList = Lists.newArrayList();
+  }
+
+  public void addIndex(IndexDescriptor indexDesc, boolean isCovering, int numProjectedFields) {
+IndexProperties indexProps = new DrillIndexProperties(indexDesc,

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643342#comment-16643342
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223660964
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/StatisticsTest.java
 ##
 @@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import com.google.common.collect.Lists;
+import com.mapr.db.Admin;
+import com.mapr.drill.maprdb.tests.MaprDBTestsSuite;
+import com.mapr.drill.maprdb.tests.json.BaseJsonTest;
+import com.mapr.tests.annotations.ClusterTest;
+import org.apache.drill.PlanTestBase;
+import org.apache.hadoop.hbase.TableName;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Ignore;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import java.util.List;
+
+@Category(ClusterTest.class)
+public class StatisticsTest extends IndexPlanTest {
+  /**
+   *  A sample row of this 10K table:
+   --+-++
+   | 1012  | {"city":"pfrrs","state":"pc"}  | {"email":"kffzkuz...@gmail.com","phone":"655471"}  |
+   {"ssn":"17423"}  | {"fname":"KfFzK","lname":"UZwNk"}  | {"age":53.0,"income":45.0}  | 1012   |
+   *
+   * This test suite generate random content to fill all the rows, since the random function always start from
+   * the same seed for different runs, when the row count is not changed, the data in table will always be the same,
+   * thus the query result could be predicted and verified.
+   */
+
+  @Test
+  @Ignore("Currently untested; re-enable after stats/costing integration complete")
+  public void testFilters() throws Exception {
+String query;
+String explain = "explain plan including all attributes for ";
+
+// Top-level ANDs - Leading columns (personal.age), (address.state)
+query = "select * from hbase.`index_test_primary` t "
++ " where (t.personal.age < 30 or t.personal.age > 100)"
++ " and (t.address.state = 'mo' or t.address.state = 'ca')";
+PlanTestBase.testPlanMatchingPatterns(explain+query,
+new String[] {".*JsonTableGroupScan.*tableName=.*index_test_primary.*rows=1"},
+new String[] {}
+);
+
+// Top-level ORs - Cannot split top-level ORs so use defaults
+query = "select * from hbase.`index_test_primary` t "
++ " where (t.personal.age > 30 and t.personal.age < 100)"
++ " or (t.address.state = 'mo')";
+PlanTestBase.testPlanMatchingPatterns(explain+query,
+new String[] {".*JsonTableGroupScan.*tableName=.*index_test_primary.*rows=1"},
+new String[] {}
+);
+
+// ANDed condition - Leading index column(personal.age) and non-leading column(address.city)
+query = "select * from hbase.`index_test_primary` t "
++ " where (t.personal.age < 30 or t.personal.age > 100)"
++ " and `address.city` = 'sf'";
+PlanTestBase.testPlanMatchingPatterns(explain+query,
+new String[] {".*JsonTableGroupScan.*tableName=.*index_test_primary.*rows=1"},
+new String[] {}
+);
+
+// ANDed condition - Leading index columns (address.state) and (address.city)
+query = "select * from hbase.`index_test_primary` t "
++ " where (`address.state` = 'mo' or `address.state` = 'ca') " // Leading index column
++ " and `address.city` = 'sf'";// Non leading index column
+PlanTestBase.testPlanMatchingPatterns(explain+query,
+new String[] {".*JsonTableGroupScan.*tableName=.*index_test_primary.*rows=1"},
+new String[] {}
 
 Review comment:
   There is an overloaded method without `excludedPatterns`

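The `StatisticsTest` javadoc quoted above relies on a documented property of `java.util.Random`: two instances created with the same seed and driven by the same sequence of calls return identical values. The minimal Java sketch below demonstrates that property. `SeededDataSketch` and its five-letter "city" generator are hypothetical and are not the test's actual data generator.

```java
import java.util.Random;

// Illustrates the determinism the StatisticsTest javadoc relies on:
// java.util.Random seeded with the same value yields the same sequence.
final class SeededDataSketch {

    // Hypothetical generator: one random five-letter lowercase "city" per row.
    static String[] generateCities(long seed, int rowCount) {
        Random random = new Random(seed);
        String[] cities = new String[rowCount];
        for (int row = 0; row < rowCount; row++) {
            StringBuilder city = new StringBuilder();
            for (int c = 0; c < 5; c++) {
                city.append((char) ('a' + random.nextInt(26)));
            }
            cities[row] = city.toString();
        }
        return cities;
    }
}
```

Because every row consumes the same number of random draws, the first N rows are identical even when the total row count grows. Only the rows after the old count are new.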


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643345#comment-16643345
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223651337
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java
 ##
 @@ -179,16 +295,126 @@ public MapRDBSubScan getSpecificScan(int minorFragmentId) {
 assert minorFragmentId < endpointFragmentMapping.size() : String.format(
 "Mappings length [%d] should be greater than minor fragment id [%d] but it isn't.", endpointFragmentMapping.size(),
 minorFragmentId);
-return new MapRDBSubScan(getUserName(), formatPlugin, endpointFragmentMapping.get(minorFragmentId), columns, TABLE_JSON);
+return new MapRDBSubScan(getUserName(), formatPlugin, endpointFragmentMapping.get(minorFragmentId), columns, maxRecordsToRead, TABLE_JSON);
   }
 
   @Override
   public ScanStats getScanStats() {
-//TODO: look at stats for this.
-long rowCount = (long) ((scanSpec.getSerializedFilter() != null ? .5 : 1) * totalRowCount);
-int avgColumnSize = 10;
-int numColumns = (columns == null || columns.isEmpty()) ? 100 : columns.size();
-return new ScanStats(GroupScanProperty.NO_EXACT_ROW_COUNT, rowCount, 1, avgColumnSize * numColumns * rowCount);
+if (isIndexScan()) {
+  return indexScanStats();
+}
+return fullTableScanStats();
+  }
+
+  private ScanStats fullTableScanStats() {
+PluginCost pluginCostModel = formatPlugin.getPluginCostModel();
+final int avgColumnSize = pluginCostModel.getAverageColumnSize(this);
+final int numColumns = (columns == null || columns.isEmpty()) ? STAR_COLS : columns.size();
+// index will be NULL for FTS
+double rowCount = stats.getRowCount(scanSpec.getCondition(), null);
+// rowcount based on _id predicate. If NO _id predicate present in condition, then the
+// rowcount should be same as totalRowCount. Equality b/w the two rowcounts should not be
+// construed as NO _id predicate since stats are approximate.
+double leadingRowCount = stats.getLeadingRowCount(scanSpec.getCondition(), null);
+double avgRowSize = stats.getAvgRowSize(null, true);
+double totalRowCount = stats.getRowCount(null, null);
+logger.debug("GroupScan {} with stats {}: rowCount={}, condition={}, totalRowCount={}, fullTableRowCount={}",
+System.identityHashCode(this), System.identityHashCode(stats), rowCount,
+scanSpec.getCondition()==null?"null":scanSpec.getCondition(),
 
 Review comment:
   formatting

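For comparison, the lines removed from `getScanStats()` estimated scan cost with a fixed heuristic. Any pushed-down filter halved the row estimate, regardless of its real selectivity. The minimal Java sketch below reproduces that arithmetic. The constants come from the removed lines, while the class and method names are hypothetical.

```java
// Reproduces the heuristic removed from JsonTableGroupScan.getScanStats().
final class OldScanStatsSketch {

    static final int AVG_COLUMN_SIZE = 10;      // hard-coded in the removed code
    static final int DEFAULT_NUM_COLUMNS = 100; // used when no columns are projected

    // Estimated row count: half the table when a filter is pushed down, else all of it.
    static long estimatedRowCount(long totalRowCount, boolean hasFilter) {
        return (long) ((hasFilter ? .5 : 1) * totalRowCount);
    }

    // Estimated disk cost: avgColumnSize * numColumns * rowCount.
    static double estimatedDiskCost(long totalRowCount, boolean hasFilter, int projectedColumns) {
        int numColumns = projectedColumns == 0 ? DEFAULT_NUM_COLUMNS : projectedColumns;
        return (double) AVG_COLUMN_SIZE * numColumns * estimatedRowCount(totalRowCount, hasFilter);
    }
}
```

For a 10,000-row table with a filter and no explicit projection, this yields 5,000 rows and a disk cost of 5,000,000, whether the filter matches one row or nearly all of them.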


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643327#comment-16643327
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223665649
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/Partitioner.java
 ##
 @@ -29,6 +29,8 @@
 import org.apache.drill.exec.record.RecordBatch;
 
 public interface Partitioner {
+  int DEFAULT_RECORD_BATCH_SIZE = (1 << 10) - 1;
 
 Review comment:
   How is the default record batch size obtained?

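For reference, the expression in the diff evaluates to 1023, i.e. 2^10 - 1, the value with the ten low bits set. The diff itself does not say why this value was chosen. The class name below is hypothetical.

```java
// Worked arithmetic for the constant in the diff: (1 << 10) - 1.
final class BatchSizeConstantSketch {
    static final int DEFAULT_RECORD_BATCH_SIZE = (1 << 10) - 1; // 1024 - 1

    public static void main(String[] args) {
        System.out.println(DEFAULT_RECORD_BATCH_SIZE);                         // prints 1023
        System.out.println(Integer.toBinaryString(DEFAULT_RECORD_BATCH_SIZE)); // prints 1111111111
    }
}
```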


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643360#comment-16643360
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223676558
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ##
 @@ -22,7 +22,7 @@
 import java.util.LinkedList;
 import java.util.List;
 
-import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.calcite.rel.type.RelDataType;
 
 Review comment:
   Optiq is an old name of Calcite. Does it make sense to rename this class?



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643331#comment-16643331
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223661175
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/TableIndexCmd.java
 ##
 @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+
+import com.mapr.db.Admin;
+import com.mapr.db.MapRDB;
+import org.apache.drill.exec.util.GuavaPatcher;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+* Copy classes to a MapR cluster node, then run a command like this:
+* java -classpath /tmp/drill-cmd-1.9.0-SNAPSHOT.jar:/opt/mapr/drill/drill-1.9.0/jars/*:/opt/mapr/drill/drill-1.9.0/jars/3rdparty/*:/opt/mapr/drill/drill-1.9.0/jars/ext/*
+* org.apache.drill.hbase.index.TableIndexGen -host 10.10.88.128 -port 5181 [-table pop3] [-size 100]
+*/
+
+class TestBigTable {
+
+  Admin admin;
+  boolean initialized = false;
+
+  LargeTableGen gen;
+
+  /*
+"hbase.zookeeper.quorum": "10.10.88.128",
+"hbase.zookeeper.property.clientPort": "5181"
+   */
+  void init(String host, String port) {
+try {
+  admin = MapRDB.newAdmin();
+  initialized = true;
+  gen = new LargeTableGen(admin);
+} catch (Exception e) {
+  System.out.println("Connection to HBase threw" + e.getMessage());
+}
+  }
+}
+
+
+public class TableIndexCmd {
+
+  public static Map parseParameter(String[] params) {
+HashMap retParams = new HashMap();
+for (int i=0; i
> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary
> indexes), Drill should leverage them during planning and execution to
> improve query performance.
> On the planning side, the Drill planner should be enhanced to provide an
> abstraction layer that expresses the index metadata and statistics. Further,
> cost-based index selection is needed to decide which index(es) are
> suitable.
> On the execution side, appropriate operator enhancements would be needed to
> handle different categories of indexes, such as covering and non-covering
> indexes, taking into consideration that the index data may not be co-located
> with the primary table, i.e. a global index.
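The covering versus non-covering distinction above can be illustrated with a toy in-memory model. This is a hypothetical sketch, not Drill or MapR-DB code: the class names and the single "lname" index that also stores "fname" are invented for illustration. A covering index answers the query from index entries alone; a non-covering index only yields row keys, which must then be looked up in the primary table (for a global index, that table may live on other nodes than the index).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model for illustration only; not Drill or MapR-DB code.
final class IndexEntry {
  final String rowKey; // pointer back into the primary table
  final String fname;  // a field stored in the index itself

  IndexEntry(String rowKey, String fname) {
    this.rowKey = rowKey;
    this.fname = fname;
  }
}

final class ToyIndexedTable {
  // Primary table: row key -> document (field name -> value).
  private final Map<String, Map<String, String>> primary = new HashMap<>();
  // Secondary index on "lname": lname -> entries holding the row key and fname.
  private final Map<String, List<IndexEntry>> lnameIndex = new HashMap<>();

  void put(String rowKey, String fname, String lname, String city) {
    Map<String, String> doc = new HashMap<>();
    doc.put("fname", fname);
    doc.put("lname", lname);
    doc.put("city", city);
    primary.put(rowKey, doc);
    lnameIndex.computeIfAbsent(lname, k -> new ArrayList<>())
        .add(new IndexEntry(rowKey, fname));
  }

  // Covering: "SELECT fname ... WHERE lname = ?" is answered from the index alone.
  List<String> fnamesByLname(String lname) {
    List<String> out = new ArrayList<>();
    for (IndexEntry e : lnameIndex.getOrDefault(lname, Collections.emptyList())) {
      out.add(e.fname);
    }
    return out;
  }

  // Non-covering: "SELECT city ... WHERE lname = ?" uses the index for row keys,
  // then looks each row up in the primary table (the "join back").
  List<String> citiesByLname(String lname) {
    List<String> out = new ArrayList<>();
    for (IndexEntry e : lnameIndex.getOrDefault(lname, Collections.emptyList())) {
      out.add(primary.get(e.rowKey).get("city"));
    }
    return out;
  }
}
```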



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643341#comment-16643341
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223658773
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/IndexPlanTest.java
 ##
 @@ -0,0 +1,1715 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import com.mapr.db.Admin;
+import com.mapr.drill.maprdb.tests.MaprDBTestsSuite;
+import com.mapr.drill.maprdb.tests.json.BaseJsonTest;
+import com.mapr.tests.annotations.ClusterTest;
+import org.apache.drill.PlanTestBase;
+import org.joda.time.DateTime;
+import org.joda.time.format.DateTimeFormat;
+import org.apache.drill.common.config.DrillConfig;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.FixMethodOrder;
+import org.junit.Ignore;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runners.MethodSorters;
+import java.util.Properties;
+
+
+@FixMethodOrder(MethodSorters.NAME_ASCENDING)
+@Category(ClusterTest.class)
+public class IndexPlanTest extends BaseJsonTest {
+
+  final static String PRIMARY_TABLE_NAME = "/tmp/index_test_primary";
+
+  final static int PRIMARY_TABLE_SIZE = 1;
+  private static final String sliceTargetSmall = "alter session set `planner.slice_target` = 1";
+  private static final String sliceTargetDefault = "alter session reset `planner.slice_target`";
+  private static final String noIndexPlan = "alter session set `planner.enable_index_planning` = false";
+  private static final String defaultHavingIndexPlan = "alter session reset `planner.enable_index_planning`";
+  private static final String disableHashAgg = "alter session set `planner.enable_hashagg` = false";
+  private static final String enableHashAgg =  "alter session set `planner.enable_hashagg` = true";
+  private static final String defaultnonCoveringSelectivityThreshold = "alter session set `planner.index.noncovering_selectivity_threshold` = 0.025";
+  private static final String incrnonCoveringSelectivityThreshold = "alter session set `planner.index.noncovering_selectivity_threshold` = 0.25";
+  private static final String disableFTS = "alter session set `planner.disable_full_table_scan` = true";
+  private static final String enableFTS = "alter session reset `planner.disable_full_table_scan`";
+  private static final String preferIntersectPlans = "alter session set `planner.index.prefer_intersect_plans` = true";
+  private static final String defaultIntersectPlans = "alter session reset `planner.index.prefer_intersect_plans`";
+  private static final String lowRowKeyJoinBackIOFactor
+  = "alter session set `planner.index.rowkeyjoin_cost_factor` = 0.01";
+  private static final String defaultRowKeyJoinBackIOFactor
+  = "alter session reset `planner.index.rowkeyjoin_cost_factor`";
+
+  /**
+   *  A sample row of this 10K table:
+   --+-++
+   | 1012  | {"city":"pfrrs","state":"pc"}  | 
{"email":"kffzkuz...@gmail.com","phone":"655471"}  |
+   {"ssn":"17423"}  | {"fname":"KfFzK","lname":"UZwNk"}  | 
{"age":53.0,"income":45.0}  | 1012   |
+   *
+   * This test suite generates random content to fill all the rows. Since the random function always starts from
+   * the same seed for different runs, when the row count is not changed, the data in the table will always be the same,
+   * so the query results can be predicted and verified.
+   */
+
+  @BeforeClass
+  public static void setupTableIndexes() throws Exception {
+
+Properties overrideProps = new Properties();
+overrideProps.setProperty("format-maprdb.json.useNumRegionsForDistribution", "true");
+updateTestCluster(1, DrillConfig.create(overrideProps));
+
+MaprDBTestsSuite.setupTests();
+MaprDBTestsSuite.createPluginAndGetConf(getDrillbitContext());
+
+
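The Javadoc above relies on the random generator starting from the same seed on every run, which makes the generated table, and therefore the expected query results, reproducible. `java.util.Random` guarantees identical sequences for identical seeds and identical call sequences. A minimal sketch of that idea follows; the class name, seed and name format are hypothetical, since the suite's actual generator is not shown in this excerpt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator; illustrates seeded, reproducible test data only.
final class SeededRowGen {
  private static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";

  // Builds rowCount random lowercase names of the given length.
  static List<String> names(long seed, int rowCount, int length) {
    Random random = new Random(seed); // same seed -> same sequence on every run
    List<String> rows = new ArrayList<>(rowCount);
    for (int i = 0; i < rowCount; i++) {
      StringBuilder sb = new StringBuilder(length);
      for (int j = 0; j < length; j++) {
        sb.append(LETTERS.charAt(random.nextInt(LETTERS.length())));
      }
      rows.add(sb.toString());
    }
    return rows;
  }
}
```

Because the values depend only on the seed and the number of calls made, a test can hard-code expected results once and rely on them across runs.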

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643337#comment-16643337
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223661248
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/TableIndexCmd.java
 ##
 @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+
+import com.mapr.db.Admin;
+import com.mapr.db.MapRDB;
+import org.apache.drill.exec.util.GuavaPatcher;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+* Copy classes to a MapR cluster node, then run a command like this:
+* java -classpath /tmp/drill-cmd-1.9.0-SNAPSHOT.jar:/opt/mapr/drill/drill-1.9.0/jars/*:/opt/mapr/drill/drill-1.9.0/jars/3rdparty/*:/opt/mapr/drill/drill-1.9.0/jars/ext/*
+* org.apache.drill.hbase.index.TableIndexGen -host 10.10.88.128 -port 5181 [-table pop3] [-size 100]
+*/
+
+class TestBigTable {
+
+  Admin admin;
+  boolean initialized = false;
+
+  LargeTableGen gen;
+
+  /*
+"hbase.zookeeper.quorum": "10.10.88.128",
+"hbase.zookeeper.property.clientPort": "5181"
+   */
+  void init(String host, String port) {
+try {
+  admin = MapRDB.newAdmin();
+  initialized = true;
+  gen = new LargeTableGen(admin);
+} catch (Exception e) {
+  System.out.println("Connection to HBase threw" + e.getMessage());
+}
+  }
+}
+
+
+public class TableIndexCmd {
+
+  public static Map parseParameter(String[] params) {
+HashMap retParams = new HashMap();
+for (int i=0; i params = parseParameter(args);
+if(args.length >= 2) {
 
 Review comment:
   formatting
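The body of parseParameter is cut off in this archive, so the exact formatting issue the reviewer flagged is not visible. For illustration only, a cleanly formatted parser for the `-host ... -port ... [-table ...] [-size ...]` arguments from the class Javadoc might look like this hypothetical sketch (not the code in the pull request):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "-name value" argument parser.
final class CmdArgs {
  // Parses "-name value" pairs into a map keyed by name (without the leading '-').
  static Map<String, String> parseParameter(String[] params) {
    Map<String, String> retParams = new HashMap<>();
    for (int i = 0; i < params.length; i += 2) {
      String name = params[i];
      if (name.length() < 2 || name.charAt(0) != '-') {
        throw new IllegalArgumentException("Expected -name but got: " + name);
      }
      if (i + 1 >= params.length) {
        throw new IllegalArgumentException("Missing value for " + name);
      }
      retParams.put(name.substring(1), params[i + 1]);
    }
    return retParams;
  }
}
```

Optional flags such as `-table` simply stay absent from the map when not given, so callers can fall back to defaults with `getOrDefault`.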



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643362#comment-16643362
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223659414
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/LargeTableGen.java
 ##
 @@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import static com.mapr.drill.maprdb.tests.MaprDBTestsSuite.INDEX_FLUSH_TIMEOUT;
+
+import java.io.InputStream;
+import java.io.StringBufferInputStream;
+
+import org.apache.hadoop.fs.Path;
+import org.ojai.DocumentStream;
+import org.ojai.json.Json;
+
+import com.mapr.db.Admin;
+import com.mapr.db.Table;
+import com.mapr.db.TableDescriptor;
+import com.mapr.db.impl.MapRDBImpl;
+import com.mapr.db.impl.TableDescriptorImpl;
+import com.mapr.db.tests.utils.DBTests;
+import com.mapr.fs.utils.ssh.TestCluster;
+
+/**
+ * This class is to generate a MapR json table of this schema:
+ * {
+ *   "address" : {
+ *  "city":"wtj",
+ *  "state":"ho"
+ *   }
+ *   "contact" : {
+ *  "email":"vcfahj...@gmail.com",
+ *  "phone":"655583"
+ *   }
+ *   "id" : {
+ *  "ssn":"15461"
+ *   }
+ *   "name" : {
+ *  "fname":"VcFahj",
+ *  "lname":"RfM"
+ *   }
+ * }
+ *
+ */
+public class LargeTableGen extends LargeTableGenBase {
+
+  static final int SPLIT_SIZE = 5000;
+  private Admin admin;
+
+  public LargeTableGen(Admin dbadmin) {
+admin = dbadmin;
+  }
+
+  Table createOrGetTable(String tableName, int recordNum) {
+if (admin.tableExists(tableName)) {
+  return MapRDBImpl.getTable(tableName);
+  //admin.deleteTable(tableName);
+}
+else {
+  TableDescriptor desc = new TableDescriptorImpl(new Path(tableName));
+
+  int splits = (recordNum / SPLIT_SIZE) - (((recordNum % SPLIT_SIZE) > 1)? 0 : 1);
+
+  String[] splitsStr = new String[splits];
+  StringBuilder strBuilder = new StringBuilder("Splits:");
+  for(int i=0; i

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643347#comment-16643347
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223672289
 
 

 ##
 File path: 
protocol/src/main/java/org/apache/drill/exec/proto/beans/CoreOperatorType.java
 ##
 @@ -78,7 +78,8 @@
 SEQUENCE_SUB_SCAN(53),
 PARTITION_LIMIT(54),
 PCAPNG_SUB_SCAN(55),
-RUNTIME_FILTER(56);
+RUNTIME_FILTER(56),
+ROWKEY_JOIN(57);
 
 Review comment:
   Please regenerate the protobuf files for the C++ native client as well.



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643316#comment-16643316
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223651862
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableRangePartitionFunction.java
 ##
 @@ -0,0 +1,237 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db.json;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.exec.planner.physical.AbstractRangePartitionFunction;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.store.mapr.db.MapRDBFormatPlugin;
+import org.apache.drill.exec.vector.ValueVector;
+import org.ojai.store.QueryCondition;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import com.mapr.db.Table;
+import com.mapr.db.impl.ConditionImpl;
+import com.mapr.db.impl.IdCodec;
+import com.mapr.db.impl.ConditionNode.RowkeyRange;
+import com.mapr.db.scan.ScanRange;
+import com.mapr.fs.jni.MapRConstants;
+import com.mapr.org.apache.hadoop.hbase.util.Bytes;
+
+@JsonTypeName("jsontable-range-partition-function")
+public class JsonTableRangePartitionFunction extends AbstractRangePartitionFunction {
+
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(JsonTableRangePartitionFunction.class);
+
+  @JsonProperty("refList")
+  protected List refList;
+
+  @JsonProperty("tableName")
+  protected String tableName;
+
+  @JsonIgnore
+  protected String userName;
+
+  @JsonIgnore
+  protected ValueVector partitionKeyVector = null;
+
+  // List of start keys of the scan ranges for the table.
+  @JsonProperty
+  protected List startKeys = null;
+
+  // List of stop keys of the scan ranges for the table.
+  @JsonProperty
+  protected List stopKeys = null;
+
+  @JsonCreator
+  public JsonTableRangePartitionFunction(
+  @JsonProperty("refList") List refList,
+  @JsonProperty("tableName") String tableName,
+  @JsonProperty("startKeys") List startKeys,
+  @JsonProperty("stopKeys") List stopKeys) {
+this.refList = refList;
+this.tableName = tableName;
+this.startKeys = startKeys;
+this.stopKeys = stopKeys;
+  }
+
+  public JsonTableRangePartitionFunction(List refList,
+  String tableName, String userName, MapRDBFormatPlugin formatPlugin) {
+this.refList = refList;
+this.tableName = tableName;
+this.userName = userName;
+initialize(formatPlugin);
+  }
+
+  @JsonProperty("refList")
+  @Override
+  public List getPartitionRefList() {
+return refList;
+  }
+
+  @Override
+  public void setup(List> partitionKeys) {
+if (partitionKeys.size() != 1) {
+  throw new UnsupportedOperationException(
+  "Range partitioning function supports exactly one partition column; encountered " + partitionKeys.size());
+}
+
+VectorWrapper v = partitionKeys.get(0);
+
+partitionKeyVector = v.getValueVector();
+
+Preconditions.checkArgument(partitionKeyVector != null, "Found null partitionKeyVector.");
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+if (this == obj) {
+  return true;
+}
+if (obj instanceof JsonTableRangePartitionFunction) {
+  JsonTableRangePartitionFunction rpf = (JsonTableRangePartitionFunction) obj;
+  List thisPartRefList = this.getPartitionRefList();
+  List otherPartRefList = rpf.getPartitionRefList();
+  if (thisPartRefList.size() != otherPartRefList.size()) {
+return false;
+  }
+  for (int refIdx=0; refIdx= 0 ||
 
 Review comment:
   formatting
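JsonTableRangePartitionFunction keeps the start and stop keys of the table's scan ranges, but the code that assigns a row to a range is not part of this excerpt. The following is only a hypothetical sketch of such an assignment, assuming contiguous ranges whose start keys are sorted (the first being the empty key) and compared as unsigned bytes, the usual ordering for binary row keys: a binary search for the last start key that is less than or equal to the row key.

```java
import java.util.List;

// Hypothetical sketch; not the PR's implementation.
final class RangePartitioner {
  // Sorted start keys of contiguous ranges; the first entry is the empty key.
  private final List<byte[]> startKeys;

  RangePartitioner(List<byte[]> startKeys) {
    this.startKeys = startKeys;
  }

  // Unsigned lexicographic comparison of two byte arrays.
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = (a[i] & 0xff) - (b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return a.length - b.length;
  }

  // Returns the index of the last range whose start key is <= rowKey.
  int partitionFor(byte[] rowKey) {
    int lo = 0;
    int hi = startKeys.size() - 1;
    int found = 0;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (compareUnsigned(startKeys.get(mid), rowKey) <= 0) {
        found = mid;   // candidate; look for a later range that still fits
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return found;
  }
}
```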



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643324#comment-16643324
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223654776
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/RestrictedJsonTableGroupScan.java
 ##
 @@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db.json;
+
+import java.util.List;
+import java.util.NavigableMap;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.physical.base.ScanStats.GroupScanProperty;
+import org.apache.drill.exec.planner.index.MapRDBStatistics;
+import org.apache.drill.exec.planner.cost.PluginCost;
+import org.apache.drill.exec.planner.index.Statistics;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.mapr.db.MapRDBFormatPlugin;
+import org.apache.drill.exec.store.mapr.db.MapRDBSubScan;
+import org.apache.drill.exec.store.mapr.db.MapRDBSubScanSpec;
+import org.apache.drill.exec.store.mapr.db.RestrictedMapRDBSubScan;
+import org.apache.drill.exec.store.mapr.db.RestrictedMapRDBSubScanSpec;
+import org.apache.drill.exec.store.mapr.db.TabletFragmentInfo;
+
+/**
+ * A RestrictedJsonTableGroupScan encapsulates (along with a subscan) the functionality
+ * for doing a restricted (i.e. skip) scan rather than a sequential scan.  The skipping is based
+ * on a supplied set of row keys (primary keys) from a join operator.
+ */
+@JsonTypeName("restricted-json-scan")
+public class RestrictedJsonTableGroupScan extends JsonTableGroupScan {
+
+  @JsonCreator
+  public RestrictedJsonTableGroupScan(@JsonProperty("userName") String userName,
+@JsonProperty("storage") FileSystemPlugin storagePlugin,
+@JsonProperty("format") MapRDBFormatPlugin formatPlugin,
+@JsonProperty("scanSpec") JsonScanSpec scanSpec, /* scan spec of the original table */
+@JsonProperty("columns") List columns,
+@JsonProperty("")MapRDBStatistics statistics) {
 
 Review comment:
   Add a space between `@JsonProperty("")` and `MapRDBStatistics`.
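The RestrictedJsonTableGroupScan Javadoc describes a restricted (skip) scan driven by row keys coming from a join, instead of a sequential scan. The contrast can be sketched with a sorted in-memory map standing in for the table. All names here are hypothetical, and the sketch only counts reads to show why skipping pays off when the join supplies few keys:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;

// Hypothetical result holder for the sketch.
final class ScanResult {
  final List<String> rows = new ArrayList<>();
  int reads; // rows visited (full scan) or point lookups (restricted scan)
}

final class RestrictedScanSketch {
  // Sequential scan: reads every row and keeps those whose key was requested.
  static ScanResult fullScan(NavigableMap<String, String> table, SortedSet<String> keys) {
    ScanResult result = new ScanResult();
    for (Map.Entry<String, String> e : table.entrySet()) {
      result.reads++;
      if (keys.contains(e.getKey())) {
        result.rows.add(e.getValue());
      }
    }
    return result;
  }

  // Restricted (skip) scan: looks up only the row keys supplied by the join,
  // in sorted order, skipping every row in between.
  static ScanResult restrictedScan(NavigableMap<String, String> table, SortedSet<String> keys) {
    ScanResult result = new ScanResult();
    for (String key : keys) {
      result.reads++;
      String row = table.get(key);
      if (row != null) {
        result.rows.add(row);
      }
    }
    return result;
  }
}
```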



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643319#comment-16643319
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223639785
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Map;
+import java.util.Set;
+
+public class MapRDBFunctionalIndexInfo implements FunctionalIndexInfo {
+
+  final private IndexDescriptor indexDesc;
+
+  private boolean hasFunctionalField = false;
+
+  // When we scan a schemaPath in the group scan's columns, we check whether this column (schemaPath) should be rewritten to '$N'.
+  // When there is more than one function on the same column in the index, e.g. CAST(a.b as INT) and CAST(a.b as VARCHAR),
+  // we should map SchemaPath a.b to a set of SchemaPaths, e.g. $1, $2
+  private Map> columnToConvert;
+
+  // map of functional index expression to destination SchemaPath e.g. $N
+  private Map exprToConvert;
+
+  //map of SchemaPath involved in a functional field
+  private Map> pathsInExpr;
+
+  private Set newPathsForIndexedFunction;
+
+  private Set allPathsInFunction;
+
+  public MapRDBFunctionalIndexInfo(IndexDescriptor indexDesc) {
+this.indexDesc = indexDesc;
+columnToConvert = Maps.newHashMap();
 
 Review comment:
   Use `new HashMap<>()` instead of `Maps.newHashMap()` here and in other places
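The comments above describe mapping one column to several generated `$N` fields when more than one functional index expression uses it, and the reviewer asks for the diamond form `new HashMap<>()` (Java 7+) instead of Guava's `Maps.newHashMap()`. A minimal sketch of both, with plain strings standing in for Drill's `SchemaPath` and `LogicalExpression` (an assumption for brevity; the class and method names are hypothetical):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; strings stand in for SchemaPath / LogicalExpression.
final class FunctionalFieldMap {
  // column path -> generated "$N" names of functional index fields built on it
  private final Map<String, Set<String>> columnToConvert = new HashMap<>();
  // functional expression text -> its "$N" name
  private final Map<String, String> exprToConvert = new HashMap<>();
  private int nextId = 1;

  // Registers e.g. CAST(a.b AS INT) over column a.b and returns its "$N" name.
  String register(String column, String expr) {
    String existing = exprToConvert.get(expr);
    if (existing != null) {
      return existing; // same expression registered twice keeps its name
    }
    String name = "$" + nextId++;
    exprToConvert.put(expr, name);
    columnToConvert.computeIfAbsent(column, k -> new LinkedHashSet<>()).add(name);
    return name;
  }

  // All "$N" names generated for a column (empty if none).
  Set<String> namesFor(String column) {
    return Collections.unmodifiableSet(
        columnToConvert.getOrDefault(column, Collections.emptySet()));
  }
}
```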



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643310#comment-16643310
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223648507
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBFormatPlugin.java
 ##
 @@ -57,34 +64,74 @@ public MapRDBFormatPlugin(String name, DrillbitContext context, Configuration fs
 hbaseConf = HBaseConfiguration.create(fsConf);
 hbaseConf.set(ConnectionFactory.DEFAULT_DB, ConnectionFactory.MAPR_ENGINE2);
 connection = ConnectionFactory.createConnection(hbaseConf);
+jsonTableCache = new MapRDBTableCache(context.getConfig());
+int scanRangeSizeMBConfig = context.getConfig().getInt(PluginConstants.JSON_TABLE_SCAN_SIZE_MB);
+if (scanRangeSizeMBConfig < 32 || scanRangeSizeMBConfig > 8192) {
 
 Review comment:
   These are magic numbers. Please create constant fields or local variables for 32 and 8192 (if they don't already exist in the MapR-DB libs).
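One way to address this comment, sketched with hypothetical constant names (check first whether MapR-DB already defines equivalent limits, as the reviewer suggests). The valid range [32, 8192] MB is the inverse of the `< 32 || > 8192` check in the diff:

```java
// Hypothetical names; bounds taken from the check under review.
final class ScanRangeConfig {
  static final int MIN_SCAN_RANGE_SIZE_MB = 32;
  static final int MAX_SCAN_RANGE_SIZE_MB = 8192;

  // True when the configured scan range size lies within the supported bounds.
  static boolean isValidScanRangeSizeMB(int sizeMB) {
    return sizeMB >= MIN_SCAN_RANGE_SIZE_MB && sizeMB <= MAX_SCAN_RANGE_SIZE_MB;
  }
}
```

The condition in the plugin constructor would then read `if (!ScanRangeConfig.isValidScanRangeSizeMB(scanRangeSizeMBConfig))`, and any error message can quote the constants instead of repeating the literals.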


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary
> indexes), Drill should leverage them during planning and execution in order
> to improve query performance.
> On the planning side, the Drill planner should be enhanced to provide an
> abstraction layer which expresses the index metadata and statistics. Further,
> cost-based index selection is needed to decide which index(es) are
> suitable.
> On the execution side, appropriate operator enhancements would be needed to
> handle different categories of indexes, such as covering and non-covering
> indexes, taking into consideration that the index data may not be co-located
> with the primary table, i.e. a global index.
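To make the covering/non-covering distinction concrete, here is a simplified sketch. It assumes column paths are plain strings (Drill itself works with SchemaPath and LogicalExpression objects), and SimpleIndexDescriptor is a hypothetical name. The three predicates mirror the intent of isCoveringIndex, allColumnsIndexed and someColumnsIndexed in the MapRDBIndexDescriptor hunk in this thread, which compare the decoded query columns against the index's allFields and indexedFields sets.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified index descriptor: columns are plain strings.
final class SimpleIndexDescriptor {
  private final Set<String> indexedFields; // index key columns
  private final Set<String> allFields;     // key columns plus included (non-key) columns

  SimpleIndexDescriptor(List<String> indexCols, List<String> includedCols) {
    this.indexedFields = Collections.unmodifiableSet(new LinkedHashSet<>(indexCols));
    Set<String> all = new LinkedHashSet<>(indexCols);
    all.addAll(includedCols);
    this.allFields = Collections.unmodifiableSet(all);
  }

  /** Covering: every column the query needs can be served from the index alone. */
  boolean isCoveringIndex(Collection<String> queryColumns) {
    return allFields.containsAll(queryColumns);
  }

  /** Every referenced column is an index key column. */
  boolean allColumnsIndexed(Collection<String> columns) {
    return indexedFields.containsAll(columns);
  }

  /** At least one referenced column is an index key column. */
  boolean someColumnsIndexed(Collection<String> columns) {
    for (String column : columns) {
      if (indexedFields.contains(column)) {
        return true;
      }
    }
    return false;
  }
}
```

When an index is not covering, the remaining columns have to be fetched from the primary table by row key, which is the role of the restricted (skip) scan described for RestrictedJsonTableGroupScan.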




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643329#comment-16643329
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223658126
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/IndexPlanTest.java
 ##
 @@ -0,0 +1,1715 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import com.mapr.db.Admin;
+import com.mapr.drill.maprdb.tests.MaprDBTestsSuite;
+import com.mapr.drill.maprdb.tests.json.BaseJsonTest;
+import com.mapr.tests.annotations.ClusterTest;
+import org.apache.drill.PlanTestBase;
+import org.joda.time.DateTime;
+import org.joda.time.format.DateTimeFormat;
+import org.apache.drill.common.config.DrillConfig;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.FixMethodOrder;
+import org.junit.Ignore;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runners.MethodSorters;
+import java.util.Properties;
+
+
+@FixMethodOrder(MethodSorters.NAME_ASCENDING)
+@Category(ClusterTest.class)
+public class IndexPlanTest extends BaseJsonTest {
+
+  final static String PRIMARY_TABLE_NAME = "/tmp/index_test_primary";
+
+  final static int PRIMARY_TABLE_SIZE = 1;
+  private static final String sliceTargetSmall = "alter session set `planner.slice_target` = 1";
 
 Review comment:
   It is better to use upper-case names for static final constants.



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643303#comment-16643303
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223379455
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushLimitIntoScan.java
 ##
 @@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.physical.LimitPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.RowKeyJoinPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.hbase.HBaseScanSpec;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.RestrictedJsonTableGroupScan;
+
+public abstract class MapRDBPushLimitIntoScan extends StoragePluginOptimizerRule {
 
 Review comment:
   `DrillPushLimitToScanRule` doesn't work for MapR-DB Scan, right?



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643314#comment-16643314
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223651509
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java
 ##
 @@ -214,11 +445,323 @@ public boolean canPushdownProjects(List columns) {
 
   @Override
   public String toString() {
-return "JsonTableGroupScan [ScanSpec=" + scanSpec + ", columns=" + columns + "]";
+return "JsonTableGroupScan [ScanSpec=" + scanSpec + ", columns=" + columns
++ (maxRecordsToRead>0? ", limit=" + maxRecordsToRead : "")
 
 Review comment:
   formatting
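One way the reviewed toString() could be formatted, shown on a hypothetical stand-in class that has only the fields visible in the hunk (scanSpec is a plain String here). The hunk is cut off after the limit fragment, so the closing "]" and the absence of any further fields are assumptions.

```java
import java.util.List;

// Hypothetical stand-in for JsonTableGroupScan, reduced to the fields in the hunk.
final class GroupScanToStringExample {
  private final String scanSpec;
  private final List<String> columns;
  private final int maxRecordsToRead;

  GroupScanToStringExample(String scanSpec, List<String> columns, int maxRecordsToRead) {
    this.scanSpec = scanSpec;
    this.columns = columns;
    this.maxRecordsToRead = maxRecordsToRead;
  }

  @Override
  public String toString() {
    // Spaces around operators and one concatenated fragment per line;
    // the limit fragment is emitted only when a positive limit was pushed down.
    return "JsonTableGroupScan [ScanSpec=" + scanSpec
        + ", columns=" + columns
        + (maxRecordsToRead > 0 ? ", limit=" + maxRecordsToRead : "")
        + "]";
  }
}
```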



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643325#comment-16643325
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223660467
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/LargeTableGenBase.java
 ##
 @@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import org.apache.commons.lang3.RandomStringUtils;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Random;
+import java.util.Set;
+
+public class LargeTableGenBase {
+
+  private boolean dict_ready = false;
+
+  protected List firstnames;
+  protected List lastnames;
+  protected List cities;
+  protected int[] randomized;
+
+  protected synchronized void  initDictionary() {
+initDictionaryWithRand();
+  }
+
+  protected void initDictionaryWithRand() {
+{
+  firstnames = new ArrayList<>();
+  lastnames = new ArrayList<>();
+  cities = new ArrayList<>();
+  List states = new ArrayList<>();
+
+  int fnNum = 2000; //2k
+  int lnNum = 20;//200k
+  int cityNum = 1;//10k
+  int stateNum = 50;
+  Random rand = new Random(2017);
+  int i;
+  try {
+Set strSet = new LinkedHashSet<>();
+while(strSet.size() < stateNum) {
+  strSet.add(RandomStringUtils.random(2, 0, 0, true, false, null, rand));
+}
+states.addAll(strSet);
+
+strSet = new LinkedHashSet<>();
+while(strSet.size() < cityNum) {
+  int len = 3 + strSet.size() % 6;
+  strSet.add(RandomStringUtils.random(len, 0, 0, true, false, null, rand));
+}
+
+Iterator it = strSet.iterator();
+for(i=0; i Add capability to do index based planning and execution

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643323#comment-16643323
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223654821
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/RestrictedJsonTableGroupScan.java
 ##
 @@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db.json;
+
+import java.util.List;
+import java.util.NavigableMap;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.physical.base.ScanStats.GroupScanProperty;
+import org.apache.drill.exec.planner.index.MapRDBStatistics;
+import org.apache.drill.exec.planner.cost.PluginCost;
+import org.apache.drill.exec.planner.index.Statistics;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.mapr.db.MapRDBFormatPlugin;
+import org.apache.drill.exec.store.mapr.db.MapRDBSubScan;
+import org.apache.drill.exec.store.mapr.db.MapRDBSubScanSpec;
+import org.apache.drill.exec.store.mapr.db.RestrictedMapRDBSubScan;
+import org.apache.drill.exec.store.mapr.db.RestrictedMapRDBSubScanSpec;
+import org.apache.drill.exec.store.mapr.db.TabletFragmentInfo;
+
+/**
+ * A RestrictedJsonTableGroupScan encapsulates (along with a subscan) the functionality
+ * for doing restricted (i.e skip) scan rather than sequential scan.  The skipping is based
+ * on a supplied set of row keys (primary keys) from a join operator.
+ */
+@JsonTypeName("restricted-json-scan")
+public class RestrictedJsonTableGroupScan extends JsonTableGroupScan {
+
+  @JsonCreator
+  public RestrictedJsonTableGroupScan(@JsonProperty("userName") String userName,
+@JsonProperty("storage") FileSystemPlugin storagePlugin,
+@JsonProperty("format") MapRDBFormatPlugin formatPlugin,
+@JsonProperty("scanSpec") JsonScanSpec scanSpec, /* scan spec of the original table */
+@JsonProperty("columns") List columns,
+@JsonProperty("")MapRDBStatistics statistics) {
+super(userName, storagePlugin, formatPlugin, scanSpec, columns, statistics);
+  }
+
+  // TODO:  this method needs to be fully implemented
+  protected RestrictedMapRDBSubScanSpec getSubScanSpec(TabletFragmentInfo tfi) {
+JsonScanSpec spec = scanSpec;
+RestrictedMapRDBSubScanSpec subScanSpec =
+new RestrictedMapRDBSubScanSpec(
+spec.getTableName(),
+getRegionsToScan().get(tfi), spec.getSerializedFilter(), getUserName());
+return subScanSpec;
+  }
+
+  protected NavigableMap getRegionsToScan() {
+return getRegionsToScan(formatPlugin.getRestrictedScanRangeSizeMB());
+  }
+
+  @Override
+  public MapRDBSubScan getSpecificScan(int minorFragmentId) {
+assert minorFragmentId < endpointFragmentMapping.size() : String.format(
"Mappings length [%d] should be greater than minor fragment id [%d] but it isn't.", endpointFragmentMapping.size(),
+minorFragmentId);
+RestrictedMapRDBSubScan subscan =
+new RestrictedMapRDBSubScan(getUserName(), formatPlugin,
+getEndPointFragmentMapping(minorFragmentId), columns, maxRecordsToRead, TABLE_JSON);
+
+return subscan;
+  }
+
+  private List getEndPointFragmentMapping(int minorFragmentId) {
+List restrictedSubScanSpecList = Lists.newArrayList();
+List subScanSpecList =

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643299#comment-16643299
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r217475464
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBIndexDescriptor.java
 ##
 @@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+
+import java.util.Collection;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptPlanner;
+import org.apache.calcite.rel.RelFieldCollation.NullDirection;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.expr.CloneVisitor;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.cost.DrillCostBase;
+import org.apache.drill.exec.planner.cost.DrillCostBase.DrillCostFactory;
+import org.apache.drill.exec.planner.cost.PluginCost;
+import org.apache.drill.exec.planner.index.IndexProperties;
+import org.apache.drill.exec.store.mapr.PluginConstants;
+import org.apache.drill.exec.util.EncodedSchemaPathSet;
+import org.apache.drill.common.expression.LogicalExpression;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableSet;
+
+public class MapRDBIndexDescriptor extends DrillIndexDescriptor {
+
+  protected final Object desc;
+  protected final Set allFields;
+  protected final Set indexedFields;
+  protected MapRDBFunctionalIndexInfo functionalInfo;
+  protected PluginCost pluginCost;
+
+  public MapRDBIndexDescriptor(List indexCols,
+   CollationContext indexCollationContext,
+   List nonIndexCols,
+   List rowKeyColumns,
+   String indexName,
+   String tableName,
+   IndexType type,
+   Object desc,
+   DbGroupScan scan,
+   NullDirection nullsDirection) {
+super(indexCols, indexCollationContext, nonIndexCols, rowKeyColumns, indexName, tableName, type, nullsDirection);
+this.desc = desc;
+this.indexedFields = ImmutableSet.copyOf(indexColumns);
+this.allFields = new ImmutableSet.Builder()
+.add(PluginConstants.DOCUMENT_SCHEMA_PATH)
+.addAll(indexColumns)
+.addAll(nonIndexColumns)
+.build();
+this.pluginCost = scan.getPluginCostModel();
+  }
+
+  public Object getOriginalDesc(){
+return desc;
+  }
+
+  @Override
+  public boolean isCoveringIndex(List expressions) {
+List decodedCols = new DecodePathinExpr().parseExpressions(expressions);
+return columnsInIndexFields(decodedCols, allFields);
+  }
+
+  @Override
+  public boolean allColumnsIndexed(Collection expressions) {
+List decodedCols = new DecodePathinExpr().parseExpressions(expressions);
+return columnsInIndexFields(decodedCols, indexedFields);
+  }
+
+  @Override
+  public boolean someColumnsIndexed(Collection columns) {
+return columnsIndexed(columns, false);
+  }
+
+  private boolean columnsIndexed(Collection expressions, boolean allColsIndexed) {
+List decodedCols = new DecodePathinExpr().parseExpressions(expressions);
+if (allColsIndexed) {
+  return columnsInIndexFields(decodedCols, indexedFields);
+} else {
+  return someColumnsInIndexFields(decodedCols, indexedFields);
+}
+  }
+
+  public FunctionalIndexInfo getFunctionalInfo() {
+if (this.functionalInfo == null) {
+  this.functionalInfo = new MapRDBFunctionalIndexInfo(this);
+}
+return this.functionalInfo;
+  }
+
+  /**
+   * Search through a LogicalExpression, finding all

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643301#comment-16643301
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r217475224
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Map;
+import java.util.Set;
+
+public class MapRDBFunctionalIndexInfo implements FunctionalIndexInfo {
+
+  final private IndexDescriptor indexDesc;
+
+  private boolean hasFunctionalField = false;
+
+  //when we scan schemaPath in groupscan's columns, we check if this column(schemaPath) should be rewritten to '$N',
+  //When there are more than two functions on the same column in index, CAST(a.b as INT), CAST(a.b as VARCHAR),
+  // then we should map SchemaPath a.b to a set of SchemaPath, e.g. $1, $2
+  private Map> columnToConvert;
+
+  // map of functional index expression to destination SchemaPath e.g. $N
+  private Map exprToConvert;
+
+  //map of SchemaPath involved in a functional field
+  private Map> pathsInExpr;
+
+  private Set newPathsForIndexedFunction;
+
+  private Set allPathsInFunction;
+
+  public MapRDBFunctionalIndexInfo(IndexDescriptor indexDesc) {
+this.indexDesc = indexDesc;
+columnToConvert = Maps.newHashMap();
+exprToConvert = Maps.newHashMap();
+pathsInExpr = Maps.newHashMap();
+//keep the order of new paths, it may be related to the naming policy
+newPathsForIndexedFunction = Sets.newLinkedHashSet();
+allPathsInFunction = Sets.newHashSet();
+init();
+  }
+
+  private void init() {
+int count = 0;
+for(LogicalExpression indexedExpr : indexDesc.getIndexColumns()) {
+  if( !(indexedExpr instanceof SchemaPath) ) {
+hasFunctionalField = true;
+SchemaPath functionalFieldPath = SchemaPath.getSimplePath("$"+count);
+newPathsForIndexedFunction.add(functionalFieldPath);
+
+//now we handle only cast expression
 
 Review comment:
   Please add a space after `//` in the comments.
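A simplified sketch of the mapping described by the code comments in this hunk: each functional index expression gets a placeholder path "$N", and each underlying column keeps every placeholder derived from it, so a.b maps to {$0, $1} when it is indexed as both CAST(a.b as INT) and CAST(a.b as VARCHAR). Expressions are plain strings here, and IndexColumn and FunctionalFieldMapper are hypothetical names. The hunk is truncated before `count` is incremented, so advancing the counter only for functional entries is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical index column: a plain field (function == null) or a function over one field.
final class IndexColumn {
  final String path;     // underlying field, e.g. "a.b"
  final String function; // e.g. "CAST(a.b AS INT)", or null for a plain field

  IndexColumn(String path, String function) {
    this.path = path;
    this.function = function;
  }
}

final class FunctionalFieldMapper {
  // functional expression -> placeholder path ($0, $1, ...), in index order
  final Map<String, String> exprToConvert = new LinkedHashMap<>();
  // underlying field -> every placeholder path derived from it
  final Map<String, Set<String>> columnToConvert = new LinkedHashMap<>();

  FunctionalFieldMapper(List<IndexColumn> indexColumns) {
    int count = 0;
    for (IndexColumn col : indexColumns) {
      if (col.function == null) {
        continue; // plain field paths are not rewritten
      }
      String placeholder = "$" + count++;
      exprToConvert.put(col.function, placeholder);
      columnToConvert.computeIfAbsent(col.path, k -> new LinkedHashSet<>()).add(placeholder);
    }
  }
}
```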



[jira] [Commented] (DRILL-6731) JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643271#comment-16643271
 ] 

ASF GitHub Bot commented on DRILL-6731:
---

weijietong commented on issue #1459: DRILL-6731: Move the BFs aggregating work 
from the Foreman to the RuntimeFi…
URL: https://github.com/apache/drill/pull/1459#issuecomment-428174077
 
 
   @sohami I made a minor change in this additional commit: the RuntimeFilter 
aggregating work now runs in a thread pool, which should save resources. 
Another change is to not spawn the RuntimeFilterSink when the RuntimeFilter 
enable option is false. Please review.
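A minimal sketch of aggregating Bloom filters on a thread pool, assuming every incoming filter is a BitSet of the same size built with the same hash functions, so that merging is a bitwise OR. BloomFilterAggregator is a hypothetical name; it does not reproduce Drill's RuntimeFilter or RuntimeFilterSink classes.

```java
import java.util.BitSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical aggregator: ORs incoming Bloom filter bitsets on a worker pool.
final class BloomFilterAggregator implements AutoCloseable {
  private final ExecutorService pool;
  private final BitSet aggregated;

  BloomFilterAggregator(int numBits, int threads) {
    this.pool = Executors.newFixedThreadPool(threads);
    this.aggregated = new BitSet(numBits);
  }

  /** Schedules the merge of one incoming filter; the calling thread is not blocked. */
  Future<?> submit(BitSet incoming) {
    return pool.submit(() -> {
      synchronized (aggregated) {
        aggregated.or(incoming);
      }
    });
  }

  /** Returns a copy of the filter aggregated so far. */
  BitSet snapshot() {
    synchronized (aggregated) {
      return (BitSet) aggregated.clone();
    }
  }

  @Override
  public void close() {
    pool.shutdown(); // lets already submitted merges finish, accepts no new ones
  }
}
```

Because OR is commutative and associative, filters can be merged in whatever order the fragments report them, and a partially aggregated snapshot is already usable.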




> JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter
> --
>
> Key: DRILL-6731
> URL: https://issues.apache.org/jira/browse/DRILL-6731
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.15.0
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> This PR moves the BloomFilter aggregating work from the Foreman to the
> RuntimeFilter. Through this change, the RuntimeFilter can apply the incoming
> BF as soon as possible.




[jira] [Commented] (DRILL-6763) Codegen optimization of SQL functions with constant values

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643260#comment-16643260
 ] 

ASF GitHub Bot commented on DRILL-6763:
---

vvysotskyi commented on issue #1481: DRILL-6763: Codegen optimization of SQL 
functions with constant values
URL: https://github.com/apache/drill/pull/1481#issuecomment-428173000
 
 
   @lushuifeng, since these test failures are absent on the latest master, 
they are caused by your changes. Could you please fix them? Also, some of the 
failures in `TestLargeFileCompilation` may be caused by previously failed 
unit tests from this class.




> Codegen optimization of SQL functions with constant values
> --
>
> Key: DRILL-6763
> URL: https://issues.apache.org/jira/browse/DRILL-6763
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Affects Versions: 1.14.0
>Reporter: shuifeng lu
>Assignee: shuifeng lu
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: Query1.java, Query2.java, code_compare.png, 
> compilation_time.png
>
>
> Codegen class compilation takes tens to hundreds of milliseconds, and the class 
> cache is hit only when the generifiedCode of the code generator is exactly the same.
>  This works fine when a UDF takes only columns or symbols, but it is inefficient when 
> one or more UDF parameters are constants that differ from query to query.
>  Take face recognition as an example: the face images are almost all distinct from 
> each other because of lighting, facial expressions and details.
>  Reducing redundant class compilation is important, especially for 
> low-latency queries.
>  Eliminating the redundant classes also reduces the cache miss rate and metaspace GC.
> Here is the query to get the persons whose last name is Brunner and who were hired 
> on or after 1st Jan 1990:
>  SELECT full_name, hire_date FROM cp.`employee.json` where last_name = 
> 'Brunner' and hire_date >= '1990-01-01 00:00:00.0';
>  Now get the persons whose last name is Bernard and who were hired on or after 1st Jan 1990:
>  SELECT full_name, hire_date FROM cp.`employee.json` where last_name = 
> 'Bernard' and hire_date >= '1990-01-01 00:00:00.0';
> Figure !compilation_time.png! shows the compilation time of the code generated 
> for the above queries in FilterRecordBatch on my laptop.
>  Figure !code_compare.png! shows that the only difference between the generated classes 
> in the attachments is the last_name value at line 156.
>  The redundant class compilation can therefore be eliminated 
> by making string12 a member of the class and setting its value when the 
> instance is created.
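The caching argument above can be illustrated with a minimal, hypothetical sketch. It is not Drill's CodeGenerator or class cache: the map keyed by source text stands in for the cache lookup on generifiedCode, `compile()` only counts cache misses instead of invoking a compiler, and the generated method bodies are simplified. With the literal inlined, the Brunner and Bernard queries produce two different sources and therefore two compilations; with the literal hoisted into a field (`string12`) that is set when the instance is created, both queries share one compiled class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Drill's CodeGenerator: a class cache keyed by the
// generated source text, standing in for the lookup on generifiedCode.
final class CodegenCacheSketch {
  private final Map<String, Integer> classCache = new HashMap<>();

  // Stand-in for an expensive compile step: a new class id is assigned only
  // when the source text has not been seen before (a cache miss).
  int compile(String source) {
    return classCache.computeIfAbsent(source, s -> classCache.size());
  }

  int compiledClasses() {
    return classCache.size();
  }

  // Literal inlined into the generated code: the source differs per value.
  static String inlinedSource(String lastName) {
    return "boolean doEval(String last_name) { return last_name.equals(\""
        + lastName + "\"); }";
  }

  // Literal hoisted into a field set at instance creation: the source is the
  // same for every value, so the cached class is reused.
  static String hoistedSource() {
    return "private final String string12; "
        + "boolean doEval(String last_name) { return last_name.equals(string12); }";
  }

  // An instance of the shared class; the constant is supplied when the
  // instance is created instead of being baked into the class.
  static final class FilterInstance {
    final int classId;
    private final String string12;

    FilterInstance(int classId, String string12) {
      this.classId = classId;
      this.string12 = string12;
    }

    boolean doEval(String lastName) {
      return lastName.equals(string12);
    }
  }

  public static void main(String[] args) {
    CodegenCacheSketch inlined = new CodegenCacheSketch();
    inlined.compile(inlinedSource("Brunner"));
    inlined.compile(inlinedSource("Bernard"));
    System.out.println(inlined.compiledClasses()); // prints 2

    CodegenCacheSketch hoisted = new CodegenCacheSketch();
    FilterInstance brunner = new FilterInstance(hoisted.compile(hoistedSource()), "Brunner");
    FilterInstance bernard = new FilterInstance(hoisted.compile(hoistedSource()), "Bernard");
    System.out.println(hoisted.compiledClasses()); // prints 1
    System.out.println(brunner.classId == bernard.classId); // prints true
    System.out.println(brunner.doEval("Brunner") + " " + bernard.doEval("Brunner")); // prints true false
  }
}
```

The trade-off is that the constant is now read from a field at evaluation time rather than being a compile-time literal, in exchange for skipping a compilation that the description measures in tens to hundreds of milliseconds.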




[jira] [Updated] (DRILL-6775) The schema for empty output is not shown in Drill Web UI

2018-10-09 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6775:

Fix Version/s: (was: Future)
   1.15.0

> The schema for empty output is not shown in Drill Web UI
> 
>
> Key: DRILL-6775
> URL: https://issues.apache.org/jira/browse/DRILL-6775
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP, Web Server
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: Anton Gozhiy
>Priority: Minor
> Fix For: 1.15.0
>
> Attachments: image-2018-10-05-16-16-45-389.png
>
>
> The query in SqlLine:
> {code}
> 0: jdbc:drill:zk=local> SELECT employee_id, full_name, first_name, last_name 
> FROM cp.`employee.json` LIMIT 0;
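> +-------------+-----------+------------+-----------+
> | employee_id | full_name | first_name | last_name |
> +-------------+-----------+------------+-----------+
> +-------------+-----------+------------+-----------+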
> No rows selected (0.118 seconds)
> {code}
> But the same query in the Drill Web UI shows nothing; see the attachment.
>  !image-2018-10-05-16-16-45-389.png!
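The expected behavior can be illustrated with a minimal, hypothetical renderer. It is not the Drill Web UI code, which renders an HTML table; the class and method names are assumptions. It only shows the idea that the header row should come from the result schema (the column list) rather than from the first data row, so that a LIMIT 0 query, as in the SqlLine output above, still prints its column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Drill's Web UI code: the header is built from the
// column list (the result schema), not from the first data row, so a LIMIT 0
// result still shows its column names.
final class EmptyResultRenderSketch {
  static String render(List<String> columns, List<List<String>> rows) {
    // Column width is the longest of the column name and its values.
    int[] widths = new int[columns.size()];
    for (int i = 0; i < columns.size(); i++) {
      widths[i] = columns.get(i).length();
      for (List<String> row : rows) {
        widths[i] = Math.max(widths[i], row.get(i).length());
      }
    }
    String border = border(widths);
    List<String> lines = new ArrayList<>();
    lines.add(border);
    lines.add(line(columns, widths)); // header from the schema, always emitted
    lines.add(border);
    for (List<String> row : rows) {
      lines.add(line(row, widths));
    }
    lines.add(border);
    return String.join("\n", lines);
  }

  private static String border(int[] widths) {
    StringBuilder sb = new StringBuilder("+");
    for (int w : widths) {
      sb.append(String.join("", Collections.nCopies(w + 2, "-"))).append('+');
    }
    return sb.toString();
  }

  private static String line(List<String> cells, int[] widths) {
    StringBuilder sb = new StringBuilder("|");
    for (int i = 0; i < widths.length; i++) {
      String cell = cells.get(i);
      sb.append(' ').append(cell);
      // Pad each cell on the right to the column width.
      for (int p = cell.length(); p < widths[i]; p++) {
        sb.append(' ');
      }
      sb.append(" |");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<String> columns = Arrays.asList("employee_id", "full_name", "first_name", "last_name");
    System.out.println(render(columns, Collections.<List<String>>emptyList()));
  }
}
```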




[jira] [Assigned] (DRILL-6775) The schema for empty output is not shown in Drill Web UI

2018-10-09 Thread Anton Gozhiy (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy reassigned DRILL-6775:
---

Assignee: Anton Gozhiy

> The schema for empty output is not shown in Drill Web UI
> 
>
> Key: DRILL-6775
> URL: https://issues.apache.org/jira/browse/DRILL-6775
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP, Web Server
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: Anton Gozhiy
>Priority: Minor
> Fix For: Future
>
> Attachments: image-2018-10-05-16-16-45-389.png
>
>
> The query in SqlLine:
> {code}
> 0: jdbc:drill:zk=local> SELECT employee_id, full_name, first_name, last_name 
> FROM cp.`employee.json` LIMIT 0;
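> +-------------+-----------+------------+-----------+
> | employee_id | full_name | first_name | last_name |
> +-------------+-----------+------------+-----------+
> +-------------+-----------+------------+-----------+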
> No rows selected (0.118 seconds)
> {code}
> But the same query in the Drill Web UI shows nothing; see the attachment.
>  !image-2018-10-05-16-16-45-389.png!




[jira] [Commented] (DRILL-6763) Codegen optimization of SQL functions with constant values

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643089#comment-16643089
 ] 

ASF GitHub Bot commented on DRILL-6763:
---

lushuifeng commented on issue #1481: DRILL-6763: Codegen optimization of SQL 
functions with constant values
URL: https://github.com/apache/drill/pull/1481#issuecomment-428140346
 
 
   These tests do not exist on master, but they also fail with the same errors 
on branch DRILL-6763.
   After commenting out these two new tests and rerunning all tests in 
TestLargeFileCompilation, I got the same error as above.



[jira] [Commented] (DRILL-6763) Codegen optimization of SQL functions with constant values

2018-10-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643066#comment-16643066
 ] 

ASF GitHub Bot commented on DRILL-6763:
---

vvysotskyi commented on issue #1481: DRILL-6763: Codegen optimization of SQL 
functions with constant values
URL: https://github.com/apache/drill/pull/1481#issuecomment-428133136
 
 
   This fix causes the following unit test failures:
   ```
   Failed tests: 
     TestLargeFileCompilation.testJDKHugeStringConstantCompilation:257->BaseTestQuery.resetSessionOption:375 Failed to reset session option `exec.java_compiler`, Error: org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalArgumentException: Attempted to send a message when connection is no longer valid.
   
   Query submission to Drillbit failed.
   
   [Error Id: 6a397072-825d-4550-a39e-ea4f6eafdc6d ]
   
   Tests in error: 
     TestLargeFileCompilation.testProject:191->BaseTestQuery.testNoResult:384 » Rpc
     TestLargeFileCompilation.testTOP_N_SORT:177->BaseTestQuery.testNoResult:358->BaseTestQuery.testNoResult:384 » Rpc
     TestLargeFileCompilation.testEXTERNAL_SORT:171->BaseTestQuery.testNoResult:358->BaseTestQuery.testNoResult:384 » Rpc
     TestLargeFileCompilation.testPARQUET_WRITER:156->BaseTestQuery.testNoResult:358->BaseTestQuery.testNoResult:384 » Rpc
     TestLargeFileCompilation.testHashJoin:206->BaseTestQuery.testNoResult:358->BaseTestQuery.testNoResult:384 » Rpc
     TestHashJoin.testHashJoinExprInCondition:252 » Rpc org.apache.drill.common.exc...
   
   Tests run: 3545, Failures: 1, Errors: 6, Skipped: 156
   ```

