[jira] [Commented] (DRILL-7016) Wrong query result with RuntimeFilter enabled when order of join and filter condition is swapped
[ https://issues.apache.org/jira/browse/DRILL-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755771#comment-16755771 ]

Sorabh Hamirwasia commented on DRILL-7016:
------------------------------------------

Looked more into the issue, and it is *not* caused by the filter on top of the RuntimeFilter operator. The bug is in how RuntimeFilter generation decides which left-side and right-side fields of the join condition are used in the BloomFilter. It does not use the ordinals of the right keys; instead, for each left key it takes the right-side field name by position, starting at index 0. Hence, when the order of the filter and join conditions changes, the ordinals of the right-side fields change and the bloom filter is generated for the wrong right-side field.

For example: in case 1 below, the bloom filter is generated on the right-side column c_mktsegment instead of c_custkey, whereas in case 2 it is generated on the right-side column c_custkey.

*Case 1:*
{code:java}
01-04 HashJoin(condition=[=($2, $0)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY o_custkey, ANY c_mktsegment, ANY c_custkey): rowcount = 1.5E7, cumulative cost = {6.3675E7 rows, 2.38725E8 cpu, 1.8E7 io, 3.87072E9 network, 396.05 memory}, id = 65202{code}
1 row selected (3.654 seconds)
{code:java}
0: jdbc:drill:drillbits=10.10.100.188> select count(*)
. . . . . . . . . . . . . . semicolon> from
. . . . . . . . . . . . . . semicolon> customer c,
. . . . . . . . . . . . . . semicolon> orders o
. . . . . . . . . . . . . . semicolon> where c.c_mktsegment = 'HOUSEHOLD'
. . . . . . . . . . . . . . semicolon> and c.c_custkey = o.o_custkey;{code}
+---------+
| EXPR$0  |
+---------+
| 19826   |
+---------+

*Case 2:*
{code:java}
01-04 HashJoin(condition=[=($1, $0)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY o_custkey, ANY c_custkey, ANY c_mktsegment): rowcount = 1.5E7, cumulative cost = {6.3675E7 rows, 2.38725E8 cpu, 1.8E7 io, 3.87072E9 network, 396.05 memory}, id = 66134{code}
1 row selected (1.328 seconds)
{code:java}
0: jdbc:drill:drillbits=10.10.100.188> select count(*)
. . . . . . . . . . . . . . semicolon> from
. . . . . . . . . . . . . . semicolon> customer c,
. . . . . . . . . . . . . . semicolon> orders o
. . . . . . . . . . . . . . semicolon> where c.c_custkey = o.o_custkey and
. . . . . . . . . . . . . . semicolon> c.c_mktsegment = 'HOUSEHOLD'
. . . . . . . . . . . . . . semicolon> ;{code}
+----------+
| EXPR$0   |
+----------+
| 2990828  |
+----------+

> Wrong query result with RuntimeFilter enabled when order of join and filter condition is swapped
> ------------------------------------------------------------------------------------------------
>
> Key: DRILL-7016
> URL: https://issues.apache.org/jira/browse/DRILL-7016
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.16.0
> Reporter: Sorabh Hamirwasia
> Assignee: Sorabh Hamirwasia
> Priority: Major
> Fix For: 1.16.0
>
> Below 2 queries generate different results:
> *Query1: Result: 19826*
> {code:java}
> select count(*)
> from
> customer c,
> orders o
> where
> c.c_mktsegment = 'HOUSEHOLD'
> and c.c_custkey = o.o_custkey
> {code}
> *Query2: Result: 2990828*
> {code:java}
> select count(*)
> from
> customer c,
> orders o
> where
> c.c_custkey = o.o_custkey and
> c.c_mktsegment = 'HOUSEHOLD'
> {code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
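The ordinal mix-up described in the comment on DRILL-7016 can be illustrated with a small Python sketch. This is not Drill's code: both function names are hypothetical, and the field lists are copied from the two HashJoin row types in the comment, assuming the single left-side field is o_custkey and the right-side key ordinals are relative to the right input (so `$2` in case 1 is right-side index 1, and `$1` in case 2 is right-side index 0).

```python
def pick_right_fields_buggy(left_keys, right_keys, right_fields):
    """Mimics the reported bug: ignores right_keys and walks the
    right-side fields by position 0, 1, ... once per left key."""
    return [right_fields[i] for i in range(len(left_keys))]


def pick_right_fields_by_ordinal(left_keys, right_keys, right_fields):
    """Assumed fix: look each right-side field up by its key ordinal."""
    return [right_fields[ordinal] for ordinal in right_keys]


# Right-side (customer) fields as they appear in the two plans.
case1_right_fields = ["c_mktsegment", "c_custkey"]  # filter condition written first
case2_right_fields = ["c_custkey", "c_mktsegment"]  # join condition written first

# One join key on each side; left key 0 is o_custkey in both cases.
case1 = dict(left_keys=[0], right_keys=[1], right_fields=case1_right_fields)
case2 = dict(left_keys=[0], right_keys=[0], right_fields=case2_right_fields)

for name, case in (("case 1", case1), ("case 2", case2)):
    print(name,
          "buggy:", pick_right_fields_buggy(**case),
          "by ordinal:", pick_right_fields_by_ordinal(**case))
```

In this sketch the positional lookup only agrees with the ordinal lookup when the join key happens to be the first right-side field, which matches the observation that case 2 builds the filter on c_custkey while case 1 builds it on c_mktsegment.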
[jira] [Updated] (DRILL-7016) Wrong query result with RuntimeFilter enabled when order of join and filter condition is swapped
[ https://issues.apache.org/jira/browse/DRILL-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-7016: - Reviewer: weijie.tong > Wrong query result with RuntimeFilter enabled when order of join and filter > condition is swapped > > > Key: DRILL-7016 > URL: https://issues.apache.org/jira/browse/DRILL-7016 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.16.0 >Reporter: Sorabh Hamirwasia >Assignee: Sorabh Hamirwasia >Priority: Major > Fix For: 1.16.0 > > > Below 2 queries generate different results: > *Query1: Result: 19826* > {code:java} > select count(*) > from > customer c, > orders o > where > c.c_mktsegment = 'HOUSEHOLD' > and c.c_custkey = o.o_custkey > {code} > *Query2: Result: 2990828* > {code:java} > select count(*) > from > customer c, > orders o > where > c.c_custkey = o.o_custkey and > c.c_mktsegment = 'HOUSEHOLD' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7016) Wrong query result with RuntimeFilter enabled when order of join and filter condition is swapped
Sorabh Hamirwasia created DRILL-7016: Summary: Wrong query result with RuntimeFilter enabled when order of join and filter condition is swapped Key: DRILL-7016 URL: https://issues.apache.org/jira/browse/DRILL-7016 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.16.0 Reporter: Sorabh Hamirwasia Assignee: Sorabh Hamirwasia Fix For: 1.16.0 Below 2 queries generate different results: *Query1: Result: 19826* {code:java} select count(*) from customer c, orders o where c.c_mktsegment = 'HOUSEHOLD' and c.c_custkey = o.o_custkey {code} *Query2: Result: 2990828* {code:java} select count(*) from customer c, orders o where c.c_custkey = o.o_custkey and c.c_mktsegment = 'HOUSEHOLD' {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7014) Format plugin for LTSV files
[ https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755547#comment-16755547 ] Charles Givre commented on DRILL-7014: -- [~priteshm] I can review this. The initial PR didn't pass Travis CI, but I'll post comments anyway in the next day or so. > Format plugin for LTSV files > > > Key: DRILL-7014 > URL: https://issues.apache.org/jira/browse/DRILL-7014 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.15.0 >Reporter: Takako Shimamoto >Assignee: Takako Shimamoto >Priority: Major > Fix For: 1.16.0 > > > I would like to contribute [this > plugin|https://github.com/bizreach/drill-ltsv-plugin] to Drill. > h4. Abstract > storage-plugins-override.conf > {code:json} > "storage":{ > dfs: { > type: "file", > connection: "file:///", > formats: { > "ltsv": { > "type": "ltsv", > "extensions": [ > "ltsv" > ] > } > }, > enabled: true > } > } > {code} > sample.ltsv > {code} > time:30/Nov/2016:00:55:08 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET > /v1/xxx HTTP/1.1 status:200 size:4968 referer:- ua:Java/1.8.0_131 > reqtime:2.532 apptime:2.532 vhost:api.example.com > time:30/Nov/2016:00:56:37 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET > /v1/yyy HTTP/1.1 status:200 size:412 referer:- ua:Java/1.8.0_201 > reqtime:3.580 apptime:3.580 vhost:api.example.com > {code} > Run query > {code:sh} > root@1805183e9b65:/apache-drill-1.15.0# ./bin/drill-embedded > Apache Drill 1.15.0 > "Drill must go on." 
> 0: jdbc:drill:zk=local> SELECT * FROM > dfs.`/apache-drill-1.15.0/sample-data/sample.ltsv` WHERE reqtime > 3.0; > +-+--+---+---+-+---+--+-+--+--+--+ > |time | host | forwardedfor | > req | status | size | referer | ua| reqtime | > apptime | vhost | > +-+--+---+---+-+---+--+-+--+--+--+ > | 30/Nov/2016:00:56:37 +0900 | xxx.xxx.xxx.xxx | - | GET > /v1/yyy HTTP/1.1 | 200 | 412 | -| Java/1.8.0_201 | 3.580| > 3.580| api.example.com | > +-+--+---+---+-+---+--+-+--+--+--+ > 1 row selected (6.074 seconds) > 0: jdbc:drill:zk=local> > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
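For readers unfamiliar with the LTSV layout used in sample.ltsv above, here is a minimal Python sketch of how one record can be parsed. It is independent of the proposed Drill plugin and says nothing about how that plugin is implemented; `parse_ltsv_line` is a hypothetical helper. It assumes fields are separated by tab characters (the tabs appear as spaces in this email) and that each field is split at its first colon into label and value.

```python
def parse_ltsv_line(line):
    """Parse one LTSV record into a dict of label -> value.

    Fields are tab-separated; each field is 'label:value', split at the
    first colon so values such as timestamps may contain colons.
    """
    record = {}
    for field in line.rstrip("\r\n").split("\t"):
        if not field:
            continue  # tolerate empty fields, e.g. a trailing tab
        label, sep, value = field.partition(":")
        if not sep:
            raise ValueError("field without a label separator: %r" % field)
        record[label] = value
    return record


# Second record of sample.ltsv, rebuilt with explicit tab separators.
sample = "\t".join([
    "time:30/Nov/2016:00:56:37 +0900",
    "host:xxx.xxx.xxx.xxx",
    "forwardedfor:-",
    "req:GET /v1/yyy HTTP/1.1",
    "status:200",
    "size:412",
    "referer:-",
    "ua:Java/1.8.0_201",
    "reqtime:3.580",
    "apptime:3.580",
    "vhost:api.example.com",
])

row = parse_ltsv_line(sample)
# Rough analogue of the WHERE reqtime > 3.0 predicate in the query above.
matches = float(row["reqtime"]) > 3.0
```

All values come back as strings in this sketch, so a numeric comparison needs an explicit conversion, as in the last line.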
[jira] [Updated] (DRILL-7014) Format plugin for LTSV files
[ https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-7014: - Fix Version/s: 1.16.0 > Format plugin for LTSV files > > > Key: DRILL-7014 > URL: https://issues.apache.org/jira/browse/DRILL-7014 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.15.0 >Reporter: Takako Shimamoto >Assignee: Takako Shimamoto >Priority: Major > Fix For: 1.16.0 > > > I would like to contribute [this > plugin|https://github.com/bizreach/drill-ltsv-plugin] to Drill. > h4. Abstract > storage-plugins-override.conf > {code:json} > "storage":{ > dfs: { > type: "file", > connection: "file:///", > formats: { > "ltsv": { > "type": "ltsv", > "extensions": [ > "ltsv" > ] > } > }, > enabled: true > } > } > {code} > sample.ltsv > {code} > time:30/Nov/2016:00:55:08 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET > /v1/xxx HTTP/1.1 status:200 size:4968 referer:- ua:Java/1.8.0_131 > reqtime:2.532 apptime:2.532 vhost:api.example.com > time:30/Nov/2016:00:56:37 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET > /v1/yyy HTTP/1.1 status:200 size:412 referer:- ua:Java/1.8.0_201 > reqtime:3.580 apptime:3.580 vhost:api.example.com > {code} > Run query > {code:sh} > root@1805183e9b65:/apache-drill-1.15.0# ./bin/drill-embedded > Apache Drill 1.15.0 > "Drill must go on." > 0: jdbc:drill:zk=local> SELECT * FROM > dfs.`/apache-drill-1.15.0/sample-data/sample.ltsv` WHERE reqtime > 3.0; > +-+--+---+---+-+---+--+-+--+--+--+ > |time | host | forwardedfor | > req | status | size | referer | ua| reqtime | > apptime | vhost | > +-+--+---+---+-+---+--+-+--+--+--+ > | 30/Nov/2016:00:56:37 +0900 | xxx.xxx.xxx.xxx | - | GET > /v1/yyy HTTP/1.1 | 200 | 412 | -| Java/1.8.0_201 | 3.580| > 3.580| api.example.com | > +-+--+---+---+-+---+--+-+--+--+--+ > 1 row selected (6.074 seconds) > 0: jdbc:drill:zk=local> > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7014) Format plugin for LTSV files
[ https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-7014: - Reviewer: Charles Givre > Format plugin for LTSV files > > > Key: DRILL-7014 > URL: https://issues.apache.org/jira/browse/DRILL-7014 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.15.0 >Reporter: Takako Shimamoto >Assignee: Takako Shimamoto >Priority: Major > Fix For: 1.16.0 > > > I would like to contribute [this > plugin|https://github.com/bizreach/drill-ltsv-plugin] to Drill. > h4. Abstract > storage-plugins-override.conf > {code:json} > "storage":{ > dfs: { > type: "file", > connection: "file:///", > formats: { > "ltsv": { > "type": "ltsv", > "extensions": [ > "ltsv" > ] > } > }, > enabled: true > } > } > {code} > sample.ltsv > {code} > time:30/Nov/2016:00:55:08 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET > /v1/xxx HTTP/1.1 status:200 size:4968 referer:- ua:Java/1.8.0_131 > reqtime:2.532 apptime:2.532 vhost:api.example.com > time:30/Nov/2016:00:56:37 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET > /v1/yyy HTTP/1.1 status:200 size:412 referer:- ua:Java/1.8.0_201 > reqtime:3.580 apptime:3.580 vhost:api.example.com > {code} > Run query > {code:sh} > root@1805183e9b65:/apache-drill-1.15.0# ./bin/drill-embedded > Apache Drill 1.15.0 > "Drill must go on." > 0: jdbc:drill:zk=local> SELECT * FROM > dfs.`/apache-drill-1.15.0/sample-data/sample.ltsv` WHERE reqtime > 3.0; > +-+--+---+---+-+---+--+-+--+--+--+ > |time | host | forwardedfor | > req | status | size | referer | ua| reqtime | > apptime | vhost | > +-+--+---+---+-+---+--+-+--+--+--+ > | 30/Nov/2016:00:56:37 +0900 | xxx.xxx.xxx.xxx | - | GET > /v1/yyy HTTP/1.1 | 200 | 412 | -| Java/1.8.0_201 | 3.580| > 3.580| api.example.com | > +-+--+---+---+-+---+--+-+--+--+--+ > 1 row selected (6.074 seconds) > 0: jdbc:drill:zk=local> > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-7014) Format plugin for LTSV files
[ https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker reassigned DRILL-7014: Assignee: Takako Shimamoto > Format plugin for LTSV files > > > Key: DRILL-7014 > URL: https://issues.apache.org/jira/browse/DRILL-7014 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.15.0 >Reporter: Takako Shimamoto >Assignee: Takako Shimamoto >Priority: Major > > I would like to contribute [this > plugin|https://github.com/bizreach/drill-ltsv-plugin] to Drill. > h4. Abstract > storage-plugins-override.conf > {code:json} > "storage":{ > dfs: { > type: "file", > connection: "file:///", > formats: { > "ltsv": { > "type": "ltsv", > "extensions": [ > "ltsv" > ] > } > }, > enabled: true > } > } > {code} > sample.ltsv > {code} > time:30/Nov/2016:00:55:08 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET > /v1/xxx HTTP/1.1 status:200 size:4968 referer:- ua:Java/1.8.0_131 > reqtime:2.532 apptime:2.532 vhost:api.example.com > time:30/Nov/2016:00:56:37 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET > /v1/yyy HTTP/1.1 status:200 size:412 referer:- ua:Java/1.8.0_201 > reqtime:3.580 apptime:3.580 vhost:api.example.com > {code} > Run query > {code:sh} > root@1805183e9b65:/apache-drill-1.15.0# ./bin/drill-embedded > Apache Drill 1.15.0 > "Drill must go on." > 0: jdbc:drill:zk=local> SELECT * FROM > dfs.`/apache-drill-1.15.0/sample-data/sample.ltsv` WHERE reqtime > 3.0; > +-+--+---+---+-+---+--+-+--+--+--+ > |time | host | forwardedfor | > req | status | size | referer | ua| reqtime | > apptime | vhost | > +-+--+---+---+-+---+--+-+--+--+--+ > | 30/Nov/2016:00:56:37 +0900 | xxx.xxx.xxx.xxx | - | GET > /v1/yyy HTTP/1.1 | 200 | 412 | -| Java/1.8.0_201 | 3.580| > 3.580| api.example.com | > +-+--+---+---+-+---+--+-+--+--+--+ > 1 row selected (6.074 seconds) > 0: jdbc:drill:zk=local> > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7014) Format plugin for LTSV files
[ https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755545#comment-16755545 ] Pritesh Maker commented on DRILL-7014: -- [~cgivre] would you be able to review this contribution? > Format plugin for LTSV files > > > Key: DRILL-7014 > URL: https://issues.apache.org/jira/browse/DRILL-7014 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.15.0 >Reporter: Takako Shimamoto >Assignee: Takako Shimamoto >Priority: Major > > I would like to contribute [this > plugin|https://github.com/bizreach/drill-ltsv-plugin] to Drill. > h4. Abstract > storage-plugins-override.conf > {code:json} > "storage":{ > dfs: { > type: "file", > connection: "file:///", > formats: { > "ltsv": { > "type": "ltsv", > "extensions": [ > "ltsv" > ] > } > }, > enabled: true > } > } > {code} > sample.ltsv > {code} > time:30/Nov/2016:00:55:08 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET > /v1/xxx HTTP/1.1 status:200 size:4968 referer:- ua:Java/1.8.0_131 > reqtime:2.532 apptime:2.532 vhost:api.example.com > time:30/Nov/2016:00:56:37 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET > /v1/yyy HTTP/1.1 status:200 size:412 referer:- ua:Java/1.8.0_201 > reqtime:3.580 apptime:3.580 vhost:api.example.com > {code} > Run query > {code:sh} > root@1805183e9b65:/apache-drill-1.15.0# ./bin/drill-embedded > Apache Drill 1.15.0 > "Drill must go on." 
> 0: jdbc:drill:zk=local> SELECT * FROM > dfs.`/apache-drill-1.15.0/sample-data/sample.ltsv` WHERE reqtime > 3.0; > +-+--+---+---+-+---+--+-+--+--+--+ > |time | host | forwardedfor | > req | status | size | referer | ua| reqtime | > apptime | vhost | > +-+--+---+---+-+---+--+-+--+--+--+ > | 30/Nov/2016:00:56:37 +0900 | xxx.xxx.xxx.xxx | - | GET > /v1/yyy HTTP/1.1 | 200 | 412 | -| Java/1.8.0_201 | 3.580| > 3.580| api.example.com | > +-+--+---+---+-+---+--+-+--+--+--+ > 1 row selected (6.074 seconds) > 0: jdbc:drill:zk=local> > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6991) Kerberos ticket is being dumped in the log if log level is "debug" for stdout
[ https://issues.apache.org/jira/browse/DRILL-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755543#comment-16755543 ] Pritesh Maker commented on DRILL-6991: -- [~shamirwasia] do you recommend we close this issue? > Kerberos ticket is being dumped in the log if log level is "debug" for stdout > -- > > Key: DRILL-6991 > URL: https://issues.apache.org/jira/browse/DRILL-6991 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Anton Gozhiy >Priority: Major > > *Prerequisites:* > # Drill is installed on cluster with Kerberos security > # Into conf/logback.xml, set the following log level: > {code:xml} > > > > > {code} > *Steps:* > # Start Drill > # Connect using sqlline using the following string: > {noformat} > bin/sqlline -u "jdbc:drill:zk=;principal=" > {noformat} > *Expected result:* > No sensitive information should be displayed > *Actual result:* > Kerberos ticket and session key are being dumped into console output: > {noformat} > 14:35:38.806 [TGT Renewer for mapr/node1.cluster.com@NODE1] DEBUG > o.a.h.security.UserGroupInformation - Found tgt Ticket (hex) = > : 61 82 01 3D 30 82 01 39 A0 03 02 01 05 A1 07 1B a..=0..9 > 0010: 05 4E 4F 44 45 31 A2 1A 30 18 A0 03 02 01 02 A1 .NODE1..0... > 0020: 11 30 0F 1B 06 6B 72 62 74 67 74 1B 05 4E 4F 44 .0...krbtgt..NOD > 0030: 45 31 A3 82 01 0B 30 82 01 07 A0 03 02 01 12 A1 E10. > 0040: 03 02 01 01 A2 81 FA 04 81 F7 03 8D A9 FA 7D 89 > 0050: 1B DF 37 B7 4D E6 6C 99 3E 8F FA 48 D9 9A 79 F3 ..7.M.l.>..H..y. > 0060: 92 34 7F BF 67 1E 77 4A 2F C9 AF 82 93 4E 46 1D .4..g.wJ/NF. > 0070: 41 74 B0 AF 41 A8 8B 02 71 83 CC 14 51 72 60 EE At..A...q...Qr`. > 0080: 29 67 14 F0 A6 33 63 07 41 AA 8D DC 7B 5B 41 F3 )g...3c.A[A. > 0090: 83 48 8B 2A 0B 4D 6D 57 9A 6E CF 6B DC 0B C0 D1 .H.*.MmW.n.k > 00A0: 83 BB 27 40 88 7E 9F 2B D1 FD A8 6A E1 BF F6 CC ..'@...+...j > 00B0: 0E 0C FB 93 5D 69 9A 8B 11 88 0C F2 7C E1 FD 04 ]i.. 
> 00C0: F5 AB 66 0C A4 A4 7B 30 D1 7F F1 2D D6 A1 52 D1 ..f0...-..R. > 00D0: 79 59 F2 06 CB 65 FB 73 63 1D 5B E9 4F 28 73 EB yY...e.sc.[.O(s. > 00E0: 72 7F 04 46 34 56 F4 40 6C C0 2C 39 C0 5B C6 25 r..F4V.@l.,9.[.% > 00F0: ED EF 64 07 CE ED 35 9D D7 91 6C 8F C9 CE 16 F5 ..d...5...l. > 0100: CA 5E 6F DE 08 D2 68 30 C7 03 97 E7 C0 FF D9 52 .^o...h0...R > 0110: F8 1D 2F DB 63 6D 12 4A CD 60 AD D0 BA FA 4B CF ../.cm.J.`K. > 0120: 2C B9 8C CA 5A E6 EC 10 5A 0A 1F 84 B0 80 BD 39 ,...Z...Z..9 > 0130: 42 2C 33 EB C0 AA 0D 44 F0 F4 E9 87 24 43 BB 9A B,3D$C.. > 0140: 52 R > Client Principal = mapr/node1.cluster.com@NODE1 > Server Principal = krbtgt/NODE1@NODE1 > Session Key = EncryptionKey: keyType=18 keyBytes (hex dump)= > : 50 DA D1 D7 91 D3 64 BE 45 7B D8 02 25 81 18 25 P.d.E...%..% > 0010: DA 59 4F BA 76 67 BB 39 9C F7 17 46 A7 C5 00 E2 .YO.vg.9...F > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6855) Query from non-existent proxy user fails with "No default schema selected" when impersonation is enabled
[ https://issues.apache.org/jira/browse/DRILL-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-6855:
---------------------------------
    Fix Version/s: 1.16.0

> Query from non-existent proxy user fails with "No default schema selected" when impersonation is enabled
> ---------------------------------------------------------------------------------------------------------
>
> Key: DRILL-6855
> URL: https://issues.apache.org/jira/browse/DRILL-6855
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.15.0
> Reporter: Abhishek Ravi
> Assignee: Abhishek Ravi
> Priority: Major
> Fix For: 1.16.0
>
> A query from a *proxy user* fails with the following error when *impersonation* is *enabled* but the user does not exist. This behaviour was discovered when running Drill on MapR.
> {noformat}
> Error: VALIDATION ERROR: Schema [[dfs]] is not valid with respect to either root schema or current default schema.
> Current default schema: No default schema selected
> {noformat}
> The above error is very confusing and makes it very hard to relate the failure to the actual cause: the proxy user does not exist while impersonation is enabled.
> The {{fs.access(wsPath, FsAction.READ)}} call in {{WorkspaceSchemaFactory.accessible}} fails with {{IOException}}, which is not handled in {{accessible}} but in {{DynamicRootSchema.loadSchemaFactory}}. At this point none of the schemas are registered, and hence the root schema will be registered as the default schema.
> The query execution continues and fails much later, at {{DrillSqlWorker.getQueryPlan}}, where {{SqlConverter.validate}} eventually throws via {{SchemaUtilites.throwSchemaNotFoundException}}.
> One possible fix could be to handle {{IOException}} similar to {{FileNotFoundException}} in {{WorkspaceSchemaFactory.accessible}}.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7015) Improve documentation for PARTITION BY
Boaz Ben-Zvi created DRILL-7015:
-----------------------------------

Summary: Improve documentation for PARTITION BY
Key: DRILL-7015
URL: https://issues.apache.org/jira/browse/DRILL-7015
Project: Apache Drill
Issue Type: Improvement
Components: Documentation
Affects Versions: 1.15.0
Reporter: Boaz Ben-Zvi
Assignee: Bridget Bevens
Fix For: 1.16.0

The documentation for CREATE TABLE AS (CTAS) shows the syntax of the command without the optional PARTITION BY clause. That option is only mentioned later, under the usage notes.

*+_Suggestion_+*: Add this optional clause to the syntax (same as for CREATE TEMPORARY TABLE (CTTAS)), and mention that this option is only applicable when storing in Parquet.

In the documentation for CREATE TEMPORARY TABLE (CTTAS), the comment says:
{panel}
An optional parameter that can *only* be used to create temporary tables with the Parquet data format.
{panel}
This can mistakenly be understood as "only for temporary tables".

*_+Suggestion+_*: Erase the "to create temporary tables" part (not needed, as it is implied by the context of this page).

*_+Last suggestion+_*: In the documentation for the PARTITION BY clause, an example using the implicit column "filename" could be added to demonstrate how the partitioning column puts each distinct value into a separate file. For example, add in the "Other Examples" section:
{noformat}
0: jdbc:drill:zk=local> select distinct r_regionkey, filename from mytable1;
+--------------+----------------+
| r_regionkey  |    filename    |
+--------------+----------------+
| 2            | 0_0_3.parquet  |
| 1            | 0_0_2.parquet  |
| 0            | 0_0_1.parquet  |
| 3            | 0_0_4.parquet  |
| 4            | 0_0_5.parquet  |
+--------------+----------------+
{noformat}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.
[ https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanumath Rao Maduri updated DRILL-6997: --- Labels: ready-to-commit (was: ) > Semijoin is changing the join ordering for some tpcds queries. > -- > > Key: DRILL-6997 > URL: https://issues.apache.org/jira/browse/DRILL-6997 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Hanumath Rao Maduri >Assignee: Hanumath Rao Maduri >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, > 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill > > > TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join > disabled at scale factor 100. It runs 100% slower at scale factor 1000. This > issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. > DRILL-6798: Planner changes to support semi-join. > {code:java} > with ws_wh as > (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2 > from web_sales ws1,web_sales ws2 > where ws1.ws_order_number = ws2.ws_order_number > and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) > [_LIMITA] select [_LIMITB] > count(distinct ws_order_number) as "order count" > ,sum(ws_ext_ship_cost) as "total shipping cost" > ,sum(ws_net_profit) as "total net profit" > from > web_sales ws1 > ,date_dim > ,customer_address > ,web_site > where > d_date between '[YEAR]-[MONTH]-01' and > (cast('[YEAR]-[MONTH]-01' as date) + 60 days) > and ws1.ws_ship_date_sk = d_date_sk > and ws1.ws_ship_addr_sk = ca_address_sk > and ca_state = '[STATE]' > and ws1.ws_web_site_sk = web_site_sk > and web_company_name = 'pri' > and ws1.ws_order_number in (select ws_order_number > from ws_wh) > and ws1.ws_order_number in (select wr_order_number > from web_returns,ws_wh > where wr_order_number = ws_wh.ws_order_number) > order by count(distinct ws_order_number) > [_LIMITC]; > {code} > 
I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has > semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join > disabled. Both are executed with commit id > 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100. > The plan with semi-join enabled has moved the first hash join: > and ws1.ws_order_number in (select ws_order_number > from ws_wh) > It used to be on the build side of the first HJ on the left hand side > (04-05). It is now on the build side of the fourth HJ on the left hand side > (01-13). > The plan with semi-join enabled has a hash_partition_sender (operator 05-00) > that takes 10 seconds to execute. But all the fragments take about the same > amount of time. > The plan with semi-join enabled has two HJ that process 1B rows while the > plan with semi-joins disabled has one HJ that processes 1B rows. > The plan with semi-join enabled has several senders and receivers that wait > more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, > 17-00). When disabled, there is no operator waiting more than 10 seconds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7011) Allow hybrid model in the Row set-based scan framework
[ https://issues.apache.org/jira/browse/DRILL-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7011: Description: As part of schema provisioning project we want to allow hybrid model for Row set-based scan framework, namely to allow to pass custom schema metadata which can be partial. Currently schema provisioning has SchemaContainer class that contains the following information (can be obtained from metastore, schema file, table function): 1. schema represented by org.apache.drill.exec.record.metadata.TupleMetadata 2. properties represented by Map, can contain information if schema is strict or partial (default is partial) etc. was: As part of schema provisioning project we want to allow hybrid model for Row set-based scan framework, namely to allow to pass custom schema metadata which can be partial. Currently schema provisioning has SchemaContainer class that contains the following information (can be obtained from metastore, schema file, table function): 1. table schema represented by org.apache.drill.exec.record.metadata.TupleMetadata 2. table properties represented by Map, can contain information if schema is strict or partial (default is partial) etc. > Allow hybrid model in the Row set-based scan framework > -- > > Key: DRILL-7011 > URL: https://issues.apache.org/jira/browse/DRILL-7011 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.15.0 >Reporter: Arina Ielchiieva >Assignee: Paul Rogers >Priority: Major > Fix For: 1.16.0 > > > As part of schema provisioning project we want to allow hybrid model for Row > set-based scan framework, namely to allow to pass custom schema metadata > which can be partial. > Currently schema provisioning has SchemaContainer class that contains the > following information (can be obtained from metastore, schema file, table > function): > 1. schema represented by org.apache.drill.exec.record.metadata.TupleMetadata > 2. 
properties represented by Map, can contain information if > schema is strict or partial (default is partial) etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7011) Allow hybrid model in the Row set-based scan framework
[ https://issues.apache.org/jira/browse/DRILL-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7011: Description: As part of schema provisioning project we want to allow hybrid model for Row set-based scan framework, namely to allow to pass custom schema metadata which can be partial. Currently schema provisioning has SchemaContainer class that contains the following information (can be obtained from metastore, schema file, table function): 1. table schema represented by org.apache.drill.exec.record.metadata.TupleMetadata 2. table properties represented by Map, can contain information if schema is strict or partial (default is partial) etc. was: As part of schema provisioning project we want to allow hybrid model for Row set-based scan framework, namely to allow to pass custom schema metadata which can be partial. Currently schema provisioning has TableSchema class that contains the following information (can be obtained from metastore, schema file, table function): 1. table schema represented by org.apache.drill.exec.record.metadata.TupleMetadata 2. table properties represented by Map, can contain information if schema is strict or partial (default is partial) etc. > Allow hybrid model in the Row set-based scan framework > -- > > Key: DRILL-7011 > URL: https://issues.apache.org/jira/browse/DRILL-7011 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.15.0 >Reporter: Arina Ielchiieva >Assignee: Paul Rogers >Priority: Major > Fix For: 1.16.0 > > > As part of schema provisioning project we want to allow hybrid model for Row > set-based scan framework, namely to allow to pass custom schema metadata > which can be partial. > Currently schema provisioning has SchemaContainer class that contains the > following information (can be obtained from metastore, schema file, table > function): > 1. table schema represented by > org.apache.drill.exec.record.metadata.TupleMetadata > 2. 
table properties represented by Map, can contain > information if schema is strict or partial (default is partial) etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (DRILL-7002) RuntimeFilter produce wrong results while setting exec.hashjoin.num_partitions=1
[ https://issues.apache.org/jira/browse/DRILL-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reopened DRILL-7002: - > RuntimeFilter produce wrong results while setting > exec.hashjoin.num_partitions=1 > > > Key: DRILL-7002 > URL: https://issues.apache.org/jira/browse/DRILL-7002 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.16.0 >Reporter: weijie.tong >Assignee: Arina Ielchiieva >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-7002) RuntimeFilter produce wrong results while setting exec.hashjoin.num_partitions=1
[ https://issues.apache.org/jira/browse/DRILL-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reassigned DRILL-7002: --- Assignee: weijie.tong (was: Arina Ielchiieva) > RuntimeFilter produce wrong results while setting > exec.hashjoin.num_partitions=1 > > > Key: DRILL-7002 > URL: https://issues.apache.org/jira/browse/DRILL-7002 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.16.0 >Reporter: weijie.tong >Assignee: weijie.tong >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-7002) RuntimeFilter produce wrong results while setting exec.hashjoin.num_partitions=1
[ https://issues.apache.org/jira/browse/DRILL-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reassigned DRILL-7002: --- Assignee: Arina Ielchiieva (was: weijie.tong) > RuntimeFilter produce wrong results while setting > exec.hashjoin.num_partitions=1 > > > Key: DRILL-7002 > URL: https://issues.apache.org/jira/browse/DRILL-7002 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.16.0 >Reporter: weijie.tong >Assignee: Arina Ielchiieva >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7014) Format plugin for LTSV files
Takako Shimamoto created DRILL-7014: --- Summary: Format plugin for LTSV files Key: DRILL-7014 URL: https://issues.apache.org/jira/browse/DRILL-7014 Project: Apache Drill Issue Type: New Feature Components: Storage - Other Affects Versions: 1.15.0 Reporter: Takako Shimamoto I would like to contribute [this plugin|https://github.com/bizreach/drill-ltsv-plugin] to Drill. h4. Abstract storage-plugins-override.conf {code:json} "storage":{ dfs: { type: "file", connection: "file:///", formats: { "ltsv": { "type": "ltsv", "extensions": [ "ltsv" ] } }, enabled: true } } {code} sample.ltsv {code} time:30/Nov/2016:00:55:08 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET /v1/xxx HTTP/1.1 status:200 size:4968 referer:- ua:Java/1.8.0_131 reqtime:2.532 apptime:2.532 vhost:api.example.com time:30/Nov/2016:00:56:37 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET /v1/yyy HTTP/1.1 status:200 size:412 referer:- ua:Java/1.8.0_201 reqtime:3.580 apptime:3.580 vhost:api.example.com {code} Run query {code:sh} root@1805183e9b65:/apache-drill-1.15.0# ./bin/drill-embedded Apache Drill 1.15.0 "Drill must go on." 0: jdbc:drill:zk=local> SELECT * FROM dfs.`/apache-drill-1.15.0/sample-data/sample.ltsv` WHERE reqtime > 3.0; +-+--+---+---+-+---+--+-+--+--+--+ |time | host | forwardedfor | req | status | size | referer | ua| reqtime | apptime | vhost | +-+--+---+---+-+---+--+-+--+--+--+ | 30/Nov/2016:00:56:37 +0900 | xxx.xxx.xxx.xxx | - | GET /v1/yyy HTTP/1.1 | 200 | 412 | -| Java/1.8.0_201 | 3.580| 3.580 | api.example.com | +-+--+---+---+-+---+--+-+--+--+--+ 1 row selected (6.074 seconds) 0: jdbc:drill:zk=local> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7006) Support type conversion shims in RowSetWriter
[ https://issues.apache.org/jira/browse/DRILL-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7006:
------------------------------------
    Labels: ready-to-commit (was: )

> Support type conversion shims in RowSetWriter
> ---------------------------------------------
>
> Key: DRILL-7006
> URL: https://issues.apache.org/jira/browse/DRILL-7006
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.15.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Labels: ready-to-commit
> Fix For: 1.16.0
>
> The {{RowSet}} tools include a {{RowSetWriter}} for populating a batch of vectors. A set of "column writers" exists: one for each kind of vector. These classes provide methods to write a value into a vector. For example, the {{VarcharColumnWriter}} provides a {{setString()}} method to set the value.
> The current writers provide only "natural" conversions: from Java String to Varchar, from Java Double to FLOAT8, and so on. That is, the methods implemented for each type are those that provide a single, unambiguous conversion.
> This ticket asks to add a translation layer: to allow, say, writing an Int column using a String (parsed according to some rules), or converting from a string to a Date using some format.
> The goal is not to provide the type conversions themselves; rather, it is to provide a way to insert the type conversion "shim" on top of the "native" column writer in a way that is transparent to code using the row set writer.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
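The idea of a conversion "shim" layered over a "native" column writer, as described in DRILL-7006, can be sketched in a few lines of Python. This is only an illustration of the layering, not the design adopted in Drill: `IntColumnWriter`, `StringToIntShim`, `set_int`, and `set_string` are hypothetical names, and the parsing rule (strip whitespace, then parse as a base-10 integer) is an assumption chosen for the example.

```python
class IntColumnWriter:
    """Hypothetical 'native' writer: accepts only its natural type."""

    def __init__(self):
        self.values = []  # stands in for the underlying vector

    def set_int(self, value):
        if not isinstance(value, int):
            raise TypeError("IntColumnWriter accepts int values only")
        self.values.append(value)


class StringToIntShim:
    """Hypothetical shim: adds a string entry point on top of the native
    writer and forwards the natural method unchanged."""

    def __init__(self, base_writer):
        self._base = base_writer

    def set_string(self, text):
        # Assumed conversion rule for this sketch: trim, then parse base 10.
        self._base.set_int(int(text.strip()))

    def set_int(self, value):
        self._base.set_int(value)


native = IntColumnWriter()
writer = StringToIntShim(native)  # callers hold the shim, not the native writer
writer.set_string(" 42 ")
writer.set_int(7)
```

Because the shim exposes the same natural method as the writer it wraps, code that already calls `set_int` does not need to know whether a shim is present, which is the transparency the ticket asks for.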