[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656001#comment-15656001 ] ASF GitHub Bot commented on DRILL-5015: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87531891 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +224,94 @@ public void connect(Properties props) throws RpcException {
     connect(null, props);
   }

+  /**
+   * Function to populate the endpointList with the list of drillbits
+   * provided in the connection string by the client.
+   *
+   * For a direct connection the URL string can carry the drillbit property as below:
+   * drillbit=<ip>:<port> --- use the ip and port specified as the Foreman ip and port
+   * drillbit=<ip> --- use the ip specified as the Foreman ip with the default port from the config file
+   * drillbit=<ip1>:<port1>,<ip2>:<port2>... --- randomly select an ip and port pair from the specified list as the
+   * Foreman ip and port.
+   *
+   * @param drillbits drillbit value provided in the connection string
+   * @param defaultUserPort default user port of the drillbit, as specified in the config file
+   * @return list of DrillbitEndpoints parsed from the connection string
+   * @throws InvalidConnectionInfoException if the connection string has invalid or no drillbit information
+   */
+  static List<DrillbitEndpoint> parseAndVerifyEndpoints(String drillbits, String defaultUserPort)
+      throws InvalidConnectionInfoException {
+    // If no drillbit is provided then fail fast.
+    if (drillbits.trim().isEmpty()) {
+      throw new InvalidConnectionInfoException("No drillbit information specified in the connection string");
+    }
+
+    final ArrayList<DrillbitEndpoint> endpointList = new ArrayList<>();
+    final String[] connectInfo = drillbits.split(",");
+
+    // Fetch the ip address and port for each drillbit and populate the list
+    for (String drillbit : connectInfo) {
+
+      // Trim surrounding spaces and skip entries that are empty strings.
+      drillbit = drillbit.trim();
+
+      if (!drillbit.isEmpty()) {
+        // Reject the bare ":" and ":port" patterns, which have no host
+        if (drillbit.charAt(0) == ':') {
+          // Invalid drillbit information
+          throw new InvalidConnectionInfoException("Malformed connection string with drillbit hostname or "
+              + "hostaddress missing for an entry: " + drillbit);
+        }
+
+        // At this point each entry is known to have at least a host part.
+        // Split the entry into its ip address and port value
+        final String[] drillbitInfo = drillbit.split(":");
+
+        // Check if we have more than one port
+        if (drillbitInfo.length > 2) {
+          throw new InvalidConnectionInfoException("Malformed connection string with more than one port in a "
+              + "drillbit entry: " + drillbit);
+        }
+
+        // drillbitInfo[0] holds the hostname or host address; trim any
+        // spaces that might surround it
+        final String ipAddress = drillbitInfo[0].trim();
+        String port = defaultUserPort;
+
+        if (drillbitInfo.length == 2) {
+          // The user also supplied a port value; trim any spaces between
+          // ":" and the port value before validating it.
+          port = drillbitInfo[1].trim();
+        }
+
+        try {
+          final DrillbitEndpoint endpoint = DrillbitEndpoint.newBuilder()
+              .setAddress(ipAddress)
+              .setUserPort(Integer.parseInt(port))
+              .build();
+
+          endpointList.add(endpoint);
+        } catch (NumberFormatException e) {
+          throw new InvalidConnectionInfoException("Malformed port value in entry: " + ipAddress + ":" + port + " "
+              + "passed in connection string");
+        }
+      }
+    }
+    return endpointList;
--- End diff -- One last check: must have at least one endpoint. (The code above skips an entry if it is empty. If that was the only entry, the endpoint list might be empty here.)
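Paul's last point, guaranteeing that at least one endpoint survives parsing, can be sketched as follows. This is a standalone approximation of the parsing logic in the diff, not Drill's actual DrillClient code: the nested `Endpoint` record stands in for the protobuf `DrillbitEndpoint`, and the exception extends `IllegalArgumentException` instead of Drill's `RpcException` so the sketch is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

public class EndpointParser {
    // Stand-in for Drill's protobuf DrillbitEndpoint.
    public record Endpoint(String address, int userPort) {}

    // Stand-in for Drill's InvalidConnectionInfoException (unchecked here for brevity).
    public static class InvalidConnectionInfoException extends IllegalArgumentException {
        public InvalidConnectionInfoException(String message) { super(message); }
    }

    /** Parses "host[:port][,host[:port]...]" into endpoints, applying defaultPort when none is given. */
    public static List<Endpoint> parseAndVerifyEndpoints(String drillbits, String defaultPort) {
        if (drillbits.trim().isEmpty()) {
            throw new InvalidConnectionInfoException("No drillbit information specified");
        }
        List<Endpoint> endpoints = new ArrayList<>();
        for (String entry : drillbits.split(",")) {
            entry = entry.trim();
            if (entry.isEmpty()) {
                continue;  // skip empty entries such as a trailing comma
            }
            if (entry.charAt(0) == ':') {
                throw new InvalidConnectionInfoException("Missing host in entry: " + entry);
            }
            String[] parts = entry.split(":");
            if (parts.length > 2) {
                throw new InvalidConnectionInfoException("More than one port in entry: " + entry);
            }
            String port = (parts.length == 2) ? parts[1].trim() : defaultPort;
            try {
                endpoints.add(new Endpoint(parts[0].trim(), Integer.parseInt(port)));
            } catch (NumberFormatException e) {
                throw new InvalidConnectionInfoException("Malformed port in entry: " + entry);
            }
        }
        // The final check the review asks for: empty entries were skipped above,
        // so the list can still be empty here (e.g. drillbits = " , ").
        if (endpoints.isEmpty()) {
            throw new InvalidConnectionInfoException("No valid drillbit entry in: " + drillbits);
        }
        return endpoints;
    }
}
```

The closing check is what distinguishes this sketch from the diff above: rejecting an all-empty list at the end catches inputs like `","` that pass the initial non-empty test but contribute no endpoints.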
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656000#comment-15656000 ] ASF GitHub Bot commented on DRILL-5015: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87531981 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -223,19 +224,94 @@ public void connect(Properties props) throws RpcException { connect(null, props); } + /** + * Function to populate the endpointList with the list of drillbits + * provided in the connection string by client. + * + * For direct connection we can get URL string having drillbit property as below: --- End diff -- Nit: this is a Javadoc comment, so the text must be formatted using HTML; otherwise placeholders such as <ip>:<port> are swallowed by the HTML renderer and the lines run together as "drillbit=... use the ip and port...". One handy check: in Eclipse, hovering over the method name shows the formatted Javadoc.
> As per documentation, when issuing a list of drillbits in the connection
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - JDBC
> Affects Versions: 1.8.0, 1.9.0
> Reporter: Sorabh Hamirwasia
> Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one drillbit
> to connect to, we always attempt to connect to only the first drillbit.
> As an example, we tested against a pair of drillbits, but we always connect
> to the first entry in the CSV list by querying for the 'current' drillbit.
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline -u
> "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f
> whereAmI.q | grep -v logback
> 1/1 select * from sys.drillbits where `current`;
> +-----------------+------------+---------------+------------+----------+
> |    hostname     | user_port  | control_port  | data_port  | current  |
> +-----------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab  | 31010      | 31011         | 31012      | true     |
> +-----------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0
> "a little sql for your nosql"
> This property is meant for use by clients when not wanting to overload ZK
> for fetching a list of existing Drillbits, but the behaviour doesn't match
> the documentation.
> [Making a Direct Drillbit Connection |
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection
> ]
> We need to randomly shuffle this list, and if an entry in the shuffled
> list is unreachable, we need to try the next entry in the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
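The behavior the report asks for, shuffle the endpoint list and fall through to the next entry when one is unreachable, can be sketched like this. The `tryConnect` predicate is a hypothetical stand-in for the client's actual connection attempt; this is not Drill's DrillClient code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

public class EndpointSelector {
    /**
     * Shuffles a copy of the endpoint list, then returns the first endpoint
     * that tryConnect accepts, or null if every endpoint is unreachable.
     */
    public static String connectToAny(List<String> endpoints, Predicate<String> tryConnect) {
        List<String> shuffled = new ArrayList<>(endpoints);
        Collections.shuffle(shuffled);  // spread client connections across drillbits
        for (String endpoint : shuffled) {
            if (tryConnect.test(endpoint)) {
                return endpoint;  // first reachable endpoint wins
            }
        }
        return null;  // all entries were unreachable
    }
}
```

Shuffling before iterating gives the load-balancing the documentation promises, while the fall-through loop gives the failover: a dead first entry no longer aborts the connection.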
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655998#comment-15655998 ] ASF GitHub Bot commented on DRILL-5015: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87532223 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/InvalidConnectionInfoException.java --- @@ -0,0 +1,35 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.rpc; + +/** + * Exception class to differentiate errors due to malformed connection string from client + */ +public class InvalidConnectionInfoException extends RpcException { + static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(InvalidConnectionInfoException.class); --- End diff -- logger not strictly needed here: we don't log anything. 
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655999#comment-15655999 ] ASF GitHub Bot commented on DRILL-5015: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87532217 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/InvalidConnectionInfoException.java --- @@ -0,0 +1,35 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.rpc; + +/** + * Exception class to differentiate errors due to malformed connection string from client + */ +public class InvalidConnectionInfoException extends RpcException { + static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(InvalidConnectionInfoException.class); + + private final String message; + + public InvalidConnectionInfoException(String message) { --- End diff -- Just use the normal facilities: public Invalid...( String message ) { super( message ); } ... e.getMessage( ) ... 
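Paul's suggestion in the two review comments above, drop the redundant message field and logger and let the superclass constructor carry the message, amounts to the following. `RuntimeException` stands in here for Drill's `RpcException` so the sketch compiles on its own.

```java
// Minimal sketch: the exception delegates its message to the superclass,
// so no private message field, getter, or logger is needed.
public class InvalidConnectionInfoException extends RuntimeException {
    public InvalidConnectionInfoException(String message) {
        super(message);  // Throwable stores the message; getMessage() retrieves it
    }
}
```

Callers then read the message via the standard `e.getMessage()`, exactly as with any other `Throwable`.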
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655464#comment-15655464 ] Robert Hou edited comment on DRILL-5035 at 11/11/16 2:29 AM: - I set the new option to false and I do not get an exception. I will try with IMPALA_TIMESTAMP. was (Author: rhou): I set the new option to false and I do not see a problem. I will try with IMPALA_TIMESTAMP. > Selecting timestamp value from Hive table causes IndexOutOfBoundsException > -- > > Key: DRILL-5035 > URL: https://issues.apache.org/jira/browse/DRILL-5035 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Vitalii Diravka > Attachments: orders_parts_hive.tar > > > I used the new option to read Hive timestamps. > alter session set `store.parquet.reader.int96_as_timestamp` = true; > This query fails: > select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 > 06:11:52.429'; > Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: > 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768)) > Fragment 0:0 > [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] > (state=,code=0) > Selecting all the columns succeed. > 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where > timestamp_id = '2016-10-03 06:11:52.429'; > +-+++---+--+--+-++-++---++-+-+--+-+ > | o_orderkey | o_custkey | o_orderstatus | o_totalprice | o_orderdate | > o_clerk | o_shippriority | o_comment > | int_id | bigint_id | float_id | double_id | > varchar_id | date_id | timestamp_id | dir0 > | > +-+++---+--+--+-++-++---++-+-+--+-+ > | 11335 | 871| F | 133549.0 | 1994-10-22 | > null | 0 | ealms. theodolites maintain. 
regular, even > instructions against t | -4 | -4 | -4.0 | -4.0 | -4 > | 2016-09-29 | 2016-10-03 06:11:52.429 | o_orderpriority=2-HIGH | > +-+++---+--+--+-++-++---++-+-+--+-+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655896#comment-15655896 ] Robert Hou commented on DRILL-5035: --- I am not able to use timestamp_impala yet. But I tried the original query with Drill 1.8, and I get zero rows back. That makes sense, since we are not interpreting the timestamp correctly.
select timestamp_id from orders_parts_hive where timestamp_id >= '2016-10-09 13:36:38.986' and timestamp_id <= '2016-10-09 13:45:38.986';
+---------------+
| timestamp_id  |
+---------------+
+---------------+
I also tried selecting the whole column. I get bad values (a known problem), but I get all the values. I don't get an exception.
select timestamp_id from orders_parts_hive;
[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing
[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655750#comment-15655750 ] Kunal Khatua commented on DRILL-4653: - [~ssriniva123], while the feature is disabled by default, we should mark it as resolved only if it passes with the feature enabled. [~khfaraaz] Please reopen this bug if the FAIL case would qualify as a blocker for closing it, so that we track this correctly.
> Malformed JSON should not stop the entire query from progressing
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - JSON
> Affects Versions: 1.6.0
> Reporter: subbu srinivasan
> Fix For: 1.9.0
>
> Currently a Drill query terminates upon the first encounter of an invalid JSON line.
> Drill has to continue progressing after ignoring the bad records. Something
> similar to a setting of (ignore.malformed.json) would help. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4726) Dynamic UDFs support
[ https://issues.apache.org/jira/browse/DRILL-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rahul Challapalli updated DRILL-4726: - Reviewer: (was: Rahul Challapalli)
> Dynamic UDFs support
>
> Key: DRILL-4726
> URL: https://issues.apache.org/jira/browse/DRILL-4726
> Project: Apache Drill
> Issue Type: New Feature
> Affects Versions: 1.6.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Labels: doc-impacting
> Fix For: 1.9.0
>
> Allow registering UDFs without restarting Drillbits.
> The design is described in the document below:
> https://docs.google.com/document/d/1FfyJtWae5TLuyheHCfldYUpCdeIezR2RlNsrOTYyAB4/edit?usp=sharing
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655624#comment-15655624 ] Rahul Challapalli commented on DRILL-5035: -- Thanks [~rhou]. This confirms that it's a bug with Drill. Can you also check whether it is a regression? (Use timestamp_impala with drill-1.8.0 and see if it succeeds) https://drill.apache.org/docs/parquet-format/#about-int96-support
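For context on the `store.parquet.reader.int96_as_timestamp` option discussed in this thread: Parquet's INT96 timestamp encoding, as conventionally written by Hive and Impala, packs 8 little-endian bytes of nanoseconds-within-day followed by a 4-byte little-endian Julian day number. A sketch of that decoding under this layout assumption (this is not Drill's actual reader code):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96Timestamp {
    // Julian day number of the Unix epoch, 1970-01-01.
    private static final long JULIAN_EPOCH_DAY = 2_440_588L;

    /** Decodes a 12-byte Parquet INT96 value into milliseconds since the Unix epoch. */
    public static long toEpochMillis(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();   // first 8 bytes: nanoseconds within the day
        int julianDay = buf.getInt();      // last 4 bytes: Julian day number
        long epochDays = julianDay - JULIAN_EPOCH_DAY;
        return epochDays * 86_400_000L + nanosOfDay / 1_000_000L;
    }

    /** Round-trip helper: encodes nanos-of-day and a Julian day into the 12-byte layout. */
    public static byte[] encode(long nanosOfDay, int julianDay) {
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(nanosOfDay);
        buf.putInt(julianDay);
        return buf.array();
    }
}
```

A reader that misjudges this 12-byte width against its buffer capacity is the kind of mismatch that surfaces as the readerIndex/writerIndex bounds error quoted in this bug.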
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655566#comment-15655566 ] Robert Hou commented on DRILL-5035: --- I tried with Hive. It succeeds.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655535#comment-15655535 ] Robert Hou commented on DRILL-5035: --- ~/bin/parquet-meta 00_0
file: file:/root/drill-test-framework-pushdown/data/orders_parts_hive/o_orderpriority=1-URGENT/00_0
creator: parquet-mr version 1.6.0
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655520#comment-15655520 ] Rahul Challapalli commented on DRILL-5035: -- Based on your explanation, the parquet files were created by Hive itself, so this is a bug. But just to confirm, can you do the checks below:
1. Inspect the parquet metadata and look for the "creator" field
2. Try to run a similar query from Hive and see if it succeeds
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655518#comment-15655518 ] Robert Hou commented on DRILL-5035: --- I am not sure this is a release stopper. It may be due to the fact that I have a partition that only has null values for the column.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655507#comment-15655507 ] Robert Hou commented on DRILL-5035: --- The DDL for the partitioned Hive table:
create table orders_parts_hive (
  o_orderkey int, o_custkey int, o_orderstatus string, o_totalprice double,
  o_orderdate date, o_clerk string, o_shippriority int, o_comment string,
  int_id int, bigint_id bigint, float_id float, double_id double,
  varchar_id string, date_id date, timestamp_id timestamp)
partitioned by (o_orderpriority string) stored as parquet;
regular, even > instructions against t | -4 | -4 | -4.0 | -4.0 | -4 > | 2016-09-29 | 2016-10-03 06:11:52.429 | o_orderpriority=2-HIGH | > +-+++---+--+--+-++-++---++-+-+--+-+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
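The option under discussion, `store.parquet.reader.int96_as_timestamp`, tells Drill to interpret Parquet INT96 values as timestamps rather than raw binary. A minimal sketch of that decoding, assuming the common Impala/Hive 12-byte layout (a little-endian 64-bit nanoseconds-of-day followed by a little-endian 32-bit Julian day number) — this is an illustration, not Drill's actual reader code:

```python
# Decode an Impala/Hive-style INT96 Parquet timestamp.
# Assumed layout: 8 bytes LE int64 nanoseconds-of-day + 4 bytes LE int32 Julian day.
import struct
from datetime import datetime, timedelta

JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number for 1970-01-01

def decode_int96_timestamp(raw: bytes) -> datetime:
    nanos_of_day, julian_day = struct.unpack("<qi", raw)  # exactly 12 bytes
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # datetime only has microsecond resolution, so truncate the nanoseconds.
    return datetime(1970, 1, 1) + timedelta(days=days,
                                            microseconds=nanos_of_day // 1000)

# Example: midnight on the Unix epoch encodes as nanos=0, Julian day 2440588.
raw = struct.pack("<qi", 0, JULIAN_DAY_OF_UNIX_EPOCH)
print(decode_int96_timestamp(raw))  # 1970-01-01 00:00:00
```

With the option set to false, the reader would leave such values as opaque 12-byte binaries, which is consistent with the observation later in this thread that disabling the option avoids the failure.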
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655504#comment-15655504 ] Robert Hou commented on DRILL-5035: --- Interesting. I exported Drill data to a tbl file. I edited the tbl file so that Hive could read it. I created a Hive table and loaded it from the tbl file. I then created a Parquet Hive table from the first Hive table, and created a partitioned Hive table from the Parquet Hive table.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655486#comment-15655486 ] Rahul Challapalli commented on DRILL-5035: -- I am a little confused. How did you generate the data for the Hive table? If it was generated by Drill, that explains the behavior and this is not a bug.
[jira] [Updated] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5035: -- Attachment: orders_parts_hive.tar This is a Hive partitioned table. It is partitioned on o_orderpriority.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655479#comment-15655479 ] Robert Hou commented on DRILL-5035: --- I'm trying to figure out how to do that. Because it is a Hive partitioned table, it has five directories, each with one file, and they all have the same name. Maybe I'll use a tar file.
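Packaging the partition tree as a tar file works because tar preserves the subdirectory paths, so the identically named data files in each partition directory do not collide. A sketch with illustrative paths (the partition names here are assumptions, not the exact attachment layout):

```shell
# Recreate a Hive-partitioned directory layout: one subdirectory per
# partition value, each holding a data file with the same name.
mkdir -p "orders_parts_hive/o_orderpriority=1-URGENT" \
         "orders_parts_hive/o_orderpriority=2-HIGH"
touch "orders_parts_hive/o_orderpriority=1-URGENT/000000_0" \
      "orders_parts_hive/o_orderpriority=2-HIGH/000000_0"

# Archive the whole tree; the partition paths keep the files distinct.
tar cf orders_parts_hive.tar orders_parts_hive
tar tf orders_parts_hive.tar
```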
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655475#comment-15655475 ] Rahul Challapalli commented on DRILL-5035: -- Also, it would be helpful if you could upload the data along with the Hive DDL, assuming the data is less than 10 MB.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655476#comment-15655476 ] Robert Hou commented on DRILL-5035: --- Yes, I created it. It is a Hive table partitioned on a string. I created it using data from a Drill table.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655471#comment-15655471 ] Rahul Challapalli commented on DRILL-5035: -- Are you sure that the data was generated by Hive itself? You can have a Hive table sitting on top of data generated by Drill.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655464#comment-15655464 ] Robert Hou commented on DRILL-5035: --- I set the new option to false and I do not see a problem. I will try with IMPALA_TIMESTAMP.
[jira] [Updated] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zelaine Fong updated DRILL-5035: Assignee: Vitalii Diravka
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655451#comment-15655451 ] Rahul Challapalli commented on DRILL-5035: -- Can we remove the new session option, use IMPALA_TIMESTAMP, and see if it has the same issue? Also, run the same query on Drill 1.8.0 to see whether this is a regression.
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655374#comment-15655374 ] Robert Hou edited comment on DRILL-5035 at 11/10/16 10:39 PM: -- This table is partitioned on a string. The problem occurs with a partition that has null values. The value of the string is "NOT SPECIFIED". I can select every row up to the null partition using: select timestamp_id from orders_parts_hive limit 9026; But the next row is in the null partition and causes an exception. select timestamp_id from orders_parts_hive limit 9027; was (Author: rhou): This table is partitioned on a string. The problem occurs with a partition that has null values. The value of the varchar is "NOT SPECIFIED". I can select every row up to the null partition using: select timestamp_id from orders_parts_hive limit 9026; But the next row is in the null partition and causes an exception. select timestamp_id from orders_parts_hive limit 9027;
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655397#comment-15655397 ] Robert Hou edited comment on DRILL-5035 at 11/10/16 10:39 PM: -- The partition only has null values for timestamp_id. Could this be an issue with empty batches? There are 3024 null values in the partition. was (Author: rhou): The partition only has null values for timestamp_id.
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655374#comment-15655374 ] Robert Hou edited comment on DRILL-5035 at 11/10/16 10:39 PM: -- This table is partitioned on a string. The problem occurs with a partition that has null values. The value of the varchar is "NOT SPECIFIED". I can select every row up to the null partition using: select timestamp_id from orders_parts_hive limit 9026; But the next row is in the null partition and causes an exception. select timestamp_id from orders_parts_hive limit 9027; was (Author: rhou): This table is partitioned on a varchar. The problem occurs with a partition that has null values. The value of the varchar is "NOT SPECIFIED". I can select every row up to the null partition using: select timestamp_id from orders_parts_hive limit 9026; But the next row is in the null partition and causes an exception. select timestamp_id from orders_parts_hive limit 9027;
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655397#comment-15655397 ] Robert Hou edited comment on DRILL-5035 at 11/10/16 10:36 PM: -- The partition only has null values for timestamp_id. was (Author: rhou): The partition only has null values.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655397#comment-15655397 ] Robert Hou commented on DRILL-5035: --- The partition only has null values.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655378#comment-15655378 ] Robert Hou commented on DRILL-5035: --- The Hive table is partitioned on o_orderpriority, which is a string.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655374#comment-15655374 ] Robert Hou commented on DRILL-5035: --- This table is partitioned on a varchar. The problem occurs with a partition that has null values. The value of the varchar is "NOT SPECIFIED". I can select every row up to the null partition using: select timestamp_id from orders_parts_hive limit 9026; But the next row is in the null partition and causes an exception. select timestamp_id from orders_parts_hive limit 9027;
> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655306#comment-15655306 ] Robert Hou commented on DRILL-5035: --- I am using RC1.
0: jdbc:drill:zk=10.10.100.186:5181> select * from sys.version;
version: 1.9.0
commit_id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
commit_message: [maven-release-plugin] prepare release drill-1.9.0
commit_time: 09.11.2016 @ 10:28:44 PST
build_email: r...@mapr.com
build_time: 10.11.2016 @ 12:56:24 PST
> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[jira] [Created] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
Robert Hou created DRILL-5035: - Summary: Selecting timestamp value from Hive table causes IndexOutOfBoundsException Key: DRILL-5035 URL: https://issues.apache.org/jira/browse/DRILL-5035 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Affects Versions: 1.9.0 Reporter: Robert Hou
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.
[ https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Padma Penumarthy updated DRILL-4990:
Description: For every query, we build the schema tree (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all storage plugins are checked and are added to the schema tree if they are accessible by the user who initiated the query. For the file system plugin, the listStatus API is used to check whether the workspace is accessible by the user (WorkspaceSchemaFactory.accessible). The idea seems to be that if the user does not have access to file(s) in the workspace, listStatus will generate an exception and we return false. But listStatus (which lists all the entries of a directory) is an expensive operation when there are a large number of files in the directory. A new API called access was added in Hadoop 2.6 (HDFS-6570) which provides the ability to check whether the user has permissions on a file/directory. Use this new API instead of listStatus. (was: the same description, followed by: For a directory with 256k+ files, an improvement of up to 10 sec in planning time was observed when using the new API vs. the old way of listStatus.)
> Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.8.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Fix For: 1.9.0
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
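The gain comes from replacing an O(number-of-entries) directory listing with a single permission probe. As a hedged illustration only (this is a local-filesystem analogy in Python, not Drill's Java code; the actual Hadoop call is FileSystem.access from HDFS-6570), the two styles of check look like:

```python
import os

def accessible_listing(path):
    """Old approach: list every entry in the directory; cost grows with
    the number of entries, and any access failure surfaces as an error."""
    try:
        os.listdir(path)  # O(number of entries)
        return True
    except OSError:
        return False

def accessible_access(path):
    """New approach, analogous to Hadoop's FileSystem.access (HDFS-6570):
    one permission probe whose cost is independent of directory size."""
    return os.access(path, os.R_OK | os.X_OK)
```

Both return the same answer for an accessible workspace; the difference is that the second never enumerates the directory contents.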
[jira] [Closed] (DRILL-5025) ExternalSortBatch provides weak control over spill file size
[ https://issues.apache.org/jira/browse/DRILL-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers closed DRILL-5025. -- > ExternalSortBatch provides weak control over spill file size > > > Key: DRILL-5025 > URL: https://issues.apache.org/jira/browse/DRILL-5025 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > The ExternalSortBatch (ESB) operator sorts records while spilling to disk to > control memory use. The size of the spill file is not easy to control. It is > a function of the accumulated batches size (half of the accumulated total), > which is determined by either the memory budget or the > {{drill.exec.sort.external.group.size}} parameter. (But, even with the > parameter, the actual file size is still half the accumulated batches.) > The proposed solution is to provide an explicit parameter that sets the > maximum spill file size: {{drill.exec.sort.external.spill.size}}. If the ESB > needs to spill more than this amount of data, ESB should split the spill into > multiple files. > The spill.size should be in bytes (or MB). (A size in records makes the file > size data-dependent, which would not be helpful.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-5025) ExternalSortBatch provides weak control over spill file size
[ https://issues.apache.org/jira/browse/DRILL-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-5025. Resolution: Invalid Assignee: Paul Rogers
> ExternalSortBatch provides weak control over spill file size
[jira] [Commented] (DRILL-5025) ExternalSortBatch provides weak control over spill file size
[ https://issues.apache.org/jira/browse/DRILL-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655275#comment-15655275 ] Paul Rogers commented on DRILL-5025: Cancelling for now; spill file size is determined by the spill/re-spill strategy, so it is best discussed in that context.
> ExternalSortBatch provides weak control over spill file size
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
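The proposal above caps each spill file at `drill.exec.sort.external.spill.size` bytes, splitting a larger spill into multiple files. A minimal sketch of that splitting arithmetic (illustrative Python, not ESB code; the helper name `plan_spill_files` is invented):

```python
def plan_spill_files(total_bytes, spill_size_bytes):
    """Split one logical spill of total_bytes into files of at most
    spill_size_bytes each, as the proposed spill.size parameter would."""
    if total_bytes <= 0:
        return []
    full, remainder = divmod(total_bytes, spill_size_bytes)
    sizes = [spill_size_bytes] * full   # files filled to the cap
    if remainder:
        sizes.append(remainder)         # one final partial file
    return sizes
```

Because the cap is expressed in bytes rather than records, the number of files depends only on the data volume, not on row width.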
[jira] [Reopened] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet
[ https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krystal reopened DRILL-4373: Re-open jira as this breaks existing behavior as described in https://issues.apache.org/jira/browse/DRILL-5034.
> Drill and Hive have incompatible timestamp representations in parquet
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Hive, Storage - Parquet
> Affects Versions: 1.8.0
> Reporter: Rahul Challapalli
> Assignee: Vitalii Diravka
> Labels: doc-impacting
> Fix For: 1.9.0
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a hive table on top of the parquet file and use "timestamp" as the column type, drill fails to read the hive table through the hive storage plugin.
> Implementation: added an int96-to-timestamp converter for both parquet readers, controlled by the system/session option "store.parquet.int96_as_timestamp".
> The value of the option is false by default for the proper work of old query scripts that use the "convert_from TIMESTAMP_IMPALA" function.
> When the option is true, using that function is unnecessary and can lead to query failure.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC
Krystal created DRILL-5034: -- Summary: Select timestamp from hive generated parquet always return in UTC Key: DRILL-5034 URL: https://issues.apache.org/jira/browse/DRILL-5034 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet Affects Versions: 1.9.0 Reporter: Krystal
commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
Reading timestamp data against a hive parquet table from drill automatically converts the timestamp data to UTC.
SELECT TIMEOFDAY() FROM (VALUES(1));
EXPR$0: 2016-11-10 12:33:26.547 America/Los_Angeles
data schema:
message hive_schema {
  optional int32 voter_id;
  optional binary name (UTF8);
  optional int32 age;
  optional binary registration (UTF8);
  optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
  optional int32 voterzone;
  optional int96 create_timestamp;
  optional int32 create_date (DATE);
}
Using drill-1.8, the returned timestamps match the table data:
select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from `/user/hive/warehouse/voter_hive_parquet` limit 5;
EXPR$0: 2016-10-23 20:03:58.0 | null | 2016-09-09 12:01:18.0 | 2017-03-06 20:35:55.0 | 2017-01-20 22:32:43.0
5 rows selected (1.032 seconds)
If the user timezone is changed to UTC, then the timestamp data is returned in UTC time.
Using drill-1.9, the returned timestamps are converted to UTC even though the user timezone is PST:
select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
EXPR$0: 2016-10-24 03:03:58.0 | null | 2016-09-09 19:01:18.0 | 2017-03-07 04:35:55.0 | 2017-01-21 06:32:43.0
alter session set `store.parquet.reader.int96_as_timestamp`=true;
ok: true | summary: store.parquet.reader.int96_as_timestamp updated.
select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
create_timestamp: 2016-10-24 03:03:58.0 | null | 2016-09-09 19:01:18.0 | 2017-03-07 04:35:55.0 | 2017-01-21 06:32:43.0
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
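For context on what the reader decodes: a Parquet INT96 timestamp, in the layout used by Impala and Hive, is 8 little-endian bytes of nanoseconds-of-day followed by 4 little-endian bytes of Julian day. A minimal Python sketch of that decode (an illustration under that assumed layout, not Drill's reader) shows the value comes out as a UTC wall-clock instant, which the engine must then render in the session timezone:

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_EPOCH = 2440588  # Julian day number of 1970-01-01

def int96_to_utc(raw12):
    """Decode a 12-byte INT96 timestamp: 8-byte little-endian
    nanoseconds-of-day followed by a 4-byte little-endian Julian day."""
    nanos, julian_day = struct.unpack("<qi", raw12)
    days = julian_day - JULIAN_DAY_OF_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(days=days, microseconds=nanos // 1000)
```

Rendering this instant as-is, instead of converting it to the user timezone, produces exactly the UTC-shifted values shown in the 1.9 output above.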
[jira] [Updated] (DRILL-5033) Query on JSON that has null as value for each key
[ https://issues.apache.org/jira/browse/DRILL-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Khurram Faraaz updated DRILL-5033: -- Description:
Drill 1.9.0, git commit ID: 83513daf
Drill returns the same result with or without `store.json.all_text_mode`=true. Note that each key in the JSON has null as its value.
[root@cent01 null_eq_joins]# cat right_all_nulls.json
{ "intKey" : null, "bgintKey": null, "strKey": null, "boolKey": null, "fltKey": null, "dblKey": null, "timKey": null, "dtKey": null, "tmstmpKey": null, "intrvldyKey": null, "intrvlyrKey": null }
Querying the above JSON file returns null as the query result.
- We should see each of the keys in the JSON as a column in the query result.
- And in each column the value should be a null value.
Current behavior does not look right.
{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `right_all_nulls.json`;
+-------+
|   *   |
+-------+
| null  |
+-------+
1 row selected (0.313 seconds)
{noformat}
Adding comment from [~julianhyde]: IMHO it is similar but not the same as DRILL-1256. Worth logging an issue and let [~jnadeau] (or someone) put on the record what should be the behavior of an empty record (empty JSON map) when it is top-level (as in this case) or in a collection.
> Query on JSON that has null as value for each key
>
> Key: DRILL-5033
> URL: https://issues.apache.org/jira/browse/DRILL-5033
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - JSON
> Affects Versions: 1.9.0
> Reporter: Khurram Faraaz
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
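The expected behavior in the report can be illustrated with a plain JSON parser, which preserves every key mapped to a null value rather than collapsing the whole record (Python for illustration only; Drill's JSON reader works differently):

```python
import json

doc = '{"intKey": null, "strKey": null, "boolKey": null}'
record = json.loads(doc)

# A generic parser keeps all keys -- the columns the reporter
# expects Drill to surface -- each holding a null value.
assert list(record) == ["intKey", "strKey", "boolKey"]
assert all(v is None for v in record.values())
```

The reported Drill output, by contrast, collapses the record to a single `*` column containing null.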
[jira] [Created] (DRILL-5033) Query on JSON that has null as value for each key
Khurram Faraaz created DRILL-5033: - Summary: Query on JSON that has null as value for each key Key: DRILL-5033 URL: https://issues.apache.org/jira/browse/DRILL-5033 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Affects Versions: 1.9.0 Reporter: Khurram Faraaz
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4935) Allow drillbits to advertise a configurable host address to Zookeeper
[ https://issues.apache.org/jira/browse/DRILL-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654623#comment-15654623 ] ASF GitHub Bot commented on DRILL-4935: --- Github user harrison-svds commented on the issue: https://github.com/apache/drill/pull/647 @paul-rogers I added a comment detailing the basic solution. I don't know that I have the permissions to assign the story to myself. > Allow drillbits to advertise a configurable host address to Zookeeper > - > > Key: DRILL-4935 > URL: https://issues.apache.org/jira/browse/DRILL-4935 > Project: Apache Drill > Issue Type: New Feature > Components: Execution - RPC >Affects Versions: 1.8.0 >Reporter: Harrison Mebane >Priority: Minor > Fix For: Future > > > There are certain situations, such as running Drill in distributed Docker > containers, in which it is desirable to advertise a different hostname to > Zookeeper than would be output by INetAddress.getLocalHost(). I propose > adding a configuration variable 'drill.exec.rpc.bit.advertised.host' and > passing this address to Zookeeper when the configuration variable is > populated, otherwise falling back to the present behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4935) Allow drillbits to advertise a configurable host address to Zookeeper
[ https://issues.apache.org/jira/browse/DRILL-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654614#comment-15654614 ] Harrison Mebane commented on DRILL-4935: The solution implemented was to add a line to drill-env.sh specifying an environment variable DRILL_HOST_NAME. In a distributed setting, this variable can be set to a command that returns the individual machine's address (e.g. a simple command like `hostname` or a REST call in AWS). If this variable is not set, the code falls back to the original method of calling INetAddress.getLocalHost().
> Allow drillbits to advertise a configurable host address to Zookeeper
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
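The fallback described above (prefer the operator-supplied DRILL_HOST_NAME, else ask the machine for its own name) can be sketched in a few lines. This is an illustrative Python analogy of the Java change; the function name `advertised_host` is invented, and `socket.gethostname()` stands in for INetAddress.getLocalHost():

```python
import os
import socket

def advertised_host():
    """Return the hostname to advertise to ZooKeeper: an explicit
    DRILL_HOST_NAME from the environment (set in drill-env.sh) if
    present, otherwise the machine's own reported hostname."""
    return os.environ.get("DRILL_HOST_NAME") or socket.gethostname()
```

In a Docker deployment, DRILL_HOST_NAME would be set to the externally routable address of the container's host, so peers and clients can reach the drillbit.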
[jira] [Commented] (DRILL-5027) ExternalSortBatch is inefficient, leaks files for large queries
[ https://issues.apache.org/jira/browse/DRILL-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654610#comment-15654610 ] Paul Rogers commented on DRILL-5027: Another issue with the existing ESB: the code to re-merge existing data runs before the code to spill the current in-memory generation. We are spilling the in-memory generation because memory is tight. But re-spilling the on-disk generation itself requires memory, potentially causing a spike above the memory budget. Better to spill the in-memory generation first, then re-spill the on-disk generation after clearing room in memory.
> ExternalSortBatch is inefficient, leaks files for large queries
>
> Key: DRILL-5027
> URL: https://issues.apache.org/jira/browse/DRILL-5027
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
>
> The {{ExternalSortBatch}} (ESB) operator sorts data while spilling to disk as needed to operate within a memory budget.
> The sort happens in two phases:
> 1. Gather the incoming batches from the upstream operator, sort them, and spill to disk as needed.
> 2. Merge the "runs" spilled in step 1.
> In most cases, the second step should run within the memory available for the first step (which is why severity is only Minor). Large queries need multiple sort "phases" in which previously spilled runs are read back into memory, merged, and again spilled. It is here that ESB has an issue. This process correctly limits the amount of memory used, but at the cost of rewriting the same data over and over.
> Consider current Drill behavior:
> {code}
> a b c d (re-spill)
> abcd e f g h (re-spill)
> abcdefgh i j k
> {code}
> That is, batches a, b, c and d are re-spilled to create the combined abcd, and so on. The same data is rewritten over and over.
> Note that spilled batches take no (direct) memory in Drill, and require only a small on-heap memento, so maintaining data on disk is "free". Better, then, would be to re-spill only newer data:
> {code}
> a b c d (re-spill)
> abcd | e f g h (re-spill)
> abcd efgh | i j k
> {code}
> Where the bar indicates a moving point at which we've already merged and do not need to do so again. If each letter is one unit of disk I/O, the original method uses 35 units while the revised method uses 27 units.
> At some point the process may have to repeat by merging the second-generation spill files and so on.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
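The 35-vs-27 arithmetic in the description can be checked with a small cost model that charges one I/O unit per batch read or written: each incoming batch is spilled once, and every merge reads a run back and writes it out again. This is an illustrative Python model, not ESB code; `io_cost` and its parameters are invented names:

```python
def io_cost(groups, merge_all):
    """Disk I/O units (one per batch read or written) for a sort that
    spills each group of batches, then merges after every group but the
    last. merge_all=True re-merges everything spilled so far (current
    behavior); merge_all=False merges only the newest group (proposal)."""
    cost = 0
    merged = 0  # batches already sitting in merged spill files
    for i, g in enumerate(groups):
        cost += g                          # write each incoming batch once
        if i < len(groups) - 1:            # a merge follows every group but the last
            span = (merged + g) if merge_all else g
            cost += 2 * span               # read the run back, then rewrite it
            merged += g
    return cost
```

With the description's groups of 4, 4, and 3 batches, the model reproduces the quoted totals: 35 units for the current behavior and 27 for the revised one.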
[jira] [Commented] (DRILL-5032) Drill query on hive parquet table failed with OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/DRILL-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654543#comment-15654543 ] Serhii Harnyk commented on DRILL-5032: -- As [~jni] mentioned, today the physical plan looks like:
listOfColumns : [col1, col2, ...] -- TableLevel
Partitions : [
  partition1 : { listOfColumns : [col1, col2, ...] -- PartitionLevel },
  partition2 : { listOfColumns : [col1, col2, ...] -- PartitionLevel },
  ...
  partition_n : { listOfColumns : [col1, col2, ...] -- PartitionLevel }
]
The listOfColumns is repeated in every partition, which seems unnecessary. We should get rid of those repeated column lists in each partition, as long as they are the same as the listOfColumns at the table level. So the initial idea is to remove the repeated listOfColumns from the HivePartition physical plan serialization.
> Drill query on hive parquet table failed with OutOfMemoryError: Java heap space
>
> Key: DRILL-5032
> URL: https://issues.apache.org/jira/browse/DRILL-5032
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Hive
> Affects Versions: 1.8.0
> Reporter: Serhii Harnyk
> Assignee: Serhii Harnyk
>
> Following query on hive parquet table failed with OOM Java heap space:
> {code}
> select distinct(businessdate) from vmdr_trades where trade_date='2016-04-12'
> 2016-08-31 08:02:03,597 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 283938c3-fde8-0fc6-37e1-9a568c7f5913: select distinct(businessdate) from vmdr_trades where trade_date='2016-04-12'
> 2016-08-31 08:05:58,502 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning class: org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
> 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze
filter tree: 1 ms > 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for > partition pruning.Total pruning elapsed time: 3 ms > 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning > class: > org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2 > 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze > filter tree: 0 ms > 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for > partition pruning.Total pruning elapsed time: 0 ms > 2016-08-31 08:05:58,664 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning > class: > org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$1 > 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze > filter tree: 0 ms > 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for > partition pruning.Total pruning elapsed time: 0 ms > 2016-08-31 08:09:42,355 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] ERROR > o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, > exiting. Information message: Unable to handle out of memory condition in > Foreman. 
> java.lang.OutOfMemoryError: Java heap space > at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_74] > at > java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) > ~[na:1.8.0_74] > at > java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) > ~[na:1.8.0_74] > at > java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) > ~[na:1.8.0_74] > at java.lang.StringBuilder.append(StringBuilder.java:136) > ~[na:1.8.0_74] > at java.lang.StringBuilder.append(StringBuilder.java:76) > ~[na:1.8.0_74] > at > java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:457) > ~[na:1.8.0_74] > at java.lang.StringBuilder.append(StringBuilder.java:166) > ~[na:1.8.0_74] > at java.lang.StringBuilder.append(StringBuilder.java:76) > ~[na:1.8.0_74] > at > com.google.protobuf.TextFormat$TextGenerator.write(TextFormat.java:538) > ~[protobuf-java
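The deduplication idea proposed in the comment above can be sketched as follows. `TablePlan` and `PartitionPlan` are hypothetical stand-ins for Drill's HiveTable/HivePartition serialization wrappers, not actual Drill classes; the point is only the drop-on-write / restore-on-read pattern:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the table-level plan fragment.
class TablePlan {
    final List<String> listOfColumns;            // table-level column list
    TablePlan(List<String> listOfColumns) { this.listOfColumns = listOfColumns; }
}

// Hypothetical stand-in for a per-partition plan fragment.
class PartitionPlan {
    List<String> listOfColumns;                  // null means "same as table level"
    PartitionPlan(List<String> listOfColumns) { this.listOfColumns = listOfColumns; }

    // Before serialization: drop the list when it duplicates the table-level one,
    // so nothing is written for this partition.
    void dedupe(TablePlan table) {
        if (table.listOfColumns.equals(listOfColumns)) {
            listOfColumns = null;
        }
    }

    // After deserialization: restore the reference from the table-level list.
    void restore(TablePlan table) {
        if (listOfColumns == null) {
            listOfColumns = table.listOfColumns;
        }
    }

    public static void main(String[] args) {
        TablePlan table = new TablePlan(Arrays.asList("col1", "col2"));
        PartitionPlan part = new PartitionPlan(Arrays.asList("col1", "col2"));
        part.dedupe(table);                      // identical to table level: dropped
        part.restore(table);
        System.out.println(part.listOfColumns);  // [col1, col2]
    }
}
```

With millions of partitions sharing one column list, the serialized plan shrinks from O(partitions × columns) to O(partitions + columns), which is exactly the memory blow-up the stack trace above shows in protobuf's `TextFormat` printer.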
[jira] [Created] (DRILL-5032) Drill query on hive parquet table failed with OutOfMemoryError: Java heap space
Serhii Harnyk created DRILL-5032: Summary: Drill query on hive parquet table failed with OutOfMemoryError: Java heap space Key: DRILL-5032 URL: https://issues.apache.org/jira/browse/DRILL-5032 Project: Apache Drill Issue Type: Bug Components: Functions - Hive Affects Versions: 1.8.0 Reporter: Serhii Harnyk Assignee: Serhii Harnyk Following query on hive parquet table failed with OOM Java heap space: {code} select distinct(businessdate) from vmdr_trades where trade_date='2016-04-12' 2016-08-31 08:02:03,597 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 283938c3-fde8-0fc6-37e1-9a568c7f5913: select distinct(businessdate) from vmdr_trades where trade_date='2016-04-12' 2016-08-31 08:05:58,502 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning class: org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze filter tree: 1 ms 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for partition pruning.Total pruning elapsed time: 3 ms 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning class: org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze filter tree: 0 ms 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for partition pruning.Total pruning elapsed time: 0 ms 2016-08-31 08:05:58,664 
[283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning class: org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$1 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze filter tree: 0 ms 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for partition pruning.Total pruning elapsed time: 0 ms 2016-08-31 08:09:42,355 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. Information message: Unable to handle out of memory condition in Foreman. java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_74] at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) ~[na:1.8.0_74] at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) ~[na:1.8.0_74] at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) ~[na:1.8.0_74] at java.lang.StringBuilder.append(StringBuilder.java:136) ~[na:1.8.0_74] at java.lang.StringBuilder.append(StringBuilder.java:76) ~[na:1.8.0_74] at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:457) ~[na:1.8.0_74] at java.lang.StringBuilder.append(StringBuilder.java:166) ~[na:1.8.0_74] at java.lang.StringBuilder.append(StringBuilder.java:76) ~[na:1.8.0_74] at com.google.protobuf.TextFormat$TextGenerator.write(TextFormat.java:538) ~[protobuf-java-2.5.0.jar:na] at com.google.protobuf.TextFormat$TextGenerator.print(TextFormat.java:526) ~[protobuf-java-2.5.0.jar:na] at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:389) ~[protobuf-java-2.5.0.jar:na] at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327) 
~[protobuf-java-2.5.0.jar:na] at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286) ~[protobuf-java-2.5.0.jar:na] at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273) ~[protobuf-java-2.5.0.jar:na] at com.google.protobuf.TextFormat$Printer.access$400(TextFormat.java:248) ~[protobuf-java-2.5.0.jar:na] at com.google.protobuf.TextFormat.print(TextFormat.java:71) ~[protobuf-java-2.5.0.jar:na] at com.google.protobuf.TextFormat.printToString(TextFormat.java:118) ~[protobuf-java-2.5.0.jar:na] at com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:106) ~[protobuf-java-2.5.0.jar:na] at org.apache.drill.exec.pla
[jira] [Commented] (DRILL-5027) ExternalSortBatch is inefficient, leaks files for large queries
[ https://issues.apache.org/jira/browse/DRILL-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654495#comment-15654495 ] Paul Rogers commented on DRILL-5027: Proposed solution, which seems reachable by rearranging the existing pieces of ESB: Assume three "generations": * An in-memory generation of sorted batches received from upstream, not yet spilled. * A "new" spill generation: those files created directly by spilling the in-memory generation. * An "old" spill generation: those files created by re-spilling new generation files. The spill would work as follows: * For each upstream batch, sort it and add it to the in-memory generation. * When the in-memory generation reaches the spill threshold, merge the in-memory batches and write to a spill file. Add the spill file to the new spill generation. At this point, ESB memory is empty. * If the new spill generation has reached the spill threshold, merge the spilled batches and write to another spill file. Delete the old spill files. Add the newly created file to the old spill generation. The new spill generation is now empty (as is memory.) * If the old spill generation has reached the spill threshold, transfer it to the new generation and spill as above. The old generation now has a single file. (The other two generations are empty.) The spill threshold is defined as: * Start with the memory budget for the ESB. * Define a target spill-batch size. (The minimum of 32K rows or some defined size in bytes.) * Define the maximum number of in-memory batches as memory budget / spill-batch size. * Set the spill threshold to some number less than the maximum in-memory batch size. When gathering incoming batches in memory, or reading batches from disk, the above ensures that total memory used is less than the budget. Benefits of this approach: * Minimizes read/writes of existing spilled data (overcomes the re-spill issue above.) 
* Ensures that disk files are deleted as soon as possible. * Ensures that ESB operates within a defined memory budget. * Handles data of any size; the algorithm above simply continues to combine generations as needed. Trades off performance (more disk I/O) for a fixed memory budget. * Limits disk use to no more than twice the amount of spilled data (to account for merging the old generation). > ExternalSortBatch is inefficient, leaks files for large queries > --- > > Key: DRILL-5027 > URL: https://issues.apache.org/jira/browse/DRILL-5027 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > The {{ExternalSortBatch}} (ESB) operator sorts data while spilling to disk as > needed to operate within a memory budget. > The sort happens in two phases: > 1. Gather the incoming batches from the upstream operator, sort them, and > spill to disk as needed. > 2. Merge the "runs" spilled in step 1. > In most cases, the second step should run within the memory available for the > first step (which is why severity is only Minor). Large queries need multiple > sort "phases" in which previously spilled runs are read back into memory, > merged, and again spilled. It is here that ESB has an issue. This process > correctly limits the amount of memory used, but at the cost of rewriting the > same data over and over. > Consider current Drill behavior: > {code} > a b c d (re-spill) > abcd e f g h (re-spill) > abcdefgh i j k > {code} > That is, batches a, b, c, and d are re-spilled to create the combined abcd, > and so on. The same data is rewritten over and over. > Note that spilled batches take no (direct) memory in Drill, and require only > a small on-heap memento. So, maintaining data on disk is "free".
So, better > would be to re-spill only newer data: > {code} > a b c d (re-spill) > abcd | e f g h (re-spill) > abcd efgh | i j k > {code} > Where the bar indicates a moving point at which we've already merged and do > not need to do so again. If each letter is one unit of disk I/O, the original > method uses 35 units while the revised method uses 27 units. > At some point the process may have to repeat by merging the second-generation > spill files and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
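The three-generation bookkeeping described above can be sketched compactly. This is an illustration, not Drill's ESB code: "spilling" is modeled as merging lists of unit counts rather than real file I/O, and all names (`GenerationalSpiller`, `spillThreshold`, `ioUnits`) are invented for the sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the three-generation spill scheme from the comment above.
// A "run" is just its size in units; merging sums the units and counts the
// disk writes, so ioUnits tracks total rewrite cost.
public class GenerationalSpiller {
    final int spillThreshold;                          // max runs held per generation
    final List<Integer> inMemory = new ArrayList<>();  // sorted batches, not yet spilled
    final List<Integer> newGen = new ArrayList<>();    // files from spilling memory
    final List<Integer> oldGen = new ArrayList<>();    // files from re-spilling newGen
    int ioUnits = 0;                                   // units written to disk so far

    public GenerationalSpiller(int spillThreshold) { this.spillThreshold = spillThreshold; }

    public void addBatch(int size) {
        inMemory.add(size);
        if (inMemory.size() >= spillThreshold) {
            newGen.add(merge(inMemory));               // memory is now empty
        }
        if (newGen.size() >= spillThreshold) {
            oldGen.add(merge(newGen));                 // old files deleted, one old-gen file
        }
        if (oldGen.size() >= spillThreshold) {
            newGen.addAll(oldGen);                     // transfer and re-spill as above
            oldGen.clear();
            oldGen.add(merge(newGen));
        }
    }

    // Merge a generation into one run: rewrite its inputs once, then clear it.
    private int merge(List<Integer> runs) {
        int total = runs.stream().mapToInt(Integer::intValue).sum();
        runs.clear();
        ioUnits += total;
        return total;
    }

    public static void main(String[] args) {
        GenerationalSpiller s = new GenerationalSpiller(4);
        for (int i = 0; i < 16; i++) s.addBatch(1);
        System.out.println(s.ioUnits);                 // 32
    }
}
```

With a threshold of 4, sixteen unit batches cost 32 units of writes (16 first-generation, 16 for one re-spill), versus rewriting the entire old generation on every spill as in the current behavior described above.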
[jira] [Commented] (DRILL-5030) Drill SSL Docs have Bad Link to Oracle Website
[ https://issues.apache.org/jira/browse/DRILL-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654458#comment-15654458 ] Keys Botzum commented on DRILL-5030: By the way, I believe this is the correct URL: http://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/JSSERefGuide.html#Customization (notice that there is a minor typo in the doc - the '/'). > Drill SSL Docs have Bad Link to Oracle Website > -- > > Key: DRILL-5030 > URL: https://issues.apache.org/jira/browse/DRILL-5030 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Affects Versions: 1.8.0 >Reporter: John Omernik > > When going to setup custom SSL certs on Drill, I found that the link to the > oracle website was broken on this page: > https://drill.apache.org/docs/configuring-web-console-and-rest-api-security/ > at: > As cluster administrator, you can set the following SSL configuration > parameters in the conf/drill-override.conf file, as described in the Java > product documentation: > Obviously fixing the link is one option, another would be to provide > instructions for SSL certs directly in the drill docs so we are not reliant > on Oracle's website. > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5030) Drill SSL Docs have Bad Link to Oracle Website
[ https://issues.apache.org/jira/browse/DRILL-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654422#comment-15654422 ] Keys Botzum commented on DRILL-5030: The Oracle page referenced is likely just defining the three parameters - basically their meaning. Obviously the link should be fixed, but that's separate from asking for a detailed example of creating certs. My concern about an example is that this is much harder than it appears, as there are many, many different ways to get a certificate. What's really key is to explain clearly what Drill expects to be in the JKS file (the docs are not clear) and then perhaps provide a simple example using a well-known CA to get a sample cert, but realistically that example will only be accurate for a small set of users. That's why it is so important to explain clearly what Drill expects in the JKS file. Also, since you asked, here is an internal document from MapR that describes how to replace the MapR self-signed certificate. A similar approach should work with Drill: was kind enough to share the steps he went through to replace the default ssl_keystore and ssl_truststore with CA-issued certificates. There is no real improvement in security from doing this, but many customers prefer to use CA-issued certificates as it does improve the user experience.
ASSUMPTIONS ---
1. 1-node cluster on mapr50.hadoopone.com is running in secure mode already. configure.sh was run setting mapr50.hadoopone.com (FQDN) as CLDB, ZK, HS, and RM.
2. cert generated from godaddy is a wildcard cert for *.hadoopone.com domain and contains a7d2eaede47dbc19.crt (wildcard host cert), gd_bundle-g2-g1.crt (cert chain that leads up to the godaddy signer), and hadoopone.key (RSA key). All files stored in home directory of root user.
3. an entry in hosts file of laptop was created for mapr50.hadoopone.com (so that MCS doesn't prompt "continue at your own risk?"), or this host resolves in DNS.
PROCEDURE (all commands run as root user)
1. stop the cluster [Note: no need to stop ZK as it doesn't use certificates]
   service mapr-warden stop
2. check contents of certificate issued by godaddy
   keytool -printcert -file ~/a7d2eaede47dbc19.crt
   openssl x509 -noout -text -in ~/a7d2eaede47dbc19.crt
3. check cert chain issued by godaddy
   keytool -printcert -file ~/gd_bundle-g2-g1.crt
4. check RSA key issued by godaddy
   openssl rsa -noout -text -in hadoopone.key
5. create PKCS12 certificate to import
   openssl pkcs12 -export -in ~/a7d2eaede47dbc19.crt -inkey ~/hadoopone.key -out ~/hadoopone_com.pk12 -name 'mapr50.hadoopone.com' -CAfile ~/gd_bundle-g2-g1.crt -chain -passout pass:mapr123
6. check PKCS12 certificate you just generated
   keytool -list -keystore ~/hadoopone_com.pk12 -storepass mapr123 -storetype PKCS12
7. import PKCS12 certificate in keystore
   keytool --importkeystore -noprompt -deststorepass mapr123 -destkeystore ~/ssl_keystore -srckeystore hadoopone_com.pk12 -srcstoretype PKCS12 -srcstorepass mapr123
8. list certs in the keystore
   keytool -list -v -keystore ~/ssl_keystore -storepass mapr123
9. import certificate chain into truststore
   keytool --importcert -storepass mapr123 -keystore ssl_truststore -file gd_bundle-g2-g1.crt -alias godaddy
10. list certs in the trust store
    keytool -list -keystore ~/ssl_truststore -storepass mapr123
11. copy the modified keystore and truststore back to /opt/mapr/conf
    cp ~/ssl_keystore /opt/mapr/conf
    cp ~/ssl_truststore /opt/mapr/conf
12. restart the cluster
    service mapr-zookeeper start
    service mapr-warden start
13. test
> Drill SSL Docs have Bad Link to Oracle Website > -- > > Key: DRILL-5030 > URL: https://issues.apache.org/jira/browse/DRILL-5030 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Affects Versions: 1.8.0 >Reporter: John Omernik > > When going to setup custom SSL certs on Drill, I found that the link to the > oracle website was broken on this page: > https://drill.apache.org/docs/configuring-web-console-and-rest-api-security/ > at: > As cluster administrator, you can set the following SSL configuration > parameters in the conf/drill-override.conf file, as described in the Java > product documentation: > Obviously fixing the link is one option, another would be to provide > instructions for SSL certs directly in the drill docs so we are not reliant > on Oracle's website. > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
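Complementing the `keytool -list` checks above, the same inspection can be done programmatically — useful for confirming that a keystore actually contains a private-key entry (what an SSL server needs) rather than only trusted certificates. This is a hedged sketch; the default path and password below are placeholders, not values Drill itself assumes:

```java
import java.io.FileInputStream;
import java.security.KeyStore;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Programmatic equivalent of `keytool -list`: summarize JKS entries as
// "alias: private key entry" or "alias: trusted certificate".
public class KeystoreCheck {
    static List<String> summarize(KeyStore ks) throws Exception {
        List<String> lines = new ArrayList<>();
        for (String alias : Collections.list(ks.aliases())) {
            lines.add(alias + ": "
                + (ks.isKeyEntry(alias) ? "private key entry" : "trusted certificate"));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder path and password; pass real values on the command line.
        String path = args.length > 0 ? args[0] : "ssl_keystore";
        char[] pw = (args.length > 1 ? args[1] : "changeit").toCharArray();
        KeyStore ks = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream(path)) {
            ks.load(in, pw);
        }
        summarize(ks).forEach(System.out::println);
    }
}
```

A keystore prepared by the procedure above should show the imported host certificate as a "private key entry"; if every entry reports "trusted certificate", the private key never made it into the JKS file.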
[jira] [Created] (DRILL-5031) Documentation for HTTPD Parser
Charles Givre created DRILL-5031: Summary: Documentation for HTTPD Parser Key: DRILL-5031 URL: https://issues.apache.org/jira/browse/DRILL-5031 Project: Apache Drill Issue Type: Improvement Affects Versions: 1.9.0 Reporter: Charles Givre Priority: Minor Fix For: Future https://gist.github.com/cgivre/47f07a06d44df2af625fc6848407ae7c -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection
[ https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654351#comment-15654351 ] ASF GitHub Bot commented on DRILL-4980: --- Github user vdiravka commented on the issue: https://github.com/apache/drill/pull/644 @parthchandra Could you please review this PR? > Upgrading of the approach of parquet date correctness status detection > -- > > Key: DRILL-4980 > URL: https://issues.apache.org/jira/browse/DRILL-4980 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Affects Versions: 1.8.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka > Fix For: 1.9.0 > > > This jira is an addition for the > [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203]. > The date correctness label for the new generated parquet files should be > upgraded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection
[ https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654323#comment-15654323 ] ASF GitHub Bot commented on DRILL-4980: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/644#discussion_r87416017 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -185,7 +185,8 @@ private Metadata(FileSystem fs, ParquetFormatConfig formatConfig) { childFiles.add(file); } } -ParquetTableMetadata_v3 parquetTableMetadata = new ParquetTableMetadata_v3(true); +ParquetTableMetadata_v3 parquetTableMetadata = new ParquetTableMetadata_v3(DrillVersionInfo.getVersion(), +ParquetWriter.WRITER_VERSION); --- End diff -- `is.date.correct` or `parquet-writer.version` was needed in the metadata cache file for quick detection of date value correctness; otherwise we need to check the `files.rowGroups.columns.mxValue` values from this cache file. But after thinking about it a little, I realized that thanks to the newly added `ParquetTableMetadata_v3` we can check: if the version of the parquet metadata cache file is 3, the date values are definitely correct. Otherwise (when the parquet metadata cache file was generated earlier) we need to check the date values from this file. So `writerVersion` is now redundant in `ParquetTableMetadataBase`, and I deleted it. Does this make sense? > Upgrading of the approach of parquet date correctness status detection > -- > > Key: DRILL-4980 > URL: https://issues.apache.org/jira/browse/DRILL-4980 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Affects Versions: 1.8.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka > Fix For: 1.9.0 > > > This jira is an addition for the > [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203]. > The date correctness label for the new generated parquet files should be > upgraded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection
[ https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654274#comment-15654274 ] ASF GitHub Bot commented on DRILL-4980: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/644#discussion_r87411842 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java --- @@ -59,19 +59,24 @@ */ public static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588; /** - * All old parquet files (which haven't "is.date.correct=true" property in metadata) have - * a corrupt date shift: {@value} days or 2 * {@value #JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH} + * All old parquet files (which haven't "is.date.correct=true" or "parquet-writer.version" properties + * in metadata) have a corrupt date shift: {@value} days or 2 * {@value #JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH} */ public static final long CORRECT_CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH; - // The year 5000 (or 1106685 day from Unix epoch) is chosen as the threshold for auto-detecting date corruption. - // This balances two possible cases of bad auto-correction. External tools writing dates in the future will not - // be shifted unless they are past this threshold (and we cannot identify them as external files based on the metadata). - // On the other hand, historical dates written with Drill wouldn't risk being incorrectly shifted unless they were - // something like 10,000 years in the past. private static final Chronology UTC = org.joda.time.chrono.ISOChronology.getInstanceUTC(); + /** + * The year 5000 (or 1106685 day from Unix epoch) is chosen as the threshold for auto-detecting date corruption. + * This balances two possible cases of bad auto-correction. External tools writing dates in the future will not + * be shifted unless they are past this threshold (and we cannot identify them as external files based on the metadata). 
+ * On the other hand, historical dates written with Drill wouldn't risk being incorrectly shifted unless they were + * something like 10,000 years in the past. + */ public static final int DATE_CORRUPTION_THRESHOLD = (int) (UTC.getDateTimeMillis(5000, 1, 1, 0) / DateTimeConstants.MILLIS_PER_DAY); - + /** + * The version of drill parquet writer with date values corruption fix --- End diff -- Done > Upgrading of the approach of parquet date correctness status detection > -- > > Key: DRILL-4980 > URL: https://issues.apache.org/jira/browse/DRILL-4980 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Affects Versions: 1.8.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka > Fix For: 1.9.0 > > > This jira is an addition for the > [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203]. > The date correctness label for the new generated parquet files should be > upgraded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
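The 1106685-day figure quoted in the javadoc above is easy to cross-check with `java.time`. This is a verification sketch, not Drill code — the diff itself computes the threshold with Joda-Time's UTC chronology:

```java
import java.time.LocalDate;

public class DateThresholdCheck {
    public static void main(String[] args) {
        // Days from the Unix epoch (1970-01-01) to 5000-01-01, the cutoff used
        // to auto-detect date corruption in old Parquet files.
        long threshold = LocalDate.of(5000, 1, 1).toEpochDay();
        System.out.println(threshold);  // 1106685
    }
}
```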
[jira] [Commented] (DRILL-4935) Allow drillbits to advertise a configurable host address to Zookeeper
[ https://issues.apache.org/jira/browse/DRILL-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654182#comment-15654182 ] ASF GitHub Bot commented on DRILL-4935: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/647 Please update the JIRA ticket to explain the solution. What does an admin need to know to use the feature? How can the admin verify that it works? This will allow the documentation team to add the needed information for folks to use this feature. Also, assign the JIRA ticket to yourself, since you're working on it. > Allow drillbits to advertise a configurable host address to Zookeeper > - > > Key: DRILL-4935 > URL: https://issues.apache.org/jira/browse/DRILL-4935 > Project: Apache Drill > Issue Type: New Feature > Components: Execution - RPC >Affects Versions: 1.8.0 >Reporter: Harrison Mebane >Priority: Minor > Fix For: Future > > > There are certain situations, such as running Drill in distributed Docker > containers, in which it is desirable to advertise a different hostname to > Zookeeper than would be output by INetAddress.getLocalHost(). I propose > adding a configuration variable 'drill.exec.rpc.bit.advertised.host' and > passing this address to Zookeeper when the configuration variable is > populated, otherwise falling back to the present behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5028) Opening profiles page from web ui gets very slow when a lot of history files have been stored in HDFS or Local FS.
[ https://issues.apache.org/jira/browse/DRILL-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongze Zhang updated DRILL-5028: Description: We have a Drill cluster with 20+ nodes and we store all history profiles in HDFS. Without periodic cleanup of HDFS, the profiles page gets slower as more queries are served. Code from LocalPersistentStore.java uses fs.list(false, basePath) for fetching the latest 100 history profiles by default; I guess this operation blocks the page loading (millions of small files can be stored in the basePath). Maybe we can try some other ways to reach the same goal. was: We have a Drill cluster with 20+ nodes and we store all history profiles in HDFS. Without periodic cleanup of HDFS, the profiles page gets slower as more queries are served. Code from LocalPersistentStore.java uses fs.list(false, basePath) for fetching the latest 100 history profiles by default; I guess this operation blocks that page (millions of small files can be stored in the basePath). Maybe we can try some other ways to reach the same goal. > Opening profiles page from web ui gets very slow when a lot of history files > have been stored in HDFS or Local FS. > -- > > Key: DRILL-5028 > URL: https://issues.apache.org/jira/browse/DRILL-5028 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.8.0 >Reporter: Hongze Zhang > Fix For: Future > > > We have a Drill cluster with 20+ nodes and we store all history profiles in > HDFS. Without periodic cleanup of HDFS, the profiles page gets slower as > more queries are served. > Code from LocalPersistentStore.java uses fs.list(false, basePath) for > fetching the latest 100 history profiles by default; I guess this operation > blocks the page loading (millions of small files can be stored in the basePath). > Maybe we can try some other ways to reach the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
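One mitigation for the listing cost is to stream the directory and keep only the newest N entries in a bounded heap, instead of materializing and sorting millions of paths at once. The sketch below illustrates the idea only — it uses `java.nio` as a stand-in for the HDFS FileSystem API that LocalPersistentStore actually uses, and orders by file name as a simple proxy for modification time:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Keep only the `limit` newest profile files while streaming the directory,
// so memory stays O(limit) no matter how many files exist.
public class NewestProfiles {
    static List<Path> newest(Path dir, int limit) throws IOException {
        PriorityQueue<Path> heap = new PriorityQueue<>(limit + 1,
            Comparator.comparing((Path p) -> p.getFileName().toString()));
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.sys.drill")) {
            for (Path p : stream) {
                heap.add(p);
                if (heap.size() > limit) {
                    heap.poll();            // evict the oldest of the kept set
                }
            }
        }
        List<Path> result = new ArrayList<>(heap);
        result.sort(Comparator.comparing((Path p) -> p.getFileName().toString()).reversed());
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        newest(dir, 100).forEach(System.out::println);   // newest 100 profiles
    }
}
```

The full directory is still enumerated once, but nothing is accumulated beyond the bounded heap; a deeper fix would partition profiles by date or keep a separate index.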
[jira] [Created] (DRILL-5030) Drill SSL Docs have Bad Link to Oracle Website
John Omernik created DRILL-5030: --- Summary: Drill SSL Docs have Bad Link to Oracle Website Key: DRILL-5030 URL: https://issues.apache.org/jira/browse/DRILL-5030 Project: Apache Drill Issue Type: Bug Components: Documentation Affects Versions: 1.8.0 Reporter: John Omernik When going to setup custom SSL certs on Drill, I found that the link to the oracle website was broken on this page: https://drill.apache.org/docs/configuring-web-console-and-rest-api-security/ at: As cluster administrator, you can set the following SSL configuration parameters in the conf/drill-override.conf file, as described in the Java product documentation: Obviously fixing the link is one option, another would be to provide instructions for SSL certs directly in the drill docs so we are not reliant on Oracle's website. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4956) Temporary tables support
[ https://issues.apache.org/jira/browse/DRILL-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-4956: Description: Link to design doc - https://docs.google.com/document/d/1gSRo_w6q2WR5fPx7SsQ5IaVmJXJ6xCOJfYGyqpVOC-g/edit (was: Link to design doc - TBA) > Temporary tables support > > > Key: DRILL-4956 > URL: https://issues.apache.org/jira/browse/DRILL-4956 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Labels: doc-impacting > Fix For: Future > > > Link to design doc - > https://docs.google.com/document/d/1gSRo_w6q2WR5fPx7SsQ5IaVmJXJ6xCOJfYGyqpVOC-g/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4956) Temporary tables support
[ https://issues.apache.org/jira/browse/DRILL-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-4956: Labels: doc-impacting (was: ) > Temporary tables support > > > Key: DRILL-4956 > URL: https://issues.apache.org/jira/browse/DRILL-4956 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Labels: doc-impacting > Fix For: Future > > > Link to design doc - TBA -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4956) Temporary tables support
[ https://issues.apache.org/jira/browse/DRILL-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-4956: Fix Version/s: (was: Future) > Temporary tables support > > > Key: DRILL-4956 > URL: https://issues.apache.org/jira/browse/DRILL-4956 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: Future > > > Link to design doc - TBA -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4956) Temporary tables support
[ https://issues.apache.org/jira/browse/DRILL-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-4956: Fix Version/s: Future > Temporary tables support > > > Key: DRILL-4956 > URL: https://issues.apache.org/jira/browse/DRILL-4956 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: Future > > > Link to design doc - TBA -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-5029) need better error - cast interval day to int or bigint
Khurram Faraaz created DRILL-5029: - Summary: need better error - cast interval day to int or bigint Key: DRILL-5029 URL: https://issues.apache.org/jira/browse/DRILL-5029 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.8.0 Reporter: Khurram Faraaz Priority: Minor We need a better error message, today Drill returns an AssertionError {noformat} 0: jdbc:drill:schema=dfs.tmp> values(cast('P162M24D' as INTERVAL DAY)); +-+ | EXPR$0 | +-+ | P24D| +-+ 1 row selected (0.419 seconds) {noformat} A better error would be ERROR: cannot cast type interval to int {noformat} 0: jdbc:drill:schema=dfs.tmp> values(cast(cast('P162M24D' as INTERVAL DAY) as INT)); Error: SYSTEM ERROR: AssertionError: Internal error: Conversion to relational algebra failed to preserve datatypes: validated type: RecordType(INTEGER NOT NULL EXPR$0) NOT NULL converted type: RecordType(BIGINT NOT NULL EXPR$0) NOT NULL rel: LogicalProject(EXPR$0=[/INT(Reinterpret(CAST('P162M24D'):INTERVAL DAY NOT NULL), 8640)]) LogicalValues(tuples=[[{ 0 }]]) [Error Id: 662716fb-c2c3-4032-8d92-835f8b0ec7ae on centos-01.qa.lab:31010] (state=,code=0) {noformat} A better message would be ERROR: cannot cast type interval to bigint {noformat} 0: jdbc:drill:schema=dfs.tmp> values(cast(cast('P162M24D' as INTERVAL DAY) as BIGINT)); Error: SYSTEM ERROR: AssertionError: todo: implement syntax SPECIAL(Reinterpret(CAST('P162M24D'):INTERVAL DAY NOT NULL)) [Error Id: ef2c31cd-dee3-4f13-aca0-05c16185f789 on centos-01.qa.lab:31010] (state=,code=0) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653519#comment-15653519 ]

ASF GitHub Bot commented on DRILL-5015:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/648#discussion_r87351696

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    @@ -223,19 +223,65 @@ public void connect(Properties props) throws RpcException {
         connect(null, props);
       }

    +  /**
    +   * Function to populate the endpointList with information of all the drillbits
    +   * provided in the connection string by client
    +   * @param endpointList - ArrayList of DrillbitEndpoints
    +   * @param drillbits - One or more drillbit ip[:port] provided in connection string
    +   */
    +  public void populateEndpointsList(ArrayList<DrillbitEndpoint> endpointList, String drillbits) {
    +
    +    // If no information about drillbits is provided then just return empty list.
    +    if (drillbits == null || drillbits.length() == 0) {
    +      return;
    +    }
    +    final String[] connectInfo = drillbits.split(",");
    +
    +    /* For direct connection we can get URL string having drillbit property as below:
    +       drillbit=<ip>:<port> --- Use the IP and port specified as the Foreman IP and port
    +       drillbit=<ip>        --- Use the IP specified as the Foreman IP with default port in config file
    +       drillbit=<ip1>:<port1>,<ip2>:<port2>... --- Randomly select an IP and port pair from the
    +                                                   specified list as the Foreman IP and port.
    +
    +       Fetch ip address and port information for each drillbit and populate the list
    +    */
    +    for (String info : connectInfo) {
    +      info = info.trim();
    +
    +      if (info != null) {
    +        // Split each info to get ip address and port value
    +        final String[] drillbitInfo = info.split(":");
    +
    +        // Check for malformed ip:port string
    +        if (drillbitInfo == null || drillbitInfo.length == 0) {
    +          continue;
    +        }
    +
    +        /* If port is present use that one else use the configured one
    +           Assumptions: 1) IP Address provided in connection string is valid
    +                        2) Port without IP address is never specified.
    +        */
    +        final String port = (drillbitInfo.length == 2) ? drillbitInfo[1] : config.getString(ExecConstants.INITIAL_USER_PORT);
    +        final DrillbitEndpoint endpoint = DrillbitEndpoint.newBuilder()
    +            .setAddress(drillbitInfo[0])
    +            .setUserPort(Integer.parseInt(port))
    +            .build();
    +        endpointList.add(endpoint);
    +      }
    +    }
    +  }
    +
       public synchronized void connect(String connect, Properties props) throws RpcException {
         if (connected) {
           return;
         }
         final DrillbitEndpoint endpoint;
    +    final ArrayList<DrillbitEndpoint> endpoints = new ArrayList<>();

    --- End diff --

    fixed

> As per documentation, when issuing a list of drillbits in the connection
> string, we always attempt to connect only to the first one
> -----------------------------------------------------------------------
>
>                 Key: DRILL-5015
>                 URL: https://issues.apache.org/jira/browse/DRILL-5015
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - JDBC
>    Affects Versions: 1.8.0, 1.9.0
>            Reporter: Sorabh Hamirwasia
>            Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than 1 drillbit
> to connect to, we always attempt to connect to only the first drillbit.
> As an example, we tested against a pair of drillbits, but we always connect
> to the first entry in the CSV list, verified by querying for the 'current'
> drillbit. The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline -u
> "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f
> whereAmI.q | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +-----------------+------------+---------------+------------+----------+
> |    hostname     | user_port  | control_port  | data_port  | current  |
> +-----------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab  | 31010      | 31011         | 31012      | true     |
> +-----------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0
> "a little sql for your nosql"
> This property is meant for use by clients when not wanting to overload the ZK
> for fetching a list of existing Drillbits, but the behaviour doesn't match
> the documentation.
> [Making a Direct Drillbit Connection |
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection ]
> We need to randomly shuffle this list, and if an entry in the shuffled list
> is unreachable, we need to try the next entry in the list.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
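The `host[:port]` parsing discussed above can be sketched as a standalone class. This is a hypothetical illustration, not Drill's implementation: `HostPort` and the `defaultPort` argument stand in for Drill's `DrillbitEndpoint` and the `ExecConstants.INITIAL_USER_PORT` config value, and the exception type is a generic substitute for `InvalidConnectionInfoException`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of parsing a comma-separated "host[:port]" list.
// HostPort is an illustrative stand-in for Drill's DrillbitEndpoint.
public class DrillbitListParser {

  public static final class HostPort {
    public final String host;
    public final int port;
    HostPort(String host, int port) { this.host = host; this.port = port; }
  }

  public static List<HostPort> parse(String drillbits, int defaultPort) {
    if (drillbits == null || drillbits.trim().isEmpty()) {
      throw new IllegalArgumentException("No drillbit information specified");
    }
    List<HostPort> endpoints = new ArrayList<>();
    for (String entry : drillbits.split(",")) {
      entry = entry.trim();
      if (entry.isEmpty()) {
        continue; // ignore empty entries, e.g. a trailing comma
      }
      if (entry.charAt(0) == ':') {
        // ":" or ":port" alone means the host is missing
        throw new IllegalArgumentException("Missing host in entry: " + entry);
      }
      String[] parts = entry.split(":");
      if (parts.length > 2) {
        throw new IllegalArgumentException("More than one port in entry: " + entry);
      }
      // fall back to the configured default port when none is given
      int port = (parts.length == 2) ? Integer.parseInt(parts[1].trim()) : defaultPort;
      endpoints.add(new HostPort(parts[0].trim(), port));
    }
    return endpoints;
  }
}
```

For example, `parse("pssc-61:31010, pssc-62", 31010)` would yield two endpoints, with `pssc-62` taking the default port.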
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653523#comment-15653523 ]

ASF GitHub Bot commented on DRILL-5015:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/648#discussion_r87351913

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    +  /**
    +   * Function to populate the endpointList with information of all the drillbits
    +   * provided in the connection string by client
    +   * @param endpointList - ArrayList of DrillbitEndpoints
    +   * @param drillbits - One or more drillbit ip[:port] provided in connection string
    +   */
    +  public void populateEndpointsList(ArrayList<DrillbitEndpoint> endpointList, String drillbits) {
    +
    +    // If no information about drillbits is provided then just return empty list.

    --- End diff --

    Fixed
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653522#comment-15653522 ]

ASF GitHub Bot commented on DRILL-5015:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/648#discussion_r87351928

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    +  /**
    +   * Function to populate the endpointList with information of all the drillbits
    +   * provided in the connection string by client
    +   * @param endpointList - ArrayList of DrillbitEndpoints
    +   * @param drillbits - One or more drillbit ip[:port] provided in connection string

    --- End diff --

    Just to clarify: I am not changing the property name "drillbit" used in the
    connection string, so there won't be any doc impact. The variable "drillbits"
    is used only in the internal function that parses the string. As discussed,
    the documentation is already there
    [here](http://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection).
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653520#comment-15653520 ]

ASF GitHub Bot commented on DRILL-5015:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/648#discussion_r87351935

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    +  /**
    +   * Function to populate the endpointList with information of all the drillbits

    --- End diff --

    Done
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653510#comment-15653510 ]

ASF GitHub Bot commented on DRILL-5015:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/648#discussion_r87351682

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    @@ -223,19 +223,65 @@ public void connect(Properties props) throws RpcException {
       public synchronized void connect(String connect, Properties props) throws RpcException {
         if (connected) {
           return;
         }
         final DrillbitEndpoint endpoint;
    +    final ArrayList<DrillbitEndpoint> endpoints = new ArrayList<>();
         if (isDirectConnection) {
    -      final String[] connectInfo = props.getProperty("drillbit").split(":");
    -      final String port = connectInfo.length == 2 ? connectInfo[1] : config.getString(ExecConstants.INITIAL_USER_PORT);
    -      endpoint = DrillbitEndpoint.newBuilder()
    -          .setAddress(connectInfo[0])
    -          .setUserPort(Integer.parseInt(port))
    -          .build();
    +      // Populate the endpoints list with all the drillbit information provided in the
    +      // connection string
    +      populateEndpointsList(endpoints, props.getProperty("drillbit").trim());

    --- End diff --

    Fixed. Changed the method to parseAndVerifyEndpoints
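Once the endpoint list is populated, the JIRA asks that the client shuffle it randomly and fall back to the next entry when one is unreachable. A minimal generic sketch of that selection loop, assuming a caller-supplied connection attempt (`tryConnect` here is a hypothetical stand-in for the real RPC connect):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of shuffle-and-failover endpoint selection;
// not Drill's implementation, just the behaviour the JIRA describes.
public class FailoverConnector {

  // Returns the first endpoint (visited in random order) that tryConnect
  // accepts, or throws if every endpoint is unreachable.
  public static <E> E connectToAny(List<E> endpoints, Predicate<E> tryConnect) {
    List<E> shuffled = new ArrayList<>(endpoints);
    Collections.shuffle(shuffled); // spread client load across drillbits
    for (E endpoint : shuffled) {
      if (tryConnect.test(endpoint)) {
        return endpoint;
      }
    }
    throw new IllegalStateException("No reachable drillbit in " + endpoints);
  }
}
```

Shuffling up front, rather than always starting at index 0, is what prevents the first entry in the CSV list from absorbing every connection.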
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653518#comment-15653518 ]

ASF GitHub Bot commented on DRILL-5015:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/648#discussion_r87351891

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    +        // Split each info to get ip address and port value
    +        final String[] drillbitInfo = info.split(":");
    +
    +        // Check for malformed ip:port string
    +        if (drillbitInfo == null || drillbitInfo.length == 0) {

    --- End diff --

    Length can be 0 here when the string contains just ":". In the case of more
    than one port, right now I am falling back to the default port. But as
    discussed, changed it to throw the new exception
    "InvalidConnectionInfoException".
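The review comment above enumerates the malformed cases: an entry that is just ":" (or anything starting with ":", i.e. a missing host) and an entry with more than one port. A minimal sketch of those per-entry checks, using a hypothetical `EntryValidator` class in place of the exception-throwing checks in `parseAndVerifyEndpoints`:

```java
// Hypothetical sketch of the per-entry validity checks discussed above.
public class EntryValidator {

  public static boolean isValidEntry(String entry) {
    entry = entry.trim();
    if (entry.isEmpty() || entry.charAt(0) == ':') {
      return false; // hostname or host address missing
    }
    // more than two colon-separated pieces means more than one port was given
    return entry.split(":").length <= 2;
  }
}
```

Note that in Java `":".split(":")` really does return a zero-length array (trailing empty strings are removed), which is the case the reviewer points out.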
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653521#comment-15653521 ]

ASF GitHub Bot commented on DRILL-5015:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/648#discussion_r87351905

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    +    final String[] connectInfo = drillbits.split(",");
    +
    +    /* For direct connection we can get URL string having drillbit property as below:

    --- End diff --

    Done
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653509#comment-15653509 ]

ASF GitHub Bot commented on DRILL-5015:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/648#discussion_r87351650

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/client/DrillClientSystemTest.java ---
    @@ -73,4 +77,90 @@ public void testSubmitPlanTwoNodes() throws Exception {
       }
       client.close();
     }
    +
    +  @Test
    +  public void testPopulateEndpointsList() throws Exception {
    +
    +    ArrayList<DrillbitEndpoint> endpointsList = new ArrayList<>();
    +    String drillBitConnection;
    +    DrillClient client = new DrillClient();
    +    DrillbitEndpoint endpoint;
    +    Iterator<DrillbitEndpoint> endpointIterator;
    +
    +    // Test with single drillbit ip
    +    drillBitConnection = "10.10.100.161";
    +    client.populateEndpointsList(endpointsList, drillBitConnection);
    +    endpoint = endpointsList.iterator().next();
    +    assert (endpointsList.size() == 1);
    +    assert (endpoint.getAddress().equalsIgnoreCase(drillBitConnection));
    +    assert (endpoint.getUserPort() == client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
    +
    +    // Test with single drillbit ip:port
    +    endpointsList.clear();

    --- End diff --

    Fixed
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653512#comment-15653512 ]

ASF GitHub Bot commented on DRILL-5015:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/648#discussion_r87351880

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    +        /* If port is present use that one else use the configured one
    +           Assumptions: 1) IP Address provided in connection string is valid
    +                        2) Port without IP address is never specified.
    +        */
    +        final String port = (drillbitInfo.length == 2) ? drillbitInfo[1] : config.getString(ExecConstants.INITIAL_USER_PORT);

    --- End diff --

    Put the sanity checks in place for all error conditions, throwing exceptions
    with well-formed messages. For the length > 2 case, treating it as an error
    and throwing an exception.
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653515#comment-15653515 ]

ASF GitHub Bot commented on DRILL-5015:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/648#discussion_r87351685

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
       public synchronized void connect(String connect, Properties props) throws RpcException {
         if (connected) {
           return;
         }
         final DrillbitEndpoint endpoint;
    +    final ArrayList<DrillbitEndpoint> endpoints = new ArrayList<>();
         if (isDirectConnection) {
    -      final String[] connectInfo = props.getProperty("drillbit").split(":");
    -      final String port = connectInfo.length == 2 ? connectInfo[1] : config.getString(ExecConstants.INITIAL_USER_PORT);
    -      endpoint = DrillbitEndpoint.newBuilder()
    -          .setAddress(connectInfo[0])
    -          .setUserPort(Integer.parseInt(port))
    -          .build();
    +      // Populate the endpoints list with all the drillbit information provided in the
    +      // connection string
    +      populateEndpointsList(endpoints, props.getProperty("drillbit").trim());

    --- End diff --

    If "drillbit" is unset in the connection string then this code path won't be
    called at all. If the "drillbit" property is specified in the connection
    string then the value is set to an empty string when none exists, so .trim()
    will not cause an NPE. But moved it into the callee.
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653514#comment-15653514 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87351921 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -223,19 +223,65 @@
+  public void populateEndpointsList(ArrayList<DrillbitEndpoint> endpointList, String drillbits) {
--- End diff -- Fixed
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653511#comment-15653511 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87120245 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -223,19 +223,65 @@
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] entries provided in the connection string
--- End diff -- @kkhatua - I am not sure I followed fully. This is an internal method parameter and does not change the name of the "drillbit" parameter in the connection string, so it should not have any doc impact. @paul-rogers - I am not introducing anything new here. The [documentation](http://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection) already specifies the usage, but the correct implementation was lacking. It also says we only support one port (the user port), as is the case with ZK.
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653508#comment-15653508 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87351662 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -245,14 +291,15 @@ public synchronized void connect(String connect, Properties props) throws RpcExc
         throw new RpcException("Failure setting up ZK for client.", e);
       }
     }
-
-    final ArrayList<DrillbitEndpoint> endpoints = new ArrayList<>(clusterCoordinator.getAvailableEndpoints());
-    checkState(!endpoints.isEmpty(), "No DrillbitEndpoint can be found");
-    // shuffle the collection then get the first endpoint
-    Collections.shuffle(endpoints);
-    endpoint = endpoints.iterator().next();
+      endpoints.addAll(clusterCoordinator.getAvailableEndpoints());
     }
+    // Make sure we have at least one endpoint in the list
+    checkState(!endpoints.isEmpty(), "No DrillbitEndpoint can be found");
+    // Shuffle the collection, then get the first endpoint
+    Collections.shuffle(endpoints);
+    endpoint = endpoints.iterator().next();
--- End diff -- Fixed
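The shuffle-then-pick step in the diff above generalizes to the try-each-until-reachable behavior the JIRA description asks for. A hedged sketch of that selection loop, with a Predicate standing in for the real RPC connection attempt and strings standing in for DrillbitEndpoint objects:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

public class EndpointSelector {
    // Shuffle a copy of the endpoint list, then return the first endpoint that
    // the supplied connection attempt accepts. Throws if the list is empty or
    // no endpoint is reachable, mirroring the checkState in connect().
    static String selectEndpoint(List<String> endpoints, Predicate<String> tryConnect) {
        if (endpoints.isEmpty()) {
            throw new IllegalStateException("No DrillbitEndpoint can be found");
        }
        List<String> shuffled = new ArrayList<>(endpoints);
        Collections.shuffle(shuffled);
        for (String endpoint : shuffled) {
            if (tryConnect.test(endpoint)) {
                return endpoint;
            }
        }
        throw new IllegalStateException("Could not connect to any drillbit endpoint");
    }

    public static void main(String[] args) {
        List<String> eps = List.of("pssc-61:31010", "pssc-62:31010", "pssc-63:31010");
        // Simulate that only pssc-62 is reachable; regardless of shuffle order,
        // the selector keeps trying entries until it finds it.
        System.out.println(selectEndpoint(eps, ep -> ep.startsWith("pssc-62")));
    }
}
```

Shuffling a copy spreads client load across the listed drillbits, while the fallback loop covers the "if an entry is unreachable, try the next" requirement.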
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653513#comment-15653513 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87351874 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -223,19 +223,65 @@
+        final String port = (drillbitInfo.length == 2) ? drillbitInfo[1] : config.getString(ExecConstants.INITIAL_USER_PORT);
+        final DrillbitEndpoint endpoint = DrillbitEndpoint.newBuilder()
+            .setAddress(drillbitInfo[0])
+            .setUserPort(Integer.parseInt(port))
--- End diff -- I am catching the exception that can arise from parseInt and throwing it back to the user as an InvalidConnectionInfoException with a proper error message.
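The comment above describes wrapping the NumberFormatException from Integer.parseInt in an InvalidConnectionInfoException with a proper message. A sketch of that pattern, using IllegalArgumentException as a stand-in since InvalidConnectionInfoException is Drill-internal:

```java
public class PortValidator {
    // Parse the port portion of a drillbit entry, converting the raw
    // NumberFormatException into a clearer, user-facing error message.
    static int parsePort(String port, String entry) {
        try {
            return Integer.parseInt(port.trim());
        } catch (NumberFormatException e) {
            // Stand-in for Drill's InvalidConnectionInfoException.
            throw new IllegalArgumentException(
                "Malformed port value '" + port.trim() + "' in drillbit entry: " + entry, e);
        }
    }

    public static void main(String[] args) {
        // Leading/trailing spaces between ':' and the port value are trimmed first.
        System.out.println(parsePort(" 31010 ", "pssc-61: 31010"));
    }
}
```

Passing the original exception as the cause preserves the stack trace while the message tells the user exactly which entry was malformed.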
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653506#comment-15653506 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87351621 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/client/DrillClientSystemTest.java --- @@ -73,4 +77,90 @@ public void testSubmitPlanTwoNodes() throws Exception { } client.close(); }
+
+  @Test
+  public void testPopulateEndpointsList() throws Exception {
+
+    ArrayList<DrillbitEndpoint> endpointsList = new ArrayList<>();
+    String drillBitConnection;
+    DrillClient client = new DrillClient();
+    DrillbitEndpoint endpoint;
+    Iterator<DrillbitEndpoint> endpointIterator;
+
+    // Test with a single drillbit ip
+    drillBitConnection = "10.10.100.161";
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    endpoint = endpointsList.iterator().next();
+    assert (endpointsList.size() == 1);
+    assert (endpoint.getAddress().equalsIgnoreCase(drillBitConnection));
+    assert (endpoint.getUserPort() == client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
+
+    // Test with a single drillbit ip:port
+    endpointsList.clear();
+    drillBitConnection = "10.10.100.161:5000";
+    String[] ipAndPort = drillBitConnection.split(":");
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    assert (endpointsList.size() == 1);
+    endpoint = endpointsList.iterator().next();
+    assert (endpoint.getAddress().equalsIgnoreCase(ipAndPort[0]));
+    assert (endpoint.getUserPort() == Integer.parseInt(ipAndPort[1]));
+
+    // Test with multiple drillbit ips
+    endpointsList.clear();
+    drillBitConnection = "10.10.100.161,10.10.100.162";
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    assert (endpointsList.size() == 2);
+    endpointIterator = endpointsList.iterator();
+    endpoint = endpointIterator.next();
+    assert (endpoint.getAddress().equalsIgnoreCase("10.10.100.161"));
+    assert (endpoint.getUserPort() == client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
+    endpoint = endpointIterator.next();
+    assert (endpoint.getAddress().equalsIgnoreCase("10.10.100.162"));
+    assert (endpoint.getUserPort() == client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
+
+    // Test with multiple drillbit ip:port pairs
+    endpointsList.clear();
+    drillBitConnection = "10.10.100.161:5000,10.10.100.162:5000";
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    assert (endpointsList.size() == 2);
+    endpointIterator = endpointsList.iterator();
+    endpoint = endpointIterator.next();
+    assert (endpoint.getAddress().equalsIgnoreCase("10.10.100.161"));
+    assert (endpoint.getUserPort() == 5000);
+    endpoint = endpointIterator.next();
+    assert (endpoint.getAddress().equalsIgnoreCase("10.10.100.162"));
+    assert (endpoint.getUserPort() == 5000);
+
+    // Test with multiple drillbits mixing ip:port and ip
+    endpointsList.clear();
+    drillBitConnection = "10.10.100.161:5000,10.10.100.162";
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    assert (endpointsList.size() == 2);
+    endpointIterator = endpointsList.iterator();
+    endpoint = endpointIterator.next();
+    assert (endpoint.getAddress().equalsIgnoreCase("10.10.100.161"));
+    assert (endpoint.getUserPort() == 5000);
+    endpoint = endpointIterator.next();
+    assert (endpoint.getAddress().equalsIgnoreCase("10.10.100.162"));
+    assert (endpoint.getUserPort() == client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
+
+    // Test with an empty string
+    endpointsList.clear();
+    drillBitConnection = "";
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    assert (endpointsList.size() == 0);
--- End diff -- Added more test cases based on the new implementation.
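One caveat about the test above: Java's assert keyword is a no-op unless the JVM runs with -ea, so these checks can silently pass. A small stand-alone sketch of the same checks with explicit failures (in Drill's test suite, JUnit's assertEquals would be the idiomatic choice); the parseOne helper here is hypothetical and only mimics the "host[:port]" behavior under test, with 31010 standing in for the configured default user port:

```java
import java.util.List;

public class ParseEndpointsTest {
    // Minimal assertEquals stand-in so checks fail loudly even without -ea.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    // Hypothetical parse helper mimicking the "host[:port]" behavior under test.
    static String parseOne(String entry) {
        String[] parts = entry.trim().split(":");
        int port = (parts.length == 2) ? Integer.parseInt(parts[1].trim()) : 31010;
        return parts[0].trim() + ":" + port;
    }

    public static void main(String[] args) {
        assertEquals("10.10.100.161:5000", parseOne("10.10.100.161:5000"));
        assertEquals("10.10.100.162:31010", parseOne("10.10.100.162"));
        // Mixed list: explicit port on one entry, default port on the other.
        assertEquals(List.of("10.10.100.161:5000", "10.10.100.162:31010"),
            List.of(parseOne("10.10.100.161:5000"), parseOne("10.10.100.162")));
        System.out.println("all checks passed");
    }
}
```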
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653505#comment-15653505 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87351670 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -245,14 +291,15 @@
+    // Make sure we have at least one endpoint in the list
+    checkState(!endpoints.isEmpty(), "No DrillbitEndpoint can be found");
--- End diff -- Moved the checkState inside the else condition and changed the string to capture more information. For the if case we throw an exception if there is no "drillbit" value passed in the connection string.
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653517#comment-15653517 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87351643 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/client/DrillClientSystemTest.java --- @@ -73,4 +77,90 @@
+    assert (endpoint.getUserPort() ==
+        client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
--- End diff -- Fixed
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653507#comment-15653507 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87351895 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -223,19 +223,65 @@
+    for (String info : connectInfo) {
+      info = info.trim();
+
+      if (info != null) {
--- End diff -- Yes. Fixed. Changed to drillbit.
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653516#comment-15653516 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r87351912 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -223,19 +223,65 @@
+    // If no information about drillbits is provided then just return the empty list.
+    if (drillbits == null || drillbits.length() == 0) {
--- End diff -- Fixed