[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656001#comment-15656001 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87531891
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +224,94 @@ public void connect(Properties props) throws RpcException {
     connect(null, props);
   }
 
+  /**
+   * Populates the endpointList with the list of drillbits provided in the
+   * connection string by the client.
+   *
+   * For a direct connection the URL string can carry the drillbit property as below:
+   * drillbit=<ip>:<port> --- use the ip and port specified as the Foreman ip and port
+   * drillbit=<ip>        --- use the ip specified as the Foreman ip with the default port from the config file
+   * drillbit=<ip1>:<port1>,<ip2>:<port2>... --- randomly select one ip and port pair from the specified list as
+   *                                             the Foreman ip and port
+   *
+   * @param drillbits string with the drillbit value provided in the connection string
+   * @param defaultUserPort string with the default user port of a drillbit specified in the config file
+   * @return list of drillbit endpoints parsed from the connection string
+   * @throws InvalidConnectionInfoException if the connection string has invalid or no drillbit information
+   */
+  static List<DrillbitEndpoint> parseAndVerifyEndpoints(String drillbits, String defaultUserPort)
+    throws InvalidConnectionInfoException {
+    // If no drillbit information is provided then throw an exception.
+    if (drillbits.trim().isEmpty()) {
+      throw new InvalidConnectionInfoException("No drillbit information specified in the connection string");
+    }
+
+    final ArrayList<DrillbitEndpoint> endpointList = new ArrayList<>();
+    final String[] connectInfo = drillbits.split(",");
+
+    // Fetch the host and port information for each drillbit and populate the list.
+    for (String drillbit : connectInfo) {
+
+      // Trim the surrounding spaces and ignore the entry if it is an empty string.
+      drillbit = drillbit.trim();
+
+      if (!drillbit.isEmpty()) {
+        // Reject entries of the form ":" or ":port", which have no hostname or host address.
+        if (drillbit.charAt(0) == ':') {
+          // Invalid drillbit information
+          throw new InvalidConnectionInfoException("Malformed connection string with drillbit hostname or " +
+                                                   "host address missing for an entry: " + drillbit);
+        }
+
+        // Each remaining entry is now guaranteed to start with a hostname or host address.
+        // Split the entry to separate the host from the optional port value.
+        final String[] drillbitInfo = drillbit.split(":");
+
+        // Check if the entry specifies more than one port.
+        if (drillbitInfo.length > 2) {
+          throw new InvalidConnectionInfoException("Malformed connection string with more than one port in a " +
+                                                   "drillbit entry: " + drillbit);
+        }
+
+        // drillbitInfo[0] holds the hostname or host address; trim any spaces
+        // around it before use.
+        final String ipAddress = drillbitInfo[0].trim();
+        String port = defaultUserPort;
+
+        if (drillbitInfo.length == 2) {
+          // A port value was also given by the user; trim any spaces between ':' and
+          // the port value before validating it.
+          port = drillbitInfo[1].trim();
+        }
+
+        try {
+          final DrillbitEndpoint endpoint = DrillbitEndpoint.newBuilder()
+                                              .setAddress(ipAddress)
+                                              .setUserPort(Integer.parseInt(port))
+                                              .build();
+
+          endpointList.add(endpoint);
+        } catch (NumberFormatException e) {
+          throw new InvalidConnectionInfoException("Malformed port value in entry: " + ipAddress + ":" + port +
+                                                   " passed in connection string");
+        }
+      }
+    }
+    return endpointList;
--- End diff --

One last check: must have at least one endpoint. (The code above skips an 
entry if it is empty. If that was the only entry, the endpoint list might be 
empty here.)
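
A minimal sketch of that final check, placed just before the return (the exact
error-message wording here is an assumption, not the committed code):

    // If every entry in the connection string was empty (e.g. "drillbit=,"),
    // the loop above adds nothing, so fail here instead of returning an
    // empty list that the caller cannot connect with.
    if (endpointList.isEmpty()) {
      throw new InvalidConnectionInfoException("No valid drillbit information specified in the connection string");
    }
    return endpointList;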



[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656000#comment-15656000 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87531981
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +224,94 @@ public void connect(Properties props) throws RpcException {
     connect(null, props);
   }
 
+  /**
+   * Populates the endpointList with the list of drillbits provided in the
+   * connection string by the client.
+   *
+   * For a direct connection the URL string can carry the drillbit property as below:
--- End diff --

Nit: this is a Javadoc comment, so strings must be formatted using HTML. 
One handy tool is the <pre> tag:

<pre>
drillbit=...
use the ip and port...
</pre>

In Eclipse, hovering over the method name shows the formatted Javadoc for a 
quick check.
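
For illustration, one way the pattern list could be marked up in Javadoc HTML
(a sketch only, not the committed comment; the {@code} wrapping is a
suggestion):

    /**
     * For a direct connection the URL string can carry the drillbit property as below:
     * <ul>
     *   <li>{@code drillbit=<ip>:<port>} --- use the ip and port specified as the Foreman ip and port</li>
     *   <li>{@code drillbit=<ip>} --- use the ip specified as the Foreman ip with the default port from the config file</li>
     *   <li>{@code drillbit=<ip1>:<port1>,<ip2>:<port2>...} --- randomly select one ip and port pair from the list</li>
     * </ul>
     */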


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one drillbit 
> to connect to, we always attempt to connect to only the first drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list, as shown by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline -u "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f whereAmI.q | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +------------------+------------+---------------+------------+----------+
> | hostname         | user_port  | control_port  | data_port  | current  |
> +------------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab   | 31010      | 31011         | 31012      | true     |
> +------------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients that do not want to overload ZooKeeper 
> by fetching the list of existing Drillbits, but the behaviour doesn't match 
> the documentation: 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection ]
> We need to randomly shuffle this list, and if an entry in the shuffled 
> list is unreachable, we need to try the next entry in the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655998#comment-15655998 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87532223
  
--- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/InvalidConnectionInfoException.java ---
@@ -0,0 +1,35 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.rpc;
+
+/**
+ * Exception class to differentiate errors due to a malformed connection string from the client
+ */
+public class InvalidConnectionInfoException extends RpcException {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(InvalidConnectionInfoException.class);
--- End diff --

logger not strictly needed here: we don't log anything.


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one drillbit 
> to connect to, we always attempt to connect to only the first drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list, as shown by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline -u "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f whereAmI.q | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +------------------+------------+---------------+------------+----------+
> | hostname         | user_port  | control_port  | data_port  | current  |
> +------------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab   | 31010      | 31011         | 31012      | true     |
> +------------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients that do not want to overload ZooKeeper 
> by fetching the list of existing Drillbits, but the behaviour doesn't match 
> the documentation: 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection ]
> We need to randomly shuffle this list, and if an entry in the shuffled 
> list is unreachable, we need to try the next entry in the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655999#comment-15655999 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87532217
  
--- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/InvalidConnectionInfoException.java ---
@@ -0,0 +1,35 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.rpc;
+
+/**
+ * Exception class to differentiate errors due to a malformed connection string from the client
+ */
+public class InvalidConnectionInfoException extends RpcException {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(InvalidConnectionInfoException.class);
+
+  private final String message;
+
+  public InvalidConnectionInfoException(String message) {
--- End diff --

Just use the normal facilities:

public Invalid...( String message ) { super( message ); }
...
e.getMessage( ) ...
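
Putting this and the logger comment together, a sketch of the trimmed-down
class (assuming RpcException exposes a String constructor):

    package org.apache.drill.exec.rpc;

    // No logger and no separate message field: Throwable already stores the
    // message passed to super(), and callers read it back via getMessage().
    public class InvalidConnectionInfoException extends RpcException {
      public InvalidConnectionInfoException(String message) {
        super(message);
      }
    }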



> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one drillbit 
> to connect to, we always attempt to connect to only the first drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list, as shown by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline -u "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f whereAmI.q | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +------------------+------------+---------------+------------+----------+
> | hostname         | user_port  | control_port  | data_port  | current  |
> +------------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab   | 31010      | 31011         | 31012      | true     |
> +------------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients that do not want to overload ZooKeeper 
> by fetching the list of existing Drillbits, but the behaviour doesn't match 
> the documentation: 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection ]
> We need to randomly shuffle this list, and if an entry in the shuffled 
> list is unreachable, we need to try the next entry in the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655464#comment-15655464 ] 

Robert Hou edited comment on DRILL-5035 at 11/11/16 2:29 AM:
-

I set the new option to false and I do not get an exception.  I will try with 
IMPALA_TIMESTAMP.


was (Author: rhou):
I set the new option to false and I do not see a problem.  I will try with 
IMPALA_TIMESTAMP.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: orders_parts_hive.tar
>
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655896#comment-15655896 ] 

Robert Hou commented on DRILL-5035:
---

I am not able to use timestamp_impala yet.  But I tried the original query with 
Drill 1.8, and I get zero rows back.  Which makes sense, since we are not 
interpreting the timestamp correctly.

select timestamp_id from orders_parts_hive where timestamp_id >= '2016-10-09 
13:36:38.986' and timestamp_id <= '2016-10-09 13:45:38.986';
+---------------+
| timestamp_id  |
+---------------+
+---------------+


I also tried selecting the whole column.  I get bad values (known problem), but 
I get all the values.  I don't get an exception.

select timestamp_id from orders_parts_hive;




> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: orders_parts_hive.tar
>
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing

2016-11-10 Thread Kunal Khatua (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655750#comment-15655750 ] 

Kunal Khatua commented on DRILL-4653:
-

[~ssriniva123], while the feature is disabled by default, we should mark it as 
resolved only if it passes with the feature enabled.
 
[~khfaraaz] Please reopen this bug if the FAIL case would qualify as a blocker 
for closing this bug, so that we are tracking this correctly. 


> Malformed JSON should not stop the entire query from progressing
> 
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.6.0
>Reporter: subbu srinivasan
> Fix For: 1.9.0
>
>
> Currently a Drill query terminates upon the first encounter of an invalid JSON line.
> Drill has to continue progressing after ignoring the bad records. Something 
> similar to a setting of (ignore.malformed.json) would help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4726) Dynamic UDFs support

2016-11-10 Thread Rahul Challapalli (JIRA)

 [ https://issues.apache.org/jira/browse/DRILL-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Challapalli updated DRILL-4726:
-
Reviewer:   (was: Rahul Challapalli)

> Dynamic UDFs support
> 
>
> Key: DRILL-4726
> URL: https://issues.apache.org/jira/browse/DRILL-4726
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> Allow registering UDFs without a restart of Drillbits.
> The design is described in the document below:
> https://docs.google.com/document/d/1FfyJtWae5TLuyheHCfldYUpCdeIezR2RlNsrOTYyAB4/edit?usp=sharing
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Rahul Challapalli (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655624#comment-15655624 ] 

Rahul Challapalli commented on DRILL-5035:
--

Thanks [~rhou]. This confirms that it's a bug with Drill. Can you also check 
whether it is a regression? (Use timestamp_impala with drill-1.8.0 and see if 
it succeeds.)

https://drill.apache.org/docs/parquet-format/#about-int96-support
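
For reference, a minimal sketch of that regression check over JDBC. The
connection URL is taken from the report above; the dfs path to the
hive-written parquet files is an assumption, and convert_from with
'TIMESTAMP_IMPALA' is Drill's documented way to decode INT96 values without
the new session option:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class Int96RegressionCheck {
      public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:drill:zk=10.10.100.186:5181");
             Statement stmt = conn.createStatement();
             // Read the parquet files directly so timestamp_id arrives as the
             // raw INT96 (varbinary) value, then decode it explicitly.
             ResultSet rs = stmt.executeQuery(
                 "select convert_from(timestamp_id, 'TIMESTAMP_IMPALA') as ts "
                     + "from dfs.`/data/orders_parts_hive` limit 10")) {
          while (rs.next()) {
            System.out.println(rs.getTimestamp("ts"));
          }
        }
      }
    }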

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: orders_parts_hive.tar
>
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655566#comment-15655566 ] 

Robert Hou commented on DRILL-5035:
---

I tried with Hive.  It succeeds.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: orders_parts_hive.tar
>
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655535#comment-15655535 ] 

Robert Hou commented on DRILL-5035:
---

~/bin/parquet-meta 000000_0 
file:    file:/root/drill-test-framework-pushdown/data/orders_parts_hive/o_orderpriority=1-URGENT/000000_0 
creator: parquet-mr version 1.6.0 


> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: orders_parts_hive.tar
>
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Rahul Challapalli (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655520#comment-15655520 ] 

Rahul Challapalli commented on DRILL-5035:
--

Based on your explanation, the parquet files are created by hive itself, so 
this is a bug. But just to confirm, can you do the checks below?

1. Inspect the parquet metadata and look for "creator" field
2. Try to run a similar query from hive and see if it succeeds

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: orders_parts_hive.tar
>
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655518#comment-15655518 ] 

Robert Hou commented on DRILL-5035:
---

I am not sure this is a release stopper.  It may be due to the fact that I have 
a partition that only has null values for the column.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: orders_parts_hive.tar
>
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655507#comment-15655507 ] 

Robert Hou commented on DRILL-5035:
---

The DDL for the partitioned Hive table:

create table orders_parts_hive (
o_orderkey int,
o_custkey int,
o_orderstatus string,
o_totalprice double,
o_orderdate date,
o_clerk string,
o_shippriority int,
o_comment string,
int_id int,
bigint_id bigint,
float_id float,
double_id double,
varchar_id string,
date_id date,
timestamp_id timestamp)
partitioned by (o_orderpriority string)
stored as parquet;
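
For completeness, a sketch of the reproduction as a single JDBC session; the
`hive` schema name is an assumption about how the plugin is registered on
this cluster:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ReproDrill5035 {
      public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:drill:zk=10.10.100.186:5181");
             Statement stmt = conn.createStatement()) {
          // Enable the new INT96-as-timestamp behavior for this session.
          stmt.execute("alter session set `store.parquet.reader.int96_as_timestamp` = true");
          // This is the query that triggers the IndexOutOfBoundsException.
          try (ResultSet rs = stmt.executeQuery(
              "select timestamp_id from hive.orders_parts_hive "
                  + "where timestamp_id = '2016-10-03 06:11:52.429'")) {
            while (rs.next()) {
              System.out.println(rs.getTimestamp(1));
            }
          }
        }
      }
    }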

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: orders_parts_hive.tar
>
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655504#comment-15655504 ] 

Robert Hou commented on DRILL-5035:
---

Interesting.

I exported Drill data to a tbl file.  I edited the tbl file so that Hive could 
read it.  I created a Hive table and loaded it from the tbl file.  Then I created 
a parquet Hive table from the first Hive table, and finally a partitioned Hive 
table from the parquet Hive table.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: orders_parts_hive.tar
>
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Rahul Challapalli (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655486#comment-15655486 ] 

Rahul Challapalli commented on DRILL-5035:
--

I am a little confused. How did you generate the data for the hive table? If it 
is generated by drill, that explains the behavior and this is not a bug.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: orders_parts_hive.tar
>
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

 [ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5035:
--
Attachment: orders_parts_hive.tar

This is a Hive partitioned table.  It is partitioned on o_orderpriority.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: orders_parts_hive.tar
>
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655479#comment-15655479 ] 

Robert Hou commented on DRILL-5035:
---

I'm trying to figure out how to do that.  Because it is a Hive partitioned 
table, it has five directories, each with one file, and they all have the same 
name.  Maybe I'll use a tar file.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Rahul Challapalli (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655475#comment-15655475 ] 

Rahul Challapalli commented on DRILL-5035:
--

Also, it would be helpful if you can upload the data along with the hive ddl, 
assuming the data is less than 10MB.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655476#comment-15655476 ] 

Robert Hou commented on DRILL-5035:
---

Yes, I created it.  It is a Hive table partitioned on a string.  I created it 
using data from a Drill table.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Rahul Challapalli (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655471#comment-15655471 ] 

Rahul Challapalli commented on DRILL-5035:
--

Are you sure that the data is generated by hive itself? You can have a hive table 
sitting on top of data generated by drill.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655464#comment-15655464 ] 

Robert Hou commented on DRILL-5035:
---

I set the new option to false and I do not see a problem.  I will try with 
IMPALA_TIMESTAMP.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Zelaine Fong (JIRA)

 [ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zelaine Fong updated DRILL-5035:

Assignee: Vitalii Diravka

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Rahul Challapalli (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655451#comment-15655451 ] 

Rahul Challapalli commented on DRILL-5035:
--

Can we remove the new session option and use IMPALA_TIMESTAMP and see if it has 
the same issue? And run the same query on drill-1.8.0 and see if this is a 
regression.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>
> I used the new option to read Hive timestamps.
> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> This query fails:
> select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 
> 06:11:52.429';
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
> Fragment 0:0
> [Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
> (state=,code=0)
> Selecting all the columns succeeds.
> 0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_clerk  | o_shippriority  | o_comment                                                           | int_id  | bigint_id  | float_id  | double_id  | varchar_id  | date_id     | timestamp_id             | dir0                    |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+
> | 11335       | 871        | F              | 133549.0      | 1994-10-22   | null     | 0               | ealms. theodolites maintain. regular, even instructions against t   | -4      | -4         | -4.0      | -4.0       | -4          | 2016-09-29  | 2016-10-03 06:11:52.429  | o_orderpriority=2-HIGH  |
> +-------------+------------+----------------+---------------+--------------+----------+-----------------+---------------------------------------------------------------------+---------+------------+-----------+------------+-------------+-------------+--------------------------+-------------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655374#comment-15655374 ] 

Robert Hou edited comment on DRILL-5035 at 11/10/16 10:39 PM:
--

This table is partitioned on a string.  The problem occurs with a partition 
that has null values.  The value of the string is "NOT SPECIFIED".

I can select every row up to the null partition using:

select timestamp_id from orders_parts_hive limit 9026;

But the next row is in the null partition and causes an exception.

select timestamp_id from orders_parts_hive limit 9027;


was (Author: rhou):
This table is partitioned on a string.  The problem occurs with a partition 
that has null values.  The value of the varchar is "NOT SPECIFIED".

I can select every row up to the null partition using:

select timestamp_id from orders_parts_hive limit 9026;

But the next row is in the null partition and causes an exception.

select timestamp_id from orders_parts_hive limit 9027;

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655397#comment-15655397
 ] 

Robert Hou edited comment on DRILL-5035 at 11/10/16 10:39 PM:
--

The partition only has null values for timestamp_id.  Could this be an issue 
with empty batches?  There are 3024 null values in the partition.


was (Author: rhou):
The partition only has null values for timestamp_id.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655374#comment-15655374
 ] 

Robert Hou edited comment on DRILL-5035 at 11/10/16 10:39 PM:
--

This table is partitioned on a string.  The problem occurs with a partition 
that has null values.  The value of the varchar is "NOT SPECIFIED".

I can select every row up to the null partition using:

select timestamp_id from orders_parts_hive limit 9026;

But the next row is in the null partition and causes an exception.

select timestamp_id from orders_parts_hive limit 9027;


was (Author: rhou):
This table is partitioned on a varchar.  The problem occurs with a partition 
that has null values.  The value of the varchar is "NOT SPECIFIED".

I can select every row up to the null partition using:

select timestamp_id from orders_parts_hive limit 9026;

But the next row is in the null partition and causes an exception.

select timestamp_id from orders_parts_hive limit 9027;

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655397#comment-15655397
 ] 

Robert Hou edited comment on DRILL-5035 at 11/10/16 10:36 PM:
--

The partition only has null values for timestamp_id.


was (Author: rhou):
The partition only has null values.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655397#comment-15655397
 ] 

Robert Hou commented on DRILL-5035:
---

The partition only has null values.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655378#comment-15655378
 ] 

Robert Hou commented on DRILL-5035:
---

The Hive table is partitioned on o_orderpriority, which is a string.

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655374#comment-15655374
 ] 

Robert Hou commented on DRILL-5035:
---

This table is partitioned on a varchar.  The problem occurs with a partition 
that has null values.  The value of the varchar is "NOT SPECIFIED".

I can select every row up to the null partition using:

select timestamp_id from orders_parts_hive limit 9026;

But the next row is in the null partition and causes an exception.

select timestamp_id from orders_parts_hive limit 9027;

> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655306#comment-15655306
 ] 

Robert Hou commented on DRILL-5035:
---

I am using RC1.

0: jdbc:drill:zk=10.10.100.186:5181> select * from sys.version;
(shown transposed)
| version         | 1.9.0                                              |
| commit_id       | 5cea9afa6278e21574c6a982ae5c3d82085ef904           |
| commit_message  | [maven-release-plugin] prepare release drill-1.9.0 |
| commit_time     | 09.11.2016 @ 10:28:44 PST                          |
| build_email     | r...@mapr.com                                      |
| build_time      | 10.11.2016 @ 12:56:24 PST                          |


> Selecting timestamp value from Hive table causes IndexOutOfBoundsException
> --
>
> Key: DRILL-5035
> URL: https://issues.apache.org/jira/browse/DRILL-5035
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Robert Hou
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException

2016-11-10 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5035:
-

 Summary: Selecting timestamp value from Hive table causes 
IndexOutOfBoundsException
 Key: DRILL-5035
 URL: https://issues.apache.org/jira/browse/DRILL-5035
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.9.0
Reporter: Robert Hou


I used the new option to read Hive timestamps.

alter session set `store.parquet.reader.int96_as_timestamp` = true;

This query fails:

select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))

Fragment 0:0

[Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] 
(state=,code=0)


Selecting all the columns succeeds.

0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
(one matching row, shown transposed)
| o_orderkey      | 11335                                                              |
| o_custkey       | 871                                                                |
| o_orderstatus   | F                                                                  |
| o_totalprice    | 133549.0                                                           |
| o_orderdate     | 1994-10-22                                                         |
| o_clerk         | null                                                               |
| o_shippriority  | 0                                                                  |
| o_comment       | ealms. theodolites maintain. regular, even instructions against t |
| int_id          | -4                                                                 |
| bigint_id       | -4                                                                 |
| float_id        | -4.0                                                               |
| double_id       | -4.0                                                               |
| varchar_id      | -4                                                                 |
| date_id         | 2016-09-29                                                         |
| timestamp_id    | 2016-10-03 06:11:52.429                                            |
| dir0            | o_orderpriority=2-HIGH                                             |




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2016-11-10 Thread Padma Penumarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-4990:

Description: For every query, we build the schema tree (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all storage plugins are checked and added to the schema tree if they are accessible by the user who initiated the query. For the file system plugin, the listStatus API is used to check whether the workspace is accessible by the user (WorkspaceSchemaFactory.accessible). The idea seems to be that if the user does not have access to file(s) in the workspace, listStatus will throw an exception and we return false. But listStatus (which lists all the entries of a directory) is an expensive operation when there is a large number of files in the directory. A new API called access (HDFS-6570) was added in Hadoop 2.6; it provides the ability to check whether the user has permissions on a file/directory. Use this new API instead of listStatus.  (was: For every query, we build the schema tree (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all storage plugins are checked and added to the schema tree if they are accessible by the user who initiated the query. For the file system plugin, the listStatus API is used to check whether the workspace is accessible by the user (WorkspaceSchemaFactory.accessible). The idea seems to be that if the user does not have access to file(s) in the workspace, listStatus will throw an exception and we return false. But listStatus (which lists all the entries of a directory) is an expensive operation when there is a large number of files in the directory. A new API called access (HDFS-6570) was added in Hadoop 2.6; it provides the ability to check whether the user has permissions on a file/directory. Use this new API instead of listStatus. For a directory with 256k+ files, an improvement of up to 10 sec in planning time was observed when using the new API vs. the old way of listStatus.)
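
A minimal sketch of the proposed check, assuming the Hadoop 2.6+ FileSystem.access() API from HDFS-6570; the class and method names below are illustrative, not Drill's actual WorkspaceSchemaFactory code:

{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.security.AccessControlException;

public class WorkspaceAccessCheck {
  /**
   * True if the workspace directory is accessible. Assumes {@code fs} was
   * opened as (or proxies) the querying user. Unlike listStatus(), access()
   * is a pure permission probe: it never enumerates the directory's
   * children, so its cost does not grow with the number of files.
   */
  public static boolean accessible(FileSystem fs, Path wsPath) throws IOException {
    try {
      // READ_EXECUTE on a directory = permission to list and traverse it.
      fs.access(wsPath, FsAction.READ_EXECUTE);
      return true;
    } catch (AccessControlException | FileNotFoundException e) {
      return false;
    }
  }
}
{code}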

> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.9.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-5025) ExternalSortBatch provides weak control over spill file size

2016-11-10 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers closed DRILL-5025.
--

> ExternalSortBatch provides weak control over spill file size
> 
>
> Key: DRILL-5025
> URL: https://issues.apache.org/jira/browse/DRILL-5025
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> The ExternalSortBatch (ESB) operator sorts records while spilling to disk to 
> control memory use. The size of the spill file is not easy to control. It is 
> a function of the accumulated batches size (half of the accumulated total), 
> which is determined by either the memory budget or the 
> {{drill.exec.sort.external.group.size}} parameter. (But, even with the 
> parameter, the actual file size is still half the accumulated batches.)
> The proposed solution is to provide an explicit parameter that sets the 
> maximum spill file size: {{drill.exec.sort.external.spill.size}}. If the ESB 
> needs to spill more than this amount of data, ESB should split the spill into 
> multiple files.
> The spill.size should be in bytes (or MB). (A size in records makes the file 
> size data-dependent, which would not be helpful.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-5025) ExternalSortBatch provides weak control over spill file size

2016-11-10 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved DRILL-5025.

Resolution: Invalid
  Assignee: Paul Rogers

> ExternalSortBatch provides weak control over spill file size
> 
>
> Key: DRILL-5025
> URL: https://issues.apache.org/jira/browse/DRILL-5025
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5025) ExternalSortBatch provides weak control over spill file size

2016-11-10 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655275#comment-15655275
 ] 

Paul Rogers commented on DRILL-5025:


Cancelling for now; the spill file size is determined by the spill/re-spill strategy and is best discussed in that context.

> ExternalSortBatch provides weak control over spill file size
> 
>
> Key: DRILL-5025
> URL: https://issues.apache.org/jira/browse/DRILL-5025
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-11-10 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal reopened DRILL-4373:


Re-opening this JIRA as the change breaks existing behavior, as described in 
https://issues.apache.org/jira/browse/DRILL-5034.

> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a hive table on top of the parquet file and use "timestamp" as the column type, drill fails to read the hive table through the hive storage plugin.
> Implementation:
> Added an int96-to-timestamp converter for both parquet readers, controlled by the system/session option "store.parquet.int96_as_timestamp".
> The option is false by default so that old query scripts that use the "convert_from TIMESTAMP_IMPALA" function keep working.
> When the option is true, using that function is unnecessary and can cause the query to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Created] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC

2016-11-10 Thread Krystal (JIRA)
Krystal created DRILL-5034:
--

 Summary: Select timestamp from hive generated parquet always 
return in UTC
 Key: DRILL-5034
 URL: https://issues.apache.org/jira/browse/DRILL-5034
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.9.0
Reporter: Krystal


commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904

Reading timestamp data against a hive parquet table from drill automatically 
converts the timestamp data to UTC. 

SELECT TIMEOFDAY() FROM (VALUES(1));
+--+
|EXPR$0|
+--+
| 2016-11-10 12:33:26.547 America/Los_Angeles  |
+--+

data schema:
message hive_schema {
  optional int32 voter_id;
  optional binary name (UTF8);
  optional int32 age;
  optional binary registration (UTF8);
  optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
  optional int32 voterzone;
  optional int96 create_timestamp;
  optional int32 create_date (DATE);
}

Using drill-1.8, the returned timestamps match the table data:

select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
`/user/hive/warehouse/voter_hive_parquet` limit 5;
++
| EXPR$0 |
++
| 2016-10-23 20:03:58.0  |
| null   |
| 2016-09-09 12:01:18.0  |
| 2017-03-06 20:35:55.0  |
| 2017-01-20 22:32:43.0  |
++
5 rows selected (1.032 seconds)

If the user timezone is changed to UTC, then the timestamp data is returned in 
UTC time.

Using drill-1.9, the returned timestamps are converted to UTC even though the 
user timezone is PST.

select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
++
| EXPR$0 |
++
| 2016-10-24 03:03:58.0  |
| null   |
| 2016-09-09 19:01:18.0  |
| 2017-03-07 04:35:55.0  |
| 2017-01-21 06:32:43.0  |
++

alter session set `store.parquet.reader.int96_as_timestamp`=true;
+---+---+
|  ok   |  summary  |
+---+---+
| true  | store.parquet.reader.int96_as_timestamp updated.  |
+---+---+

select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` 
limit 5;
++
|create_timestamp|
++
| 2016-10-24 03:03:58.0  |
| null   |
| 2016-09-09 19:01:18.0  |
| 2017-03-07 04:35:55.0  |
| 2017-01-21 06:32:43.0  |
++


 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-5033) Query on JSON that has null as value for each key

2016-11-10 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-5033:
--
Description: 
Drill 1.9.0 git commit ID : 83513daf

Drill returns the same result with or without `store.json.all_text_mode`=true.

Note that each key in the JSON has null as its value.
[root@cent01 null_eq_joins]# cat right_all_nulls.json
{
 "intKey" : null,
 "bgintKey": null,
 "strKey": null,
 "boolKey": null,
 "fltKey": null,
 "dblKey": null,
 "timKey": null,
 "dtKey": null,
 "tmstmpKey": null,
 "intrvldyKey": null,
 "intrvlyrKey": null
}
[root@cent01 null_eq_joins]#

Querying the above JSON file returns a single null as the query result.
 - We should see each of the keys in the JSON as a column in the query result.
 - And in each column the value should be null.
Current behavior does not look right.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `right_all_nulls.json`;
+---+
|   *   |
+---+
| null  |
+---+
1 row selected (0.313 seconds)
{noformat}

Adding comment from [~julianhyde] 

IMHO it is similar but not the same as DRILL-1256. Worth logging an issue and 
let [~jnadeau] (or someone) put on the record what should be the behavior of an 
empty record (empty JSON map) when it is top-level (as in this case) or in a 
collection.


  was:
Drill 1.9.0 git commit ID : 83513daf

Drill returns same result with or without `store.json.all_text_mode`=true

Note that each key in the JSON has null as its value.
[root@cent01 null_eq_joins]# cat right_all_nulls.json
{
 "intKey" : null,
 "bgintKey": null,
 "strKey": null,
 "boolKey": null,
 "fltKey": null,
 "dblKey": null,
 "timKey": null,
 "dtKey": null,
 "tmstmpKey": null,
 "intrvldyKey": null,
 "intrvlyrKey": null
}
[root@cent01 null_eq_joins]#

Querying the above JSON file results in null as query result.
 -  We should see each of the keys in the JSON as a column in query result.
 -  And in each column the value should be a null value. 
Current behavior does not look right.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `right_all_nulls.json`;
+---+
|   *   |
+---+
| null  |
+---+
1 row selected (0.313 seconds)
{noformat}

Adding comment from [~julianhyde] 
{noformat}
IMHO it is similar but not the same as DRILL-1256. Worth logging an issue and 
let [~jnadeau] (or someone) put on the record what should be the behavior of an 
empty record (empty JSON map) when it is top-level (as in this case) or in a 
collection.
{noformat}


> Query on JSON that has null as value for each key
> -
>
> Key: DRILL-5033
> URL: https://issues.apache.org/jira/browse/DRILL-5033
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-5033) Query on JSON that has null as value for each key

2016-11-10 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5033:
-

 Summary: Query on JSON that has null as value for each key
 Key: DRILL-5033
 URL: https://issues.apache.org/jira/browse/DRILL-5033
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Affects Versions: 1.9.0
Reporter: Khurram Faraaz


Drill 1.9.0 git commit ID : 83513daf

Drill returns the same result with or without `store.json.all_text_mode`=true.

Note that each key in the JSON has null as its value.
[root@cent01 null_eq_joins]# cat right_all_nulls.json
{
 "intKey" : null,
 "bgintKey": null,
 "strKey": null,
 "boolKey": null,
 "fltKey": null,
 "dblKey": null,
 "timKey": null,
 "dtKey": null,
 "tmstmpKey": null,
 "intrvldyKey": null,
 "intrvlyrKey": null
}
[root@cent01 null_eq_joins]#

Querying the above JSON file returns a single null as the query result.
 - We should see each of the keys in the JSON as a column in the query result.
 - And in each column the value should be null.
Current behavior does not look right.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `right_all_nulls.json`;
+---+
|   *   |
+---+
| null  |
+---+
1 row selected (0.313 seconds)
{noformat}
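
For contrast, a sketch of the output argued for above (hypothetical; the middle columns are elided):

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `right_all_nulls.json`;
+---------+-----------+---------+-------+--------------+--------------+
| intKey  | bgintKey  | strKey  |  ...  | intrvldyKey  | intrvlyrKey  |
+---------+-----------+---------+-------+--------------+--------------+
| null    | null      | null    |  ...  | null         | null         |
+---------+-----------+---------+-------+--------------+--------------+
{noformat}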

Adding comment from [~julianhyde] 
{noformat}
IMHO it is similar but not the same as DRILL-1256. Worth logging an issue and 
let [~jnadeau] (or someone) put on the record what should be the behavior of an 
empty record (empty JSON map) when it is top-level (as in this case) or in a 
collection.
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4935) Allow drillbits to advertise a configurable host address to Zookeeper

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654623#comment-15654623
 ] 

ASF GitHub Bot commented on DRILL-4935:
---

Github user harrison-svds commented on the issue:

https://github.com/apache/drill/pull/647
  
@paul-rogers I added a comment detailing the basic solution.  I don't know 
that I have the permissions to assign the story to myself.


> Allow drillbits to advertise a configurable host address to Zookeeper
> -
>
> Key: DRILL-4935
> URL: https://issues.apache.org/jira/browse/DRILL-4935
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - RPC
>Affects Versions: 1.8.0
>Reporter: Harrison Mebane
>Priority: Minor
> Fix For: Future
>
>
> There are certain situations, such as running Drill in distributed Docker 
> containers, in which it is desirable to advertise a different hostname to 
> Zookeeper than would be output by INetAddress.getLocalHost().  I propose 
> adding a configuration variable 'drill.exec.rpc.bit.advertised.host' and 
> passing this address to Zookeeper when the configuration variable is 
> populated, otherwise falling back to the present behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4935) Allow drillbits to advertise a configurable host address to Zookeeper

2016-11-10 Thread Harrison Mebane (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654614#comment-15654614
 ] 

Harrison Mebane commented on DRILL-4935:


The solution implemented was to add a line to drill-env.sh specifying an 
environment variable DRILL_HOST_NAME.  In a distributed setting, this variable 
can be set to a command that returns the individual machine's address (e.g. a 
simple command like `hostname` or a REST call in AWS).  If this variable is not 
set, the code falls back to the original method of calling 
INetAddress.getLocalHost().
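
For illustration, a hedged sketch of such a drill-env.sh entry; the AWS metadata URL is one common way to fetch a per-instance hostname and is not part of Drill:

{noformat}
# conf/drill-env.sh
export DRILL_HOST_NAME=`hostname -f`
# or, on AWS EC2:
# export DRILL_HOST_NAME=`curl -s http://169.254.169.254/latest/meta-data/local-hostname`
{noformat}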

> Allow drillbits to advertise a configurable host address to Zookeeper
> -
>
> Key: DRILL-4935
> URL: https://issues.apache.org/jira/browse/DRILL-4935
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - RPC
>Affects Versions: 1.8.0
>Reporter: Harrison Mebane
>Priority: Minor
> Fix For: Future
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5027) ExternalSortBatch is inefficient, leaks files for large queries

2016-11-10 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654610#comment-15654610
 ] 

Paul Rogers commented on DRILL-5027:


Another issue with the existing ESB:

The code to re-merge existing data runs before the code to spill the current 
in-memory generation. We spill the in-memory generation because memory is 
tight, but re-spilling the on-disk generation itself requires memory, 
potentially causing a spike above the memory budget.

Better to spill the in-memory generation first, then re-spill the on-disk 
generation after clearing room in memory.

> ExternalSortBatch is inefficient, leaks files for large queries
> ---
>
> Key: DRILL-5027
> URL: https://issues.apache.org/jira/browse/DRILL-5027
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> The {{ExternalSortBatch}} (ESB) operator sorts data while spilling to disk as 
> needed to operate within a memory budget.
> The sort happens in two phases:
> 1. Gather the incoming batches from the upstream operator, sort them, and 
> spill to disk as needed.
> 2. Merge the "runs" spilled in step 1.
> In most cases, the second step should run within the memory available for the 
> first step (which is why severity is only Minor). Large queries need multiple 
> sort "phases" in which previously spilled runs are read back into memory, 
> merged, and again spilled. It is here that ESB has an issue. This process 
> correctly limits the amount of memory used, but at the cost of rewriting the 
> same data over and over.
> Consider current Drill behavior:
> {code}
> a b c d (re-spill)
> abcd e f g h (re-spill)
> abcefgh i j k
> {code}
> That is, batches a, b, c and d are re-spilled to create the combined abcd, 
> and so on. The same data is rewritten over and over.
> Note that spilled batches take no (direct) memory in Drill, and require only 
> a small on-heap memento, so maintaining data on disk is "free". Better, then, 
> would be to re-spill only newer data:
> {code}
> a b c d (re-spill)
> abcd | e f g h (re-spill)
> abcd efgh | i j k
> {code}
> Where the bar indicates a moving point at which we've already merged and do 
> not need to do so again. If each letter is one unit of disk I/O, the original 
> method uses 35 units while the revised method uses 27 units.
> At some point the process may have to repeat by merging the second-generation 
> spill files and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5032) Drill query on hive parquet table failed with OutOfMemoryError: Java heap space

2016-11-10 Thread Serhii Harnyk (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654543#comment-15654543
 ] 

Serhii Harnyk commented on DRILL-5032:
--

As [~jni] mentioned, today the physical plan looks like:

listOfColumns : [col1, col2, ...]                      -- table level
partitions : [
  partition1 : { listOfColumns : [col1, col2, ...] },  -- partition level
  partition2 : { listOfColumns : [col1, col2, ...] },
  ...
  partition_n : { listOfColumns : [col1, col2, ...] }
]

The listOfColumns is repeated in every partition, which seems unnecessary. We 
should get rid of those repeated column lists in each partition, as long as 
they are the same as the table-level listOfColumns.

So the initial idea is to remove the repeated listOfColumns from the 
HivePartition physical plan serialization.
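
Under that idea, the serialized plan would carry the column list once, at the table level only; a sketch of the intended shape (not actual serializer output):

listOfColumns : [col1, col2, ...]   -- table level only
partitions : [ partition1 : { }, partition2 : { }, ... partition_n : { } ]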

> Drill query on hive parquet table failed with OutOfMemoryError: Java heap 
> space
> ---
>
> Key: DRILL-5032
> URL: https://issues.apache.org/jira/browse/DRILL-5032
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
>
> 

[jira] [Created] (DRILL-5032) Drill query on hive parquet table failed with OutOfMemoryError: Java heap space

2016-11-10 Thread Serhii Harnyk (JIRA)
Serhii Harnyk created DRILL-5032:


 Summary: Drill query on hive parquet table failed with 
OutOfMemoryError: Java heap space
 Key: DRILL-5032
 URL: https://issues.apache.org/jira/browse/DRILL-5032
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Hive
Affects Versions: 1.8.0
Reporter: Serhii Harnyk
Assignee: Serhii Harnyk


The following query on a hive parquet table failed with OOM (Java heap space):
{code}
select distinct(businessdate) from vmdr_trades where trade_date='2016-04-12'
2016-08-31 08:02:03,597 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
283938c3-fde8-0fc6-37e1-9a568c7f5913: select distinct(businessdate) from 
vmdr_trades where trade_date='2016-04-12'
2016-08-31 08:05:58,502 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
class: 
org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
filter tree: 1 ms
2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
partition pruning.Total pruning elapsed time: 3 ms
2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
class: 
org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
filter tree: 0 ms
2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
partition pruning.Total pruning elapsed time: 0 ms
2016-08-31 08:05:58,664 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
class: 
org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$1
2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
filter tree: 0 ms
2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
partition pruning.Total pruning elapsed time: 0 ms
2016-08-31 08:09:42,355 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] ERROR 
o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. 
Information message: Unable to handle out of memory condition in Foreman.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_74]
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) 
~[na:1.8.0_74]
at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
 ~[na:1.8.0_74]
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) 
~[na:1.8.0_74]
at java.lang.StringBuilder.append(StringBuilder.java:136) ~[na:1.8.0_74]
at java.lang.StringBuilder.append(StringBuilder.java:76) ~[na:1.8.0_74]
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:457) 
~[na:1.8.0_74]
at java.lang.StringBuilder.append(StringBuilder.java:166) ~[na:1.8.0_74]
at java.lang.StringBuilder.append(StringBuilder.java:76) ~[na:1.8.0_74]
at 
com.google.protobuf.TextFormat$TextGenerator.write(TextFormat.java:538) 
~[protobuf-java-2.5.0.jar:na]
at 
com.google.protobuf.TextFormat$TextGenerator.print(TextFormat.java:526) 
~[protobuf-java-2.5.0.jar:na]
at 
com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:389) 
~[protobuf-java-2.5.0.jar:na]
at 
com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327) 
~[protobuf-java-2.5.0.jar:na]
at 
com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286) 
~[protobuf-java-2.5.0.jar:na]
at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273) 
~[protobuf-java-2.5.0.jar:na]
at 
com.google.protobuf.TextFormat$Printer.access$400(TextFormat.java:248) 
~[protobuf-java-2.5.0.jar:na]
at com.google.protobuf.TextFormat.print(TextFormat.java:71) 
~[protobuf-java-2.5.0.jar:na]
at com.google.protobuf.TextFormat.printToString(TextFormat.java:118) 
~[protobuf-java-2.5.0.jar:na]
at 
com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:106) 
~[protobuf-java-2.5.0.jar:na]
at 

[jira] [Commented] (DRILL-5027) ExternalSortBatch is inefficient, leaks files for large queries

2016-11-10 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654495#comment-15654495
 ] 

Paul Rogers commented on DRILL-5027:


Proposed solution, which seems reachable by rearranging the existing pieces of 
ESB:

Assume three "generations":

* An in-memory generation of sorted batches received from upstream, not yet 
spilled.
* A "new" spill generation: those files created directly by spilling the 
in-memory generation.
* An "old" spill generation: those files created by re-spilling new generation 
files.

The spill would work as follows:

* For each upstream batch, sort it and add it to the in-memory generation.
* When the in-memory generation reaches the spill threshold, merge the 
in-memory batches and write to a spill file. Add the spill file to the new 
spill generation. At this point, ESB memory is empty.
* If the new spill generation has reached the spill threshold, merge the 
spilled batches and write to another spill file. Delete the old spill files. 
Add the newly created file to the old spill generation. The new spill 
generation is now empty (as is memory.)
* If the old spill generation has reached the spill threshold, transfer it to 
the new generation and spill as above. The old generation now has a single 
file. (The other two generations are empty.)

The spill threshold is defined as:

* Start with the memory budget for the ESB.
* Define a target spill-batch size. (The minimum of 32K rows or some defined 
size in bytes.)
* Define the maximum number of in-memory batches as memory budget / spill-batch 
size.
* Set the spill threshold to some number less than the maximum number of 
in-memory batches.

When gathering incoming batches in memory, or reading batches from disk, the 
above ensures that total memory used is less than the budget.
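
A compact sketch of that bookkeeping; the class, types, and merge helpers below are hypothetical stand-ins, not ESB's actual code:

{code}
import java.util.ArrayList;
import java.util.List;

class GenerationalSpiller {
  interface Batch {}      // stand-in for one sorted in-memory batch
  interface SpillFile {}  // stand-in for one spill file on disk

  private final List<Batch> inMemory = new ArrayList<>();
  private final List<SpillFile> newGen = new ArrayList<>();
  private final List<SpillFile> oldGen = new ArrayList<>();
  private final int spillThreshold;  // ~ memory budget / spill-batch size

  GenerationalSpiller(int spillThreshold) { this.spillThreshold = spillThreshold; }

  void addSortedBatch(Batch b) {
    inMemory.add(b);
    if (inMemory.size() >= spillThreshold) {
      // Merge the in-memory batches into one new-generation spill file;
      // ESB memory is empty afterwards.
      newGen.add(mergeAndSpillMemory(inMemory));
      inMemory.clear();
    }
    if (newGen.size() >= spillThreshold) {
      // Re-spill only the new generation; its input files are deleted
      // once the merged file is written.
      oldGen.add(mergeAndRespill(newGen));
      newGen.clear();
    }
    if (oldGen.size() >= spillThreshold) {
      // Treat the old generation as a new generation and re-spill it,
      // leaving a single old-generation file.
      SpillFile merged = mergeAndRespill(oldGen);
      oldGen.clear();
      oldGen.add(merged);
    }
  }

  private SpillFile mergeAndSpillMemory(List<Batch> batches) {
    throw new UnsupportedOperationException("merge + write elided in this sketch");
  }

  private SpillFile mergeAndRespill(List<SpillFile> files) {
    throw new UnsupportedOperationException("merge + rewrite + delete elided in this sketch");
  }
}
{code}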

Benefits of this approach:

* Minimizes read/writes of existing spilled data (overcomes the re-spill issue 
above.)
* Ensures that disk files are deleted as soon as possible.
* Ensures that ESB operates within a defined memory budget.
* Handles data of any size; the algorithm above simply continues to combine 
generations as needed. Trades off performance (more disk I/O) for a fixed 
memory budget.
* Limits disk use to no more than twice the amount of spilled data (to account 
for merging the old generation).

> ExternalSortBatch is inefficient, leaks files for large queries
> ---
>
> Key: DRILL-5027
> URL: https://issues.apache.org/jira/browse/DRILL-5027
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5030) Drill SSL Docs have Bad Link to Oracle Website

2016-11-10 Thread Keys Botzum (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654458#comment-15654458
 ] 

Keys Botzum commented on DRILL-5030:


By the way, I believe this is the correct URL: 
http://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/JSSERefGuide.html#Customization
 (notice that there is a minor typo in the doc - the '/').

> Drill SSL Docs have Bad Link to Oracle Website
> --
>
> Key: DRILL-5030
> URL: https://issues.apache.org/jira/browse/DRILL-5030
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.8.0
>Reporter: John Omernik
>
> When going to set up custom SSL certs on Drill, I found that the link to the 
> Oracle website was broken on this page: 
> https://drill.apache.org/docs/configuring-web-console-and-rest-api-security/
> at the sentence:
> As cluster administrator, you can set the following SSL configuration 
> parameters in the conf/drill-override.conf file, as described in the Java 
> product documentation:
> Obviously, fixing the link is one option; another would be to provide 
> instructions for SSL certs directly in the Drill docs so we are not reliant 
> on Oracle's website. 
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5030) Drill SSL Docs have Bad Link to Oracle Website

2016-11-10 Thread Keys Botzum (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654422#comment-15654422
 ] 

Keys Botzum commented on DRILL-5030:


The Oracle page referenced is likely just defining the three parameters - 
basically their meaning. Obviously the link should be fixed, but that's 
separate from asking for a detailed example of creating certs.

My concern about an example is that this is much harder than it appears, as 
there are many, many different ways to get a certificate. What's really key is 
to explain clearly what Drill expects to be in the JKS file (the docs are not 
clear) and then perhaps provide a simple example using a well-known CA to get a 
sample cert, but realistically that example will only be accurate for a small 
set of users. That's why it is so important to explain clearly what Drill 
expects in the JKS file.
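
For instance, a quick way to see whether a JKS file contains what a TLS server 
needs (at least one private-key entry with its certificate chain) is a check 
like the following; the path and password are taken from the procedure below 
and are assumptions, not Drill defaults:

{code}
import java.io.FileInputStream;
import java.security.KeyStore;
import java.util.Collections;

public class CheckKeystore {
  public static void main(String[] args) throws Exception {
    KeyStore ks = KeyStore.getInstance("JKS");
    try (FileInputStream in = new FileInputStream("/opt/mapr/conf/ssl_keystore")) {
      ks.load(in, "mapr123".toCharArray());
    }
    for (String alias : Collections.list(ks.aliases())) {
      // A server keystore needs at least one key entry (private key plus
      // certificate chain); trusted-cert-only entries are not sufficient.
      int chainLength = ks.getCertificateChain(alias) == null
          ? 0 : ks.getCertificateChain(alias).length;
      System.out.println(alias + ": keyEntry=" + ks.isKeyEntry(alias)
          + ", chainLength=" + chainLength);
    }
  }
}
{code}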

Also, since you asked, here is an internal document from MapR that describes 
how to replace the MapR self-signed certificate. A similar approach should work 
with Drill:

 was kind enough to share the steps he went through to replace the default 
ssl_keystore and ssl_truststore with CA-issued certificates. There is no real 
improvement in security from doing this, but many customers prefer CA-issued 
certificates as they do improve the user experience.


ASSUMPTIONS
---
1. 1-node cluster on mapr50.hadoopone.com is running in secure mode already.  
configure.sh was run setting mapr50.hadoopone.com (FQDN) as CLDB, ZK, HS, and 
RM.

2. cert generated from godaddy is a wildcard cert for *.hadoopone.com domain 
and contains a7d2eaede47dbc19.crt (wildcard host cert), gd_bundle-g2-g1.crt 
(cert chain that leads up to the godaddy signer), and hadoopone.key (RSA key).  
All files stored in home directory of root user.

3. an entry in hosts file of laptop was created for mapr50.hadoopone.com (so 
that MCS doesn't prompt "continue at your own risk?"), or this host resolves in 
DNS.

PROCEDURE (all commands run as root user)
-
1. stop the cluster

service mapr-warden stop
[Note: no need to stop ZK as it doesn’t use certificates]

2. check contents of certificate issued by godaddy

keytool -printcert -file ~/a7d2eaede47dbc19.crt
openssl x509 -noout -text -in ~/a7d2eaede47dbc19.crt

3. check cert chain issued by godaddy

keytool -printcert -file ~/gd_bundle-g2-g1.crt 

4. check RSA key issued by godaddy

openssl rsa -noout -text -in hadoopone.key 

5. create PKCS12 certificate to import 

openssl pkcs12 -export -in ~/a7d2eaede47dbc19.crt -inkey ~/hadoopone.key -out 
~/hadoopone_com.pk12 -name 'mapr50.hadoopone.com' -CAfile ~/gd_bundle-g2-g1.crt 
-chain -passout pass:mapr123

6. check PKCS12 certificate you just generated

keytool -list -keystore ~/hadoopone_com.pk12 -storepass mapr123 -storetype 
PKCS12 

7. import PKCS12 certificate in keystore

keytool --importkeystore -noprompt -deststorepass mapr123 -destkeystore 
~/ssl_keystore -srckeystore hadoopone_com.pk12 -srcstoretype PKCS12 
-srcstorepass mapr123

8. list certs in the keystore

keytool -list -v -keystore ~/ssl_keystore -storepass mapr123 

9. import certificate chain into truststore
 
keytool --importcert -storepass mapr123 -keystore ssl_truststore -file 
gd_bundle-g2-g1.crt -alias godaddy

10. list certs in the trust store

keytool -list -keystore ~/ssl_truststore -storepass mapr123 

11. copy the modified keystore and truststore back to /opt/mapr/conf

cp ~/ssl_keystore /opt/mapr/conf
cp ~/ssl_truststore /opt/mapr/conf

12. restart the cluster

service mapr-zookeeper start
service mapr-warden start

13. test

> Drill SSL Docs have Bad Link to Oracle Website
> --
>
> Key: DRILL-5030
> URL: https://issues.apache.org/jira/browse/DRILL-5030
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.8.0
>Reporter: John Omernik
>
> When going to set up custom SSL certs on Drill, I found that the link to the 
> Oracle website was broken on this page: 
> https://drill.apache.org/docs/configuring-web-console-and-rest-api-security/
> at the sentence:
> As cluster administrator, you can set the following SSL configuration 
> parameters in the conf/drill-override.conf file, as described in the Java 
> product documentation:
> Obviously, fixing the link is one option; another would be to provide 
> instructions for SSL certs directly in the Drill docs so we are not reliant 
> on Oracle's website. 
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-5031) Documentation for HTTPD Parser

2016-11-10 Thread Charles Givre (JIRA)
Charles Givre created DRILL-5031:


 Summary: Documentation for HTTPD Parser
 Key: DRILL-5031
 URL: https://issues.apache.org/jira/browse/DRILL-5031
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.9.0
Reporter: Charles Givre
Priority: Minor
 Fix For: Future


https://gist.github.com/cgivre/47f07a06d44df2af625fc6848407ae7c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654351#comment-15654351
 ] 

ASF GitHub Bot commented on DRILL-4980:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/644
  
@parthchandra Could you please review this PR?


> Upgrading of the approach of parquet date correctness status detection
> --
>
> Key: DRILL-4980
> URL: https://issues.apache.org/jira/browse/DRILL-4980
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.9.0
>
>
> This jira is an addition for the 
> [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
> The date correctness label for the new generated parquet files should be 
> upgraded. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654323#comment-15654323
 ] 

ASF GitHub Bot commented on DRILL-4980:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/644#discussion_r87416017
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -185,7 +185,8 @@ private Metadata(FileSystem fs, ParquetFormatConfig 
formatConfig) {
 childFiles.add(file);
   }
 }
-ParquetTableMetadata_v3 parquetTableMetadata = new 
ParquetTableMetadata_v3(true);
+ParquetTableMetadata_v3 parquetTableMetadata = new 
ParquetTableMetadata_v3(DrillVersionInfo.getVersion(),
+ParquetWriter.WRITER_VERSION);
--- End diff --

`is.date.correct` or `parquet-writer.version` were needed in the metadata 
cache file for quick detection of date value correctness; otherwise we need to 
check the `files.rowGroups.columns.mxValue` values from this cache file. 
But after thinking about it a little, I realized that thanks to the newly 
added `ParquetTableMetadata_v3` we can check the following:
if the version of the parquet metadata cache file is 3, the date values are 
definitely correct. Otherwise (when the parquet metadata cache file was 
generated earlier) we need to check the date values from this file. 
So `writerVersion` is now redundant in `ParquetTableMetadataBase`, and I 
deleted it. Please confirm whether this makes sense.
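
A minimal sketch of that check (a hypothetical method; the real code obtains 
the version from the parsed metadata cache object):

{code}
// Illustrative only: version 3 (or later) metadata cache files are written
// only by Drill builds that include the date fix, so their date values are
// definitely correct. Older cache files require inspecting the stored
// min/max date values (files.rowGroups.columns.mxValue) instead.
boolean datesDefinitelyCorrect(int metadataCacheVersion) {
  return metadataCacheVersion >= 3;
}
{code}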


> Upgrading of the approach of parquet date correctness status detection
> --
>
> Key: DRILL-4980
> URL: https://issues.apache.org/jira/browse/DRILL-4980
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.9.0
>
>
> This jira is an addition for the 
> [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
> The date correctness label for the new generated parquet files should be 
> upgraded. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654274#comment-15654274
 ] 

ASF GitHub Bot commented on DRILL-4980:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/644#discussion_r87411842
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ---
@@ -59,19 +59,24 @@
*/
   public static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588;
   /**
-   * All old parquet files (which haven't "is.date.correct=true" property 
in metadata) have
-   * a corrupt date shift: {@value} days or 2 * {@value 
#JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH}
+   * All old parquet files (which haven't "is.date.correct=true" or 
"parquet-writer.version" properties
+   * in metadata) have a corrupt date shift: {@value} days or 2 * {@value 
#JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH}
*/
   public static final long CORRECT_CORRUPT_DATE_SHIFT = 2 * 
JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH;
-  // The year 5000 (or 1106685 day from Unix epoch) is chosen as the 
threshold for auto-detecting date corruption.
-  // This balances two possible cases of bad auto-correction. External 
tools writing dates in the future will not
-  // be shifted unless they are past this threshold (and we cannot 
identify them as external files based on the metadata).
-  // On the other hand, historical dates written with Drill wouldn't risk 
being incorrectly shifted unless they were
-  // something like 10,000 years in the past.
   private static final Chronology UTC = 
org.joda.time.chrono.ISOChronology.getInstanceUTC();
+  /**
+   * The year 5000 (or 1106685 day from Unix epoch) is chosen as the 
threshold for auto-detecting date corruption.
+   * This balances two possible cases of bad auto-correction. External 
tools writing dates in the future will not
+   * be shifted unless they are past this threshold (and we cannot 
identify them as external files based on the metadata).
+   * On the other hand, historical dates written with Drill wouldn't risk 
being incorrectly shifted unless they were
+   * something like 10,000 years in the past.
+   */
   public static final int DATE_CORRUPTION_THRESHOLD =
   (int) (UTC.getDateTimeMillis(5000, 1, 1, 0) / 
DateTimeConstants.MILLIS_PER_DAY);
-
+  /**
+   * The version of drill parquet writer with date values corruption fix
--- End diff --

Done


> Upgrading of the approach of parquet date correctness status detection
> --
>
> Key: DRILL-4980
> URL: https://issues.apache.org/jira/browse/DRILL-4980
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.9.0
>
>
> This jira is an addition for the 
> [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
> The date correctness label for the new generated parquet files should be 
> upgraded. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4935) Allow drillbits to advertise a configurable host address to Zookeeper

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654182#comment-15654182
 ] 

ASF GitHub Bot commented on DRILL-4935:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/647
  
Please update the JIRA ticket to explain the solution. What does an admin 
need to know to use the feature? How can the admin verify that it works? This 
will allow the documentation team to add the needed information for folks to 
use this feature.

Also, assign the JIRA ticket to yourself, since you're working on it.


> Allow drillbits to advertise a configurable host address to Zookeeper
> -
>
> Key: DRILL-4935
> URL: https://issues.apache.org/jira/browse/DRILL-4935
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - RPC
>Affects Versions: 1.8.0
>Reporter: Harrison Mebane
>Priority: Minor
> Fix For: Future
>
>
> There are certain situations, such as running Drill in distributed Docker 
> containers, in which it is desirable to advertise a different hostname to 
> Zookeeper than would be output by InetAddress.getLocalHost().  I propose 
> adding a configuration variable 'drill.exec.rpc.bit.advertised.host' and 
> passing this address to Zookeeper when the configuration variable is 
> populated, otherwise falling back to the present behavior.
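
As a sketch of the proposed fallback (illustrative only; the property name 
comes from the description above, and the Typesafe Config calls stand in for 
however Drill actually reads its configuration):

{code}
import java.net.InetAddress;
import com.typesafe.config.Config;

public class AdvertisedHost {
  static final String ADVERTISED_HOST = "drill.exec.rpc.bit.advertised.host";

  /** Returns the configured address if set, else the present default. */
  static String hostToAdvertise(Config config) throws Exception {
    if (config.hasPath(ADVERTISED_HOST)) {
      return config.getString(ADVERTISED_HOST);
    }
    // Present behavior: advertise the local host name.
    return InetAddress.getLocalHost().getHostName();
  }
}
{code}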



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-5028) Opening profiles page from web ui gets very slow when a lot of history files have been stored in HDFS or Local FS.

2016-11-10 Thread Hongze Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongze Zhang updated DRILL-5028:

Description: 
We have a Drill cluster with 20+ nodes and we store all history profiles in 
HDFS. Without periodic cleanup of HDFS, the profiles page gets slower as more 
queries are served.

The code in LocalPersistentStore.java uses fs.list(false, basePath) to fetch 
the latest 100 history profiles by default. I suspect this operation blocks the 
page load (millions of small files can accumulate in the basePath); maybe we 
can try some other way to reach the same goal.

  was:
We have a Drill cluster with 20+ Nodes and we store all history profiles in 
hdfs. Without doing periodically cleans for hdfs, the profiles page gets slower 
while serving more queries.

Code from LocalPersistentStore.java uses fs.list(false, basePath) for fetching 
the latest 100 history profiles by default, I guess this operation blocks that 
page (Millions small files can be stored in the basePath), maybe we can try 
some other ways to reach the same goal.


> Opening profiles page from web ui gets very slow when a lot of history files 
> have been stored in HDFS or Local FS.
> --
>
> Key: DRILL-5028
> URL: https://issues.apache.org/jira/browse/DRILL-5028
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Hongze Zhang
> Fix For: Future
>
>
> We have a Drill cluster with 20+ nodes and we store all history profiles in 
> HDFS. Without periodic cleanup of HDFS, the profiles page gets slower as more 
> queries are served.
> The code in LocalPersistentStore.java uses fs.list(false, basePath) to fetch 
> the latest 100 history profiles by default. I suspect this operation blocks 
> the page load (millions of small files can accumulate in the basePath); maybe 
> we can try some other way to reach the same goal.
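
For reference, a sketch of why this gets slow (plain Hadoop FileSystem API, not 
the actual LocalPersistentStore code): listing every profile file just to keep 
the newest 100 is O(N) in the total number of files:

{code}
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ProfileListing {
  /** Lists ALL profile files, then keeps the newest 100. */
  static FileStatus[] latestProfiles(FileSystem fs, Path basePath) throws Exception {
    FileStatus[] all = fs.listStatus(basePath);   // one entry per file: O(N)
    Arrays.sort(all,
        Comparator.comparingLong(FileStatus::getModificationTime).reversed());
    return Arrays.copyOfRange(all, 0, Math.min(100, all.length));
  }
}
{code}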



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-5030) Drill SSL Docs have Bad Link to Oracle Website

2016-11-10 Thread John Omernik (JIRA)
John Omernik created DRILL-5030:
---

 Summary: Drill SSL Docs have Bad Link to Oracle Website
 Key: DRILL-5030
 URL: https://issues.apache.org/jira/browse/DRILL-5030
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.8.0
Reporter: John Omernik


When going to set up custom SSL certs on Drill, I found that the link to the 
Oracle website was broken on this page: 
https://drill.apache.org/docs/configuring-web-console-and-rest-api-security/
at the sentence:
As cluster administrator, you can set the following SSL configuration 
parameters in the conf/drill-override.conf file, as described in the Java 
product documentation:

Obviously, fixing the link is one option; another would be to provide 
instructions for SSL certs directly in the Drill docs so we are not reliant on 
Oracle's website. 

Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4956) Temporary tables support

2016-11-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4956:

Description: Link to design doc - 
https://docs.google.com/document/d/1gSRo_w6q2WR5fPx7SsQ5IaVmJXJ6xCOJfYGyqpVOC-g/edit
  (was: Link to design doc - TBA)

> Temporary tables support
> 
>
> Key: DRILL-4956
> URL: https://issues.apache.org/jira/browse/DRILL-4956
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: Future
>
>
> Link to design doc - 
> https://docs.google.com/document/d/1gSRo_w6q2WR5fPx7SsQ5IaVmJXJ6xCOJfYGyqpVOC-g/edit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4956) Temporary tables support

2016-11-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4956:

Labels: doc-impacting  (was: )

> Temporary tables support
> 
>
> Key: DRILL-4956
> URL: https://issues.apache.org/jira/browse/DRILL-4956
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: Future
>
>
> Link to design doc - TBA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4956) Temporary tables support

2016-11-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4956:

Fix Version/s: (was: Future)

> Temporary tables support
> 
>
> Key: DRILL-4956
> URL: https://issues.apache.org/jira/browse/DRILL-4956
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: Future
>
>
> Link to design doc - TBA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4956) Temporary tables support

2016-11-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4956:

Fix Version/s: Future

> Temporary tables support
> 
>
> Key: DRILL-4956
> URL: https://issues.apache.org/jira/browse/DRILL-4956
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: Future
>
>
> Link to design doc - TBA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-5029) need better error - cast interval day to int or bigint

2016-11-10 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5029:
-

 Summary: need better error - cast interval day to int or bigint
 Key: DRILL-5029
 URL: https://issues.apache.org/jira/browse/DRILL-5029
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.8.0
Reporter: Khurram Faraaz
Priority: Minor


We need a better error message; today Drill returns an AssertionError

{noformat}
0: jdbc:drill:schema=dfs.tmp> values(cast('P162M24D' as INTERVAL DAY));
+-+
| EXPR$0  |
+-+
| P24D|
+-+
1 row selected (0.419 seconds)
{noformat}

A better error would be
ERROR:  cannot cast type interval to int

{noformat}
0: jdbc:drill:schema=dfs.tmp> values(cast(cast('P162M24D' as INTERVAL DAY) as 
INT));
Error: SYSTEM ERROR: AssertionError: Internal error: Conversion to relational 
algebra failed to preserve datatypes:
validated type:
RecordType(INTEGER NOT NULL EXPR$0) NOT NULL
converted type:
RecordType(BIGINT NOT NULL EXPR$0) NOT NULL
rel:
LogicalProject(EXPR$0=[/INT(Reinterpret(CAST('P162M24D'):INTERVAL DAY NOT 
NULL), 8640)])
  LogicalValues(tuples=[[{ 0 }]])



[Error Id: 662716fb-c2c3-4032-8d92-835f8b0ec7ae on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

A better message would be
ERROR:  cannot cast type interval to bigint
{noformat}
0: jdbc:drill:schema=dfs.tmp> values(cast(cast('P162M24D' as INTERVAL DAY) as 
BIGINT));
Error: SYSTEM ERROR: AssertionError: todo: implement syntax 
SPECIAL(Reinterpret(CAST('P162M24D'):INTERVAL DAY NOT NULL))


[Error Id: ef2c31cd-dee3-4f13-aca0-05c16185f789 on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653519#comment-15653519
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351696
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws 
RpcException {
 connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the 
drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in 
connection string
+   */
+  public void populateEndpointsList(ArrayList 
endpointList, String drillbits){
+
+// If no information about drillbits is provided then just return 
empty list.
+if(drillbits == null || drillbits.length() == 0){
+  return;
+}
+final String[] connectInfo = drillbits.split(",");
+
+/* For direct connection we can get URL string having drillbit 
property as below:
+ drillbit=: --- Use the IP and port specified as the 
Foreman IP and port
+ drillbit=--- Use the IP specified as the Foreman IP 
with default port in config file
+ drillbit=:,:... --- Randomly select the 
IP and port pair from the specified
+ list as the Foreman 
IP and port.
+
+   Fetch ip address and port information for each drillbit and 
populate the list
+*/
+for(String info : connectInfo){
+  info = info.trim();
+
+  if(info != null){
+// Split each info to get ip address and port value
+final String[] drillbitInfo = info.split(":");
+
+// Check for malformed ip:port string
+if(drillbitInfo == null || drillbitInfo.length == 0){
+  continue;
+}
+
+/* If port is present use that one else use the configured one
+   Assumptions: 1) IP Address provided in connection string is 
valid
+2) Port without IP address is never specified.
+*/
+final String port = (drillbitInfo.length == 2) ? drillbitInfo[1] : 
config.getString(ExecConstants.INITIAL_USER_PORT);
+final DrillbitEndpoint endpoint = DrillbitEndpoint.newBuilder()
+  .setAddress(drillbitInfo[0])
+  
.setUserPort(Integer.parseInt(port))
+  .build();
+endpointList.add(endpoint);
+  }
+}
+  }
+
   public synchronized void connect(String connect, Properties props) 
throws RpcException {
 if (connected) {
   return;
 }
 
 final DrillbitEndpoint endpoint;
+final ArrayList endpoints = new ArrayList<>();
--- End diff --

fixed


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one 
> drillbit to connect to, we always attempt to connect to only the first 
> drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline  -u 
>  "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f 
> whereAmI.q  | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +------------------+------------+---------------+------------+----------+
> |     hostname     | user_port  | control_port  | data_port  | current  |
> +------------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab   | 31010      | 31011         | 31012      | true     |
> +------------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"

[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653523#comment-15653523
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351913
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws 
RpcException {
 connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the 
drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in 
connection string
+   */
+  public void populateEndpointsList(ArrayList 
endpointList, String drillbits){
+
+// If no information about drillbits is provided then just return 
empty list.
--- End diff --

Fixed


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one 
> drillbit to connect to, we always attempt to connect to only the first 
> drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline  -u 
>  "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f 
> whereAmI.q  | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +------------------+------------+---------------+------------+----------+
> |     hostname     | user_port  | control_port  | data_port  | current  |
> +------------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab   | 31010      | 31011         | 31012      | true     |
> +------------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients when not wanting to overload the ZK 
> for fetching a list of existing Drillbits, but the behaviour doesn't match 
> the documentation. 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection
>  ]
> We need to randomly shuffle this list, and if an entry in the shuffled list 
> is unreachable, we need to try the next entry in the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653521#comment-15653521
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351905
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws 
RpcException {
 connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the 
drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in 
connection string
+   */
+  public void populateEndpointsList(ArrayList 
endpointList, String drillbits){
+
+// If no information about drillbits is provided then just return 
empty list.
+if(drillbits == null || drillbits.length() == 0){
+  return;
+}
+final String[] connectInfo = drillbits.split(",");
+
+/* For direct connection we can get URL string having drillbit 
property as below:
--- End diff --

Done


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one 
> drillbit to connect to, we always attempt to connect to only the first 
> drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline  -u 
>  "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f 
> whereAmI.q  | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +------------------+------------+---------------+------------+----------+
> |     hostname     | user_port  | control_port  | data_port  | current  |
> +------------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab   | 31010      | 31011         | 31012      | true     |
> +------------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients when not wanting to overload the ZK 
> for fetching a list of existing Drillbits, but the behaviour doesn't match 
> the documentation. 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection
>  ]
> We need to randomly shuffle this list, and if an entry in the shuffled list 
> is unreachable, we need to try the next entry in the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653522#comment-15653522
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351928
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws 
RpcException {
 connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the 
drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in 
connection string
--- End diff --

Just to clarify, I am not changing the variable name "drillbit" used in the 
connection string, so there won't be any doc impact. The variable "drillbits" 
is used in the internal function which parses the string. 

As discussed, the documentation is already there 
[here](http://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection)


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one 
> drillbit to connect to, we always attempt to connect to only the first 
> drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline  -u 
>  "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f 
> whereAmI.q  | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +------------------+------------+---------------+------------+----------+
> |     hostname     | user_port  | control_port  | data_port  | current  |
> +------------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab   | 31010      | 31011         | 31012      | true     |
> +------------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients when not wanting to overload the ZK 
> for fetching a list of existing Drillbits, but the behaviour doesn't match 
> the documentation. 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection
>  ]
> We need to randomly shuffle this list, and if an entry in the shuffled list 
> is unreachable, we need to try the next entry in the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653520#comment-15653520
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351935
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws 
RpcException {
 connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the 
drillbits
--- End diff --

Done


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one 
> drillbit to connect to, we always attempt to connect to only the first 
> drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline  -u 
>  "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f 
> whereAmI.q  | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +------------------+------------+---------------+------------+----------+
> |     hostname     | user_port  | control_port  | data_port  | current  |
> +------------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab   | 31010      | 31011         | 31012      | true     |
> +------------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients when not wanting to overload the ZK 
> for fetching a list of existing Drillbits, but the behaviour doesn't match 
> the documentation. 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection
>  ]
> We need to randomly shuffle this list, and if an entry in the shuffled list 
> is unreachable, we need to try the next entry in the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653510#comment-15653510
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351682
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws 
RpcException {
 connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the 
drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in 
connection string
+   */
+  public void populateEndpointsList(ArrayList 
endpointList, String drillbits){
+
+// If no information about drillbits is provided then just return 
empty list.
+if(drillbits == null || drillbits.length() == 0){
+  return;
+}
+final String[] connectInfo = drillbits.split(",");
+
+/* For direct connection we can get URL string having drillbit 
property as below:
+ drillbit=: --- Use the IP and port specified as the 
Foreman IP and port
+ drillbit=--- Use the IP specified as the Foreman IP 
with default port in config file
+ drillbit=:,:... --- Randomly select the 
IP and port pair from the specified
+ list as the Foreman 
IP and port.
+
+   Fetch ip address and port information for each drillbit and 
populate the list
+*/
+for(String info : connectInfo){
+  info = info.trim();
+
+  if(info != null){
+// Split each info to get ip address and port value
+final String[] drillbitInfo = info.split(":");
+
+// Check for malformed ip:port string
+if(drillbitInfo == null || drillbitInfo.length == 0){
+  continue;
+}
+
+/* If port is present use that one else use the configured one
+   Assumptions: 1) IP Address provided in connection string is 
valid
+2) Port without IP address is never specified.
+*/
+final String port = (drillbitInfo.length == 2) ? drillbitInfo[1] : 
config.getString(ExecConstants.INITIAL_USER_PORT);
+final DrillbitEndpoint endpoint = DrillbitEndpoint.newBuilder()
+  .setAddress(drillbitInfo[0])
+  
.setUserPort(Integer.parseInt(port))
+  .build();
+endpointList.add(endpoint);
+  }
+}
+  }
+
   public synchronized void connect(String connect, Properties props) 
throws RpcException {
 if (connected) {
   return;
 }
 
 final DrillbitEndpoint endpoint;
+final ArrayList endpoints = new ArrayList<>();
 if (isDirectConnection) {
-  final String[] connectInfo = 
props.getProperty("drillbit").split(":");
-  final String port = 
connectInfo.length==2?connectInfo[1]:config.getString(ExecConstants.INITIAL_USER_PORT);
-  endpoint = DrillbitEndpoint.newBuilder()
-  .setAddress(connectInfo[0])
-  .setUserPort(Integer.parseInt(port))
-  .build();
+  // Populate the endpoints list with all the drillbit information 
provided in the
+  // connection string
+  populateEndpointsList(endpoints, 
props.getProperty("drillbit").trim());
--- End diff --

Fixed. Changed the method to parseAndVerifyEndpoints


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one 
> drillbit to connect to, we always attempt to connect to only the first 
> drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline  -u 
>  

[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653518#comment-15653518
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351891
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws 
RpcException {
 connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the 
drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in 
connection string
+   */
+  public void populateEndpointsList(ArrayList 
endpointList, String drillbits){
+
+// If no information about drillbits is provided then just return 
empty list.
+if(drillbits == null || drillbits.length() == 0){
+  return;
+}
+final String[] connectInfo = drillbits.split(",");
+
+/* For direct connection we can get URL string having drillbit 
property as below:
+ drillbit=: --- Use the IP and port specified as the 
Foreman IP and port
+ drillbit=--- Use the IP specified as the Foreman IP 
with default port in config file
+ drillbit=:,:... --- Randomly select the 
IP and port pair from the specified
+ list as the Foreman 
IP and port.
+
+   Fetch ip address and port information for each drillbit and 
populate the list
+*/
+for(String info : connectInfo){
+  info = info.trim();
+
+  if(info != null){
+// Split each info to get ip address and port value
+final String[] drillbitInfo = info.split(":");
+
+// Check for malformed ip:port string
+if(drillbitInfo == null || drillbitInfo.length == 0){
--- End diff --

Length can be 0 here when the string just contains ":". 

In the case of more than one port, right now I am falling back to the default 
port. But as discussed, I changed it to throw a new exception, 
"InvalidConnectionInfoException".


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one 
> drillbit to connect to, we always attempt to connect to only the first 
> drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline  -u 
>  "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f 
> whereAmI.q  | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +------------------+------------+---------------+------------+----------+
> |     hostname     | user_port  | control_port  | data_port  | current  |
> +------------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab   | 31010      | 31011         | 31012      | true     |
> +------------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients when not wanting to overload the ZK 
> for fetching a list of existing Drillbits, but the behaviour doesn't match 
> the documentation. 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection
>  ]
> We need to randomly shuffle this list, and if an entry in the shuffled list 
> is unreachable, we need to try the next entry in the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653514#comment-15653514
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351921
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws 
RpcException {
 connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the 
drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in 
connection string
+   */
+  public void populateEndpointsList(ArrayList 
endpointList, String drillbits){
--- End diff --

Fixed


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one 
> drillbit to connect to, we always attempt to connect to only the first 
> drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline  -u 
>  "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f 
> whereAmI.q  | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +------------------+------------+---------------+------------+----------+
> |     hostname     | user_port  | control_port  | data_port  | current  |
> +------------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab   | 31010      | 31011         | 31012      | true     |
> +------------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients when not wanting to overload the ZK 
> for fetching a list of existing Drillbits, but the behaviour doesn't match 
> the documentation. 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection
>  ]
> We need to randomly shuffle this list, and if an entry in the shuffled list 
> is unreachable, we need to try the next entry in the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653511#comment-15653511
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87120245
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws 
RpcException {
 connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the 
drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in 
connection string
--- End diff --

@kkhatua - I am not sure I followed fully. This is an internal method 
parameter and does not change the name of the "drillbit" parameter in the 
connection string, so it should not have any doc impact.
@paul-rogers - I am not introducing anything new here. The 
[documentation](http://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection)
 already specifies the usage, but the correct implementation was lacking. It 
also says we only support one port (the user port), as is the case with zk.


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one 
> drillbit to connect to, we always attempt to connect to only the first 
> drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline  -u 
>  "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f 
> whereAmI.q  | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +------------------+------------+---------------+------------+----------+
> |     hostname     | user_port  | control_port  | data_port  | current  |
> +------------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab   | 31010      | 31011         | 31012      | true     |
> +------------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients when not wanting to overload the ZK 
> for fetching a list of existing Drillbits, but the behaviour doesn't match 
> the documentation. 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection
>  ]
> We need to randomly shuffle this list, and if an entry in the shuffled list 
> is unreachable, we need to try the next entry in the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653509#comment-15653509
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351650
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/client/DrillClientSystemTest.java
 ---
@@ -73,4 +77,90 @@ public void testSubmitPlanTwoNodes() throws Exception {
 }
 client.close();
   }
+
+  @Test
+  public void testPopulateEndpointsList() throws Exception{
+
+ArrayList endpointsList = new ArrayList<>();
+String drillBitConnection;
+DrillClient client = new DrillClient();
+DrillbitEndpoint endpoint;
+Iterator endpointIterator;
+
+
+// Test with single drillbit ip
+drillBitConnection = "10.10.100.161";
+client.populateEndpointsList(endpointsList, drillBitConnection);
+endpoint = endpointsList.iterator().next();
+assert(endpointsList.size() == 1);
+assert(endpoint.getAddress().equalsIgnoreCase(drillBitConnection));
+assert(endpoint.getUserPort() == 
client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
+
+// Test with single drillbit ip:port
+endpointsList.clear();
--- End diff --

Fixed


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>
> When trying to connect to a Drill cluster by specifying more than one 
> drillbit to connect to, we always attempt to connect to only the first 
> drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline  -u 
>  "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f 
> whereAmI.q  | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +------------------+------------+---------------+------------+----------+
> |     hostname     | user_port  | control_port  | data_port  | current  |
> +------------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab   | 31010      | 31011         | 31012      | true     |
> +------------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients when not wanting to overload the ZK 
> for fetching a list of existing Drillbits, but the behaviour doesn't match 
> the documentation. 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection
>  ]
> We need to randomly shuffle this list, and if an entry in the shuffled list 
> is unreachable, we need to try the next entry in the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653512#comment-15653512
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351880
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws RpcException {
     connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in connection string
+   */
+  public void populateEndpointsList(ArrayList<DrillbitEndpoint> endpointList, String drillbits){
+
+    // If no information about drillbits is provided then just return empty list.
+    if(drillbits == null || drillbits.length() == 0){
+      return;
+    }
+    final String[] connectInfo = drillbits.split(",");
+
+    /* For direct connection we can get URL string having drillbit property as below:
+         drillbit=<ip>:<port> --- Use the IP and port specified as the Foreman IP and port
+         drillbit=<ip>        --- Use the IP specified as the Foreman IP with default port in config file
+         drillbit=<ip1>:<port1>,<ip2>:<port2>... --- Randomly select the IP and port pair from the
+                                                     specified list as the Foreman IP and port.
+
+       Fetch ip address and port information for each drillbit and populate the list
+    */
+    for(String info : connectInfo){
+      info = info.trim();
+
+      if(info != null){
+        // Split each info to get ip address and port value
+        final String[] drillbitInfo = info.split(":");
+
+        // Check for malformed ip:port string
+        if(drillbitInfo == null || drillbitInfo.length == 0){
+          continue;
+        }
+
+        /* If port is present use that one else use the configured one
+           Assumptions: 1) IP Address provided in connection string is valid
+                        2) Port without IP address is never specified.
+        */
+        final String port = (drillbitInfo.length == 2) ? drillbitInfo[1] : config.getString(ExecConstants.INITIAL_USER_PORT);
--- End diff --

Put sanity checks in place for all error conditions, throwing exceptions with 
well-formed messages. For the length > 2 case, I am treating it as an error and 
throwing an exception.
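
A minimal sketch of the per-entry sanity checks described here, assuming the PR's InvalidConnectionInfoException; the helper name and message text are illustrative rather than the committed code:

// Sketch of the per-entry validation: reject a port with no host, and
// reject entries that carry more than one port.
static void verifyDrillbitEntry(String drillbit) throws InvalidConnectionInfoException {
  if (drillbit.charAt(0) == ':') {
    // ":port" with no hostname or host address is malformed
    throw new InvalidConnectionInfoException(
        "Malformed connection string, hostname missing for entry: " + drillbit);
  }
  final String[] parts = drillbit.split(":");
  if (parts.length > 2) {
    // more than one ':' means more than one port was supplied
    throw new InvalidConnectionInfoException(
        "Malformed connection string, more than one port in entry: " + drillbit);
  }
}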



[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653515#comment-15653515
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351685
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws RpcException {
     connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in connection string
+   */
+  public void populateEndpointsList(ArrayList<DrillbitEndpoint> endpointList, String drillbits){
+
+    // If no information about drillbits is provided then just return empty list.
+    if(drillbits == null || drillbits.length() == 0){
+      return;
+    }
+    final String[] connectInfo = drillbits.split(",");
+
+    /* For direct connection we can get URL string having drillbit property as below:
+         drillbit=<ip>:<port> --- Use the IP and port specified as the Foreman IP and port
+         drillbit=<ip>        --- Use the IP specified as the Foreman IP with default port in config file
+         drillbit=<ip1>:<port1>,<ip2>:<port2>... --- Randomly select the IP and port pair from the
+                                                     specified list as the Foreman IP and port.
+
+       Fetch ip address and port information for each drillbit and populate the list
+    */
+    for(String info : connectInfo){
+      info = info.trim();
+
+      if(info != null){
+        // Split each info to get ip address and port value
+        final String[] drillbitInfo = info.split(":");
+
+        // Check for malformed ip:port string
+        if(drillbitInfo == null || drillbitInfo.length == 0){
+          continue;
+        }
+
+        /* If port is present use that one else use the configured one
+           Assumptions: 1) IP Address provided in connection string is valid
+                        2) Port without IP address is never specified.
+        */
+        final String port = (drillbitInfo.length == 2) ? drillbitInfo[1] : config.getString(ExecConstants.INITIAL_USER_PORT);
+        final DrillbitEndpoint endpoint = DrillbitEndpoint.newBuilder()
+                                            .setAddress(drillbitInfo[0])
+                                            .setUserPort(Integer.parseInt(port))
+                                            .build();
+        endpointList.add(endpoint);
+      }
+    }
+  }
+
   public synchronized void connect(String connect, Properties props) throws RpcException {
     if (connected) {
       return;
     }
 
     final DrillbitEndpoint endpoint;
+    final ArrayList<DrillbitEndpoint> endpoints = new ArrayList<>();
     if (isDirectConnection) {
-      final String[] connectInfo = props.getProperty("drillbit").split(":");
-      final String port = connectInfo.length==2?connectInfo[1]:config.getString(ExecConstants.INITIAL_USER_PORT);
-      endpoint = DrillbitEndpoint.newBuilder()
-          .setAddress(connectInfo[0])
-          .setUserPort(Integer.parseInt(port))
-          .build();
+      // Populate the endpoints list with all the drillbit information provided in the
+      // connection string
+      populateEndpointsList(endpoints, props.getProperty("drillbit").trim());
--- End diff --

if "drillbit" is unset in the connection string then this code path won't 
be called at all. If "drillbit" string is specified in connection string then 
the value is set to empty string when none exists. So .trim() will not cause 
NPE. But moved it in callee.
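
For illustration, a hedged sketch of reading the property defensively in the callee, so a missing or value-less "drillbit" key yields an empty string rather than null (an assumed pattern, not the committed change):

// Sketch: default the property to "" so trim() is always safe, then treat
// an empty value as a connection-string error.
final String drillbits = props.getProperty("drillbit", "").trim();
if (drillbits.isEmpty()) {
  throw new InvalidConnectionInfoException("No drillbit information specified in the connection string");
}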



[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653513#comment-15653513
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351874
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws RpcException {
     connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in connection string
+   */
+  public void populateEndpointsList(ArrayList<DrillbitEndpoint> endpointList, String drillbits){
+
+    // If no information about drillbits is provided then just return empty list.
+    if(drillbits == null || drillbits.length() == 0){
+      return;
+    }
+    final String[] connectInfo = drillbits.split(",");
+
+    /* For direct connection we can get URL string having drillbit property as below:
+         drillbit=<ip>:<port> --- Use the IP and port specified as the Foreman IP and port
+         drillbit=<ip>        --- Use the IP specified as the Foreman IP with default port in config file
+         drillbit=<ip1>:<port1>,<ip2>:<port2>... --- Randomly select the IP and port pair from the
+                                                     specified list as the Foreman IP and port.
+
+       Fetch ip address and port information for each drillbit and populate the list
+    */
+    for(String info : connectInfo){
+      info = info.trim();
+
+      if(info != null){
+        // Split each info to get ip address and port value
+        final String[] drillbitInfo = info.split(":");
+
+        // Check for malformed ip:port string
+        if(drillbitInfo == null || drillbitInfo.length == 0){
+          continue;
+        }
+
+        /* If port is present use that one else use the configured one
+           Assumptions: 1) IP Address provided in connection string is valid
+                        2) Port without IP address is never specified.
+        */
+        final String port = (drillbitInfo.length == 2) ? drillbitInfo[1] : config.getString(ExecConstants.INITIAL_USER_PORT);
+        final DrillbitEndpoint endpoint = DrillbitEndpoint.newBuilder()
+                                            .setAddress(drillbitInfo[0])
+                                            .setUserPort(Integer.parseInt(port))
--- End diff --

I am catching the exception that can arise from parseInt and throwing it back 
to the user as an InvalidConnectionInfoException with a proper error message.
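
A minimal sketch of that catch-and-rethrow, assuming the PR's InvalidConnectionInfoException; the helper name and message text here are illustrative:

// Sketch: surface a bad port value as a connection-string error rather
// than letting a raw NumberFormatException escape.
static int parsePort(String port, String entry) throws InvalidConnectionInfoException {
  try {
    return Integer.parseInt(port);
  } catch (NumberFormatException e) {
    throw new InvalidConnectionInfoException(
        "Invalid port value '" + port + "' specified in drillbit entry: " + entry);
  }
}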



[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653506#comment-15653506
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351621
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/client/DrillClientSystemTest.java ---
@@ -73,4 +77,90 @@ public void testSubmitPlanTwoNodes() throws Exception {
     }
     client.close();
   }
+
+  @Test
+  public void testPopulateEndpointsList() throws Exception{
+
+    ArrayList<DrillbitEndpoint> endpointsList = new ArrayList<>();
+    String drillBitConnection;
+    DrillClient client = new DrillClient();
+    DrillbitEndpoint endpoint;
+    Iterator<DrillbitEndpoint> endpointIterator;
+
+    // Test with single drillbit ip
+    drillBitConnection = "10.10.100.161";
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    endpoint = endpointsList.iterator().next();
+    assert(endpointsList.size() == 1);
+    assert(endpoint.getAddress().equalsIgnoreCase(drillBitConnection));
+    assert(endpoint.getUserPort() == client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
+
+    // Test with single drillbit ip:port
+    endpointsList.clear();
+    drillBitConnection = "10.10.100.161:5000";
+    String[] ipAndPort = drillBitConnection.split(":");
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    assert(endpointsList.size() == 1);
+
+    endpoint = endpointsList.iterator().next();
+    assert(endpoint.getAddress().equalsIgnoreCase(ipAndPort[0]));
+    assert(endpoint.getUserPort() == Integer.parseInt(ipAndPort[1]));
+
+    // Test with multiple drillbit ip
+    endpointsList.clear();
+    drillBitConnection = "10.10.100.161,10.10.100.162";
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    assert(endpointsList.size() == 2);
+
+    endpointIterator = endpointsList.iterator();
+    endpoint = endpointIterator.next();
+    assert(endpoint.getAddress().equalsIgnoreCase("10.10.100.161"));
+    assert(endpoint.getUserPort() == client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
+
+    endpoint = endpointIterator.next();
+    assert(endpoint.getAddress().equalsIgnoreCase("10.10.100.162"));
+    assert(endpoint.getUserPort() == client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
+
+    // Test with multiple drillbit ip:port
+    endpointsList.clear();
+    drillBitConnection = "10.10.100.161:5000,10.10.100.162:5000";
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    assert(endpointsList.size() == 2);
+
+    endpointIterator = endpointsList.iterator();
+    endpoint = endpointIterator.next();
+    assert(endpoint.getAddress().equalsIgnoreCase("10.10.100.161"));
+    assert(endpoint.getUserPort() == 5000);
+
+    endpoint = endpointIterator.next();
+    assert(endpoint.getAddress().equalsIgnoreCase("10.10.100.162"));
+    assert(endpoint.getUserPort() == 5000);
+
+    // Test with multiple drillbit with mix of ip:port and ip
+    endpointsList.clear();
+    drillBitConnection = "10.10.100.161:5000,10.10.100.162";
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    assert(endpointsList.size() == 2);
+
+    endpointIterator = endpointsList.iterator();
+    endpoint = endpointIterator.next();
+    assert(endpoint.getAddress().equalsIgnoreCase("10.10.100.161"));
+    assert(endpoint.getUserPort() == 5000);
+
+    endpoint = endpointIterator.next();
+    assert(endpoint.getAddress().equalsIgnoreCase("10.10.100.162"));
+    assert(endpoint.getUserPort() == client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
+
+    // Test with empty string
+    endpointsList.clear();
+    drillBitConnection = "";
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    assert(endpointsList.size() == 0);
+
--- End diff --

Added more test cases based on the new implementation.
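
For illustration, a sketch of one such negative case, assuming the reworked parsing entry point throws InvalidConnectionInfoException on malformed input; parseEndpoints(...) is a stand-in name, not the PR's actual method:

// Sketch: an entry carrying two ports must be rejected outright.
@Test(expected = InvalidConnectionInfoException.class)
public void testEntryWithMoreThanOnePort() throws Exception {
  parseEndpoints("10.10.100.161:5000:6000");  // stand-in for the PR's parse routine
}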



[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653517#comment-15653517
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351643
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/client/DrillClientSystemTest.java ---
@@ -73,4 +77,90 @@ public void testSubmitPlanTwoNodes() throws Exception {
     }
     client.close();
   }
+
+  @Test
+  public void testPopulateEndpointsList() throws Exception{
+
+    ArrayList<DrillbitEndpoint> endpointsList = new ArrayList<>();
+    String drillBitConnection;
+    DrillClient client = new DrillClient();
+    DrillbitEndpoint endpoint;
+    Iterator<DrillbitEndpoint> endpointIterator;
+
+    // Test with single drillbit ip
+    drillBitConnection = "10.10.100.161";
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    endpoint = endpointsList.iterator().next();
+    assert(endpointsList.size() == 1);
+    assert(endpoint.getAddress().equalsIgnoreCase(drillBitConnection));
+    assert(endpoint.getUserPort() == client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
+
+    // Test with single drillbit ip:port
+    endpointsList.clear();
+    drillBitConnection = "10.10.100.161:5000";
+    String[] ipAndPort = drillBitConnection.split(":");
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    assert(endpointsList.size() == 1);
+
+    endpoint = endpointsList.iterator().next();
+    assert(endpoint.getAddress().equalsIgnoreCase(ipAndPort[0]));
+    assert(endpoint.getUserPort() == Integer.parseInt(ipAndPort[1]));
+
+    // Test with multiple drillbit ip
+    endpointsList.clear();
+    drillBitConnection = "10.10.100.161,10.10.100.162";
+    client.populateEndpointsList(endpointsList, drillBitConnection);
+    assert(endpointsList.size() == 2);
+
+    endpointIterator = endpointsList.iterator();
+    endpoint = endpointIterator.next();
+    assert(endpoint.getAddress().equalsIgnoreCase("10.10.100.161"));
+    assert(endpoint.getUserPort() == client.getConfig().getInt(ExecConstants.INITIAL_USER_PORT));
--- End diff --

Fixed




[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653507#comment-15653507
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351895
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws RpcException {
     connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in connection string
+   */
+  public void populateEndpointsList(ArrayList<DrillbitEndpoint> endpointList, String drillbits){
+
+    // If no information about drillbits is provided then just return empty list.
+    if(drillbits == null || drillbits.length() == 0){
+      return;
+    }
+    final String[] connectInfo = drillbits.split(",");
+
+    /* For direct connection we can get URL string having drillbit property as below:
+         drillbit=<ip>:<port> --- Use the IP and port specified as the Foreman IP and port
+         drillbit=<ip>        --- Use the IP specified as the Foreman IP with default port in config file
+         drillbit=<ip1>:<port1>,<ip2>:<port2>... --- Randomly select the IP and port pair from the
+                                                     specified list as the Foreman IP and port.
+
+       Fetch ip address and port information for each drillbit and populate the list
+    */
+    for(String info : connectInfo){
+      info = info.trim();
+
+      if(info != null){
--- End diff --

Yes. Fixed. Changed the variable name to drillbit.
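
A hedged sketch of the fixed loop as described: the renamed variable plus an emptiness check in place of the no-op null test (illustrative, not the committed hunk):

for (String drillbit : connectInfo) {
  // split(",") never yields null elements, so test for blank entries instead of null.
  drillbit = drillbit.trim();
  if (!drillbit.isEmpty()) {
    // ... parse host and optional port for this entry ...
  }
}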




[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653516#comment-15653516
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r87351912
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +223,65 @@ public void connect(Properties props) throws RpcException {
     connect(null, props);
   }
 
+  /**
+   * Function to populate the endpointList with information of all the drillbits
+   * provided in the connection string by client
+   * @param endpointList - ArrayList of DrillbitEndpoints
+   * @param drillbits - One or more drillbit ip[:port] provided in connection string
+   */
+  public void populateEndpointsList(ArrayList<DrillbitEndpoint> endpointList, String drillbits){
+
+    // If no information about drillbits is provided then just return empty list.
+    if(drillbits == null || drillbits.length() == 0){
--- End diff --

Fixed

