[jira] [Updated] (DRILL-5031) Documentation for HTTPD Parser

2016-11-29 Thread Charles Givre (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre updated DRILL-5031:
-
Flags: Patch,Important  (was: Patch)

> Documentation for HTTPD Parser
> --
>
> Key: DRILL-5031
> URL: https://issues.apache.org/jira/browse/DRILL-5031
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Charles Givre
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> https://gist.github.com/cgivre/47f07a06d44df2af625fc6848407ae7c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-5031) Documentation for HTTPD Parser

2016-11-29 Thread Charles Givre (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre updated DRILL-5031:
-
Fix Version/s: (was: Future)
   1.9.0

> Documentation for HTTPD Parser
> --
>
> Key: DRILL-5031
> URL: https://issues.apache.org/jira/browse/DRILL-5031
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Charles Givre
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> https://gist.github.com/cgivre/47f07a06d44df2af625fc6848407ae7c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3637) Elasticsearch storage plugin

2016-11-29 Thread Charles Givre (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707488#comment-15707488
 ] 

Charles Givre commented on DRILL-3637:
--

Hello all, 
I stumbled on https://github.com/Anchormen/sql4es this evening and was 
wondering whether it might be of use in integrating Drill with Elasticsearch.  
It is a largely functional JDBC driver for Elasticsearch.  

Here's the description:
Sql-for-Elasticsearch (sql4es) is a JDBC 4.1 driver for Elasticsearch 2.0 - 2.4 
implementing the majority of the JDBC interfaces: Connection, Statement, 
PreparedStatement, ResultSet, Batch and DataBase- / ResultSetMetadata. The 
screenshot below shows SQLWorkbenchJ with a selection of SQL statements that 
can be executed using the driver. As of version 0.8.2.3 the driver supports 
Shield, allowing the use of credentials and SSL.
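
For anyone who wants to try it, a minimal smoke test from plain JDBC; the 
driver class name and URL format below are taken from my reading of the sql4es 
README and should be treated as assumptions:
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Sql4EsSmokeTest {
  public static void main(String[] args) throws Exception {
    // Assumed driver class and URL scheme per the sql4es README
    Class.forName("nl.anchormen.sql4es.jdbc.ESDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:sql4es://localhost:9300/myindex?cluster.name=mycluster");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT * FROM mytype LIMIT 10")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));  // first column of each hit
      }
    }
  }
}
{code}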





> Elasticsearch storage plugin
> 
>
> Key: DRILL-3637
> URL: https://issues.apache.org/jira/browse/DRILL-3637
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - ElasticSearch
>Reporter: Andrew
> Fix For: Future
>
>
> Create a storage plugin for elasticsearch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5082) Metadata Cache is being refreshed every single time

2016-11-29 Thread Padma Penumarthy (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707052#comment-15707052
 ] 

Padma Penumarthy commented on DRILL-5082:
-

Tried with different file systems to see what the behavior is. HDFS, MapR FS, 
Linux and Mac OS X all behave the same way: rename does not update the 
modification time of the file, but since the parent directory is changed, its 
modification time is updated.
One possible solution is, after the rename, to update the modification time of 
the file to match the parent directory, using the FileSystem setTimes API. I 
made the change and verified that it solves this problem.
However, with concurrent access there could be issues that can only be solved 
with some kind of distributed locking mechanism in place. 
There will be a window between rename and setTimes during which, if a read 
comes from another connection, we may end up rebuilding the metadata cache. 
Also, setTimes after rename with no synchronization across multiple writers 
may create problems with timestamps when there are concurrent writes.
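
A minimal sketch of that workaround, assuming Hadoop's 
org.apache.hadoop.fs.FileSystem API; the helper class and method names are 
illustrative, not the actual patch:
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameWithTimestampSync {
  /**
   * Renames src to dst, then stamps dst with the parent directory's
   * modification time so staleness checks see consistent timestamps.
   */
  public static void renameAndSyncTime(FileSystem fs, Path src, Path dst) throws IOException {
    if (!fs.rename(src, dst)) {
      throw new IOException("rename failed: " + src + " -> " + dst);
    }
    final FileStatus parent = fs.getFileStatus(dst.getParent());
    // setTimes(path, mtime, atime); -1 leaves the access time unchanged
    fs.setTimes(dst, parent.getModificationTime(), -1);
  }
}
{code}
As noted above, this narrows the race window but does not close it; a complete 
fix needs some form of distributed locking.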

> Metadata Cache is being refreshed every single time
> ---
>
> Key: DRILL-5082
> URL: https://issues.apache.org/jira/browse/DRILL-5082
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Rahul Challapalli
>Assignee: Padma Penumarthy
>Priority: Critical
>
> Git Commit  : 04fb0be191ef09409c00ca7173cb903dfbe2abb0
> After the DRILL-4381 fix we are refreshing the metadata cache for every 
> single query. This could be because renaming a file is updating the 
> directory's timestamp but not the renamed file. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706719#comment-15706719
 ] 

ASF GitHub Bot commented on DRILL-4347:
---

Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/671
  
Thanks @julianhyde... CALCITE-604 should potentially help with this.  
Drill's Calcite version has not caught up to this yet.  Let me confer with 
@jinfengni sometime next week (he is on vacation until then) and get back on 
what can be done to get this into Drill.  In the meantime, even though my patch 
addresses the hang issue, I will hold it for now. 


> Planning time for query64 from TPCDS test suite has increased 10 times 
> compared to 1.4 release
> --
>
> Key: DRILL-4347
> URL: https://issues.apache.org/jira/browse/DRILL-4347
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
> Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, 
> 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0, drill4347_jstack.txt
>
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
> . . . . . . . . . . . . >  AS (SELECT cs_item_sk,
> . . . . . . . . . . . . > Sum(cs_ext_list_price) AS sale,
> . . . . . . . . . . . . > Sum(cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit) AS refund
> . . . . . . . . . . . . >  FROM   catalog_sales,
> . . . . . . . . . . . . > catalog_returns
> . . . . . . . . . . . . >  WHERE  cs_item_sk = cr_item_sk
> . . . . . . . . . . . . > AND cs_order_number = 
> cr_order_number
> . . . . . . . . . . . . >  GROUP  BY cs_item_sk
> . . . . . . . . . . . . >  HAVING Sum(cs_ext_list_price) > 2 * Sum(
> . . . . . . . . . . . . > cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit)),
> . . . . . . . . . . . . >  cross_sales
> . . . . . . . . . . . . >  AS (SELECT i_product_name product_name,
> . . . . . . . . . . . . > i_item_sk  item_sk,
> . . . . . . . . . . . . > s_store_name   store_name,
> . . . . . . . . . . . . > s_zip  store_zip,
> . . . . . . . . . . . . > ad1.ca_street_number   
> b_street_number,
> . . . . . . . . . . . . > ad1.ca_street_name 
> b_streen_name,
> . . . . . . . . . . . . > ad1.ca_city            b_city,
> . . . . . . . . . . . . > ad1.ca_zip b_zip,
> . . . . . . . . . . . . > ad2.ca_street_number   
> c_street_number,
> . . . . . . . . . . . . > ad2.ca_street_name 
> c_street_name,
> . . . . . . . . . . . . > ad2.ca_city            c_city,
> . . . . . . . . . . . . > ad2.ca_zip c_zip,
> . . . . . . . . . . . . > d1.d_year  AS syear,
> . . . . . . . . . . . . > d2.d_year  AS fsyear,
> . . . . . . . . . . . . > d3.d_year  s2year,
> . . . . . . . . . . . . > Count(*)   cnt,
> . . . . . . . . . . . . > Sum(ss_wholesale_cost) s1,
> . . . . . . . . . . . . > Sum(ss_list_price) s2,
> . . . . . . . . . . . . > Sum(ss_coupon_amt) s3
> . . . . . . . . . . . . >  FROM   store_sales,
> . . . . . . . . . . . . > store_returns,
> . . . . . . . . . . . . > cs_ui,
> . . . . . . . . . . . . > date_dim d1,
> . . . . . . . . . . . . > date_dim d2,
> . . . . . . . . . . . . > date_dim d3,
> . . . . . . . . . . . . > store,
> . . . . . . . . . . . . > customer,
> . . . . . . . . . . . . > customer_demographics cd1,
> . . . . . . . . . . . . > customer_demographics cd2,
> . . . . . . . . . . . . > promotion,
> . . . . . . . . . . . . > household_demographics hd1,
> . . . . . . . . . . . . > household_demographics hd2,
> . . . . . . . . . . . . > customer_address ad1,
> . . . . . . . . . . . . > customer_address ad2,
> . . . . . . . . . . . . > income_band ib1,
> . . . . . . . . . . . . > income_band ib2,
> . . . . . . . . . . . . > item
> . . . . . . . . . . . 

[jira] [Commented] (DRILL-4455) Depend on Apache Arrow for Vector and Memory

2016-11-29 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706607#comment-15706607
 ] 

Julian Hyde commented on DRILL-4455:


[~jnadeau], [~sphillips], [~amansinha100] and [~parthc],

Can all parties please agree (and state publicly for the record) that moving 
value vector code out of Drill and into Arrow is in the best interests of the 
Drill project?

Most contributions can be managed by a process of submitting a patch, 
reviewing, rejecting, revising, and repeating. But this is not one of those 
patches that can be casually kicked back to the contributor. It is huge, 
because it is an architectural change. I would like to see a commitment from 
both sides (contributor and reviewer) that we will find consensus and accept 
the patch.

> Depend on Apache Arrow for Vector and Memory
> 
>
> Key: DRILL-4455
> URL: https://issues.apache.org/jira/browse/DRILL-4455
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 2.0.0
>
>
> The code for value vectors and memory has been split and contributed to the 
> apache arrow repository. In order to help this project advance, Drill should 
> depend on the arrow project instead of internal value vector code.
> This change will require recompiling any external code, such as UDFs and 
> StoragePlugins. The changes will mainly just involve renaming the classes to 
> the org.apache.arrow namespace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706598#comment-15706598
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r90089781
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +224,100 @@ public void connect(Properties props) throws RpcException {
     connect(null, props);
   }
 
+  /**
+   * Populates the endpoint list with the drillbit information provided in the connection string by the client.
+   * For a direct connection the connection string can carry the drillbit property as below:
+   * <dl>
+   *   <dt>drillbit=ip</dt>
+   *   <dd>use the ip specified as the Foreman ip with the default port in the config file</dd>
+   *   <dt>drillbit=ip:port</dt>
+   *   <dd>use the ip and port specified as the Foreman ip and port</dd>
+   *   <dt>drillbit=ip1:port1,ip2:port2,...</dt>
+   *   <dd>randomly select an ip and port pair from the specified list as the Foreman ip and port</dd>
+   * </dl>
+   *
+   * @param drillbits string with the drillbit value provided in the connection string
+   * @param defaultUserPort string with the default user port of the drillbit specified in the config file
+   * @return list of drillbit endpoints parsed from the connection string
+   * @throws InvalidConnectionInfoException if the connection string has invalid or no drillbit information
+   */
+  static List<DrillbitEndpoint> parseAndVerifyEndpoints(String drillbits, String defaultUserPort)
+    throws InvalidConnectionInfoException {
+    // If no drillbits are provided then throw an exception
+    drillbits = drillbits.trim();
+    if (drillbits.isEmpty()) {
+      throw new InvalidConnectionInfoException("No drillbit information specified in the connection string");
+    }
+
+    ArrayList<DrillbitEndpoint> endpointList = new ArrayList<>();
+    final String[] connectInfo = drillbits.split(",");
+
+    // Fetch the ip address and port information for each drillbit and populate the list
+    for (String drillbit : connectInfo) {
+
+      // Trim all the empty spaces and check if the entry is an empty string.
+      // Ignore the empty ones.
+      drillbit = drillbit.trim();
+
+      if (!drillbit.isEmpty()) {
+        // Verify if we have only ":" or only ":port" pattern
+        if (drillbit.charAt(0) == ':') {
+          // Invalid drillbit information
+          throw new InvalidConnectionInfoException("Malformed connection string with drillbit hostname or " +
+                                                   "hostaddress missing for an entry: " + drillbit);
+        }
+
+        // We are now sure that each ip:port entry will have both the values at least once.
+        // Split each drillbit connection string to get the ip address and port value
+        final String[] drillbitInfo = drillbit.split(":");
+
+        // Check if we have more than one port
+        if (drillbitInfo.length > 2) {
+          throw new InvalidConnectionInfoException("Malformed connection string with more than one port in a " +
+                                                   "drillbit entry: " + drillbit);
+        }
+
+        // At this point we are sure that drillbitInfo has at least the hostname or host address,
+        // so trim all the empty spaces which might be present in front of the hostname or
+        // host address information
+        final String ipAddress = drillbitInfo[0].trim();
+        String port = defaultUserPort;
+
+        if (drillbitInfo.length == 2) {
+          // We also have a port value given by the user. Trim all the empty spaces between ":" and the
+          // port value before validating the correctness of the value.
+          port = drillbitInfo[1].trim();
+        }
+
+        try {
+          final DrillbitEndpoint endpoint = DrillbitEndpoint.newBuilder()
+                                                            .setAddress(ipAddress)
+                                                            .setUserPort(Integer.parseInt(port))
+                                                            .build();
+
+          endpointList.add(endpoint);
+        } catch (NumberFormatException e) {
+          throw new InvalidConnectionInfoException("Malformed port value in entry: " + ipAddress + ":" + port + " " +
+                                                   "passed in connection string");
+        }
+      }
+    }
+    if(endpointList.size() == 0){
--- End diff --

fixed
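
For reference, the connection strings the new parsing accepts look like the 
following (an illustration based on the javadoc above; the hosts and ports are 
made up):
{code}
// Single drillbit, default user port taken from the config file
String url1 = "jdbc:drill:drillbit=10.10.100.1";
// Single drillbit with an explicit port
String url2 = "jdbc:drill:drillbit=10.10.100.1:31010";
// Multiple drillbits; one ip:port pair is picked at random as the Foreman
String url3 = "jdbc:drill:drillbit=10.10.100.1:31010,10.10.100.2:31010";
{code}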


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> 

[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706603#comment-15706603
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r90089750
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -223,19 +224,100 @@ public void connect(Properties props) throws RpcException {
     connect(null, props);
   }
 
+  /**
+   * Populates the endpoint list with the drillbit information provided in the connection string by the client.
+   * For a direct connection the connection string can carry the drillbit property as below:
+   * <dl>
+   *   <dt>drillbit=ip</dt>
+   *   <dd>use the ip specified as the Foreman ip with the default port in the config file</dd>
+   *   <dt>drillbit=ip:port</dt>
+   *   <dd>use the ip and port specified as the Foreman ip and port</dd>
+   *   <dt>drillbit=ip1:port1,ip2:port2,...</dt>
+   *   <dd>randomly select an ip and port pair from the specified list as the Foreman ip and port</dd>
+   * </dl>
+   *
+   * @param drillbits string with the drillbit value provided in the connection string
+   * @param defaultUserPort string with the default user port of the drillbit specified in the config file
+   * @return list of drillbit endpoints parsed from the connection string
+   * @throws InvalidConnectionInfoException if the connection string has invalid or no drillbit information
+   */
+  static List<DrillbitEndpoint> parseAndVerifyEndpoints(String drillbits, String defaultUserPort)
+    throws InvalidConnectionInfoException {
+    // If no drillbits are provided then throw an exception
+    drillbits = drillbits.trim();
+    if (drillbits.isEmpty()) {
+      throw new InvalidConnectionInfoException("No drillbit information specified in the connection string");
+    }
+
+    ArrayList<DrillbitEndpoint> endpointList = new ArrayList<>();
--- End diff --

fixed


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>  Labels: ready-to-commit
>
> When trying to connect to a Drill cluster by specifying more than one 
> drillbit to connect to, we always attempt to connect to only the first 
> drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline -u 
> "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f 
> whereAmI.q | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +-----------------+------------+---------------+------------+----------+
> |    hostname     | user_port  | control_port  | data_port  | current  |
> +-----------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab  | 31010      | 31011         | 31012      | true     |
> +-----------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients that do not want to overload ZK by 
> fetching the list of existing drillbits, but the behaviour doesn't match the 
> documentation. 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection
>  ]
> We need to randomly shuffle this list, and if an entry in the shuffled list 
> is unreachable, we need to try the next entry in the list.
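
A minimal sketch of the requested shuffle-and-failover behavior, assuming a 
list of "host:port" strings and a pluggable connect attempt; the names are 
illustrative, not Drill's client code:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DrillbitFailover {

  public interface ConnectAttempt {
    boolean tryConnect(String endpoint);  // returns true on success
  }

  /** Shuffles the endpoints, then tries each one until a connection succeeds. */
  public static String connectToAny(List<String> endpoints, ConnectAttempt attempt) {
    final List<String> shuffled = new ArrayList<>(endpoints);
    Collections.shuffle(shuffled);  // spread the load across drillbits
    for (String endpoint : shuffled) {
      if (attempt.tryConnect(endpoint)) {
        return endpoint;
      }
    }
    throw new IllegalStateException("No reachable drillbit in: " + endpoints);
  }
}
{code}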



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706602#comment-15706602
 ] 

ASF GitHub Bot commented on DRILL-5015:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/648#discussion_r89928643
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/client/DrillClientSystemTest.java ---
@@ -17,11 +17,15 @@
  */
 package org.apache.drill.exec.client;
 
+import java.util.ArrayList;
--- End diff --

I will remove this file from the change since I have moved all the changes to 
a different file, "DrillClientTest.java".


> As per documentation, when issuing a list of drillbits in the connection 
> string, we always attempt to connect only to the first one
> ---
>
> Key: DRILL-5015
> URL: https://issues.apache.org/jira/browse/DRILL-5015
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sudheesh Katkam
>  Labels: ready-to-commit
>
> When trying to connect to a Drill cluster by specifying more than one 
> drillbit to connect to, we always attempt to connect to only the first 
> drillbit.
> As an example, we tested against a pair of drillbits, but we always connect 
> to the first entry in the CSV list by querying for the 'current' drillbit. 
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline -u 
> "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f 
> whereAmI.q | grep -v logback
> 1/1  select * from sys.drillbits where `current`;
> +-----------------+------------+---------------+------------+----------+
> |    hostname     | user_port  | control_port  | data_port  | current  |
> +-----------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab  | 31010      | 31011         | 31012      | true     |
> +-----------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0 
> "a little sql for your nosql"
> This property is meant for use by clients that do not want to overload ZK by 
> fetching the list of existing drillbits, but the behaviour doesn't match the 
> documentation. 
> [Making a Direct Drillbit Connection | 
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection
>  ]
> We need to randomly shuffle this list, and if an entry in the shuffled list 
> is unreachable, we need to try the next entry in the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706525#comment-15706525
 ] 

ASF GitHub Bot commented on DRILL-4347:
---

Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/671
  
It is quite likely that the CachingRelMetadataProvider is meant for this.  
Based on the stack trace, there are multiple instances of "at 
org.apache.calcite.rel.metadata.CachingRelMetadataProvider$CachingInvocationHandler.invoke(CachingRelMetadataProvider.java:132)", 
and that line number indicates that there was either a cache miss or the entry 
was stale.  So, the caching provider does in fact get used, but then it 
subsequently gets stuck in the apply() method of the 
ReflectiveRelMetadataProvider.  I did not attempt to debug why it got stuck 
there, partly because I am not very familiar with the way reflection is used 
in this provider.  Hence, my fix is an attempt to circumvent the issue.  


> Planning time for query64 from TPCDS test suite has increased 10 times 
> compared to 1.4 release
> --
>
> Key: DRILL-4347
> URL: https://issues.apache.org/jira/browse/DRILL-4347
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
> Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, 
> 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0, drill4347_jstack.txt
>
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
> . . . . . . . . . . . . >  AS (SELECT cs_item_sk,
> . . . . . . . . . . . . > Sum(cs_ext_list_price) AS sale,
> . . . . . . . . . . . . > Sum(cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit) AS refund
> . . . . . . . . . . . . >  FROM   catalog_sales,
> . . . . . . . . . . . . > catalog_returns
> . . . . . . . . . . . . >  WHERE  cs_item_sk = cr_item_sk
> . . . . . . . . . . . . > AND cs_order_number = 
> cr_order_number
> . . . . . . . . . . . . >  GROUP  BY cs_item_sk
> . . . . . . . . . . . . >  HAVING Sum(cs_ext_list_price) > 2 * Sum(
> . . . . . . . . . . . . > cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit)),
> . . . . . . . . . . . . >  cross_sales
> . . . . . . . . . . . . >  AS (SELECT i_product_name product_name,
> . . . . . . . . . . . . > i_item_sk  item_sk,
> . . . . . . . . . . . . > s_store_name   store_name,
> . . . . . . . . . . . . > s_zip  store_zip,
> . . . . . . . . . . . . > ad1.ca_street_number   
> b_street_number,
> . . . . . . . . . . . . > ad1.ca_street_name 
> b_streen_name,
> . . . . . . . . . . . . > ad1.ca_city            b_city,
> . . . . . . . . . . . . > ad1.ca_zip b_zip,
> . . . . . . . . . . . . > ad2.ca_street_number   
> c_street_number,
> . . . . . . . . . . . . > ad2.ca_street_name 
> c_street_name,
> . . . . . . . . . . . . > ad2.ca_city            c_city,
> . . . . . . . . . . . . > ad2.ca_zip c_zip,
> . . . . . . . . . . . . > d1.d_year  AS syear,
> . . . . . . . . . . . . > d2.d_year  AS fsyear,
> . . . . . . . . . . . . > d3.d_year  s2year,
> . . . . . . . . . . . . > Count(*)   cnt,
> . . . . . . . . . . . . > Sum(ss_wholesale_cost) s1,
> . . . . . . . . . . . . > Sum(ss_list_price) s2,
> . . . . . . . . . . . . > Sum(ss_coupon_amt) s3
> . . . . . . . . . . . . >  FROM   store_sales,
> . . . . . . . . . . . . > store_returns,
> . . . . . . . . . . . . > cs_ui,
> . . . . . . . . . . . . > date_dim d1,
> . . . . . . . . . . . . > date_dim d2,
> . . . . . . . . . . . . > date_dim d3,
> . . . . . . . . . . . . > store,
> . . . . . . . . . . . . > customer,
> . . . . . . . . . . . . > customer_demographics cd1,
> . . . . . . . . . . . . > customer_demographics cd2,
> . . . . . . . . . . . . > promotion,
> . . . . . . . . . . . . > household_demographics hd1,
> . . . . . . . . . . . . > 

[jira] [Assigned] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release

2016-11-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-4347:
---

Assignee: Gautam Kumar Parai  (was: Aman Sinha)

Assigning to [~gparai] for review.

> Planning time for query64 from TPCDS test suite has increased 10 times 
> compared to 1.4 release
> --
>
> Key: DRILL-4347
> URL: https://issues.apache.org/jira/browse/DRILL-4347
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
> Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, 
> 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0, drill4347_jstack.txt
>
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
> . . . . . . . . . . . . >  AS (SELECT cs_item_sk,
> . . . . . . . . . . . . > Sum(cs_ext_list_price) AS sale,
> . . . . . . . . . . . . > Sum(cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit) AS refund
> . . . . . . . . . . . . >  FROM   catalog_sales,
> . . . . . . . . . . . . > catalog_returns
> . . . . . . . . . . . . >  WHERE  cs_item_sk = cr_item_sk
> . . . . . . . . . . . . > AND cs_order_number = 
> cr_order_number
> . . . . . . . . . . . . >  GROUP  BY cs_item_sk
> . . . . . . . . . . . . >  HAVING Sum(cs_ext_list_price) > 2 * Sum(
> . . . . . . . . . . . . > cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit)),
> . . . . . . . . . . . . >  cross_sales
> . . . . . . . . . . . . >  AS (SELECT i_product_name product_name,
> . . . . . . . . . . . . > i_item_sk  item_sk,
> . . . . . . . . . . . . > s_store_name   store_name,
> . . . . . . . . . . . . > s_zip  store_zip,
> . . . . . . . . . . . . > ad1.ca_street_number   
> b_street_number,
> . . . . . . . . . . . . > ad1.ca_street_name 
> b_streen_name,
> . . . . . . . . . . . . > ad1.ca_city            b_city,
> . . . . . . . . . . . . > ad1.ca_zip b_zip,
> . . . . . . . . . . . . > ad2.ca_street_number   
> c_street_number,
> . . . . . . . . . . . . > ad2.ca_street_name 
> c_street_name,
> . . . . . . . . . . . . > ad2.ca_city            c_city,
> . . . . . . . . . . . . > ad2.ca_zip c_zip,
> . . . . . . . . . . . . > d1.d_year  AS syear,
> . . . . . . . . . . . . > d2.d_year  AS fsyear,
> . . . . . . . . . . . . > d3.d_year  s2year,
> . . . . . . . . . . . . > Count(*)   cnt,
> . . . . . . . . . . . . > Sum(ss_wholesale_cost) s1,
> . . . . . . . . . . . . > Sum(ss_list_price) s2,
> . . . . . . . . . . . . > Sum(ss_coupon_amt) s3
> . . . . . . . . . . . . >  FROM   store_sales,
> . . . . . . . . . . . . > store_returns,
> . . . . . . . . . . . . > cs_ui,
> . . . . . . . . . . . . > date_dim d1,
> . . . . . . . . . . . . > date_dim d2,
> . . . . . . . . . . . . > date_dim d3,
> . . . . . . . . . . . . > store,
> . . . . . . . . . . . . > customer,
> . . . . . . . . . . . . > customer_demographics cd1,
> . . . . . . . . . . . . > customer_demographics cd2,
> . . . . . . . . . . . . > promotion,
> . . . . . . . . . . . . > household_demographics hd1,
> . . . . . . . . . . . . > household_demographics hd2,
> . . . . . . . . . . . . > customer_address ad1,
> . . . . . . . . . . . . > customer_address ad2,
> . . . . . . . . . . . . > income_band ib1,
> . . . . . . . . . . . . > income_band ib2,
> . . . . . . . . . . . . > item
> . . . . . . . . . . . . >  WHERE  ss_store_sk = s_store_sk
> . . . . . . . . . . . . > AND ss_sold_date_sk = d1.d_date_sk
> . . . . . . . . . . . . > AND ss_customer_sk = c_customer_sk
> . . . . . . . . . . . . > AND ss_cdemo_sk = cd1.cd_demo_sk
> . . . . . . . . . . . . > AND ss_hdemo_sk = hd1.hd_demo_sk
> . . . . . . . . . . . . > AND 

[jira] [Updated] (DRILL-4984) Limit 0 raises NullPointerException on JDBC storage sources

2016-11-29 Thread Holger Kiel (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holger Kiel updated DRILL-4984:
---
Fix Version/s: (was: 1.9.0)

> Limit 0 raises NullPointerException on JDBC storage sources
> ---
>
> Key: DRILL-4984
> URL: https://issues.apache.org/jira/browse/DRILL-4984
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
> Environment: Latest 1.9 Snapshot, also 1.8 release version,
> mysql-connector-java-5.1.30, mysql-connector-java-5.1.40
>Reporter: Holger Kiel
>
> NullPointerExceptions occur when a query with 'limit 0' is executed on a jdbc 
> storage source (e.g. Mysql):
> {code}
> 0: jdbc:drill:zk=local> select * from mysql.sugarcrm.sales_person limit 0;
> Error: SYSTEM ERROR: NullPointerException
> [Error Id: 6cd676fc-6db9-40b3-81d5-c2db044aeb77 on localhost:31010]
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: null
> org.apache.drill.exec.work.foreman.Foreman.run():281
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.lang.NullPointerException) null
> 
> org.apache.drill.exec.planner.sql.handlers.FindHardDistributionScans.visit():55
> org.apache.calcite.rel.core.TableScan.accept():166
> org.apache.calcite.rel.RelShuttleImpl.visitChild():53
> org.apache.calcite.rel.RelShuttleImpl.visitChildren():68
> org.apache.calcite.rel.RelShuttleImpl.visit():126
> org.apache.calcite.rel.AbstractRelNode.accept():256
> org.apache.calcite.rel.RelShuttleImpl.visitChild():53
> org.apache.calcite.rel.RelShuttleImpl.visitChildren():68
> org.apache.calcite.rel.RelShuttleImpl.visit():126
> org.apache.calcite.rel.AbstractRelNode.accept():256
> 
> org.apache.drill.exec.planner.sql.handlers.FindHardDistributionScans.canForceSingleMode():45
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():262
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():290
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():168
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan():123
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():97
> org.apache.drill.exec.work.foreman.Foreman.runSQL():1008
> org.apache.drill.exec.work.foreman.Foreman.run():264
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
> 0: jdbc:drill:zk=local> select * from mysql.sugarcrm.sales_person limit 1;
> +-----+-------------+----------------+------------+-------------+
> | id  | first_name  |   last_name    | full_name  | manager_id  |
> +-----+-------------+----------------+------------+-------------+
> | 1   | null        | Administrator  | admin      | 0           |
> +-----+-------------+----------------+------------+-------------+
> 1 row selected (0,235 seconds)
> {code}
> Other datasources are okay:
> {code}
> 0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` LIMIT 0;
> +------+-----------+-----------+---------+--------------+------------+-------------+------------+--------------+-----------------+-----------+----------------+-------------+------------+---------+----------------+------------------+-----------------+---------+------------------+
> | fqn  | filename  | filepath  | suffix  | employee_id  | full_name  | first_name  | last_name  | position_id  | position_title  | store_id  | department_id  | birth_date  | hire_date  | salary  | supervisor_id  | education_level  | marital_status  | gender  | management_role  |
> +------+-----------+-----------+---------+--------------+------------+-------------+------------+--------------+-----------------+-----------+----------------+-------------+------------+---------+----------------+------------------+-----------------+---------+------------------+
> +------+-----------+-----------+---------+--------------+------------+-------------+------------+--------------+-----------------+-----------+----------------+-------------+------------+---------+----------------+------------------+-----------------+---------+------------------+
> No rows selected (0,309 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-5083) IteratorValidator does not handle RecordIterator cleanup call to next( )

2016-11-29 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5083:
---
Assignee: (was: Paul Rogers)

> IteratorValidator does not handle RecordIterator cleanup call to next( )
> 
>
> Key: DRILL-5083
> URL: https://issues.apache.org/jira/browse/DRILL-5083
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Priority: Minor
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something 
> like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force an OOM in spill, the external sort threw a 
> UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
>   // Check whether next() should even have been called in current state.
>   if (null != exceptionState) {
> throw new IllegalStateException(
> {code}
> But, the exceptionState is set, so we end up throwing an 
> IllegalStateException during cleanup.
> Seems the code should agree: if {{next( )}} will be called during cleanup, 
> then {{next( )}} should gracefully handle that case.
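
A minimal sketch of the suggested reconciliation, assuming a simplified 
iterator; the class and the closing flag below are hypothetical stand-ins for 
IteratorValidatorBatchIterator's state, not the actual fix:
{code}
public class ValidatingIterator {

  public enum IterOutcome { OK, NONE, STOP }

  private Exception exceptionState = null;
  private boolean closing = false;  // set when close()/cleanup begins

  public IterOutcome next() {
    if (exceptionState != null) {
      if (closing) {
        // Cleanup path: RecordIterator.close() legitimately calls next()
        // to drain in-flight batches, so report end-of-data rather than
        // throwing IllegalStateException.
        return IterOutcome.NONE;
      }
      throw new IllegalStateException("next() called after failure: " + exceptionState);
    }
    // ... normal batch processing would go here ...
    return IterOutcome.OK;
  }

  public void close() {
    closing = true;
    // drains remaining batches, which calls next() in the real code
  }
}
{code}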



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-5083) IteratorValidator does not handle RecordIterator cleanup call to next( )

2016-11-29 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5083:
--

Assignee: Paul Rogers

> IteratorValidator does not handle RecordIterator cleanup call to next( )
> 
>
> Key: DRILL-5083
> URL: https://issues.apache.org/jira/browse/DRILL-5083
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something 
> like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force an OOM in spill, the external sort threw a 
> UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
>   // Check whether next() should even have been called in current state.
>   if (null != exceptionState) {
> throw new IllegalStateException(
> {code}
> But, the exceptionState is set, so we end up throwing an 
> IllegalStateException during cleanup.
> Seems the code should agree: if {{next( )}} will be called during cleanup, 
> then {{next( )}} should gracefully handle that case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-5083) IteratorValidator does not handle RecordIterator cleanup call to next( )

2016-11-29 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5083:
---
Reviewer:   (was: Sorabh Hamirwasia)

> IteratorValidator does not handle RecordIterator cleanup call to next( )
> 
>
> Key: DRILL-5083
> URL: https://issues.apache.org/jira/browse/DRILL-5083
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Priority: Minor
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something 
> like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force an OOM in spill, the external sort threw a 
> UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
>   // Check whether next() should even have been called in current state.
>   if (null != exceptionState) {
> throw new IllegalStateException(
> {code}
> But, the exceptionState is set, so we end up throwing an 
> IllegalStateException during cleanup.
> Seems the code should agree: if {{next( )}} will be called during cleanup, 
> then {{next( )}} should gracefully handle that case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-5083) IteratorValidator does not handle RecordIterator cleanup call to next( )

2016-11-29 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5083:
---
Reviewer: Sorabh Hamirwasia

> IteratorValidator does not handle RecordIterator cleanup call to next( )
> 
>
> Key: DRILL-5083
> URL: https://issues.apache.org/jira/browse/DRILL-5083
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Priority: Minor
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something 
> like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force an OOM in spill, the external sort threw a 
> UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
>   // Check whether next() should even have been called in current state.
>   if (null != exceptionState) {
> throw new IllegalStateException(
> {code}
> But, the exceptionState is set, so we end up throwing an 
> IllegalStateException during cleanup.
> Seems the code should agree: if {{next( )}} will be called during cleanup, 
> then {{next( )}} should gracefully handle that case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release

2016-11-29 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705810#comment-15705810
 ] 

Aman Sinha edited comment on DRILL-4347 at 11/29/16 4:48 PM:
-

The jstack is long because of the complex query.  It shows that the planner is 
stuck during Calcite's {{ReflectiveRelMetadataProvider.apply()}} call during 
the post-processing phase of Drill planning.  At this phase, the logical and 
physical planning are done and the planner is in the SwapHashJoin phase.  
During this, it calls getRows() on the inputs of all the hash joins to make 
its decisions.  The getRows() eventually calls 
{{RelMdDistinctRowCount.getDistinctRowCount()}} since there is a GROUP-BY and 
the row count of a grouped aggregate is determined by the number of distinct 
rows for its group-by columns.  Note that Calcite needs the distinct row count 
also from the Join operators (not just Aggregates) if the output of the Join 
is feeding into an Aggregate.

Note that the stack trace is different from a similar (but not the same) issue 
reported in CALCITE-1053. It is unclear what the root cause of the deeply 
nested reflective calls getting stuck is, but one important observation is 
that Drill is needlessly doing this computation twice - once during the 
logical planning phase and once during physical planning.  The distinct row 
count of all the Joins can be computed during logical planning and cached for 
future use during physical planning, because this value is not going to 
change.  For complex queries such as these with many joins, it also saves 
planning time.  I am proposing to fix the issue by caching the distinct row 
count for Joins.  
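
A minimal sketch of the proposed caching, assuming a planner-lifetime map; the 
class and method names are illustrative, not the actual patch:
{code}
import java.util.HashMap;
import java.util.Map;

public class JoinDistinctRowCountCache {

  public interface DistinctRowCountEstimator {
    // Delegates to RelMdDistinctRowCount.getDistinctRowCount() in real code
    double estimate(Object joinRel);
  }

  // Keyed by the join node; valid because the estimate does not change
  // between logical and physical planning.
  private final Map<Object, Double> cache = new HashMap<>();

  public double getDistinctRowCount(Object joinRel, DistinctRowCountEstimator estimator) {
    // Compute once during logical planning, reuse during physical planning
    return cache.computeIfAbsent(joinRel, estimator::estimate);
  }
}
{code}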



was (Author: amansinha100):
The jstack is long because of the complex query.  It shows that the planner is 
stuck during Calcite's {{ReflectiveRelMetadataProvider.apply()}} call during 
the post-processing phase of Drill planning.  At this phase, the logical and 
physical planning are done and the planner is in the SwapHashJoin phase.  
During this, it calls getRows() on the inputs of all the hash joins to make 
its decisions.  The getRows() eventually calls 
{{RelMdDistinctRowCount.getDistinctRowCount()}} since there is a GROUP-BY and 
the row count of a grouped aggregate is determined by the number of distinct 
rows for its group-by columns.  Note that Calcite needs the distinct row count 
also from the Join operators (not just Aggregates) if the output of the Join 
is feeding into an Aggregate.

It is unclear what the root cause of the Calcite call being stuck or taking 
too long is (there could be some issues with the deeply nested reflective 
calls), but one important observation is that Drill is needlessly doing this 
computation twice - once during the logical planning phase and once during 
physical planning.  The distinct row count of all the Joins can be computed 
during logical planning and cached for future use during physical planning, 
because this value is not going to change.  For complex queries such as these 
with many joins, it also saves planning time.  I am proposing to fix the issue 
by caching the distinct row count for Joins.  


> Planning time for query64 from TPCDS test suite has increased 10 times 
> compared to 1.4 release
> --
>
> Key: DRILL-4347
> URL: https://issues.apache.org/jira/browse/DRILL-4347
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Aman Sinha
> Fix For: Future
>
> Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, 
> 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0, drill4347_jstack.txt
>
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
> . . . . . . . . . . . . >  AS (SELECT cs_item_sk,
> . . . . . . . . . . . . > Sum(cs_ext_list_price) AS sale,
> . . . . . . . . . . . . > Sum(cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit) AS refund
> . . . . . . . . . . . . >  FROM   catalog_sales,
> . . . . . . . . . . . . > catalog_returns
> . . . . . . . . . . . . >  WHERE  cs_item_sk = cr_item_sk
> . . . . . . . . . . . . > AND cs_order_number = 
> cr_order_number
> . . . . . . . . . . . . >  GROUP  BY cs_item_sk
> . . . . . . . . . . . . >  HAVING Sum(cs_ext_list_price) > 2 * Sum(
> . . . . . . . . . . . . > cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit)),
> . . . . . . . . . . . . >  cross_sales
> . . . . . . . . . . . . >  AS 

[jira] [Commented] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release

2016-11-29 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705810#comment-15705810
 ] 

Aman Sinha commented on DRILL-4347:
---

The jstack is long because of the complex query.  It shows that the planner is 
stuck during Calcite's {{ReflectiveRelMetadataProvider.apply()}} call during 
the post-processing phase of Drill planning.  At this phase, the logical and 
physical planning are done and the planner is in the SwapHashJoin phase.  
During this, it calls getRows() on the inputs of all the hash joins to make 
its decisions.  The getRows() eventually calls 
{{RelMdDistinctRowCount.getDistinctRowCount()}} since there is a GROUP-BY and 
the row count of a grouped aggregate is determined by the number of distinct 
rows for its group-by columns.  Note that Calcite needs the distinct row count 
also from the Join operators (not just Aggregates) if the output of the Join 
is feeding into an Aggregate.

It is unclear what the root cause of the Calcite call being stuck or taking 
too long is (there could be some issues with the deeply nested reflective 
calls), but one important observation is that Drill is needlessly doing this 
computation twice - once during the logical planning phase and once during 
physical planning.  The distinct row count of all the Joins can be computed 
during logical planning and cached for future use during physical planning, 
because this value is not going to change.  For complex queries such as these 
with many joins, it also saves planning time.  I am proposing to fix the issue 
by caching the distinct row count for Joins.  


> Planning time for query64 from TPCDS test suite has increased 10 times 
> compared to 1.4 release
> --
>
> Key: DRILL-4347
> URL: https://issues.apache.org/jira/browse/DRILL-4347
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Aman Sinha
> Fix For: Future
>
> Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, 
> 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0, drill4347_jstack.txt
>
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
> . . . . . . . . . . . . >  AS (SELECT cs_item_sk,
> . . . . . . . . . . . . > Sum(cs_ext_list_price) AS sale,
> . . . . . . . . . . . . > Sum(cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit) AS refund
> . . . . . . . . . . . . >  FROM   catalog_sales,
> . . . . . . . . . . . . > catalog_returns
> . . . . . . . . . . . . >  WHERE  cs_item_sk = cr_item_sk
> . . . . . . . . . . . . > AND cs_order_number = 
> cr_order_number
> . . . . . . . . . . . . >  GROUP  BY cs_item_sk
> . . . . . . . . . . . . >  HAVING Sum(cs_ext_list_price) > 2 * Sum(
> . . . . . . . . . . . . > cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit)),
> . . . . . . . . . . . . >  cross_sales
> . . . . . . . . . . . . >  AS (SELECT i_product_name product_name,
> . . . . . . . . . . . . > i_item_sk  item_sk,
> . . . . . . . . . . . . > s_store_name   store_name,
> . . . . . . . . . . . . > s_zip  store_zip,
> . . . . . . . . . . . . > ad1.ca_street_number   
> b_street_number,
> . . . . . . . . . . . . > ad1.ca_street_name 
> b_streen_name,
> . . . . . . . . . . . . > ad1.ca_city            b_city,
> . . . . . . . . . . . . > ad1.ca_zip b_zip,
> . . . . . . . . . . . . > ad2.ca_street_number   
> c_street_number,
> . . . . . . . . . . . . > ad2.ca_street_name 
> c_street_name,
> . . . . . . . . . . . . > ad2.ca_city            c_city,
> . . . . . . . . . . . . > ad2.ca_zip c_zip,
> . . . . . . . . . . . . > d1.d_year  AS syear,
> . . . . . . . . . . . . > d2.d_year  AS fsyear,
> . . . . . . . . . . . . > d3.d_year  s2year,
> . . . . . . . . . . . . > Count(*)   cnt,
> . . . . . . . . . . . . > Sum(ss_wholesale_cost) s1,
> . . . . . . . . . . . . > Sum(ss_list_price) s2,
> . . . . . . . . . . . . > Sum(ss_coupon_amt) s3
> . . . . . . . . . . . . >  FROM   store_sales,
> . . . . . . . . . . . . > store_returns,
> . . . . . . 

[jira] [Commented] (DRILL-5044) After the dynamic registration of multiple jars simultaneously not all UDFs were registered

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705310#comment-15705310
 ] 

ASF GitHub Bot commented on DRILL-5044:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/669#discussion_r89994171
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateFunctionHandler.java ---
@@ -175,22 +175,20 @@ private void initRemoteRegistration(List<FunctionSignature> functions,
     List<Jar> remoteJars = remoteRegistry.getRegistry(version).getJarList();
     validateAgainstRemoteRegistry(remoteJars, jarManager.getBinaryName(), functions);
     jarManager.copyToRegistryArea();
-    boolean cleanUp = true;
     List<Jar> jars = Lists.newArrayList(remoteJars);
     jars.add(Jar.newBuilder().setName(jarManager.getBinaryName()).addAllFunctionSignature(functions).build());
     Registry updatedRegistry = Registry.newBuilder().addAllJar(jars).build();
     try {
       remoteRegistry.updateRegistry(updatedRegistry, version);
-      cleanUp = false;
     } catch (VersionMismatchException ex) {
+      jarManager.deleteQuietlyFromRegistryArea();
--- End diff --

1. I guess having a fixed number of retries is enough. Retry-and-wait logic 
may lead to the point where the user has to wait a long time until 
registration completes on a busy system. With retry-only logic we notify the 
user pretty quickly that the system is busy, and it's up to the user to decide 
when to try to register the function again.

2. Totally agree about recursion: since the user may modify the number of 
retry attempts, it's much better to have a while loop to avoid stack overflow.
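
A minimal sketch of the retry-as-a-loop approach, assuming a registry update 
that throws VersionMismatchException on concurrent changes; the class and 
method names are illustrative, not Drill's actual handler code:
{code}
public class RegistrationRetry {

  public static class VersionMismatchException extends RuntimeException {
    public VersionMismatchException(String message) { super(message); }
  }

  public interface RegistryUpdate {
    void run() throws VersionMismatchException;
  }

  /** Retries a registry update a fixed number of times instead of recursing. */
  public static void updateWithRetries(RegistryUpdate update, int retryAttempts) {
    while (true) {
      try {
        update.run();
        return;  // success
      } catch (VersionMismatchException e) {
        if (retryAttempts-- == 0) {
          throw new RuntimeException(
              "Failed to update remote function registry. Exceeded retry attempts limit.", e);
        }
        // no wait between attempts, so a busy system fails fast
      }
    }
  }
}
{code}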


> After the dynamic registration of multiple jars simultaneously not all UDFs 
> were registered
> ---
>
> Key: DRILL-5044
> URL: https://issues.apache.org/jira/browse/DRILL-5044
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Roman
>Assignee: Arina Ielchiieva
>
> I tried to register 21 jars simultaneously (property 'udf.retry-attempts' = 
> 30) and not all jars were registered. As I see in the output, all functions 
> were registered and the /staging directory was empty, but not all of the jars 
> were moved into the /registry directory. 
> For example, after the simultaneous registration I saw the "The following UDFs 
> in jar test-1.1.jar have been registered: [test1(VARCHAR-REQUIRED)]" message, 
> but this jar was not in the /registry directory. When I tried to run function 
> test1, I got this error: "Error: SYSTEM ERROR: SqlValidatorException: No match 
> found for function signature test1()". And when I tried to reregister 
> this jar, I got "Jar with test-1.1.jar name has been already registered".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5044) After the dynamic registration of multiple jars simultaneously not all UDFs were registered

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705314#comment-15705314
 ] 

ASF GitHub Bot commented on DRILL-5044:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/669#discussion_r89993367
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateFunctionHandler.java ---
@@ -175,22 +175,20 @@ private void initRemoteRegistration(List<FunctionSignature> functions,
     List<Jar> remoteJars = remoteRegistry.getRegistry(version).getJarList();
     validateAgainstRemoteRegistry(remoteJars, jarManager.getBinaryName(), functions);
     jarManager.copyToRegistryArea();
-    boolean cleanUp = true;
     List<Jar> jars = Lists.newArrayList(remoteJars);
     jars.add(Jar.newBuilder().setName(jarManager.getBinaryName()).addAllFunctionSignature(functions).build());
     Registry updatedRegistry = Registry.newBuilder().addAllJar(jars).build();
     try {
       remoteRegistry.updateRegistry(updatedRegistry, version);
-      cleanUp = false;
     } catch (VersionMismatchException ex) {
+      jarManager.deleteQuietlyFromRegistryArea();
       if (retryAttempts-- == 0) {
         throw new DrillRuntimeException("Failed to update remote function registry. Exceeded retry attempts limit.");
       }
       initRemoteRegistration(functions, jarManager, remoteRegistry, retryAttempts);
-    } finally {
-      if (cleanUp) {
-        jarManager.deleteQuietlyFromRegistryArea();
-      }
+    } catch (Exception e) {
--- End diff --

You are right. Updated the code, so now we delete the jars only once, on error.


> After the dynamic registration of multiple jars simultaneously not all UDFs 
> were registered
> ---
>
> Key: DRILL-5044
> URL: https://issues.apache.org/jira/browse/DRILL-5044
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Roman
>Assignee: Arina Ielchiieva
>
> I tried to register 21 jars simultaneously (property 'udf.retry-attempts' = 
> 30) and not all jars were registered. As I see in the output, all functions 
> were registered and the /staging directory was empty, but not all of the jars 
> were moved into the /registry directory. 
> For example, after the simultaneous registration I saw the "The following UDFs 
> in jar test-1.1.jar have been registered: [test1(VARCHAR-REQUIRED)]" message, 
> but this jar was not in the /registry directory. When I tried to run function 
> test1, I got this error: "Error: SYSTEM ERROR: SqlValidatorException: No match 
> found for function signature test1()". And when I tried to reregister 
> this jar, I got "Jar with test-1.1.jar name has been already registered".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5044) After the dynamic registration of multiple jars simultaneously not all UDFs were registered

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705312#comment-15705312
 ] 

ASF GitHub Bot commented on DRILL-5044:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/669#discussion_r89993422
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DropFunctionHandler.java ---
@@ -143,7 +143,7 @@ private Jar unregister(String jarName, RemoteFunctionRegistry remoteFunctionRegi
         if (retryAttempts-- == 0) {
           throw new DrillRuntimeException("Failed to update remote function registry. Exceeded retry attempts limit.");
         }
-        unregister(jarName, remoteFunctionRegistry, retryAttempts);
+        return unregister(jarName, remoteFunctionRegistry, retryAttempts);
--- End diff --

Agree. Done.


> After the dynamic registration of multiple jars simultaneously not all UDFs 
> were registered
> ---
>
> Key: DRILL-5044
> URL: https://issues.apache.org/jira/browse/DRILL-5044
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Roman
>Assignee: Arina Ielchiieva
>
> I tried to register 21 jars simultaneously (property 'udf.retry-attempts' = 
> 30) and not all jars were registered. As I see in the output, all functions 
> were registered and the /staging directory was empty, but not all of the jars 
> were moved into the /registry directory. 
> For example, after the simultaneous registration I saw the "The following UDFs 
> in jar test-1.1.jar have been registered: [test1(VARCHAR-REQUIRED)]" message, 
> but this jar was not in the /registry directory. When I tried to run function 
> test1, I got this error: "Error: SYSTEM ERROR: SqlValidatorException: No match 
> found for function signature test1()". And when I tried to reregister 
> this jar, I got "Jar with test-1.1.jar name has been already registered".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5044) After the dynamic registration of multiple jars simultaneously not all UDFs were registered

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705311#comment-15705311
 ] 

ASF GitHub Bot commented on DRILL-5044:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/669#discussion_r89995323
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestDynamicUDFSupport.java ---
@@ -271,6 +271,75 @@ public void testDuplicatedFunctionsInLocalRegistry() throws Exception {
   }
 
   @Test
+  public void testSuccessfulRegistrationAfterSeveralRetryAttempts() throws Exception {
+    RemoteFunctionRegistry remoteFunctionRegistry = spyRemoteFunctionRegistry();
+    copyDefaultJarsToStagingArea();
+
+    doThrow(new VersionMismatchException("Version mismatch detected", 1))
+        .doThrow(new VersionMismatchException("Version mismatch detected", 1))
+        .doCallRealMethod()
+        .when(remoteFunctionRegistry).updateRegistry(any(Registry.class), any(DataChangeVersion.class));
+
+    String summary = "The following UDFs in jar %s have been registered:\n" +
+        "[custom_lower(VARCHAR-REQUIRED)]";
+
+    testBuilder()
+        .sqlQuery("create function using jar '%s'", default_binary_name)
+        .unOrdered()
+        .baselineColumns("ok", "summary")
+        .baselineValues(true, String.format(summary, default_binary_name))
+        .go();
+
+    verify(remoteFunctionRegistry, times(3))
+        .updateRegistry(any(Registry.class), any(DataChangeVersion.class));
+
+    FileSystem fs = remoteFunctionRegistry.getFs();
+
+    assertFalse("Staging area should be empty",
+        fs.listFiles(remoteFunctionRegistry.getStagingArea(), false).hasNext());
+    assertFalse("Temporary area should be empty",
+        fs.listFiles(remoteFunctionRegistry.getTmpArea(), false).hasNext());
+
+    assertTrue("Binary should be present in registry area",
+        fs.exists(new Path(remoteFunctionRegistry.getRegistryArea(), default_binary_name)));
+    assertTrue("Source should be present in registry area",
+        fs.exists(new Path(remoteFunctionRegistry.getRegistryArea(), default_source_name)));
+
+    Registry registry = remoteFunctionRegistry.getRegistry();
+    assertEquals("Registry should contain one jar", registry.getJarList().size(), 1);
+    assertEquals(registry.getJar(0).getName(), default_binary_name);
+  }
+
+  @Test
+  public void testSuccessfulUnregistrationAfterSeveralRetryAttempts() throws Exception {
+    RemoteFunctionRegistry remoteFunctionRegistry = spyRemoteFunctionRegistry();
+    copyDefaultJarsToStagingArea();
+    test("create function using jar '%s'", default_binary_name);
+
+    reset(remoteFunctionRegistry);
+    doThrow(new VersionMismatchException("Version mismatch detected", 1))
--- End diff --

It's Mockito functionality. You can mock a method to fail or to return any 
result when it is called. In this case we mock the `updateRegistry()` method 
to throw `VersionMismatchException` the first two times; this simulates the 
situation where someone else has updated the remote function registry before 
us. After that we instruct Mockito to call the real method.
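
[Editor's note] A self-contained sketch of the Mockito pattern described
above, using made-up Service/updateRegistry names rather than Drill's real
classes, for readers unfamiliar with chained stubbing on a spy:

import static org.mockito.Mockito.*;

public class MockitoRetryDemo {

  static class Service {
    void updateRegistry(String registry) { /* real work succeeds */ }
  }

  public static void main(String[] args) {
    Service service = spy(new Service());

    // Fail the first two calls, then fall through to the real method.
    doThrow(new IllegalStateException("Version mismatch detected"))
        .doThrow(new IllegalStateException("Version mismatch detected"))
        .doCallRealMethod()
        .when(service).updateRegistry(anyString());

    int attempts = 0;
    while (true) {
      try {
        attempts++;
        service.updateRegistry("registry");
        break;                         // third call reaches the real method
      } catch (IllegalStateException e) {
        // retry, mirroring the registration retry loop under test
      }
    }
    System.out.println("Succeeded after " + attempts + " attempts"); // 3
    verify(service, times(3)).updateRegistry(anyString());
  }
}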


> After the dynamic registration of multiple jars simultaneously not all UDFs 
> were registered
> ---
>
> Key: DRILL-5044
> URL: https://issues.apache.org/jira/browse/DRILL-5044
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Roman
>Assignee: Arina Ielchiieva
>
> I tried to register 21 jars simultaneously (property 'udf.retry-attempts' = 
> 30) and not all jars were registered. As I saw in the output, all functions were 
> registered and the /staging directory was empty, but not all of the jars were moved 
> into the /registry directory. 
> For example, after simultaneous registration I saw the message "The following UDFs in 
> jar test-1.1.jar have been registered: [test1(VARCHAR-REQUIRED)", but 
> this jar was not in the /registry directory. When I tried to run function test1, 
> I got this error: "Error: SYSTEM ERROR: SqlValidatorException: No match found 
> for function signature test1()". And when I tried to reregister 
> this jar, I got "Jar with test-1.1.jar name has been already registered".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4604) Generate warning on Web UI if drillbits version mismatch is detected

2016-11-29 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-4604:
---

Assignee: Arina Ielchiieva  (was: Paul Rogers)

> Generate warning on Web UI if drillbits version mismatch is detected
> 
>
> Key: DRILL-4604
> URL: https://issues.apache.org/jira/browse/DRILL-4604
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
> Attachments: NEW_matching_drillbits.JPG, 
> NEW_mismatching_drillbits.JPG, index_page.JPG, index_page_mismatch.JPG, 
> screenshots_with_different_states.docx
>
>
> Display the drillbit version on the Web UI. If any drillbit's version doesn't 
> match the current drillbit's, generate a warning.
> Screenshots - screenshots_with_different_states.docx.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4604) Generate warning on Web UI if drillbits version mismatch is detected

2016-11-29 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4604:

Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Generate warning on Web UI if drillbits version mismatch is detected
> 
>
> Key: DRILL-4604
> URL: https://issues.apache.org/jira/browse/DRILL-4604
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
> Attachments: NEW_matching_drillbits.JPG, 
> NEW_mismatching_drillbits.JPG, index_page.JPG, index_page_mismatch.JPG, 
> screenshots_with_different_states.docx
>
>
> Display the drillbit version on the Web UI. If any drillbit's version doesn't 
> match the current drillbit's, generate a warning.
> Screenshots - screenshots_with_different_states.docx.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4604) Generate warning on Web UI if drillbits version mismatch is detected

2016-11-29 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4604:

Fix Version/s: (was: Future)
   1.10.0

> Generate warning on Web UI if drillbits version mismatch is detected
> 
>
> Key: DRILL-4604
> URL: https://issues.apache.org/jira/browse/DRILL-4604
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
> Attachments: NEW_matching_drillbits.JPG, 
> NEW_mismatching_drillbits.JPG, index_page.JPG, index_page_mismatch.JPG, 
> screenshots_with_different_states.docx
>
>
> Display the drillbit version on the Web UI. If any drillbit's version doesn't 
> match the current drillbit's, generate a warning.
> Screenshots - screenshots_with_different_states.docx.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5078) use Custom Functions errors

2016-11-29 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704800#comment-15704800
 ] 

Arina Ielchiieva commented on DRILL-5078:
-

[~zhenTan] you might want to subscribe to the Drill user / dev mailing lists 
(https://drill.apache.org/mailinglists/) and post questions there first when 
you are not sure whether it's a Drill bug or a bug in your implementation, as 
in this case.

> use Custom Functions errors
> ---
>
> Key: DRILL-5078
> URL: https://issues.apache.org/jira/browse/DRILL-5078
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.8.0
> Environment: Windows 7
>Reporter: mircoteam
>Priority: Trivial
>
> I defined a function that changes encoding from UTF-8 to GBK.
> When I put its classes and source code into 3rdparty and use it in a query 
> like this:
> "SELECT encode_translate(columns[0],'UTF-8','GBK') as aaa FROM 
> dfs.`d:/drill_test.csv` LIMIT 20"
> it returns this error:
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> CompileException: Line 92, Column 42: Cannot determine simple type name 
> "UnsupportedEncodingException" Fragment 0:0 [Error Id: 
> 599d0e39-f05a-4ecd-a539-b5338239d63b on XXX..com:31010].
> This is the source code of eval():
> public void eval() {
>     // get the value and replace with
>     String stringValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(input.start, input.end, input.buffer);
>     String fromEncodeValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(fromEncode);
>     String toEncodeValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(toEncode);
>     try {
>         String toEncodeStringValue = new String(stringValue.getBytes(fromEncodeValue), toEncodeValue);
>         out.buffer = buffer;
>         out.start = 0;
>         out.end = toEncodeStringValue.getBytes().length;
>         buffer.setBytes(0, toEncodeStringValue.getBytes());
>     } catch (UnsupportedEncodingException e) {
>     }
> }
> Please help me, thank you.
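
[Editor's note] This CompileException is the usual symptom of an unqualified
class name inside a Drill UDF: Drill copies the eval() body into generated
source without the UDF's import statements, so any class outside java.lang
must be fully qualified. A sketch of the corrected catch block, assuming the
rest of the reported function is unchanged:

try {
    String toEncodeStringValue =
        new String(stringValue.getBytes(fromEncodeValue), toEncodeValue);
    out.buffer = buffer;
    out.start = 0;
    out.end = toEncodeStringValue.getBytes().length;
    buffer.setBytes(0, toEncodeStringValue.getBytes());
} catch (java.io.UnsupportedEncodingException e) {
    // Fully qualified, since the generated code has no import for it.
    // Also: an empty catch silently produces garbage output; rethrowing
    // makes encoding failures visible.
    throw new RuntimeException(e);
}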



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-5078) use Custom Functions errors

2016-11-29 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-5078.
---
Resolution: Not A Bug

> use Custom Functions errors
> ---
>
> Key: DRILL-5078
> URL: https://issues.apache.org/jira/browse/DRILL-5078
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.8.0
> Environment: Windows 7
>Reporter: mircoteam
>Priority: Trivial
>
> I defined a function that changes encoding from UTF-8 to GBK.
> When I put its classes and source code into 3rdparty and use it in a query 
> like this:
> "SELECT encode_translate(columns[0],'UTF-8','GBK') as aaa FROM 
> dfs.`d:/drill_test.csv` LIMIT 20"
> it returns this error:
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> CompileException: Line 92, Column 42: Cannot determine simple type name 
> "UnsupportedEncodingException" Fragment 0:0 [Error Id: 
> 599d0e39-f05a-4ecd-a539-b5338239d63b on XXX..com:31010].
> This is the source code of eval():
> public void eval() {
>     // get the value and replace with
>     String stringValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(input.start, input.end, input.buffer);
>     String fromEncodeValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(fromEncode);
>     String toEncodeValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(toEncode);
>     try {
>         String toEncodeStringValue = new String(stringValue.getBytes(fromEncodeValue), toEncodeValue);
>         out.buffer = buffer;
>         out.start = 0;
>         out.end = toEncodeStringValue.getBytes().length;
>         buffer.setBytes(0, toEncodeStringValue.getBytes());
>     } catch (UnsupportedEncodingException e) {
>     }
> }
> Please help me, thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-2223) Empty parquet file created with Limit 0 query errors out when querying

2016-11-29 Thread SAIKRISHNA (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704613#comment-15704613
 ] 

SAIKRISHNA commented on DRILL-2223:
---

Is there any way to create an empty parquet file with zero records that still 
carries the schema? For my business use case we need an empty parquet file 
with the schema preserved, the way it is preserved for JSON output with zero 
records.

I am trying the query below and getting zero records:
create table target.HIVE.employeeTest2911 AS SELECT * FROM cp.`employee.json` 
where employee_id > 1157

Fragment  Number of records written
0_0       0

When I then run
 select * from target.HIVE.employeeTest2911
I get this exception:
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From 
line 1, column 16 to line 1, column 21: Table 'target.HIVE.employeeTest2911' 
not found SQL Query null [Error Id: 5ee67a9b-b3ec-4ac8-88bd-13d8428f1d48 on 
DataNode1:31010]

The workspace structure is like this:

{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://XXX:8020",
  "config": null,
  "workspaces": {
    "HIVE": {
      "location": "/user/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  }
}

If anyone has a solution to overcome this, please let me know. Thanks in advance.
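
[Editor's note] One possible workaround, sketched here under the assumption
that a view is acceptable for the use case: define a view over the JSON
source instead of a zero-row CTAS. The column schema then lives in the view
definition and no empty parquet file is written, so querying zero matching
rows does not fail. The `_v` suffix and connection URL are illustrative:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class EmptyResultWorkaround {
  public static void main(String[] args) throws Exception {
    // Connect to a drillbit (adjust the host; jdbc:drill:zk=local also
    // works for an embedded drillbit).
    try (Connection conn =
             DriverManager.getConnection("jdbc:drill:drillbit=localhost");
         Statement stmt = conn.createStatement()) {
      // A view instead of CTAS: "zero matching rows" never produces an
      // unreadable empty file, because nothing is materialized.
      stmt.execute(
          "CREATE OR REPLACE VIEW target.HIVE.employeeTest2911_v AS "
              + "SELECT * FROM cp.`employee.json` WHERE employee_id > 1157");
      // SELECT * FROM target.HIVE.employeeTest2911_v now returns an empty
      // result set with the employee.json columns, instead of an error.
    }
  }
}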

> Empty parquet file created with Limit 0 query errors out when querying
> --
>
> Key: DRILL-2223
> URL: https://issues.apache.org/jira/browse/DRILL-2223
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 0.7.0
>Reporter: Aman Sinha
> Fix For: Future
>
>
> Doing a CTAS with limit 0 creates a 0 length parquet file which errors out 
> during querying.  This should at least write the schema information and 
> metadata which will allow queries to run. 
> {code}
> 0: jdbc:drill:zk=local> create table tt_nation2 as select n_nationkey, 
> n_name, n_regionkey from cp.`tpch/nation.parquet` limit 0;
> ++---+
> |  Fragment  | Number of records written |
> ++---+
> | 0_0| 0 |
> ++---+
> 1 row selected (0.315 seconds)
> 0: jdbc:drill:zk=local> select n_nationkey from tt_nation2;
> Query failed: RuntimeException: file:/tmp/tt_nation2/0_0_0.parquet is not a 
> Parquet file (too small)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)