[jira] [Closed] (DRILL-4495) IN operator does not work

2016-03-14 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra closed DRILL-4495.

Resolution: Fixed

> IN operator does not work
> -
>
> Key: DRILL-4495
> URL: https://issues.apache.org/jira/browse/DRILL-4495
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Affects Versions: 1.6.0
>Reporter: Stefán Baxter
>Priority: Blocker
>
> I noticed that using the IN operator with a substructure (nested field) has stopped working:
> select s.client_ip.ip from dfs.asa.`/processed/venuepoint/transactions` as s 
> where s.client_ip.ip in ("unknown") limit 2;
> Error: PARSE ERROR: Encountered "\"" at line 1, column 103.
> Was expecting one of:
> ...
> SQL Query select s.client_ip.ip from 
> dfs.asa.`/processed/venuepoint/transactions` as s where s.client_ip.ip in 
> ("2.69.200.113","unknown") limit 2
> ^
> (the caret should point at the _ip part of the WHERE clause)
> I will report this in Jira and mark it as a blocker.
> IN does not work with any values here using the latest 1.6-SNAPSHOT.
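The parse error points at a `"` character, and the IN lists above use double-quoted values. Standard SQL writes string literals in single quotes, while Drill quotes identifiers with backticks (as in the dfs.asa path above). Assuming the failure is tied to the double-quoted values, the following minimal sketch builds the predicate with single-quoted literals instead. `InListBuilder` and its methods are hypothetical names for illustration, not part of Drill.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Drill): renders values as standard SQL
// string literals, i.e. single-quoted with embedded single quotes doubled.
final class InListBuilder {
    private InListBuilder() {}

    // Quote one value as a SQL string literal: O'Brien -> 'O''Brien'
    static String quoteLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Build "<column> IN ('a', 'b')" from a column expression and its values.
    static String inPredicate(String column, List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("IN list must not be empty");
        }
        String list = values.stream()
                .map(InListBuilder::quoteLiteral)
                .collect(Collectors.joining(", "));
        return column + " IN (" + list + ")";
    }

    public static void main(String[] args) {
        // Prints: s.client_ip.ip IN ('2.69.200.113', 'unknown')
        System.out.println(inPredicate("s.client_ip.ip", List.of("2.69.200.113", "unknown")));
    }
}
```

Only the values are quoted; the column reference stays unquoted because it is an identifier, not a string.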



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4495) IN operator does not work

2016-03-14 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra updated DRILL-4495:
-
Fix Version/s: (was: 1.6.0)






[jira] [Reopened] (DRILL-4495) IN operator does not work

2016-03-14 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra reopened DRILL-4495:
--






[jira] [Commented] (DRILL-4490) Count(*) function returns as optional instead of required

2016-03-14 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194313#comment-15194313
 ] 

Jinfeng Ni commented on DRILL-4490:
---

It's not in 1.6.0. It should be in 1.7.0.

> Count(*) function returns as optional instead of required
> -
>
> Key: DRILL-4490
> URL: https://issues.apache.org/jira/browse/DRILL-4490
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Krystal
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> git.commit.id.abbrev=c8a7840
> I have the following CTAS query:
> create table test as select count(*) as col1 from cp.`tpch/orders.parquet`;
> The schema of the test table shows col1 as optional:
> message root {
>   optional int64 col1;
> }
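COUNT(*) always returns a value (0 when there are no rows), never NULL, so the report expects the CTAS column to be declared `required int64 col1` rather than `optional`. The sketch below illustrates that mapping from nullability to Parquet repetition. `ParquetFieldSketch` and its methods are hypothetical and do not reflect Drill's actual CTAS code path.

```java
import java.util.Locale;

// Hypothetical sketch (not Drill's CTAS code): pick the Parquet repetition for
// an output column from whether its expression can produce NULL.
final class ParquetFieldSketch {
    enum Repetition { REQUIRED, OPTIONAL }

    // A non-nullable expression such as COUNT(*) maps to REQUIRED.
    static Repetition repetitionFor(boolean nullable) {
        return nullable ? Repetition.OPTIONAL : Repetition.REQUIRED;
    }

    // Render a one-column message type in Parquet's textual schema notation.
    static String messageType(String column, String primitiveType, boolean nullable) {
        String repetition = repetitionFor(nullable).name().toLowerCase(Locale.ROOT);
        return "message root {\n  " + repetition + " " + primitiveType + " " + column + ";\n}";
    }

    public static void main(String[] args) {
        // Schema expected for: create table test as select count(*) as col1 ...
        System.out.println(messageType("col1", "int64", false));
    }
}
```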





[jira] [Resolved] (DRILL-4050) Add zip archives to the list of artifacts in verify_release.sh

2016-03-14 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore resolved DRILL-4050.
---
   Resolution: Fixed
Fix Version/s: 1.7.0

This has been merged into master.

> Add zip archives to the list of artifacts in verify_release.sh
> --
>
> Key: DRILL-4050
> URL: https://issues.apache.org/jira/browse/DRILL-4050
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Minor
> Fix For: 1.7.0
>
>






[jira] [Commented] (DRILL-4050) Add zip archives to the list of artifacts in verify_release.sh

2016-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194282#comment-15194282
 ] 

ASF GitHub Bot commented on DRILL-4050:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/249


> Add zip archives to the list of artifacts in verify_release.sh
> --
>
> Key: DRILL-4050
> URL: https://issues.apache.org/jira/browse/DRILL-4050
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Minor
>






[jira] [Commented] (DRILL-4504) Create an event loop for each of [user, control, data] RPC components

2016-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194221#comment-15194221
 ] 

ASF GitHub Bot commented on DRILL-4504:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/429#discussion_r56082440
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -74,73 +74,148 @@
 /**
  * Thin wrapper around a UserClient that handles connect/close and transforms
  * String into ByteBuf.
+ *
+ * Use the builder class ({@link DrillClient.Builder}) to build objects of this class.
+ * E.g.
+ * 
+ *   DrillClient client = DrillClient.newBuilder()
+ *   .setConfig(...)
+ *   .setIsDirectConnection(true)
+ *   .build();
+ * 
  */
 public class DrillClient implements Closeable, ConnectionThrottle {
   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillClient.class);
 
   private final DrillConfig config;
-  private UserClient client;
-  private UserProperties props = null;
-  private volatile ClusterCoordinator clusterCoordinator;
-  private volatile boolean connected = false;
   private final BufferAllocator allocator;
-  private int reconnectTimes;
-  private int reconnectDelay;
-  private boolean supportComplexTypes;
-  private final boolean ownsZkConnection;
+  private final boolean isDirectConnection;
+  private final int reconnectTimes;
+  private final int reconnectDelay;
+
+  // checks if this client owns these resources (used when closing)
   private final boolean ownsAllocator;
-  private final boolean isDirectConnection; // true if the connection bypasses zookeeper and connects directly to a drillbit
+  private final boolean ownsZkConnection;
+  private final boolean ownsEventLoopGroup;
+  private final boolean ownsExecutor;
+
+  // if the following variables are set during construction, they are not overridden during or after #connect call
+  // otherwise, they are set to defaults during #connect call
   private EventLoopGroup eventLoopGroup;
   private ExecutorService executor;
+  private boolean supportComplexTypes;
+
+  // the following variables are set during connection, and must not be overridden later
+  private UserClient client;
+  private UserProperties props;
+  private volatile ClusterCoordinator clusterCoordinator;
--- End diff --

The query interface is expected to be threadsafe. Connect and close are not.


> Create an event loop for each of [user, control, data] RPC components
> -
>
> Key: DRILL-4504
> URL: https://issues.apache.org/jira/browse/DRILL-4504
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - RPC
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> + Create an event loop group for each client-server pair (data, control and 
> user)
> + Allow DrillClient constructor to specify an event loop group (so user event 
> loop can be used for queries from Web API calls). Deprecate old DrillClient 
> constructors and create a helper class to build instances.
> Miscellaneous:
> + Move WorkEventBus from exec/rpc/control to exec/work
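The "helper class to build instances" and the owns* flags in the DrillClient diff above follow a common builder-with-ownership pattern: a resource supplied by the caller is shared and must not be released by the client, while a resource the client creates itself is released in close(). The minimal sketch below illustrates that idea. `ExampleClient`, `setExecutor`, and the use of a plain `ExecutorService` in place of Netty's `EventLoopGroup` are assumptions made to keep it self-contained; they are not Drill's API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the builder-plus-ownership pattern; ExampleClient
// stands in for DrillClient and ExecutorService stands in for EventLoopGroup.
final class ExampleClient implements AutoCloseable {
    private final boolean isDirectConnection;
    private final ExecutorService executor;
    private final boolean ownsExecutor; // true only if this client created the executor

    private ExampleClient(Builder b) {
        this.isDirectConnection = b.isDirectConnection;
        if (b.executor != null) {
            this.executor = b.executor; // supplied and owned by the caller
            this.ownsExecutor = false;
        } else {
            this.executor = Executors.newSingleThreadExecutor(); // default created here
            this.ownsExecutor = true;
        }
    }

    static Builder newBuilder() {
        return new Builder();
    }

    boolean isDirectConnection() { return isDirectConnection; }

    ExecutorService executor() { return executor; }

    @Override
    public void close() {
        // Release only what this client created; shared resources belong to the caller.
        if (ownsExecutor) {
            executor.shutdown();
        }
    }

    static final class Builder {
        private boolean isDirectConnection;
        private ExecutorService executor;

        Builder setIsDirectConnection(boolean direct) {
            this.isDirectConnection = direct;
            return this;
        }

        Builder setExecutor(ExecutorService executor) {
            this.executor = executor;
            return this;
        }

        ExampleClient build() {
            return new ExampleClient(this);
        }
    }

    public static void main(String[] args) {
        ExecutorService shared = Executors.newSingleThreadExecutor();
        try (ExampleClient client = ExampleClient.newBuilder()
                .setIsDirectConnection(true)
                .setExecutor(shared)
                .build()) {
            System.out.println("direct=" + client.isDirectConnection());
        }
        // The shared executor survives the client's close() because the client did not own it.
        System.out.println("shared shut down? " + shared.isShutdown());
        shared.shutdown();
    }
}
```

Tracking ownership at construction time keeps close() simple: it never has to guess whether a resource is still in use elsewhere.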





[jira] [Commented] (DRILL-4504) Create an event loop for each of [user, control, data] RPC components

2016-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194199#comment-15194199
 ] 

ASF GitHub Bot commented on DRILL-4504:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/429#discussion_r56080621
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -74,73 +74,148 @@
 /**
  * Thin wrapper around a UserClient that handles connect/close and transforms
  * String into ByteBuf.
+ *
+ * Use the builder class ({@link DrillClient.Builder}) to build objects of this class.
+ * E.g.
+ * 
+ *   DrillClient client = DrillClient.newBuilder()
+ *   .setConfig(...)
+ *   .setIsDirectConnection(true)
+ *   .build();
+ * 
  */
 public class DrillClient implements Closeable, ConnectionThrottle {
   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillClient.class);
 
   private final DrillConfig config;
-  private UserClient client;
-  private UserProperties props = null;
-  private volatile ClusterCoordinator clusterCoordinator;
-  private volatile boolean connected = false;
   private final BufferAllocator allocator;
-  private int reconnectTimes;
-  private int reconnectDelay;
-  private boolean supportComplexTypes;
-  private final boolean ownsZkConnection;
+  private final boolean isDirectConnection;
+  private final int reconnectTimes;
+  private final int reconnectDelay;
+
+  // checks if this client owns these resources (used when closing)
   private final boolean ownsAllocator;
-  private final boolean isDirectConnection; // true if the connection bypasses zookeeper and connects directly to a drillbit
+  private final boolean ownsZkConnection;
+  private final boolean ownsEventLoopGroup;
+  private final boolean ownsExecutor;
+
+  // if the following variables are set during construction, they are not overridden during or after #connect call
+  // otherwise, they are set to defaults during #connect call
   private EventLoopGroup eventLoopGroup;
   private ExecutorService executor;
+  private boolean supportComplexTypes;
+
+  // the following variables are set during connection, and must not be overridden later
+  private UserClient client;
+  private UserProperties props;
+  private volatile ClusterCoordinator clusterCoordinator;
--- End diff --

I'll remove the modifier and document that the class is not thread-safe.







[jira] [Commented] (DRILL-4490) Count(*) function returns as optional instead of required

2016-03-14 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194180#comment-15194180
 ] 

Sean Hsuan-Yi Chu commented on DRILL-4490:
--

[~jni] I am assuming it is not in 1.6?






[jira] [Updated] (DRILL-4490) Count(*) function returns as optional instead of required

2016-03-14 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu updated DRILL-4490:
-
Fix Version/s: (was: 1.6.0)
   1.7.0






[jira] [Commented] (DRILL-4477) Wrong Plan (potentially wrong result) if wrapping a query with SELECT * FROM

2016-03-14 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194175#comment-15194175
 ] 

Sean Hsuan-Yi Chu commented on DRILL-4477:
--

It reproduces on Calcite's master branch.

https://issues.apache.org/jira/browse/CALCITE-1154

> Wrong Plan (potentially wrong result) if wrapping a query with SELECT * FROM
> 
>
> Key: DRILL-4477
> URL: https://issues.apache.org/jira/browse/DRILL-4477
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
>Priority: Blocker
> Fix For: 1.7.0
>
> Attachments: t1.json, t2.json
>
>
> For example, a query  
> {code}
> select * from (select s.name, v.name, v.registration from 
> cp.`tpch/region.parquet` s left outer join cp.`tpch/nation.parquet` v
> on (s.name = v.name) 
> where s.age < 30) t 
> {code}
> gives a plan as below:
> {code}
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> 00-01      Project(name=[$0], name0=[$1], registration=[$2])
> 00-02        Project(name=[$0], name0=[$0], registration=[$3])
> 00-03          Project(name=[$2], age=[$3], name0=[$0], registration=[$1])
> 00-04            HashJoin(condition=[=($2, $0)], joinType=[right])
> 00-06              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`, `registration`]]])
> 00-05              Project(name0=[$0], age=[$1])
> 00-07                SelectionVectorRemover
> 00-08                  Filter(condition=[<($1, 30)])
> 00-09                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`, `age`]]])
> {code}
> In line 00-02, both name and name0 point at the same incoming column 
> (probably because of the join condition). 
> However, the fact that the two columns appear in the join condition does not 
> mean they must be equal, since implicit casting might be applied to evaluate 
> the join condition.
> Interestingly, if the SELECT * FROM wrapper is removed, the bug is not 
> exposed:
> {code}
> select s.name, v.name, v.registration from cp.`tpch/region.parquet` s left 
> outer join cp.`tpch/nation.parquet` v on (s.name = v.name) 
> where s.age < 30
> {code}
> gives 
> {code}
> 00-00    Screen
> 00-01      Project(name=[$0], name0=[$1], registration=[$2])
> 00-02        Project(name=[$2], name0=[$0], registration=[$1])
> 00-03          HashJoin(condition=[=($2, $0)], joinType=[right])
> 00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`, `registration`]]])
> 00-04            Project(name0=[$0])
> 00-06              Project(name=[$0])
> 00-07                SelectionVectorRemover
> 00-08                  Filter(condition=[<($1, 30)])
> 00-09                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`, `age`]]])
> {code}
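The argument that the two join keys need not be equal can be made concrete. If the keys are compared after an implicit cast, distinct raw values such as '01' and '1' satisfy the join condition, so projecting one key in place of the other changes the output. The sketch below assumes hypothetical VARCHAR keys cast to INT and uses a simple nested-loop inner join for brevity; it illustrates the reasoning only and is not Drill's or Calcite's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: join keys that match only after an implicit cast are not
// interchangeable, so a projection must not substitute one for the other.
final class JoinKeySubstitution {
    // Join condition that casts both VARCHAR keys to INT before comparing,
    // standing in for a predicate like s.name = v.name with implicit casting.
    static boolean matches(String left, String right) {
        return Integer.parseInt(left) == Integer.parseInt(right);
    }

    // Nested-loop inner join that emits "leftKey|rightKey" per matching pair.
    // With reuseLeftKey = true, one input column feeds both outputs, mimicking
    // Project(name=[$0], name0=[$0]) in plan line 00-02.
    static List<String> joinAndProject(List<String> left, List<String> right, boolean reuseLeftKey) {
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                if (matches(l, r)) {
                    out.add(l + "|" + (reuseLeftKey ? l : r));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> left = List.of("01", "2");
        List<String> right = List.of("1", "2");
        System.out.println(joinAndProject(left, right, false)); // [01|1, 2|2]
        System.out.println(joinAndProject(left, right, true));  // [01|01, 2|2]
    }
}
```

The first row differs between the two projections, which is exactly the kind of silent wrong result the issue title warns about.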





[jira] [Commented] (DRILL-4479) JsonReader should pick a less restrictive type when creating the default column

2016-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194094#comment-15194094
 ] 

ASF GitHub Bot commented on DRILL-4479:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/420


> JsonReader should pick a less restrictive type when creating the default 
> column
> ---
>
> Key: DRILL-4479
> URL: https://issues.apache.org/jira/browse/DRILL-4479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.5.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.7.0
>
> Attachments: mostlynulls.json
>
>
> This JIRA is related to DRILL-3806 but has a narrower scope, so I decided to 
> create a separate one. 
> The JsonReader has the method ensureAtLeastOneField() (see 
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L91),
> which creates an empty column when no columns are found; it chooses a 
> nullable int type for that column. One consequence is that queries of 
> the following type fail:
> {noformat}
> select c1 from dfs.`mostlynulls.json`;
> ...
> ...
> | null  |
> | null  |
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar 
> type when you are using a ValueWriter of type NullableIntWriterImpl.
> File  /Users/asinha/data/mostlynulls.json
> Record  4097
> {noformat}
> In this file, the first 4096 rows have NULL values for c1, followed by rows 
> that have a valid string.  
> It would be useful for the JSON reader to choose a less restrictive type such 
> as varchar in order to allow more types of queries to run.  
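The failure mode can be sketched with a simplified, hypothetical column writer whose type is fixed when the column is created: if the placeholder for an all-NULL column is an integer type, the first string value fails, whereas a VARCHAR placeholder accepts it. The class and method names are assumptions, this is not JsonReader's actual code, and choosing VARCHAR is the proposal in this issue rather than a confirmed fix.

```java
// Hypothetical sketch (not JsonReader's code): a column that has seen only
// NULLs is created with a placeholder type that later values must fit.
final class DefaultColumnTypeSketch {
    enum ColumnType { NULLABLE_INT, NULLABLE_VARCHAR }

    // Simplified writer whose type is fixed at creation, like a value vector.
    static final class ColumnWriter {
        private final ColumnType type;
        private int rowCount;

        ColumnWriter(ColumnType type) {
            this.type = type;
        }

        void writeNull() {
            rowCount++;
        }

        void writeString(String value) {
            if (type != ColumnType.NULLABLE_VARCHAR) {
                throw new IllegalStateException(
                        "cannot write VARCHAR value at row " + (rowCount + 1) + " into a " + type + " column");
            }
            rowCount++;
        }
    }

    // Simulates nullRows NULL values for a column followed by one string value.
    static boolean readSucceeds(ColumnType placeholderType, int nullRows) {
        ColumnWriter writer = new ColumnWriter(placeholderType);
        for (int i = 0; i < nullRows; i++) {
            writer.writeNull();
        }
        try {
            writer.writeString("some text");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(readSucceeds(ColumnType.NULLABLE_INT, 4096));     // false
        System.out.println(readSucceeds(ColumnType.NULLABLE_VARCHAR, 4096)); // true
    }
}
```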





[jira] [Commented] (DRILL-4405) invalid Postgres SQL generated for CONCAT (literal, literal)

2016-03-14 Thread Serge Harnyk (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194006#comment-15194006
 ] 

Serge Harnyk commented on DRILL-4405:
-

Also, the query
select PI() from postgres.public.tversion
throws:
org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: The 
JDBC storage plugin failed while trying setup the SQL query. sql SELECT 
CAST(CAST(3.141592653589793115997963468544185161590576171875 AS DOUBLE) AS ANY) 
AS "EXPR$0" FROM "public"."tversion" plugin postgres

> invalid Postgres SQL generated for CONCAT (literal, literal) 
> -
>
> Key: DRILL-4405
> URL: https://issues.apache.org/jira/browse/DRILL-4405
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>Assignee: Serge Harnyk
>
> select concat( 'FF' , 'FF' )  from postgres.public.tversion
> Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the 
> SQL query. 
> sql SELECT CAST('' AS ANY) AS "EXPR$0"
> FROM "public"."tversion"
> plugin postgres
> Fragment 0:0
> [Error Id: c3f24106-8d75-4a57-a638-ac5f0aca0769 on centos1:31010]
>   (org.postgresql.util.PSQLException) ERROR: syntax error at or near "ANY"
>   Position: 23
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse():2182
> org.postgresql.core.v3.QueryExecutorImpl.processResults():1911
> org.postgresql.core.v3.QueryExecutorImpl.execute():173
> org.postgresql.jdbc.PgStatement.execute():622
> org.postgresql.jdbc.PgStatement.executeWithFlags():458
> org.postgresql.jdbc.PgStatement.executeQuery():374
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup():177
> org.apache.drill.exec.physical.impl.ScanBatch.():108
> org.apache.drill.exec.physical.impl.ScanBatch.():136
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():40
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():33
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():147
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170
> org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():101
> org.apache.drill.exec.physical.impl.ImplCreator.getExec():79
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():230
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> SQLState:  null
> ErrorCode: 0





[jira] [Commented] (DRILL-4504) Create an event loop for each of [user, control, data] RPC components

2016-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193971#comment-15193971
 ] 

ASF GitHub Bot commented on DRILL-4504:
---

Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/429#discussion_r56060382
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -74,73 +74,148 @@
 /**
  * Thin wrapper around a UserClient that handles connect/close and transforms
  * String into ByteBuf.
+ *
+ * Use the builder class ({@link DrillClient.Builder}) to build objects of this class.
+ * E.g.
+ * 
+ *   DrillClient client = DrillClient.newBuilder()
+ *   .setConfig(...)
+ *   .setIsDirectConnection(true)
+ *   .build();
+ * 
  */
 public class DrillClient implements Closeable, ConnectionThrottle {
   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillClient.class);
 
   private final DrillConfig config;
-  private UserClient client;
-  private UserProperties props = null;
-  private volatile ClusterCoordinator clusterCoordinator;
-  private volatile boolean connected = false;
   private final BufferAllocator allocator;
-  private int reconnectTimes;
-  private int reconnectDelay;
-  private boolean supportComplexTypes;
-  private final boolean ownsZkConnection;
+  private final boolean isDirectConnection;
+  private final int reconnectTimes;
+  private final int reconnectDelay;
+
+  // checks if this client owns these resources (used when closing)
   private final boolean ownsAllocator;
-  private final boolean isDirectConnection; // true if the connection bypasses zookeeper and connects directly to a drillbit
+  private final boolean ownsZkConnection;
+  private final boolean ownsEventLoopGroup;
+  private final boolean ownsExecutor;
+
+  // if the following variables are set during construction, they are not overridden during or after #connect call
+  // otherwise, they are set to defaults during #connect call
   private EventLoopGroup eventLoopGroup;
   private ExecutorService executor;
+  private boolean supportComplexTypes;
+
+  // the following variables are set during connection, and must not be overridden later
+  private UserClient client;
+  private UserProperties props;
+  private volatile ClusterCoordinator clusterCoordinator;
+  private volatile boolean connected; // = false
 
-  public DrillClient() throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient() {
 this(DrillConfig.create(), false);
   }
 
-  public DrillClient(boolean isDirect) throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(boolean isDirect) {
 this(DrillConfig.create(), isDirect);
   }
 
-  public DrillClient(String fileName) throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(String fileName) {
 this(DrillConfig.create(fileName), false);
   }
 
-  public DrillClient(DrillConfig config) throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig config) {
 this(config, null, false);
   }
 
-  public DrillClient(DrillConfig config, boolean isDirect)
-  throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig config, boolean isDirect) {
 this(config, null, isDirect);
   }
 
-  public DrillClient(DrillConfig config, ClusterCoordinator coordinator)
-throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig config, ClusterCoordinator coordinator) {
 this(config, coordinator, null, false);
   }
 
-  public DrillClient(DrillConfig config, ClusterCoordinator coordinator, boolean isDirect)
-throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig config, ClusterCoordinator coordinator, boolean isDirect) {
 this(config, coordinator, null, isDirect);
   }
 
-  

[jira] [Commented] (DRILL-4504) Create an event loop for each of [user, control, data] RPC components

2016-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193932#comment-15193932
 ] 

ASF GitHub Bot commented on DRILL-4504:
---

Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/429#discussion_r56058208
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -74,73 +74,148 @@
 /**
  * Thin wrapper around a UserClient that handles connect/close and transforms
  * String into ByteBuf.
+ *
+ * Use the builder class ({@link DrillClient.Builder}) to build objects of this class.
+ * E.g.
+ * 
+ *   DrillClient client = DrillClient.newBuilder()
+ *   .setConfig(...)
+ *   .setIsDirectConnection(true)
+ *   .build();
+ * 
  */
 public class DrillClient implements Closeable, ConnectionThrottle {
   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillClient.class);
 
   private final DrillConfig config;
-  private UserClient client;
-  private UserProperties props = null;
-  private volatile ClusterCoordinator clusterCoordinator;
-  private volatile boolean connected = false;
   private final BufferAllocator allocator;
-  private int reconnectTimes;
-  private int reconnectDelay;
-  private boolean supportComplexTypes;
-  private final boolean ownsZkConnection;
+  private final boolean isDirectConnection;
+  private final int reconnectTimes;
+  private final int reconnectDelay;
+
+  // checks if this client owns these resources (used when closing)
   private final boolean ownsAllocator;
-  private final boolean isDirectConnection; // true if the connection bypasses zookeeper and connects directly to a drillbit
+  private final boolean ownsZkConnection;
+  private final boolean ownsEventLoopGroup;
+  private final boolean ownsExecutor;
+
+  // if the following variables are set during construction, they are not overridden during or after #connect call
+  // otherwise, they are set to defaults during #connect call
   private EventLoopGroup eventLoopGroup;
   private ExecutorService executor;
+  private boolean supportComplexTypes;
+
+  // the following variables are set during connection, and must not be overridden later
+  private UserClient client;
+  private UserProperties props;
+  private volatile ClusterCoordinator clusterCoordinator;
--- End diff --

-0. 

Why volatile here? Is DrillClient meant to be thread safe? If so, we seem 
to have more work to do: #close for instance. Otherwise volatile seems totally 
irrelevant.
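The review question turns on what volatile does and does not provide. A volatile field makes writes visible to other threads, but it does not make a check-then-act sequence atomic, so a close() guarded by a volatile flag can still run its cleanup twice when called concurrently. If close() were meant to be thread-safe, one common approach is an AtomicBoolean guard, sketched below with hypothetical names; this thread instead settled on documenting connect and close as not thread-safe.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: making close() safe to call from several threads.
// A volatile flag alone only guarantees visibility; a sequence such as
// `if (connected) { connected = false; release(); }` can still let two
// threads both observe true and release twice.
final class SafeCloseSketch implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final AtomicInteger releaseCount = new AtomicInteger();

    @Override
    public void close() {
        // compareAndSet flips false -> true atomically, so exactly one caller wins.
        if (closed.compareAndSet(false, true)) {
            releaseCount.incrementAndGet(); // stands in for releasing allocator, executor, etc.
        }
    }

    int releaseCount() { return releaseCount.get(); }

    public static void main(String[] args) throws InterruptedException {
        SafeCloseSketch client = new SafeCloseSketch();
        int threads = 8;
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all threads at once to maximize contention
                    client.close();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("released " + client.releaseCount() + " time(s)"); // released 1 time(s)
    }
}
```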







[jira] [Updated] (DRILL-4476) Enhance Union-All operator for dealing with empty left input or empty both inputs

2016-03-14 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4476:

Fix Version/s: 1.7.0

> Enhance Union-All operator for dealing with empty left input or empty both 
> inputs
> -
>
> Key: DRILL-4476
> URL: https://issues.apache.org/jira/browse/DRILL-4476
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> The Union-All operator does not handle the case where the left input comes 
> from an empty source.
> With DRILL-2288's enhancement for empty sources, the Union-All operator can now 
> be made to support this scenario.
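The behavior this issue asks for can be sketched as follows: take the output schema from the first non-empty input rather than always from the left, and return an empty result when both inputs are empty. The model is a simplifying assumption: an empty source is represented as having neither rows nor a known schema, and `UnionAllSketch` and `Input` are hypothetical names, not Drill's operator classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: UNION ALL over two inputs where either side may be empty.
// An empty source is modeled as having no schema (null) and no rows.
final class UnionAllSketch {
    static final class Input {
        final List<String> schema; // column names; null for an empty source
        final List<List<Object>> rows;

        Input(List<String> schema, List<List<Object>> rows) {
            this.schema = schema;
            this.rows = rows;
        }

        boolean isEmpty() { return schema == null; }
    }

    // Uses the first non-empty side's schema instead of always the left side's.
    static Input unionAll(Input left, Input right) {
        if (left.isEmpty() && right.isEmpty()) {
            return new Input(null, List.of()); // both empty: empty result
        }
        if (left.isEmpty()) {
            return right;
        }
        if (right.isEmpty()) {
            return left;
        }
        if (left.schema.size() != right.schema.size()) {
            throw new IllegalArgumentException("UNION ALL inputs must have the same number of columns");
        }
        List<List<Object>> rows = new ArrayList<>(left.rows);
        rows.addAll(right.rows);
        return new Input(left.schema, rows); // output column names come from the left
    }

    public static void main(String[] args) {
        Input empty = new Input(null, List.of());
        Input right = new Input(List.of("c1"), List.of(List.<Object>of(42)));
        Input result = unionAll(empty, right);
        System.out.println(result.schema + " " + result.rows); // [c1] [[42]]
    }
}
```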





[jira] [Updated] (DRILL-4510) IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.dril

2016-03-14 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-4510:
--
Assignee: Sean Hsuan-Yi Chu

> IllegalStateException: Failure while reading vector.  Expected vector class 
> of org.apache.drill.exec.vector.NullableIntVector but was holding vector 
> class org.apache.drill.exec.vector.NullableVarCharVector
> -
>
> Key: DRILL-4510
> URL: https://issues.apache.org/jira/browse/DRILL-4510
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Chun Chang
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
>
> Hit the following regression while running advanced automation. The regression 
> was introduced between commits b979bebe83d7017880b0763adcbf8eb80acfcee8 and 
> 1f23b89623c72808f2ee866cec9b4b8a48929d68
> {noformat}
> Execution Failures:
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/original/query66.sql
> Query: 
> -- start query 66 in stream 0 using template query66.tpl 
> SELECT w_warehouse_name, 
>w_warehouse_sq_ft, 
>w_city, 
>w_county, 
>w_state, 
>w_country, 
>ship_carriers, 
>year1,
>Sum(jan_sales) AS jan_sales, 
>Sum(feb_sales) AS feb_sales, 
>Sum(mar_sales) AS mar_sales, 
>Sum(apr_sales) AS apr_sales, 
>Sum(may_sales) AS may_sales, 
>Sum(jun_sales) AS jun_sales, 
>Sum(jul_sales) AS jul_sales, 
>Sum(aug_sales) AS aug_sales, 
>Sum(sep_sales) AS sep_sales, 
>Sum(oct_sales) AS oct_sales, 
>Sum(nov_sales) AS nov_sales, 
>Sum(dec_sales) AS dec_sales, 
>Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot, 
>Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot, 
>Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot, 
>Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot, 
>Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot, 
>Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot, 
>Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot, 
>Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot, 
>Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot, 
>Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot, 
>Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot, 
>Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot, 
>Sum(jan_net)   AS jan_net, 
>Sum(feb_net)   AS feb_net, 
>Sum(mar_net)   AS mar_net, 
>Sum(apr_net)   AS apr_net, 
>Sum(may_net)   AS may_net, 
>Sum(jun_net)   AS jun_net, 
>Sum(jul_net)   AS jul_net, 
>Sum(aug_net)   AS aug_net, 
>Sum(sep_net)   AS sep_net, 
>Sum(oct_net)   AS oct_net, 
>Sum(nov_net)   AS nov_net, 
>Sum(dec_net)   AS dec_net 
> FROM   (SELECT w_warehouse_name, 
>w_warehouse_sq_ft, 
>w_city, 
>w_county, 
>w_state, 
>w_country, 
>'ZOUROS' 
>|| ',' 
>|| 'ZHOU' AS ship_carriers, 
>d_year AS year1, 
>Sum(CASE 
>  WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity 
>  ELSE 0 
>END)  AS jan_sales, 
>Sum(CASE 
>  WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity 
>  ELSE 0 
>END)  AS feb_sales, 
>Sum(CASE 
>  WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity 
>  ELSE 0 
>END)  AS mar_sales, 
> 

[jira] [Commented] (DRILL-4510) IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.dr

2016-03-14 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193912#comment-15193912
 ] 

Aman Sinha commented on DRILL-4510:
---

This query has a union-all. [~seanhychu], can you take a look to see whether the 
recent patch caused it?

> IllegalStateException: Failure while reading vector.  Expected vector class 
> of org.apache.drill.exec.vector.NullableIntVector but was holding vector 
> class org.apache.drill.exec.vector.NullableVarCharVector
> -
>
> Key: DRILL-4510
> URL: https://issues.apache.org/jira/browse/DRILL-4510
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Chun Chang
>Priority: Critical
>
> Hit the following regression while running advanced automation. The regression 
> was introduced between commits b979bebe83d7017880b0763adcbf8eb80acfcee8 and 
> 1f23b89623c72808f2ee866cec9b4b8a48929d68
> {noformat}
> Execution Failures:
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/original/query66.sql
> Query: 
> -- start query 66 in stream 0 using template query66.tpl 
> SELECT w_warehouse_name, 
>w_warehouse_sq_ft, 
>w_city, 
>w_county, 
>w_state, 
>w_country, 
>ship_carriers, 
>year1,
>Sum(jan_sales) AS jan_sales, 
>Sum(feb_sales) AS feb_sales, 
>Sum(mar_sales) AS mar_sales, 
>Sum(apr_sales) AS apr_sales, 
>Sum(may_sales) AS may_sales, 
>Sum(jun_sales) AS jun_sales, 
>Sum(jul_sales) AS jul_sales, 
>Sum(aug_sales) AS aug_sales, 
>Sum(sep_sales) AS sep_sales, 
>Sum(oct_sales) AS oct_sales, 
>Sum(nov_sales) AS nov_sales, 
>Sum(dec_sales) AS dec_sales, 
>Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot, 
>Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot, 
>Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot, 
>Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot, 
>Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot, 
>Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot, 
>Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot, 
>Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot, 
>Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot, 
>Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot, 
>Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot, 
>Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot, 
>Sum(jan_net)   AS jan_net, 
>Sum(feb_net)   AS feb_net, 
>Sum(mar_net)   AS mar_net, 
>Sum(apr_net)   AS apr_net, 
>Sum(may_net)   AS may_net, 
>Sum(jun_net)   AS jun_net, 
>Sum(jul_net)   AS jul_net, 
>Sum(aug_net)   AS aug_net, 
>Sum(sep_net)   AS sep_net, 
>Sum(oct_net)   AS oct_net, 
>Sum(nov_net)   AS nov_net, 
>Sum(dec_net)   AS dec_net 
> FROM   (SELECT w_warehouse_name, 
>w_warehouse_sq_ft, 
>w_city, 
>w_county, 
>w_state, 
>w_country, 
>'ZOUROS' 
>|| ',' 
>|| 'ZHOU' AS ship_carriers, 
>d_year AS year1, 
>Sum(CASE 
>  WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity 
>  ELSE 0 
>END)  AS jan_sales, 
>Sum(CASE 
>  WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity 
>  ELSE 0 
>END)  AS feb_sales, 
>Sum(CASE 
>  WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity 
>   

[jira] [Created] (DRILL-4510) IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.dril

2016-03-14 Thread Chun Chang (JIRA)
Chun Chang created DRILL-4510:
-

 Summary: IllegalStateException: Failure while reading vector.  
Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was 
holding vector class org.apache.drill.exec.vector.NullableVarCharVector
 Key: DRILL-4510
 URL: https://issues.apache.org/jira/browse/DRILL-4510
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Reporter: Chun Chang
Priority: Critical


Hit the following regression while running advanced automation. The regression 
was introduced between commits b979bebe83d7017880b0763adcbf8eb80acfcee8 and 
1f23b89623c72808f2ee866cec9b4b8a48929d68

{noformat}
Execution Failures:
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/original/query66.sql
Query: 
-- start query 66 in stream 0 using template query66.tpl 
SELECT w_warehouse_name, 
   w_warehouse_sq_ft, 
   w_city, 
   w_county, 
   w_state, 
   w_country, 
   ship_carriers, 
   year1,
   Sum(jan_sales) AS jan_sales, 
   Sum(feb_sales) AS feb_sales, 
   Sum(mar_sales) AS mar_sales, 
   Sum(apr_sales) AS apr_sales, 
   Sum(may_sales) AS may_sales, 
   Sum(jun_sales) AS jun_sales, 
   Sum(jul_sales) AS jul_sales, 
   Sum(aug_sales) AS aug_sales, 
   Sum(sep_sales) AS sep_sales, 
   Sum(oct_sales) AS oct_sales, 
   Sum(nov_sales) AS nov_sales, 
   Sum(dec_sales) AS dec_sales, 
   Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot, 
   Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot, 
   Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot, 
   Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot, 
   Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot, 
   Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot, 
   Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot, 
   Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot, 
   Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot, 
   Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot, 
   Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot, 
   Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot, 
   Sum(jan_net)   AS jan_net, 
   Sum(feb_net)   AS feb_net, 
   Sum(mar_net)   AS mar_net, 
   Sum(apr_net)   AS apr_net, 
   Sum(may_net)   AS may_net, 
   Sum(jun_net)   AS jun_net, 
   Sum(jul_net)   AS jul_net, 
   Sum(aug_net)   AS aug_net, 
   Sum(sep_net)   AS sep_net, 
   Sum(oct_net)   AS oct_net, 
   Sum(nov_net)   AS nov_net, 
   Sum(dec_net)   AS dec_net 
FROM   (SELECT w_warehouse_name, 
   w_warehouse_sq_ft, 
   w_city, 
   w_county, 
   w_state, 
   w_country, 
   'ZOUROS' 
   || ',' 
   || 'ZHOU' AS ship_carriers, 
   d_year AS year1, 
   Sum(CASE 
 WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity 
 ELSE 0 
   END)  AS jan_sales, 
   Sum(CASE 
 WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity 
 ELSE 0 
   END)  AS feb_sales, 
   Sum(CASE 
 WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity 
 ELSE 0 
   END)  AS mar_sales, 
   Sum(CASE 
 WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity 
 ELSE 0 
   END)  AS apr_sales, 
   Sum(CASE 
 WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity 
 ELSE 0 
   END)  AS may_sales, 
   Sum(CASE 
 WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity 
 ELSE 0 
   END)  AS jun_sales, 
   Sum(CASE 
 WHEN 

[jira] [Commented] (DRILL-4505) Can't group by or sort across files with different schema

2016-03-14 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193881#comment-15193881
 ] 

Jason Altekruse commented on DRILL-4505:


[~tobad357] Can you try adding a cast to your column APPLICATION_ID? Work is 
ongoing to fully support changing schema, which includes a concept of an 
untyped null that tries to defer materialization until it is needed. In this 
case I believe it is possible that we are materializing the column that does 
not appear in some of the files to a default type (we arbitrarily chose 
nullable bigint before starting work on the full changing schema support). 
Casting these automatically materialized nulls to the correct type may resolve 
the issue you are seeing.
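Here is a minimal sketch of the "untyped null" idea mentioned above. It assumes a column simply counts nulls until it sees its first real value and only then commits to a type. `DeferredColumn`, its type names, and the `fallback` parameter are hypothetical and are not Drill's implementation. The `fallback` parameter stands in for the type an explicit CAST would supply when a file never contains the column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical column that stays "untyped" while it has only seen nulls.
final class DeferredColumn {
  private String type;        // null until the first non-null value arrives
  private int leadingNulls;   // nulls seen before the type was known
  private final List<Object> values = new ArrayList<>();

  void write(Object value) {
    if (value == null) {
      if (type == null) {
        leadingNulls++;       // defer: do not commit to a default type yet
      }
      values.add(null);
      return;
    }
    String valueType = typeOf(value);
    if (type == null) {
      type = valueType;       // materialize using the first real value
    } else if (!type.equals(valueType)) {
      throw new IllegalStateException("schema change: " + type + " -> " + valueType);
    }
    values.add(value);
  }

  // If the column never saw a value, the caller (for example a CAST) decides the type.
  String resolveType(String fallback) {
    return type != null ? type : fallback;
  }

  int leadingNulls() {
    return leadingNulls;
  }

  private static String typeOf(Object v) {
    if (v instanceof String) return "VARCHAR";
    if (v instanceof Long || v instanceof Integer) return "BIGINT";
    return v.getClass().getSimpleName().toUpperCase();
  }

  public static void main(String[] args) {
    DeferredColumn col = new DeferredColumn();
    col.write(null);
    col.write("app-1");
    System.out.println(col.resolveType("INT") + " after " + col.leadingNulls() + " null(s)");
  }
}
```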

If this doesn't fix the issue, you can try to enable the union type, but it is 
currently considered an experimental feature and needs to be more thoroughly 
tested.

alter session set `exec.enable_union_type` = true

https://issues.apache.org/jira/browse/DRILL-3229

> Can't group by or sort across files with different schema
> -
>
> Key: DRILL-4505
> URL: https://issues.apache.org/jira/browse/DRILL-4505
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.5.0
> Environment: Java 1.8
>Reporter: Tobias
>
> We are currently trying out the support for querying across parquet files 
> with different schemas.
> Simple selects work well, but when we want to sort or group by, Drill 
> returns "UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts 
> with changing schemas Fragment 0:0 [Error Id: 
> ff490670-64c1-4fb8-990e-a02aa44ac010 on zookeeper-1:31010]"
> This happens even though the new columns are not included in the query.
> The expected result would be to treat the non-existing columns in certain files 
> as either null or a default value and allow them to be grouped and sorted.
> Example:
> SELECT APPLICATION_ID ,dir0 AS year_ FROM dfs.`/PRO/UTC/1` WHERE dir2 
> >='2016-01-01' AND dir2<'2016-04-02' works with changing schema,
> but SELECT max(APPLICATION_ID ),dir0 AS year_ FROM dfs.`/PRO/UTC/1` WHERE 
> dir2 >='2016-01-01' AND dir2<'2016-04-02'  group by dir0 does not work.
> For us this rules out having an evolving schema with moderately 
> complex queries.





[jira] [Updated] (DRILL-4479) JsonReader should pick a less restrictive type when creating the default column

2016-03-14 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes updated DRILL-4479:

Assignee: Aman Sinha  (was: Hanifi Gunes)

> JsonReader should pick a less restrictive type when creating the default 
> column
> ---
>
> Key: DRILL-4479
> URL: https://issues.apache.org/jira/browse/DRILL-4479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.5.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.7.0
>
> Attachments: mostlynulls.json
>
>
> This JIRA is related to DRILL-3806 but has a narrower scope, so I decided to 
> create a separate one. 
> The JsonReader has the method ensureAtLeastOneField() (see 
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L91),
>  which creates an empty column when no columns are found, and it chooses 
> a nullable int column for it.  One consequence is that queries of 
> the following type fail:
> {noformat}
> select c1 from dfs.`mostlynulls.json`;
> ...
> ...
> | null  |
> | null  |
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar 
> type when you are using a ValueWriter of type NullableIntWriterImpl.
> File  /Users/asinha/data/mostlynulls.json
> Record  4097
> {noformat}
> In this file the first 4096 rows have NULL values for c1 followed by rows 
> that have a valid string.  
> It would be useful for the Json reader to choose a less restrictive type such 
> as varchar in order to allow more types of queries to run.  
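The failure at record 4097 can be modeled with a toy writer whose type is fixed once the first batch (here, all nulls) has been read. This is a hypothetical sketch, not JsonReader or Drill's writer classes. It only shows why a VARCHAR default accepts the later string value while an INT default rejects it.

```java
// Hypothetical writer whose type is fixed when the first (all-null) batch is read.
// Not Drill's JsonReader or ValueWriter classes.
final class LockedTypeWriter {
  enum Type { NULLABLE_INT, NULLABLE_VARCHAR }

  private final Type type;

  // defaultType is what gets chosen when the first batch contained only nulls.
  LockedTypeWriter(Type defaultType) {
    this.type = defaultType;
  }

  // Accepts a JSON scalar already parsed into a Java object and returns its stored form.
  String write(Object value) {
    if (value == null) {
      return "null"; // nulls fit any nullable type
    }
    switch (type) {
      case NULLABLE_VARCHAR:
        return value.toString(); // any scalar can be stored as text
      case NULLABLE_INT:
        if (value instanceof Integer) {
          return value.toString();
        }
        throw new IllegalStateException(
            "cannot write " + value.getClass().getSimpleName() + " into a " + type + " column");
      default:
        throw new AssertionError(type);
    }
  }

  public static void main(String[] args) {
    LockedTypeWriter text = new LockedTypeWriter(Type.NULLABLE_VARCHAR);
    System.out.println(text.write(null) + ", " + text.write("abc"));
  }
}
```

The trade-off is that with a VARCHAR default, numeric values arriving later are stored as text, so queries may need a CAST to treat them as numbers.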





[jira] [Commented] (DRILL-4479) JsonReader should pick a less restrictive type when creating the default column

2016-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193867#comment-15193867
 ] 

ASF GitHub Bot commented on DRILL-4479:
---

Github user hnfgns commented on the pull request:

https://github.com/apache/drill/pull/420#issuecomment-196464641
  
+1


> JsonReader should pick a less restrictive type when creating the default 
> column
> ---
>
> Key: DRILL-4479
> URL: https://issues.apache.org/jira/browse/DRILL-4479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.5.0
>Reporter: Aman Sinha
>Assignee: Hanifi Gunes
> Fix For: 1.7.0
>
> Attachments: mostlynulls.json
>
>
> This JIRA is related to DRILL-3806 but has a narrower scope, so I decided to 
> create a separate one. 
> The JsonReader has the method ensureAtLeastOneField() (see 
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L91),
>  which creates an empty column when no columns are found, and it chooses 
> a nullable int column for it.  One consequence is that queries of 
> the following type fail:
> {noformat}
> select c1 from dfs.`mostlynulls.json`;
> ...
> ...
> | null  |
> | null  |
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar 
> type when you are using a ValueWriter of type NullableIntWriterImpl.
> File  /Users/asinha/data/mostlynulls.json
> Record  4097
> {noformat}
> In this file the first 4096 rows have NULL values for c1 followed by rows 
> that have a valid string.  
> It would be useful for the Json reader to choose a less restrictive type such 
> as varchar in order to allow more types of queries to run.  





[jira] [Assigned] (DRILL-4503) Schema change exception even with all_text_mode enabled

2016-03-14 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha reassigned DRILL-4503:
-

Assignee: Aman Sinha  (was: Hanifi Gunes)

> Schema change exception even with all_text_mode enabled
> ---
>
> Key: DRILL-4503
> URL: https://issues.apache.org/jira/browse/DRILL-4503
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.7.0
>
> Attachments: mostlynulls_1.json
>
>
> Both HashAggregate and StreamingAggregate encounter a schema change error when 
> querying a JSON file with non-null values for column 'a' and many null values 
> for column 'c'.
> This occurs even when all_text_mode is enabled, which seems counterintuitive 
> since once all_text_mode is enabled, everything (including nulls) should be 
> treated as varchar and one would expect no schema change errors.  
> Here are some example queries that encounter this error: 
> {noformat}
> 0: jdbc:drill:zk=local> select a, c from dfs.`mostlynulls_1.json` group by a, 
> c;
> Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema 
> changes
> 0: jdbc:drill:zk=local> alter session set `store.json.all_text_mode` = true;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | store.json.all_text_mode updated.  |
> +---++
> 1 row selected (0.15 seconds)
> 0: jdbc:drill:zk=local> select a, c from dfs.`mostlynulls_1.json` group by a, 
> c;
> Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema 
> changes
> 0: jdbc:drill:zk=local> select min(a), min(c) from dfs.`mostlynulls_1.json`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> {noformat}
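The expectation in the report can be stated as a small model. With all_text_mode on, every batch of column 'c' should be typed as VARCHAR whether or not it contains only nulls, so no schema change should be observed. The sketch encodes that expected behavior, not Drill's current behavior. The class name and type labels are hypothetical, and the NULLABLE_INT default for all-null batches is borrowed from the DRILL-4479 description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical per-batch type inference for one JSON column; not Drill's reader.
final class AllTextModeSketch {
  // Assumed default for a batch containing only nulls (see DRILL-4479).
  static final String DEFAULT_FOR_ALL_NULLS = "NULLABLE_INT";

  // Infers a column type from one batch of values.
  static String inferType(List<Object> batch, boolean allTextMode) {
    if (allTextMode) {
      // Expected: in all-text mode every value, including an all-null batch, is text.
      return "NULLABLE_VARCHAR";
    }
    for (Object v : batch) {
      if (v != null) {
        return v instanceof String ? "NULLABLE_VARCHAR" : "NULLABLE_BIGINT";
      }
    }
    return DEFAULT_FOR_ALL_NULLS;
  }

  // An aggregate that needs a fixed schema fails when any batch type differs from the first.
  static boolean schemaChanged(List<List<Object>> batches, boolean allTextMode) {
    String first = null;
    for (List<Object> b : batches) {
      String t = inferType(b, allTextMode);
      if (first == null) {
        first = t;
      } else if (!first.equals(t)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<List<Object>> batches = Arrays.asList(
        Arrays.<Object>asList(null, null),
        Arrays.<Object>asList("x", null));
    System.out.println("default mode changed: " + schemaChanged(batches, false));
    System.out.println("all-text mode changed: " + schemaChanged(batches, true));
  }
}
```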





[jira] [Resolved] (DRILL-4490) Count(*) function returns as optional instead of required

2016-03-14 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni resolved DRILL-4490.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in commit: 46e3de790da8f9c6d2d18e7e40fd37c01b3b1681


> Count(*) function returns as optional instead of required
> -
>
> Key: DRILL-4490
> URL: https://issues.apache.org/jira/browse/DRILL-4490
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Krystal
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.6.0
>
>
> git.commit.id.abbrev=c8a7840
> I have the following CTAS query:
> create table test as select count(*) as col1 from cp.`tpch/orders.parquet`;
> The schema of the test table shows col1 as optional:
> message root {
>   optional int64 col1;
> }
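The expected schema follows from standard SQL semantics. COUNT(*) returns 0 rather than NULL on empty input, so its result can be declared non-nullable (`required int64` in Parquet terms). SUM, MIN, MAX, and AVG return NULL on empty input. The mapping is a hypothetical sketch of that rule, not Drill's function registry.

```java
// Hypothetical nullability rule for aggregate result types; not Drill's function registry.
final class AggregateNullability {
  enum Mode { REQUIRED, OPTIONAL }

  static Mode resultMode(String function, Mode inputMode) {
    switch (function.toUpperCase()) {
      case "COUNT":
        // COUNT returns 0 on empty input and never returns NULL.
        return Mode.REQUIRED;
      case "SUM":
      case "MIN":
      case "MAX":
      case "AVG":
        // These return NULL on empty input, so the result must be nullable.
        return Mode.OPTIONAL;
      default:
        // Otherwise, fall back to the input's nullability (assumption for the sketch).
        return inputMode;
    }
  }

  public static void main(String[] args) {
    System.out.println("count(*) -> " + resultMode("count", Mode.OPTIONAL));
  }
}
```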





[jira] [Commented] (DRILL-4504) Create an event loop for each of [user, control, data] RPC components

2016-03-14 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193833#comment-15193833
 ] 

Sudheesh Katkam commented on DRILL-4504:


Pull request [#429|https://github.com/apache/drill/pull/429].






[jira] [Created] (DRILL-4509) Ignore unknown storage plugin configs while starting Drillbit

2016-03-14 Thread Venki Korukanti (JIRA)
Venki Korukanti created DRILL-4509:
--

 Summary: Ignore unknown storage plugin configs while starting 
Drillbit
 Key: DRILL-4509
 URL: https://issues.apache.org/jira/browse/DRILL-4509
 Project: Apache Drill
  Issue Type: Bug
  Components:  Server
Affects Versions: 1.5.0
Reporter: Venki Korukanti
Priority: Minor
 Fix For: 1.7.0


If ZooKeeper contains a storage plugin configuration whose implementation is 
not found while starting the Drillbit, the Drillbit throws an error and fails to 
restart:

{code}
Could not resolve type id 'newPlugin' into a subtype of [simple type, class 
org.apache.drill.common.logical.StoragePluginConfig]: known type ids = 
[InfoSchemaConfig, StoragePluginConfig, SystemTablePluginConfig, file, hbase, 
hive, jdbc, kudu, mock, mongo, named]
{code}

Should we instead ignore such plugins, log a warning, and continue starting the 
Drillbit?
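Here is a minimal sketch of the behavior the question proposes. It assumes each stored config can be reduced to a plugin name and a type id. Entries whose type id has no implementation are logged and skipped instead of aborting startup. `PluginLoaderSketch` and `loadPlugins` are hypothetical. The sketch uses `java.util.logging` only to stay dependency-free.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical plugin-config loader; not Drill's storage plugin registry.
final class PluginLoaderSketch {
  private static final Logger LOG = Logger.getLogger(PluginLoaderSketch.class.getName());

  // storedConfigs: plugin name -> type id, as persisted (for example in ZooKeeper).
  // knownTypeIds: type ids whose implementations are on the classpath.
  static Map<String, String> loadPlugins(Map<String, String> storedConfigs, Set<String> knownTypeIds) {
    Map<String, String> loaded = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : storedConfigs.entrySet()) {
      if (knownTypeIds.contains(e.getValue())) {
        loaded.put(e.getKey(), e.getValue());
      } else {
        // Skip instead of failing startup, but leave a trace for the operator.
        LOG.warning("Ignoring storage plugin '" + e.getKey()
            + "': unknown type id '" + e.getValue() + "'");
      }
    }
    return loaded;
  }

  public static void main(String[] args) {
    Map<String, String> stored = new LinkedHashMap<>();
    stored.put("dfs", "file");
    stored.put("custom", "newPlugin");
    Set<String> known = new HashSet<>(Arrays.asList("file", "hbase", "hive"));
    System.out.println(loadPlugins(stored, known).keySet());
  }
}
```

Because the sketch only skips the entry in memory, the stored config would be left untouched. The plugin could therefore be picked up again once its implementation is on the classpath.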





[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-03-14 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193754#comment-15193754
 ] 

Deneche A. Hakim commented on DRILL-3714:
-

Although I do see the OutOfMemory exception in the logs, the query is still 
shown as RUNNING in the WebUI.

> Query runs out of memory and remains in CANCELLATION_REQUESTED state until 
> drillbit is restarted
> 
>
> Key: DRILL-3714
> URL: https://issues.apache.org/jira/browse/DRILL-3714
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: Screen Shot 2015-08-26 at 10.36.33 AM.png, drillbit.log, 
> jstack.txt, query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>
>
> This is a variation of DRILL-3705; the difference is in Drill's behavior when 
> hitting an OOM condition.
> The query runs out of memory during execution and remains in the 
> "CANCELLATION_REQUESTED" state until the drillbit is bounced.
> The client (sqlline in this case) never gets a response from the server.
> Reproduction details:
> Single node drillbit installation.
> DRILL_MAX_DIRECT_MEMORY="8G"
> DRILL_HEAP="4G"
> Run this query on TPCDS SF100 data set
> {code}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS 
> TotalSpend FROM store_sales ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1 
> LIMIT 10;
> {code}
> drillbit.log
> {code}
> 2015-08-26 16:54:58,469 [2a2210a7-7a78-c774-d54c-c863d0b77bb0:frag:3:22] INFO 
>  o.a.d.e.w.f.FragmentStatusReporter - 
> 2a2210a7-7a78-c774-d54c-c863d0b77bb0:3:22: State to report: RUNNING
> 2015-08-26 16:55:50,498 [BitServer-5] WARN  
> o.a.drill.exec.rpc.data.DataServer - Message of mode REQUEST of rpc type 3 
> took longer than 500ms.  Actual duration was 2569ms.
> 2015-08-26 16:56:31,086 [BitServer-5] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.133:31012 <--> /10.10.88.133:54554 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct 
> buffer memory
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
>  ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
>  [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_71]
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) 
> ~[na:1.7.0_71]
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
> ~[na:1.7.0_71]
> at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
> 

[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-03-14 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193746#comment-15193746
 ] 

Deneche A. Hakim commented on DRILL-3714:
-

I was able to reproduce it using the Drill 1.2.0 release, but only after I 
reduced direct memory to 4G.


[jira] [Assigned] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-03-14 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reassigned DRILL-3714:
---

Assignee: Deneche A. Hakim  (was: Sudheesh Katkam)

> Query runs out of memory and remains in CANCELLATION_REQUESTED state until 
> drillbit is restarted
> 
>
> Key: DRILL-3714
> URL: https://issues.apache.org/jira/browse/DRILL-3714
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: Screen Shot 2015-08-26 at 10.36.33 AM.png, drillbit.log, 
> jstack.txt, query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>
>
> This is a variation of DRILL-3705 that differs in how Drill behaves when it 
> hits the OOM condition.
> The query runs out of memory during execution and remains in the 
> "CANCELLATION_REQUESTED" state until the drillbit is bounced (restarted).
> The client (sqlline in this case) never gets a response from the server.
> Reproduction details:
> Single node drillbit installation.
> DRILL_MAX_DIRECT_MEMORY="8G"
> DRILL_HEAP="4G"
> Run this query on TPCDS SF100 data set
> {code}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS 
> TotalSpend FROM store_sales ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1 
> LIMIT 10;
> {code}
> drillbit.log
> {code}
> 2015-08-26 16:54:58,469 [2a2210a7-7a78-c774-d54c-c863d0b77bb0:frag:3:22] INFO 
>  o.a.d.e.w.f.FragmentStatusReporter - 
> 2a2210a7-7a78-c774-d54c-c863d0b77bb0:3:22: State to report: RUNNING
> 2015-08-26 16:55:50,498 [BitServer-5] WARN  
> o.a.drill.exec.rpc.data.DataServer - Message of mode REQUEST of rpc type 3 
> took longer than 500ms.  Actual duration was 2569ms.
> 2015-08-26 16:56:31,086 [BitServer-5] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.133:31012 <--> /10.10.88.133:54554 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct 
> buffer memory
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
>  ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
>  [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_71]
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) 
> ~[na:1.7.0_71]
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
> ~[na:1.7.0_71]
> at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PoolArena.reallocate(PoolArena.java:280) 
> 

[jira] [Commented] (DRILL-4508) Null proof all AutoCloseable.close() methods

2016-03-14 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193734#comment-15193734
 ] 

Venki Korukanti commented on DRILL-4508:


We need it in the implementation of AutoCloseable.close(). 

> Null proof all AutoCloseable.close() methods
> 
>
> Key: DRILL-4508
> URL: https://issues.apache.org/jira/browse/DRILL-4508
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.5.0
>Reporter: Venki Korukanti
>Priority: Minor
> Fix For: 1.7.0
>
>
> If the Drillbit fails to start (for example, because of an incorrect 
> configuration or missing storage plugin information), we end up calling 
> close() on various components such as WebServer and Drillbit. Some of these 
> components may not have been initialized and may still hold null references, 
> but their close() methods do not check for null before dereferencing them. 
> One example:
> {code}
> java.lang.NullPointerException: null
> at 
> org.apache.drill.exec.server.options.SystemOptionManager.close(SystemOptionManager.java:280)
>  ~[drill-java-exec-1.6.0.jar:1.6.0]
> at 
> org.apache.drill.exec.server.DrillbitContext.close(DrillbitContext.java:185) 
> ~[drill-java-exec-1.6.0.jar:1.6.0]
> at org.apache.drill.exec.work.WorkManager.close(WorkManager.java:157) 
> ~[drill-java-exec-1.6.0.jar:1.6.0]
> at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) 
> ~[drill-common-1.6.0.jar:1.6.0]
> at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) 
> ~[drill-common-1.6.0.jar:1.6.0]
> at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:149) 
> [drill-java-exec-1.6.0.jar:1.6.0]
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:283) 
> [drill-java-exec-1.6.0.jar:1.6.0]
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:261) 
> [drill-java-exec-1.6.0.jar:1.6.0]
> at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:257) 
> [drill-java-exec-1.6.0.jar:1.6.0]
> {code}
> This masks the actual error (the incorrect configuration), making it hard to 
> tell what went wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4508) Null proof all AutoCloseable.close() methods

2016-03-14 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193729#comment-15193729
 ] 

Sudheesh Katkam commented on DRILL-4508:


(AutoCloseables already does the null check).



[jira] [Commented] (DRILL-4508) Null proof all AutoCloseable.close() methods

2016-03-14 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193726#comment-15193726
 ] 

Sudheesh Katkam commented on DRILL-4508:


The null check should be in DrillbitContext and not in AutoCloseables, right?



[jira] [Created] (DRILL-4508) Null proof all AutoCloseable.close() methods

2016-03-14 Thread Venki Korukanti (JIRA)
Venki Korukanti created DRILL-4508:
--

 Summary: Null proof all AutoCloseable.close() methods
 Key: DRILL-4508
 URL: https://issues.apache.org/jira/browse/DRILL-4508
 Project: Apache Drill
  Issue Type: Bug
  Components:  Server
Affects Versions: 1.5.0
Reporter: Venki Korukanti
Priority: Minor
 Fix For: 1.7.0


If the Drillbit fails to start (for example, because of an incorrect configuration 
or missing storage plugin information), we end up calling close() on various 
components such as WebServer and Drillbit. Some of these components may not have 
been initialized and may still hold null references, but their close() methods do 
not check for null before dereferencing them. One example:

{code}
java.lang.NullPointerException: null
at 
org.apache.drill.exec.server.options.SystemOptionManager.close(SystemOptionManager.java:280)
 ~[drill-java-exec-1.6.0.jar:1.6.0]
at 
org.apache.drill.exec.server.DrillbitContext.close(DrillbitContext.java:185) 
~[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.work.WorkManager.close(WorkManager.java:157) 
~[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) 
~[drill-common-1.6.0.jar:1.6.0]
at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) 
~[drill-common-1.6.0.jar:1.6.0]
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:149) 
[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:283) 
[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:261) 
[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:257) 
[drill-java-exec-1.6.0.jar:1.6.0]
{code}

This masks the actual error (the incorrect configuration), making it hard to 
tell what went wrong.
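As noted in the comments on this issue, Drill's AutoCloseables helper already skips null arguments, so the remaining gap is in individual close() implementations such as SystemOptionManager.close(). The following is a minimal Java sketch of the pattern under discussion, not Drill's actual code: NullSafeClose, Context, closeAll() and start() are hypothetical names. It skips components that were never initialized, and it attaches any cleanup failure to the original startup error as a suppressed exception so that the configuration error stays visible.

```java
/**
 * Sketch only: NullSafeClose, Context and start() are hypothetical stand-ins,
 * not Drill's DrillbitContext or Drillbit classes.
 */
final class NullSafeClose {

  /**
   * Closes each resource in order, skipping nulls (components that were never
   * initialized). The first close() failure is rethrown after all resources
   * have been attempted; later failures are attached to it as suppressed.
   */
  static void closeAll(AutoCloseable... resources) throws Exception {
    Exception first = null;
    for (AutoCloseable resource : resources) {
      if (resource == null) {
        continue; // never initialized, nothing to release
      }
      try {
        resource.close();
      } catch (Exception e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }

  /** A component whose fields stay null if startup fails before they are created. */
  static final class Context implements AutoCloseable {
    AutoCloseable optionManager;
    AutoCloseable webServer;

    @Override
    public void close() throws Exception {
      closeAll(webServer, optionManager); // null-tolerant, so safe after a partial start
    }
  }

  /**
   * Starts the context; on failure, cleans up and rethrows the original startup
   * error, attaching any cleanup failure as suppressed so it cannot mask the cause.
   */
  static void start(Context ctx, boolean badConfig) throws Exception {
    try {
      if (badConfig) {
        throw new IllegalStateException("incorrect configuration");
      }
      ctx.optionManager = () -> { };
      ctx.webServer = () -> { };
    } catch (Exception startupError) {
      try {
        ctx.close();
      } catch (Exception closeError) {
        startupError.addSuppressed(closeError);
      }
      throw startupError;
    }
  }
}
```

Attaching the cleanup failure with addSuppressed, instead of letting it propagate, is one way to keep a secondary NullPointerException from replacing the root cause in the log.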





[jira] [Updated] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release

2016-03-14 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-4347:

Reviewer: Victoria Markman

> Planning time for query64 from TPCDS test suite has increased 10 times 
> compared to 1.4 release
> --
>
> Key: DRILL-4347
> URL: https://issues.apache.org/jira/browse/DRILL-4347
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jinfeng Ni
> Fix For: 1.7.0
>
> Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, 
> 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0
>
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
> . . . . . . . . . . . . >  AS (SELECT cs_item_sk,
> . . . . . . . . . . . . > Sum(cs_ext_list_price) AS sale,
> . . . . . . . . . . . . > Sum(cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit) AS refund
> . . . . . . . . . . . . >  FROM   catalog_sales,
> . . . . . . . . . . . . > catalog_returns
> . . . . . . . . . . . . >  WHERE  cs_item_sk = cr_item_sk
> . . . . . . . . . . . . > AND cs_order_number = 
> cr_order_number
> . . . . . . . . . . . . >  GROUP  BY cs_item_sk
> . . . . . . . . . . . . >  HAVING Sum(cs_ext_list_price) > 2 * Sum(
> . . . . . . . . . . . . > cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit)),
> . . . . . . . . . . . . >  cross_sales
> . . . . . . . . . . . . >  AS (SELECT i_product_name product_name,
> . . . . . . . . . . . . > i_item_sk  item_sk,
> . . . . . . . . . . . . > s_store_name   store_name,
> . . . . . . . . . . . . > s_zip  store_zip,
> . . . . . . . . . . . . > ad1.ca_street_number   
> b_street_number,
> . . . . . . . . . . . . > ad1.ca_street_name 
> b_streen_name,
> . . . . . . . . . . . . > ad1.ca_city b_city,
> . . . . . . . . . . . . > ad1.ca_zip b_zip,
> . . . . . . . . . . . . > ad2.ca_street_number   
> c_street_number,
> . . . . . . . . . . . . > ad2.ca_street_name 
> c_street_name,
> . . . . . . . . . . . . > ad2.ca_city c_city,
> . . . . . . . . . . . . > ad2.ca_zip c_zip,
> . . . . . . . . . . . . > d1.d_year  AS syear,
> . . . . . . . . . . . . > d2.d_year  AS fsyear,
> . . . . . . . . . . . . > d3.d_year  s2year,
> . . . . . . . . . . . . > Count(*)   cnt,
> . . . . . . . . . . . . > Sum(ss_wholesale_cost) s1,
> . . . . . . . . . . . . > Sum(ss_list_price) s2,
> . . . . . . . . . . . . > Sum(ss_coupon_amt) s3
> . . . . . . . . . . . . >  FROM   store_sales,
> . . . . . . . . . . . . > store_returns,
> . . . . . . . . . . . . > cs_ui,
> . . . . . . . . . . . . > date_dim d1,
> . . . . . . . . . . . . > date_dim d2,
> . . . . . . . . . . . . > date_dim d3,
> . . . . . . . . . . . . > store,
> . . . . . . . . . . . . > customer,
> . . . . . . . . . . . . > customer_demographics cd1,
> . . . . . . . . . . . . > customer_demographics cd2,
> . . . . . . . . . . . . > promotion,
> . . . . . . . . . . . . > household_demographics hd1,
> . . . . . . . . . . . . > household_demographics hd2,
> . . . . . . . . . . . . > customer_address ad1,
> . . . . . . . . . . . . > customer_address ad2,
> . . . . . . . . . . . . > income_band ib1,
> . . . . . . . . . . . . > income_band ib2,
> . . . . . . . . . . . . > item
> . . . . . . . . . . . . >  WHERE  ss_store_sk = s_store_sk
> . . . . . . . . . . . . > AND ss_sold_date_sk = d1.d_date_sk
> . . . . . . . . . . . . > AND ss_customer_sk = c_customer_sk
> . . . . . . . . . . . . > AND ss_cdemo_sk = cd1.cd_demo_sk
> . . . . . . . . . . . . > AND ss_hdemo_sk = hd1.hd_demo_sk
> . . . . . . . . . . . . > AND ss_addr_sk = ad1.ca_address_sk
> . . . . . . . . . . . . > AND ss_item_sk = 

[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-14 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193681#comment-15193681
 ] 

Jinfeng Ni commented on DRILL-4474:
---

[~ssmane3], this patch will be included in the upcoming 1.6.0 release. 
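For reference when reading the cases in the description: in standard SQL, COUNT(expr) counts only the rows where expr is non-null, and a CASE expression with no ELSE branch yields NULL for rows that match no WHEN clause. COUNT(CASE WHEN ... THEN sessionid END) over a table can therefore never exceed COUNT(sessionid) over the same table, and the two can be equal only if every row with a non-null sessionid satisfies the condition. A minimal Java model of those semantics follows; the Row type and the sample data in the test are hypothetical.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/** Models SQL COUNT(expr) and CASE WHEN ... THEN x END (no ELSE) over in-memory rows. */
final class CountCaseSemantics {

  /** Hypothetical row with the three columns used in the report; any field may be null. */
  static final class Row {
    final String id;
    final String event;
    final String sessionId;

    Row(String id, String event, String sessionId) {
      this.id = id;
      this.event = event;
      this.sessionId = sessionId;
    }
  }

  /** CASE WHEN id = '/confirmDrop/btnYes/' AND event = 'Click' THEN sessionid END */
  static String confirmClick(Row r) {
    boolean matches = "/confirmDrop/btnYes/".equals(r.id) && "Click".equals(r.event);
    return matches ? r.sessionId : null; // no ELSE branch, so non-matching rows yield NULL
  }

  /** COUNT(expr): the number of rows for which expr evaluates to a non-null value. */
  static long count(List<Row> rows, Function<Row, String> expr) {
    return rows.stream().map(expr).filter(Objects::nonNull).count();
  }
}
```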


> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> --
>
> Key: DRILL-4474
> URL: https://issues.apache.org/jira/browse/DRILL-4474
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.5.0
> Environment: m3.xlarge AWS instances ( 3 nodes)
> CentOS6.5 x64
>Reporter: Shankar
>Assignee: Jacques Nadeau
>Priority: Blocker
> Fix For: 1.6.0
>
>
> {quote}
> * We are using Drill to retrieve business data from game analytics. 
> * We are running the queries below on a 50GB table (Parquet).
> * We have found some major inconsistencies in the data when we use the COUNT function.
> * Below are the queries, case by case, with their output. {color:blue}*Please 
> analyse them carefully for a clear understanding of the behaviour.*{color}
> * Please let me know how to resolve this (or whether a JIRA has already been 
> created for it). 
> * Hopefully this can be fixed in a later version. If not, please do the needful.
> {quote}
> --
> CASE-1 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+
> |   count   |
> +-----------+
> | 27645752  |
> +-----------+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-2 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+-------+
> |  EXPR$0   |  cnt  |
> +-----------+-------+
> | 37772844  | 2108  |
> +-----------+-------+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-3 (Wrong result, only first count is correct)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +---------+-----------+
> | EXPR$0  |    cnt    |
> +---------+-----------+
> | 201941  | 37772844  |
> +---------+-----------+
> 1 row selected (8.259 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-4 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and 
> t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +------+
> | cnt  |
> +------+
> | 525  |
> +------+
> 1 row selected (14.318 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-5 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid),
> . . . . . . 

[jira] [Resolved] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-14 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni resolved DRILL-4474.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in commit: 49ae6d363efe78df4e89f7913d1d560e9627b325


[jira] [Commented] (DRILL-4507) TO_TIMESTAMP does not generate TIMESTAMP data type in metadata

2016-03-14 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193516#comment-15193516
 ] 

Jacques Nadeau commented on DRILL-4507:
---

I believe this should be fixed by DRILL-4372 once it is merged. Sean, do you 
want to add a test case to confirm it as part of that patch?

> TO_TIMESTAMP does not generate TIMESTAMP data type in metadata
> --
>
> Key: DRILL-4507
> URL: https://issues.apache.org/jira/browse/DRILL-4507
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.5.0
>Reporter: Ian Hellstrom
>Assignee: Sean Hsuan-Yi Chu
>
> When creating a view that contains the TO_TIMESTAMP() casting function, the 
> resulting column does not show up as a TIMESTAMP but rather as data type ANY:
> {code}
> CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
> 'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
> DESCRIBE timestamp_test;
> {code}
> yields:
> {code}
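> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | EXPR$0       | ANY        | YES          |
> +--------------+------------+--------------+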
> {code}
> The same is true when using TO_DATE and SUBSTR.
> Explicit casts with CAST(ts AS TIMESTAMP) or CAST(str AS VARCHAR(10)) work as 
> expected.





[jira] [Updated] (DRILL-4507) TO_TIMESTAMP does not generate TIMESTAMP data type in metadata

2016-03-14 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4507:
--
Assignee: Sean Hsuan-Yi Chu



[jira] [Closed] (DRILL-3705) Query runs out of memory, reported as FAILED and leaves thread running

2016-03-14 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim closed DRILL-3705.
---
Resolution: Cannot Reproduce

I was able to reproduce the NPE on Drill 1.2.0, but it no longer reproduces on 
current master. The query still runs out of memory, but it no longer throws an 
NPE in the RPC layer.
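The jstack excerpt in the description shows the leftover fragment thread parked in Semaphore.acquire() inside SendingAccountor.waitForSendComplete(), called from BaseRootExec.close(). Below is a minimal Java sketch of that accounting pattern as inferred from the stack frames; it is not Drill's code, and SendAccountorSketch, batchSent() and ackReceived() are hypothetical names. Each sent batch adds one outstanding send and each acknowledgement releases one permit, so an untimed acquire never returns if an acknowledgement is lost (for example, because the connection was closed after the OOM). The timed wait is only for illustration; it is not the change that was made in Drill.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of per-fragment send accounting; not Drill's SendingAccountor. */
final class SendAccountorSketch {
  private final AtomicInteger unwaitedSends = new AtomicInteger();
  private final Semaphore acks = new Semaphore(0);

  /** Called when a batch is handed to the RPC layer. */
  void batchSent() {
    unwaitedSends.incrementAndGet();
  }

  /** Called when the receiver acknowledges a batch. */
  void ackReceived() {
    acks.release();
  }

  /**
   * Waits until every batch sent so far has been acknowledged. With an untimed
   * acks.acquire(n), a lost acknowledgement would park the caller forever,
   * consistent with the parked thread in the jstack; this variant gives up after
   * the timeout and returns false.
   */
  synchronized boolean waitForSendComplete(long timeout, TimeUnit unit)
      throws InterruptedException {
    int outstanding = unwaitedSends.getAndSet(0);
    if (outstanding == 0) {
      return true;
    }
    if (acks.tryAcquire(outstanding, timeout, unit)) {
      return true;
    }
    unwaitedSends.addAndGet(outstanding); // still outstanding; the caller decides what to do
    return false;
  }
}
```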

> Query runs out of memory, reported as FAILED and leaves thread running 
> ---
>
> Key: DRILL-3705
> URL: https://issues.apache.org/jira/browse/DRILL-3705
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: 2a2451ec-09d8-9f26-e856-5fd349ae72fd.sys.drill, 
> drillbit.log, jstack.txt
>
>
> Single node drill installation
> DRILL_MAX_DIRECT_MEMORY="2G"
> DRILL_HEAP="1G"
> Execute TPC-DS query 15 on SF100 (Parquet) with the settings above. It 
> reproduces 2 out of 3 times.
> {code}
> SELECT ca.ca_zip,
>Sum(cs.cs_sales_price)
> FROM   catalog_sales cs,
>customer c,
>customer_address ca,
>date_dim dd
> WHERE  cs.cs_bill_customer_sk = c.c_customer_sk
>AND c.c_current_addr_sk = ca.ca_address_sk
>AND ( Substr(ca.ca_zip, 1, 5) IN ( '85669', '86197', '88274', '83405',
>'86475', '85392', '85460', '80348',
>'81792' )
>   OR ca.ca_state IN ( 'CA', 'WA', 'GA' )
>   OR cs.cs_sales_price > 500 )
>AND cs.cs_sold_date_sk = dd.d_date_sk
>AND dd.d_qoy = 1
>AND dd.d_year = 1998
> GROUP  BY ca.ca_zip
> ORDER  BY ca.ca_zip
> LIMIT 100;
> {code}
> The query runs out of memory and is reported as FAILED (the expected result), 
> but it leaves a thread behind.
> Snippet from jstack:
> {code}
> "2a2451ec-09d8-9f26-e856-5fd349ae72fd:frag:4:0" daemon prio=10 
> tid=0x7f507414 nid=0x3000 waiting on condition [0x7f5055b66000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0xc012b038> (a 
> java.util.concurrent.Semaphore$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> at java.util.concurrent.Semaphore.acquire(Semaphore.java:472)
> at 
> org.apache.drill.exec.ops.SendingAccountor.waitForSendComplete(SendingAccountor.java:48)
> - locked <0xc012b068> (a 
> org.apache.drill.exec.ops.SendingAccountor)
> at 
> org.apache.drill.exec.ops.FragmentContext.waitForSendComplete(FragmentContext.java:436)
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:112)
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:341)
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:173)
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292)
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> NPE in drillbit.log:
> {code}
> 2015-08-24 23:52:04,486 [BitServer-5] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.133:31012 <--> /10.10.88.133:52417 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.NullPointerException
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:99)
>  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> 

[jira] [Updated] (DRILL-4376) Wrong results when doing a count(*) on part of directories with metadata cache

2016-03-14 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-4376:
--
Assignee: Deneche A. Hakim  (was: Aman Sinha)

> Wrong results when doing a count(*) on part of directories with metadata cache
> --
>
> Key: DRILL-4376
> URL: https://issues.apache.org/jira/browse/DRILL-4376
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.4.0
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.7.0
>
>
> First create some parquet tables in multiple subfolders:
> {noformat}
> create table dfs.tmp.`test/201501` as select employee_id, full_name from 
> cp.`employee.json` limit 2;
> create table dfs.tmp.`test/201502` as select employee_id, full_name from 
> cp.`employee.json` limit 2;
> create table dfs.tmp.`test/201601` as select employee_id, full_name from 
> cp.`employee.json` limit 2;
> create table dfs.tmp.`test/201602` as select employee_id, full_name from 
> cp.`employee.json` limit 2;
> {noformat}
> Running the following query gives the expected count:
> {noformat}
> select count(*) from dfs.tmp.`test/20160*`;
> +---------+
> | EXPR$0  |
> +---------+
> | 4       |
> +---------+
> {noformat}
> But once you create the metadata cache files, the query no longer returns the 
> correct results:
> {noformat}
> refresh table metadata dfs.tmp.`test`;
> select count(*) from dfs.tmp.`test/20160*`;
> +---------+
> | EXPR$0  |
> +---------+
> | 2       |
> +---------+
> {noformat}
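The expected count follows from the setup: each CTAS writes 2 rows (limit 2), and the pattern 20160* should select the two directories 201601 and 201602, so two directories of two rows each give 4. The cached result of 2 is consistent with only one of those directories being read. The Java sketch below checks that arithmetic, using java.nio's glob PathMatcher as a stand-in for Drill's own wildcard handling (an assumption: the two are only expected to agree for a simple trailing `*`); GlobRowCount and its methods are hypothetical names.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/** Which subdirectories a wildcard selects, and the row count that implies. */
final class GlobRowCount {

  /** Directory names matched by the glob; java.nio glob syntax stands in for Drill's. */
  static List<String> matching(String glob, List<String> dirs) {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    return dirs.stream()
        .filter(d -> matcher.matches(Paths.get(d)))
        .collect(Collectors.toList());
  }

  /** Expected count(*) when every matched directory holds the same number of rows. */
  static long expectedCount(String glob, List<String> dirs, long rowsPerDir) {
    return matching(glob, dirs).size() * rowsPerDir;
  }
}
```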





[jira] [Commented] (DRILL-4376) Wrong results when doing a count(*) on part of directories with metadata cache

2016-03-14 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193462#comment-15193462
 ] 

Aman Sinha commented on DRILL-4376:
---

Already reviewed along with DRILL-4484.

> Wrong results when doing a count(*) on part of directories with metadata cache
> --
>
> Key: DRILL-4376
> URL: https://issues.apache.org/jira/browse/DRILL-4376
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.4.0
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.7.0
>
>
> First create some parquet tables in multiple subfolders:
> {noformat}
> create table dfs.tmp.`test/201501` as select employee_id, full_name from 
> cp.`employee.json` limit 2;
> create table dfs.tmp.`test/201502` as select employee_id, full_name from 
> cp.`employee.json` limit 2;
> create table dfs.tmp.`test/201601` as select employee_id, full_name from 
> cp.`employee.json` limit 2;
> create table dfs.tmp.`test/201602` as select employee_id, full_name from 
> cp.`employee.json` limit 2;
> {noformat}
> Running the following query gives the expected count:
> {noformat}
> select count(*) from dfs.tmp.`test/20160*`;
> +---------+
> | EXPR$0  |
> +---------+
> | 4       |
> +---------+
> {noformat}
> But once you create the metadata cache files, the query no longer returns the 
> correct results:
> {noformat}
> refresh table metadata dfs.tmp.`test`;
> select count(*) from dfs.tmp.`test/20160*`;
> +---------+
> | EXPR$0  |
> +---------+
> | 2       |
> +---------+
> {noformat}





[jira] [Assigned] (DRILL-3705) Query runs out of memory, reported as FAILED and leaves thread running

2016-03-14 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reassigned DRILL-3705:
---

Assignee: Deneche A. Hakim  (was: Sudheesh Katkam)

> Query runs out of memory, reported as FAILED and leaves thread running 
> ---
>
> Key: DRILL-3705
> URL: https://issues.apache.org/jira/browse/DRILL-3705
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: 2a2451ec-09d8-9f26-e856-5fd349ae72fd.sys.drill, 
> drillbit.log, jstack.txt
>
>
> Single-node Drill installation:
> DRILL_MAX_DIRECT_MEMORY="2G"
> DRILL_HEAP="1G"
> Execute TPC-DS query 15 at scale factor 100 (Parquet) with the settings above. 
> Reproduces 2 out of 3 times.
> {code}
> SELECT ca.ca_zip,
>        Sum(cs.cs_sales_price)
> FROM   catalog_sales cs,
>        customer c,
>        customer_address ca,
>        date_dim dd
> WHERE  cs.cs_bill_customer_sk = c.c_customer_sk
>        AND c.c_current_addr_sk = ca.ca_address_sk
>        AND ( Substr(ca.ca_zip, 1, 5) IN ( '85669', '86197', '88274', '83405',
>                                           '86475', '85392', '85460', '80348',
>                                           '81792' )
>               OR ca.ca_state IN ( 'CA', 'WA', 'GA' )
>               OR cs.cs_sales_price > 500 )
>        AND cs.cs_sold_date_sk = dd.d_date_sk
>        AND dd.d_qoy = 1
>        AND dd.d_year = 1998
> GROUP  BY ca.ca_zip
> ORDER  BY ca.ca_zip
> LIMIT 100;
> {code}
> The query runs out of memory and is reported as FAILED (the expected result), 
> but it leaves a thread behind.
> Snippet from jstack:
> {code}
> "2a2451ec-09d8-9f26-e856-5fd349ae72fd:frag:4:0" daemon prio=10 
> tid=0x7f507414 nid=0x3000 waiting on condition [0x7f5055b66000]
>    java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0xc012b038> (a java.util.concurrent.Semaphore$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> at java.util.concurrent.Semaphore.acquire(Semaphore.java:472)
> at org.apache.drill.exec.ops.SendingAccountor.waitForSendComplete(SendingAccountor.java:48)
> - locked <0xc012b068> (a org.apache.drill.exec.ops.SendingAccountor)
> at org.apache.drill.exec.ops.FragmentContext.waitForSendComplete(FragmentContext.java:436)
> at org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:112)
> at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:341)
> at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:173)
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292)
> at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> NPE in drillbit.log:
> {code}
> 2015-08-24 23:52:04,486 [BitServer-5] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.133:31012 <--> /10.10.88.133:52417 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.NullPointerException
> at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:99) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:150) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
> at 
> 

[jira] [Updated] (DRILL-4507) TO_TIMESTAMP does not generate TIMESTAMP data type in metadata

2016-03-14 Thread Ian Hellstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Hellstrom updated DRILL-4507:
-
Description: 
When creating a view that contains the TO_TIMESTAMP() casting function, the 
resulting column does not show up as a TIMESTAMP but rather as data type ANY:

{code}
 CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
DESCRIBE timestamp_test;
{code}

yields:

{code}
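+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| EXPR$0       | ANY        | YES          |
+--------------+------------+--------------+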
{code}

The same is true when using TO_DATE and SUBSTR.

Explicit casts with CAST(ts AS TIMESTAMP) or CAST(str AS VARCHAR(10)) work as 
expected.

  was:
When creating a view that contains the TO_TIMESTAMP() casting function, the 
resulting column does not show up as a TIMESTAMP but rather as data type ANY:

{code}
 CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
DESCRIBE timestamp_test;
{code}

yields:

{code}
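+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| EXPR$0       | ANY        | YES          |
+--------------+------------+--------------+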
{code}

The same is true when using SUBSTR, which ought to return strings, but in 
reality shows up as ANY in the description.

Explicit casts with CAST(ts AS TIMESTAMP) or CAST(str AS VARCHAR(10)) work as 
expected.


> TO_TIMESTAMP does not generate TIMESTAMP data type in metadata
> --
>
> Key: DRILL-4507
> URL: https://issues.apache.org/jira/browse/DRILL-4507
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.5.0
>Reporter: Ian Hellstrom
>
> When creating a view that contains the TO_TIMESTAMP() casting function, the 
> resulting column does not show up as a TIMESTAMP but rather as data type ANY:
> {code}
>  CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
> 'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
> DESCRIBE timestamp_test;
> {code}
> yields:
> {code}
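> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | EXPR$0       | ANY        | YES          |
> +--------------+------------+--------------+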
> {code}
> The same is true when using TO_DATE and SUBSTR.
> Explicit casts with CAST(ts AS TIMESTAMP) or CAST(str AS VARCHAR(10)) work as 
> expected.
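The description notes that explicit casts produce the expected column types. The following is a minimal sketch of that workaround. It assumes that dfs.tmp is a writable workspace. The view name timestamp_test_cast and the column aliases ts and date_str are hypothetical. Each expression is wrapped in CAST so that the view definition records a concrete type instead of ANY:

{code}
-- Assumption: dfs.tmp is writable; the view name and aliases are hypothetical.
USE dfs.tmp;

CREATE VIEW timestamp_test_cast AS
SELECT CAST(TO_TIMESTAMP('2008-2-23 12:00:00', 'yyyy-MM-dd HH:mm:ss') AS TIMESTAMP) AS ts,
       CAST(SUBSTR('2008-2-23 12:00:00', 1, 9) AS VARCHAR(10)) AS date_str
FROM (VALUES(1));

-- According to the report, DESCRIBE should now list TIMESTAMP and VARCHAR
-- for these columns rather than ANY.
DESCRIBE timestamp_test_cast;
{code}

This only works around the symptom. The underlying issue is that the return types of TO_TIMESTAMP, TO_DATE and SUBSTR are not propagated into the view metadata.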





[jira] [Updated] (DRILL-4507) TO_TIMESTAMP does not generate TIMESTAMP data type in metadata

2016-03-14 Thread Ian Hellstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Hellstrom updated DRILL-4507:
-
Description: 
When creating a view that contains the TO_TIMESTAMP() casting function, the 
resulting column does not show up as a TIMESTAMP but rather as data type ANY:

{code}
 CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
DESCRIBE timestamp_test;
{code}

yields:

{code}
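+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| EXPR$0       | ANY        | YES          |
+--------------+------------+--------------+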
{code}

The same is true when using SUBSTR, which ought to return strings, but in 
reality shows up as ANY in the description.

Explicit casts with CAST(ts AS TIMESTAMP) or CAST(str AS VARCHAR(10)) work as 
expected.

  was:
When creating a view that contains the TO_TIMESTAMP() casting function, the 
resulting column does not show up as a TIMESTAMP but rather as data type ANY:

{code}
 CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
DESCRIBE timestamp_test;
{code}

yields:

{code}
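+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| EXPR$0       | ANY        | YES          |
+--------------+------------+--------------+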
{code}

The same is true when using SUBSTR, which ought to return strings, but in 
reality shows up as ANY in the description.


> TO_TIMESTAMP does not generate TIMESTAMP data type in metadata
> --
>
> Key: DRILL-4507
> URL: https://issues.apache.org/jira/browse/DRILL-4507
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.5.0
>Reporter: Ian Hellstrom
>
> When creating a view that contains the TO_TIMESTAMP() casting function, the 
> resulting column does not show up as a TIMESTAMP but rather as data type ANY:
> {code}
>  CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
> 'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
> DESCRIBE timestamp_test;
> {code}
> yields:
> {code}
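> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | EXPR$0       | ANY        | YES          |
> +--------------+------------+--------------+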
> {code}
> The same is true when using SUBSTR, which ought to return strings, but in 
> reality shows up as ANY in the description.
> Explicit casts with CAST(ts AS TIMESTAMP) or CAST(str AS VARCHAR(10)) work as 
> expected.





[jira] [Updated] (DRILL-4507) TO_TIMESTAMP does not generate TIMESTAMP data type in metadata

2016-03-14 Thread Ian Hellstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Hellstrom updated DRILL-4507:
-
Description: 
When creating a view that contains the {TO_TIMESTAMP()} casting function, the 
resulting column does not show up as a `TIMESTAMP` but rather as `ANY`:

{code}
 CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
DESCRIBE timestamp_test;
{code}

yields:

{code}
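+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| EXPR$0       | ANY        | YES          |
+--------------+------------+--------------+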
{code}

The same is true when using `SUBSTR`, which ought to return strings, but in 
reality shows up as `ANY` in the description.

  was:
When creating a view that contains the `TO_TIMESTAMP()` casting function, the 
resulting column does not show up as a `TIMESTAMP` but rather as `ANY`:

{code}
 CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
DESCRIBE timestamp_test;
{code}

yields:

{code}
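+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| EXPR$0       | ANY        | YES          |
+--------------+------------+--------------+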
{code}

The same is true when using `SUBSTR`, which ought to return strings, but in 
reality shows up as `ANY` in the description.


> TO_TIMESTAMP does not generate TIMESTAMP data type in metadata
> --
>
> Key: DRILL-4507
> URL: https://issues.apache.org/jira/browse/DRILL-4507
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.5.0
>Reporter: Ian Hellstrom
>
> When creating a view that contains the {TO_TIMESTAMP()} casting function, the 
> resulting column does not show up as a `TIMESTAMP` but rather as `ANY`:
> {code}
>  CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
> 'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
> DESCRIBE timestamp_test;
> {code}
> yields:
> {code}
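> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | EXPR$0       | ANY        | YES          |
> +--------------+------------+--------------+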
> {code}
> The same is true when using `SUBSTR`, which ought to return strings, but in 
> reality shows up as `ANY` in the description.





[jira] [Updated] (DRILL-4507) TO_TIMESTAMP does not generate TIMESTAMP data type in metadata

2016-03-14 Thread Ian Hellstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Hellstrom updated DRILL-4507:
-
Description: 
When creating a view that contains the TO_TIMESTAMP() casting function, the 
resulting column does not show up as a TIMESTAMP but rather as data type ANY:

{code}
 CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
DESCRIBE timestamp_test;
{code}

yields:

{code}
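+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| EXPR$0       | ANY        | YES          |
+--------------+------------+--------------+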
{code}

The same is true when using SUBSTR, which ought to return strings, but in 
reality shows up as ANY in the description.

  was:
When creating a view that contains the {TO_TIMESTAMP()} casting function, the 
resulting column does not show up as a `TIMESTAMP` but rather as `ANY`:

{code}
 CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
DESCRIBE timestamp_test;
{code}

yields:

{code}
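+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| EXPR$0       | ANY        | YES          |
+--------------+------------+--------------+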
{code}

The same is true when using `SUBSTR`, which ought to return strings, but in 
reality shows up as `ANY` in the description.


> TO_TIMESTAMP does not generate TIMESTAMP data type in metadata
> --
>
> Key: DRILL-4507
> URL: https://issues.apache.org/jira/browse/DRILL-4507
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.5.0
>Reporter: Ian Hellstrom
>
> When creating a view that contains the TO_TIMESTAMP() casting function, the 
> resulting column does not show up as a TIMESTAMP but rather as data type ANY:
> {code}
>  CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
> 'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
> DESCRIBE timestamp_test;
> {code}
> yields:
> {code}
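> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | EXPR$0       | ANY        | YES          |
> +--------------+------------+--------------+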
> {code}
> The same is true when using SUBSTR, which ought to return strings, but in 
> reality shows up as ANY in the description.





[jira] [Created] (DRILL-4507) TO_TIMESTAMP does not generate TIMESTAMP data type in metadata

2016-03-14 Thread Ian Hellstrom (JIRA)
Ian Hellstrom created DRILL-4507:


 Summary: TO_TIMESTAMP does not generate TIMESTAMP data type in 
metadata
 Key: DRILL-4507
 URL: https://issues.apache.org/jira/browse/DRILL-4507
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.5.0
Reporter: Ian Hellstrom


When creating a view that contains the `TO_TIMESTAMP()` casting function, the 
resulting column does not show up as a `TIMESTAMP` but rather as `ANY`:

{code}
 CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
'yyyy-MM-dd HH:mm:ss') FROM (VALUES(1));
DESCRIBE timestamp_test;
{code}

yields:

{code}
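+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| EXPR$0       | ANY        | YES          |
+--------------+------------+--------------+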
{code}

The same is true when using `SUBSTR`, which ought to return strings, but in 
reality shows up as `ANY` in the description.





[jira] [Updated] (DRILL-4497) Casting strings with leading/trailing spaces to integers does not work

2016-03-14 Thread Ian Hellstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Hellstrom updated DRILL-4497:
-
Affects Version/s: 1.5.0

> Casting strings with leading/trailing spaces to integers does not work
> --
>
> Key: DRILL-4497
> URL: https://issues.apache.org/jira/browse/DRILL-4497
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Ian Hellstrom
>
> When casting a string with leading and/or trailing spaces to an integer type 
> (e.g. INT or BIGINT), an exception is thrown, whereas casting the same strings 
> to floating-point types works.
> This is inconsistent and extremely confusing. Applying TRIM() before the cast 
> works, though.
> {code}
> SELECT CAST(' 1' AS INT) FROM ... 
> SELECT CAST('1 ' AS INT) FROM ...
> {code}
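The description reports that trimming before the cast avoids the exception. The following is a minimal sketch of that workaround. It uses a single-row VALUES clause in place of the elided FROM target, and the column aliases are illustrative only:

{code}
-- TRIM strips leading and trailing spaces, so each integer cast receives '1'.
SELECT CAST(TRIM(' 1') AS INT) AS leading_space,
       CAST(TRIM('1 ') AS INT) AS trailing_space
FROM (VALUES(1));
{code}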





[jira] [Updated] (DRILL-4506) Allow substitution variables in SQL scripts

2016-03-14 Thread Ian Hellstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Hellstrom updated DRILL-4506:
-
Affects Version/s: 1.5.0

> Allow substitution variables in SQL scripts
> ---
>
> Key: DRILL-4506
> URL: https://issues.apache.org/jira/browse/DRILL-4506
> Project: Apache Drill
>  Issue Type: Wish
>  Components: SQL Parser
>Affects Versions: 1.5.0
>Reporter: Ian Hellstrom
>  Labels: features
>
> It would be great if substitution variables could be defined in scripts, à la 
> SQL*Plus. This would be especially helpful when objects need to be 
> (re-)created from scratch (e.g. in testing) and share common elements, such as 
> workspaces, that otherwise have to be hard-coded.
> The following is a rough idea (based on SQL*Plus syntax):
> {code}
> IN_LOCATION=hdfs.project
> OUT_LOCATION=hdfs.storage
> CREATE VIEW _LOCATION.view_name AS
> SELECT * FROM _LOCATION.table_name;
> {code}
> For a first implementation, it would be best to support only simple 
> substitution variables without computations; computations could conceivably 
> be added as a second step.





[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-14 Thread Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192874#comment-15192874
 ] 

Shankar commented on DRILL-4474:


{color:green}
Thanks, the counts are correct now.
{color}

I ran "mvn clean install" and used the generated tarball 
(./distribution/target/apache-drill-1.6.0-SNAPSHOT.tar.gz).
*Should I use this generated tarball to query my production data?*

I hope this fix will be included in upcoming releases.


Below are the updated results for your reference:


{quote}
{noformat}

select  
count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then 
sessionid end) as cnt
from dfs.tmp.a_games_log_visit_base t
; 
+-------+
|  cnt  |
+-------+
| 2108  |
+-------+
1 row selected (17.883 seconds)

select  
count(sessionid), 
count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then 
sessionid end) as cnt
from dfs.tmp.a_games_log_visit_base t
; 
+-----------+-------+
|  EXPR$0   |  cnt  |
+-----------+-------+
| 37772844  | 2108  |
+-----------+-------+
1 row selected (19.085 seconds)



select  
count(distinct sessionid), 
count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then 
sessionid end) as cnt
from dfs.tmp.a_games_log_visit_base t
; 
+---------+-------+
| EXPR$0  |  cnt  |
+---------+-------+
| 201941  | 2108  |
+---------+-------+
1 row selected (24.638 seconds)





select  
count(distinct case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' 
then sessionid end) as cnt
from dfs.tmp.a_games_log_visit_base t
;
+------+
| cnt  |
+------+
| 525  |
+------+
1 row selected (17.393 seconds)




select  
count(sessionid),
count(distinct sessionid)
from dfs.tmp.a_games_log_visit_base t
where ( t.id = '/confirmDrop/btnYes/' and t.event = 'Click')
;
+---------+---------+
| EXPR$0  | EXPR$1  |
+---------+---------+
| 2108    | 525     |
+---------+---------+
1 row selected (26.083 seconds)

{noformat}

{quote}
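The following is a single-query cross-check. It is a sketch that reuses the table and column names from the queries above, and the aliases cnt_case and cnt_sum are illustrative.

COUNT ignores NULLs, and the CASE expression yields NULL whenever the condition is not true. The two columns below therefore count the same rows and should always agree. On this data, both should also match the 2108 returned by the filtered count(sessionid) query above. A mismatch would point to the aggregation bug rather than to the data:

{code}
select
-- counts non-NULL sessionid values where the condition holds
count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt_case,
-- the same count expressed as a sum of 1/0 flags
sum(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' and sessionid is not null then 1 else 0 end) as cnt_sum
from dfs.tmp.a_games_log_visit_base t
;
{code}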


> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> --
>
> Key: DRILL-4474
> URL: https://issues.apache.org/jira/browse/DRILL-4474
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.5.0
> Environment: m3.xlarge AWS instances (3 nodes)
> CentOS6.5 x64
>Reporter: Shankar
>Assignee: Jacques Nadeau
>Priority: Blocker
>
> {quote}
> * We are using Drill to retrieve business data from game analytics.
> * We are running the queries below on a table of size 50 GB (Parquet).
> * We have found a major inconsistency in the data when we use the COUNT function.
> * Below are the queries, case by case, and their output. {color:blue}*Please 
> analyse them carefully for a clear understanding of the behaviour.*{color}
> * Please let me know how to resolve this (or whether an earlier JIRA has 
> already been created).
> * I hope this can be fixed in later versions. If not, please take the necessary action.
> {quote}
> --
> CASE-1 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+
> |   count   |
> +-----------+
> | 27645752  |
> +-----------+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-2 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+-------+
> |  EXPR$0   |  cnt  |
> +-----------+-------+
> | 37772844  | 2108  |
> +-----------+-------+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-3 (Wrong result, only first count is correct)
> --
> 

[jira] [Issue Comment Deleted] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-14 Thread Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shankar updated DRILL-4474:
---
Comment: was deleted

(was: {color:green}
Thanks, the counts are correct now.
{color}

I ran "mvn clean install" and used the generated tarball 
(./distribution/target/apache-drill-1.6.0-SNAPSHOT.tar.gz).
*Should I use this generated tarball to query my production data?*

I hope this fix will be included in upcoming releases.


Below are the updated results for your reference:


{quote}
{noformat}

select  
count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then 
sessionid end) as cnt
from dfs.tmp.a_games_log_visit_base t
; 
+-------+
|  cnt  |
+-------+
| 2108  |
+-------+
1 row selected (17.883 seconds)

select  
count(sessionid), 
count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then 
sessionid end) as cnt
from dfs.tmp.a_games_log_visit_base t
; 
+-----------+-------+
|  EXPR$0   |  cnt  |
+-----------+-------+
| 37772844  | 2108  |
+-----------+-------+
1 row selected (19.085 seconds)



select  
count(distinct sessionid), 
count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then 
sessionid end) as cnt
from dfs.tmp.a_games_log_visit_base t
; 
+---------+-------+
| EXPR$0  |  cnt  |
+---------+-------+
| 201941  | 2108  |
+---------+-------+
1 row selected (24.638 seconds)





select  
count(distinct case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' 
then sessionid end) as cnt
from dfs.tmp.a_games_log_visit_base t
;
+------+
| cnt  |
+------+
| 525  |
+------+
1 row selected (17.393 seconds)




select  
count(sessionid),
count(distinct sessionid)
from dfs.tmp.a_games_log_visit_base t
where ( t.id = '/confirmDrop/btnYes/' and t.event = 'Click')
;
+---------+---------+
| EXPR$0  | EXPR$1  |
+---------+---------+
| 2108    | 525     |
+---------+---------+
1 row selected (26.083 seconds)

{noformat}

{quote}
)

> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> --
>
> Key: DRILL-4474
> URL: https://issues.apache.org/jira/browse/DRILL-4474
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.5.0
> Environment: m3.xlarge AWS instances (3 nodes)
> CentOS6.5 x64
>Reporter: Shankar
>Assignee: Jacques Nadeau
>Priority: Blocker
>
> {quote}
> * We are using Drill to retrieve business data from game analytics.
> * We are running the queries below on a table of size 50 GB (Parquet).
> * We have found a major inconsistency in the data when we use the COUNT function.
> * Below are the queries, case by case, and their output. {color:blue}*Please 
> analyse them carefully for a clear understanding of the behaviour.*{color}
> * Please let me know how to resolve this (or whether an earlier JIRA has 
> already been created).
> * I hope this can be fixed in later versions. If not, please take the necessary action.
> {quote}
> --
> CASE-1 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+
> |   count   |
> +-----------+
> | 27645752  |
> +-----------+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-2 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+-------+
> |  EXPR$0   |  cnt  |
> +-----------+-------+
> | 37772844  | 2108  |
> +-----------+-------+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-3 (Wrong result, only first count is correct)
> --
> 

[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-14 Thread Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192873#comment-15192873
 ] 

Shankar commented on DRILL-4474:


{color:green}
Thanks, the counts are correct now.
{color}

I ran "mvn clean install" and used the generated tarball 
(./distribution/target/apache-drill-1.6.0-SNAPSHOT.tar.gz).
*Should I use this generated tarball to query my production data?*

I hope this fix will be included in upcoming releases.


Below are the updated results for your reference:


{quote}
{noformat}

select  
count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then 
sessionid end) as cnt
from dfs.tmp.a_games_log_visit_base t
; 
+-------+
|  cnt  |
+-------+
| 2108  |
+-------+
1 row selected (17.883 seconds)

select  
count(sessionid), 
count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then 
sessionid end) as cnt
from dfs.tmp.a_games_log_visit_base t
; 
+-----------+-------+
|  EXPR$0   |  cnt  |
+-----------+-------+
| 37772844  | 2108  |
+-----------+-------+
1 row selected (19.085 seconds)



select  
count(distinct sessionid), 
count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then 
sessionid end) as cnt
from dfs.tmp.a_games_log_visit_base t
; 
+---------+-------+
| EXPR$0  |  cnt  |
+---------+-------+
| 201941  | 2108  |
+---------+-------+
1 row selected (24.638 seconds)





select  
count(distinct case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' 
then sessionid end) as cnt
from dfs.tmp.a_games_log_visit_base t
;
+------+
| cnt  |
+------+
| 525  |
+------+
1 row selected (17.393 seconds)




select  
count(sessionid),
count(distinct sessionid)
from dfs.tmp.a_games_log_visit_base t
where ( t.id = '/confirmDrop/btnYes/' and t.event = 'Click')
;
+---------+---------+
| EXPR$0  | EXPR$1  |
+---------+---------+
| 2108    | 525     |
+---------+---------+
1 row selected (26.083 seconds)

{noformat}

{quote}


> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> --
>
> Key: DRILL-4474
> URL: https://issues.apache.org/jira/browse/DRILL-4474
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.5.0
> Environment: m3.xlarge AWS instances (3 nodes)
> CentOS6.5 x64
>Reporter: Shankar
>Assignee: Jacques Nadeau
>Priority: Blocker
>
> {quote}
> * We are using Drill to retrieve business data from game analytics.
> * We are running the queries below on a table of size 50 GB (Parquet).
> * We have found a major inconsistency in the data when we use the COUNT function.
> * Below are the queries, case by case, and their output. {color:blue}*Please 
> analyse them carefully for a clear understanding of the behaviour.*{color}
> * Please let me know how to resolve this (or whether an earlier JIRA has 
> already been created).
> * I hope this can be fixed in later versions. If not, please take the necessary action.
> {quote}
> --
> CASE-1 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+
> |   count   |
> +-----------+
> | 27645752  |
> +-----------+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-2 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+-------+
> |  EXPR$0   |  cnt  |
> +-----------+-------+
> | 37772844  | 2108  |
> +-----------+-------+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-3 (Wrong result, only first count is correct)
> --
> 

[jira] [Commented] (DRILL-2217) Trying to flatten an empty list should return an empty result

2016-03-14 Thread Ian Hellstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192830#comment-15192830
 ] 

Ian Hellstrom commented on DRILL-2217:
--

What is the status of this issue? 

It is hard to believe that support for JSON stops at empty arrays. Drill does 
proclaim that it 'features a JSON data model that enables queries on 
complex/nested data as well as rapidly evolving structures ...'. 

It is vital that a) no exceptions are thrown when empty arrays are encountered, 
and b) it is possible to return the outer bits of a nested data structure even 
when the array itself is empty. The latter is similar to Hive's LATERAL VIEW 
OUTER syntax.

> Trying to flatten an empty list should return an empty result
> -
>
> Key: DRILL-2217
> URL: https://issues.apache.org/jira/browse/DRILL-2217
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Rahul Challapalli
> Fix For: Future
>
> Attachments: error.log
>
>
> git.commit.id.abbrev=3d863b5
> Data Set :
> {code}
> {"empty":[[],[[]]]}
> {code}
> Query :
> {code}
> select flatten(empty) from `data1.json`;
> Query failed: RemoteRpcException: Failure while running fragment.[ 
> 1b3123d9-92bc-45d5-bef8-b5f1be9def07 on qa-node191.qa.lab:31010 ]
> [ 1b3123d9-92bc-45d5-bef8-b5f1be9def07 on qa-node191.qa.lab:31010 ]
> {code}
> I have also attached the error from the logs.





[jira] [Updated] (DRILL-4506) Allow substitution variables in SQL scripts

2016-03-14 Thread Ian Hellstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Hellstrom updated DRILL-4506:
-
Issue Type: Wish  (was: Bug)

> Allow substitution variables in SQL scripts
> ---
>
> Key: DRILL-4506
> URL: https://issues.apache.org/jira/browse/DRILL-4506
> Project: Apache Drill
>  Issue Type: Wish
>  Components: SQL Parser
>Reporter: Ian Hellstrom
>  Labels: features
>
> It would be great if substitution variables could be defined in scripts, à la 
> SQL*Plus. This would be especially helpful when objects need to be 
> (re-)created from scratch (e.g. in testing) and share common values such as 
> workspaces, which otherwise have to be hard-coded.
> The following is a rough idea (based on SQL*Plus syntax):
> {code}
> IN_LOCATION=hdfs.project
> OUT_LOCATION=hdfs.storage
> CREATE VIEW &OUT_LOCATION.view_name AS
> SELECT * FROM &IN_LOCATION.table_name;
> {code}
> For a first implementation, it would be best to support only plain 
> substitution variables without computations, although computations could 
> conceivably be added in a second step.





[jira] [Created] (DRILL-4506) Allow substitution variables in SQL scripts

2016-03-14 Thread Ian Hellstrom (JIRA)
Ian Hellstrom created DRILL-4506:


 Summary: Allow substitution variables in SQL scripts
 Key: DRILL-4506
 URL: https://issues.apache.org/jira/browse/DRILL-4506
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Reporter: Ian Hellstrom


It would be great if substitution variables could be defined in scripts, à la 
SQL*Plus. This would be especially helpful when objects need to be (re-)created 
from scratch (e.g. in testing) and share common values such as workspaces, which 
otherwise have to be hard-coded.

The following is a rough idea (based on SQL*Plus syntax):

{code}
IN_LOCATION=hdfs.project
OUT_LOCATION=hdfs.storage

CREATE VIEW &OUT_LOCATION.view_name AS
SELECT * FROM &IN_LOCATION.table_name;
{code}

For a first implementation, it would be best to support only plain substitution 
variables without computations, although computations could conceivably be 
added in a second step.
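A first implementation limited to plain substitution could work as a text preprocessor that runs before the SQL reaches the parser. The Python sketch below assumes a hypothetical syntax that is not an existing Drill feature: a line of the form `NAME=value` defines a variable, and `&NAME` anywhere later is replaced by its value as plain text, with no expressions evaluated. The function name `expand_script` is likewise illustrative.

```python
import re
from typing import Dict, List

# Hypothetical syntax (an assumption, not an existing Drill feature):
#   NAME=value   on a line of its own defines a variable
#   &NAME        elsewhere is replaced by the value as plain text
DEFINE_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(\S+)\s*$")
REF_RE = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*)")


def expand_script(script: str) -> str:
    """Drop definition lines and substitute &NAME references in the remaining SQL."""
    variables: Dict[str, str] = {}
    output: List[str] = []

    def substitute(ref: "re.Match[str]") -> str:
        name = ref.group(1)
        if name not in variables:
            raise ValueError(f"undefined substitution variable: {name}")
        # Plain text replacement only: no expressions are evaluated.
        return variables[name]

    for line in script.splitlines():
        definition = DEFINE_RE.match(line)
        if definition:
            # A definition applies to the lines that follow it.
            variables[definition.group(1)] = definition.group(2)
            continue
        output.append(REF_RE.sub(substitute, line))
    return "\n".join(output)
```

Here a variable name ends at the first character that cannot be part of an identifier, so `&OUT_LOCATION.view_name` expands to `hdfs.storage.view_name`. Real SQL*Plus differs: a period directly after the name acts as a terminator and is dropped, so the equivalent there is `&OUT_LOCATION..view_name`.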


