[jira] [Commented] (DRILL-6223) Drill fails on Schema changes

2018-03-30 Thread salim achouche (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421106#comment-16421106
 ] 

salim achouche commented on DRILL-6223:
---

Thanks Paul for your feedback!

I wanted to bring to your attention another implementation detail that we also 
need to take a closer look at: the current implementation of a column with all 
nulls is not cheap:
 * Consider a batch of 32k rows
 * A value vector (VV) with null integer values will require 32KB (bits) + 
32KB * 4 = 160KB
 * Each missing column will require that much memory per minor fragment
 * This holds unless (as with the implicit columns) we optimize the VV 
storage representation and/or push the column preservation to higher layers 
such as the client or foreman
 * I understand that handling a few missing columns this way is fine, but this 
will not be the case if we are talking about dozens of such columns, especially 
now that we are ramping up our support for operator-based batch sizing
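
The arithmetic above can be sketched in a few lines of Java. This is only a back-of-the-envelope estimate: the one-byte-per-row "bits" vector is an assumption read off the 32KB figure in the comment, and the class and method names are hypothetical, not Drill APIs.

```java
// Rough memory estimate for an all-null nullable-int column.
// Assumption (from the comment's arithmetic): the "bits" vector costs one
// byte per row, and the int data vector costs four bytes per row.
final class NullColumnCost {
  static final int ROWS_PER_BATCH = 32 * 1024;          // "32k rows"
  static final int BITS_BYTES_PER_ROW = 1;              // assumed is-set flag size
  static final int INT_BYTES_PER_ROW = Integer.BYTES;   // 4

  /** Bytes needed by one all-null nullable-int column in one batch. */
  static long bytesPerColumn() {
    return (long) ROWS_PER_BATCH * (BITS_BYTES_PER_ROW + INT_BYTES_PER_ROW);
  }

  /** Bytes needed when several columns are missing in several minor fragments. */
  static long totalBytes(int missingColumns, int minorFragments) {
    return bytesPerColumn() * missingColumns * minorFragments;
  }

  public static void main(String[] args) {
    System.out.println(bytesPerColumn() / 1024 + " KB per column per batch");
    System.out.println(totalBytes(24, 10) / 1024 + " KB for 24 columns in 10 fragments");
  }
}
```

The column and fragment counts in `main` are made-up inputs; the point is only that the cost scales linearly with both.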

> Drill fails on Schema changes 
> --
>
> Key: DRILL-6223
> URL: https://issues.apache.org/jira/browse/DRILL-6223
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
> Affects Versions: 1.10.0, 1.12.0
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Major
> Fix For: 1.14.0
>
>
> A Drill query fails when selecting all columns from a set of complex nested 
> data files (Parquet). The schema differs among the files:
>  * The Parquet files differ both at the first level and within nested data 
> types
>  * A select * will not cause an exception but using a limit clause will
>  * Note also that this issue seems to happen only when multiple Drillbit 
> minor fragments are involved (concurrency higher than one)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6289) Cluster view should show more relevant information

2018-03-30 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421101#comment-16421101
 ] 

Kunal Khatua commented on DRILL-6289:
-

Note: Requires DRILL-6279 for the Glyphicons package

> Cluster view should show more relevant information
> --
>
> Key: DRILL-6289
> URL: https://issues.apache.org/jira/browse/DRILL-6289
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
> Affects Versions: 1.13.0
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Priority: Major
> Fix For: 1.14.0
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> When fixing DRILL-6224, I noticed that the same information can be very 
> useful to have in the cluster view shown on a Drillbit's homepage. 
> The proposal is to show the following:
> # Heap Memory in use
> # Direct Memory (actively) in use: we are not able to get the total memory 
> held by Netty at the moment, only what is currently allocated to running 
> queries
> # Process CPU
> # Average (System) Load Factor 
> Information such as the port numbers doesn't help much when assessing general 
> cluster health, so it might be worth removing it if more real estate is 
> needed.





[jira] [Comment Edited] (DRILL-6289) Cluster view should show more relevant information

2018-03-30 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421101#comment-16421101
 ] 

Kunal Khatua edited comment on DRILL-6289 at 3/31/18 12:30 AM:
---

Note: Requires the pull request for DRILL-6279, which carries the Glyphicons 
package bundled with Bootstrap


was (Author: kkhatua):
Note: Requires DRILL-6279 for the Glyphicons package



[jira] [Commented] (DRILL-6223) Drill fails on Schema changes

2018-03-30 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421091#comment-16421091
 ] 

Paul Rogers commented on DRILL-6223:


[~sachouche], thanks for the explanation, very helpful. The tests will help 
clarify the original problem and the fix.

Looking at the code, it does appear we try to prune unused columns (there are 
references to used columns; which I naively assumed meant we are separating the 
used from unused, perhaps I'm wrong.)

If we cannot correctly handle a schema change (according to whatever semantics 
we decide we want), then we need to kill the query rather than produce invalid 
results.

On the dynamically adding columns: a careful reading will show that the 
suggestion is to *preserve* columns, not create them. The discussion was around 
when we can preserve columns (columns appeared in first batch, then 
disappeared) and when we can't (columns appear in second or later batch.)

This PR will be solid if we do three things:

* Avoid memory corruption (the primary goal here, and a good one)
* Add unit tests that verify the fix
* Avoid introducing new semantics (dropping columns) as that just digs us 
deeper into the schema-free mess. Instead, fail the query if we are given 
schemas we can't reconcile.
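
To make the preserve-versus-fail semantics above concrete, here is a minimal Java sketch. It is not Drill code: `SchemaReconciler`, `IrreconcilableSchemaException`, and the idea of representing a schema as a list of column names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the first batch fixes the schema. Columns that later
// disappear are preserved (the caller would null-fill them); a column that
// first appears in a later batch cannot be reconciled, so the query fails.
final class SchemaReconciler {
  static final class IrreconcilableSchemaException extends RuntimeException {
    IrreconcilableSchemaException(String message) {
      super(message);
    }
  }

  private final Set<String> established = new LinkedHashSet<>();
  private boolean firstBatchSeen = false;

  /** Returns the columns to emit for this batch, or throws if the batch adds a column. */
  List<String> reconcile(List<String> batchColumns) {
    if (!firstBatchSeen) {
      established.addAll(batchColumns);
      firstBatchSeen = true;
    } else {
      for (String column : batchColumns) {
        if (!established.contains(column)) {
          throw new IrreconcilableSchemaException(
              "Column '" + column + "' appeared after the first batch");
        }
      }
    }
    return new ArrayList<>(established);
  }
}
```

With this sketch, a first batch of (a, b) followed by a batch of (a) still yields (a, b), while a later batch of (a, c) fails the query instead of silently changing the output schema.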




[jira] [Commented] (DRILL-6294) Update Calcite version to 1.16.0

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421052#comment-16421052
 ] 

ASF GitHub Bot commented on DRILL-6294:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1198#discussion_r178404654
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -1142,16 +1142,8 @@ public void moveToCurrentRow() throws SQLException {
   }
 
   @Override
-  public AvaticaStatement getStatement() {
-    try {
-      throwIfClosed();
-    } catch (AlreadyClosedSqlException e) {
-      // Can't throw any SQLException because AvaticaConnection's
-      // getStatement() is missing "throws SQLException".
-      throw new RuntimeException(e.getMessage(), e);
-    } catch (SQLException e) {
-      throw new RuntimeException(e.getMessage(), e);
-    }
+  public AvaticaStatement getStatement() throws SQLException {
+    throwIfClosed();
--- End diff --

Thanks, it looks clearer now.


> Update Calcite version to 1.16.0
> 
>
> Key: DRILL-6294
> URL: https://issues.apache.org/jira/browse/DRILL-6294
> Project: Apache Drill
>  Issue Type: Task
> Affects Versions: 1.13.0
> Reporter: Volodymyr Vysotskyi
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.14.0
>
>
> Upgrade to Calcite version 1.16.0.
>  Since the last upgrade to Calcite 1.15, several commits have remained in the 
> Drill-Calcite fork. Since no additional work was done to move those commits 
> out of the fork, they will be placed on top of Calcite 1.16.
>  Status from the last upgrade:
>  
> [https://docs.google.com/document/d/1Lqk9NoKQviz0YimBmov4z1pui7QjJGjDVwMa1p0emPk/edit#heading=h.i3rowg20vxv4]





[jira] [Commented] (DRILL-6287) apache-release profile should be disabled by default

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421047#comment-16421047
 ] 

ASF GitHub Bot commented on DRILL-6287:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1182
  
@parthchandra Please review


> apache-release profile should be disabled by default
> 
>
> Key: DRILL-6287
> URL: https://issues.apache.org/jira/browse/DRILL-6287
> Project: Apache Drill
>  Issue Type: Bug
> Reporter: Vlad Rozov
> Assignee: Vlad Rozov
> Priority: Minor
>






[jira] [Commented] (DRILL-6294) Update Calcite version to 1.16.0

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421016#comment-16421016
 ] 

ASF GitHub Bot commented on DRILL-6294:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1198#discussion_r178395314
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -1142,16 +1142,8 @@ public void moveToCurrentRow() throws SQLException {
   }
 
   @Override
-  public AvaticaStatement getStatement() {
-    try {
-      throwIfClosed();
-    } catch (AlreadyClosedSqlException e) {
-      // Can't throw any SQLException because AvaticaConnection's
-      // getStatement() is missing "throws SQLException".
-      throw new RuntimeException(e.getMessage(), e);
-    } catch (SQLException e) {
-      throw new RuntimeException(e.getMessage(), e);
-    }
+  public AvaticaStatement getStatement() throws SQLException {
+    throwIfClosed();
--- End diff --

Since you are touching this file, you might want to remove the unneeded 
exceptions declared for the throwIfClosed() method that derive from 
SQLException.
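
As a small illustration of the point, a `throws` clause that already lists `SQLException` does not need to list its subclasses. The sketch below uses a hypothetical stand-in for Drill's `AlreadyClosedSqlException` (assumed here, as the diff's catch order suggests, to extend `SQLException`); it is not the actual Drill class.

```java
import java.sql.SQLException;

final class ThrowsClauseSketch {
  /** Hypothetical stand-in for Drill's AlreadyClosedSqlException. */
  static final class AlreadyClosedSqlException extends SQLException {
    AlreadyClosedSqlException(String message) {
      super(message);
    }
  }

  private boolean closed;

  void close() {
    closed = true;
  }

  // Declaring "throws AlreadyClosedSqlException, SQLException" would compile,
  // but the subclass entry is redundant: SQLException already covers it.
  void throwIfClosed() throws SQLException {
    if (closed) {
      throw new AlreadyClosedSqlException("ResultSet is already closed.");
    }
  }
}
```

Callers that care about the specific subclass can still catch it explicitly; the shorter clause only removes noise from the signature.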




[jira] [Commented] (DRILL-6223) Drill fails on Schema changes

2018-03-30 Thread salim achouche (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420926#comment-16420926
 ] 

salim achouche commented on DRILL-6223:
---

* This PR is only to avoid corruption now that memory checks are disabled
 * As I said, we all agree that Drill enabled an ill-defined functionality
 ** We have the opportunity to discuss, clarify, and formalize it in a 
dedicated JIRA
 * Meanwhile, what to do with the current bugs?
 ** Let's use the following example (which has nothing to do with Schema 
Changes but instead is a byproduct of this functionality); assume the following 
FS structure
 *** ROOT/my_data/T1/{column-c1}, {column-c2}, ..
 *** ROOT/my_data/T2/{column-c1}, {column-c2}, ..
 ** Assume you issue the following query:
 *** SELECT * from dfs.`ROOT/my_data/*`;
 ** The current code will blindly attempt to read the files thinking they 
originate from the same schema
 *** The chance of dangling columns is extremely high
 *** What do we do?
 **** We can either pretend this is a schema change issue and try to address 
it by inserting compensation logic
 **** Or avoid corruption by either failing the query or removing dangling 
columns
 **** I chose the latter solution because I don't have clarity on the Schema 
Changes functionality
 **** It occurred to me that we can also disable Schema Change logic for 
SELECT_STAR queries
 ** To your point about compensation logic in the context of Schema Changes
 *** Why do you think it is ok to dynamically include new columns?
 *** Yet it is not ok to exclude them?
 *** Why did we include a mechanism to report schema changes to the JDBC 
client? Maybe we thought the consumer app is in a better position to handle 
such events, in which case any compensation logic is unnecessary (?)
 * With regard to tests
 ** I will add a test that triggers this condition
 ** The test will be deemed successful if there are no runtime failures
 ** Whether we should add missing columns or not is still being debated and 
outside the scope of this JIRA
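
For readers following along, the "removing dangling columns" option can be sketched as a set computation. This is an illustration only: it assumes "dangling" means a column that is present in some files but not in all of them, and `DanglingColumns` and its methods are hypothetical names, not part of the PR.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: given the column names of each file under
// ROOT/my_data/*, keep only the columns every file has and report the rest.
final class DanglingColumns {
  /** Columns present in every file, in the order of the first file. */
  static Set<String> commonColumns(List<List<String>> fileSchemas) {
    Set<String> common = new LinkedHashSet<>();
    if (fileSchemas.isEmpty()) {
      return common;
    }
    common.addAll(fileSchemas.get(0));
    for (List<String> schema : fileSchemas) {
      common.retainAll(schema);
    }
    return common;
  }

  /** Columns present in at least one file but not in all of them. */
  static Set<String> danglingColumns(List<List<String>> fileSchemas) {
    Set<String> dangling = new LinkedHashSet<>();
    for (List<String> schema : fileSchemas) {
      dangling.addAll(schema);
    }
    dangling.removeAll(commonColumns(fileSchemas));
    return dangling;
  }

  public static void main(String[] args) {
    // Made-up schemas for T1 and T2; the real example elides the column lists.
    List<List<String>> schemas = Arrays.asList(
        Arrays.asList("c1", "c2", "c3"),
        Arrays.asList("c1", "c2", "c4"));
    System.out.println("kept:    " + commonColumns(schemas));
    System.out.println("dropped: " + danglingColumns(schemas));
  }
}
```

The alternative discussed in this thread, failing the query, would simply throw whenever `danglingColumns` is non-empty.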



[jira] [Commented] (DRILL-6286) Regression: incorrect reference to shutdown in drillbit.log

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420924#comment-16420924
 ] 

ASF GitHub Bot commented on DRILL-6286:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1196#discussion_r178375583
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ---
@@ -86,6 +86,9 @@
   private final StatusThread statusThread;
   private final Lock isEmptyLock = new ReentrantLock();
   private final Condition isEmptyCondition = isEmptyLock.newCondition();
+  private boolean isShutdownTriggered = false;
--- End diff --

Is this boolean necessary? Can you delay getting `isEmptyCondition` till 
shutdown is requested and use it in place of `isShutdownTriggered`?
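
One way to read this suggestion, sketched under assumptions: leave the condition null until shutdown is requested, so that its presence doubles as the flag. The class below is a standalone illustration with hypothetical method names, not the actual `WorkManager` code.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

final class ShutdownSignal {
  private final Lock isEmptyLock = new ReentrantLock();
  // Null until shutdown is requested; a non-null value means "shutdown
  // triggered". Only read or written while holding isEmptyLock.
  private Condition isEmptyCondition;

  /** Called once shutdown starts; creates the condition lazily. */
  void requestShutdown() {
    isEmptyLock.lock();
    try {
      if (isEmptyCondition == null) {
        isEmptyCondition = isEmptyLock.newCondition();
      }
    } finally {
      isEmptyLock.unlock();
    }
  }

  /** Returns true only if shutdown was requested and a waiter was signalled. */
  boolean indicateIfSafeToExit() {
    isEmptyLock.lock();
    try {
      if (isEmptyCondition == null) {
        return false; // no shutdown in progress: nothing to log or signal
      }
      isEmptyCondition.signal();
      return true;
    } finally {
      isEmptyLock.unlock();
    }
  }
}
```

Because the field is only touched under the lock, no separate boolean (or `volatile`) is needed in this sketch.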


> Regression: incorrect reference to shutdown in drillbit.log
> ---
>
> Key: DRILL-6286
> URL: https://issues.apache.org/jira/browse/DRILL-6286
> Project: Apache Drill
>  Issue Type: Bug
> Reporter: Vlad Rozov
> Assignee: Venkata Jyothsna Donapati
> Priority: Major
> Fix For: 1.14.0
>
>
> drillbit.log refers to shutdown even in cases when no shutdown sequence was 
> initiated:
> {noformat}
> 2018-03-16 11:55:52,693 [drill-executor-19] INFO  
> o.apache.drill.exec.work.WorkManager - Waiting for 0 queries to complete 
> before shutting down
> 2018-03-16 11:55:52,693 [drill-executor-19] INFO  
> o.apache.drill.exec.work.WorkManager - Waiting for 3 running fragments to 
> complete before shutting down
> {noformat}





[jira] [Commented] (DRILL-6286) Regression: incorrect reference to shutdown in drillbit.log

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420923#comment-16420923
 ] 

ASF GitHub Bot commented on DRILL-6286:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1196#discussion_r178376213
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ---
@@ -212,19 +218,29 @@ private boolean areQueriesAndFragmentsEmpty() {
     return queries.isEmpty() && runningFragments.isEmpty();
   }
 
+  /**
+   * Check if there any new queries or fragments that are added after the shutdown is triggered
+   */
+  private boolean areNewQueriesOrFragmentsAdded() {
+    return runningFragments.size() > numOfRunningFragments || queries.size() > numOfRunningQueries;
--- End diff --

This condition is not reliable. What if some fragments exited and some were 
added? The total number may still be less than the number of fragments when the 
shutdown was requested.
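
The scenario described above can be reproduced with plain collections. In this sketch the fragment names are made up, and the snapshot-based check is just one possible alternative offered for comparison, not something proposed in the review.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SizeCheckCounterexample {
  /** Mirrors the size comparison from the diff, for fragments only. */
  static boolean sizeCheckSeesNewFragment(Set<String> runningFragments, int countAtShutdown) {
    return runningFragments.size() > countAtShutdown;
  }

  /** Membership-based check: anything not in the shutdown-time snapshot is new. */
  static boolean snapshotCheckSeesNewFragment(Set<String> runningFragments,
                                              Set<String> snapshotAtShutdown) {
    return !snapshotAtShutdown.containsAll(runningFragments);
  }

  public static void main(String[] args) {
    Set<String> running = new HashSet<>(Arrays.asList("f1", "f2", "f3"));
    Set<String> snapshot = new HashSet<>(running); // state when shutdown was requested
    int countAtShutdown = running.size();

    // After shutdown is requested: two fragments exit and one new one is added.
    running.remove("f1");
    running.remove("f2");
    running.add("f4");

    System.out.println("size check:     " + sizeCheckSeesNewFragment(running, countAtShutdown));
    System.out.println("snapshot check: " + snapshotCheckSeesNewFragment(running, snapshot));
  }
}
```

The size comparison misses the new fragment because the total dropped from three to two, while the membership check notices that `f4` was not running when shutdown was requested.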




[jira] [Commented] (DRILL-6286) Regression: incorrect reference to shutdown in drillbit.log

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420925#comment-16420925
 ] 

ASF GitHub Bot commented on DRILL-6286:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1196#discussion_r178376362
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ---
@@ -212,19 +218,29 @@ private boolean areQueriesAndFragmentsEmpty() {
     return queries.isEmpty() && runningFragments.isEmpty();
   }
 
+  /**
+   * Check if there any new queries or fragments that are added after the shutdown is triggered
+   */
+  private boolean areNewQueriesOrFragmentsAdded() {
+    return runningFragments.size() > numOfRunningFragments || queries.size() > numOfRunningQueries;
+  }
+
   /**
    * A thread calling the {@link #waitToExit(boolean)} method is notified when a foreman is retired.
    */
   private void indicateIfSafeToExit() {
     isEmptyLock.lock();
     try {
-      logger.info("Waiting for "+ queries.size() +" queries to complete before shutting down");
-      logger.info("Waiting for "+ runningFragments.size() +" running fragments to complete before shutting down");
+      if (isShutdownTriggered) {
+        logger.info("Waiting for "+ queries.size() +" queries to complete before shutting down");
--- End diff --

Use slf4j smart logging.




[jira] [Commented] (DRILL-6294) Update Calcite version to 1.16.0

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420906#comment-16420906
 ] 

ASF GitHub Bot commented on DRILL-6294:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1198#discussion_r178371341
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillReduceAggregatesRule.java
 ---
@@ -218,7 +218,8 @@ private void reduceAggs(
 RelOptUtil.createProject(
 newAggRel,
 projList,
-oldAggRel.getRowType().getFieldNames());
+oldAggRel.getRowType().getFieldNames(),
+DrillRelFactories.LOGICAL_BUILDER);
--- End diff --

This was fixed in the second commit, which uses relBuilderFactory to create 
the builder and the project.




[jira] [Commented] (DRILL-6273) Remove dependency licensed under Category X

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420881#comment-16420881
 ] 

ASF GitHub Bot commented on DRILL-6273:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1195#discussion_r178365989
  
--- Diff: tools/fmpp/src/main/java/bsh/EvalError.java ---
@@ -0,0 +1,28 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package bsh;
+/**
--- End diff --

It would be better to have this comment in package-info.


> Remove dependency licensed under Category X
> ---
>
> Key: DRILL-6273
> URL: https://issues.apache.org/jira/browse/DRILL-6273
> Project: Apache Drill
>  Issue Type: Task
> Reporter: Vlad Rozov
> Assignee: Venkata Jyothsna Donapati
> Priority: Critical
> Fix For: 1.14.0
>
>






[jira] [Commented] (DRILL-6273) Remove dependency licensed under Category X

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420883#comment-16420883
 ] 

ASF GitHub Bot commented on DRILL-6273:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1195#discussion_r178366218
  
--- Diff: tools/fmpp/pom.xml ---
@@ -57,6 +57,10 @@
   commons-logging-api
   commons-logging
 
+
+  bsh
--- End diff --

add bsh:org.beanshell to the prohibited dependencies.




[jira] [Commented] (DRILL-6273) Remove dependency licensed under Category X

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420882#comment-16420882
 ] 

ASF GitHub Bot commented on DRILL-6273:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1195#discussion_r178364790
  
--- Diff: tools/fmpp/src/main/java/bsh/EvalError.java ---
@@ -0,0 +1,28 @@
+/**
--- End diff --

Please do not use doc comment for the license.




[jira] [Commented] (DRILL-6294) Update Calcite version to 1.16.0

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420875#comment-16420875
 ] 

ASF GitHub Bot commented on DRILL-6294:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1198#discussion_r178364263
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillReduceAggregatesRule.java
 ---
@@ -218,7 +218,8 @@ private void reduceAggs(
 RelOptUtil.createProject(
 newAggRel,
 projList,
-oldAggRel.getRowType().getFieldNames());
+oldAggRel.getRowType().getFieldNames(),
+DrillRelFactories.LOGICAL_BUILDER);
--- End diff --

Could you explain why we are using DrillRelFactories.LOGICAL_BUILDER but 
not relBuilderFactory that was used in line 211? And could you point me to this 
4 param createProject method with Factory as the last param?




[jira] [Created] (DRILL-6303) Provide a button to copy the Drillbit's JStack shown in /threads

2018-03-30 Thread Kunal Khatua (JIRA)
Kunal Khatua created DRILL-6303:
---

 Summary: Provide a button to copy the Drillbit's JStack shown in 
/threads
 Key: DRILL-6303
 URL: https://issues.apache.org/jira/browse/DRILL-6303
 Project: Apache Drill
  Issue Type: Improvement
  Components: Web Server
Reporter: Kunal Khatua
Assignee: Kunal Khatua
 Fix For: 1.14.0


Currently, when using the Web UI to inspect the JStack for the state of threads 
within a Drillbit (via http://:8047/threads), the contents of the `div` element 
refresh automatically and reset any selection, making it harder to freeze the 
contents for inspection.

Pausing the refresh is not recommended, so the alternative is to copy the 
contents to the user's clipboard for viewing separately in a text editor.





[jira] [Commented] (DRILL-6009) No drillbits on index page

2018-03-30 Thread Venkata Jyothsna Donapati (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420807#comment-16420807
 ] 

Venkata Jyothsna Donapati commented on DRILL-6009:
--

[~arina] I have tried to reproduce this issue and had no luck. Can you please 
provide some specific scenarios when you ran into this issue?

> No drillbits on index page
> --
>
> Key: DRILL-6009
> URL: https://issues.apache.org/jira/browse/DRILL-6009
> Project: Apache Drill
>  Issue Type: Bug
> Reporter: Arina Ielchiieva
> Assignee: Venkata Jyothsna Donapati
> Priority: Minor
> Fix For: 1.14.0
>
> Attachments: empty_drillbits.JPG
>
>
> After DRILL-4286, I once saw the index page show no drillbits at all even 
> though it was working, so at least one drillbit was online 
> (empty_drillbits.JPG). After a refresh everything was fine.





[jira] [Commented] (DRILL-6279) Web UI should indicate when operators have spilled in-memory data to disk

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420778#comment-16420778
 ] 

ASF GitHub Bot commented on DRILL-6279:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1197
  
@arina-ielchiieva could you review this? I will be using this commit as the 
basis for [DRILL-6289](https://issues.apache.org/jira/browse/DRILL-6289)


> Web UI should indicate when operators have spilled in-memory data to disk
> -
>
> Key: DRILL-6279
> URL: https://issues.apache.org/jira/browse/DRILL-6279
> Project: Apache Drill
>  Issue Type: Improvement
> Affects Versions: 1.13.0
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Priority: Major
> Fix For: 1.14.0
>
>
> Currently, there is no indication of when an operator is spilling to disk, 
> which would help explain a slow-running query. 
> Suggestions are welcome, but the current proposal is to simply update the 
> Operators Overview section to show average and max spill cycles, preferably 
> with a color code (or formatting).





[jira] [Updated] (DRILL-6279) Web UI should indicate when operators have spilled in-memory data to disk

2018-03-30 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6279:

Reviewer: Arina Ielchiieva

> Web UI should indicate when operators have spilled in-memory data to disk
> -
>
> Key: DRILL-6279
> URL: https://issues.apache.org/jira/browse/DRILL-6279
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently, there is no indication of when an operator is spilling to disk, 
> which would help explain a slow running query. 
> Suggestions are welcome, but the current proposal is to simply update the 
> Operators Overview section to show average and max spill cycles, preferably 
> with a color code (or formatting).  





[jira] [Commented] (DRILL-6223) Drill fails on Schema changes

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420714#comment-16420714
 ] 

ASF GitHub Bot commented on DRILL-6223:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1170
  
Over the last year, we've tended to favor including unit tests with each 
PR. There don't seem to be any with this one, yet we are proposing to make a 
fairly complex change. Perhaps tests can be added.

Further, by having good tests, we don't have to debate how Drill will 
handle the scenarios discussed in an earlier comment: we just code 'em up and 
try 'em out, letting Drill speak for itself. We can then decide whether or not 
we like the results, rather than discussing hypotheticals.


> Drill fails on Schema changes 
> --
>
> Key: DRILL-6223
> URL: https://issues.apache.org/jira/browse/DRILL-6223
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0, 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> Drill query fails when selecting all columns from a complex nested data 
> file set (Parquet). There are differences in schema among the files:
>  * The Parquet files exhibit differences both at the first level and within 
> nested data types
>  * A select * will not cause an exception but using a limit clause will
>  * Note also this issue seems to happen only when multiple Drillbit minor 
> fragments are involved (concurrency higher than one)





[jira] [Commented] (DRILL-6223) Drill fails on Schema changes

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420711#comment-16420711
 ] 

ASF GitHub Bot commented on DRILL-6223:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1170
  
BTW: thanks for tackling such a difficult, core issue in Drill. Drill 
claims to be a) schema free and b) SQL compliant. SQL is based on operations 
over relations with a fixed number of columns of fixed types. Reconciling these 
two ideas is very difficult. Even the original Drill developers, who built a 
huge amount of code very quickly and had intimate knowledge of the Drill 
internals, did not find a good solution, which is why the problem is still 
open.

There are two obvious approaches: 1) redefine SQL to operate over lists of 
maps (with arbitrary name/value pairs that differ across rows), or 2) define 
translation rules from schema-free input into the schema-full relations that 
SQL requires.

This PR attempts to go down the first route: redefine SQL. To be 
successful, we'd want to rely on research papers, if any, that show how to 
reformulate relational theory on top of lists of maps rather than on relations 
and domains.

The other approach is to define conversion rules: something much closer to 
a straightforward implementation project. Can the user provide 
conversion rules (in the form of a schema) when the conversion is ambiguous? 
Would users rather encounter schema change exceptions or provide the conversion 
rules? These are interesting open questions.
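
To make the second approach concrete, here is a minimal sketch of conversion rules, assuming the user supplies a schema as an ordered list of (column name, converter) pairs. The names `to_relation`, `schema`, and `rows` are hypothetical and are not part of Drill or of this PR; the sketch only shows how schema-free maps could be turned into fixed-width tuples, with NULL for missing columns and undeclared keys dropped.

```python
def to_relation(rows, schema):
    """Convert a list of maps into tuples that all share one fixed schema.

    schema is an ordered list of (column_name, converter) pairs supplied by
    the user. Missing or null values become None (SQL NULL); keys that the
    schema does not declare are ignored.
    """
    relation = []
    for row in rows:
        record = []
        for name, convert in schema:
            value = row.get(name)
            record.append(None if value is None else convert(value))
        relation.append(tuple(record))
    return relation


# Hypothetical schema and schema-free input rows.
schema = [("id", int), ("name", str), ("score", float)]
rows = [
    {"id": 1, "name": "a"},                 # "score" is missing
    {"id": "2", "score": 3},                # "id" arrives as a string
    {"id": 3, "name": "c", "extra": True},  # "extra" is not in the schema
]
relation = to_relation(rows, schema)
```

With rules like these, an ambiguous input either converts deterministically or fails in the converter, which is one way to frame the choice between schema change exceptions and user-provided rules.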


> Drill fails on Schema changes 
> --
>
> Key: DRILL-6223
> URL: https://issues.apache.org/jira/browse/DRILL-6223
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0, 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> Drill query fails when selecting all columns from a complex nested data 
> file set (Parquet). There are differences in schema among the files:
>  * The Parquet files exhibit differences both at the first level and within 
> nested data types
>  * A select * will not cause an exception but using a limit clause will
>  * Note also this issue seems to happen only when multiple Drillbit minor 
> fragments are involved (concurrency higher than one)





[jira] [Commented] (DRILL-6294) Update Calcite version to 1.16.0

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420696#comment-16420696
 ] 

ASF GitHub Bot commented on DRILL-6294:
---

Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/1198
  
@amansinha100, @chunhui-shi could you please take a look?


> Update Calcite version to 1.16.0
> 
>
> Key: DRILL-6294
> URL: https://issues.apache.org/jira/browse/DRILL-6294
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.13.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.14.0
>
>
> Upgrade to Calcite 1.16.
>  Since the last upgrade to Calcite 1.15, several commits have remained in 
> the Drill-Calcite fork. Since no additional work was done to move those 
> commits out of the fork, they will be placed on top of Calcite 1.16.
>  Status from the last upgrade:
>  
> [https://docs.google.com/document/d/1Lqk9NoKQviz0YimBmov4z1pui7QjJGjDVwMa1p0emPk/edit#heading=h.i3rowg20vxv4]





[jira] [Commented] (DRILL-6259) Support parquet filter push down for complex types

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420691#comment-16420691
 ] 

ASF GitHub Bot commented on DRILL-6259:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1173
  
Done.


> Support parquet filter push down for complex types
> --
>
> Key: DRILL-6259
> URL: https://issues.apache.org/jira/browse/DRILL-6259
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Currently parquet filter push down does not work for complex types 
> (including arrays).
> This Jira aims to implement filter push down for complex types whose 
> underlying type is among the simple types supported for filter push down. For 
> instance, Drill currently does not support filter push down for varchars, 
> decimals, etc. Once Drill adds that support, it will apply to complex types 
> automatically.
> Complex fields will be pushed down the same way regular fields are, except 
> for one case with arrays.
> A query with the predicate {{where users.hobbies_ids[2] is null}} cannot be 
> pushed down because we are not able to determine the exact number of nulls in 
> array fields. 
> Consider {{[1, 2, 3]}} vs {{[1, 2]}} if these arrays are in different files. 
> Statistics for the second case won't show any nulls, but when querying both 
> files, in terms of data the third value in the second array is null.
>  
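
The array case in the description above can be sketched as follows. This is only a toy model: `null_count_stat` stands in for a per-file null count and `element_is_null` for the semantics of `hobbies_ids[2] is null`; neither is Drill or Parquet code, and real Parquet statistics are laid out differently.

```python
def null_count_stat(arrays):
    """Count the nulls that are physically stored in a file's arrays."""
    return sum(1 for arr in arrays for value in arr if value is None)


def element_is_null(arr, index):
    """An out-of-range index reads as null, like arr[index] IS NULL in a query."""
    return index >= len(arr) or arr[index] is None


file_a = [[1, 2, 3]]  # arrays stored in the first file
file_b = [[1, 2]]     # arrays stored in the second file

stat_b = null_count_stat(file_b)
matches_a = [arr for arr in file_a if element_is_null(arr, 2)]
matches_b = [arr for arr in file_b if element_is_null(arr, 2)]
```

The second file stores no nulls, yet its only row satisfies the predicate, so pruning that file on a null count of zero would drop a matching row.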





[jira] [Commented] (DRILL-6294) Update Calcite version to 1.16.0

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420690#comment-16420690
 ] 

ASF GitHub Bot commented on DRILL-6294:
---

GitHub user vvysotskyi opened a pull request:

https://github.com/apache/drill/pull/1198

DRILL-6294: Changes to support Calcite 1.16.0



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vvysotskyi/drill DRILL-6294

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1198


commit a79d6586033b618c95462368520237aab84f47bf
Author: Volodymyr Vysotskyi 
Date:   2018-02-07T14:24:50Z

DRILL-6294: Changes to support Calcite 1.16.0

commit 48880eb80d60a15e4ffdfcd6c729bfc75cf5d2da
Author: Volodymyr Vysotskyi 
Date:   2018-03-11T10:11:45Z

DRILL-6294: Remove deprecated API usage




> Update Calcite version to 1.16.0
> 
>
> Key: DRILL-6294
> URL: https://issues.apache.org/jira/browse/DRILL-6294
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.13.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.14.0
>
>
> Upgrade to Calcite 1.16.
>  Since the last upgrade to Calcite 1.15, several commits have remained in 
> the Drill-Calcite fork. Since no additional work was done to move those 
> commits out of the fork, they will be placed on top of Calcite 1.16.
>  Status from the last upgrade:
>  
> [https://docs.google.com/document/d/1Lqk9NoKQviz0YimBmov4z1pui7QjJGjDVwMa1p0emPk/edit#heading=h.i3rowg20vxv4]





[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420683#comment-16420683
 ] 

ASF GitHub Bot commented on DRILL-6016:
---

Github user rajrahul commented on a diff in the pull request:

https://github.com/apache/drill/pull/1166#discussion_r178324303
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
 ---
@@ -61,6 +60,7 @@
 import org.junit.runners.Parameterized;
 
 @RunWith(Parameterized.class)
+
--- End diff --

Actually not required. I tried to add another RunWith for mocking and removed 
it later on, leaving the newline behind.


> Error reading INT96 created by Apache Spark
> ---
>
> Key: DRILL-6016
> URL: https://issues.apache.org/jira/browse/DRILL-6016
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Rahul Raj
>Assignee: Rahul Raj
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Hi,
> I am getting the error "SYSTEM ERROR: ClassCastException: 
> org.apache.drill.exec.vector.TimeStampVector cannot be cast to 
> org.apache.drill.exec.vector.VariableWidthVector" while trying to read a Spark 
> INT96 datetime field on Drill 1.11, in spite of setting the property 
> store.parquet.reader.int96_as_timestamp to true.
> I believe this was fixed in Drill 1.10 
> (https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong?
> I have attached the dataset at 
> https://github.com/rajrahul/files/blob/master/result.tar.gz
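
For readers unfamiliar with the field type in this report, the sketch below decodes a 12-byte INT96 timestamp using the layout commonly produced by Spark and Impala: 8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes holding the Julian day number. It is an illustration of that convention only, not Drill's reader code, and the function name `int96_to_datetime` is made up for the example.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def int96_to_datetime(raw):
    """Decode a 12-byte INT96 timestamp into a UTC datetime.

    Assumed layout: 8 bytes of nanoseconds within the day, then 4 bytes of
    Julian day number, both little-endian.
    """
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes long")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # datetime has microsecond resolution, so sub-microsecond nanos are dropped.
    return epoch + timedelta(days=days_since_epoch,
                             microseconds=nanos_of_day // 1000)


# One hour into the day after the epoch.
sample = struct.pack("<qi", 3_600_000_000_000, JULIAN_DAY_OF_UNIX_EPOCH + 1)
decoded = int96_to_datetime(sample)
```

A reader that treats these 12 bytes as a variable-width binary value rather than a timestamp would be consistent with the vector type mismatch in the reported error, though the report itself does not confirm the cause.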





[jira] [Assigned] (DRILL-5990) Apache Drill /status API returns OK ('Running') even with JRE while queries will not work - make status API reflect the fact that Drill is broken on JRE or stop Drill st

2018-03-30 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-5990:


Assignee: Venkata Jyothsna Donapati

> Apache Drill /status API returns OK ('Running') even with JRE while queries 
> will not work - make status API reflect the fact that Drill is broken on JRE 
> or stop Drill starting up with JRE
> ---
>
> Key: DRILL-5990
> URL: https://issues.apache.org/jira/browse/DRILL-5990
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server, Client - HTTP, Execution - Monitoring
>Affects Versions: 1.10.0, 1.11.0
> Environment: Docker
>Reporter: Hari Sekhon
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>
> When running Apache Drill 1.10 / 1.11 on a JRE, it appears that queries no 
> longer run without a JDK, but the /status monitoring API endpoint does not 
> reflect the fact that Apache Drill will not work and still returns 
> 'Running'.
> While 'Running' is technically true since the process is up, it's effectively 
> broken, and Apache Drill should either reflect in /status that it is broken 
> or refuse to start up on a JRE altogether.
> See this ticket for more information:
> https://github.com/HariSekhon/Dockerfiles/pull/15





[jira] [Commented] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420653#comment-16420653
 ] 

ASF GitHub Bot commented on DRILL-5846:
---

Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/1060
  
Parth,

- I have attached to DRILL-5846 two profiles, with the latest Apache 
code and with this PR (bounds checks are off):
  o Used one thread in each run
  o I observe a ~3x performance difference when the new logic is turned on
  o The difference is 4x if I include the implicit column optimization 
(which is not part of this PR)
  o The impact of the new optimizations can be felt when there are many 
variable length columns

- The rationale for approving this PR:
   o The optimizations that I have included are local to the Flat Parquet 
Reader (encapsulated)
   o The logic is backward compatible and turned off by default
   o I have added the new Batch Sizing functionality on top of this PR 
(columnar processing pattern)
   o The outcome of DRILL-6301 would only require a local refactoring step
   o Not being able to add the new code results in a substantial 
maintenance overhead

   


> Improve Parquet Reader Performance for Flat Data types 
> ---
>
> Key: DRILL-5846
> URL: https://issues.apache.org/jira/browse/DRILL-5846
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.11.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: performance
> Fix For: 1.14.0
>
> Attachments: 2542d447-9837-3924-dd12-f759108461e5.sys.drill, 
> 2542d49b-88ef-38e3-a02b-b441c1295817.sys.drill
>
>
> The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to 
> further improve the Parquet Reader performance, as several users reported that 
> Parquet parsing represents the lion's share of the overall query execution. It 
> tracks Flat Data types only, as Nested DTs might involve functional and 
> processing enhancements (e.g., a nested column can be seen as a Document; 
> a user might want to perform operations scoped at the document level, with no 
> need to span all rows). Another JIRA will be created to handle the nested 
> columns use-case.





[jira] [Updated] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types

2018-03-30 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-5846:
--
Attachment: 2542d49b-88ef-38e3-a02b-b441c1295817.sys.drill

> Improve Parquet Reader Performance for Flat Data types 
> ---
>
> Key: DRILL-5846
> URL: https://issues.apache.org/jira/browse/DRILL-5846
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.11.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: performance
> Fix For: 1.14.0
>
> Attachments: 2542d447-9837-3924-dd12-f759108461e5.sys.drill, 
> 2542d49b-88ef-38e3-a02b-b441c1295817.sys.drill
>
>
> The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to 
> further improve the Parquet Reader performance, as several users reported that 
> Parquet parsing represents the lion's share of the overall query execution. It 
> tracks Flat Data types only, as Nested DTs might involve functional and 
> processing enhancements (e.g., a nested column can be seen as a Document; 
> a user might want to perform operations scoped at the document level, with no 
> need to span all rows). Another JIRA will be created to handle the nested 
> columns use-case.





[jira] [Updated] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types

2018-03-30 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-5846:
--
Attachment: 2542d447-9837-3924-dd12-f759108461e5.sys.drill

> Improve Parquet Reader Performance for Flat Data types 
> ---
>
> Key: DRILL-5846
> URL: https://issues.apache.org/jira/browse/DRILL-5846
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.11.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: performance
> Fix For: 1.14.0
>
> Attachments: 2542d447-9837-3924-dd12-f759108461e5.sys.drill
>
>
> The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to 
> further improve the Parquet Reader performance, as several users reported that 
> Parquet parsing represents the lion's share of the overall query execution. It 
> tracks Flat Data types only, as Nested DTs might involve functional and 
> processing enhancements (e.g., a nested column can be seen as a Document; 
> a user might want to perform operations scoped at the document level, with no 
> need to span all rows). Another JIRA will be created to handle the nested 
> columns use-case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6259) Support parquet filter push down for complex types

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420585#comment-16420585
 ] 

ASF GitHub Bot commented on DRILL-6259:
---

Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/1173
  
@arina-ielchiieva could you rebase this on latest master ? thanks. 


> Support parquet filter push down for complex types
> --
>
> Key: DRILL-6259
> URL: https://issues.apache.org/jira/browse/DRILL-6259
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Currently parquet filter push down does not work for complex types 
> (including arrays).
> This Jira aims to implement filter push down for complex types whose 
> underlying type is among the simple types supported for filter push down. For 
> instance, Drill currently does not support filter push down for varchars, 
> decimals, etc. Once Drill adds that support, it will apply to complex types 
> automatically.
> Complex fields will be pushed down the same way regular fields are, except 
> for one case with arrays.
> A query with the predicate {{where users.hobbies_ids[2] is null}} cannot be 
> pushed down because we are not able to determine the exact number of nulls in 
> array fields. 
> Consider {{[1, 2, 3]}} vs {{[1, 2]}} if these arrays are in different files. 
> Statistics for the second case won't show any nulls, but when querying both 
> files, in terms of data the third value in the second array is null.
>  





[jira] [Commented] (DRILL-6299) Parquet query returns unexpected results

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420563#comment-16420563
 ] 

ASF GitHub Bot commented on DRILL-6299:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1192


> Parquet query returns unexpected results 
> -
>
> Key: DRILL-6299
> URL: https://issues.apache.org/jira/browse/DRILL-6299
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.13.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Query "select id from   where str_empty is null and 
> tinyint_var between -10 and 15" returns unexpected results. The same query 
> will succeed if the filter pushdown functionality is disabled.
>  
>  





[jira] [Commented] (DRILL-6256) Remove references to java 7 from readme and other files

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420567#comment-16420567
 ] 

ASF GitHub Bot commented on DRILL-6256:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1172


> Remove references to java 7 from readme and other files
> ---
>
> Key: DRILL-6256
> URL: https://issues.apache.org/jira/browse/DRILL-6256
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Since master branch uses jdk 8 we should remove all references to java 7.





[jira] [Commented] (DRILL-5937) Make prepare.statement.create_timeout_ms default to 30 seconds instead of 10 seconds

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420564#comment-16420564
 ] 

ASF GitHub Bot commented on DRILL-5937:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1190


>  Make prepare.statement.create_timeout_ms default to 30 seconds instead of 10 
> seconds
> -
>
> Key: DRILL-5937
> URL: https://issues.apache.org/jira/browse/DRILL-5937
> Project: Apache Drill
>  Issue Type: Task
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Pushpendra Jaiswal
>Assignee: Pushpendra Jaiswal
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> prepare.statement.create_timeout_ms default is 10 seconds but code comment 
> says default should be 10 mins
> The value is by default set to 1 ms 
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java#L526
>  
> /**
>* Timeout for create prepare statement request. If the request exceeds 
> this timeout, then request is timed out.
>* Default value is 10mins.
>*/





[jira] [Commented] (DRILL-6278) DRILL-5993 Made Debugging Generated Code Harder

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420566#comment-16420566
 ] 

ASF GitHub Bot commented on DRILL-6278:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1178


> DRILL-5993 Made Debugging Generated Code Harder
> ---
>
> Key: DRILL-6278
> URL: https://issues.apache.org/jira/browse/DRILL-6278
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> DRILL-5993 made debugging generated code more difficult since it stored 
> generated code in unique directories in target. This required adding possibly 
> many tmp directories as source folders in order to be able to set break 
> points in generated code for different tests. This change should be reverted 
> to store generated code in the original default /tmp/drill/codegen directory.





[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420568#comment-16420568
 ] 

ASF GitHub Bot commented on DRILL-6125:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1105


> PartitionSenderRootExec can leak memory because close method is not 
> synchronized
> 
>
> Key: DRILL-6125
> URL: https://issues.apache.org/jira/browse/DRILL-6125
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> PartitionSenderRootExec creates a PartitionerDecorator and saves it in the 
> *partitioner* field. The creation of the partitioner happens in the 
> createPartitioner method. This method gets called by the main fragment 
> thread. The partitioner field is accessed by the fragment thread during 
> normal execution but it can also be accessed by the receivingFragmentFinished 
> method, which is a callback executed by the event processor thread. Because 
> multiple threads can access the partitioner field, synchronization is done on 
> creation and in receivingFragmentFinished. However, the close method can 
> also be called by the event processor thread, and the close method does not 
> synchronize before accessing the partitioner field. Since synchronization is 
> not done, the event processor thread may have an old reference to the 
> partitioner when a query cancellation is done. Since it has an old reference, 
> the current partitioner may not be cleared and a memory leak may occur.
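
The locking discipline the description asks for can be sketched as below. This is a simplified stand-in, not the actual PartitionSenderRootExec: the classes `Partitioner` and `Sender` and their methods are hypothetical, and Python's memory model differs from Java's, so the sketch only illustrates taking the same lock in `close` as in the code that creates the partitioner.

```python
import threading


class Partitioner:
    """Stand-in for a resource that must be cleared to release its memory."""

    def __init__(self):
        self.cleared = False

    def clear(self):
        self.cleared = True


class Sender:
    """Hypothetical sender whose partitioner field is shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._partitioner = None

    def create_partitioner(self):
        """Called by the fragment thread; publishes the new partitioner under the lock."""
        created = Partitioner()
        with self._lock:
            previous, self._partitioner = self._partitioner, created
        if previous is not None:
            previous.clear()
        return created

    def close(self):
        """May run on another thread; takes the same lock before reading the field."""
        with self._lock:
            current, self._partitioner = self._partitioner, None
        if current is not None:
            current.clear()


sender = Sender()
partitioner = sender.create_partitioner()

# Simulate close() being invoked from the event processor thread.
closer = threading.Thread(target=sender.close)
closer.start()
closer.join()
```

Because `close` reads and resets the field while holding the lock, it always sees the partitioner most recently published by `create_partitioner` rather than a stale reference.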





[jira] [Commented] (DRILL-6300) Refresh protobuf C++ source files

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420562#comment-16420562
 ] 

ASF GitHub Bot commented on DRILL-6300:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1194


> Refresh protobuf C++ source files
> -
>
> Key: DRILL-6300
> URL: https://issues.apache.org/jira/browse/DRILL-6300
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.13.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> After the changes in the classes which are used for generating protobuf 
> files, the Java protobuf sources were regenerated, but the C++ sources were 
> not.
> The aim of this Jira is to refresh the protobuf C++ source files.





[jira] [Commented] (DRILL-6254) IllegalArgumentException: the requested size must be non-negative

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420565#comment-16420565
 ] 

ASF GitHub Bot commented on DRILL-6254:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1179


> IllegalArgumentException: the requested size must be non-negative
> -
>
> Key: DRILL-6254
> URL: https://issues.apache.org/jira/browse/DRILL-6254
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Khurram Faraaz
>Assignee: Padma Penumarthy
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
> Attachments: genAllTypesJSN.py
>
>
> Flatten query fails due to IllegalArgumentException: the requested size must 
> be non-negative.
> Script to generate JSON data file is attached here.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE tbl_all_types_jsn_to_parquet AS 
> . . . . . . . . . . . . . . > SELECT
> . . . . . . . . . . . . . . > CAST( col_int AS INT) col_int, 
> . . . . . . . . . . . . . . > CAST( col_bigint AS BIGINT) col_bigint, 
> . . . . . . . . . . . . . . > CAST( col_char AS CHAR(10)) col_char, 
> . . . . . . . . . . . . . . > CAST( col_fxdln_str AS VARCHAR(256)) 
> col_fxdln_str, 
> . . . . . . . . . . . . . . > CAST( col_varln_str AS VARCHAR(256)) 
> col_varln_str, 
> . . . . . . . . . . . . . . > CAST( col_float AS FLOAT) col_float, 
> . . . . . . . . . . . . . . > CAST( col_double AS DOUBLE PRECISION) 
> col_double, 
> . . . . . . . . . . . . . . > CAST( col_date AS DATE) col_date, 
> . . . . . . . . . . . . . . > CAST( col_time AS TIME) col_time, 
> . . . . . . . . . . . . . . > CAST( col_tmstmp AS TIMESTAMP) col_tmstmp, 
> . . . . . . . . . . . . . . > CAST( col_boolean AS BOOLEAN) col_boolean, 
> . . . . . . . . . . . . . . > col_binary, 
> . . . . . . . . . . . . . . > array_of_ints from `all_supported_types.json`;
> +---++
> | Fragment | Number of records written |
> +---++
> | 0_0 | 9 |
> +---++
> 1 row selected (0.29 seconds)
> {noformat}
> Reset all options and set slice_target=1
> alter system reset all;
> alter system set `planner.slice_target`=1;
> output_batch_size was set to its default value
> drill.exec.memory.operator.output_batch_size = 16777216
>  
> {noformat}
> select *, flatten(array_of_ints) from tbl_all_types_jsn_to_parquet;
> Error: SYSTEM ERROR: IllegalArgumentException: the requested size must be 
> non-negative
> Fragment 0:0
> [Error Id: 480bae96-ae89-45a7-b937-011c0f87c14d on qa102-45.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp>
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-03-15 12:19:43,916 [255538af-bcd5-98ee-32e0-68d98fc4a6fa:foreman] INFO 
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 255538af-bcd5-98ee-32e0-68d98fc4a6fa: select *, flatten(array_of_ints) from 
> tbl_all_types_jsn_to_parquet
> 2018-03-15 12:19:43,952 [255538af-bcd5-98ee-32e0-68d98fc4a6fa:foreman] INFO 
> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
> numFiles: 1
> 2018-03-15 12:19:43,953 [255538af-bcd5-98ee-32e0-68d98fc4a6fa:foreman] INFO 
> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
> numFiles: 1
> 2018-03-15 12:19:43,966 [255538af-bcd5-98ee-32e0-68d98fc4a6fa:foreman] INFO 
> o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
> 2018-03-15 12:19:43,969 [255538af-bcd5-98ee-32e0-68d98fc4a6fa:foreman] INFO 
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Time: 2ms total, 2.927366ms avg, 2ms max.
> 2018-03-15 12:19:43,969 [255538af-bcd5-98ee-32e0-68d98fc4a6fa:foreman] INFO 
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Earliest start: 2.829000 μs, Latest start: 2.829000 μs, 
> Average start: 2.829000 μs .
> 2018-03-15 12:19:43,969 [255538af-bcd5-98ee-32e0-68d98fc4a6fa:foreman] INFO 
> o.a.d.exec.store.parquet.Metadata - Took 3 ms to read file metadata
> 2018-03-15 12:19:44,000 [255538af-bcd5-98ee-32e0-68d98fc4a6fa:frag:0:0] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 255538af-bcd5-98ee-32e0-68d98fc4a6fa:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2018-03-15 12:19:44,000 [255538af-bcd5-98ee-32e0-68d98fc4a6fa:frag:0:0] INFO 
> o.a.d.e.w.f.FragmentStatusReporter - 
> 255538af-bcd5-98ee-32e0-68d98fc4a6fa:0:0: State to report: RUNNING
> 2018-03-15 12:19:44,905 [255538af-bcd5-98ee-32e0-68d98fc4a6fa:frag:0:0] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 255538af-bcd5-98ee-32e0-68d98fc4a6fa:0:0: State change 

[jira] [Commented] (DRILL-4699) Add Description Column in sys.options

2018-03-30 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420499#comment-16420499
 ] 

John Omernik commented on DRILL-4699:
-

I can't comment on whether the work/PR is dated. However, I do not think the 
intended goals are covered by DRILL-5723. The main goal is to ensure that 
options are documented when they are added, and that the documentation lives 
right in sys.options (or even sys.internaloptions; I was not aware of the work 
on internal options). Either way, the goal is the same: no option should be 
added without the developer writing a few sentences on what the option is, 
what it does, and how it works. Too many times I have had to go to the source 
code to figure out an option. With the description stored here, sys.options 
can act as the source of record for all options. It could then be used 
downstream to update the documentation automatically (if a sys.options 
description changes, the documentation can be regenerated from it).

> Add Description Column in sys.options
> -
>
> Key: DRILL-4699
> URL: https://issues.apache.org/jira/browse/DRILL-4699
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server, Documentation
>Affects Versions: 1.6.0
>Reporter: John Omernik
>Assignee: Sudheesh Katkam
>Priority: Major
>
> select * from sys.options gives a user a good overview of which options are 
> available in Drill, but these options are not well documented. Some options 
> are "experimental"; others apply only in specific cases (writers vs. readers, 
> for example). If we had a large text field for a description, we could 
> enforce documentation of each setting at option creation time, and the 
> description could change as versions change (e.g., when an option graduates 
> from experimental to supported, the description would be updated in that 
> version). That is, when users run select * from sys.options, they see the 
> exact state of each option in the version they are using. It could also 
> encourage better self-documentation via QA on pull requests: "Did you update 
> the sys.options.desc?" This makes Drill easier to use for both users and 
> admins in an enterprise.
> The first step is adding the field, and then going back and filling in the 
> desc for each option (that would be another JIRA, once it is available).
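
The idea of enforcing documentation at option creation time can be sketched in Java as follows. This is only an illustration under stated assumptions: the class names DescribedOptionsSketch and OptionDefinition, the default value, and the description text are hypothetical and are not Drill's actual option API. The only point is that an option without a description cannot be constructed at all.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Drill's option classes: an option definition
// that refuses to exist without a human-readable description.
final class DescribedOptionsSketch {

    static final class OptionDefinition {
        final String name;
        final String defaultValue;
        final String description;

        OptionDefinition(String name, String defaultValue, String description) {
            // Enforce documentation when the option is created.
            if (description == null || description.trim().isEmpty()) {
                throw new IllegalArgumentException(
                    "Option '" + name + "' must have a description");
            }
            this.name = name;
            this.defaultValue = defaultValue;
            this.description = description;
        }
    }

    // Registered options, kept in insertion order.
    private final Map<String, OptionDefinition> options = new LinkedHashMap<>();

    void register(OptionDefinition definition) {
        options.put(definition.name, definition);
    }

    // Returns the description, or null if the option is unknown.
    String describe(String name) {
        OptionDefinition definition = options.get(name);
        return definition == null ? null : definition.description;
    }

    public static void main(String[] args) {
        DescribedOptionsSketch registry = new DescribedOptionsSketch();
        // The default value and wording below are illustrative only.
        registry.register(new OptionDefinition(
            "store.parquet.reader.int96_as_timestamp",
            "false",
            "Illustrative text: read Parquet INT96 values as timestamps."));
        System.out.println(
            registry.describe("store.parquet.reader.int96_as_timestamp"));
    }
}
```

With a check like this in the constructor, a pull request that adds an undocumented option fails immediately instead of relying on a reviewer to ask for the description.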



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420493#comment-16420493
 ] 

ASF GitHub Bot commented on DRILL-6016:
---

Github user rajrahul commented on a diff in the pull request:

https://github.com/apache/drill/pull/1166#discussion_r178290675
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
@@ -780,17 +780,31 @@ public void testImpalaParquetBinaryAsVarBinary_DictChange() throws Exception {
   Test the reading of a binary field as drill timestamp where data is in dictionary _and_ non-dictionary encoded pages
*/
   @Test
-  @Ignore("relies on particular time zone, works for UTC")
   public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception {
 try {
   testBuilder()
-  .sqlQuery("select int96_ts from dfs.`parquet/int96_dict_change` order by int96_ts")
+  .sqlQuery("select min(int96_ts) date_value from dfs.`parquet/int96_dict_change`")
--- End diff --

I did not try a WHERE statement; MIN was used to select a single record to 
compare. Was there a specific reason to use WHERE?


> Error reading INT96 created by Apache Spark
> ---
>
> Key: DRILL-6016
> URL: https://issues.apache.org/jira/browse/DRILL-6016
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Rahul Raj
>Assignee: Rahul Raj
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Hi,
> I am getting the error "SYSTEM ERROR: ClassCastException: 
> org.apache.drill.exec.vector.TimeStampVector cannot be cast to 
> org.apache.drill.exec.vector.VariableWidthVector" while trying to read a Spark 
> INT96 datetime field on Drill 1.11, in spite of setting the property 
> store.parquet.reader.int96_as_timestamp to true.
> I believe this was fixed in Drill 1.10 
> (https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong?
> I have attached the dataset at 
> https://github.com/rajrahul/files/blob/master/result.tar.gz
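
For readers unfamiliar with the format under discussion, here is a minimal Java sketch of how an INT96 timestamp, as commonly written by Impala and Spark, maps to epoch milliseconds. It assumes the usual layout of 12 little-endian bytes: 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number. The class and method names are hypothetical, and this is not Drill's reader code. The result is a UTC instant; any time zone adjustment on top of it is a separate step.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Illustrative only: decodes the commonly used INT96 timestamp layout.
final class Int96TimestampSketch {
    // Julian day number of 1970-01-01, the Unix epoch.
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    // Converts a 12-byte INT96 value to milliseconds since the epoch (UTC).
    static long toEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be 12 bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buffer.getLong(); // first 8 bytes
        long julianDay = buffer.getInt();   // last 4 bytes
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
            + nanosOfDay / NANOS_PER_MILLI;
    }

    // Builds a 12-byte value in the same layout, for demonstration.
    static byte[] encode(long julianDay, long nanosOfDay) {
        return ByteBuffer.allocate(12)
            .order(ByteOrder.LITTLE_ENDIAN)
            .putLong(nanosOfDay)
            .putInt((int) julianDay)
            .array();
    }

    public static void main(String[] args) {
        // One day and one hour after the epoch.
        byte[] value = encode(JULIAN_DAY_OF_EPOCH + 1, 3_600_000_000_000L);
        System.out.println(Instant.ofEpochMilli(toEpochMillis(value)));
        // prints 1970-01-02T01:00:00Z
    }
}
```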



[jira] [Commented] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420305#comment-16420305
 ] 

ASF GitHub Bot commented on DRILL-5846:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1060
  
I feel putting this PR in without finalizing DRILL-6301 is putting the cart 
before the horse. (BTW, it would help the discussion if the benchmarks were 
published!) My observation, based on profiling I did some time back, is that 
the performance gains seen here are roughly in line with removing bounds 
checks. Paul has seen similar gains in the batch sizing project.
Which takes us back to the question, raised by Paul in his first comment, 
of how we want to reconcile batch sizing and vectorizing of scans; a question 
we have deferred. If removing bounds checks gets us the same performance gains, 
then why not put our efforts into implementing batch sizing with the 
accompanying elimination of bounds checking?
I'm mostly not in favor of having MemoryUtils unless you make a compelling 
argument that it is the only way to save the planet (i.e., get the performance 
you want). I feel operators should not establish the pattern of accessing 
memory directly. So far, I'm -0 on this as my arguments are mostly high level 
(and somewhat philosophical).
Minor nitpick: the prefix VL is not as informative as, say, VarLen or 
VariableLength.
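
To make the bounds-check argument concrete, here is a minimal Java sketch contrasting a range check on every write with a single check for a whole batch whose size is known up front. The class BoundsCheckSketch is hypothetical and does not reproduce Drill's buffer classes or the MemoryUtils proposal; the JVM also performs its own array checks, so this shows the shape of the idea and is not a benchmark.

```java
// Illustrative only: per-value range checks versus one check per batch.
final class BoundsCheckSketch {
    private final int[] data;

    BoundsCheckSketch(int capacity) {
        this.data = new int[capacity];
    }

    // Validates the index on every single write.
    void setChecked(int index, int value) {
        if (index < 0 || index >= data.length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        data[index] = value;
    }

    // Validates the whole range once, then copies without further
    // explicit checks. This is possible when the batch size is known.
    void setBatch(int start, int[] values) {
        if (start < 0 || values.length > data.length - start) {
            throw new IndexOutOfBoundsException(
                "range " + start + "+" + values.length);
        }
        System.arraycopy(values, 0, data, start, values.length);
    }

    int get(int index) {
        return data[index];
    }

    public static void main(String[] args) {
        BoundsCheckSketch buffer = new BoundsCheckSketch(8);
        buffer.setChecked(0, 42);                // one check per value
        buffer.setBatch(2, new int[] {1, 2, 3}); // one check per batch
        System.out.println(buffer.get(3));       // prints 2
    }
}
```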



> Improve Parquet Reader Performance for Flat Data types 
> ---
>
> Key: DRILL-5846
> URL: https://issues.apache.org/jira/browse/DRILL-5846
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.11.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: performance
> Fix For: 1.14.0
>
>
> The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to 
> further improve the Parquet Reader performance, as several users reported that 
> Parquet parsing represents the lion's share of the overall query execution 
> time. It tracks Flat Data types only, as Nested DTs might involve functional 
> and processing enhancements (e.g., a nested column can be seen as a Document; 
> a user might want to perform operations scoped at the document level that do 
> not need to span all rows). Another JIRA will be created to handle the nested 
> columns use-case.



[jira] [Updated] (DRILL-6016) Error reading INT96 created by Apache Spark

2018-03-30 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6016:
---
Labels: ready-to-commit  (was: )


[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420294#comment-16420294
 ] 

ASF GitHub Bot commented on DRILL-6016:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1166#discussion_r178255942
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
@@ -780,17 +780,31 @@ public void testImpalaParquetBinaryAsVarBinary_DictChange() throws Exception {
   Test the reading of a binary field as drill timestamp where data is in dictionary _and_ non-dictionary encoded pages
*/
   @Test
-  @Ignore("relies on particular time zone, works for UTC")
   public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception {
 try {
   testBuilder()
-  .sqlQuery("select int96_ts from dfs.`parquet/int96_dict_change` order by int96_ts")
+  .sqlQuery("select min(int96_ts) date_value from dfs.`parquet/int96_dict_change`")
--- End diff --

Did you try a WHERE statement?


[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420295#comment-16420295
 ] 

ASF GitHub Bot commented on DRILL-6016:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1166#discussion_r178255699
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
@@ -61,6 +60,7 @@
 import org.junit.runners.Parameterized;
 
 @RunWith(Parameterized.class)
+
--- End diff --

new line?

