[jira] [Commented] (DRILL-5735) UI options grouping and filtering & Metrics hints

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583404#comment-16583404
 ] 

ASF GitHub Bot commented on DRILL-5735:
---

kkhatua edited a comment on issue #1279: DRILL-5735: Allow search/sort in the 
Options webUI
URL: https://github.com/apache/drill/pull/1279#issuecomment-413764952
 
 
   @arina-ielchiieva please review.
   This contains the changes as requested.
   Both the UI and the system tables function as expected.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UI options grouping and filtering & Metrics hints
> -
>
> Key: DRILL-5735
> URL: https://issues.apache.org/jira/browse/DRILL-5735
> Project: Apache Drill
> Issue Type: Improvement
> Components: Web Server
> Affects Versions: 1.9.0, 1.10.0, 1.11.0
> Reporter: Muhammad Gelbana
> Assignee: Kunal Khatua
> Priority: Major
> Fix For: 1.15.0
>
>
> I'm thinking of some UI improvements that could make a real difference for 
> users trying to optimize low-performing queries.
> h2. Options
> h3. Grouping
> We can group the options by their scope of effect. This will help users 
> easily locate the options they may need to tune.
> h3. Filtering
> Since there are many options, we can add a filtering mechanism (i.e. string 
> search or group/scope filtering) so that users can filter out the options 
> they are not interested in. To provide more benefit than the grouping idea 
> mentioned above, filtering could match keywords as well as option names, 
> since users may not know the name of the option they are looking for.
> h2. Metrics
> I'm referring here to the metrics page and the query execution plan page that 
> displays the overview section and the major/minor fragment metrics. We can 
> show hints for each metric, such as:
> # What it represents, in more detail.
> # Which option or scope of options to tune (increase? decrease?) to improve 
> the performance reported by this metric.
> # Maybe even a small dialog that allows quick modification of the option(s) 
> related to that metric.
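
The filtering idea in the quoted description (matching on keywords as well as option names) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Option` class, its `keywords` field, and the `example.*` option names are hypothetical and are not taken from Drill's option model or from PR #1279.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class OptionFilterSketch {

  /** Hypothetical option entry: a name plus free-text keywords (not Drill's actual model). */
  static final class Option {
    final String name;
    final String keywords;

    Option(String name, String keywords) {
      this.name = name;
      this.keywords = keywords;
    }
  }

  /** Returns the options whose name or keywords contain the search text, ignoring case. */
  static List<Option> filter(List<Option> options, String query) {
    String needle = query.trim().toLowerCase(Locale.ROOT);
    List<Option> matches = new ArrayList<>();
    for (Option option : options) {
      boolean inName = option.name.toLowerCase(Locale.ROOT).contains(needle);
      boolean inKeywords = option.keywords.toLowerCase(Locale.ROOT).contains(needle);
      if (inName || inKeywords) {
        matches.add(option);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    // Made-up option names, used only to exercise the filter.
    List<Option> options = new ArrayList<>();
    options.add(new Option("example.sort.memory_limit", "spill memory sort"));
    options.add(new Option("example.join.enabled", "hash join planner"));

    // "Spill" appears in no option name, but matches the first option's keywords.
    for (Option match : filter(options, "Spill")) {
      System.out.println(match.name);
    }
  }
}
```

Searching keywords in addition to names is what lets a user find an option without already knowing what it is called, which is the benefit the description claims over grouping alone.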



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5735) UI options grouping and filtering & Metrics hints

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583401#comment-16583401
 ] 

ASF GitHub Bot commented on DRILL-5735:
---

kkhatua commented on issue #1279: DRILL-5735: Allow search/sort in the Options 
webUI
URL: https://github.com/apache/drill/pull/1279#issuecomment-413764952
 
 
   @arina-ielchiieva please review.




[jira] [Commented] (DRILL-6694) NPE in UnnestRecordBatch when query uses a column name not present in data

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583189#comment-16583189
 ] 

ASF GitHub Bot commented on DRILL-6694:
---

sohami closed pull request #1434: DRILL-6694: NPE in UnnestRecordBatch when 
query uses a column name no…
URL: https://github.com/apache/drill/pull/1434
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
index e1b8acb42de..a00fae67bd5 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
@@ -62,6 +62,8 @@
   private int remainderIndex = 0;
   private int recordCount;
   private MaterializedField unnestFieldMetadata;
+  // Reference of TypedFieldId for Unnest column. It's always set in schemaChanged method and later used by others
+  private TypedFieldId unnestTypedFieldId;
   private final UnnestMemoryManager memoryManager;
 
   public enum Metric implements MetricDef {
@@ -95,12 +97,8 @@ public void update() {
   // Get sizing information for the batch.
   setRecordBatchSizer(new RecordBatchSizer(incoming));
 
-  final TypedFieldId typedFieldId = incoming.getValueVectorId(popConfig.getColumn());
-  final MaterializedField field = incoming.getSchema().getColumn(typedFieldId.getFieldIds()[0]);
-
   // Get column size of unnest column.
-
-  RecordBatchSizer.ColumnSize columnSize = getRecordBatchSizer().getColumn(field.getName());
+  RecordBatchSizer.ColumnSize columnSize = getRecordBatchSizer().getColumn(unnestFieldMetadata.getName());
 
   final int rowIdColumnSize = TypeHelper.getSize(rowIdVector.getField().getType());
 
@@ -213,22 +211,15 @@ public IterOutcome innerNext() {
   container.zeroVectors();
   // Check if schema has changed
   if (lateral.getRecordIndex() == 0) {
-boolean hasNewSchema = schemaChanged();
-stats.batchReceived(0, incoming.getRecordCount(), hasNewSchema);
-if (hasNewSchema) {
-  try {
+try {
+  boolean hasNewSchema = schemaChanged();
+  stats.batchReceived(0, incoming.getRecordCount(), hasNewSchema);
+  if (hasNewSchema) {
 setupNewSchema();
 hasRemainder = true;
 memoryManager.update();
-  } catch (SchemaChangeException ex) {
-kill(false);
-logger.error("Failure during query", ex);
-context.getExecutorState().fail(ex);
-return IterOutcome.STOP;
-  }
-  return OK_NEW_SCHEMA;
-} else { // Unnest field schema didn't changed but new left empty/nonempty batch might come with OK_NEW_SCHEMA
-  try {
+return OK_NEW_SCHEMA;
+  } else { // Unnest field schema didn't changed but new left empty/nonempty batch might come with OK_NEW_SCHEMA
 // This means even though there is no schema change for unnest field the reference of unnest field
 // ValueVector must have changed hence we should just refresh the transfer pairs and keep output vector
 // same as before. In case when new left batch is received with SchemaChange but was empty Lateral will
@@ -237,19 +228,18 @@ public IterOutcome innerNext() {
 // pair. It should do for each new left incoming batch.
 resetUnnestTransferPair();
 container.zeroVectors();
-  } catch (SchemaChangeException ex) {
-kill(false);
-logger.error("Failure during query", ex);
-context.getExecutorState().fail(ex);
-return IterOutcome.STOP;
-  }
-} // else
-unnest.resetGroupIndex();
-memoryManager.update();
+  } // else
+  unnest.resetGroupIndex();
+  memoryManager.update();
+} catch (SchemaChangeException ex) {
+  kill(false);
+  logger.error("Failure during query", ex);
+  context.getExecutorState().fail(ex);
+  return IterOutcome.STOP;
+}
   }
   return doWork();
 }
-
   }
 
 @Override
@@ -259,11 +249,10 @@ public VectorContainer getOutgoingContainer() {
 
   @SuppressWarnings("resource")
   private void setUnnestVector() {
-final TypedFieldId typedFieldId = incoming.getValueVectorId(popConfig.getColumn());
-final MaterializedField field = incoming.getSchema().getColumn(typedFieldId.getFieldIds()[0]);
+final MaterializedField field = 

[jira] [Commented] (DRILL-6695) Graceful shutdown removes spill directory before query finished

2018-08-16 Thread Krystal (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583151#comment-16583151
 ] 

Krystal commented on DRILL-6695:


Forgot to mention that this problem does not occur if graceful shutdown is 
initiated from the webUI.

> Graceful shutdown removes spill directory before query finished 
> 
>
> Key: DRILL-6695
> URL: https://issues.apache.org/jira/browse/DRILL-6695
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.14.0
> Environment: Ran the following query from sqlline:
> select a.columns[0], b.columns[1], a.columns[2], b.columns[3], a.columns[4] 
> from `test` a join `test` b on a.columns[0]=b.columns[0] and 
> a.columns[4]=b.columns[4] order by a.columns[0] limit 1000;
> While the query was running, I initiated a graceful shutdown from the command 
> line on the foreman node. The query failed with the following error message:
> Error: RESOURCE ERROR: Hash Join failed to open spill file: 
> /tmp/drill/spill/248a054a-ee63-e795-a44e-d9205df8e9b8_HashJoin_3-2-0/spill7_outer
> Fragment 3:0
> It looks like the spill directory somehow gets deleted while the query is 
> still running when graceful_shutdown is initiated.
> Reporter: Krystal
> Priority: Major
> Attachments: drillbit.log
>
>






[jira] [Updated] (DRILL-6695) Graceful shutdown removes spill directory before query finished

2018-08-16 Thread Krystal (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal updated DRILL-6695:
---
Attachment: drillbit.log



[jira] [Commented] (DRILL-6695) Graceful shutdown removes spill directory before query finished

2018-08-16 Thread Krystal (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583147#comment-16583147
 ] 

Krystal commented on DRILL-6695:


Here is the log file:

[^drillbit.log]



[jira] [Created] (DRILL-6695) Graceful shutdown removes spill directory before query finished

2018-08-16 Thread Krystal (JIRA)
Krystal created DRILL-6695:
--

 Summary: Graceful shutdown removes spill directory before query 
finished 
 Key: DRILL-6695
 URL: https://issues.apache.org/jira/browse/DRILL-6695
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.14.0
 Environment: Ran the following query from sqlline:

select a.columns[0], b.columns[1], a.columns[2], b.columns[3], a.columns[4] 
from `test` a join `test` b on a.columns[0]=b.columns[0] and 
a.columns[4]=b.columns[4] order by a.columns[0] limit 1000;

While the query was running, I initiated a graceful shutdown from the command 
line on the foreman node. The query failed with the following error message:

Error: RESOURCE ERROR: Hash Join failed to open spill file: 
/tmp/drill/spill/248a054a-ee63-e795-a44e-d9205df8e9b8_HashJoin_3-2-0/spill7_outer
Fragment 3:0

It looks like the spill directory somehow gets deleted while the query is 
still running when graceful_shutdown is initiated.

 

 
Reporter: Krystal








[jira] [Commented] (DRILL-6694) NPE in UnnestRecordBatch when query uses a column name not present in data

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583126#comment-16583126
 ] 

ASF GitHub Bot commented on DRILL-6694:
---

sohami commented on a change in pull request #1434: DRILL-6694: NPE in 
UnnestRecordBatch when query uses a column name no…
URL: https://github.com/apache/drill/pull/1434#discussion_r210759060
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
 ##
 @@ -62,6 +62,8 @@
   private int remainderIndex = 0;
   private int recordCount;
   private MaterializedField unnestFieldMetadata;
+  // Reference of TypedFieldId for Unnest column. It's always set in schemaChanged method and later used by others
+  private TypedFieldId unnestTypedFieldId;
 
 Review comment:
   Kept a reference to it here, since it is costly to look it up from the 
incoming batch by column name every time.
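
The idea in this review comment (resolve the field once on schema change, then reuse the cached reference) can be sketched as below, together with an explicit check for the missing-column case described in the issue. This is a hedged illustration only: `IncomingBatch`, `getFieldId`, and the use of `IllegalStateException` are hypothetical stand-ins, not Drill's `TypedFieldId` API and not the code in PR #1434.

```java
import java.util.HashMap;
import java.util.Map;

public class CachedFieldLookupSketch {

  /** Hypothetical stand-in for an incoming batch that resolves column names to field ids. */
  static final class IncomingBatch {
    private final Map<String, Integer> fieldIds = new HashMap<>();
    int lookups = 0; // counts name-based lookups so the saving is visible

    void addColumn(String name, int id) {
      fieldIds.put(name, id);
    }

    /** Name-based lookup; returns null when the column is not present. */
    Integer getFieldId(String columnName) {
      lookups++;
      return fieldIds.get(columnName);
    }
  }

  private final IncomingBatch incoming;
  private final String unnestColumn;
  private Integer cachedFieldId; // set in schemaChanged(), reused by later callers

  CachedFieldLookupSketch(IncomingBatch incoming, String unnestColumn) {
    this.incoming = incoming;
    this.unnestColumn = unnestColumn;
  }

  /** Re-resolves the field id and reports whether it differs from the cached one. */
  boolean schemaChanged() {
    Integer fieldId = incoming.getFieldId(unnestColumn);
    if (fieldId == null) {
      // Assumption: fail with a descriptive error instead of dereferencing null later.
      throw new IllegalStateException("Unnest column not found in incoming batch: " + unnestColumn);
    }
    boolean changed = !fieldId.equals(cachedFieldId);
    cachedFieldId = fieldId;
    return changed;
  }

  /** Returns the cached id without touching the incoming batch again. */
  int fieldId() {
    if (cachedFieldId == null) {
      throw new IllegalStateException("schemaChanged() has not resolved the field yet");
    }
    return cachedFieldId;
  }

  public static void main(String[] args) {
    IncomingBatch batch = new IncomingBatch();
    batch.addColumn("orders", 3);
    CachedFieldLookupSketch unnest = new CachedFieldLookupSketch(batch, "orders");
    unnest.schemaChanged();
    unnest.fieldId();
    unnest.fieldId();
    System.out.println("name-based lookups: " + batch.lookups);
  }
}
```

In this sketch the name-based lookup runs once per `schemaChanged()` call, and later readers use the cached value, which mirrors the cost argument made in the comment.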




> NPE in UnnestRecordBatch when query uses a column name not present in data
> --
>
> Key: DRILL-6694
> URL: https://issues.apache.org/jira/browse/DRILL-6694
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.14.0
> Reporter: Sorabh Hamirwasia
> Assignee: Sorabh Hamirwasia
> Priority: Major
> Fix For: 1.15.0
>
>
> When the array column name doesn't exist in the underlying data and is used 
> in a query with Unnest, an NPE is thrown. The reason is that Unnest tries to 
> get the ValueVector of the unnest column from the incoming batch based on its 
> TypedFieldId, which is null in this case, hence the exception.
> {code:java}
> [Error Id: 6f8461ee-92c7-4865-b5e6-3e2f756391c4 on pssc-67.qa.lab:31010]
>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
>     at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
> Caused by: java.lang.NullPointerException: null
>     at org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.schemaChanged(UnnestRecordBatch.java:422) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.innerNext(UnnestRecordBatch.java:208) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:64) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:142) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.join.LateralJoinBatch.prefetchFirstBatchFromBothSides(LateralJoinBatch.java:331) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.join.LateralJoinBatch.buildSchema(LateralJoinBatch.java:356)

[jira] [Commented] (DRILL-6694) NPE in UnnestRecordBatch when query uses a column name not present in data

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583124#comment-16583124
 ] 

ASF GitHub Bot commented on DRILL-6694:
---

sohami opened a new pull request #1434: DRILL-6694: NPE in UnnestRecordBatch 
when query uses a column name no…
URL: https://github.com/apache/drill/pull/1434
 
 
   …t present in data



[jira] [Created] (DRILL-6694) NPE in UnnestRecordBatch when query uses a column name not present in data

2018-08-16 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-6694:


 Summary: NPE in UnnestRecordBatch when query uses a column name 
not present in data
 Key: DRILL-6694
 URL: https://issues.apache.org/jira/browse/DRILL-6694
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia
 Fix For: 1.15.0


When the array column name doesn't exist in the underlying data and is used in 
a query with Unnest, an NPE is thrown. The reason is that Unnest tries to get 
the ValueVector of the unnest column from the incoming batch based on its 
TypedFieldId, which is null in this case, hence the exception.
{code:java}
[Error Id: 6f8461ee-92c7-4865-b5e6-3e2f756391c4 on pssc-67.qa.lab:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
Caused by: java.lang.NullPointerException: null
    at org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.schemaChanged(UnnestRecordBatch.java:422) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.innerNext(UnnestRecordBatch.java:208) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:64) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:142) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.join.LateralJoinBatch.prefetchFirstBatchFromBothSides(LateralJoinBatch.java:331) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.join.LateralJoinBatch.buildSchema(LateralJoinBatch.java:356) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema(StreamingAggBatch.java:158) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:294) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
    at

[jira] [Commented] (DRILL-6552) Drill Metadata management "Drill MetaStore"

2018-08-16 Thread Parth Chandra (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583061#comment-16583061
 ] 

Parth Chandra commented on DRILL-6552:
--

Good point about #4 : Improvements to Drill execution and planning to leverage 
(enhanced) metadata. 
{quote}This project sounds pretty large. Given the modular structure you've 
outlined, the initial implementation might focus on the API and Drill internals 
changes to use the data. Create starter implementations for HMS, Drill's 
existing Parquet metadata, an easy-to-use file based description for ad-hoc 
uses, and a system table based system when querying a JDBC data store.
{quote}
I would second this approach. 

 

> Drill Metadata management "Drill MetaStore"
> ---
>
> Key: DRILL-6552
> URL: https://issues.apache.org/jira/browse/DRILL-6552
> Project: Apache Drill
> Issue Type: New Feature
> Components: Metadata
> Affects Versions: 1.13.0
> Reporter: Vitalii Diravka
> Assignee: Vitalii Diravka
> Priority: Major
> Fix For: 2.0.0
>
>
> It would be useful for Drill to have some sort of metastore that would 
> enable Drill to remember previously defined schemata, so Drill doesn’t have 
> to do the same work over and over again.
> It would store schema and statistics, which would speed up query validation, 
> planning and execution. It would also increase the stability of Drill and 
> help avoid various kinds of issues: "schema change Exceptions", "limit 0" 
> optimization and so on. 
> One of the main candidates is Hive Metastore.
> Starting from version 3.0, Hive Metastore can run as a service separate from 
> Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> An optional enhancement is storing Drill's profiles, UDFs and plugin configs 
> in some kind of metastore as well.





[jira] [Commented] (DRILL-6461) Add Basic Data Correctness Unit Tests

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583004#comment-16583004
 ] 

ASF GitHub Bot commented on DRILL-6461:
---

ilooner commented on issue #1344: DRILL-6461: Added basic data correctness 
tests for hash agg, and improved operator unit testing framework.
URL: https://github.com/apache/drill/pull/1344#issuecomment-413667931
 
 
   @sohami The PR is ready for another round of review.
   
   I removed the fix that allowed **copyEntry** to work with empty VarLength 
vectors since value vectors should not be manipulated unless allocateNew is 
called. I fixed the failing test by calling allocateNew on all the vectors in 
the destination vector container.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Basic Data Correctness Unit Tests
> -
>
> Key: DRILL-6461
> URL: https://issues.apache.org/jira/browse/DRILL-6461
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> There are no data correctness unit tests for HashAgg. We need to add some.





[jira] [Commented] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.

2018-08-16 Thread Boaz Ben-Zvi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582913#comment-16582913
 ] 

Boaz Ben-Zvi commented on DRILL-6566:
-

This query performs 36 SUM() aggregations, which leads to a very large 
batch of about 31 MB (~64K * 36 * ~13).

The memory available to the first-phase Hash-Agg is only 27 MB, hence the OOM.

Should batch sizing address this situation?
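
The estimate above can be checked with a few lines of arithmetic. This is only a sketch of the back-of-the-envelope calculation in the comment, not Drill's batch-sizing code: the class and method names are hypothetical, "64K" is taken as 65,536 rows, "~13" as 13 bytes per value, and 27 MB as decimal megabytes.

```java
// Hypothetical helper reproducing the rough estimate from the comment above.
final class BatchSizeEstimate {

    /** Rough batch footprint: rows * aggregated columns * bytes per value. */
    static long estimateBytes(long rows, int columns, int bytesPerValue) {
        return rows * columns * bytesPerValue;
    }

    public static void main(String[] args) {
        long rows = 64L * 1024;             // ~64K rows per batch (from the comment)
        int sums = 36;                      // 36 SUM() aggregations
        int bytesPerValue = 13;             // ~13 bytes per value (from the comment)
        long available = 27L * 1000 * 1000; // 27 MB, read as decimal MB (assumption)

        long batch = estimateBytes(rows, sums, bytesPerValue);
        System.out.println("estimated batch bytes: " + batch);
        System.out.println("fits in first-phase memory: " + (batch <= available));
    }
}
```

The product is 30,670,848 bytes, which rounds to the quoted 31 MB and exceeds the 27 MB budget whether the budget is read as decimal or binary megabytes.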

 

> Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.  AGGR OOM at First Phase.
> --
>
> Key: DRILL-6566
> URL: https://issues.apache.org/jira/browse/DRILL-6566
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Timothy Farkas
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is TPCDS Query 66.
> Query: tpcds/tpcds_sf1/original/parquet/query66.sql
> SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> ship_carriers,
> year1,
> Sum(jan_sales) AS jan_sales,
> Sum(feb_sales) AS feb_sales,
> Sum(mar_sales) AS mar_sales,
> Sum(apr_sales) AS apr_sales,
> Sum(may_sales) AS may_sales,
> Sum(jun_sales) AS jun_sales,
> Sum(jul_sales) AS jul_sales,
> Sum(aug_sales) AS aug_sales,
> Sum(sep_sales) AS sep_sales,
> Sum(oct_sales) AS oct_sales,
> Sum(nov_sales) AS nov_sales,
> Sum(dec_sales) AS dec_sales,
> Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot,
> Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot,
> Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot,
> Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot,
> Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot,
> Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot,
> Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot,
> Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot,
> Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot,
> Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot,
> Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot,
> Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot,
> Sum(jan_net)   AS jan_net,
> Sum(feb_net)   AS feb_net,
> Sum(mar_net)   AS mar_net,
> Sum(apr_net)   AS apr_net,
> Sum(may_net)   AS may_net,
> Sum(jun_net)   AS jun_net,
> Sum(jul_net)   AS jul_net,
> Sum(aug_net)   AS aug_net,
> Sum(sep_net)   AS sep_net,
> Sum(oct_net)   AS oct_net,
> Sum(nov_net)   AS nov_net,
> Sum(dec_net)   AS dec_net
> FROM   (SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> 'ZOUROS'
> \|\| ','
> \|\| 'ZHOU' AS ship_carriers,
> d_year AS year1,
> Sum(CASE
> WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jan_sales,
> Sum(CASE
> WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS feb_sales,
> Sum(CASE
> WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS mar_sales,
> Sum(CASE
> WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS apr_sales,
> Sum(CASE
> WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS may_sales,
> Sum(CASE
> WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jun_sales,
> Sum(CASE
> WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jul_sales,
> Sum(CASE
> WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS aug_sales,
> Sum(CASE
> WHEN d_moy = 9 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS sep_sales,
> Sum(CASE
> WHEN d_moy = 10 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS oct_sales,
> Sum(CASE
> WHEN d_moy = 11 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS nov_sales,
> Sum(CASE
> WHEN d_moy = 12 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS dec_sales,
> Sum(CASE
> WHEN d_moy = 1 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END)  AS jan_net,
> Sum(CASE
> WHEN d_moy = 2 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END)  AS feb_net,
> Sum(CASE
> WHEN d_moy = 3 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END)  AS mar_net,
> Sum(CASE
> WHEN 

[jira] [Created] (DRILL-6693) When a query is started from Drill Web Console, the UI becomes inaccessible until the query is completed

2018-08-16 Thread Anton Gozhiy (JIRA)
Anton Gozhiy created DRILL-6693:
---

 Summary: When a query is started from Drill Web Console, the UI 
becomes inaccessible until the query is completed
 Key: DRILL-6693
 URL: https://issues.apache.org/jira/browse/DRILL-6693
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.15.0
Reporter: Anton Gozhiy


*Steps:*
# From Web UI, run the following query:
{noformat}
select * 
from (
select employee_id, full_name, first_name, last_name, position_id, 
position_title, store_id, department_id, birth_date, hire_date, salary, 
supervisor_id, education_level, marital_status, gender, management_role 
from cp.`employee.json` 
union
select employee_id, full_name, first_name, last_name, position_id, 
position_title, store_id, department_id, birth_date, hire_date, salary, 
supervisor_id, education_level, marital_status, gender, management_role 
from cp.`employee.json` 
union
select employee_id, full_name, first_name, last_name, position_id, 
position_title, store_id, department_id, birth_date, hire_date, salary, 
supervisor_id, education_level, marital_status, gender, management_role
from cp.`employee.json`)
where last_name = 'Blumberg'
{noformat}
# While the query is running, try to open the Profiles page (or any other page). If it 
completes too fast, add more unions to the query above.

*Expected result:*
The Profiles page should open, and the running query should be listed.

*Actual result:*
The Web UI hangs until the query completes.

*Note:*
If the query is started from sqlline, everything is fine.





[jira] [Created] (DRILL-6692) Undeclared dependencies for Nullable and NotNull annotations

2018-08-16 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-6692:
--

 Summary: Undeclared dependencies for Nullable and NotNull 
annotations
 Key: DRILL-6692
 URL: https://issues.apache.org/jira/browse/DRILL-6692
 Project: Apache Drill
  Issue Type: Task
Affects Versions: 1.14.0
Reporter: Vitalii Diravka
 Fix For: Future


The {{@Nullable}} and {{@NotNull}} annotations are actively used in the 
project. They come to Drill from the transitive dependencies {{javax.validation}} 
(validation-api-1.1.0.Final.jar) and {{javax.annotation}} (jsr305-3.0.1.jar), 
but Drill has no direct dependencies on them.

It is possible to add dependencies on these libraries, but possibly the better 
choice is to get rid of them and replace the {{@NotNull}} annotation with an 
{{Objects.requireNonNull()}} check at the beginning of the method (the issue will 
be with methods in interfaces). The right way to solve this could 
be raised on the Drill dev mailing list.
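
A minimal sketch of the replacement described above, assuming a hypothetical method whose parameter would previously have carried {{@NotNull}}. The class, method, and parameter names are illustrative only and are not taken from the Drill code base.

```java
import java.util.Objects;

// Hypothetical example: an explicit runtime check instead of a @NotNull annotation.
final class RequireNonNullSketch {

    // Before: static String describe(@NotNull String pluginName) { ... }
    static String describe(String pluginName) {
        // Fails fast with a NullPointerException carrying this message.
        Objects.requireNonNull(pluginName, "pluginName must not be null");
        return "plugin:" + pluginName;
    }

    public static void main(String[] args) {
        System.out.println(describe("dfs"));
        try {
            describe(null);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because an abstract interface method has no body to hold such a check, each implementation would have to repeat it, which is the difficulty with interfaces noted above.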





[jira] [Updated] (DRILL-6691) Unify checkstyle-config.xml files

2018-08-16 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6691:

Fix Version/s: 1.15.0

> Unify checkstyle-config.xml files
> -
>
> Key: DRILL-6691
> URL: https://issues.apache.org/jira/browse/DRILL-6691
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Volodymyr Vysotskyi
>Assignee: Timothy Farkas
>Priority: Minor
> Fix For: 1.15.0
>
>
> Currently, the `drill-root` and `format-maprdb` modules each contain their own 
> `checkstyle-config.xml` file.
> They should be unified to apply the same checkstyle rules for all modules.





[jira] [Created] (DRILL-6691) Unify checkstyle-config.xml files

2018-08-16 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-6691:
--

 Summary: Unify checkstyle-config.xml files
 Key: DRILL-6691
 URL: https://issues.apache.org/jira/browse/DRILL-6691
 Project: Apache Drill
  Issue Type: Task
Reporter: Volodymyr Vysotskyi
Assignee: Timothy Farkas


Currently, the `drill-root` and `format-maprdb` modules each contain their own 
`checkstyle-config.xml` file.

They should be unified to apply the same checkstyle rules for all modules.





[jira] [Updated] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin

2018-08-16 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6662:

Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Access AWS access key ID and secret access key using Credential Provider API 
> for S3 storage plugin
> --
>
> Key: DRILL-6662
> URL: https://issues.apache.org/jira/browse/DRILL-6662
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.15.0
>
>
> Hadoop provides the [CredentialProvider 
> API|[https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html]],
> which allows passwords and other sensitive secrets to be stored in an 
> external provider rather than in plaintext in configuration files.
> Currently the S3 storage plugin reads its secrets, namely 
> 'fs.s3a.access.key' and 'fs.s3a.secret.key', from clear text in the 
> Configuration using the get() method. To let users remove clear-text 
> secrets for S3 from configuration files, the Configuration.getPassword() 
> method should be used instead. Users then configure the 
> 'hadoop.security.credential.provider.path' property, which points to a file 
> containing encrypted passwords, instead of configuring the two aforementioned 
> properties.
> With this approach, credential providers are checked first. If the 
> secret is not found there, or no providers are configured, the lookup 
> falls back to secrets configured in clear text (unless 
> 'hadoop.security.credential.clear-text-fallback' is set to 
> "false"), which keeps the change backwards-compatible.
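
The lookup order described above can be sketched as follows. This is a simulation only: plain maps stand in for the credential provider and for Hadoop's Configuration, and the class and method names are hypothetical. It does not call Hadoop, whose Configuration.getPassword() returns a char[] and can throw IOException. Treating the fallback as enabled when the property is absent is an assumption based on the "unless ... false" wording.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of "provider first, then clear-text fallback".
final class CredentialLookupSketch {

    static final String CLEAR_TEXT_FALLBACK =
        "hadoop.security.credential.clear-text-fallback";

    /** Returns the secret for {@code key}, or null if it cannot be resolved. */
    static String resolve(String key,
                          Map<String, String> provider,
                          Map<String, String> config) {
        // 1. The credential provider is consulted first.
        String fromProvider = provider.get(key);
        if (fromProvider != null) {
            return fromProvider;
        }
        // 2. Fall back to the clear-text setting unless fallback is disabled.
        boolean fallbackAllowed =
            Boolean.parseBoolean(config.getOrDefault(CLEAR_TEXT_FALLBACK, "true"));
        return fallbackAllowed ? config.get(key) : null;
    }

    public static void main(String[] args) {
        Map<String, String> provider = new HashMap<>();
        Map<String, String> config = new HashMap<>();
        config.put("fs.s3a.secret.key", "clear-text-secret");

        // No provider entry yet: the clear-text value is used.
        System.out.println(resolve("fs.s3a.secret.key", provider, config));

        // Once the provider holds the secret, it takes precedence.
        provider.put("fs.s3a.secret.key", "provider-secret");
        System.out.println(resolve("fs.s3a.secret.key", provider, config));
    }
}
```

Existing deployments that keep both keys in clear text take the fallback path unchanged, which is why the change is described as backwards-compatible.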





[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582372#comment-16582372
 ] 

ASF GitHub Bot commented on DRILL-6662:
---

KazydubB commented on issue #1419: DRILL-6662: Access AWS access key ID and 
secret access key using Cred…
URL: https://github.com/apache/drill/pull/1419#issuecomment-413512145
 
 
   @arina-ielchiieva I have addressed review comments. Could you take a look, 
please?




> Access AWS access key ID and secret access key using Credential Provider API 
> for S3 storage plugin
> --
>
> Key: DRILL-6662
> URL: https://issues.apache.org/jira/browse/DRILL-6662
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>


[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582373#comment-16582373
 ] 

ASF GitHub Bot commented on DRILL-6662:
---

arina-ielchiieva commented on issue #1419: DRILL-6662: Access AWS access key ID 
and secret access key using Cred…
URL: https://github.com/apache/drill/pull/1419#issuecomment-413512174
 
 
   +1, LGTM.




> Access AWS access key ID and secret access key using Credential Provider API 
> for S3 storage plugin
> --
>
> Key: DRILL-6662
> URL: https://issues.apache.org/jira/browse/DRILL-6662
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>


[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582371#comment-16582371
 ] 

ASF GitHub Bot commented on DRILL-6662:
---

arina-ielchiieva commented on a change in pull request #1419: DRILL-6662: 
Access AWS access key ID and secret access key using Cred…
URL: https://github.com/apache/drill/pull/1419#discussion_r210559817
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java
 ##
 @@ -104,6 +109,33 @@ public FileSystemPlugin(FileSystemConfig config, DrillbitContext context, String
      }
    }
 
+  private boolean isS3Connection(Configuration conf) {
+    URI uri = FileSystem.getDefaultUri(conf);
+    return uri.getScheme().equals("s3a");
+  }
+
+  /**
+   * Retrieve secret and access keys from configured (with
+   * {@link org.apache.hadoop.security.alias.CredentialProviderFactory#CREDENTIAL_PROVIDER_PATH} property)
+   * credential providers and set it into {@code conf}. If provider path is not configured or credential
+   * is absent in providers, it will conditionally fallback to configuration setting. The fallback will occur unless
+   * {@link org.apache.hadoop.security.alias.CredentialProvider#CLEAR_TEXT_FALLBACK} is set to {@code false}.
+   *
+   * @param conf {@code Configuration} which will be updated with credentials from provider
+   * @throws IOException thrown if a credential cannot be retrieved from provider
+   */
+  private void handleS3Credentials(Configuration conf) throws IOException {
+    String[] credentialKeys = {"fs.s3a.secret.key", "fs.s3a.access.key"};
 
 Review comment:
   In this case, please leave as is.




> Access AWS access key ID and secret access key using Credential Provider API 
> for S3 storage plugin
> --
>
> Key: DRILL-6662
> URL: https://issues.apache.org/jira/browse/DRILL-6662
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>


[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582369#comment-16582369
 ] 

ASF GitHub Bot commented on DRILL-6662:
---

KazydubB commented on a change in pull request #1419: DRILL-6662: Access AWS 
access key ID and secret access key using Cred…
URL: https://github.com/apache/drill/pull/1419#discussion_r210559211
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java
 ##
 @@ -104,6 +109,33 @@ public FileSystemPlugin(FileSystemConfig config, DrillbitContext context, String
      }
    }
 
+  private boolean isS3Connection(Configuration conf) {
+    URI uri = FileSystem.getDefaultUri(conf);
+    return uri.getScheme().equals("s3a");
+  }
+
+  /**
+   * Retrieve secret and access keys from configured (with
+   * {@link org.apache.hadoop.security.alias.CredentialProviderFactory#CREDENTIAL_PROVIDER_PATH} property)
+   * credential providers and set it into {@code conf}. If provider path is not configured or credential
+   * is absent in providers, it will conditionally fallback to configuration setting. The fallback will occur unless
+   * {@link org.apache.hadoop.security.alias.CredentialProvider#CLEAR_TEXT_FALLBACK} is set to {@code false}.
+   *
+   * @param conf {@code Configuration} which will be updated with credentials from provider
+   * @throws IOException thrown if a credential cannot be retrieved from provider
+   */
+  private void handleS3Credentials(Configuration conf) throws IOException {
+    String[] credentialKeys = {"fs.s3a.secret.key", "fs.s3a.access.key"};
 
 Review comment:
   I am aware of the Constants class, but the artifact that contains it (hadoop-aws) 
is not among this module's dependencies (it is, however, present in the 
distribution's (compile-scope) and drill-root's (test-scope) dependencies). Is 
there a need to add the dependency?




> Access AWS access key ID and secret access key using Credential Provider API 
> for S3 storage plugin
> --
>
> Key: DRILL-6662
> URL: https://issues.apache.org/jira/browse/DRILL-6662
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>


[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582314#comment-16582314
 ] 

ASF GitHub Bot commented on DRILL-6662:
---

arina-ielchiieva commented on a change in pull request #1419: DRILL-6662: 
Access AWS access key ID and secret access key using Cred…
URL: https://github.com/apache/drill/pull/1419#discussion_r210540931
 
 

 ##
 File path: distribution/src/resources/core-site-example.xml
 ##
 @@ -30,4 +30,14 @@
 ENTER_YOUR_SECRETKEY
 
 
+
 
 Review comment:
   Please comment out this section and add a comment explaining that the user should 
use one of those.




> Access AWS access key ID and secret access key using Credential Provider API 
> for S3 storage plugin
> --
>
> Key: DRILL-6662
> URL: https://issues.apache.org/jira/browse/DRILL-6662
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>


[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582313#comment-16582313
 ] 

ASF GitHub Bot commented on DRILL-6662:
---

arina-ielchiieva commented on a change in pull request #1419: DRILL-6662: 
Access AWS access key ID and secret access key using Cred…
URL: https://github.com/apache/drill/pull/1419#discussion_r210541907
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java
 ##
 @@ -104,6 +109,33 @@ public FileSystemPlugin(FileSystemConfig config, DrillbitContext context, String
      }
    }
 
+  private boolean isS3Connection(Configuration conf) {
+    URI uri = FileSystem.getDefaultUri(conf);
+    return uri.getScheme().equals("s3a");
+  }
+
+  /**
+   * Retrieve secret and access keys from configured (with
+   * {@link org.apache.hadoop.security.alias.CredentialProviderFactory#CREDENTIAL_PROVIDER_PATH} property)
+   * credential providers and set it into {@code conf}. If provider path is not configured or credential
+   * is absent in providers, it will conditionally fallback to configuration setting. The fallback will occur unless
+   * {@link org.apache.hadoop.security.alias.CredentialProvider#CLEAR_TEXT_FALLBACK} is set to {@code false}.
+   *
+   * @param conf {@code Configuration} which will be updated with credentials from provider
+   * @throws IOException thrown if a credential cannot be retrieved from provider
+   */
+  private void handleS3Credentials(Configuration conf) throws IOException {
+    String[] credentialKeys = {"fs.s3a.secret.key", "fs.s3a.access.key"};
 
 Review comment:
   Consider using the org.apache.hadoop.fs.s3a.Constants class.




> Access AWS access key ID and secret access key using Credential Provider API 
> for S3 storage plugin
> --
>
> Key: DRILL-6662
> URL: https://issues.apache.org/jira/browse/DRILL-6662
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>