[jira] [Commented] (DRILL-5735) UI options grouping and filtering & Metrics hints
[ https://issues.apache.org/jira/browse/DRILL-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583404#comment-16583404 ]

ASF GitHub Bot commented on DRILL-5735:
---------------------------------------

kkhatua edited a comment on issue #1279: DRILL-5735: Allow search/sort in the Options webUI
URL: https://github.com/apache/drill/pull/1279#issuecomment-413764952

@arina-ielchiieva please review. This contains the changes as requested. The UI and the system tables both function as expected.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> UI options grouping and filtering & Metrics hints
> -------------------------------------------------
>
>                 Key: DRILL-5735
>                 URL: https://issues.apache.org/jira/browse/DRILL-5735
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Web Server
>    Affects Versions: 1.9.0, 1.10.0, 1.11.0
>            Reporter: Muhammad Gelbana
>            Assignee: Kunal Khatua
>            Priority: Major
>             Fix For: 1.15.0
>
>
> I'm thinking of some UI improvements that could make all the difference for users trying to optimize low-performing queries.
> h2. Options
> h3. Grouping
> We can group the options by their scope of effect; this will help users easily locate the options they may need to tune.
> h3. Filtering
> Since there are many options, we can add a filtering mechanism (i.e. string search or group/scope filtering) so the user can filter out the options he's not interested in. To provide more benefit than the grouping idea mentioned above, filtering may also match keywords, not just the option name, since the user may not know the name of the option he's looking for.
> h2. Metrics
> I'm referring here to the metrics page and the query-execution-plan page that displays the overview section and major/minor fragment metrics. We can show hints for each metric, such as:
> # What it represents, in more detail.
> # Which option(s) to tune (increase? decrease?) to improve the performance reported by this metric.
> # Maybe even provide a small dialog to quickly allow modification of the option(s) related to that metric.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
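The keyword-filtering idea proposed above can be sketched as a simple case-insensitive match against both the option name and its description. This is only an illustrative sketch, not Drill's actual web-UI code from PR #1279; the `OptionInfo` class and `filter` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Minimal sketch of keyword filtering over options: a search term matches
// either the option name or words in its description, so a user can find an
// option without knowing its exact name. OptionInfo and filter() are
// illustrative names, not Drill's actual classes.
public class OptionFilter {

  public static class OptionInfo {
    public final String name;
    public final String description;

    public OptionInfo(String name, String description) {
      this.name = name;
      this.description = description;
    }
  }

  public static List<OptionInfo> filter(List<OptionInfo> options, String term) {
    String t = term.toLowerCase(Locale.ROOT);
    List<OptionInfo> matches = new ArrayList<>();
    for (OptionInfo o : options) {
      // Match on either the option name or the description keywords.
      if (o.name.toLowerCase(Locale.ROOT).contains(t)
          || o.description.toLowerCase(Locale.ROOT).contains(t)) {
        matches.add(o);
      }
    }
    return matches;
  }
}
```

With this, searching for "memory" would surface an option such as `planner.memory.max_query_memory_per_node` both by its name and by a description keyword, which is the benefit the issue describes for users who don't know the option's name.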
[jira] [Commented] (DRILL-5735) UI options grouping and filtering & Metrics hints
[ https://issues.apache.org/jira/browse/DRILL-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583401#comment-16583401 ]

ASF GitHub Bot commented on DRILL-5735:
---------------------------------------

kkhatua commented on issue #1279: DRILL-5735: Allow search/sort in the Options webUI
URL: https://github.com/apache/drill/pull/1279#issuecomment-413764952

@arina-ielchiieva please review.
[jira] [Commented] (DRILL-6694) NPE in UnnestRecordBatch when query uses a column name not present in data
[ https://issues.apache.org/jira/browse/DRILL-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583189#comment-16583189 ]

ASF GitHub Bot commented on DRILL-6694:
---------------------------------------

sohami closed pull request #1434: DRILL-6694: NPE in UnnestRecordBatch when query uses a column name no…
URL: https://github.com/apache/drill/pull/1434

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance.

diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
index e1b8acb42de..a00fae67bd5 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
@@ -62,6 +62,8 @@
   private int remainderIndex = 0;
   private int recordCount;
   private MaterializedField unnestFieldMetadata;
+  // Reference of TypedFieldId for Unnest column. It's always set in schemaChanged method and later used by others
+  private TypedFieldId unnestTypedFieldId;
   private final UnnestMemoryManager memoryManager;

   public enum Metric implements MetricDef {
@@ -95,12 +97,8 @@ public void update() {
       // Get sizing information for the batch.
       setRecordBatchSizer(new RecordBatchSizer(incoming));

-      final TypedFieldId typedFieldId = incoming.getValueVectorId(popConfig.getColumn());
-      final MaterializedField field = incoming.getSchema().getColumn(typedFieldId.getFieldIds()[0]);
-
       // Get column size of unnest column.
-      RecordBatchSizer.ColumnSize columnSize = getRecordBatchSizer().getColumn(field.getName());
+      RecordBatchSizer.ColumnSize columnSize = getRecordBatchSizer().getColumn(unnestFieldMetadata.getName());

       final int rowIdColumnSize = TypeHelper.getSize(rowIdVector.getField().getType());
@@ -213,22 +211,15 @@ public IterOutcome innerNext() {
       container.zeroVectors();
       // Check if schema has changed
       if (lateral.getRecordIndex() == 0) {
-        boolean hasNewSchema = schemaChanged();
-        stats.batchReceived(0, incoming.getRecordCount(), hasNewSchema);
-        if (hasNewSchema) {
-          try {
+        try {
+          boolean hasNewSchema = schemaChanged();
+          stats.batchReceived(0, incoming.getRecordCount(), hasNewSchema);
+          if (hasNewSchema) {
             setupNewSchema();
             hasRemainder = true;
             memoryManager.update();
-          } catch (SchemaChangeException ex) {
-            kill(false);
-            logger.error("Failure during query", ex);
-            context.getExecutorState().fail(ex);
-            return IterOutcome.STOP;
-          }
-          return OK_NEW_SCHEMA;
-        } else { // Unnest field schema didn't changed but new left empty/nonempty batch might come with OK_NEW_SCHEMA
-          try {
+            return OK_NEW_SCHEMA;
+          } else { // Unnest field schema didn't changed but new left empty/nonempty batch might come with OK_NEW_SCHEMA
             // This means even though there is no schema change for unnest field the reference of unnest field
             // ValueVector must have changed hence we should just refresh the transfer pairs and keep output vector
             // same as before. In case when new left batch is received with SchemaChange but was empty Lateral will
@@ -237,19 +228,18 @@ public IterOutcome innerNext() {
             // pair. It should do for each new left incoming batch.
             resetUnnestTransferPair();
             container.zeroVectors();
-          } catch (SchemaChangeException ex) {
-            kill(false);
-            logger.error("Failure during query", ex);
-            context.getExecutorState().fail(ex);
-            return IterOutcome.STOP;
-          }
-        } // else
-        unnest.resetGroupIndex();
-        memoryManager.update();
+          } // else
+          unnest.resetGroupIndex();
+          memoryManager.update();
+        } catch (SchemaChangeException ex) {
+          kill(false);
+          logger.error("Failure during query", ex);
+          context.getExecutorState().fail(ex);
+          return IterOutcome.STOP;
+        }
       }
       return doWork();
     }
-
   }

   @Override
@@ -259,11 +249,10 @@ public VectorContainer getOutgoingContainer() {

   @SuppressWarnings("resource")
   private void setUnnestVector() {
-    final TypedFieldId typedFieldId = incoming.getValueVectorId(popConfig.getColumn());
-    final MaterializedField field = incoming.getSchema().getColumn(typedFieldId.getFieldIds()[0]);
+    final MaterializedField field =
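The bug this diff addresses is a by-name column lookup that returns null when the data has no such column, with the null surfacing later as an NPE deep in the operator. A hypothetical miniature of that pattern, with a fail-fast guard, can be sketched as follows; `ColumnLookup` and its methods are illustrative, not Drill's actual `TypedFieldId`/`ValueVector` API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical miniature of the NPE scenario: looking up a column id by name
// yields null when the incoming data has no such column, so the caller must
// check the result before dereferencing it. Not Drill's actual API.
public class ColumnLookup {
  private final Map<String, Integer> fieldIds = new HashMap<>();

  public void addColumn(String name, int id) {
    fieldIds.put(name, id);
  }

  // Fails fast with a descriptive error instead of letting a null leak out
  // and surface later as a NullPointerException.
  public int getFieldId(String column) {
    Integer id = fieldIds.get(column);
    if (id == null) {
      throw new IllegalStateException(
          "Unnest column '" + column + "' is not present in the incoming batch");
    }
    return id;
  }
}
```

An unchecked `fieldIds.get(column)` on a missing column returns null, and using it later reproduces the kind of stack trace quoted in the issue; checking at the lookup site turns that into a clear error message.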
[jira] [Commented] (DRILL-6695) Graceful shutdown removes spill directory before query finished
[ https://issues.apache.org/jira/browse/DRILL-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583151#comment-16583151 ]

Krystal commented on DRILL-6695:
--------------------------------

Forgot to mention that this problem does not occur if graceful shutdown is initiated from the web UI.

> Graceful shutdown removes spill directory before query finished
> ---------------------------------------------------------------
>
>                 Key: DRILL-6695
>                 URL: https://issues.apache.org/jira/browse/DRILL-6695
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.14.0
>         Environment: Ran the following query from sqlline:
> select a.columns[0], b.columns[1], a.columns[2], b.columns[3], a.columns[4] from `test` a join `test` b on a.columns[0]=b.columns[0] and a.columns[4]=b.columns[4] order by a.columns[0] limit 1000;
> While the query was running, initiated a graceful shutdown from the command line on the foreman node. The query failed with the following error message:
> Error: RESOURCE ERROR: Hash Join failed to open spill file: /tmp/drill/spill/248a054a-ee63-e795-a44e-d9205df8e9b8_HashJoin_3-2-0/spill7_outer
> Fragment 3:0
> Looks like the spill directory somehow gets deleted while the query is still running when graceful_shutdown is initiated.
>            Reporter: Krystal
>            Priority: Major
>         Attachments: drillbit.log
[jira] [Updated] (DRILL-6695) Graceful shutdown removes spill directory before query finished
[ https://issues.apache.org/jira/browse/DRILL-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krystal updated DRILL-6695:
---------------------------
    Attachment: drillbit.log
[jira] [Commented] (DRILL-6695) Graceful shutdown removes spill directory before query finished
[ https://issues.apache.org/jira/browse/DRILL-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583147#comment-16583147 ]

Krystal commented on DRILL-6695:
--------------------------------

Here is the log file: [^drillbit.log]
[jira] [Created] (DRILL-6695) Graceful shutdown removes spill directory before query finished
Krystal created DRILL-6695:
---------------------------

             Summary: Graceful shutdown removes spill directory before query finished
                 Key: DRILL-6695
                 URL: https://issues.apache.org/jira/browse/DRILL-6695
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.14.0
         Environment: Ran the following query from sqlline:
select a.columns[0], b.columns[1], a.columns[2], b.columns[3], a.columns[4] from `test` a join `test` b on a.columns[0]=b.columns[0] and a.columns[4]=b.columns[4] order by a.columns[0] limit 1000;
While the query was running, initiated a graceful shutdown from the command line on the foreman node. The query failed with the following error message:
Error: RESOURCE ERROR: Hash Join failed to open spill file: /tmp/drill/spill/248a054a-ee63-e795-a44e-d9205df8e9b8_HashJoin_3-2-0/spill7_outer
Fragment 3:0
Looks like the spill directory somehow gets deleted while the query is still running when graceful_shutdown is initiated.
            Reporter: Krystal
[jira] [Commented] (DRILL-6694) NPE in UnnestRecordBatch when query uses a column name not present in data
[ https://issues.apache.org/jira/browse/DRILL-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583126#comment-16583126 ]

ASF GitHub Bot commented on DRILL-6694:
---------------------------------------

sohami commented on a change in pull request #1434: DRILL-6694: NPE in UnnestRecordBatch when query uses a column name no…
URL: https://github.com/apache/drill/pull/1434#discussion_r210759060

File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java

@@ -62,6 +62,8 @@
   private int remainderIndex = 0;
   private int recordCount;
   private MaterializedField unnestFieldMetadata;
+  // Reference of TypedFieldId for Unnest column. It's always set in schemaChanged method and later used by others
+  private TypedFieldId unnestTypedFieldId;

Review comment: kept a reference to it here since it's costly to get it every time from the incoming batch based on the column name.

> NPE in UnnestRecordBatch when query uses a column name not present in data
> --------------------------------------------------------------------------
>
>                 Key: DRILL-6694
>                 URL: https://issues.apache.org/jira/browse/DRILL-6694
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.14.0
>            Reporter: Sorabh Hamirwasia
>            Assignee: Sorabh Hamirwasia
>            Priority: Major
>             Fix For: 1.15.0
>
>
> When the array column name doesn't exist in the underlying data and is used in a query with Unnest, there is an NPE. The reason is that Unnest tries to get the ValueVector of the unnest column from the incoming batch based on a TypedFieldId, which will be null in this case, hence the exception.
> {code:java}
> [Error Id: 6f8461ee-92c7-4865-b5e6-3e2f756391c4 on pssc-67.qa.lab:31010]
>   at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
> Caused by: java.lang.NullPointerException: null
>   at org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.schemaChanged(UnnestRecordBatch.java:422) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.innerNext(UnnestRecordBatch.java:208) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:64) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:142) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.physical.impl.join.LateralJoinBatch.prefetchFirstBatchFromBothSides(LateralJoinBatch.java:331) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at org.apache.drill.exec.physical.impl.join.LateralJoinBatch.buildSchema(LateralJoinBatch.java:356)
[jira] [Commented] (DRILL-6694) NPE in UnnestRecordBatch when query uses a column name not present in data
[ https://issues.apache.org/jira/browse/DRILL-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583124#comment-16583124 ]

ASF GitHub Bot commented on DRILL-6694:
---------------------------------------

sohami opened a new pull request #1434: DRILL-6694: NPE in UnnestRecordBatch when query uses a column name no…
URL: https://github.com/apache/drill/pull/1434

…t present in data
[jira] [Created] (DRILL-6694) NPE in UnnestRecordBatch when query uses a column name not present in data
Sorabh Hamirwasia created DRILL-6694:
-------------------------------------

             Summary: NPE in UnnestRecordBatch when query uses a column name not present in data
                 Key: DRILL-6694
                 URL: https://issues.apache.org/jira/browse/DRILL-6694
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators
    Affects Versions: 1.14.0
            Reporter: Sorabh Hamirwasia
            Assignee: Sorabh Hamirwasia
             Fix For: 1.15.0

When the array column name doesn't exist in the underlying data and is used in a query with Unnest, there is an NPE. The reason is that Unnest tries to get the ValueVector of the unnest column from the incoming batch based on a TypedFieldId, which will be null in this case, hence the exception.

{code:java}
[Error Id: 6f8461ee-92c7-4865-b5e6-3e2f756391c4 on pssc-67.qa.lab:31010]
  at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
Caused by: java.lang.NullPointerException: null
  at org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.schemaChanged(UnnestRecordBatch.java:422) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.innerNext(UnnestRecordBatch.java:208) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:64) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:142) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.join.LateralJoinBatch.prefetchFirstBatchFromBothSides(LateralJoinBatch.java:331) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.join.LateralJoinBatch.buildSchema(LateralJoinBatch.java:356) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema(StreamingAggBatch.java:158) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:294) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
  at
[jira] [Commented] (DRILL-6552) Drill Metadata management "Drill MetaStore"
[ https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583061#comment-16583061 ]

Parth Chandra commented on DRILL-6552:
--------------------------------------

Good point about #4: Improvements to Drill execution and planning to leverage (enhanced) metadata.
{quote}This project sounds pretty large. Given the modular structure you've outlined, the initial implementation might focus on the API and Drill internals changes to use the data. Create starter implementations for HMS, Drill's existing Parquet metadata, an easy-to-use file-based description for ad-hoc uses, and a system-table-based system when querying a JDBC data store.
{quote}
I would second this approach.

> Drill Metadata management "Drill MetaStore"
> -------------------------------------------
>
>                 Key: DRILL-6552
>                 URL: https://issues.apache.org/jira/browse/DRILL-6552
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Metadata
>    Affects Versions: 1.13.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>            Priority: Major
>             Fix For: 2.0.0
>
>
> It would be useful for Drill to have some sort of metastore which would enable Drill to remember previously defined schemata, so Drill doesn't have to do the same work over and over again.
> It would store schema and statistics, which would accelerate query validation, planning, and execution. It would also increase Drill's stability and help avoid different kinds of issues: "schema change" exceptions, "limit 0" optimization, and so on.
> One of the main candidates is the Hive Metastore. Starting from version 3.0, the Hive Metastore can be a separate service from the Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> An optional enhancement is storing Drill's profiles, UDFs, and plugin configs in some kind of metastore as well.
[jira] [Commented] (DRILL-6461) Add Basic Data Correctness Unit Tests
[ https://issues.apache.org/jira/browse/DRILL-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583004#comment-16583004 ]

ASF GitHub Bot commented on DRILL-6461:
---------------------------------------

ilooner commented on issue #1344: DRILL-6461: Added basic data correctness tests for hash agg, and improved operator unit testing framework.
URL: https://github.com/apache/drill/pull/1344#issuecomment-413667931

@sohami The PR is ready for another round of review. I removed the fix that allowed **copyEntry** to work with empty VarLength vectors, since value vectors should not be manipulated unless allocateNew is called. I fixed the failing test by calling allocateNew on all the vectors in the destination vector container.

> Add Basic Data Correctness Unit Tests
> -------------------------------------
>
>                 Key: DRILL-6461
>                 URL: https://issues.apache.org/jira/browse/DRILL-6461
>             Project: Apache Drill
>          Issue Type: Sub-task
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>
> There are no data correctness unit tests for HashAgg. We need to add some.
[jira] [Commented] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.
[ https://issues.apache.org/jira/browse/DRILL-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582913#comment-16582913 ] Boaz Ben-Zvi commented on DRILL-6566: - This query is doing 36 SUM() aggregations !! Hence leading to a very large batch of about 31 MB ( ~ 64K * 36 * ~ 13) . The memory available to the first phase Hash-Agg is only 27 MB -> OOM. Should the batch-sizing address this situation ? > Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more > nodes ran out of memory while executing the query. AGGR OOM at First Phase. > -- > > Key: DRILL-6566 > URL: https://issues.apache.org/jira/browse/DRILL-6566 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Timothy Farkas >Priority: Critical > Fix For: 1.15.0 > > > This is TPCDS Query 66. > Query: tpcds/tpcds_sf1/original/parquet/query66.sql > SELECT w_warehouse_name, > w_warehouse_sq_ft, > w_city, > w_county, > w_state, > w_country, > ship_carriers, > year1, > Sum(jan_sales) AS jan_sales, > Sum(feb_sales) AS feb_sales, > Sum(mar_sales) AS mar_sales, > Sum(apr_sales) AS apr_sales, > Sum(may_sales) AS may_sales, > Sum(jun_sales) AS jun_sales, > Sum(jul_sales) AS jul_sales, > Sum(aug_sales) AS aug_sales, > Sum(sep_sales) AS sep_sales, > Sum(oct_sales) AS oct_sales, > Sum(nov_sales) AS nov_sales, > Sum(dec_sales) AS dec_sales, > Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot, > Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot, > Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot, > Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot, > Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot, > Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot, > Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot, > Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot, > Sum(sep_sales / w_warehouse_sq_ft) AS 
sep_sales_per_sq_foot, > Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot, > Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot, > Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot, > Sum(jan_net) AS jan_net, > Sum(feb_net) AS feb_net, > Sum(mar_net) AS mar_net, > Sum(apr_net) AS apr_net, > Sum(may_net) AS may_net, > Sum(jun_net) AS jun_net, > Sum(jul_net) AS jul_net, > Sum(aug_net) AS aug_net, > Sum(sep_net) AS sep_net, > Sum(oct_net) AS oct_net, > Sum(nov_net) AS nov_net, > Sum(dec_net) AS dec_net > FROM (SELECT w_warehouse_name, > w_warehouse_sq_ft, > w_city, > w_county, > w_state, > w_country, > 'ZOUROS' > \|\| ',' > \|\| 'ZHOU' AS ship_carriers, > d_yearAS year1, > Sum(CASE > WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity > ELSE 0 > END) AS jan_sales, > Sum(CASE > WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity > ELSE 0 > END) AS feb_sales, > Sum(CASE > WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity > ELSE 0 > END) AS mar_sales, > Sum(CASE > WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity > ELSE 0 > END) AS apr_sales, > Sum(CASE > WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity > ELSE 0 > END) AS may_sales, > Sum(CASE > WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity > ELSE 0 > END) AS jun_sales, > Sum(CASE > WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity > ELSE 0 > END) AS jul_sales, > Sum(CASE > WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity > ELSE 0 > END) AS aug_sales, > Sum(CASE > WHEN d_moy = 9 THEN ws_ext_sales_price * ws_quantity > ELSE 0 > END) AS sep_sales, > Sum(CASE > WHEN d_moy = 10 THEN ws_ext_sales_price * ws_quantity > ELSE 0 > END) AS oct_sales, > Sum(CASE > WHEN d_moy = 11 THEN ws_ext_sales_price * ws_quantity > ELSE 0 > END) AS nov_sales, > Sum(CASE > WHEN d_moy = 12 THEN ws_ext_sales_price * ws_quantity > ELSE 0 > END) AS dec_sales, > Sum(CASE > WHEN d_moy = 1 THEN ws_net_paid_inc_ship * ws_quantity > ELSE 0 > END) AS jan_net, > Sum(CASE > WHEN d_moy = 2 THEN 
ws_net_paid_inc_ship * ws_quantity > ELSE 0 > END) AS feb_net, > Sum(CASE > WHEN d_moy = 3 THEN ws_net_paid_inc_ship * ws_quantity > ELSE 0 > END) AS mar_net, > Sum(CASE > WHEN
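The arithmetic in Boaz Ben-Zvi's comment above can be checked directly. A quick sanity check of the batch-size estimate follows; note the ~13 bytes-per-value width is taken from the comment itself and is an approximation, not a measured figure:

```java
public class BatchSizeEstimate {
    public static void main(String[] args) {
        long rows = 65536;         // ~64K rows in a full value-vector batch
        long sumColumns = 36;      // 36 SUM() aggregations in TPCDS query 66
        long bytesPerValue = 13;   // rough per-value width quoted in the comment

        long batchBytes = rows * sumColumns * bytesPerValue;
        long limitMb = 27;         // memory available to the first-phase Hash-Agg

        // ~30.7 MB estimated batch vs. a 27 MB budget -> OOM, as reported.
        System.out.printf("estimated batch: %.1f MB, limit: %d MB%n",
                batchBytes / 1_000_000.0, limitMb);
    }
}
```

The estimate lands close to the ~31 MB figure mentioned in the comment and above the 27 MB budget, which is consistent with the first-phase OOM.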
[jira] [Created] (DRILL-6693) When a query is started from Drill Web Console, the UI becomes inaccessible until the query is completed
Anton Gozhiy created DRILL-6693: --- Summary: When a query is started from Drill Web Console, the UI becomes inaccessible until the query is completed Key: DRILL-6693 URL: https://issues.apache.org/jira/browse/DRILL-6693 Project: Apache Drill Issue Type: Bug Affects Versions: 1.15.0 Reporter: Anton Gozhiy *Steps:* # From the Web UI, run the following query: {noformat} select * from ( select employee_id, full_name, first_name, last_name, position_id, position_title, store_id, department_id, birth_date, hire_date, salary, supervisor_id, education_level, marital_status, gender, management_role from cp.`employee.json` union select employee_id, full_name, first_name, last_name, position_id, position_title, store_id, department_id, birth_date, hire_date, salary, supervisor_id, education_level, marital_status, gender, management_role from cp.`employee.json` union select employee_id, full_name, first_name, last_name, position_id, position_title, store_id, department_id, birth_date, hire_date, salary, supervisor_id, education_level, marital_status, gender, management_role from cp.`employee.json`) where last_name = 'Blumberg' {noformat} # While the query is running, try to open the Profiles page (or any other page). If it completes too fast, add some more unions to the query above. *Expected result:* The Profiles page should open, with the running query listed. *Actual result:* The Web UI hangs until the query completes. *Note:* If the query is started from sqlline, everything works fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6692) Undeclared dependencies for Nullable and NotNull annotations
Vitalii Diravka created DRILL-6692: -- Summary: Undeclared dependencies for Nullable and NotNull annotations Key: DRILL-6692 URL: https://issues.apache.org/jira/browse/DRILL-6692 Project: Apache Drill Issue Type: Task Affects Versions: 1.14.0 Reporter: Vitalii Diravka Fix For: Future The {{@Nullable}} and {{@NotNull}} annotations are actively used in the project. They come to Drill from the transitive dependencies {{javax.validation}} (validation-api-1.1.0.Final.jar) and {{javax.annotation}} (jsr305-3.0.1.jar), but Drill has no direct dependencies on them. It is possible to add dependencies on these libraries, but possibly the better choice is to get rid of them and replace the {{@NotNull}} annotation with an {{Objects.requireNonNull()}} check at the beginning of the method (the issue will be with methods in interfaces). The right way to solve this could be raised on the Drill dev mailing list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
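As a sketch of the replacement proposed in the description above — dropping the {{@NotNull}} annotation in favor of a JDK-only runtime check — the class and method names below are invented for illustration and are not actual Drill code:

```java
import java.util.Objects;

public class OptionLookup {  // hypothetical class, for illustration only

    // Before: void setOption(@NotNull String name) — relies on a transitive
    // annotation jar. After: an explicit runtime check from the JDK instead.
    public void setOption(String name) {
        Objects.requireNonNull(name, "option name must not be null");
        // ... actual logic would go here ...
    }

    public static void main(String[] args) {
        OptionLookup lookup = new OptionLookup();
        lookup.setOption("planner.width.max_per_node");  // fine
        try {
            lookup.setOption(null);  // throws NullPointerException
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Unlike the annotation, the check fails fast at runtime; as the description notes, default methods in interfaces are where this substitution gets awkward.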
[jira] [Updated] (DRILL-6691) Unify checkstyle-config.xml files
[ https://issues.apache.org/jira/browse/DRILL-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6691: Fix Version/s: 1.15.0 > Unify checkstyle-config.xml files > - > > Key: DRILL-6691 > URL: https://issues.apache.org/jira/browse/DRILL-6691 > Project: Apache Drill > Issue Type: Task >Reporter: Volodymyr Vysotskyi >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.15.0 > > > Currently, the `drill-root` and `format-maprdb` modules contain their own > `checkstyle-config.xml` files. > They should be unified to apply the same checkstyle rules to all modules. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6691) Unify checkstyle-config.xml files
Volodymyr Vysotskyi created DRILL-6691: -- Summary: Unify checkstyle-config.xml files Key: DRILL-6691 URL: https://issues.apache.org/jira/browse/DRILL-6691 Project: Apache Drill Issue Type: Task Reporter: Volodymyr Vysotskyi Assignee: Timothy Farkas Currently, the `drill-root` and `format-maprdb` modules contain their own `checkstyle-config.xml` files. They should be unified to apply the same checkstyle rules to all modules. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin
[ https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6662: Labels: doc-impacting ready-to-commit (was: doc-impacting) > Access AWS access key ID and secret access key using Credential Provider API > for S3 storage plugin > -- > > Key: DRILL-6662 > URL: https://issues.apache.org/jira/browse/DRILL-6662 > Project: Apache Drill > Issue Type: Improvement >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.15.0 > > > Hadoop provides the [CredentialProvider > API|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html], > which allows passwords and other sensitive secrets to be stored in an > external provider rather than in configuration files in plaintext. > Currently the S3 storage plugin accesses passwords, namely > 'fs.s3a.access.key' and 'fs.s3a.secret.key', stored in clear text in the > Configuration, via the get() method. To give users the ability to remove clear-text > S3 passwords from configuration files, the Configuration.getPassword() > method should be used, given they configure the > 'hadoop.security.credential.provider.path' property, which points to a file > containing encrypted passwords, instead of configuring the two aforementioned > properties. > By using this approach, credential providers will be checked first and, if the > secret is not provided or providers are not configured, there will be a > fallback to secrets configured in clear text (unless > 'hadoop.security.credential.clear-text-fallback' is set to > "false"), thus making the new change backwards-compatible. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
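The lookup order the description relies on — credential providers first, then a clear-text fallback unless 'hadoop.security.credential.clear-text-fallback' is "false" — can be sketched as follows. This is a simplified stand-in for Hadoop's {{Configuration.getPassword()}}, using plain maps instead of the real Configuration and provider classes, purely to illustrate the fallback logic:

```java
import java.util.HashMap;
import java.util.Map;

public class CredentialFallbackSketch {

    // Stand-in for Configuration.getPassword(): consult the "provider" first,
    // then fall back to the clear-text value unless fallback is disabled.
    static String getPassword(Map<String, String> provider,
                              Map<String, String> clearTextConf,
                              String key) {
        String fromProvider = provider.get(key);
        if (fromProvider != null) {
            return fromProvider;
        }
        boolean fallbackAllowed = Boolean.parseBoolean(
            clearTextConf.getOrDefault(
                "hadoop.security.credential.clear-text-fallback", "true"));
        return fallbackAllowed ? clearTextConf.get(key) : null;
    }

    public static void main(String[] args) {
        Map<String, String> provider = new HashMap<>();
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.access.key", "clear-text-id");

        // No provider entry yet: falls back to the clear-text value.
        System.out.println(getPassword(provider, conf, "fs.s3a.access.key"));

        // Provider entry wins once present.
        provider.put("fs.s3a.access.key", "encrypted-store-id");
        System.out.println(getPassword(provider, conf, "fs.s3a.access.key"));
    }
}
```

Because the clear-text path is only consulted when the provider has no entry, existing core-site.xml setups keep working unchanged, which is the backwards-compatibility claim made above.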
[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin
[ https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582372#comment-16582372 ] ASF GitHub Bot commented on DRILL-6662: --- KazydubB commented on issue #1419: DRILL-6662: Access AWS access key ID and secret access key using Cred… URL: https://github.com/apache/drill/pull/1419#issuecomment-413512145 @arina-ielchiieva I have addressed review comments. Could you take a look, please? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin
[ https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582373#comment-16582373 ] ASF GitHub Bot commented on DRILL-6662: --- arina-ielchiieva commented on issue #1419: DRILL-6662: Access AWS access key ID and secret access key using Cred… URL: https://github.com/apache/drill/pull/1419#issuecomment-413512174 +1, LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin
[ https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582371#comment-16582371 ] ASF GitHub Bot commented on DRILL-6662: --- arina-ielchiieva commented on a change in pull request #1419: DRILL-6662: Access AWS access key ID and secret access key using Cred… URL: https://github.com/apache/drill/pull/1419#discussion_r210559817 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java ## @@ -104,6 +109,33 @@ public FileSystemPlugin(FileSystemConfig config, DrillbitContext context, String } } + private boolean isS3Connection(Configuration conf) { +URI uri = FileSystem.getDefaultUri(conf); +return uri.getScheme().equals("s3a"); + } + + /** + * Retrieve secret and access keys from configured (with + * {@link org.apache.hadoop.security.alias.CredentialProviderFactory#CREDENTIAL_PROVIDER_PATH} property) + * credential providers and set it into {@code conf}. If provider path is not configured or credential + * is absent in providers, it will conditionally fallback to configuration setting. The fallback will occur unless + * {@link org.apache.hadoop.security.alias.CredentialProvider#CLEAR_TEXT_FALLBACK} is set to {@code false}. + * + * @param conf {@code Configuration} which will be updated with credentials from provider + * @throws IOException thrown if a credential cannot be retrieved from provider + */ + private void handleS3Credentials(Configuration conf) throws IOException { +String[] credentialKeys = {"fs.s3a.secret.key", "fs.s3a.access.key"}; Review comment: In this case, please leave as is. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin
[ https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582369#comment-16582369 ] ASF GitHub Bot commented on DRILL-6662: --- KazydubB commented on a change in pull request #1419: DRILL-6662: Access AWS access key ID and secret access key using Cred… URL: https://github.com/apache/drill/pull/1419#discussion_r210559211 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java ## @@ -104,6 +109,33 @@ public FileSystemPlugin(FileSystemConfig config, DrillbitContext context, String } } + private boolean isS3Connection(Configuration conf) { +URI uri = FileSystem.getDefaultUri(conf); +return uri.getScheme().equals("s3a"); + } + + /** + * Retrieve secret and access keys from configured (with + * {@link org.apache.hadoop.security.alias.CredentialProviderFactory#CREDENTIAL_PROVIDER_PATH} property) + * credential providers and set it into {@code conf}. If provider path is not configured or credential + * is absent in providers, it will conditionally fallback to configuration setting. The fallback will occur unless + * {@link org.apache.hadoop.security.alias.CredentialProvider#CLEAR_TEXT_FALLBACK} is set to {@code false}. + * + * @param conf {@code Configuration} which will be updated with credentials from provider + * @throws IOException thrown if a credential cannot be retrieved from provider + */ + private void handleS3Credentials(Configuration conf) throws IOException { +String[] credentialKeys = {"fs.s3a.secret.key", "fs.s3a.access.key"}; Review comment: I am aware of the Constants, but artifact (hadoop-aws), containing this class is not among the module's dependencies (however it is present in distribution's (compile-scope) and drill-root's (test-scope) dependencies). Is there a need to add the dependency? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin
[ https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582314#comment-16582314 ] ASF GitHub Bot commented on DRILL-6662: --- arina-ielchiieva commented on a change in pull request #1419: DRILL-6662: Access AWS access key ID and secret access key using Cred… URL: https://github.com/apache/drill/pull/1419#discussion_r210540931 ## File path: distribution/src/resources/core-site-example.xml ## @@ -30,4 +30,14 @@ ENTER_YOUR_SECRETKEY + Review comment: Please comment out this section and add a comment explaining that the user should use one of those. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian JIRA (v7.6.3#76005)
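A sketch of what the commented-out section requested in the review of core-site-example.xml might look like: the property name is Hadoop's real credential-provider key, while the jceks path value is only a placeholder for illustration.

```xml
<!-- Use EITHER the clear-text fs.s3a.* keys above OR a credential provider,
     not both. Uncomment the property below to read the keys from an
     encrypted credential store instead of this file. -->
<!--
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://file/tmp/s3.jceks</value>
</property>
-->
```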
[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin
[ https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582313#comment-16582313 ] ASF GitHub Bot commented on DRILL-6662: --- arina-ielchiieva commented on a change in pull request #1419: DRILL-6662: Access AWS access key ID and secret access key using Cred… URL: https://github.com/apache/drill/pull/1419#discussion_r210541907 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java ## @@ -104,6 +109,33 @@ public FileSystemPlugin(FileSystemConfig config, DrillbitContext context, String } } + private boolean isS3Connection(Configuration conf) { +URI uri = FileSystem.getDefaultUri(conf); +return uri.getScheme().equals("s3a"); + } + + /** + * Retrieve secret and access keys from configured (with + * {@link org.apache.hadoop.security.alias.CredentialProviderFactory#CREDENTIAL_PROVIDER_PATH} property) + * credential providers and set it into {@code conf}. If provider path is not configured or credential + * is absent in providers, it will conditionally fallback to configuration setting. The fallback will occur unless + * {@link org.apache.hadoop.security.alias.CredentialProvider#CLEAR_TEXT_FALLBACK} is set to {@code false}. + * + * @param conf {@code Configuration} which will be updated with credentials from provider + * @throws IOException thrown if a credential cannot be retrieved from provider + */ + private void handleS3Credentials(Configuration conf) throws IOException { +String[] credentialKeys = {"fs.s3a.secret.key", "fs.s3a.access.key"}; Review comment: Consider using the org.apache.hadoop.fs.s3a.Constants class. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian JIRA (v7.6.3#76005)