[jira] [Commented] (DRILL-6562) Plugin Management improvements

2019-04-16 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819625#comment-16819625
 ] 

Bridget Bevens commented on DRILL-6562:
---

Hi [~vitalii],

If you use the Export All button to export all storage plugin configurations to 
HOCON format (.conf file), how do you import those configurations back into 
Drill? My guess is that you would name the file storage-plugins-override.conf 
and then copy it to the /conf directory, but I'm not sure. 
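
Something like the following sketch, perhaps (based on the HOCON examples in the 
Drill docs; the dfs plugin below is just a placeholder, not an actual export):

{noformat}
"storage": {
  dfs: {
    type: "file",
    connection: "file:///",
    enabled: true
  }
}
{noformat}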

Thanks,
Bridget

> Plugin Management improvements
> --
>
> Key: DRILL-6562
> URL: https://issues.apache.org/jira/browse/DRILL-6562
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP, Web Server
>Affects Versions: 1.14.0
>Reporter: Abhishek Girish
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
> Attachments: Export.png, ExportAll.png, Screenshot from 2019-03-21 
> 01-18-17.png, Screenshot from 2019-03-21 02-52-50.png, Storage.png, 
> UpdateExport.png, create.png, image-2018-07-23-02-55-02-024.png, 
> image-2018-10-22-20-20-24-658.png, image-2018-10-22-20-20-59-105.png
>
>
> Follow-up to DRILL-4580.
> Provide ability to export all storage plugin configurations at once, with a 
> new "Export All" option on the Storage page of the Drill web UI



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7062) Run-time row group pruning

2019-04-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819612#comment-16819612
 ] 

ASF GitHub Bot commented on DRILL-7062:
---

rhou1 commented on pull request #1738: DRILL-7062: Initial implementation of 
run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r276035958
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
 ##
 @@ -68,76 +84,144 @@ protected ScanBatch getBatch(ExecutorFragmentContext context, AbstractParquetRow
     List<RecordReader> readers = new LinkedList<>();
     List<Map<String, String>> implicitColumns = new ArrayList<>();
     Map<String, String> mapWithMaxColumns = new LinkedHashMap<>();
-    for (RowGroupReadEntry rowGroup : rowGroupScan.getRowGroupReadEntries()) {
-      /*
-      Here we could store a map from file names to footers, to prevent re-reading the footer for each row group in a file
-      TODO - to prevent reading the footer again in the parquet record reader (it is read earlier in the ParquetStorageEngine)
-      we should add more information to the RowGroupInfo that will be populated upon the first read to
-      provide the reader with all of the file meta-data it needs
-      These fields will be added to the constructor below
-      */
-      try {
-        Stopwatch timer = logger.isTraceEnabled() ? Stopwatch.createUnstarted() : null;
-        DrillFileSystem fs = fsManager.get(rowGroupScan.getFsConf(rowGroup), rowGroup.getPath());
-        ParquetReaderConfig readerConfig = rowGroupScan.getReaderConfig();
-        if (!footers.containsKey(rowGroup.getPath())) {
-          if (timer != null) {
-            timer.start();
-          }
+    ParquetReaderConfig readerConfig = rowGroupScan.getReaderConfig();
+    RowGroupReadEntry firstRowGroup = null; // to be scanned in case ALL row groups are pruned out
+    ParquetMetadata firstFooter = null;
+    long rowgroupsPruned = 0; // for stats
+    TupleSchema tupleSchema = rowGroupScan.getTupleSchema();
+
+    try {
+
+      LogicalExpression filterExpr = rowGroupScan.getFilter();
+      boolean doRuntimePruning = filterExpr != null && // was a filter given ? And it is not just a "TRUE" predicate
+        ! ((filterExpr instanceof ValueExpressions.BooleanExpression) && ((ValueExpressions.BooleanExpression) filterExpr).getBoolean() );
 
-          ParquetMetadata footer = readFooter(fs.getConf(), rowGroup.getPath(), readerConfig);
-          if (timer != null) {
-            long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
-            logger.trace("ParquetTrace,Read Footer,{},{},{},{},{},{},{}", "", rowGroup.getPath(), "", 0, 0, 0, timeToRead);
+      // Runtime pruning: Avoid recomputing metadata objects for each row-group in case they use the same file
+      // by keeping the following objects computed earlier (relies on same file being in consecutive rowgroups)
+      Path prevRowGroupPath = null;
+      Metadata_V4.ParquetTableMetadata_v4 tableMetadataV4 = null;
+      Metadata_V4.ParquetFileAndRowCountMetadata fileMetadataV4 = null;
+      FileSelection fileSelection = null;
+      FilterPredicate filterPredicate = null;
+      Set<SchemaPath> schemaPathsInExpr = null;
+      Set<SchemaPath> columnsInExpr = null;
+
+      // If pruning - Prepare the predicate and the columns before the FOR LOOP
+      if ( doRuntimePruning ) {
+        filterPredicate = AbstractGroupScanWithMetadata.getFilterPredicate(filterExpr, context,
+          (FunctionImplementationRegistry) context.getFunctionRegistry(), context.getOptions(), true,
+          true /* supports file implicit columns */,
+          tupleSchema);
+        // Extract only the relevant columns from the filter (sans implicit columns, if any)
+        schemaPathsInExpr = filterExpr.accept(new FilterEvaluatorUtils.FieldReferenceFinder(), null);
+        columnsInExpr = new HashSet<>();
+        String partitionColumnLabel = context.getOptions().getOption(ExecConstants.FILESYSTEM_PARTITION_COLUMN_LABEL).string_val;
+        for (SchemaPath path : schemaPathsInExpr) {
+          if (rowGroupScan.supportsFileImplicitColumns() &&
+            path.toString().matches(partitionColumnLabel+"\\d+")) {
+            continue;  // skip implicit columns like dir0, dir1
          }
-          footers.put(rowGroup.getPath(), footer);
-        }
-        ParquetMetadata footer = footers.get(rowGroup.getPath());
-
-        ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(footer,
-          rowGroupScan.getColumns(), readerConfig.autoCorrectCorruptedDates());
-        logger.debug("Contains corrupt dates: {}.", containsCorruptDates);
-
-        boolean useNewReader = context.getOptions().getBoolean(ExecConstants.PARQUET_NEW_RECORD_READER);
-        boolean containsComplexColumn = ParquetReaderUtility.containsComplexColumn(footer, 
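
For context, the TODO comment in the removed code describes a per-file footer cache:
consecutive row groups of the same file reuse one parsed footer instead of re-reading
it from disk. A self-contained sketch of that pattern (hypothetical names, not the
actual Drill code):

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.format.converter.ParquetMetadataConverter;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;

/**
 * Hypothetical sketch (not Drill code): cache parsed Parquet footers by file
 * path so that consecutive row groups of the same file reuse one footer
 * instead of re-reading it from disk.
 */
class FooterCache {
  private final Map<Path, ParquetMetadata> footers = new HashMap<>();

  ParquetMetadata footerFor(Configuration conf, Path path) throws IOException {
    ParquetMetadata footer = footers.get(path);
    if (footer == null) {
      // Reads only the file metadata (footer), not the data pages.
      footer = ParquetFileReader.readFooter(conf, path, ParquetMetadataConverter.NO_FILTER);
      footers.put(path, footer);
    }
    return footer;
  }
}
{code}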

[jira] [Commented] (DRILL-7062) Run-time row group pruning

2019-04-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819604#comment-16819604
 ] 

ASF GitHub Bot commented on DRILL-7062:
---

amansinha100 commented on pull request #1738: DRILL-7062: Initial 
implementation of run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r276033309
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
 ##
 @@ -68,76 +84,144 @@ protected ScanBatch getBatch(ExecutorFragmentContext context, AbstractParquetRow
     List<RecordReader> readers = new LinkedList<>();
     List<Map<String, String>> implicitColumns = new ArrayList<>();
     Map<String, String> mapWithMaxColumns = new LinkedHashMap<>();
-    for (RowGroupReadEntry rowGroup : rowGroupScan.getRowGroupReadEntries()) {
-      /*
-      Here we could store a map from file names to footers, to prevent re-reading the footer for each row group in a file
-      TODO - to prevent reading the footer again in the parquet record reader (it is read earlier in the ParquetStorageEngine)
-      we should add more information to the RowGroupInfo that will be populated upon the first read to
-      provide the reader with all of the file meta-data it needs
-      These fields will be added to the constructor below
-      */
-      try {
-        Stopwatch timer = logger.isTraceEnabled() ? Stopwatch.createUnstarted() : null;
-        DrillFileSystem fs = fsManager.get(rowGroupScan.getFsConf(rowGroup), rowGroup.getPath());
-        ParquetReaderConfig readerConfig = rowGroupScan.getReaderConfig();
-        if (!footers.containsKey(rowGroup.getPath())) {
-          if (timer != null) {
-            timer.start();
-          }
+    ParquetReaderConfig readerConfig = rowGroupScan.getReaderConfig();
+    RowGroupReadEntry firstRowGroup = null; // to be scanned in case ALL row groups are pruned out
+    ParquetMetadata firstFooter = null;
+    long rowgroupsPruned = 0; // for stats
+    TupleSchema tupleSchema = rowGroupScan.getTupleSchema();
+
+    try {
+
+      LogicalExpression filterExpr = rowGroupScan.getFilter();
+      boolean doRuntimePruning = filterExpr != null && // was a filter given ? And it is not just a "TRUE" predicate
+        ! ((filterExpr instanceof ValueExpressions.BooleanExpression) && ((ValueExpressions.BooleanExpression) filterExpr).getBoolean() );
 
-          ParquetMetadata footer = readFooter(fs.getConf(), rowGroup.getPath(), readerConfig);
-          if (timer != null) {
-            long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
-            logger.trace("ParquetTrace,Read Footer,{},{},{},{},{},{},{}", "", rowGroup.getPath(), "", 0, 0, 0, timeToRead);
+      // Runtime pruning: Avoid recomputing metadata objects for each row-group in case they use the same file
+      // by keeping the following objects computed earlier (relies on same file being in consecutive rowgroups)
+      Path prevRowGroupPath = null;
+      Metadata_V4.ParquetTableMetadata_v4 tableMetadataV4 = null;
+      Metadata_V4.ParquetFileAndRowCountMetadata fileMetadataV4 = null;
+      FileSelection fileSelection = null;
+      FilterPredicate filterPredicate = null;
+      Set<SchemaPath> schemaPathsInExpr = null;
+      Set<SchemaPath> columnsInExpr = null;
+
+      // If pruning - Prepare the predicate and the columns before the FOR LOOP
+      if ( doRuntimePruning ) {
+        filterPredicate = AbstractGroupScanWithMetadata.getFilterPredicate(filterExpr, context,
+          (FunctionImplementationRegistry) context.getFunctionRegistry(), context.getOptions(), true,
+          true /* supports file implicit columns */,
+          tupleSchema);
+        // Extract only the relevant columns from the filter (sans implicit columns, if any)
+        schemaPathsInExpr = filterExpr.accept(new FilterEvaluatorUtils.FieldReferenceFinder(), null);
+        columnsInExpr = new HashSet<>();
+        String partitionColumnLabel = context.getOptions().getOption(ExecConstants.FILESYSTEM_PARTITION_COLUMN_LABEL).string_val;
+        for (SchemaPath path : schemaPathsInExpr) {
+          if (rowGroupScan.supportsFileImplicitColumns() &&
+            path.toString().matches(partitionColumnLabel+"\\d+")) {
+            continue;  // skip implicit columns like dir0, dir1
          }
-          footers.put(rowGroup.getPath(), footer);
-        }
-        ParquetMetadata footer = footers.get(rowGroup.getPath());
-
-        ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(footer,
-          rowGroupScan.getColumns(), readerConfig.autoCorrectCorruptedDates());
-        logger.debug("Contains corrupt dates: {}.", containsCorruptDates);
-
-        boolean useNewReader = context.getOptions().getBoolean(ExecConstants.PARQUET_NEW_RECORD_READER);
-        boolean containsComplexColumn = ParquetReaderUtility.containsComplexColumn(footer, 

[jira] [Commented] (DRILL-7062) Run-time row group pruning

2019-04-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819606#comment-16819606
 ] 

ASF GitHub Bot commented on DRILL-7062:
---

amansinha100 commented on pull request #1738: DRILL-7062: Initial 
implementation of run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r276033774
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -43,6 +44,7 @@
 import org.apache.drill.exec.store.ColumnExplorer;
 import org.apache.drill.exec.store.dfs.FileSelection;
 import org.apache.drill.exec.store.parquet.FilterEvaluatorUtils;
+// import org.apache.drill.exec.store.parquet.ParquetGroupScan;
 
 Review comment:
   Remove. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Run-time row group pruning
> --
>
> Key: DRILL-7062
> URL: https://issues.apache.org/jira/browse/DRILL-7062
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.17.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7062) Run-time row group pruning

2019-04-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819605#comment-16819605
 ] 

ASF GitHub Bot commented on DRILL-7062:
---

amansinha100 commented on pull request #1738: DRILL-7062: Initial 
implementation of run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r276032653
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
 ##
 @@ -68,76 +84,144 @@ protected ScanBatch getBatch(ExecutorFragmentContext context, AbstractParquetRow
     List<RecordReader> readers = new LinkedList<>();
     List<Map<String, String>> implicitColumns = new ArrayList<>();
     Map<String, String> mapWithMaxColumns = new LinkedHashMap<>();
-    for (RowGroupReadEntry rowGroup : rowGroupScan.getRowGroupReadEntries()) {
-      /*
-      Here we could store a map from file names to footers, to prevent re-reading the footer for each row group in a file
-      TODO - to prevent reading the footer again in the parquet record reader (it is read earlier in the ParquetStorageEngine)
-      we should add more information to the RowGroupInfo that will be populated upon the first read to
-      provide the reader with all of the file meta-data it needs
-      These fields will be added to the constructor below
-      */
-      try {
-        Stopwatch timer = logger.isTraceEnabled() ? Stopwatch.createUnstarted() : null;
-        DrillFileSystem fs = fsManager.get(rowGroupScan.getFsConf(rowGroup), rowGroup.getPath());
-        ParquetReaderConfig readerConfig = rowGroupScan.getReaderConfig();
-        if (!footers.containsKey(rowGroup.getPath())) {
-          if (timer != null) {
-            timer.start();
-          }
+    ParquetReaderConfig readerConfig = rowGroupScan.getReaderConfig();
+    RowGroupReadEntry firstRowGroup = null; // to be scanned in case ALL row groups are pruned out
+    ParquetMetadata firstFooter = null;
+    long rowgroupsPruned = 0; // for stats
+    TupleSchema tupleSchema = rowGroupScan.getTupleSchema();
+
+    try {
+
+      LogicalExpression filterExpr = rowGroupScan.getFilter();
+      boolean doRuntimePruning = filterExpr != null && // was a filter given ? And it is not just a "TRUE" predicate
+        ! ((filterExpr instanceof ValueExpressions.BooleanExpression) && ((ValueExpressions.BooleanExpression) filterExpr).getBoolean() );
 
-          ParquetMetadata footer = readFooter(fs.getConf(), rowGroup.getPath(), readerConfig);
-          if (timer != null) {
-            long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
-            logger.trace("ParquetTrace,Read Footer,{},{},{},{},{},{},{}", "", rowGroup.getPath(), "", 0, 0, 0, timeToRead);
+      // Runtime pruning: Avoid recomputing metadata objects for each row-group in case they use the same file
+      // by keeping the following objects computed earlier (relies on same file being in consecutive rowgroups)
+      Path prevRowGroupPath = null;
+      Metadata_V4.ParquetTableMetadata_v4 tableMetadataV4 = null;
+      Metadata_V4.ParquetFileAndRowCountMetadata fileMetadataV4 = null;
+      FileSelection fileSelection = null;
+      FilterPredicate filterPredicate = null;
+      Set<SchemaPath> schemaPathsInExpr = null;
+      Set<SchemaPath> columnsInExpr = null;
+
+      // If pruning - Prepare the predicate and the columns before the FOR LOOP
+      if ( doRuntimePruning ) {
+        filterPredicate = AbstractGroupScanWithMetadata.getFilterPredicate(filterExpr, context,
+          (FunctionImplementationRegistry) context.getFunctionRegistry(), context.getOptions(), true,
+          true /* supports file implicit columns */,
+          tupleSchema);
+        // Extract only the relevant columns from the filter (sans implicit columns, if any)
+        schemaPathsInExpr = filterExpr.accept(new FilterEvaluatorUtils.FieldReferenceFinder(), null);
+        columnsInExpr = new HashSet<>();
+        String partitionColumnLabel = context.getOptions().getOption(ExecConstants.FILESYSTEM_PARTITION_COLUMN_LABEL).string_val;
+        for (SchemaPath path : schemaPathsInExpr) {
+          if (rowGroupScan.supportsFileImplicitColumns() &&
+            path.toString().matches(partitionColumnLabel+"\\d+")) {
+            continue;  // skip implicit columns like dir0, dir1
          }
-          footers.put(rowGroup.getPath(), footer);
-        }
-        ParquetMetadata footer = footers.get(rowGroup.getPath());
-
-        ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(footer,
-          rowGroupScan.getColumns(), readerConfig.autoCorrectCorruptedDates());
-        logger.debug("Contains corrupt dates: {}.", containsCorruptDates);
-
-        boolean useNewReader = context.getOptions().getBoolean(ExecConstants.PARQUET_NEW_RECORD_READER);
-        boolean containsComplexColumn = ParquetReaderUtility.containsComplexColumn(footer, 

[jira] [Commented] (DRILL-7171) Count(*) query on leaf level directory is not reading summary cache file.

2019-04-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819529#comment-16819529
 ] 

ASF GitHub Bot commented on DRILL-7171:
---

amansinha100 commented on issue #1748: DRILL-7171: Create metadata directories 
cache file in the leaf level directories to support ConvertCountToDirectScan 
optimization.
URL: https://github.com/apache/drill/pull/1748#issuecomment-483857402
 
 
   LGTM  +1.   Once @vdiravka is ok with the changes, you can mark this 
ready-to-commit. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Count(*) query on leaf level directory is not reading summary cache file.
> -
>
> Key: DRILL-7171
> URL: https://issues.apache.org/jira/browse/DRILL-7171
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
> Fix For: 1.17.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Since the leaf-level directory doesn't store the metadata directories file, when 
> the summary is read and the directories cache file is not present, the cache is 
> assumed to be possibly corrupt and reading of the summary cache file is skipped. 
> The metadata directories cache file should be created at the leaf level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-540) Allow querying hive views in Drill

2019-04-16 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819478#comment-16819478
 ] 

Bridget Bevens commented on DRILL-540:
--

Hi [~IhorHuzenko],
I've updated this page with the info: 
https://drill.apache.org/docs/querying-hive/ 
I've also updated this page to state that Hive views are supported as of Drill 
1.16: 
https://drill.apache.org/docs/querying-the-information-schema/

Please let me know if I need to change anything.

Thanks,
Bridget

> Allow querying hive views in Drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> Currently Hive views cannot be queried from Drill.
> This Jira aims to add support for Hive views in Drill.
> *Implementation details:*
>  # Drill persists its view metadata in files with the suffix .view.drill using 
> JSON format. For example: 
> {noformat}
> {
>  "name" : "view_from_calcite_1_4",
>  "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
>  "fields" : [ {
>  "name" : "*",
>  "type" : "ANY",
>  "isNullable" : true
>  } ],
>  "workspaceSchemaPath" : [ "dfs", "tmp" ]
> }
> {noformat}
> Later Drill parses the metadata and uses it to treat view names in SQL as a 
> subquery.
>       2. In Apache Hive, metadata about views is stored in a similar way to 
> tables. Below is an example from metastore.TBLS:
>  
> {noformat}
> TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID 
> |TBL_NAME  |TBL_TYPE  |VIEW_EXPANDED_TEXT |
> ---||--|-|--|--|--|--|--|---|
> 2  |1542111078  |1 |0|mapr  |0 |2 |cview  
>|VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |
> {noformat}
>       3. So in the Hive metastore, views are considered tables of a special type, 
> and the main benefit is that we also have the expanded SQL definition of views 
> (just like in .view.drill files). Also, reading of the metadata is already 
> implemented in Drill with the help of the thrift Metastore API.
>       4. To enable querying of Hive views we'll reuse the existing code for Drill 
> views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_* for 
> _*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ (_which 
> is actually the model for data persisted in .view.drill files_) and then, based on 
> this instance, return a new _*DrillViewTable*_. Using this approach Drill will 
> handle Hive views the same way as if they were initially defined in Drill and 
> persisted in a .view.drill file. 
>      5. For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ 
> we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
> functionality will be extracted and used for both table and view field 
> type conversions. 
> *Security implications*
> Consider a simple example where we have users, 
> {code:java}
> user0  user1 user2
>\ /
>   group12
> {code}
> and a sample db where object names contain the user or group who should access 
> them:
> {code:java}
> db_all
> tbl_user0
> vw_user0
> tbl_group12
> vw_group12
> {code}
> There are two Hive authorization modes supported by Drill - SQL Standard and 
> Storage Based authorization. For SQL Standard authorization, permissions 
> were granted using SQL: 
> {code:java}
> SET ROLE admin;
> GRANT SELECT ON db_all.tbl_user0 TO USER user0;
> GRANT SELECT ON db_all.vw_user0 TO USER user0;
> CREATE ROLE group12;
> GRANT ROLE group12 TO USER user1;
> GRANT ROLE group12 TO USER user2;
> GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
> GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
> {code}
> And for Storage Based authorization, permissions were granted using commands: 
> {code:java}
> hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
> hadoop fs -chown user1:group12 
> /user/hive/warehouse/db_all.db/tbl_group12{code}
>  Then the following table shows the results of queries for both authorization 
> models. 
>                           *SQL Standard     |            Storage Based Authorization*
> ||SQL||user0||user1||user2||   ||user0||user1||user2||
> |*Queries executed using Drill :*| | | | | | | |
> |SHOW TABLES IN hive.db_all;|   all|    all|   all| |Accessible tables + all 
> views|Accessible tables + all views|Accessible tables + all views|

[jira] [Updated] (DRILL-540) Allow querying hive views in Drill

2019-04-16 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-540:
-
Labels: doc-complete ready-to-commit  (was: doc-impacting ready-to-commit)

> Allow querying hive views in Drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
>
> Currently Hive views cannot be queried from Drill.
> This Jira aims to add support for Hive views in Drill.
> *Implementation details:*
>  # Drill persists its view metadata in files with the suffix .view.drill using 
> JSON format. For example: 
> {noformat}
> {
>  "name" : "view_from_calcite_1_4",
>  "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
>  "fields" : [ {
>  "name" : "*",
>  "type" : "ANY",
>  "isNullable" : true
>  } ],
>  "workspaceSchemaPath" : [ "dfs", "tmp" ]
> }
> {noformat}
> Later Drill parses the metadata and uses it to treat view names in SQL as a 
> subquery.
>       2. In Apache Hive, metadata about views is stored in a similar way to 
> tables. Below is an example from metastore.TBLS:
>  
> {noformat}
> TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID 
> |TBL_NAME  |TBL_TYPE  |VIEW_EXPANDED_TEXT |
> ---||--|-|--|--|--|--|--|---|
> 2  |1542111078  |1 |0|mapr  |0 |2 |cview  
>|VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |
> {noformat}
>       3. So in the Hive metastore, views are considered tables of a special type, 
> and the main benefit is that we also have the expanded SQL definition of views 
> (just like in .view.drill files). Also, reading of the metadata is already 
> implemented in Drill with the help of the thrift Metastore API.
>       4. To enable querying of Hive views we'll reuse the existing code for Drill 
> views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_* for 
> _*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ (_which 
> is actually the model for data persisted in .view.drill files_) and then, based on 
> this instance, return a new _*DrillViewTable*_. Using this approach Drill will 
> handle Hive views the same way as if they were initially defined in Drill and 
> persisted in a .view.drill file. 
>      5. For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ 
> we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
> functionality will be extracted and used for both table and view field 
> type conversions. 
> *Security implications*
> Consider a simple example where we have users, 
> {code:java}
> user0  user1 user2
>\ /
>   group12
> {code}
> and a sample db where object names contain the user or group who should access 
> them:
> {code:java}
> db_all
> tbl_user0
> vw_user0
> tbl_group12
> vw_group12
> {code}
> There are two Hive authorization modes supported by Drill - SQL Standard and 
> Storage Based authorization. For SQL Standard authorization, permissions 
> were granted using SQL: 
> {code:java}
> SET ROLE admin;
> GRANT SELECT ON db_all.tbl_user0 TO USER user0;
> GRANT SELECT ON db_all.vw_user0 TO USER user0;
> CREATE ROLE group12;
> GRANT ROLE group12 TO USER user1;
> GRANT ROLE group12 TO USER user2;
> GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
> GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
> {code}
> And for Storage Based authorization, permissions were granted using commands: 
> {code:java}
> hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
> hadoop fs -chown user1:group12 
> /user/hive/warehouse/db_all.db/tbl_group12{code}
>  Then the following table shows the results of queries for both authorization 
> models. 
>                           *SQL Standard     |            Storage Based Authorization*
> ||SQL||user0||user1||user2||   ||user0||user1||user2||
> |*Queries executed using Drill :*| | | | | | | |
> |SHOW TABLES IN hive.db_all;|   all|    all|   all| |Accessible tables + all 
> views|Accessible tables + all views|Accessible tables + all views|
> |SELECT * FROM hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |   (/)|   (x)|   (x)|
> |SELECT * FROM hive.db_all.vw_user0;|   (/)|   (x)|   (x)| |   (/)|   (x)|   (x)|
> 

[jira] [Updated] (DRILL-7177) Format Plugin for Excel Files

2019-04-16 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-7177:
--
Labels: doc-impacting  (was: )

> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds functionality that enables Drill to query 
> Microsoft Excel files. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7110) Skip writing profile when an ALTER SESSION is executed

2019-04-16 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819273#comment-16819273
 ] 

Kunal Khatua commented on DRILL-7110:
-

[~bbevens] LGTM. Thanks!

> Skip writing profile when an ALTER SESSION is executed
> --
>
> Key: DRILL-7110
> URL: https://issues.apache.org/jira/browse/DRILL-7110
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Monitoring
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
>
> Currently, any {{ALTER }} query will be logged. While this is useful, 
> it can potentially add up to a lot of profiles being written unnecessarily, 
> since those changes are also reflected in the queries that follow.
> This JIRA proposes an option to skip writing such profiles to the profile 
> store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7020) big varchar doesn't work with extractHeader=true

2019-04-16 Thread benj (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818954#comment-16818954
 ] 

benj commented on DRILL-7020:
-

Note that this problem of course exists +for every csvh+ file that contains at 
least one field with more than 65536 characters.
{code:java}
SELECT * FROM ...`example_file_with_large_field.csvh`
Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a column
{code}

The trick of using the "TABLE()" syntax to bypass this limitation is useful, but as 
already mentioned it forces the use of the "COLUMNS[0]" syntax instead of the real 
column names.

The error message is also a little confusing because it says "write" 
although the problem comes from reading a file.
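
For reference, the workaround spelled out (a sketch; the file name comes from the 
example above, and skipFirstLine is the text-format option mentioned in the issue):

{code:java}
SELECT COLUMNS[0] AS col1, COLUMNS[1] AS col2
FROM TABLE(tmp.`example_file_with_large_field.csvh`(type => 'text',
  fieldDelimiter => ',', extractHeader => false, skipFirstLine => true));
{code}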

> big varchar doesn't work with extractHeader=true
> 
>
> Key: DRILL-7020
> URL: https://issues.apache.org/jira/browse/DRILL-7020
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text  CSV
>Affects Versions: 1.15.0
>Reporter: benj
>Priority: Major
>
> with a TEST file of csv type like
> {code:java}
> col1,col2
> w,x
> ...y...,z
> {code}
> where ...y... is a string of > 65536 characters (let's say 66000, for example)
> SELECT with +*extractHeader=false*+ is OK
> {code:java}
> SELECT * FROM TABLE(tmp.`TEST`(type => 'text', fieldDelimiter => ',', 
> extractHeader => false));
>     col1  | col2
> +-+--
> | w       | x
> | ...y... | z
> {code}
> But SELECT with +*extractHeader=true*+ gives an error
> {code:java}
> SELECT * FROM TABLE(tmp.`TEST`(type => 'text', fieldDelimiter => ',', 
> extractHeader => true));
> Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a column
> columnIndex 1
> Limit 65536
> Fragment 0:0
> {code}
> Note that it is possible to use extractHeader=false with skipFirstLine=true, but 
> in this case it's not possible to automatically get the column names.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-7179) Compiling drill from source doesn't include all the jars in the distribution/target dir

2019-04-16 Thread Hefei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hefei Li resolved DRILL-7179.
-
   Resolution: Fixed
Fix Version/s: 1.15.0

The compilation environment settings problem has been resolved.

> Compiling drill from source doesn't include all the jars in the 
> distribution/target dir
> ---
>
> Key: DRILL-7179
> URL: https://issues.apache.org/jira/browse/DRILL-7179
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
> Environment: Building on Windows 10
>Reporter: Georgi Perpeliev
>Assignee: Hefei Li
>Priority: Minor
> Fix For: 1.15.0
>
> Attachments: checkou.png, cmpiled.png, verify_tarball_file.png
>
>
> Following the instructions on 
> [https://drill.apache.org/docs/compiling-drill-from-source/], we end up with 
> an incomplete tarball including only drill-shaded-guava-23.0.jar in the jars 
> subdirectory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7179) Compiling drill from source doesn't include all the jars in the distribution/target dir

2019-04-16 Thread Georgi Perpeliev (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818896#comment-16818896
 ] 

Georgi Perpeliev commented on DRILL-7179:
-

[~lhfei]

I managed to compile it successfully today after reverting maven's settings.xml 
to default (my company uses a custom one). Before it was compiling without 
errors or warnings, but without those jars.

So it seems it was a transient maven issue and this one can be closed. 

Thanks for your help.

I had checked out the tag drill-1.15.0 and compiled against it; it seems to 
point to the same commit as the branch.

> Compiling drill from source doesn't include all the jars in the 
> distribution/target dir
> ---
>
> Key: DRILL-7179
> URL: https://issues.apache.org/jira/browse/DRILL-7179
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
> Environment: Building on Windows 10
>Reporter: Georgi Perpeliev
>Assignee: Hefei Li
>Priority: Minor
> Attachments: checkou.png, cmpiled.png, verify_tarball_file.png
>
>
> Following the instructions on 
> [https://drill.apache.org/docs/compiling-drill-from-source/], we end up with 
> an incomplete tarball including only drill-shaded-guava-23.0.jar in the jars 
> subdirectory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6720) Unable to read more than 1000 records from DB2 table

2019-04-16 Thread kshitij (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818727#comment-16818727
 ] 

kshitij edited comment on DRILL-6720 at 4/16/19 7:27 AM:
-

I found the resolution for this issue. We need to append a parameter to the 
connection string in the Drill storage plugin:
jdbc:db2://server:port/db{color:red}:allowNextOnExhaustedResultSet=1;{color}

Reference : http://www-01.ibm.com/support/docview.wss?uid=swg21461670


was (Author: kshitij@infy):
I found the resolution for this issue. We need to append a parameter to the 
connection string in the Drill storage plugin:
jdbc:db2://server:port/db*:allowNextOnExhaustedResultSet=1;*

Reference : http://www-01.ibm.com/support/docview.wss?uid=swg21461670

> Unable to read more than 1000 records from DB2 table
> 
>
> Key: DRILL-6720
> URL: https://issues.apache.org/jira/browse/DRILL-6720
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC, Storage - JDBC
>Affects Versions: 1.9.0
>Reporter: kshitij
>Priority: Critical
> Fix For: 1.9.0
>
> Attachments: drill_DB2_query_error_log.txt, drill_db2_query_status.PNG
>
>
> We have created a storage plugin in Drill for a DB2 database, PFB the details:
> {
>  "type": "jdbc",
>  "driver": "com.ibm.db2.jcc.DB2Driver",
>  "url": "jdbc:db2://server:port/databasename",
>  "username": "user",
>  "password": "password",
>  "enabled": true
> }
> Version of DB2 is 10.1. We are using a type 4 JDBC driver (db2jcc4.jar).
> When we try to read the data in any of the tables, the query fails with 
> following error from drill:
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: 
> Failure while attempting to read from database. sql SELECT * FROM 
> schema.table plugin DB2_PLUGIN Fragment 0:0 [Error Id: 
> 1404544f-bb5e-439b-b1a8-679388bb344d on server:port]
> The error logs from drill have been attached.
> One interesting observation - when we add a LIMIT clause of <=1000 to the 
> query, the query works and returns the data. Anything more than 1000 in the LIMIT 
> clause throws back the same error as above.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6720) Unable to read more than 1000 records from DB2 table

2019-04-16 Thread kshitij (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kshitij closed DRILL-6720.
--
Resolution: Fixed

> Unable to read more than 1000 records from DB2 table
> 
>
> Key: DRILL-6720
> URL: https://issues.apache.org/jira/browse/DRILL-6720
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC, Storage - JDBC
>Affects Versions: 1.9.0
>Reporter: kshitij
>Priority: Critical
> Fix For: 1.9.0
>
> Attachments: drill_DB2_query_error_log.txt, drill_db2_query_status.PNG
>
>
> We have created a storage plugin in Drill for a DB2 database, PFB the details:
> {
>  "type": "jdbc",
>  "driver": "com.ibm.db2.jcc.DB2Driver",
>  "url": "jdbc:db2://server:port/databasename",
>  "username": "user",
>  "password": "password",
>  "enabled": true
> }
> Version of DB2 is 10.1. We are using a type 4 JDBC driver (db2jcc4.jar).
> When we try to read the data in any of the tables, the query fails with 
> following error from drill:
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: 
> Failure while attempting to read from database. sql SELECT * FROM 
> schema.table plugin DB2_PLUGIN Fragment 0:0 [Error Id: 
> 1404544f-bb5e-439b-b1a8-679388bb344d on server:port]
> The error logs from drill have been attached.
> One interesting observation - when we add a LIMIT clause of <=1000 to the 
> query, the query works and returns the data. Anything more than 1000 in the LIMIT 
> clause throws back the same error as above.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6720) Unable to read more than 1000 records from DB2 table

2019-04-16 Thread kshitij (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818727#comment-16818727
 ] 

kshitij commented on DRILL-6720:


I found the resolution for this issue. We need to append a parameter to the 
connection string in the Drill storage plugin:
jdbc:db2://server:port/db*:allowNextOnExhaustedResultSet=1;*

Reference : http://www-01.ibm.com/support/docview.wss?uid=swg21461670
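
Applied to the plugin configuration from the issue description, the amended config 
would look roughly like this (server, port, db, and credentials are placeholders):

{code:java}
{
 "type": "jdbc",
 "driver": "com.ibm.db2.jcc.DB2Driver",
 "url": "jdbc:db2://server:port/db:allowNextOnExhaustedResultSet=1;",
 "username": "user",
 "password": "password",
 "enabled": true
}
{code}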

> Unable to read more than 1000 records from DB2 table
> 
>
> Key: DRILL-6720
> URL: https://issues.apache.org/jira/browse/DRILL-6720
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC, Storage - JDBC
>Affects Versions: 1.9.0
>Reporter: kshitij
>Priority: Critical
> Fix For: 1.9.0
>
> Attachments: drill_DB2_query_error_log.txt, drill_db2_query_status.PNG
>
>
> We have created a storage plugin in Drill for a DB2 database, PFB the details:
> {
>  "type": "jdbc",
>  "driver": "com.ibm.db2.jcc.DB2Driver",
>  "url": "jdbc:db2://server:port/databasename",
>  "username": "user",
>  "password": "password",
>  "enabled": true
> }
> Version of DB2 is 10.1. We are using a type 4 JDBC driver (db2jcc4.jar).
> When we try to read the data in any of the tables, the query fails with 
> following error from drill:
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: 
> Failure while attempting to read from database. sql SELECT * FROM 
> schema.table plugin DB2_PLUGIN Fragment 0:0 [Error Id: 
> 1404544f-bb5e-439b-b1a8-679388bb344d on server:port]
> The error logs from drill have been attached.
> One interesting observation - when we add a LIMIT clause of <=1000 to the 
> query, the query works and returns the data. Anything more than 1000 in the LIMIT 
> clause throws back the same error as above.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs

2019-04-16 Thread Ayush Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818716#comment-16818716
 ] 

Ayush Sharma commented on DRILL-7162:
-

[~vitalii] I have created an Excel sheet for all the 3rd party jars and CVEs.

Attaching it for your reference; the jars are updated with the versions 
that address the CVEs (most of the critical CVEs are addressed).

Please try and see if this can be picked up in the 1.16 fix version, as it 
compromises security.

Please let me know if you need any more information related to the vulnerability 
reports. [^Jars.xlsx]

>  Apache Drill uses 3rd Party with Highest CVEs
> --
>
> Key: DRILL-7162
> URL: https://issues.apache.org/jira/browse/DRILL-7162
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0, 1.14.0, 1.15.0
>Reporter: Ayush Sharma
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: Jars.xlsx
>
>
> Apache Drill uses 3rd party libraries with almost 250 CVEs.
> Most of the CVEs are in the older version of Jetty (9.1.x), whereas the 
> current version of Jetty is 9.4.x.
> Also, many of the other libraries are at end-of-life versions and are not 
> patched even in the latest release.
> This creates a security issue when we use it in production.
> We are able to replace many older versions of libraries with the latest 
> versions with no CVEs; however, many of them are not replaceable as-is and 
> would require some changes in the source code.
> The Jetty version is of the highest priority and needs migration to 9.4.x 
> immediately.
>  
> Please look into this issue with immediate priority, as it compromises the 
> security of applications utilizing Apache Drill.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs

2019-04-16 Thread Ayush Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Sharma updated DRILL-7162:

Attachment: Jars.xlsx

>  Apache Drill uses 3rd Party with Highest CVEs
> --
>
> Key: DRILL-7162
> URL: https://issues.apache.org/jira/browse/DRILL-7162
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0, 1.14.0, 1.15.0
>Reporter: Ayush Sharma
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: Jars.xlsx
>
>
> Apache Drill uses 3rd party libraries with almost 250 CVEs.
> Most of the CVEs are in the older version of Jetty (9.1.x), whereas the 
> current version of Jetty is 9.4.x.
> Also, many of the other libraries are at end-of-life versions and are not 
> patched even in the latest release.
> This creates a security issue when we use it in production.
> We are able to replace many older versions of libraries with the latest 
> versions with no CVEs; however, many of them are not replaceable as-is and 
> would require some changes in the source code.
> The Jetty version is of the highest priority and needs migration to 9.4.x 
> immediately.
>  
> Please look into this issue with immediate priority, as it compromises the 
> security of applications utilizing Apache Drill.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)