[jira] [Created] (DRILL-7222) Visualize estimated and actual row counts for a query
Kunal Khatua created DRILL-7222:

Summary: Visualize estimated and actual row counts for a query
Key: DRILL-7222
URL: https://issues.apache.org/jira/browse/DRILL-7222
Project: Apache Drill
Issue Type: Improvement
Components: Web Server
Affects Versions: 1.16.0
Reporter: Kunal Khatua
Assignee: Kunal Khatua
Fix For: 1.17.0

With statistics in place, it would be useful to show the *estimated* row count alongside the *actual* row count in the query profile's operator overview. We can extract the estimate from the Physical Plan section of the profile.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7221) Exclude debug files generated by maven debug option from jar
Sorabh Hamirwasia created DRILL-7221:

Summary: Exclude debug files generated by maven debug option from jar
Key: DRILL-7221
URL: https://issues.apache.org/jira/browse/DRILL-7221
Project: Apache Drill
Issue Type: Sub-task
Components: Tools, Build & Test
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia
Fix For: 1.17.0

The automated release script was using the -X debug option in the release:prepare phase. This generated some debug files, which were then packaged in the jars because the exclude configuration of the maven-jar plugin did not cover their name pattern. It would be good to exclude them.

*Debug files which were included:*
*javac.sh*
*org.codehaus.plexus.compiler.javac.JavacCompiler1256088670033285178arguments*
*org.codehaus.plexus.compiler.javac.JavacCompiler1458111453480208588arguments*
*org.codehaus.plexus.compiler.javac.JavacCompiler2392560589194600493arguments*
*org.codehaus.plexus.compiler.javac.JavacCompiler4475905192586529595arguments*
*org.codehaus.plexus.compiler.javac.JavacCompiler4524532450095901144arguments*
*org.codehaus.plexus.compiler.javac.JavacCompiler4670895443631397937arguments*
*org.codehaus.plexus.compiler.javac.JavacCompiler5215058338087807885arguments*
*org.codehaus.plexus.compiler.javac.JavacCompiler7526103232425779297arguments*
[jira] [Created] (DRILL-7220) Create a release package in Drill repo with automated scripts and instructions
Sorabh Hamirwasia created DRILL-7220:

Summary: Create a release package in Drill repo with automated scripts and instructions
Key: DRILL-7220
URL: https://issues.apache.org/jira/browse/DRILL-7220
Project: Apache Drill
Issue Type: Task
Components: Tools, Build & Test
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia
Fix For: 1.17.0
[jira] [Commented] (DRILL-7192) Drill limits rows when autoLimit is disabled
[ https://issues.apache.org/jira/browse/DRILL-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827143#comment-16827143 ]

Kunal Khatua commented on DRILL-7192:

[~vvysotskyi] The source of this issue is that we are setting the parameter at a {{SESSION}} level, although the feature works at a {{QUERY}} level (i.e. on the {{Statement}} object).

Calling {{!set rowlimit 10}} will execute {{Statement.setMaxRows()}} automatically for each new Statement. However, {{!set rowlimit 0}} will *not* execute {{Statement.setMaxRows()}} automatically for each new Statement.

My guess is that since each query's Statement is a new object with a presumed default of 0 (on the client side), it does not issue the {{ALTER SESSION}} again behind the scenes. I'll verify this with custom code, but if that is the case, fixing SQLLine is not the solution. The only workaround would be to rely not on the {{SESSION}}-level value but on RunQuery.getAutolimitRowcount(), by ensuring the value is set in the {{DrillClient}}.

> Drill limits rows when autoLimit is disabled
>
> Key: DRILL-7192
> URL: https://issues.apache.org/jira/browse/DRILL-7192
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Volodymyr Vysotskyi
> Assignee: Kunal Khatua
> Priority: Major
> Fix For: 1.17.0
>
> autoLimit for JDBC and REST clients was implemented in DRILL-7048.
> *Steps to reproduce the issue:*
> 1. Check that autoLimit is disabled; if it is not, disable it and restart Drill.
> 2. Submit any query and verify that the row count is correct. For example,
> {code:sql}
> SELECT * FROM cp.`employee.json`;
> {code}
> returns 1,155 rows.
> 3. Enable autoLimit for the sqlLine client:
> {code:sql}
> !set rowLimit 10
> {code}
> 4. Submit the same query and verify that the result has 10 rows.
> 5. Disable autoLimit:
> {code:sql}
> !set rowLimit 0
> {code}
> 6. Submit the same query; this time, *it returns 10 rows instead of 1,155*.
> The correct row count is returned only after creating a new connection.
> The same issue is also observed with the SQuirreL SQL client, whereas with Postgres, for example, it works correctly.
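The guess in the comment above (a row limit of 0 is never forwarded, so the session keeps the previous limit) can be illustrated with a toy model. This is only a sketch under that assumption: `Session`, `Client`, `run_query` and `run_query_fixed` are hypothetical names invented for the illustration, not Drill, JDBC or SQLLine code.

```python
# Toy model of the suspected behaviour. All names are hypothetical and exist
# only for this illustration; none of this is Drill or SQLLine code.

class Session:
    """Server-side session that remembers the last row limit it was sent."""

    def __init__(self):
        self.auto_limit = 0  # 0 means "no limit"

    def execute(self, rows):
        return rows if self.auto_limit == 0 else rows[:self.auto_limit]


class Client:
    """Client that holds the row limit configured by the user."""

    def __init__(self, session):
        self.session = session
        self.row_limit = 0

    def run_query(self, rows):
        # Suspected bug: a limit of 0 looks like "nothing to send", so the
        # session silently keeps whatever limit it was given earlier.
        if self.row_limit != 0:
            self.session.auto_limit = self.row_limit
        return self.session.execute(rows)

    def run_query_fixed(self, rows):
        # Sketch of the proposed workaround: send the limit with every
        # query, including 0, instead of relying on session state.
        self.session.auto_limit = self.row_limit
        return self.session.execute(rows)


rows = list(range(1155))
client = Client(Session())

client.row_limit = 10
limited = len(client.run_query(rows))

client.row_limit = 0
stale = len(client.run_query(rows))        # still capped by the old session value
fixed = len(client.run_query_fixed(rows))  # limit of 0 is forwarded explicitly

print(limited, stale, fixed)
```

In this model the stale result matches step 6 of the reproduction (10 rows instead of 1,155), and the per-query variant restores the full result without opening a new connection.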
[jira] [Commented] (DRILL-7199) Optimize the time taken to populate column statistics for non-interesting columns
[ https://issues.apache.org/jira/browse/DRILL-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827131#comment-16827131 ]

ASF GitHub Bot commented on DRILL-7199:

dvjyothsna commented on issue #1771: DRILL-7199: Optimize population of metadata for non-interesting columns
URL: https://github.com/apache/drill/pull/1771#issuecomment-487132838

@amansinha100, @vvysotskyi Please review.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Optimize the time taken to populate column statistics for non-interesting columns
>
> Key: DRILL-7199
> URL: https://issues.apache.org/jira/browse/DRILL-7199
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Venkata Jyothsna Donapati
> Assignee: Venkata Jyothsna Donapati
> Priority: Minor
> Fix For: 1.17.0
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Currently, populating column statistics for non-interesting columns takes very long since they are populated for every row group. Since non-interesting column statistics are common to the whole table, they can be populated once and reused.
[jira] [Commented] (DRILL-7199) Optimize the time taken to populate column statistics for non-interesting columns
[ https://issues.apache.org/jira/browse/DRILL-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827130#comment-16827130 ]

ASF GitHub Bot commented on DRILL-7199:

dvjyothsna commented on pull request #1771: DRILL-7199: Optimize population of metadata for non-interesting columns
URL: https://github.com/apache/drill/pull/1771

Currently the non-interesting column metadata is populated for every type of metadata, including row group metadata. That is considerable overkill when there is a large number of row groups. With this PR, non-interesting column metadata is populated only once, when all the other types of metadata are populated in BaseParquetMetadataProvider.java. This optimization reduced the planning time from 17 sec to 5 sec with 35000 row groups.
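The idea behind the PR (compute table-wide metadata once and share it, instead of recomputing it per row group) can be sketched as follows. The function names and the placeholder metadata value are assumptions made for this illustration; they do not reflect the actual code in BaseParquetMetadataProvider.java.

```python
# Hypothetical sketch of "populate once and reuse". Names and the placeholder
# metadata value are illustrative only, not Drill's API.

def build_per_row_group(row_groups, compute_shared):
    """Before: recompute the table-wide metadata for every row group."""
    return [(rg, compute_shared()) for rg in row_groups]


def build_once(row_groups, compute_shared):
    """After: compute the table-wide metadata once and share the result."""
    shared = compute_shared()
    return [(rg, shared) for rg in row_groups]


def make_counting_compute():
    """Return a compute function plus a counter of how often it was called."""
    counter = {"calls": 0}

    def compute():
        counter["calls"] += 1
        return {"non_interesting_columns": "placeholder"}

    return compute, counter


row_groups = range(35000)

compute_before, before = make_counting_compute()
build_per_row_group(row_groups, compute_before)

compute_after, after = make_counting_compute()
shared_result = build_once(row_groups, compute_after)

print(before["calls"], after["calls"])
```

The sketch only counts invocations; the timings quoted in the PR (17 sec versus 5 sec) come from the PR description and are not reproduced here.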
[jira] [Updated] (DRILL-7199) Optimize the time taken to populate column statistics for non-interesting columns
[ https://issues.apache.org/jira/browse/DRILL-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkata Jyothsna Donapati updated DRILL-7199:

Description: Currently, populating column statistics for non-interesting columns takes very long since they are populated for every row group. Since non-interesting column statistics are common to the whole table, they can be populated once and reused.

(was: Currently, populating column statistics for non-existent columns takes very long since they are populated for every row group. Since non-existent column statistics are common to the whole table, they can be populated once and reused.)
[jira] [Commented] (DRILL-7050) RexNode convert exception in subquery
[ https://issues.apache.org/jira/browse/DRILL-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826862#comment-16826862 ]

ASF GitHub Bot commented on DRILL-7050:

arina-ielchiieva commented on issue #1770: DRILL-7050: RexNode convert exception in sub-query
URL: https://github.com/apache/drill/pull/1770#issuecomment-487010377

+1, thanks for making the changes.

> RexNode convert exception in subquery
>
> Key: DRILL-7050
> URL: https://issues.apache.org/jira/browse/DRILL-7050
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.14.0, 1.15.0
> Reporter: Oleg Zinoviev
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
> If the query contains a subquery whose filters reference the main query (a correlated subquery), an error occurs: *PLAN ERROR: Cannot convert RexNode to equivalent Drill expression. RexNode Class: org.apache.calcite.rex.RexCorrelVariable*
> Steps to reproduce:
> 1) Create source table (or view, doesn't matter)
> {code:sql}
> create table dfs.root.source as (
> select 1 as id union all select 2 as id
> )
> {code}
> 2) Execute query
> {code:sql}
> select t1.id,
> (select count(t2.id)
> from dfs.root.source t2 where t2.id = t1.id)
> from dfs.root.source t1
> {code}
> Reason:
> The method {code:java}org.apache.calcite.sql2rel.SqlToRelConverter.Blackboard.lookupExp{code} calls {code:java}RexBuilder.makeCorrel{code} in some cases.
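For reference, the reproduction query is a correlated scalar subquery: for each row of the outer table it counts the matching rows of the same table. A minimal plain-Python sketch of what the query should return once planning succeeds, using the two-row table from step 1 (the variable names are illustrative only):

```python
# Rows of dfs.root.source from the reproduction steps: id = 1 and id = 2.
source = [1, 2]

# For each outer row t1, evaluate the subquery
#   select count(t2.id) from source t2 where t2.id = t1.id
# by counting the inner rows whose id equals the outer id.
result = [(t1, sum(1 for t2 in source if t2 == t1)) for t1 in source]

print(result)
```

Each id occurs exactly once in the table, so every outer row is paired with a count of 1.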
[jira] [Updated] (DRILL-7050) RexNode convert exception in subquery
[ https://issues.apache.org/jira/browse/DRILL-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7050:

Labels: ready-to-commit (was: )
[jira] [Updated] (DRILL-7050) RexNode convert exception in subquery
[ https://issues.apache.org/jira/browse/DRILL-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7050:

Reviewer: Arina Ielchiieva (was: Volodymyr Vysotskyi)
[jira] [Commented] (DRILL-6974) SET option command
[ https://issues.apache.org/jira/browse/DRILL-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826860#comment-16826860 ]

ASF GitHub Bot commented on DRILL-6974:

vvysotskyi commented on issue #1763: DRILL-6974: Add possibility to view option value via SET command
URL: https://github.com/apache/drill/pull/1763#issuecomment-487010134

@dgrinchenko, please squash the commits and add the Jira number to the commit message.

> SET option command
>
> Key: DRILL-6974
> URL: https://issues.apache.org/jira/browse/DRILL-6974
> Project: Apache Drill
> Issue Type: Improvement
> Components: SQL Parser
> Affects Versions: 1.15.0
> Reporter: benj
> Assignee: Dmytriy Grinchenko
> Priority: Major
> Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
> It's currently possible to define options with the SQL command SET
> {code:sql}
> ALTER SESSION SET `drill.exec.hashagg.fallback.enabled` = true;
> {code}
> But it's not possible to simply view the current value of a single option with SHOW; we have to run a query like
> {code:sql}
> SELECT * FROM sys.options WHERE `name` = 'drill.exec.hashagg.fallback.enabled';
> {code}
> Why not allow a simple
> {code:sql}
> SET `drill.exec.hashagg.fallback.enabled`;
> {code}
[jira] [Updated] (DRILL-6974) SET option command
[ https://issues.apache.org/jira/browse/DRILL-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Volodymyr Vysotskyi updated DRILL-6974:

Labels: doc-impacting ready-to-commit (was: doc-impacting)
[jira] [Created] (DRILL-7219) Ignore hidden file problems
benj created DRILL-7219:

Summary: Ignore hidden file problems
Key: DRILL-7219
URL: https://issues.apache.org/jira/browse/DRILL-7219
Project: Apache Drill
Issue Type: Bug
Components: Storage - JSON, Storage - Parquet, Storage - Text & CSV
Affects Versions: 1.15.0
Reporter: benj

Drill seems to use different file filtering rules depending on the file type.

* *Parquet*: hidden files (starting with ".") are filtered out +whether+ we query the directory or its files with *
{code:sql}
/*
DirPqt
|--sub1.pqt
|--sub2.pqt
|--.sub3.pqt
*/
SELECT count(*) FROM (SELECT DISTINCT filename FROM `DirPqt`);
=> 2
SELECT count(*) FROM (SELECT DISTINCT filename FROM `DirPqt/*`);
=> 2
/* It's possible to request the hidden file */
SELECT count(*) FROM (SELECT DISTINCT filename FROM `DirPqt/.*`);
=> 1
/* But I don't know how to request visible and hidden files simultaneously (except with a union) */
{code}
* *CSV, JSON*: whether hidden files (starting with ".") are filtered out +depends+ on whether we query the directory or its files
{code:sql}
/*
DirCSVH
|--sub1.csvh
|--sub2.csvh
|--.sub3.csvh
*/
SELECT count(*) FROM (SELECT DISTINCT filename FROM `DirCSVH`);
=> 2
SELECT count(*) FROM (SELECT DISTINCT filename FROM `DirCSVH/*`);
=> 3
/* As for Parquet, it's possible to request the hidden file */
SELECT count(*) FROM (SELECT DISTINCT filename FROM `DirCSVH/.*`);
=> 1
/* It's also possible to request only the visible files */
SELECT count(*) FROM (SELECT DISTINCT filename FROM `DirCSVH/[^.]*`);
=> 2
/* But I don't know how to request visible and hidden files simultaneously (except with a union) */
{code}

Some issues deal with hidden files, for example DRILL-2424, but I did not find any description of this filtering in the documentation. I found that hidden files start with "." or "_", but maybe there are other cases?

It's a little strange that the filtering rules differ depending on the file type. It's also impractical that there is no simple way to say whether or not we want hidden files. For example with a:
{code:sql}
SELECT * FROM `MyDir/[.]?*`;
{code}
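The uniform rule the report asks for (one hidden-file test applied to every format, plus an explicit switch to include hidden files) could look roughly like the sketch below. It assumes, as the report does, that hidden files are those whose names start with "." or "_"; `is_hidden`, `list_files` and `include_hidden` are hypothetical names, not Drill functions or options.

```python
# Hypothetical sketch of a single, format-independent hidden-file rule.
# The "." and "_" prefixes are the ones named in the report; the function
# and parameter names are invented for this illustration.

HIDDEN_PREFIXES = (".", "_")


def is_hidden(name):
    """Treat a file as hidden when its name starts with "." or "_"."""
    return name.startswith(HIDDEN_PREFIXES)


def list_files(names, include_hidden=False):
    """Return the files to scan, applying the same rule to every format."""
    return [n for n in names if include_hidden or not is_hidden(n)]


parquet_dir = ["sub1.pqt", "sub2.pqt", ".sub3.pqt"]
csv_dir = ["sub1.csvh", "sub2.csvh", ".sub3.csvh"]

visible_parquet = list_files(parquet_dir)
visible_csv = list_files(csv_dir)
all_csv = list_files(csv_dir, include_hidden=True)

print(len(visible_parquet), len(visible_csv), len(all_csv))
```

With a rule like this, both directories from the report would yield two visible files by default and three when hidden files are explicitly requested, regardless of file type.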
[jira] [Assigned] (DRILL-7050) RexNode convert exception in subquery
[ https://issues.apache.org/jira/browse/DRILL-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Volodymyr Vysotskyi reassigned DRILL-7050:

Assignee: Volodymyr Vysotskyi (was: Arina Ielchiieva)
[jira] [Commented] (DRILL-7050) RexNode convert exception in subquery
[ https://issues.apache.org/jira/browse/DRILL-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826771#comment-16826771 ]

ASF GitHub Bot commented on DRILL-7050:

vvysotskyi commented on pull request #1770: DRILL-7050: RexNode convert exception in sub-query
URL: https://github.com/apache/drill/pull/1770

- Bumped up Calcite version to include a fix for CALCITE-2954;
- Added a call to validate query when `inferUnknownTypes()` is called to replace temporary table name in sub-queries from the project list;
- Added unit tests for both issues.

For problem description please see [DRILL-7050](https://issues.apache.org/jira/browse/DRILL-7050).