Re: Re: Is the Drill query execution processing model the same idea as the Spark whole-stage codegen improvement
In addition to directory and row group pruning, physical plan generation looks at data locality for every row group and schedules the scan for each row group on the node where its data is local. (Remote reads can kill performance like nothing else can.)

Short answer: query planning requires metadata, with schema, statistics, and locality being the most important pieces. Drill will try to infer schema where it is not available, but if the user can provide a schema, Drill will do a better job. The same goes for statistics: Drill can plan a query without stats, but if stats are available, the execution plan will be much more efficient.

So the job boils down to gathering metadata efficiently. As the metadata size increases, the task becomes similar to executing a query on a large data set. The metadata cache is a starting point for collecting metadata; essentially, it means we have pre-fetched metadata. However, with a large amount of metadata, say several GB, this approach becomes the bottleneck, and the solution is really to distribute the metadata collection as well. That's where the metastore comes in.

The next level of complexity will then shift to the planning process: a single thread processing a few GB of metadata could itself become the bottleneck. So the next step in Drill's evolution will be to partition and distribute the planning process itself. That will be fun :)

On Thu, Aug 9, 2018 at 10:51 AM, Paul Rogers wrote:

> Hi Alex,
>
> Perhaps Parth can jump in here as he has deeper knowledge of Parquet.
>
> My understanding is that the planning-time metadata is used for partition (directory) and row group pruning. By scanning the footer of each Parquet row group, Drill can determine whether that group contains data that the query must process. Only groups that could contain target data are included in the list of groups to be scanned by the query's scan operators.
> For example, suppose we do a query over a date range and a US state:
>
> SELECT saleDate, prodId, quantity FROM `sales` WHERE dir0 BETWEEN '2018-07-01' AND '2018-07-31' AND state = 'CA'
>
> Suppose sales is a directory that contains subdirectories partitioned by date, and that state is a dictionary-encoded field with metadata in the Parquet footer.
>
> In this case, Drill reads the directory structure to do the partition pruning at planning time. Only the matching directories are added to the list of files to be scanned at run time.
>
> Then, Drill scans the footers of each Parquet row group to find those with `state` fields that contain the value 'CA'. (I believe the footer scanning is done only for matching partitions, but I've not looked at that code recently.)
>
> Finally, Drill passes the matching set of files to the distribution planner, which assigns files to minor fragments distributed across the Drill cluster. The result attempts to maximize locality and minimize skew.
>
> Could this have been done differently? There is a trade-off. If the work is done in the planner, then it can be very expensive for large data sets. But, on the other hand, the resulting distribution plan has minimal skew: each file that is scheduled for scanning does, in fact, need to be scanned.
>
> Suppose that Drill pushed the partition and row group filtering to the scan operator. In this case, operators that received files outside of the target date range would do nothing, while those with files in the range would need to do further work, resulting in possible skew. (One could argue that, if files are randomly distributed across nodes, then, on average, each scan operator will see roughly the same number of included and excluded files, reducing skew. This would be an interesting experiment to try.)
>
> Another approach might be to do a two-step execution.
> First distribute the work of doing the partition pruning and row group footer checking. Gather the results. Then, finish planning the query and distribute the execution as is done today. Here, much new code would be needed to implement this new form of task, which is probably why it has not yet been done.
>
> Might be interesting for someone to prototype that to see if we get improvement in planning performance. My guess is that a prototype could be built that does the "pre-selection" work as a Drill query, just to see if distribution of the work helps. If that is a win, then perhaps a more robust solution could be developed from there.
>
> The final factor is Drill's crude-but-effective metadata caching mechanism. Drill reads footer and directory information for all Parquet files in a directory structure and stores it in a series of possibly-large JSON files, one in each directory. This mechanism has no good concurrency control. (This is a long-standing, hard-to-fix bug.) Centralizing access in the Foreman reduces the potential for concurrent writes on metadata refresh (but does not eliminate them
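The two planning-time pruning steps described in the quoted email can be sketched as a toy example. The classes and helper methods below are hypothetical stand-ins, not Drill's actual Parquet metadata code: directories are pruned by the dir0 date range first, then surviving row groups are pruned by footer-style min/max statistics on the `state` column.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of planning-time pruning (illustrative only, not Drill code).
public class PruningSketch {

  // Footer-style statistics for one row group: min/max of the `state` column.
  static class RowGroup {
    final String dir0;        // partition directory (a date)
    final String minState;
    final String maxState;
    RowGroup(String dir0, String minState, String maxState) {
      this.dir0 = dir0; this.minState = minState; this.maxState = maxState;
    }
  }

  // Step 1: partition (directory) pruning by the dir0 date range.
  // ISO dates compare correctly as strings.
  static boolean dirMatches(String dir0, String lo, String hi) {
    return dir0.compareTo(lo) >= 0 && dir0.compareTo(hi) <= 0;
  }

  // Step 2: row group pruning. A group may contain state = value only if
  // the value falls inside the group's [min, max] range.
  static boolean statsMatch(RowGroup g, String value) {
    return g.minState.compareTo(value) <= 0 && g.maxState.compareTo(value) >= 0;
  }

  static List<RowGroup> prune(List<RowGroup> groups, String lo, String hi, String state) {
    List<RowGroup> kept = new ArrayList<>();
    for (RowGroup g : groups) {
      if (dirMatches(g.dir0, lo, hi) && statsMatch(g, state)) {
        kept.add(g);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<RowGroup> groups = new ArrayList<>();
    groups.add(new RowGroup("2018-06-30", "AL", "WY")); // outside the date range
    groups.add(new RowGroup("2018-07-15", "FL", "NY")); // in range, cannot hold 'CA'
    groups.add(new RowGroup("2018-07-15", "AZ", "TX")); // in range, may hold 'CA'
    List<RowGroup> kept = prune(groups, "2018-07-01", "2018-07-31", "CA");
    System.out.println(kept.size()); // only the last group survives
  }
}
```

Only the surviving groups would be handed to the distribution planner; the stats test is conservative (it can keep a group that turns out to hold no 'CA' rows, but never drops one that does), which is exactly the property pruning needs.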
[GitHub] paul-rogers commented on issue #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader
paul-rogers commented on issue #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader URL: https://github.com/apache/drill/pull/1429#issuecomment-411933865 @ilooner, this work is related to your resource management efforts. Please review at your convenience. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] paul-rogers opened a new pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader
paul-rogers opened a new pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader URL: https://github.com/apache/drill/pull/1429

Previous commits provided the core "result set loader" (RSL) structure and support for the "mainstream" vector types, including structured types such as maps and lists. This PR adds the "obscure" (and partly implemented) types used for JSON: (non-repeated) list, repeated list and union.

The union type is complex: it is a bundle of vectors keyed by type, and can accept new types as a run proceeds. A (non-repeated) list is highly complex: it can act like a repeated list, but with the ability to specify a null state for each entry. The non-repeated list can also act like a union type. This dual/morphing nature of a list required some rather complex magic behind the scenes to support the simple JSON-like interface used by the row set and result set loader mechanisms.

This PR introduces the idea of a "variant" to model unions and non-repeated-lists-as-lists-of-unions. The name is taken from Microsoft Basic and simply means a tagged union. (Where "union" is taken from C.)

Changes include fixing a number of issues with the list vectors, adding support in the column accessors and metadata layers, and adding support for creating vectors from metadata and metadata from vectors. Unit tests demonstrate how to use the resulting behavior as well as verifying that the behavior is correct.

The focus of this PR is to enable union, list and repeated list support in the RSL and associated mechanisms. It is known that support for these vector types is incomplete: some operators fail when presented with such vectors. It is not the goal here to fix those issues; this is not a PR to fully support these types. Rather, the scope of this PR is limited to the RSL and associated classes. For more information, see [this wiki entry](https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades).
This PR completes the result set loader work. The next PR in this series will introduce revisions to the scan operator that allow readers to use the RSL. After that, there are revised implementations for the delimited text (e.g. CSV) and JSON readers.
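The "variant" idea in the PR description above can be reduced to a bare-bones tagged union. This is an illustrative sketch only; Drill's actual UnionVector machinery is far richer. The point is simply that each value carries a type tag, so a single column can hold a different type in each row, and readers must dispatch on the tag.

```java
// Bare-bones tagged union ("variant") sketch; hypothetical, not Drill's classes.
public class VariantSketch {
  enum Tag { INT, STRING, NULL }

  // A value plus the tag identifying its type.
  static class Variant {
    final Tag tag;
    final Object value;
    private Variant(Tag tag, Object value) { this.tag = tag; this.value = value; }
    static Variant ofInt(int v) { return new Variant(Tag.INT, v); }
    static Variant ofString(String v) { return new Variant(Tag.STRING, v); }
    static Variant ofNull() { return new Variant(Tag.NULL, null); }
  }

  // A consumer dispatches on the tag, as a reader of a union vector must.
  static String describe(Variant v) {
    switch (v.tag) {
      case INT:    return "int:" + v.value;
      case STRING: return "string:" + v.value;
      default:     return "null";
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(Variant.ofInt(42)));      // int:42
    System.out.println(describe(Variant.ofString("ca"))); // string:ca
    System.out.println(describe(Variant.ofNull()));       // null
  }
}
```

A (non-repeated) list in this model is then a list of such variants with a per-entry null state, which is what makes its dual repeated-list/union behavior possible.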
[GitHub] Ben-Zvi commented on issue #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin
Ben-Zvi commented on issue #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin URL: https://github.com/apache/drill/pull/1408#issuecomment-411933587 OK - tag it ready-to-commit ...
[GitHub] HanumathRao commented on issue #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
HanumathRao commented on issue #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception … URL: https://github.com/apache/drill/pull/1426#issuecomment-411931640 @vvysotskyi I have made the changes.
[jira] [Created] (DRILL-6678) Improve SelectionVectorRemover to pack output batch based on BatchSizing
Sorabh Hamirwasia created DRILL-6678: Summary: Improve SelectionVectorRemover to pack output batch based on BatchSizing Key: DRILL-6678 URL: https://issues.apache.org/jira/browse/DRILL-6678 Project: Apache Drill Issue Type: Improvement Components: Execution - Relational Operators Affects Versions: 1.14.0 Reporter: Sorabh Hamirwasia Assignee: Sorabh Hamirwasia

SelectionVectorRemover is in most cases downstream of a Filter, which reduces the number of records to be copied into the output container. In those cases, if SelectionVectorRemover can pack the output batch to maximum utilization, that will reduce the number of output batches it produces and help improve performance. During Lateral & Unnest performance evaluation we noticed a significant decrease in performance as the number of batches increases for the same number of rows (i.e. batches are not fully packed).

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
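Conceptually, a selection-vector remover copies only the rows named by the selection vector into a dense output batch; the JIRA above proposes packing that output to full batch size instead of emitting one sparse output per input. The sketch below is hypothetical illustration (a single int column standing in for a record batch), not Drill's SelectionVectorRemover.

```java
import java.util.Arrays;

// Conceptual sketch of selection-vector removal (not Drill's actual operator):
// 'data' is the full input batch; 'sv' holds the row indices the filter kept.
public class SvRemoverSketch {

  // Copy the selected rows into a dense output array, dropping the
  // indirection that the selection vector provided.
  static int[] removeSelectionVector(int[] data, int[] sv) {
    int[] out = new int[sv.length];
    for (int i = 0; i < sv.length; i++) {
      out[i] = data[sv[i]];
    }
    return out;
  }

  public static void main(String[] args) {
    int[] batch = {10, 20, 30, 40, 50};
    int[] sv = {1, 3};  // the filter kept only rows 1 and 3
    System.out.println(Arrays.toString(removeSelectionVector(batch, sv))); // [20, 40]
  }
}
```

The proposed improvement amounts to accumulating such dense outputs across several sparse inputs until the output batch reaches its target size, rather than emitting a small batch per input.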
[GitHub] ilooner commented on issue #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on issue #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#issuecomment-411910893 @vvysotskyi Made additional style changes.
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209091287 ## File path: exec/jdbc/src/test/java/org/apache/drill/jdbc/StatementTest.java ## @@ -83,7 +83,7 @@ public static void tearDownStatement() throws SQLException { public void testDefaultGetQueryTimeout() throws SQLException { try(Statement stmt = connection.createStatement()) { int timeoutValue = stmt.getQueryTimeout(); - assertEquals( 0 , timeoutValue ); + assertEquals( 0, timeoutValue ); Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209091229 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetSimpleTestFileGenerator.java ## @@ -375,7 +376,8 @@ public static void writeSimpleValues(SimpleGroupFactory sgf, ParquetWriter
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209091196 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetSimpleTestFileGenerator.java ## @@ -292,7 +292,8 @@ public static void writeComplexValues(GroupFactory gf, ParquetWriter comp .append("_INT_64", 0x7FFFL) .append("_UINT_64", 0xL) .append("_DECIMAL_decimal18", 0xL); - byte[] bytes = new byte[30]; Arrays.fill(bytes, (byte)1); + byte[] bytes = new byte[30]; + Arrays.fill(bytes, (byte)1); Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209091066 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/protocol/TestOperatorRecordBatch.java ## @@ -99,7 +99,10 @@ public BatchAccessor batchAccessor() { } @Override -public boolean buildSchema() { buildSchemaCalled = true; return ! schemaEOF; } +public boolean buildSchema() { + buildSchemaCalled = true; + return ! schemaEOF; Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209090720 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -101,7 +104,9 @@ public void oneBitOneExchangeTwoEntryRunLogical() throws Exception{ public void twoBitOneExchangeTwoEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); - try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Drillbit bit2 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ + try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209090647 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -82,16 +84,17 @@ public void oneBitOneExchangeTwoEntryRun() throws Exception{ public void oneBitOneExchangeTwoEntryRunLogical() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ -bit1.run(); -client.connect(); -List results = client.runQuery(QueryType.LOGICAL, Files.toString(DrillFileUtils.getResourceAsFile("/scan_screen_logical.json"), Charsets.UTF_8)); -int count = 0; -for(QueryDataBatch b : results){ -count += b.getHeader().getRowCount(); -b.release(); -} -assertEquals(100, count); +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209090761 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -101,7 +104,9 @@ public void oneBitOneExchangeTwoEntryRunLogical() throws Exception{ public void twoBitOneExchangeTwoEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); - try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Drillbit bit2 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ + try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); + Drillbit bit2 = new Drillbit(CONFIG, serviceSet); + DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator())){ Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209090671 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -82,16 +84,17 @@ public void oneBitOneExchangeTwoEntryRun() throws Exception{ public void oneBitOneExchangeTwoEntryRunLogical() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ -bit1.run(); -client.connect(); -List results = client.runQuery(QueryType.LOGICAL, Files.toString(DrillFileUtils.getResourceAsFile("/scan_screen_logical.json"), Charsets.UTF_8)); -int count = 0; -for(QueryDataBatch b : results){ -count += b.getHeader().getRowCount(); -b.release(); -} -assertEquals(100, count); +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); +DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator())){ Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209090584 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -63,7 +64,8 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{ public void oneBitOneExchangeTwoEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); +DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator())){ Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209090505 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -63,7 +64,8 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{ public void oneBitOneExchangeTwoEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209090387 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -43,7 +43,8 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); +DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator())){ Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209090432 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -43,7 +43,8 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209089770 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/ExpressionInterpreterTest.java ## @@ -227,7 +227,7 @@ private ValueVector evalExprWithInterpreter(String expression, RecordBatch batch } private void showValueVectorContent(ValueVector vw) { -for (int row = 0; row < vw.getAccessor().getValueCount(); row ++ ) { +for (int row = 0; row < vw.getAccessor().getValueCount(); row++ ) { Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209089299 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/explain/NumberingRelWriter.java ## @@ -84,7 +84,9 @@ protected void explain_( s.append(" "); if (id != null && id.opId == 0) { - for(int i =0; i < spacer.get(); i++){ s.append('-');} + for(int i = 0; i < spacer.get(); i++) { Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209089217 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java ## @@ -302,9 +302,12 @@ private LogicalExpression getDrillCastFunctionFromOptiq(RexCall call){ castType = Types.required(MinorType.VARCHAR).toBuilder().setPrecision(call.getType().getPrecision()).build(); break; - case "INTEGER": castType = Types.required(MinorType.INT); break; - case "FLOAT": castType = Types.required(MinorType.FLOAT4); break; - case "DOUBLE": castType = Types.required(MinorType.FLOAT8); break; + case "INTEGER": castType = Types.required(MinorType.INT); Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209088852 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java ## @@ -235,7 +235,7 @@ public static boolean isScalarSubquery(RelNode root) { if (currentrel instanceof DrillAggregateRel) { agg = (DrillAggregateRel)currentrel; } else if (currentrel instanceof RelSubset) { -currentrel = ((RelSubset) currentrel).getBest() ; +currentrel = ((RelSubset)currentrel).getBest(); Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209088799 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java ## @@ -88,6 +88,6 @@ public void setMaxAllocation(long maxAllocation) { public boolean isBufferedOperator(QueryContext queryContext) { // In case forced to use a single partition - do not consider this a buffered op (when memory is divided) return queryContext == null || - 1 < (int)queryContext.getOptions().getOption(ExecConstants.HASHJOIN_NUM_PARTITIONS_VALIDATOR) ; + 1 < (int)queryContext.getOptions().getOption(ExecConstants.HASHJOIN_NUM_PARTITIONS_VALIDATOR); Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209085677 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashAggregate.java ## @@ -104,6 +104,6 @@ public void setMaxAllocation(long maxAllocation) { public boolean isBufferedOperator(QueryContext queryContext) { // In case forced to use a single partition - do not consider this a buffered op (when memory is divided) return queryContext == null || - 1 < (int)queryContext.getOptions().getOption(ExecConstants.HASHAGG_NUM_PARTITIONS_VALIDATOR) ; + 1 < (int)queryContext.getOptions().getOption(ExecConstants.HASHAGG_NUM_PARTITIONS_VALIDATOR); Review comment: done
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209085506 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java ## @@ -713,7 +713,7 @@ public void eval() { if (length.value > 0) { charLen = Math.min((int)length.value, charCount); //left('abc', 5) -> 'abc' } else if (length.value < 0) { - charLen = Math.max(0, charCount + (int)length.value) ; // left('abc', -5) ==> '' + charLen = Math.max(0, charCount + (int)length.value); // left('abc', -5) ==> '' Review comment: done
[GitHub] kkhatua commented on issue #1279: DRILL-5735: Allow search/sort in the Options webUI
kkhatua commented on issue #1279: DRILL-5735: Allow search/sort in the Options webUI URL: https://github.com/apache/drill/pull/1279#issuecomment-411901380 @arina-ielchiieva Rebased this PR with additional changes on the latest master. Please review.
[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r209081893 ## File path: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBasePushFilterIntoScan.java ## @@ -125,7 +125,7 @@ protected void doPushFilterToScan(final RelOptRuleCall call, final FilterPrel fi final ScanPrel newScanPrel = ScanPrel.create(scan, filter.getTraitSet(), newGroupsScan, scan.getRowType()); // Depending on whether is a project in the middle, assign either scan or copy of project to childRel. -final RelNode childRel = project == null ? newScanPrel : project.copy(project.getTraitSet(), ImmutableList.of((RelNode)newScanPrel));; +final RelNode childRel = project == null ? newScanPrel : project.copy(project.getTraitSet(), ImmutableList.of((RelNode)newScanPrel)); Review comment: Removed the cast. IntelliJ said it was unnecessary.
[jira] [Resolved] (DRILL-6674) Minor fixes to avoid auto boxing cost in logging in LateralJoinBatch
[ https://issues.apache.org/jira/browse/DRILL-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia resolved DRILL-6674. -- Resolution: Fixed > Minor fixes to avoid auto boxing cost in logging in LateralJoinBatch > > > Key: DRILL-6674 > URL: https://issues.apache.org/jira/browse/DRILL-6674 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Sorabh Hamirwasia >Assignee: Sorabh Hamirwasia >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] sohami closed pull request #1427: DRILL-6674: Minor fixes to avoid auto boxing cost in logging in Later…
sohami closed pull request #1427: DRILL-6674: Minor fixes to avoid auto boxing cost in logging in Later… URL: https://github.com/apache/drill/pull/1427 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java index 18843b5b4b0..ff33e2f3c05 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java @@ -276,8 +276,9 @@ public int getRecordCount() { */ @Override public RecordBatch getIncoming() { -Preconditions.checkState (left != null, "Retuning null left batch. It's unexpected since right side will only be " + - "called iff there is any valid left batch"); +Preconditions.checkState (left != null, + "Retuning null left batch. 
It's unexpected since right side will only be called iff " + +"there is any valid left batch"); return left; } @@ -348,7 +349,8 @@ protected void buildSchema() throws SchemaChangeException { if (!prefetchFirstBatchFromBothSides()) { return; } -Preconditions.checkState(right.getRecordCount() == 0, "Unexpected non-empty first right batch received"); +Preconditions.checkState(right.getRecordCount() == 0, + "Unexpected non-empty first right batch received"); // Setup output container schema based on known left and right schema setupNewSchema(); @@ -728,7 +730,6 @@ private void finalizeOutputContainer() { // Set the record count in the container container.setRecordCount(outputIndex); -container.buildSchema(BatchSchema.SelectionVectorMode.NONE); batchMemoryManager.updateOutgoingStats(outputIndex); @@ -976,12 +977,19 @@ private void crossJoinAndOutputRecords() { int leftRowId = leftJoinIndex + 1; int numRowsCopied = 0; - Preconditions.checkState(currentRowId <= leftRecordCount || leftJoinIndex <= leftRecordCount, -"Either RowId in right batch is greater than total records in left batch or all rows in left batch " + - "is processed but there are still rows in right batch. Details[RightRowId: %s, LeftRecordCount: %s, " + - "LeftJoinIndex: %s, RightJoinIndex: %s]", currentRowId, leftRecordCount, leftJoinIndex, rightJoinIndex); + if (currentRowId > leftRecordCount || leftJoinIndex > leftRecordCount) { +// Not using Preconditions.checkState here since along with condition evaluation there will be cost of boxing +// the arguments. +throw new IllegalStateException(String.format("Either RowId in right batch is greater than total records in " + + "left batch or all rows in left batch is processed but there are still rows in right batch. 
" + + "Details[RightRowId: %s, LeftRecordCount: %s, LeftJoinIndex: %s, RightJoinIndex: %s]", + currentRowId, leftRecordCount, leftJoinIndex, rightJoinIndex)); + } - logger.trace("leftRowId and currentRowId are: {}, {}", leftRowId, currentRowId); + if (logger.isTraceEnabled()) { +// Inside the if condition to eliminate parameter boxing cost +logger.trace("leftRowId and currentRowId are: {}, {}", leftRowId, currentRowId); + } // If leftRowId matches the rowId in right row then emit left and right row. Increment outputIndex, rightJoinIndex // and numRowsCopied. Also set leftMatchFound to true to indicate when to increase leftJoinIndex. @@ -1089,10 +1097,14 @@ private void setupInputOutputVectors(RecordBatch batch, int startVectorIndex, in private void copyDataToOutputVectors(int fromRowIndex, int toRowIndex, Map mapping, int numRowsToCopy, boolean isRightBatch) { for (Map.Entry entry : mapping.entrySet()) { - logger.trace("Copying data from incoming batch vector to outgoing batch vector. [IncomingBatch: " + + + if (logger.isTraceEnabled()) { +// Inside the if condition to eliminate parameter boxing cost +logger.trace("Copying data from incoming batch vector to outgoing batch vector. [IncomingBatch: " + "(RowIndex: {}, ColumnName: {}), OutputBatch: (RowIndex: {}, ColumnName: {}) and Other: (TimeEachValue: {})]", -fromRowIndex, entry.getKey().getField().getName(), toRowIndex, entry.getValue().getField().getName(), -numRowsToCopy); + fromRowIndex, entry.getKey().getField().getName(), toRowIndex, entry.getValue().getField().getName(), + numRowsToCopy); + } // Copy data from input vector to output
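The pattern adopted in the diff above — guarding trace logging behind `isTraceEnabled()` and replacing `Preconditions.checkState` with an explicit `if` plus `throw` — can be sketched in isolation. The `Logger` interface below is a minimal stand-in for slf4j, and all names are illustrative, not from the Drill codebase:

```java
// Illustrative sketch only: how guarding a varargs logging call and building the
// precondition message lazily avoids autoboxing on the hot path. The Logger
// interface is a stand-in for slf4j; nothing here is Drill's actual code.
class BoxingDemo {
    interface Logger {
        boolean isTraceEnabled();
        void trace(String fmt, Object... args); // varargs boxes ints and allocates an array
    }

    static int traceCalls = 0;

    static final Logger LOGGER = new Logger() {
        @Override public boolean isTraceEnabled() { return false; } // trace off, as in production
        @Override public void trace(String fmt, Object... args) { traceCalls++; }
    };

    static void processRow(int rowId, int recordCount) {
        // Guarded trace: the int arguments are boxed only when tracing is enabled,
        // so with trace disabled this line costs a single boolean check.
        if (LOGGER.isTraceEnabled()) {
            LOGGER.trace("rowId={}, recordCount={}", rowId, recordCount);
        }

        // Explicit check instead of Preconditions.checkState(cond, fmt, args...):
        // the message (and the boxing of its arguments) is paid only on failure.
        if (rowId > recordCount) {
            throw new IllegalStateException(String.format(
                "rowId %d exceeds recordCount %d", rowId, recordCount));
        }
    }

    public static void main(String[] args) {
        processRow(5, 10);
        System.out.println("traceCalls=" + traceCalls); // guard kept the call out entirely
    }
}
```

The same trade-off applies to any varargs API called on a per-row path: the cost is not the condition itself but the boxing and array allocation the call site forces even when nothing is logged.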
[GitHub] ilooner commented on issue #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin
ilooner commented on issue #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin URL: https://github.com/apache/drill/pull/1408#issuecomment-411890416 @Ben-Zvi Thanks for the review and suggestions. The code is much cleaner now.
[GitHub] ilooner commented on a change in pull request #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin
ilooner commented on a change in pull request #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin URL: https://github.com/apache/drill/pull/1408#discussion_r209072214 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinMemoryCalculatorImpl.java ## @@ -365,17 +299,20 @@ protected void initialize(boolean autoTune, double loadFactor) { Preconditions.checkState(!firstInitialized); Preconditions.checkArgument(initialPartitions >= 1); + // If we had probe data before there should still be probe data now. + // If we didn't have probe data before we could get some new data now. + Preconditions.checkState(probeSizePredictor.hasData() && !probeEmpty || !probeSizePredictor.hasData()); Review comment: Done
[GitHub] ilooner commented on a change in pull request #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin
ilooner commented on a change in pull request #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin URL: https://github.com/apache/drill/pull/1408#discussion_r209066972 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/BatchSizePredictor.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.join; + +import org.apache.drill.exec.record.RecordBatch; + +/** + * This class predicts the sizes of batches given an input batch. + * + * Invariants + * + * The {@link BatchSizePredictor} assumes that a {@link RecordBatch} is in a state where it can return a valid record count. + * + */ +public interface BatchSizePredictor { + /** + * Gets the batchSize computed in the call to {@link #updateStats()}. Returns 0 if {@link #hasData()} is false. + * @return Gets the batchSize computed in the call to {@link #updateStats()}. Returns 0 if {@link #hasData()} is false. + * @throws IllegalStateException if {@link #updateStats()} was never called. + */ + long getBatchSize(); + + /** + * Gets the number of records computed in the call to {@link #updateStats()}. 
Returns 0 if {@link #hasData()} is false. + * @return Gets the number of records computed in the call to {@link #updateStats()}. Returns 0 if {@link #hasData()} is false. + * @throws IllegalStateException if {@link #updateStats()} was never called. + */ + int getNumRecords(); + + /** + * True if the input batch had records in the last call to {@link #updateStats()}. False otherwise. + * @return True if the input batch had records in the last call to {@link #updateStats()}. False otherwise. + */ + boolean hasData(); Review comment: Renamed to hadDataLastTime
[GitHub] ilooner commented on a change in pull request #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin
ilooner commented on a change in pull request #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin URL: https://github.com/apache/drill/pull/1408#discussion_r209066765 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinMemoryCalculatorImpl.java ## @@ -420,31 +357,32 @@ public long getMaxReservedMemory() { private void calculateMemoryUsage() { // Adjust based on number of records - maxBuildBatchSize = computeMaxBatchSizeNoHash(buildBatchSize, buildNumRecords, -maxBatchNumRecordsBuild, fragmentationFactor, safetyFactor); - maxProbeBatchSize = computeMaxBatchSizeNoHash(probeBatchSize, probeNumRecords, -maxBatchNumRecordsProbe, fragmentationFactor, safetyFactor); - - // Safety factor can be multiplied at the end since these batches are coming from exchange operators, so no excess value vector doubling - partitionBuildBatchSize = computeMaxBatchSize(buildBatchSize, -buildNumRecords, -recordsPerPartitionBatchBuild, -fragmentationFactor, -safetyFactor, -reserveHash); + maxBuildBatchSize = buildSizePredictor.predictBatchSize(maxBatchNumRecordsBuild, false); - // Safety factor can be multiplied at the end since these batches are coming from exchange operators, so no excess value vector doubling - partitionProbeBatchSize = computeMaxBatchSize( -probeBatchSize, -probeNumRecords, -recordsPerPartitionBatchProbe, -fragmentationFactor, -safetyFactor, -reserveHash); + if (probeSizePredictor.hasData()) { +// We have probe data and we can compute the max incoming size. +maxProbeBatchSize = probeSizePredictor.predictBatchSize(maxBatchNumRecordsProbe, false); + } else { +// We don't have probe data +if (probeEmpty) { + // We know the probe has no data, so we don't need to reserve any space for the incoming probe + maxProbeBatchSize = 0; +} else { + // The probe side may have data, so assume it is the max incoming batch size. 
This assumption + + can fail in some cases since the batch sizing project is incomplete. + maxProbeBatchSize = maxIncomingBatchSize; +} + } + + partitionBuildBatchSize = buildSizePredictor.predictBatchSize(recordsPerPartitionBatchBuild, reserveHash); + + if (probeSizePredictor.hasData()) { +partitionProbeBatchSize = probeSizePredictor.predictBatchSize(recordsPerPartitionBatchProbe, reserveHash); + } maxOutputBatchSize = (long) ((double)outputBatchSize * fragmentationFactor * safetyFactor); - long probeReservedMemory; + long probeReservedMemory = -1; Review comment: Removed -1
Re: [ANNOUNCE] Apache Drill Release 1.14.0
Thanks Vlad for the advice and the link. Looking back at prior releases, the 1.9.0 announcement (by Sudheesh) was also sent to annou...@apache.org, but then he got feedback like "What's this project about?" and "Why should I care?", so the following announcements avoided the Apache general list. The link does suggest "put 3-5 lines blurb for the project.", which could address such feedback. We should try and compose a good "blurb" first. Boaz On Thu, Aug 9, 2018 at 11:14 AM, Vlad Rozov wrote: > I'd recommend announcing the release on the ASF-wide mailing list ( > annou...@apache.org) as well as there may be other community members who > may get interested in the new functionality. Please see [1] for the > requirements. > > Thank you, > > Vlad > > [1] http://www.apache.org/legal/release-policy.html#release-announcements
Re: [ANNOUNCE] Apache Drill Release 1.14.0
I'd recommend announcing the release on the ASF-wide mailing list (annou...@apache.org) as well as there may be other community members who may get interested in the new functionality. Please see [1] for the requirements. Thank you, Vlad [1] http://www.apache.org/legal/release-policy.html#release-announcements On 2018/08/06 06:15:55, Charles Givre wrote: > Thanks Boaz and great work everyone! > > Sent from my iPhone > > > On Aug 5, 2018, at 21:52, Abhishek Girish wrote: > > > > Congratulations, everyone! And Boaz, thanks so much for coordinating the > > release. > > > > Folks, please try out 1.14 - it's our best release yet! > > > >> On Sat, Aug 4, 2018 at 11:35 PM Boaz Ben-Zvi wrote: > >> > >> On behalf of the Apache Drill community, I am happy to announce the > >> release of Apache Drill 1.14.0. > >> > >> For information about Apache Drill, and to get involved, visit the project > >> website [1]. > >> > >> This release of Drill provides the following many new features and > >> improvements: > >> > >> = > >> > >> - Ability to run Drill in a Docker container. (DRILL-6346) > >> > >> - Ability to export and save your storage plugin configurations to a JSON > >> file for reuse. (DRILL-4580) > >> > >> - Ability to manage storage plugin configurations in the Drill > >> configuration file, storage-plugins-override.conf. (DRILL-6494) > >> > >> - Functions that return data type information. (DRILL-6361) > >> > >> - The Drill kafka storage plugin supports filter pushdown for query > >> conditions on certain Kafka metadata fields in messages. (DRILL-5977) > >> > >> - Spill to disk for the Hash Join operator. (DRILL-6027) > >> > >> - The dfs storage plugin supports a Logfile plugin extension that enables > >> Drill to directly read and query log files of any format. (DRILL-6104) > >> > >> - Phonetic and string distance functions. 
(DRILL-6519) > >> > >> - The store.hive.conf.properties option enables you to specify Hive > >> properties at the session level using the SET command. (DRILL-6575) > >> > >> - Drill can directly manage the CPU resources through the Drill start-up > >> script, drill-env.sh; you no longer have to manually add the PID to the > >> cgroup.procs file each time a Drillbit restarts. (DRILL-143) > >> > >> - Drill can query the metadata in various image formats with the image > >> metadata format plugin. (DRILL-4364) > >> > >> - Enhanced decimal data type support. (DRILL-6094) > >> > >> - Option to push LIMIT(0) on top of SCAN. (DRILL-6574) > >> > >> - Parquet filter pushdown improvements. (DRILL-6174) > >> > >> - Drill can infer filter conditions for join queries and push the filter > >> conditions down to the data source. (DRILL-6173) > >> > >> - Drill uses a native reader to read Hive tables when you enable the > >> store.hive.optimize_scan_with_native_readers option. When enabled, Drill > >> reads data faster and applies filter pushdown optimizations. (DRILL-6331) > >> > >> - Early release of lateral join. (DRILL-5999) > >> > >> === > >> > >> For the full list please see the release notes at [2]. > >> > >> The binary and source artifacts are available here [3]. > >> > >> 1. https://drill.apache.org/ > >> 2. https://drill.apache.org/docs/apache-drill-1-14-0-release-notes/ > >> 3. https://drill.apache.org/download/ > >> > >> Thanks to everyone in the community who contributed to this release! > >> > >>Boaz > >> > >> > >> > >> >
Re: Re: Is Drill query execution processing model just the same idea with the Spark whole-stage codegen improvement
Hi Alex, Perhaps Parth can jump in here as he has deeper knowledge of Parquet. My understanding is that the planning-time metadata is used for partition (directory) and row group pruning. By scanning the footer of each Parquet row group, Drill can determine whether that group contains data that the query must process. Only groups that could contain target data are included in the list of groups to be scanned by the query's scan operators. For example, suppose we do a query over a date range and a US state: SELECT saleDate, prodId, quantity FROM `sales` WHERE dir0 BETWEEN '2018-07-01' AND '2018-07-31' AND state = 'CA' Suppose sales is a directory that contains subdirectories partitioned by date, and that state is a dictionary-encoded field with metadata in the Parquet footer. In this case, Drill reads the directory structure to do the partition pruning at planning time. Only the matching directories are added to the list of files to be scanned at run time. Then, Drill scans the footers of each Parquet row group to find those with `state` fields that contain the value of 'CA'. (I believe the footer scanning is done only for matching partitions, but I've not looked at that code recently.) Finally, Drill passes the matching set of files to the distribution planner, which assigns files to minor fragments distributed across the Drill cluster. The result attempts to maximize locality and minimize skew. Could this have been done differently? There is a trade-off. If the work is done in the planner, then it can be very expensive for large data sets. But, on the other hand, the resulting distribution plan has minimal skew: each file that is scheduled for scanning does, in fact, need to be scanned. Suppose that Drill pushed the partition and row group filtering to the scan operator. In this case, operators that received files outside of the target date range would do nothing, while those with files in the range would need to do further work, resulting in possible skew.
(One could argue that, if files are randomly distributed across nodes, then, on average, each scan operator will see roughly the same number of included and excluded files, reducing skew. This would be an interesting experiment to try.) Another approach might be to do a two-step execution. First distribute the work of doing the partition pruning and row group footer checking. Gather the results. Then, finish planning the query and distribute the execution as is done today. Here, much new code would be needed to implement this new form of task, which is probably why it has not yet been done. Might be interesting for someone to prototype that to see if we get improvement in planning performance. My guess is that a prototype could be built that does the "pre-selection" work as a Drill query, just to see if distribution of the work helps. If that is a win, then perhaps a more robust solution could be developed from there. The final factor is Drill's crude-but-effective metadata caching mechanism. Drill reads footer and directory information for all Parquet files in a directory structure and stores it in a series of possibly-large JSON files, one in each directory. This mechanism has no good concurrency control. (This is a long-standing, hard-to-fix bug.) Centralizing access in the Foreman reduces the potential for concurrent writes on metadata refresh (but does not eliminate them since the Foremen are themselves distributed.) Further, since the metadata is in one big file (in the root directory), and is in JSON (which does not support block-splitting), then it is actually not possible to distribute the planning-time selection work. I understand that the team is looking at alternative metadata implementations to avoid some of these existing issues. (There was some discussion on the dev list over the last few months.) In theory it would be possible to adopt a different metadata structure that would better support distributed selection planning.
Perhaps farm out work at each directory level: in each directory, farm out work for the subdirectories (if they match the partition selection) and files. Recursively apply this down the directory tree. But, again, Drill does not have a general purpose task distribution (map/reduce) mechanism; Drill only knows how to distribute queries. Store metadata in a format that is versioned (to avoid concurrent read/write access), and block-oriented (to allow splitting the work to read a large metadata file.) It is interesting to note that several systems (including NetFlix's MetaCat, [1]) also hit bottlenecks in metadata access when metadata is stored in a relational DB, as Hive does. Hive-oriented tools have the same kind of metadata bottleneck, but for different reasons: because of the huge load placed on the Hive metastore. So, there is an opportunity for innovation in this area by observing, as you did, that metadata itself is a big data problem and
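As a rough sketch of the planning-time pruning Paul describes (directory pruning on the `dir0` date range, then row-group pruning on footer min/max statistics), here is a minimal, self-contained illustration; the class and field names are hypothetical and deliberately simplified — Drill's real planner reads actual Parquet footers, not string pairs:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of two-phase pruning: partition (directory) pruning by the
// dir0 date range, then row group pruning by footer min/max statistics. Types and
// names are illustrative; Drill's planner reads real Parquet footers instead.
class PruningSketch {
    static final class RowGroupMeta {
        final String file;
        final String minState; // footer statistic: minimum value of `state`
        final String maxState; // footer statistic: maximum value of `state`
        RowGroupMeta(String file, String minState, String maxState) {
            this.file = file; this.minState = minState; this.maxState = maxState;
        }
    }

    // Partition pruning: keep only directories whose name falls inside the range.
    // ISO dates compare correctly as strings, which is what makes dir0 pruning cheap.
    static List<String> pruneDirs(List<String> dirs, String lo, String hi) {
        List<String> kept = new ArrayList<>();
        for (String d : dirs) {
            if (d.compareTo(lo) >= 0 && d.compareTo(hi) <= 0) {
                kept.add(d);
            }
        }
        return kept;
    }

    // Row group pruning: keep only groups whose [min, max] interval could contain
    // the filter value; everything else is excluded from the scan list.
    static List<String> pruneRowGroups(List<RowGroupMeta> groups, String value) {
        List<String> kept = new ArrayList<>();
        for (RowGroupMeta g : groups) {
            if (g.minState.compareTo(value) <= 0 && g.maxState.compareTo(value) >= 0) {
                kept.add(g.file);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(pruneDirs(
            Arrays.asList("2018-06-30", "2018-07-15", "2018-08-01"),
            "2018-07-01", "2018-07-31"));                       // only the July partition
        System.out.println(pruneRowGroups(Arrays.asList(
            new RowGroupMeta("a.parquet", "AK", "AZ"),
            new RowGroupMeta("b.parquet", "CA", "FL")), "CA")); // only the covering group
    }
}
```

Note that min/max pruning is conservative by design: a row group whose range covers 'CA' might still contain no 'CA' rows, but a pruned group is guaranteed to contain none, so correctness is preserved while the scan list shrinks.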
Re: Drill + Flink
From the integration between Drill and Flink point of view, such a possibility sounds feasible. There may be a few technical challenges to solve as the queriable state API is still evolving, AFAIK. For now, I'll be more interested in a use case for the integration between Drill and Flink or any other streaming platform that provides an ability to query its pipeline state (Apex has such ability as well, even though the API is not formalized). Thank you, Vlad On 2018/08/09 14:09:38, Charles Givre wrote: > Hi Vlad, > I was thinking of Flink’s queriable state. A few people were asking me > questions about Flink so I started doing some digging into its internals and > it seemed like an interesting possibility. > — C > > > On Aug 8, 2018, at 11:28, Vlad Rozov wrote: > > > > Do you refer to Flink's queriable state? If not, can you describe a use > > case? > > > > Thank you, > > > > Vlad > > > > On 2018/08/08 14:45:50, Charles Givre wrote: > >> Hello all, > >> Has there been any discussion of creating a storage plugin for Drill to be > >> able to query Apache Flink? > >> —C > >
[GitHub] vrozov commented on a change in pull request #1428: DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with earlier versions + minor fixes
vrozov commented on a change in pull request #1428: DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with earlier versions + minor fixes URL: https://github.com/apache/drill/pull/1428#discussion_r209014522 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java ## @@ -108,7 +108,7 @@ protected ScanBatch getBatch(ExecutorFragmentContext context, AbstractParquetRow if (!context.getOptions().getBoolean(ExecConstants.PARQUET_NEW_RECORD_READER) && !ParquetReaderUtility.containsComplexColumn(footer, rowGroupScan.getColumns())) { - logger.debug("Query {} qualifies for new Parquet reader", + logger.debug("Query {} qualifies for ParquetRecordReader", Review comment: IMO, it will be much better to log `PARQUET_NEW_RECORD_READER` option and whether or not there are complex columns prior to creating a reader and reader instance class after the reader is created: ``` log.debug("PARQUET_NEW_RECORD_READER is {}. Complex columns {}.", "enabled"/"disabled", "found"/"not found"); Reader reader; if (...) { reader = new ParquetRecordReader(); } else { reader = new DrillParquetReader(); } log.debug("Query {} uses {}"); readers.add(reader); ```
[GitHub] sohami commented on issue #1423: Updated URL for Apache Parquet logical types
sohami commented on issue #1423: Updated URL for Apache Parquet logical types URL: https://github.com/apache/drill/pull/1423#issuecomment-411812573 @davechallis - Thanks for your changes, since it's already merged by @bbevens on doc side I think we can close this PR now.
Re: Re: Is Drill query execution processing model just the same idea with the Spark whole-stage codegen improvement
Hi Paul and Drill developers, I am sorry for the slight off-topic, but I noticed that Drill's foreman collects metadata for all queried files in the PLANNING state (see e.g. the MetadataGatherer class), at least in the case of Parquet when using the dfs plugin. That costs a lot of time when the number of queried files is substantial, since the MetadataGatherer task is not distributed across cluster nodes. What is the reason behind this collection? It doesn't seem to match the schema-on-read philosophy, but then maybe it's just me or my setup; I am still very new to Drill. Also, I appreciate the advantage of metastore-free operation - in theory it makes Drill more reliable and less costly to run. At the same time there is the Drill Metadata store project, meaning that the evolution is actually shifting away from a metastore-free system. What are the reasons for that evolution? Or is it going to be an additional, optional feature? Thanks, Best Regards, Alex On Tue, Aug 7, 2018 at 10:25 PM, Paul Rogers wrote: > Hi Qiaoyi, > > In general, optimal performance occurs when a system knows the schema at > the start and can fully optimize based on that schema. Think C or C++ > compilers compared with Java or Python. > > On the other hand, the JVM HotSpot optimizer has shown that one can > achieve very good performance via incremental optimization, but at the cost > of extreme runtime complexity. The benefit is that Java is much more > flexible, machine independent, and simpler than C or C++ (at least for > non-system applications.) > > Python is the other extreme: it is so dynamic that the literature has > shown that it is very difficult to optimize Python at the compiler or > runtime level. (Though, there is some interesting research on this topic. > See [1], [2].) > > Drill is somewhere in the middle. Drill does not do code generation at the > start like Impala or Spark do. Nor is it fully interpreted. Rather, Drill > is roughly like Java: code generation is done at runtime based on the > observed data types. 
(The JVM does machine code generation based on > observed execution patterns.) The advantage is that Drill is able to > achieve its pretty-good performance without the cost of a metadata system > to provide schema at plan time. > > So, to get the absolute fastest performance (think Impala), you must pay > the cost of a metadata system for all queries. Drill gets nearly as good > performance without the complexity of the Hive metastore -- a pretty good > tradeoff. > > If I may ask, what is your area of interest? Are you looking to use Drill > for some project? Or, just interested in how Drill works? > > Thanks, > - Paul > > [1] Towards Practical Gradual Typing: https://blog.acolyer.org/2015/08/03/towards-practical-gradual-typing/ > [2] Is Sound Gradual Typing Dead? https://blog.acolyer.org/2016/02/05/is-sound-gradual-typing-dead/ > > > > > > > > On Sunday, August 5, 2018, 11:36:52 PM PDT, 丁乔毅(智乔) < > qiaoyi.din...@alibaba-inc.com> wrote: > > Thanks Paul, good to know the design principles of the Drill query > execution process model. > I am very new to Drill, please bear with me. > > One more question. > As you mentioned, schema-free processing is the key advantage over Spark; > is there any performance consideration behind this design besides the > techniques of dynamic codegen and vectorized computation? > > Regards, > Qiaoyi > > > -- > From: Paul Rogers > Sent: Saturday, August 4, 2018 02:27 > To: dev > Subject: Re: Is Drill query execution processing model just the same idea with > the Spark whole-stage codegen improvement > > Hi Qiaoyi, > As you noted, Drill and Spark have similar models -- but with important > differences. > Drill is schema-on-read (also called "schema less"). In particular, this > means that Drill does not know the schema of the data until the first row > (actually "record batch") arrives at each operator. 
Once Drill sees that > first batch, it has a data schema, and can generate the corresponding code; > but only for that one operator. > The above process repeats up the fragment ("fragment" is Drill's term for > a Spark stage.) > I believe that Spark requires (or at least allows) the user to define a > schema up front. This is particularly true for the more modern data frame > APIs. > Do you think the Spark improvement would apply to Drill's case of > determining the schema operator-by-operator up the DAG? > Thanks, > - Paul > > > > On Friday, August 3, 2018, 8:57:29 AM PDT, 丁乔毅(智乔) < > qiaoyi.din...@alibaba-inc.com> wrote: > > > Hi, all. > > I'm very new to Apache Drill. > > I'm quite interested in the implementation of Drill's query execution. > After a little bit of source-code reading, I found it is built on a > processing model in a data-centric, push-based style, which is > very similar to the idea behind the Spark whole-stage codegen > improvement (JIRA ticket https://issues.apache.org/jira/browse/SPARK-12795) > > And I
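Paul's description of Drill's runtime specialization (generate an operator's code only after the first record batch reveals the schema) can be sketched in miniature. This is a conceptual illustration, not Drill's actual API: `Batch`, `process`, and the two specialized paths are invented for the example; Drill really generates and compiles Java source per operator at runtime, which is approximated here by selecting a lambda.

```java
import java.util.List;
import java.util.function.Function;

// Conceptual sketch of schema-on-read code generation: the operator defers
// choosing its inner loop until the first record batch arrives, then
// specializes itself for the observed value type (as Drill does per operator).
public class SchemaOnReadSketch {
  // Stand-in for a Drill record batch: just a list of column values.
  static class Batch {
    final List<Object> values;
    Batch(List<Object> values) { this.values = values; }
  }

  // The "generated" code path, chosen once per operator from the first batch.
  private Function<Object, Object> compiled;

  Object process(Batch first, Object value) {
    if (compiled == null) {
      // First batch: inspect the observed type and specialize the operator.
      Object sample = first.values.get(0);
      if (sample instanceof Integer) {
        compiled = v -> (Integer) v * 2;            // specialized numeric path
      } else {
        compiled = v -> v.toString().toUpperCase(); // generic string path
      }
    }
    return compiled.apply(value);
  }

  public static void main(String[] args) {
    SchemaOnReadSketch op = new SchemaOnReadSketch();
    // First batch observed as integers, so the operator takes the numeric path.
    System.out.println(op.process(new Batch(List.of(1, 2, 3)), 21));
  }
}
```

The point of the sketch is the timing: nothing is specialized at plan time, and each operator repeats this decision independently as batches flow up the fragment, which is why Drill needs no metastore-provided schema to get typed inner loops.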
Re: Drill + Flink
Hi Vlad, I was thinking of Flink’s queriable state. A few people were asking me questions about Flink so I started doing some digging into its internals and it seemed like an interesting possibility. — C > On Aug 8, 2018, at 11:28, Vlad Rozov wrote: > > Do you refer to Flink's queriable state? If not, can you describe a use case? > > Thank you, > > Vlad > > On 2018/08/08 14:45:50, Charles Givre wrote: >> Hello all, >> Has there been any discussion of creating a storage plugin for Drill to be >> able to query Apache Flink? >> —C
[GitHub] okalinin commented on issue #1428: DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with earlier versions
okalinin commented on issue #1428: DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with earlier versions URL: https://github.com/apache/drill/pull/1428#issuecomment-411726502 I will push another ParquetRecordReader fix for logical type handling on Parquet files with dictionary encoding disabled. Please do not review PR yet.
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208870590 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -82,16 +84,17 @@ public void oneBitOneExchangeTwoEntryRun() throws Exception{ public void oneBitOneExchangeTwoEntryRunLogical() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ -bit1.run(); -client.connect(); -List results = client.runQuery(QueryType.LOGICAL, Files.toString(DrillFileUtils.getResourceAsFile("/scan_screen_logical.json"), Charsets.UTF_8)); -int count = 0; -for(QueryDataBatch b : results){ -count += b.getHeader().getRowCount(); -b.release(); -} -assertEquals(100, count); +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Review comment: `try(` -> `try (`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208866703 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashAggregate.java ## @@ -104,6 +104,6 @@ public void setMaxAllocation(long maxAllocation) { public boolean isBufferedOperator(QueryContext queryContext) { // In case forced to use a single partition - do not consider this a buffered op (when memory is divided) return queryContext == null || - 1 < (int)queryContext.getOptions().getOption(ExecConstants.HASHAGG_NUM_PARTITIONS_VALIDATOR) ; + 1 < (int)queryContext.getOptions().getOption(ExecConstants.HASHAGG_NUM_PARTITIONS_VALIDATOR); Review comment: `(int)queryContext` -> `(int) queryContext`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208870154 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -43,7 +43,8 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); +DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator())){ Review comment: `){` -> `) {`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208871013 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -101,7 +104,9 @@ public void oneBitOneExchangeTwoEntryRunLogical() throws Exception{ public void twoBitOneExchangeTwoEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); - try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Drillbit bit2 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ + try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Review comment: `try(` -> `try (`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208870633 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -82,16 +84,17 @@ public void oneBitOneExchangeTwoEntryRun() throws Exception{ public void oneBitOneExchangeTwoEntryRunLogical() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ -bit1.run(); -client.connect(); -List results = client.runQuery(QueryType.LOGICAL, Files.toString(DrillFileUtils.getResourceAsFile("/scan_screen_logical.json"), Charsets.UTF_8)); -int count = 0; -for(QueryDataBatch b : results){ -count += b.getHeader().getRowCount(); -b.release(); -} -assertEquals(100, count); +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); +DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator())){ Review comment: `){` -> `) {`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208870431 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -43,7 +43,8 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Review comment: `try(` -> `try (`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208871104 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -101,7 +104,9 @@ public void oneBitOneExchangeTwoEntryRunLogical() throws Exception{ public void twoBitOneExchangeTwoEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); - try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Drillbit bit2 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ + try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); + Drillbit bit2 = new Drillbit(CONFIG, serviceSet); + DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator())){ Review comment: `){` -> `) {`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208870019 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/ExpressionInterpreterTest.java ## @@ -227,7 +227,7 @@ private ValueVector evalExprWithInterpreter(String expression, RecordBatch batch } private void showValueVectorContent(ValueVector vw) { -for (int row = 0; row < vw.getAccessor().getValueCount(); row ++ ) { +for (int row = 0; row < vw.getAccessor().getValueCount(); row++ ) { Review comment: Please remove space: `row++ )` -> `row++)`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208869017 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/explain/NumberingRelWriter.java ## @@ -84,7 +84,9 @@ protected void explain_( s.append(" "); if (id != null && id.opId == 0) { - for(int i =0; i < spacer.get(); i++){ s.append('-');} + for(int i = 0; i < spacer.get(); i++) { Review comment: `for(int i` -> `for (int i`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208870496 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -63,7 +64,8 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{ public void oneBitOneExchangeTwoEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Review comment: `try(` -> `try (`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208871299 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/protocol/TestOperatorRecordBatch.java ## @@ -99,7 +99,10 @@ public BatchAccessor batchAccessor() { } @Override -public boolean buildSchema() { buildSchemaCalled = true; return ! schemaEOF; } +public boolean buildSchema() { + buildSchemaCalled = true; + return ! schemaEOF; Review comment: `! schemaEOF` -> `!schemaEOF`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208865479 ## File path: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBasePushFilterIntoScan.java ## @@ -125,7 +125,7 @@ protected void doPushFilterToScan(final RelOptRuleCall call, final FilterPrel fi final ScanPrel newScanPrel = ScanPrel.create(scan, filter.getTraitSet(), newGroupsScan, scan.getRowType()); // Depending on whether is a project in the middle, assign either scan or copy of project to childRel. -final RelNode childRel = project == null ? newScanPrel : project.copy(project.getTraitSet(), ImmutableList.of((RelNode)newScanPrel));; +final RelNode childRel = project == null ? newScanPrel : project.copy(project.getTraitSet(), ImmutableList.of((RelNode)newScanPrel)); Review comment: Could you please add space: `(RelNode)newScanPrel` -> `(RelNode) newScanPrel`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208871483 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetSimpleTestFileGenerator.java ## @@ -292,7 +292,8 @@ public static void writeComplexValues(GroupFactory gf, ParquetWriter comp .append("_INT_64", 0x7FFFL) .append("_UINT_64", 0xL) .append("_DECIMAL_decimal18", 0xL); - byte[] bytes = new byte[30]; Arrays.fill(bytes, (byte)1); + byte[] bytes = new byte[30]; + Arrays.fill(bytes, (byte)1); Review comment: `(byte)1` -> `(byte) 1`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208870539 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java ## @@ -63,7 +64,8 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{ public void oneBitOneExchangeTwoEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); +DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator())){ Review comment: `){` -> `) {`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208866591 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java ## @@ -713,7 +713,7 @@ public void eval() { if (length.value > 0) { charLen = Math.min((int)length.value, charCount); //left('abc', 5) -> 'abc' } else if (length.value < 0) { - charLen = Math.max(0, charCount + (int)length.value) ; // left('abc', -5) ==> '' + charLen = Math.max(0, charCount + (int)length.value); // left('abc', -5) ==> '' Review comment: Could you please add space: `(int)length` -> `(int) length`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208868830 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java ## @@ -302,9 +302,12 @@ private LogicalExpression getDrillCastFunctionFromOptiq(RexCall call){ castType = Types.required(MinorType.VARCHAR).toBuilder().setPrecision(call.getType().getPrecision()).build(); break; - case "INTEGER": castType = Types.required(MinorType.INT); break; - case "FLOAT": castType = Types.required(MinorType.FLOAT4); break; - case "DOUBLE": castType = Types.required(MinorType.FLOAT8); break; + case "INTEGER": castType = Types.required(MinorType.INT); Review comment: Could you please move the assigning to the next line here and in this class below
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208866832 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java ## @@ -88,6 +88,6 @@ public void setMaxAllocation(long maxAllocation) { public boolean isBufferedOperator(QueryContext queryContext) { // In case forced to use a single partition - do not consider this a buffered op (when memory is divided) return queryContext == null || - 1 < (int)queryContext.getOptions().getOption(ExecConstants.HASHJOIN_NUM_PARTITIONS_VALIDATOR) ; + 1 < (int)queryContext.getOptions().getOption(ExecConstants.HASHJOIN_NUM_PARTITIONS_VALIDATOR); Review comment: `(int)queryContext` -> `(int) queryContext`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208871960 ## File path: exec/jdbc/src/test/java/org/apache/drill/jdbc/StatementTest.java ## @@ -83,7 +83,7 @@ public static void tearDownStatement() throws SQLException { public void testDefaultGetQueryTimeout() throws SQLException { try(Statement stmt = connection.createStatement()) { int timeoutValue = stmt.getQueryTimeout(); - assertEquals( 0 , timeoutValue ); + assertEquals( 0, timeoutValue ); Review comment: `( 0, timeoutValue )` -> `(0, timeoutValue)`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208871597 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetSimpleTestFileGenerator.java ## @@ -375,7 +376,8 @@ public static void writeSimpleValues(SimpleGroupFactory sgf, ParquetWriter `(byte) 1`
[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements. URL: https://github.com/apache/drill/pull/1415#discussion_r208867531 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java ## @@ -235,7 +235,7 @@ public static boolean isScalarSubquery(RelNode root) { if (currentrel instanceof DrillAggregateRel) { agg = (DrillAggregateRel)currentrel; } else if (currentrel instanceof RelSubset) { -currentrel = ((RelSubset) currentrel).getBest() ; +currentrel = ((RelSubset)currentrel).getBest(); Review comment: `(RelSubset)currentrel` -> `(RelSubset) currentrel`
[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception … URL: https://github.com/apache/drill/pull/1426#discussion_r208841312 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/lateraljoin/TestE2EUnnestAndLateral.java ## @@ -169,13 +169,31 @@ public void testLeftLateral_WithFilterAndLimitInSubQuery() throws Exception { @Test public void testMultiUnnestAtSameLevel() throws Exception { String Sql = "EXPLAIN PLAN FOR SELECT customer.c_name, customer.c_address, U1.order_id, U1.order_amt," + Review comment: And here: `Sql` -> `sql`
[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception … URL: https://github.com/apache/drill/pull/1426#discussion_r208839659 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java ## @@ -138,7 +139,7 @@ public PhysicalOperator visitUnnest(UnnestPOP unnest, IndexedFragmentNode value) final Wrapper info; final int minorFragmentId; Review comment: It would be good to make them private too.
[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception … URL: https://github.com/apache/drill/pull/1426#discussion_r208840867 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/lateraljoin/TestE2EUnnestAndLateral.java ## @@ -169,13 +169,31 @@ public void testLeftLateral_WithFilterAndLimitInSubQuery() throws Exception { @Test public void testMultiUnnestAtSameLevel() throws Exception { String Sql = "EXPLAIN PLAN FOR SELECT customer.c_name, customer.c_address, U1.order_id, U1.order_amt," + - " U1.itemName, U1.itemNum" + " FROM cp.`lateraljoin/nested-customer.parquet` customer, LATERAL" + - " (SELECT t.ord.o_id AS order_id, t.ord.o_amount AS order_amt, U2.item_name AS itemName, U2.item_num AS " + -"itemNum FROM UNNEST(customer.orders) t(ord) , LATERAL" + - " (SELECT t1.ord.i_name AS item_name, t1.ord.i_number AS item_num FROM UNNEST(t.ord) AS t1(ord)) AS U2) AS U1"; +" U1.itemName, U1.itemNum" + " FROM cp.`lateraljoin/nested-customer.parquet` customer, LATERAL" + +" (SELECT t.ord.o_id AS order_id, t.ord.o_amount AS order_amt, U2.item_name AS itemName, U2.item_num AS " + +"itemNum FROM UNNEST(customer.orders) t(ord) , LATERAL" + +" (SELECT t1.ord.i_name AS item_name, t1.ord.i_number AS item_num FROM UNNEST(t.ord) AS t1(ord)) AS U2) AS U1"; runAndLog(Sql); } + @Test + public void testMultiUnnestAtSameLevelExec() throws Exception { Review comment: Could you please rewrite this test to validate query result with some expected result, since it is possible that after some changes both these queries may return the same incorrect result.
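The review point above, that merely running a query (as `runAndLog` does) cannot catch a wrong-but-successful result, is the general "validate against a baseline" testing principle. A minimal self-contained illustration in plain Java follows; it does not use Drill's test APIs, and `runQuery`, the `buggy` flag, and the sample rows are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal illustration of baseline-validated testing: comparing the actual
// result set to an expected baseline catches a wrong-but-successful result
// that a "did it run without error?" check would miss.
public class BaselineCheckSketch {
  // Hypothetical query runner returning rows; stands in for a real engine.
  static List<List<Object>> runQuery(boolean buggy) {
    List<List<Object>> rows = new ArrayList<>();
    rows.add(List.of("order-1", 100));
    if (buggy) {
      rows.add(List.of("order-1", 100)); // duplicate row produced by the bug
    }
    return rows;
  }

  static boolean matchesBaseline(List<List<Object>> actual, List<List<Object>> expected) {
    // List.equals compares element-by-element, so row order matters here;
    // an unordered comparison would sort or bag-count both sides first.
    return actual.equals(expected);
  }

  public static void main(String[] args) {
    List<List<Object>> baseline = List.of(List.of("order-1", 100));
    // Both runs "succeed", but only the baseline comparison exposes the bug.
    System.out.println(matchesBaseline(runQuery(false), baseline)); // true
    System.out.println(matchesBaseline(runQuery(true), baseline));  // false
  }
}
```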
[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208841144

File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/lateraljoin/TestE2EUnnestAndLateral.java

```diff
@@ -169,13 +169,31 @@ public void testLeftLateral_WithFilterAndLimitInSubQuery() throws Exception {
   @Test
   public void testMultiUnnestAtSameLevel() throws Exception {
     String Sql = "EXPLAIN PLAN FOR SELECT customer.c_name, customer.c_address, U1.order_id, U1.order_amt," +
-      " U1.itemName, U1.itemNum" + " FROM cp.`lateraljoin/nested-customer.parquet` customer, LATERAL" +
-      " (SELECT t.ord.o_id AS order_id, t.ord.o_amount AS order_amt, U2.item_name AS itemName, U2.item_num AS " +
-      "itemNum FROM UNNEST(customer.orders) t(ord) , LATERAL" +
-      " (SELECT t1.ord.i_name AS item_name, t1.ord.i_number AS item_num FROM UNNEST(t.ord) AS t1(ord)) AS U2) AS U1";
+      " U1.itemName, U1.itemNum" + " FROM cp.`lateraljoin/nested-customer.parquet` customer, LATERAL" +
+      " (SELECT t.ord.o_id AS order_id, t.ord.o_amount AS order_amt, U2.item_name AS itemName, U2.item_num AS " +
+      "itemNum FROM UNNEST(customer.orders) t(ord) , LATERAL" +
+      " (SELECT t1.ord.i_name AS item_name, t1.ord.i_number AS item_num FROM UNNEST(t.ord) AS t1(ord)) AS U2) AS U1";
     runAndLog(Sql);
   }

+  @Test
+  public void testMultiUnnestAtSameLevelExec() throws Exception {
+    String Sql = "SELECT customer.c_name, customer.c_address, U1.order_id, U1.order_amt," +
```

Review comment: Please rename the string variable to lower case: `Sql` -> `sql`
[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208839935

File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/LateralUnnestRowIDVisitor.java

```diff
@@ -59,16 +65,37 @@ public Prel visitPrel(Prel prel, Boolean isRightOfLateral) throws RuntimeExcepti
   }

   @Override
-  public Prel visitLateral(LateralJoinPrel prel, Boolean value) throws RuntimeException {
+  public Prel visitLateral(LateralJoinPrel prel, Boolean isRightOfLateral) throws RuntimeException {
     List children = Lists.newArrayList();
-    children.add(((Prel)prel.getInput(0)).accept(this, value));
+    children.add(((Prel)prel.getInput(0)).accept(this, isRightOfLateral));
```

Review comment: Please add a space here and in the line below: `(Prel)prel` -> `(Prel) prel`
[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208840141

File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/LateralUnnestRowIDVisitor.java

```diff
@@ -59,16 +65,37 @@ public Prel visitPrel(Prel prel, Boolean isRightOfLateral) throws RuntimeExcepti
   }

   @Override
-  public Prel visitLateral(LateralJoinPrel prel, Boolean value) throws RuntimeException {
+  public Prel visitLateral(LateralJoinPrel prel, Boolean isRightOfLateral) throws RuntimeException {
     List children = Lists.newArrayList();
-    children.add(((Prel)prel.getInput(0)).accept(this, value));
+    children.add(((Prel)prel.getInput(0)).accept(this, isRightOfLateral));
     children.add(((Prel) prel.getInput(1)).accept(this, true));
-    return (Prel) prel.copy(prel.getTraitSet(), children);
+    if (!isRightOfLateral) {
+      return (Prel) prel.copy(prel.getTraitSet(), children);
+    } else {
+      //Adjust the column numbering due to an additional column "$drill_implicit_field$" is added to the inputs.
+      Map requiredColsMap = new HashMap<>();
+      for (Integer corrColIndex : prel.getRequiredColumns()) {
+        requiredColsMap.put(corrColIndex, corrColIndex+1);
```

Review comment: Please add spaces: `corrColIndex+1` -> `corrColIndex + 1`
[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208856911

File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/LateralUnnestRowIDVisitor.java

```diff
@@ -59,16 +65,37 @@ public Prel visitPrel(Prel prel, Boolean isRightOfLateral) throws RuntimeExcepti
   }

   @Override
-  public Prel visitLateral(LateralJoinPrel prel, Boolean value) throws RuntimeException {
+  public Prel visitLateral(LateralJoinPrel prel, Boolean isRightOfLateral) throws RuntimeException {
     List children = Lists.newArrayList();
-    children.add(((Prel)prel.getInput(0)).accept(this, value));
+    children.add(((Prel)prel.getInput(0)).accept(this, isRightOfLateral));
     children.add(((Prel) prel.getInput(1)).accept(this, true));
-    return (Prel) prel.copy(prel.getTraitSet(), children);
+    if (!isRightOfLateral) {
+      return (Prel) prel.copy(prel.getTraitSet(), children);
+    } else {
+      //Adjust the column numbering due to an additional column "$drill_implicit_field$" is added to the inputs.
+      Map requiredColsMap = new HashMap<>();
+      for (Integer corrColIndex : prel.getRequiredColumns()) {
+        requiredColsMap.put(corrColIndex, corrColIndex+1);
+      }
+      ImmutableBitSet requiredColumns = prel.getRequiredColumns().shift(1);
+
+      CorrelationId corrId = prel.getCluster().createCorrel();
+      RexCorrelVariable rexCorrel =
+          (RexCorrelVariable) prel.getCluster().getRexBuilder().makeCorrel(
+              children.get(0).getRowType(),
+              corrId);
+      RelNode rightChild = children.get(1).accept(
```

Review comment: It looks like `ProjectCorrelateTransposeRule.RelNodesExprsHandler` does not replace the `RexCorrelVariable` when a `RelNode` has no inputs; in our case, this is `UnnestPrel`.
So we need to extend this class and override the method `visit(RelNode other)`:
```java
@Override
public RelNode visit(RelNode other) {
  return super.visit(other.accept(rexVisitor));
}
```
Also, it still does not work, since `UnnestPrel` uses the `accept(RexShuttle shuttle)` method from `AbstractRelNode`, which does nothing, so we should override it somehow like:
```java
@Override
public RelNode accept(RexShuttle shuttle) {
  RexNode ref = shuttle.apply(this.ref);
  if (this.ref == ref) {
    return this;
  }
  return new UnnestPrel(getCluster(), traitSet, rowType, ref);
}
```
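To see what the renumbering in the hunk above accomplishes: once the implicit column (`$drill_implicit_field$`) is added ahead of the existing input columns, every required correlated-column ordinal on the right side must shift by one. The following is a standalone sketch of that shift using plain `java.util` collections (the real code uses Calcite's `ImmutableBitSet.shift(1)` and Drill's planner types, which are not reproduced here):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnShiftDemo {
  public static void main(String[] args) {
    // Ordinals of the required correlated columns before the implicit
    // column is inserted at ordinal 0 (hypothetical example values).
    List<Integer> requiredColumns = List.of(0, 2, 3);

    // Each old ordinal maps to old + 1, mirroring requiredColsMap in the hunk.
    Map<Integer, Integer> requiredColsMap = new HashMap<>();
    List<Integer> shifted = new ArrayList<>();
    for (Integer corrColIndex : requiredColumns) {
      requiredColsMap.put(corrColIndex, corrColIndex + 1);
      shifted.add(corrColIndex + 1);
    }
    System.out.println(shifted);  // [1, 3, 4]
  }
}
```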
[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208839528

File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java

```diff
@@ -116,14 +118,13 @@ public PhysicalOperator visitLateralJoin(LateralJoinPOP op, IndexedFragmentNode
     children.add(op.getLeft().accept(this, iNode));
     children.add(op.getRight().accept(this, iNode));

-    UnnestPOP unnestInLeftInput = iNode.getUnnest();
+    UnnestPOP unnestForThisLateral = iNode.getUnnest();

     PhysicalOperator newOp = op.getNewWithChildren(children);
     newOp.setCost(op.getCost());
     newOp.setOperatorId(Short.MAX_VALUE & op.getOperatorId());

-    ((LateralJoinPOP)newOp).setUnnestForLateralJoin(unnestInLeftInput);
-
+    ((LateralJoinPOP)newOp).setUnnestForLateralJoin(unnestForThisLateral);
```

Review comment: Please add a space: `((LateralJoinPOP)newOp)` -> `((LateralJoinPOP) newOp)`
[GitHub] davechallis commented on issue #1423: Updated URL for Apache Parquet logical types
davechallis commented on issue #1423: Updated URL for Apache Parquet logical types
URL: https://github.com/apache/drill/pull/1423#issuecomment-411687191

@sohami would you still like me to open a Jira ticket, or has this been resolved now?
[GitHub] okalinin opened a new pull request #1428: DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with earlier versions
okalinin opened a new pull request #1428: DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with earlier versions
URL: https://github.com/apache/drill/pull/1428

## DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with earlier versions

### Background

Parquet dependencies were upgraded with DRILL-6353, which changed how Parquet handles the `TIMESTAMP_MICROS` logical type. Previously, calling `SchemaElement.getConverted_type()` returned null and the logical type was ignored. With the updated Parquet version, the call returns `TIMESTAMP_MICROS`, which triggers a query exception. The issue impacts both `ParquetRecordReader` and `DrillParquetReader`. This change aims at restoring the original behavior: handling `TIMESTAMP_MICROS` as `INT64` with no logical type in both Parquet readers. It does not seem to make sense to do more, since `TIMESTAMP_MICROS` is a deprecated logical type as per the Parquet [current documentation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md).

### Change description

- Added the `TIMESTAMP_MICROS` logical type in both Parquet readers so it is handled as a regular `INT64`
- Modified `ParquetSimpleTestFileGenerator` to include `TIMESTAMP_MICROS` columns in Parquet test files and updated the existing test data files
- Modified the respective tests, with minor regrouping
- Fixed `TestDrillParquetReader` to make sure the correct reader is actually used to handle the test queries
- Improved debug messages to avoid the ambiguous 'new/old Parquet reader' terminology

### Level of testing

- Sample file and query provided by the issue reporter
- Unit tests
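As background (not part of the patch itself): a Parquet `TIMESTAMP_MICROS` value is physically an `INT64` counting microseconds since the Unix epoch, which is why reading it as a plain `INT64` still yields interpretable data. A minimal standalone sketch of decoding such a raw value with the JDK's `java.time` API (the literal below is a hypothetical example value):

```java
import java.time.Instant;

public class TimestampMicrosDemo {
  public static void main(String[] args) {
    // A raw INT64 as stored in Parquet for TIMESTAMP_MICROS:
    // microseconds since 1970-01-01T00:00:00Z.
    long micros = 1_533_816_660_000_000L;

    // Split into whole seconds and the remaining microseconds
    // (converted to nanoseconds for Instant.ofEpochSecond).
    Instant ts = Instant.ofEpochSecond(micros / 1_000_000L,
        (micros % 1_000_000L) * 1_000L);
    System.out.println(ts);  // 2018-08-09T12:11:00Z
  }
}
```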
[jira] [Created] (DRILL-6677) Check style unused import check conflicts with Eclipse
Paul Rogers created DRILL-6677:
Summary: Check style unused import check conflicts with Eclipse
Key: DRILL-6677
URL: https://issues.apache.org/jira/browse/DRILL-6677
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.15.0
Reporter: Paul Rogers

Consider the following Java snippet:
{code}
import com.foo.Bar;

/** This is a reference to {@link com.foo.Bar} */
{code}
Eclipse will notice the reference to {{com.foo.Bar}} in the Javadoc comment, and its automatic import fixer-upper will include the import. But Checkstyle appears to ignore references in Javadoc, so it reports the import as unused. At present, the only way to make Checkstyle happy is to turn off the Eclipse import fixer-upper and manage imports manually.

Request: modify the Checkstyle configuration to also check for class references in Javadoc comments, since such references are required for the Javadoc to build.

This message was sent by Atlassian JIRA (v7.6.3#76005)
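One possible fix, assuming a reasonably recent Checkstyle version: the `UnusedImports` check has a `processJavadoc` property that, when enabled, counts `{@link}` and `@see` references in Javadoc as uses of an import. A sketch of the configuration change (the module names are standard Checkstyle; whether this slots directly into Drill's existing checkstyle-config.xml is unverified):

```xml
<module name="Checker">
  <module name="TreeWalker">
    <!-- Treat {@link ...} / @see references in Javadoc as uses of an import,
         so imports needed only by Javadoc are no longer flagged as unused. -->
    <module name="UnusedImports">
      <property name="processJavadoc" value="true"/>
    </module>
  </module>
</module>
```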
[jira] [Created] (DRILL-6676) Add Union, List and Repeated List types to Result Set Loader
Paul Rogers created DRILL-6676:
Summary: Add Union, List and Repeated List types to Result Set Loader
Key: DRILL-6676
URL: https://issues.apache.org/jira/browse/DRILL-6676
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.15.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Fix For: 1.15.0

Add support for the "obscure" vector types to the {{ResultSetLoader}}:
* Union
* List
* Repeated List