[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user weijietong commented on the issue: https://github.com/apache/drill/pull/889 @amansinha100 thanks for sharing the information. Got your point. I think your propose on [CALCITE-1048](https://issues.apache.org/jira/browse/CALCITE-1048) is possible. Since [CALCITE-794](https://issues.apache.org/jira/browse/CALCITE-794) has completed at version 1.6 ,it seems there's a more perfect solution( to get the least max number of all the rels of the RelSubSet). But due to Drill's Caclite version is still based on 1.4 , I support your current temp solution. Only wonder that whether the explicitly searched RelNode's (such as DrillAggregateRel) maxRowCount can represent the best RelNode's maxRowCount ? ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/889 @weijietong I am interested in getting your basic changes in. It is unfortunate we are running into this issue with RelSubSet. Let me see if I can make some changes on top of your changes (I will keep your authorship) and have it pass all our functional regression tests. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/889 @weijietong I ran the functional tests with the PR and saw some failures. While debugging those failures, I found the simple scalar aggregate case works but anything with compound aggregate expressions throws a CannotPlanException, for example: explain plan for select count(*) from cp.`tpch/lineitem.parquet` l where l_quantity < (select max(l2.l_quantity) + 1 from cp.`tpch/lineitem.parquet` l2) Here, the right side of the inequality join is actually a scalar but the JoinUtils.isScalar() method does not detect it correctly. The reason is the right side is a DrillProjectRel whose input is a RelSubSet. The RelSubSet has a DrillAggregateRel but it looks like currently Calcite does not detect this is a scalar aggregate if it occurs as part of a RelSubSet. I see Calcite has a comment in [1] which points to an open JIRA CALCITE-1048. [1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdMaxRowCount.java#L165 ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user weijietong commented on the issue: https://github.com/apache/drill/pull/889 ping @arina-ielchiieva @amansinha100 ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user weijietong commented on the issue: https://github.com/apache/drill/pull/889 @amansinha100 I have removed the override of `getMaxRowCount` method of `TableScan` type to avoid your worry about unexpected results . This PR's target adjusts to just enhance the current cartesian join by leveraging Calcite's `RelMdMaxRowCount` method. I will solve our case specially at our env. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/889 @arina-ielchiieva, @amansinha100 can you take another look at this one and see if your concerns have been addressed? ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user weijietong commented on the issue: https://github.com/apache/drill/pull/889 @arina-ielchiieva @amansinha100 any further advice ? ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/889 @arina-ielchiieva, looks like Weijie added a couple of commits since your +1. Can you take a look at them? ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user weijietong commented on the issue: https://github.com/apache/drill/pull/889 @arina-ielchiieva @amansinha100 isScalarSubquery method has been refactored , please review. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/889 +1, LGMT. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user weijietong commented on the issue: https://github.com/apache/drill/pull/889 @arina-ielchiieva thanks for the advice, has corrected that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/889 Thanks @weijietong. LGTM. @amansinha100 could you please do the final CR since you are familiar with this issue? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user weijietong commented on the issue: https://github.com/apache/drill/pull/889 @arina-ielchiieva other tests have passed separately and time zone related test cases have been corrected in PR 5717 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/889 On mine environment all tests pass. I used to have troubles but long time ago. I had to fix issues with timezones, memory usage `MAVEN_OPTS="-Xmx2048m -XX:MaxPermSize=1024m"`. Also I am pretty sure that some test have failed because the others did, usually the ones with Rpc exception. Maybe `UserRemote SYSTEM ERROR: URISyntaxException` is connected with your env. You should check that. Also please rebase on the latest master (you can do `push -f` again), also if `TestTpchDistributedConcurrent.testConcurrentQueries` will be failing again, try to disable it but make sure it passes separately. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user weijietong commented on the issue: https://github.com/apache/drill/pull/889 @arina-ielchiieva have passed all the unit tests under java-exec bundle , got some errors which are sure not associated with this change, maybe my branch was old from the master. > TestConstantFolding.testConstantFolding_allTypes:163 » org.apache.drill.commo... TestCustomUserAuthenticator.positiveUserAuth » UserRemote SYSTEM ERROR: URISyn... TestCustomUserAuthenticator.positiveUserAuthAfterNegativeUserAuth » UserRemote TestInfoSchema.selectFromAllTables » UserRemote SYSTEM ERROR: URISyntaxExcepti... TestViewSupport.infoSchemaWithView:355->BaseTestQuery.testRunAndReturn:331 » Rpc TestInfoSchemaFilterPushDown.testFilterPushdown_NonEqual » UserRemote SYSTEM E... TestParquetScan.testSuccessFile:64->BaseTestQuery.testRunAndReturn:331 » Rpc o... TestTpchDistributedConcurrent.testConcurrentQueries:190 » test timed out afte... I also found the error to do `mvn test` ,got the same error as JIRA DRILL-4104 . through `mvn test -pl exec/java-exec` this method,I got the unit test result. I wonder how the devs do the test result . --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/889 @weijietong thanks for code formatting. > I wonder if the enhanced code does not break the current unit test ,it will be ok. Can you please check this? Did all unit tests pass after the change? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user weijietong commented on the issue: https://github.com/apache/drill/pull/889 @arina-ielchiieva your test case can not reproduce the error . You can search the dev email to find the origin error description with the keyword "Drill query planning error". Your query already satisfy the NestedLoopJoinPrule. My case is that I add another rule to change the Aggregate-->Aggregate-->Scan to Scan as the transformed Scan relnode already holding the count(distinct ) value. When this transformation occurs, the NestedLoopJoinPrule's checkPreconditions method will invoke JoinUtils.hasScalarSubqueryInput. Then it will fail, as the transformed relnode has no aggregate node which does not satisfy the current scalar rule. I think it's hard to reproduce this error without a specific rule like what I do. the precondition is: 1. a nested loop join 2. no (aggregate--> aggregate) count distinct relation nodes in the plan 3. the row number of one child of the nested loop join is 1 . I wonder if the enhanced code does not break the current unit test ,it will be ok. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/889 @weijietong regarding the unit test, I have tried to reproduce the problem and written the following unit test: ``` @Test public void test() throws Exception { FileSystem fs = null; try { fs = FileSystem.get(new Configuration()); // create table with partition pruning test("use dfs_test.tmp"); String tableName = "table_with_pruning"; Path dataFile = new Path(TestTools.getWorkingPath(),"src/test/resources/parquet/alltypes_required.parquet"); test("create table %s partition by (col_int) as select * from dfs.`%s`", tableName, dataFile); // generate metadata test("refresh table metadata `%s`", tableName); // execute query String query = String.format("select count(distinct col_int), count(distinct col_chr) from `%s` where col_int = 45436", tableName); test(query); } finally { if (fs != null) { fs.close(); } } } ``` `AbstractGroupScan.getScanStats` method returns one row but it does not fail. Can you please take a look? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user weijietong commented on the issue: https://github.com/apache/drill/pull/889 @arina-ielchiieva I have corrected the codes as you guide. But sorry for the unit tests, I have tried a long time to simulate a scan with 1 row ,but failed to do that. The row count of a scan is fetched from the AbstractGroupScan.getScanStats method. At my plugin ,I override this method to ensure it will return 1 . --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/889 @weijietong thanks for the PR. 1. Is it possible to add unit tests? 2. Can you please add in method description that a table scan with only one row should also be considered as a scalar as well. 3. Please fix indention in `if` statements and change `currentrel` to camel case [1]. [1] https://drill.apache.org/docs/apache-drill-contribution-guidelines/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---