[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880197#comment-16880197 ]

ASF GitHub Bot commented on DRILL-7306:
---------------------------------------

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-509175362

@paul-rogers no problem, glad I could help, and thank you for your efforts.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Disable "fast schema" batch for new scan framework
> --
>
>                 Key: DRILL-7306
>                 URL: https://issues.apache.org/jira/browse/DRILL-7306
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.17.0
>
>
> The EVF framework is set up to return a "fast schema" empty batch with only
> schema as its first batch because, when the code was written, it seemed
> that's how we wanted operators to work. However, DRILL-7305 notes that many
> operators cannot handle empty batches.
> Since the empty-batch bugs show that Drill does not, in fact, provide a "fast
> schema" batch, this ticket asks to disable the feature in the new scan
> framework. The feature is disabled with a config option; it can be re-enabled
> if ever it is needed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
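To make the ticket's proposal concrete, here is a minimal, hypothetical Java sketch of a scan that can optionally emit an empty, schema-only first batch, with a boolean flag standing in for the config option. The class name, the `Batch` type, and the `enableSchemaBatch` flag are invented for illustration; they are not EVF's actual classes or the real option name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Drill's EVF code: a scan that may send an empty,
// schema-only batch before its data batches. The enableSchemaBatch flag
// stands in for the config option; its name is invented here.
public class SchemaBatchSketch {

  // Minimal stand-in for a record batch: column names plus a row count.
  static final class Batch {
    final List<String> schema;
    final int rowCount;

    Batch(List<String> schema, int rowCount) {
      this.schema = schema;
      this.rowCount = rowCount;
    }
  }

  // Returns the sequence of batches the scan would pass downstream.
  static List<Batch> scan(List<String> schema, int[] batchSizes,
                          boolean enableSchemaBatch) {
    List<Batch> out = new ArrayList<>();
    if (enableSchemaBatch) {
      // "Fast schema": zero rows, schema only. DRILL-7305 notes that many
      // operators cannot handle such empty batches.
      out.add(new Batch(schema, 0));
    }
    for (int size : batchSizes) {
      out.add(new Batch(schema, size));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> schema = Arrays.asList("a", "b");
    int[] sizes = {100, 42};
    // Feature disabled (what this ticket asks for): data arrives first.
    System.out.println(scan(schema, sizes, false).get(0).rowCount); // 100
    // Feature re-enabled: the first batch is the empty schema-only batch.
    System.out.println(scan(schema, sizes, true).get(0).rowCount);  // 0
  }
}
```

In this sketch, turning the flag off only means downstream operators never receive the extra empty batch; the empty-batch handling inside those operators (DRILL-7305) is left untouched.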
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880199#comment-16880199 ]

ASF GitHub Bot commented on DRILL-7306:
---------------------------------------

arina-ielchiieva commented on pull request #1813: DRILL-7306: Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880198#comment-16880198 ]

ASF GitHub Bot commented on DRILL-7306:
---------------------------------------

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-509175395

+1, merging.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880030#comment-16880030 ]

ASF GitHub Bot commented on DRILL-7306:
---------------------------------------

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-509087121

@arina-ielchiieva, thank you for running the tests. Glad to hear we finally have a clean run. I've squashed commits.

Since this PR has been a struggle, let's commit it by itself. Once that is done, I will rebase the other two PRs, then combine them into a single merge branch for your convenience.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879864#comment-16879864 ]

ASF GitHub Bot commented on DRILL-7306:
---------------------------------------

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-509001976

@paul-rogers I re-ran the tests, and now they pass. I guess there is no need to create a merge branch for this PR; just squash the commits and rebase, and I'll re-run the tests and merge. After this PR is merged, you will be able to proceed with the remaining two.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879786#comment-16879786 ]

ASF GitHub Bot commented on DRILL-7306:
---------------------------------------

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508966464

@arina-ielchiieva, very sorry for the trouble this PR has caused. Thank you for the SF1 data. It allowed me to find the offending change and fix the issue.

This is a case of "no good deed goes unpunished." I tried to work around the bad code explained in DRILL-7308 by setting precision only if non-zero. For reasons I did not track down, this change caused some of the TPC-H queries to fail. The failure was in the "tuple/column metadata" classes. I had not realized that these classes are now used outside of just the new scan and schema work. (I did not track down the usage.)

This exercise shows that Drill requires the following rules regarding precision:

* Precision must be set for all types that need it (including VarChar and Decimal).
* Precision must be set for these types even if the value is zero.
* Code that wants to know if the precision is non-zero should check the precision value itself. Code should not use an "is set" check as a substitute for "value != 0". (This is the flaw in the REST code described in DRILL-7308.)

Filed DRILL-7318 to suggest we clean up and standardize our type-to-string implementations so that they build type strings based on the above rules.

Reverted the precision-related change in `PrimitiveColumnMetadata`. The TPC-H union03 query now passes locally. I presume the others will also.

The reversion caused a test to fail. It seems that the `EXPLAIN PLAN` for the new schema provisioning stuff uses the `toString()` method to get the type string shown in the plan for a provided schema.
Not sure this is the best idea, because `toString()` is for debugging and uses internal type names. It also shows the precision and scale for all types, resulting in output such as `INT(0, 0):OPTIONAL`. (The type names and cardinality notation should be fixed as part of DRILL-7318.)

The reverted change had removed the unwanted precision and scale; reverting it brought them back. So, for now, I modified `MaterializedField.toString()` to not emit the precision if the value is zero (unless the type is Decimal). `EXPLAIN PLAN` now produces `INT:OPTIONAL` (without the precision and scale).

Left the changes as a new commit so you can review just the new changes. Please try this PR again against the functional tests. Let's hope things work this time. If so, I'll squash commits.

We talked about creating a merge branch with the three open PRs. Since you need to run the functional tests again to verify the fix, do you want to then just commit this PR? I can create a merge branch for the other two. Otherwise, I'm happy to create a merge branch with all three.
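The precision rules and the `toString()` workaround described above can be sketched as follows. This is a minimal, hypothetical illustration assuming a simplified column model: `MinorType`, `Col`, `hasPrecision`, and `typeString` are invented names, not Drill's `PrimitiveColumnMetadata` or `MaterializedField` APIs. Precision is always stored, "has precision" is decided from the value itself, and a zero precision is left out of the type string unless the type is Decimal.

```java
// Hypothetical sketch of the precision rules above. MinorType, Col, and
// typeString are invented names; this is not Drill's PrimitiveColumnMetadata
// or MaterializedField code.
public class TypeStringSketch {

  enum MinorType { INT, VARCHAR, DECIMAL }

  static final class Col {
    final MinorType type;
    final int precision; // always set for types that need it, even when 0
    final int scale;
    final boolean nullable;

    Col(MinorType type, int precision, int scale, boolean nullable) {
      this.type = type;
      this.precision = precision;
      this.scale = scale;
      this.nullable = nullable;
    }
  }

  // Rule 3: decide from the value itself, not from an "is set" flag.
  static boolean hasPrecision(Col col) {
    return col.precision != 0;
  }

  // Emits "(precision, scale)" only when the precision is non-zero,
  // except for DECIMAL, which always shows it.
  static String typeString(Col col) {
    StringBuilder sb = new StringBuilder(col.type.name());
    if (col.type == MinorType.DECIMAL || hasPrecision(col)) {
      sb.append('(').append(col.precision).append(", ").append(col.scale).append(')');
    }
    sb.append(col.nullable ? ":OPTIONAL" : ":REQUIRED");
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(typeString(new Col(MinorType.INT, 0, 0, true)));      // INT:OPTIONAL
    System.out.println(typeString(new Col(MinorType.VARCHAR, 50, 0, true)));  // VARCHAR(50, 0):OPTIONAL
    System.out.println(typeString(new Col(MinorType.DECIMAL, 0, 0, false))); // DECIMAL(0, 0):REQUIRED
  }
}
```

Keeping the zero-precision decision in the formatter, rather than skipping the assignment when the precision is zero, is the design choice the rules above point to: the metadata stays complete, and only the display changes.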
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879107#comment-16879107 ]

ASF GitHub Bot commented on DRILL-7306:
---------------------------------------

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508696422

@paul-rogers I have re-checked again (last time, I ran the tests on both master and your branch to make sure the failures are caused by your changes), and the result is the same.

```
on commit c2c4f765dd039cf9073196e5078eebb942882f66 (DRILL-7306: Disable schema-only batch for new scan framework)
two empty CSV failures

on commit 6ca5902573d06239c366f7cd788e72697366f617 (Fixed empty result set issue)
could not build the project
[ERROR] /root/drillAutomation/builds/drill/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/ReaderState.java:[365,8] error: cannot find symbol
  symbol:   variable batchCount
  location: class ReaderState

on commit 32fb3a7f8f9861d967929bfb3487d935fc683ff3 (Additional debugging)
Parquet failures
```

Link to SF1 data: https://s3-us-west-1.amazonaws.com/drill-public/tpch/sf1/tpch_sf1_parquet.tar.gz

Tests were run on a 4-node cluster with the following options:

```
>> Query: alter system set `planner.enable_decimal_data_type` = true;
ok    summary
true  planner.enable_decimal_data_type updated.
>> Query: alter system set `new_view_default_permissions` = '777';
ok    summary
true  new_view_default_permissions updated.
>> Query: alter system set `planner.enable_limit0_optimization` = true;
ok    summary
true  planner.enable_limit0_optimization updated.
>> Query: alter system set `exec.errors.verbose` = true;
ok    summary
true  exec.errors.verbose updated.
>> Query: alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
ok    summary
true  planner.memory.max_query_memory_per_node updated.
>> Query: alter system set `drill.exec.hashagg.fallback.enabled` = true;
ok    summary
true  drill.exec.hashagg.fallback.enabled updated.
>> Query: alter system set `drill.exec.hashjoin.fallback.enabled` = true;
ok    summary
true  drill.exec.hashjoin.fallback.enabled updated.
```

> Taking a step back, I'm actually completely mystified at how my changes could impact Parquet (only). This PR changed only source files for the "new" scan, which Parquet does not use. Oddly, none of the text file queries fail, and that is the one area I did change.

Well, the PR does change some common classes, so I guess that somehow has an effect. I don't think the problem is purely connected with Parquet; more likely it is something in filtering, batch counting, or the like.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878917#comment-16878917 ]

ASF GitHub Bot commented on DRILL-7306:
---------------------------------------

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508583884

Thanks, @arina-ielchiieva, for pointing me to the Parquet data sources. As it turns out, these failures are quite a mystery.

First, I don't think the files you mentioned are the ones used by the tests that failed. The set stored on GitHub is for scale factor (SF) 0.1, which has 1500 customers in the customer table with IDs from 0 to 1499. The tests seem to use SF1 which, perhaps, is generated by the test framework during its setup. If we look at the union03 query, the expected results include customer IDs in the six-digit range.

That said, I did recreate the union03 query locally using the SF0.1 files and got 3 result rows. To verify, I wrote a test that scanned the entire table (just a `SELECT * FROM ...`) and "manually" applied the WHERE clause. Three rows matched. So it looks like, at least locally, that particular query works OK against the SF0.1 data set.

Unfortunately, I can't check the contents of the `customer.parquet` file because I can't get Parquet tools to work after several hours of fighting one thing after another. I seem to recall we discussed bundling that tool with Drill. Doing so would be very handy. Building it by hand requires far more steps than are documented on the Parquet and Hortonworks web sites:

1) install gcc,
2) download and compile Thrift,
3) build Parquet-tools,
4) figure out the set of dependent jars that must be on the class path,
5) ... not sure, here is where I gave up in frustration...

Taking a step back, I'm actually completely mystified at how my changes could impact Parquet (only). This PR changed only source files for the "new" scan, which Parquet does not use. Oddly, none of the text file queries fail, and that is the one area I *did* change.

Were the Parquet files used in the tests rebuilt recently? Might there be a problem with the data itself?

Just to make sure I'm tracking down the correct issue: does the master branch pass these same tests, using the same data files (that is, using the same cluster without rebuilding the functional tests)?

Perhaps try testing the log regex or mock PRs. They are rebased on the same master version as this PR, but they include a distinct set of changes. If those PRs pass, then the problem is somewhere in this PR. If those PRs have failures, then perhaps we want to double-check the test framework data.

While that is being done, I will continue to try to find a way to track down the issue (without access to the test framework or the SF1 data...)
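The cross-check described above (scan the whole table, then apply the WHERE clause by hand and compare counts) can be sketched in Java as follows. This is a hypothetical illustration: the `Customer` type, the in-memory sample rows, and the predicate are made up, since the actual union03 WHERE clause is not shown here, and a real check would first fetch the rows from Drill.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: filter a full-table result in client code and compare
// the count with what the filtered query returned. Customer, the sample rows,
// and the predicate are invented for illustration.
public class ManualFilterCheck {

  static final class Customer {
    final long custKey;
    final String mktSegment;

    Customer(long custKey, String mktSegment) {
      this.custKey = custKey;
      this.mktSegment = mktSegment;
    }
  }

  // Applies the WHERE clause "manually" to every row from SELECT * FROM ...
  static long countMatches(List<Customer> allRows, Predicate<Customer> where) {
    return allRows.stream().filter(where).count();
  }

  // True when the filtered query's row count matches the manual count.
  static boolean agrees(List<Customer> allRows, Predicate<Customer> where,
                        long queryRowCount) {
    return countMatches(allRows, where) == queryRowCount;
  }

  public static void main(String[] args) {
    List<Customer> rows = Arrays.asList(
        new Customer(1, "BUILDING"),
        new Customer(2, "MACHINERY"),
        new Customer(3, "BUILDING"));
    Predicate<Customer> where =
        c -> c.mktSegment.equals("BUILDING") && c.custKey > 1;
    System.out.println(countMatches(rows, where)); // 1
    // A filtered query that returned zero rows would disagree here.
    System.out.println(agrees(rows, where, 0));    // false
  }
}
```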
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878899#comment-16878899 ]

ASF GitHub Bot commented on DRILL-7306:
---------------------------------------

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508583884

Thanks, @arina-ielchiieva, for pointing me to the Parquet data sources. As it turns out, I don't think that is the correct set of files used by the test. If I manually count the matches for the "union03" query, I get three rows out of a total of 1500 rows in the customer table. The expected results shown in your earlier post show customer IDs beyond 1500, suggesting that the failed query ran against a larger file than the one in the directory you suggested.

Unfortunately, I can't check the contents of the `customer.parquet` file because I can't get Parquet tools to work after several hours of fighting one thing after another. I seem to recall we discussed bundling that tool with Drill. It would sure be handy.

Looking closer, it seems that the files in the test framework are for scale factor (SF) 0.1, but the tests use files for SF1. So I suspect I'm testing against files 1/10 the size of those used in the tests that failed. I'm guessing the test framework generates the SF1 files during its setup phase (which seems to require MFS to run).

Further, I'm completely mystified at how my changes could impact Parquet, since the only changed source files are for the "new" scan, which Parquet does not use. Oddly, none of the text file queries fail, and that is the area I *did* change.

So, the net status is that I'm stuck: I can't reproduce the issue, can't inspect the data files, can't get access to the SF1 files, and can't run the functional tests.

Just to make sure I'm tracking down the correct issue: does the master branch pass these same tests, using the same data files (that is, using the same cluster without rebuilding the functional tests)?

Were the Parquet files used in the tests rebuilt recently? Might there be a problem with the data itself?

I can't tell what the framework is doing. Does it try to do a CSV query against the "golden" file to compare results? That said, the error seems to say that the Parquet query returned zero rows, rather than that the Parquet results didn't match the "golden" CSV expected results.

Any suggestions for how to proceed?
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878894#comment-16878894 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-508583884 Thanks, @arina-ielchiieva, for pointing me to the Parquet data sources. As it turns out, I don't think that is the set of files used by the test. If I manually count the matches for the "union03" query, I get three rows out of a total of 1500 rows in the customer table. The expected results shown in your earlier post include customer IDs beyond 1500, suggesting that the failed query ran against a larger file than the one in the directory you suggested. Unfortunately, I can't check the contents of the customer.parquet file because I can't get Parquet tools to work after several hours of fighting one thing after another. I seem to recall we discussed bundling that tool with Drill; it would sure be handy. Also, I'm completely mystified at how my changes could impact Parquet, since the only changed source files are for the "new" scan, which Parquet does not use. Oddly, none of the text file queries fail, and text files are the area I *did* change. So, the net status is that I'm stuck: I can't reproduce the issue, can't inspect the data files, and can't run the functional tests. Just to make sure I'm tracking down the correct issue: does the master branch pass these same tests, using the same data files (that is, the same cluster, without rebuilding the functional tests)? Were the Parquet files used in the tests rebuilt recently? Might there be a problem with the data itself? I can't tell what the framework is doing. Does it try to do a CSV query against the "golden" file to compare results? However, the error seems to say that the Parquet query returned zero rows, rather than that the Parquet results didn't match the "golden" CSV expected results. Any suggestions for how to proceed?
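The verification reports quoted in this thread give, for each query, the expected and actual row counts plus the number of matching, missing, and unexpected rows. A minimal sketch of how such counts can be derived, assuming rows are compared as a multiset of strings (an assumption for illustration only; this is not the drill-test-framework code, and `BaselineComparison` is a hypothetical name):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BaselineComparison {

  /** Counts how many times each row string occurs. */
  static Map<String, Integer> counts(List<String> rows) {
    Map<String, Integer> m = new HashMap<>();
    for (String r : rows) {
      m.merge(r, 1, Integer::sum);
    }
    return m;
  }

  /** Returns {matching, missing, unexpected} for a multiset comparison. */
  static int[] compare(List<String> expected, List<String> actual) {
    Map<String, Integer> exp = counts(expected);
    Map<String, Integer> act = counts(actual);
    int matching = 0;
    for (Map.Entry<String, Integer> e : exp.entrySet()) {
      // A row matches as many times as it appears in both results.
      matching += Math.min(e.getValue(), act.getOrDefault(e.getKey(), 0));
    }
    int missing = expected.size() - matching;   // in the baseline, not from Drill
    int unexpected = actual.size() - matching;  // from Drill, not in the baseline
    return new int[] {matching, missing, unexpected};
  }

  public static void main(String[] args) {
    // Hypothetical replay of the join08-merge numbers: baseline "9965", Drill "0".
    int[] r = compare(Arrays.asList("9965"), Arrays.asList("0"));
    System.out.println(Arrays.toString(r)); // [0, 1, 1]
  }
}
```

Under this model, an empty actual result (as in union02) makes every baseline row "missing" and none "unexpected", which matches the shape of the reports.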
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878893#comment-16878893 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-508583884 Thanks, @arina-ielchiieva, for pointing me to the Parquet data sources. As it turns out, I don't think that is the set of files used by the test. If I manually count the matches for the "union03" query, I get three rows out of a total of 1500 rows in the customer table. The expected results shown in your earlier post include customer IDs beyond 1500, suggesting that the failed query ran against a larger file than the one in the directory you suggested. Unfortunately, I can't check the contents of the customer.parquet file because I can't get Parquet tools to work after several hours of fighting one thing after another. I seem to recall we discussed bundling that tool with Drill; it would sure be handy. Also, I'm completely mystified at how my changes could impact Parquet, since the only changed source files are for the "new" scan, which Parquet does not use. So, the net status is that I'm stuck: I can't reproduce the issue, can't inspect the data files, and can't run the functional tests. Just to make sure I'm tracking down the correct issue: does the master branch pass these same tests, using the same data files (that is, the same cluster, without rebuilding the functional tests)? Were the Parquet files used in the tests rebuilt recently? Might there be a problem with the data itself? I can't tell what the framework is doing. Does it try to do a CSV query against the "golden" file to compare results? However, the error seems to say that the Parquet query returned zero rows, rather than that the Parquet results didn't match the "golden" CSV expected results. Any suggestions for how to proceed?
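The reasoning about customer IDs beyond 1500 can be checked with TPC-H arithmetic: the TPC-H specification sizes the CUSTOMER table at SF × 150,000 rows, with C_CUSTKEY running from 1 to that count. The dataset directory linked in this thread is `Tpch0.01` (1,500 customers), while the failing queries live under `tpch_sf1` (150,000 customers). A small worked sketch; the class name is illustrative, and reading the flattened baseline row `6151 1` as key 6151, nation 1 is an assumption:

```java
public class TpchCustomerScale {
  // TPC-H: CUSTOMER has SF * 150,000 rows; C_CUSTKEY ranges over 1..that count.
  static final long CUSTOMERS_PER_SCALE_FACTOR = 150_000L;

  static long customerRows(double scaleFactor) {
    // Round to absorb floating-point error (0.01 is not exact in binary).
    return Math.round(CUSTOMERS_PER_SCALE_FACTOR * scaleFactor);
  }

  /** True if the key could exist in a CUSTOMER table generated at this scale factor. */
  static boolean keyFits(long custKey, double scaleFactor) {
    return custKey >= 1 && custKey <= customerRows(scaleFactor);
  }

  public static void main(String[] args) {
    System.out.println(customerRows(0.01));  // 1500   (Tpch0.01 directory)
    System.out.println(customerRows(1.0));   // 150000 (tpch_sf1 tests)
    // A key from the union02 baseline: too large for SF 0.01, fine for SF 1.
    System.out.println(keyFits(6151, 0.01)); // false
    System.out.println(keyFits(6151, 1.0));  // true
  }
}
```

This is consistent with the suspicion that the baseline was produced from the SF 1 data rather than the files in the `Tpch0.01` directory.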
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876856#comment-16876856 ] ASF GitHub Bot commented on DRILL-7306: --- arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-507619026 Looks like datasets are stored here: https://github.com/mapr/drill-test-framework/tree/master/framework/resources/Datasources/Tpch0.01/parquet
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876853#comment-16876853 ] ASF GitHub Bot commented on DRILL-7306: --- arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-507619026 Looks like datasets are stored here: https://github.com/mapr/drill-test-framework/tree/master/framework/resources/Datasources/Tpch0.01/parquet
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876615#comment-16876615 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-507493331 Oh my. We're moving in the wrong direction. I'll need to take a deeper look in a day or two. I wonder, if I just check out the test suite, do I get the data files? Or, are they generated using Hive? Once I have the files, I should be able to reproduce the issues with a single-node Drill. @arina-ielchiieva, if the data files are generated, can you post several of the files somewhere so I can grab them? Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Disable "fast schema" batch for new scan framework > -- > > Key: DRILL-7306 > URL: https://issues.apache.org/jira/browse/DRILL-7306 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > The EVF framework is set up to return a "fast schema" empty batch with only > schema as its first batch because, when the code was written, it seemed > that's how we wanted operators to work. However, DRILL-7305 notes that many > operators cannot handle empty batches. > Since the empty-batch bugs show that Drill does not, in fact, provide a "fast > schema" batch, this ticket asks to disable the feature in the new scan > framework. The feature is disabled with a config option; it can be re-enabled > if ever it is needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876614#comment-16876614 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-507493331 Oh my. We're moving in the wrong direction. I'll need to take a deeper look in a day or two. I wonder, if I just check out the test suite, do I get the data files? Or, are they generated using Hive? Once I have the files, I should be able to reproduce the issues with a single-node Drill. If the files are generated, can you post several of the files somewhere so I can grab them? Thanks.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876074#comment-16876074 ] ASF GitHub Bot commented on DRILL-7306: --- arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-507203793 @paul-rogers re-ran the full test suite for the updated branch. The previous failures have been fixed, but new ones have appeared:
```
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union02.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join08-merge.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/original/parquet/query4.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join09-hash.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join08-hash.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union01.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/filter01.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join01-hash.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join09-merge.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/failed04.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union03.sql
```
Looks like the queries started to return zero results:
```
Data Verification Failures:
Query: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union02.sql
SELECT C_CUSTKEY, C_NATIONKEY FROM customer C WHERE C_ACCTBAL BETWEEN 1000 AND 1200 AND C_NATIONKEY IN (1, 3)
Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union02.e_tsv
Expected number of rows: 208
Actual number of rows from Drill: 0
Number of matching rows: 0
Number of rows missing: 208
Number of rows unexpected: 0
These rows are missing (first 10):
12282 3 (1 occurence(s))
6151 1 (1 occurence(s))
30727 1 (1 occurence(s))
110087 3 (1 occurence(s))
70662 3 (1 occurence(s))
101382 3 (1 occurence(s))
144393 1 (1 occurence(s))
95255 3 (1 occurence(s))
98331 1 (1 occurence(s))
122904 3 (1 occurence(s))
Query: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join08-merge.sql
SELECT COUNT(*) FROM (SELECT L.L_ORDERKEY AS X, C.C_CUSTKEY AS Y FROM lineitem L LEFT OUTER JOIN customer C ON L.L_ORDERKEY = C.C_CUSTKEY) AS FOO WHERE X < 1
Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join08-merge.e_tsv
Expected number of rows: 1
Actual number of rows from Drill: 1
Number of matching rows: 0
Number of rows missing: 1
Number of rows unexpected: 1
These rows are not expected (first 10):
0
These rows are missing (first 10):
9965 (1 occurence(s))
Query: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/original/parquet/query4.sql
SELECT O.O_ORDERPRIORITY, COUNT(*) AS ORDER_COUNT FROM orders O WHERE O.O_ORDERDate >= DATE '1996-10-01' AND O.O_ORDERDate < DATE '1996-10-01' + INTERVAL '3' MONTH AND EXISTS ( SELECT * FROM lineitem L WHERE L.L_ORDERKEY = O.O_ORDERKEY AND L.L_COMMITDate < L.L_RECEIPTDate ) GROUP BY O.O_ORDERPRIORITY ORDER BY O.O_ORDERPRIORITY
Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/original/parquet/query4.e_tsv
Expected number of rows: 5
Actual number of rows from Drill: 0
Number of matching rows: 0
Number of rows missing: 5
Number of rows unexpected: 0
These rows are missing (first 10):
1-URGENT 10611 (1 occurence(s))
2-HIGH 10538 (1 occurence(s))
3-MEDIUM 10574 (1 occurence(s))
4-NOT SPECIFIED 10511 (1 occurence(s))
5-LOW 10568 (1 occurence(s))
Query: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join09-hash.sql
SELECT COUNT(*) FROM (SELECT L.L_ORDERKEY AS X, C.C_CUSTKEY AS Y FROM lineitem L LEFT OUTER JOIN
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876075#comment-16876075 ] ASF GitHub Bot commented on DRILL-7306: --- arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-507203793 @paul-rogers re-ran the full test suite for the updated branch. The previous failures have been fixed, but new ones have appeared:
```
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union02.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join08-merge.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/original/parquet/query4.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join09-hash.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join08-hash.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union01.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/filter01.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join01-hash.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join09-merge.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/failed04.sql
/root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union03.sql
```
Looks like the queries started to return zero results:
```
Data Verification Failures:
Query: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union02.sql
SELECT C_CUSTKEY, C_NATIONKEY FROM customer C WHERE C_ACCTBAL BETWEEN 1000 AND 1200 AND C_NATIONKEY IN (1, 3)
Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union02.e_tsv
Expected number of rows: 208
Actual number of rows from Drill: 0
Number of matching rows: 0
Number of rows missing: 208
Number of rows unexpected: 0
These rows are missing (first 10):
12282 3 (1 occurence(s))
6151 1 (1 occurence(s))
30727 1 (1 occurence(s))
110087 3 (1 occurence(s))
70662 3 (1 occurence(s))
101382 3 (1 occurence(s))
144393 1 (1 occurence(s))
95255 3 (1 occurence(s))
98331 1 (1 occurence(s))
122904 3 (1 occurence(s))
Query: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join08-merge.sql
SELECT COUNT(*) FROM (SELECT L.L_ORDERKEY AS X, C.C_CUSTKEY AS Y FROM lineitem L LEFT OUTER JOIN customer C ON L.L_ORDERKEY = C.C_CUSTKEY) AS FOO WHERE X < 1
Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join08-merge.e_tsv
Expected number of rows: 1
Actual number of rows from Drill: 1
Number of matching rows: 0
Number of rows missing: 1
Number of rows unexpected: 1
These rows are not expected (first 10):
0
These rows are missing (first 10):
9965 (1 occurence(s))
Query: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/original/parquet/query4.sql
SELECT O.O_ORDERPRIORITY, COUNT(*) AS ORDER_COUNT FROM orders O WHERE O.O_ORDERDate >= DATE '1996-10-01' AND O.O_ORDERDate < DATE '1996-10-01' + INTERVAL '3' MONTH AND EXISTS ( SELECT * FROM lineitem L WHERE L.L_ORDERKEY = O.O_ORDERKEY AND L.L_COMMITDate < L.L_RECEIPTDate ) GROUP BY O.O_ORDERPRIORITY ORDER BY O.O_ORDERPRIORITY
Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/original/parquet/query4.e_tsv
Expected number of rows: 5
Actual number of rows from Drill: 0
Number of matching rows: 0
Number of rows missing: 5
Number of rows unexpected: 0
These rows are missing (first 10):
1-URGENT 10611 (1 occurence(s))
2-HIGH 10538 (1 occurence(s))
3-MEDIUM 10574 (1 occurence(s))
4-NOT SPECIFIED 10511 (1 occurence(s))
5-LOW 10568 (1 occurence(s))
Query: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join09-hash.sql
SELECT COUNT(*) FROM (SELECT L.L_ORDERKEY AS X, C.C_CUSTKEY AS Y FROM lineitem L LEFT OUTER JOIN
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875902#comment-16875902 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-507085743 @arina-ielchiieva, please review the recent changes to this PR. If you have more comments, I'll go ahead and address them. Once you are satisfied and provide a +1, I'll create a single merge branch with this PR, DRILL-6951 and DRILL-7293. I will run the complete unit tests. But I'm afraid I must ask you to run the functional tests, as I don't have an MFS or Hadoop cluster available to run that test suite.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875884#comment-16875884 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on pull request #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#discussion_r298853771
## File path: exec/java-exec/src/test/java/org/apache/drill/TestSchemaWithTableFunction.java
## @@ -17,6 +17,14 @@
*/
package org.apache.drill;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
Review comment: Sorry, I've manually put the imports back and have updated the Eclipse rules to put static imports after the others. As it turns out, the Eclipse style rules on the Drill web site are not valid for the latest Eclipse: "Import failed. This is not a valid profile: Expected `CleanUpProfile` but encountered `CodeFormatterProfile`." So, I've been using the Eclipse default profile with a few adjustments per the [web site](http://drill.apache.org/docs/apache-drill-contribution-guidelines/). The web site does not mention a preferred import order. I have tried disabling the auto-update of imports so that files are left unchanged. However, this tends to leave unused imports, which then cause the build to fail and take a long time to fix manually. Best solution: update the Eclipse settings file, both to make it valid and to include any missing style rules. Then, I'll update my IDE to enforce the project's preferred standards.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875741#comment-16875741 ] ASF GitHub Bot commented on DRILL-7306: --- arina-ielchiieva commented on pull request #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#discussion_r298827401
## File path: exec/java-exec/src/test/java/org/apache/drill/TestSchemaWithTableFunction.java
## @@ -17,6 +17,14 @@
*/
package org.apache.drill;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
Review comment: I am not sure why your IDE puts static imports at the top of the list; I think it's more common to put them at the end. I think the whole Drill project follows this convention...
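For reference, the convention described in this review (regular imports first, static imports at the end of the import section) looks like this in a Java source file. The class and method are illustrative only, and the JDK static imports stand in for the JUnit `assertFalse`/`assertTrue` imports from the diff. The compiler accepts either order; the ordering is purely a style rule that the IDE's organize-imports settings would need to enforce:

```java
// Regular (non-static) imports first...
import java.util.ArrayList;
import java.util.List;

// ...then static imports, grouped at the end of the import section.
import static java.lang.Math.max;
import static java.util.Objects.requireNonNull;

public class ImportOrderExample {

  /** Returns the largest value in a non-empty list. */
  static int largest(List<Integer> values) {
    requireNonNull(values, "values");
    int result = Integer.MIN_VALUE;
    for (int v : values) {
      result = max(result, v);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> values = new ArrayList<>();
    values.add(3);
    values.add(7);
    values.add(5);
    System.out.println(largest(values)); // 7
  }
}
```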
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875696#comment-16875696 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-507011340 Rebased on master and resolved conflicts.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875692#comment-16875692 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-506999568 When running the full tests, the following failed in `java-exec`:
```
[ERROR] Errors:
[ERROR] TestDynamicUDFSupport.testReRegisterTheSameJarWithDifferentContent:600->BaseTestQuery.testRunAndReturn:340 » Rpc
```
When running this unit test in Eclipse, two tests failed: `testDropFunction` and `testSuccessfulRegistrationAfterSeveralRetryAttempts`. Then, `testConcurrentRemoteRegistryUpdateWithDuplicates` hung forever. These tests (and one other) seem to fail about 50% of the time on my builds. For example:
```
[ERROR] Errors:
[ERROR] TestPStoreProviders.verifyZkStore:67 » NoSuchElement
```
The workaround seems to be to rebuild all of Drill. That is, the rough pattern seems to be that a test will run once after a clean build, but will fail if run a second time or after a code change. I'm not sure this is the exact pattern, but something like it happens. The result is that it is hard to tell whether my code broke something or the tests are just flaky. I wonder if there is something we can do to stabilize these tests. All other tests run fine if I rerun them on the same build or after I make a small code change. Anyway, after doing a full rebuild and retest, this commit does pass all unit tests.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875632#comment-16875632 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-506989680

Addressed the `TestEmptyInputSql` failure. The code now recognizes two cases:

1. Empty results: the reader provided a schema, but no rows. (This is the case that failed.)
2. Null results: the reader provided neither rows nor schema. Previously the code always took this path, even when a schema was available.

Changed the query builder row set code to return an empty row set if the output contains only an empty batch that carries a schema. The code continues to return no row set if the result is null. (Oddly, Drill returns a batch with no rows and no schema if the reader returns no batches at all.)

Will address the other issues in separate commits.
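The two cases above can be sketched as a small classification step. This is a minimal illustration under stated assumptions, not Drill's query builder code: `ResultKindSketch`, its `Batch` model (a nullable list of column names plus a row count) and `classify` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ResultKindSketch {

  /** Hypothetical batch model: schema is null when the reader gave none. */
  static final class Batch {
    final List<String> schema;
    final int rowCount;

    Batch(List<String> schema, int rowCount) {
      this.schema = schema;
      this.rowCount = rowCount;
    }
  }

  enum ResultKind { ROWS, EMPTY, NULL }

  /**
   * ROWS:  at least one batch carries data.
   * EMPTY: no rows, but some batch carried a schema (return an empty row set).
   * NULL:  no rows and no schema (return no row set at all).
   */
  static ResultKind classify(List<Batch> batches) {
    boolean sawSchema = false;
    for (Batch b : batches) {
      if (b.rowCount > 0) {
        return ResultKind.ROWS;
      }
      if (b.schema != null && !b.schema.isEmpty()) {
        sawSchema = true;
      }
    }
    return sawSchema ? ResultKind.EMPTY : ResultKind.NULL;
  }

  public static void main(String[] args) {
    // Reader provided a schema but no rows.
    System.out.println(classify(Collections.singletonList(
        new Batch(Arrays.asList("a", "b"), 0))));                 // EMPTY
    // A batch with neither schema nor rows.
    System.out.println(classify(Collections.singletonList(
        new Batch(null, 0))));                                    // NULL
    // No batches at all.
    System.out.println(classify(Collections.<Batch>emptyList())); // NULL
  }
}
```

Keeping EMPTY separate from NULL means a caller can still inspect the columns of a result that has no rows, which is what a schema-only comparison needs.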
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875630#comment-16875630 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-506999568

When running the full tests, the following failed in `java-exec`:

```
[ERROR] Errors:
[ERROR] TestDynamicUDFSupport.testReRegisterTheSameJarWithDifferentContent:600->BaseTestQuery.testRunAndReturn:340 » Rpc
```

When I ran this test class in Eclipse, two tests failed: `testDropFunction` and `testSuccessfulRegistrationAfterSeveralRetryAttempts`. Then `testConcurrentRemoteRegistryUpdateWithDuplicates` hung indefinitely. These tests (and one other that I can't recall) appear to fail about 50% of the time on my builds.

The workaround seems to be to rebuild all of Drill. Roughly, the pattern is that a test runs once after a clean build but fails if run a second time or after a code change. I am not sure this is the exact pattern, but something like it happens. As a result, it is hard to tell whether my code broke something or the tests are just flaky. Is there something we can do to stabilize these tests? All other tests pass when I rerun them on the same build or after a small code change.

Anyway, after a full rebuild and retest, this commit passes all unit tests.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875606#comment-16875606 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-506989680

Addressed the `TestEmptyInputSql` failure. The code now recognizes two cases:

1. Empty results: the reader provided a schema, but no rows. (This is the case that failed.)
2. Null results: the reader provided neither rows nor schema. Previously the code always took this path, even when a schema was available.

Changed the query builder row set code to return an empty row set if the output contains only an empty batch that carries a schema. The code continues to return no row set if the result is null. (Oddly, Drill returns a batch with no rows and no schema if the reader returns no batches at all.)

Will address the other issues in separate commits.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875562#comment-16875562 ] ASF GitHub Bot commented on DRILL-7306: --- arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-506972377

@paul-rogers When running the tests, I see unit and functional test failures. Please run the full unit test suite locally before opening a PR; Travis does not run it.

UNIT TESTS

```
[INFO] Running org.apache.drill.exec.TestEmptyInputSql
05:53:50.840 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: 0 B(1.0 MiB), h: 6.3 MiB(863.9 MiB), nh: 32 B(324.6 MiB)): testQueryEmptyCsv(org.apache.drill.exec.TestEmptyInputSql)
java.lang.Exception: Expected and actual numbers of columns do not match.
	at org.apache.drill.test.DrillTestWrapper.compareSchemaOnly(DrillTestWrapper.java:486) ~[test-classes/:1.17.0-SNAPSHOT]
	at org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:163) ~[test-classes/:1.17.0-SNAPSHOT]
	at org.apache.drill.exec.TestEmptyInputSql.testQueryEmptyCsv(TestEmptyInputSql.java:222) ~[test-classes/:na]
	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_161].
```

FUNCTIONAL TESTS

```
Data Verification Failures:
Query: /root/drillAutomation/drill-test-framework/framework/resources/Functional/limit0/union_all/prq_union_all/data/emptyLHS_CSV.q
SELECT cast(columns[0] as int) FROM `emptyFiles/empty_1.csv` UNION ALL SELECT col1 FROM notEmpty_csv_v
Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Functional/limit0/union_all/prq_union_all/data/emptyLHS_CSV.e
Expected number of rows: 10
Actual number of rows from Drill: 10
Number of matching rows: 0
Number of rows missing: 10
Number of rows unexpected: 10
These rows are not expected (first 10):
null
These rows are missing (first 10):
1 (1 occurence(s))
2 (1 occurence(s))
3 (1 occurence(s))
4 (1 occurence(s))
5 (1 occurence(s))
6 (1 occurence(s))
7 (1 occurence(s))
8 (1 occurence(s))
9 (1 occurence(s))
10 (1 occurence(s))
```

Please fix the failures and rebase on the latest master. Also, when I was cherry-picking DRILL-7306 and DRILL-6951, there were conflicts. You could consider creating a merge branch with the commits for these Jiras and resolving the conflicts there to ease the merge process.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872877#comment-16872877 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-505698254 Commits squashed. Note that we can commit either this PR or DRILL-7293, but not both at the same time. I will need to add one line to DRILL-7293 either after committing that PR, OR after committing DRILL-7293 before this one.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872135#comment-16872135 ] ASF GitHub Bot commented on DRILL-7306: --- arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-505333500 @paul-rogers thanks, now it's much better. +1, please squash the commits.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872009#comment-16872009 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#issuecomment-505286184 @arina-ielchiieva, regarding `enableSchemaBatch`: recall that Java `boolean` fields are, by definition in the language spec, initialized to `false` (local variables, by contrast, get no default). Since we never set it to true, except in tests, it always defaults to false. Because this was confusing, I added Javadoc that explains the issue and the default setting of the option. Does this new material answer your question?
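The language rule this reply relies on can be shown in a few lines: per JLS §4.12.5, every class variable and instance variable starts with a default value (`false` for `boolean`), while a local variable must be definitely assigned before it is read. `SchemaBatchOptions` is a hypothetical class for illustration only; just the field name `enableSchemaBatch` comes from the discussion.

```java
// Hypothetical class: fields get default values, local variables do not.
class SchemaBatchOptions {

  // Never assigned anywhere, so it holds the JLS default value: false.
  boolean enableSchemaBatch;

  public static void main(String[] args) {
    SchemaBatchOptions opts = new SchemaBatchOptions();
    System.out.println(opts.enableSchemaBatch); // prints "false"

    // By contrast, a local variable has no default. Uncommenting the next
    // two lines fails to compile: "variable local might not have been initialized".
    // boolean local;
    // System.out.println(local);
  }
}
```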
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871188#comment-16871188 ] ASF GitHub Bot commented on DRILL-7306: --- arina-ielchiieva commented on pull request #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#discussion_r296713307

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java ##
@@ -398,6 +363,40 @@ public void addContext(UserException.Builder builder) {
}
}
+ /**
+ * Initialize the scan framework builder with standard options.
+ * Call this from the plugin-specific
+ * {@link #frameworkBuilder(OptionManager, EasySubScan)} method.
+ * The plugin can then customize/revise options as needed.

Review comment: Please add the two params to the Javadoc as well.
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870808#comment-16870808 ] ASF GitHub Bot commented on DRILL-7306: --- paul-rogers commented on pull request #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813

The EVF framework is set up to return, as its first batch, an empty "fast schema" batch that carries only the schema, because when the code was written it seemed that was how we wanted operators to work. However, DRILL-7305 notes that many operators cannot handle empty batches.

Since the empty-batch bugs show that Drill does not, in fact, provide a "fast schema" batch, this ticket asks to disable the feature in the new scan framework. The feature is disabled with a config option; it can be re-enabled if it is ever needed.

The existing tests validate the original schema-batch mode; new tests were added to validate the no-schema-batch mode.
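A minimal sketch of what the option controls, under simplifying assumptions: `FastSchemaScanSketch`, its `Batch` model and `emit` are hypothetical and do not reflect the actual EVF classes; only the name `enableSchemaBatch` comes from this thread. With the option on, the scan emits a zero-row, schema-only batch before any data; with it off, downstream operators never receive that leading empty batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FastSchemaScanSketch {

  /** Hypothetical batch model: column names plus a row count. */
  static final class Batch {
    final List<String> schema;
    final int rowCount;

    Batch(List<String> schema, int rowCount) {
      this.schema = schema;
      this.rowCount = rowCount;
    }
  }

  private final boolean enableSchemaBatch;

  /** No-arg constructor leaves the option off, the default this ticket proposes. */
  FastSchemaScanSketch() {
    this(false);
  }

  FastSchemaScanSketch(boolean enableSchemaBatch) {
    this.enableSchemaBatch = enableSchemaBatch;
  }

  /** Returns the batches sent downstream for the given per-batch row counts. */
  List<Batch> emit(List<String> schema, int... readerRowCounts) {
    List<Batch> out = new ArrayList<>();
    if (enableSchemaBatch) {
      out.add(new Batch(schema, 0)); // the "fast schema" batch: schema, no rows
    }
    for (int rows : readerRowCounts) {
      out.add(new Batch(schema, rows));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> schema = Arrays.asList("a", "b");
    List<Batch> on = new FastSchemaScanSketch(true).emit(schema, 100, 50);
    List<Batch> off = new FastSchemaScanSketch().emit(schema, 100, 50);
    System.out.println(on.size() + " " + on.get(0).rowCount);   // 3 0
    System.out.println(off.size() + " " + off.get(0).rowCount); // 2 100
  }
}
```

Modeling the option as a constructor flag keeps the choice explicit at the point where the scan is built, so re-enabling the schema batch later is a one-argument change.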