[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880197#comment-16880197
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only 
batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-509175362
 
 
   @paul-rogers no problem, glad I could help, and thank you for your efforts.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Disable "fast schema" batch for new scan framework
> --
>
> Key: DRILL-7306
> URL: https://issues.apache.org/jira/browse/DRILL-7306
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
>  The EVF framework is set up to return a "fast schema" empty batch with only 
> schema as its first batch because, when the code was written, it seemed 
> that's how we wanted operators to work. However, DRILL-7305 notes that many 
> operators cannot handle empty batches.
> Since the empty-batch bugs show that Drill does not, in fact, provide a "fast 
> schema" batch, this ticket asks to disable the feature in the new scan 
> framework. The feature is disabled with a config option; it can be re-enabled 
> if ever it is needed.
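
The behavior described above can be sketched in a few lines of Java. This is only an illustration, not the EVF code: `SketchBatch`, `FastSchemaSketch`, and the `enableSchemaBatch` flag are hypothetical names standing in for the real batch classes and the config option mentioned in the ticket. With the flag on, the first batch carries only the schema; with it off, the first batch already carries rows, so downstream operators never see an empty batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a record batch: a schema plus zero or more rows. */
final class SketchBatch {
  final List<String> schema;
  final List<String[]> rows;

  SketchBatch(List<String> schema, List<String[]> rows) {
    this.schema = schema;
    this.rows = rows;
  }

  /** True for a "fast schema" batch: schema present, no rows. */
  boolean isSchemaOnly() {
    return rows.isEmpty();
  }
}

final class FastSchemaSketch {
  /**
   * Returns the batches a scan would send downstream. The enableSchemaBatch
   * flag is an assumed stand-in for the config option named in the ticket.
   */
  static List<SketchBatch> scan(List<String> schema, List<String[]> data,
      int batchSize, boolean enableSchemaBatch) {
    List<SketchBatch> out = new ArrayList<>();
    if (enableSchemaBatch) {
      // "Fast schema": an empty batch that carries only the schema.
      out.add(new SketchBatch(schema, new ArrayList<>()));
    }
    // Data batches of at most batchSize rows each.
    for (int start = 0; start < data.size(); start += batchSize) {
      int end = Math.min(start + batchSize, data.size());
      out.add(new SketchBatch(schema, new ArrayList<>(data.subList(start, end))));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> schema = Arrays.asList("id", "name");
    List<String[]> data = Arrays.asList(
        new String[] {"1", "a"}, new String[] {"2", "b"}, new String[] {"3", "c"});
    for (boolean flag : new boolean[] {true, false}) {
      List<SketchBatch> batches = scan(schema, data, 2, flag);
      System.out.println("schemaBatch=" + flag + " first batch rows="
          + batches.get(0).rows.size() + " total batches=" + batches.size());
    }
  }
}
```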



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880199#comment-16880199
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on pull request #1813: DRILL-7306: Disable 
schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813
 
 
   
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880198#comment-16880198
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only 
batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-509175395
 
 
   +1, merging.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880030#comment-16880030
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-509087121
 
 
   @arina-ielchiieva, thank you for running the tests. Glad to hear we finally 
have a clean run.
   
   I've squashed commits. Since this PR has been a struggle, let's commit it 
by itself. Once that is done, I will rebase the other two PRs, then combine 
them into a single merge branch for your convenience.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879864#comment-16879864
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only 
batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-509001976
 
 
   @paul-rogers I re-ran the tests and they now pass. I guess there is no need to 
create a merge branch for this PR: just squash the commits and rebase, and I'll 
re-run the tests and merge. After this PR is merged, you will be able to 
proceed with the remaining two.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879786#comment-16879786
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508966464
 
 
   @arina-ielchiieva, very sorry for the trouble this PR has caused. Thank you 
for the SF1 data. It allowed me to find the offending change and fix the issue.
   
   This is a case of "no good deed goes unpunished." I tried to work around the 
bad code explained in DRILL-7308 by setting precision only if non-zero. For 
reasons I did not track down, this change caused some part of the TPC-H queries 
to fail.
   
   The failure was in the "tuple/column metadata" classes. I had not realized 
that these classes are now used outside of just the new scan and schema work. 
(I did not track down the usage.)
   
   This exercise shows that Drill requires the following rules regarding 
precision:
   
   * Precision must be set for all types that need it (including VarChar and 
Decimal.)
   * Precision must be set for these types even if the value is zero.
   * Code that wants to know if the precision is non-zero should check the 
precision value itself. Code should not use an "is set" check as a substitute 
for "value != 0". (This is the flaw in the REST code described in DRILL-7308.)
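
   To illustrate the third rule, here is a minimal sketch. `TypeInfo` is a hypothetical stand-in for a column type descriptor (it is not Drill's actual class), and the nullable field is just one assumed way to model "set" versus "not set". The point is that a precision explicitly set to zero answers "yes" to an "is set" check even though the value is zero.

```java
/** Hypothetical stand-in for a column type descriptor; not Drill's actual class. */
final class TypeInfo {
  private final String name;
  private final Integer precision; // null models "not set" (an assumption)

  TypeInfo(String name, Integer precision) {
    this.name = name;
    this.precision = precision;
  }

  String name() {
    return name;
  }

  /** "Is set" check: true even when the value was explicitly set to zero. */
  boolean hasPrecision() {
    return precision != null;
  }

  /** Value accessor: an unset precision reads as zero. */
  int getPrecision() {
    return precision == null ? 0 : precision;
  }
}

final class PrecisionCheckSketch {
  /** Flawed: uses "is set" as a substitute for "value != 0". */
  static boolean hasWidthFlawed(TypeInfo type) {
    return type.hasPrecision();
  }

  /** Follows the rule above: inspect the precision value itself. */
  static boolean hasWidth(TypeInfo type) {
    return type.getPrecision() != 0;
  }

  public static void main(String[] args) {
    // Precision is set (per the first two rules) but its value is zero.
    TypeInfo varchar = new TypeInfo("VARCHAR", 0);
    System.out.println("flawed check: " + hasWidthFlawed(varchar));
    System.out.println("value check:  " + hasWidth(varchar));
  }
}
```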
   
   Filed DRILL-7318 to suggest we clean up and standardize our type-to-string 
implementations so that they build type strings based on the above rules.
   
   Reverted the precision-related change in `PrimitiveColumnMetadata`. The 
TPC-H union03 query now passes locally. I presume the others will also.
   
   The reversion caused a test to fail. It seems that the `EXPLAIN PLAN` for 
the new schema provisioning stuff uses the `toString()` method to get the type 
string shown in the plan for a provided schema. Not sure this is the best idea 
because `toString()` is for debugging and uses internal type names. It also 
shows the precision and scale for all types, resulting in the output `INT(0, 
0):OPTIONAL`. (The type names and cardinality notation should be fixed as part 
of DRILL-7318.)
   
   The reverted change had removed the unwanted precision and scale. By 
reverting the change, the unwanted precision and scale returned. So, for now, I 
modified `MaterializedField.toString()` to not emit the precision if the value 
is zero (unless the type is Decimal.) `EXPLAIN PLAN` now produces 
`INT:OPTIONAL` (without the precision and scale.)
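
   The formatting rule just described can be sketched as below. This is not the actual `MaterializedField.toString()` code: `TypeStringSketch`, its parameters, and the test for Decimal by name prefix are all assumptions made for illustration. Only the two output shapes, `INT(0, 0):OPTIONAL` and `INT:OPTIONAL`, come from the discussion above.

```java
final class TypeStringSketch {
  /**
   * Builds a debug-style type string. Precision and scale are emitted only
   * when the precision is non-zero or the type is a Decimal type.
   * Identifying Decimal types by a "DECIMAL" name prefix is an assumption.
   */
  static String typeString(String typeName, int precision, int scale, String mode) {
    StringBuilder buf = new StringBuilder(typeName);
    boolean isDecimal = typeName.startsWith("DECIMAL");
    if (isDecimal || precision != 0) {
      buf.append('(').append(precision).append(", ").append(scale).append(')');
    }
    return buf.append(':').append(mode).toString();
  }

  public static void main(String[] args) {
    System.out.println(typeString("INT", 0, 0, "OPTIONAL"));
    System.out.println(typeString("VARCHAR", 10, 0, "REQUIRED"));
    System.out.println(typeString("DECIMAL", 0, 0, "OPTIONAL"));
  }
}
```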
   
   Left the changes as a new commit so you can review just the new changes.
   
   Please try this PR again against the functional tests. Let's hope things 
work this time. If so, I'll squash commits. We talked about creating a 
merge branch with the three open PRs. Since you need to run the functional 
tests again to verify the fix, do you want to then just commit this PR? I can 
create a merge branch for the other two. Otherwise, I'm happy to create a merge 
branch with all three.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879107#comment-16879107
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only 
batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508696422
 
 
   @paul-rogers I have re-checked (last time I ran the tests on both master and 
your branch to ensure the failures are caused by your changes); the result is the 
same.
   ```
   on commit c2c4f765dd039cf9073196e5078eebb942882f66 (DRILL-7306: Disable schema-only batch for new scan framework)
   two empty CSV failures
   
   on commit 6ca5902573d06239c366f7cd788e72697366f617 (Fixed empty result set issue)
   could not build the project
   [ERROR] /root/drillAutomation/builds/drill/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/ReaderState.java:[365,8] error: cannot find symbol
     symbol:   variable batchCount
     location: class ReaderState
   
   on commit 32fb3a7f8f9861d967929bfb3487d935fc683ff3 (Additional debugging)  
   Parquet failures 
   ```
   Link to SF1 data: 
https://s3-us-west-1.amazonaws.com/drill-public/tpch/sf1/tpch_sf1_parquet.tar.gz
   Tests were run on a 4-node cluster with the following options:
   ```
   >> Query: alter system set `planner.enable_decimal_data_type` = true;
   ok   summary
   true planner.enable_decimal_data_type updated.
   
   >> Query: alter system set `new_view_default_permissions` = '777';
   ok   summary
   true new_view_default_permissions updated.
   
   >> Query: alter system set `planner.enable_limit0_optimization` = true;
   ok   summary
   true planner.enable_limit0_optimization updated.
   
   >> Query: alter system set `exec.errors.verbose` = true;
   ok   summary
   true exec.errors.verbose updated.
   
   >> Query: alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
   ok   summary
   true planner.memory.max_query_memory_per_node updated.
   
   >> Query: alter system set `drill.exec.hashagg.fallback.enabled` = true;
   ok   summary
   true drill.exec.hashagg.fallback.enabled updated.
   
   >> Query: alter system set `drill.exec.hashjoin.fallback.enabled` = true;
   ok   summary
   true drill.exec.hashjoin.fallback.enabled updated.
   ```
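
   For anyone who wants to replay the same settings, the statements above could be generated from a name/value list, as in the sketch below. `SystemOptionsSketch` and its methods are hypothetical helpers written for this note; only the option names, values, and the `alter system set` statement shape are taken from the log. String values are quoted, booleans and numbers are not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SystemOptionsSketch {
  /** Renders one statement; strings are single-quoted, other values are not. */
  static String alterSystem(String name, Object value) {
    String literal = (value instanceof String)
        ? "'" + ((String) value).replace("'", "''") + "'"
        : String.valueOf(value);
    return "alter system set `" + name + "` = " + literal + ";";
  }

  /** Renders all options in insertion order. */
  static List<String> statements(Map<String, Object> options) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Object> e : options.entrySet()) {
      out.add(alterSystem(e.getKey(), e.getValue()));
    }
    return out;
  }

  public static void main(String[] args) {
    // Option names and values copied from the test-run log above.
    Map<String, Object> options = new LinkedHashMap<>();
    options.put("planner.enable_decimal_data_type", true);
    options.put("new_view_default_permissions", "777");
    options.put("planner.enable_limit0_optimization", true);
    options.put("exec.errors.verbose", true);
    options.put("planner.memory.max_query_memory_per_node", 10737418240L);
    options.put("drill.exec.hashagg.fallback.enabled", true);
    options.put("drill.exec.hashjoin.fallback.enabled", true);
    statements(options).forEach(System.out::println);
  }
}
```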
   
   > Taking a step back, I'm actually completely mystified at how my changes 
could impact Parquet (only). The only source files this PR changed are for the 
"new" scan, which Parquet does not use. Oddly, none of the text file queries 
fail; which is the one area I did change.
   
   Well, the PR does change some common classes, so I guess those changes have 
some effect. I don't think the problem is purely connected with Parquet; more 
likely it is related to filtering, batch counting, or something like that.
   
   
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878917#comment-16878917
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508583884
 
 
   Thanks, @arina-ielchiieva, for pointing me to the Parquet data sources. As 
it turns out, these failures are quite a mystery.
   
   First, I don't think the files you mentioned are those used by the tests 
that failed. The set stored on GitHub is for scale factor (SF) 0.1 which has 
1500 customers in the customer table with ids from 0 to 1499. The tests seem to 
use SF1 which, perhaps, is generated by the test framework during its setup. If 
we look at the union03 query, the expected results include customer IDs in the 
six-digit range.
   
   That said, I did recreate the union03 query locally, using the SF0.1 files 
and got 3 result rows. To verify, I wrote a test that scanned the entire table 
(just a `SELECT * FROM ...`), and "manually" applied the where clause. Three 
rows matched. So, looks like, at least locally, that particular query works OK 
against the SF0.1 data set.
   
   Unfortunately, I can't check the contents of the `customer.parquet` file 
because I can't get Parquet tools to work after several hours of fighting one 
thing after another. I seem to recall we discussed bundling that tool with 
Drill. Doing so would be very handy. Building by hand requires far more steps 
than are documented on the Parquet and Hortonworks web sites: 1) install gcc, 2) 
download and compile Thrift, 3) build Parquet-tools, 4) figure out the set of 
dependent jars that must be on the class path, 5)... not sure, here is where I 
gave up in frustration...
   
   Taking a step back, I'm actually completely mystified at how my changes 
could impact Parquet (only). The only source files this PR changed are for the 
"new" scan, which Parquet does not use. Oddly, none of the text file queries 
fail, and that is the one area I *did* change.
   
   Were the parquet files used in the tests rebuilt recently? Might there be a 
problem with the data itself?
   
   Just to make sure I'm tracking down the correct issue: does the master 
branch pass these same tests, using the same data files (that is, using the 
same cluster without rebuilding the functional tests)? Perhaps try testing the 
log regex or mock PRs. They are rebased on the same master version as this PR, 
but they include a distinct set of changes. If those PRs pass, then the 
problem is somewhere in this PR. If those PRs have failures, then perhaps we 
want to double-check the test framework data.
   
   While that is done, I will continue to try to find a way to track down the 
issue (without access to the test framework or the SF1 data...)
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878899#comment-16878899
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508583884
 
 
   Thanks, @arina-ielchiieva, for pointing me to the Parquet data sources. As 
it turns out, I don't think that is the correct set of files used by the test. 
If I manually count the matches for the "union03" query, I get three rows out 
of a total of 1500 rows in the customer table. The expected results shown in 
your earlier post show customer IDs beyond 1500, suggesting that the failed 
query ran against a larger file than the one in the directory you suggested.
   
   Unfortunately, I can't check the contents of the customer.parquet file 
because I can't get Parquet tools to work after several hours of fighting one 
thing after another. I seem to recall we discussed bundling that tool with 
Drill. Would sure be handy.
   
   Looking closer, it seems that the files in the test framework are for scale 
factor (SF) 0.1. But, the tests use files for SF1. So, I suspect I'm testing 
against files 1/10 the size of those used in the tests that failed. I'm 
guessing the test framework generates the SF1 files during its setup phase 
(which seems to require MFS to run.)
   
   Further, I'm completely mystified at how my changes could impact Parquet 
since the only changed source files are for the "new" scan, which Parquet does 
not use. Oddly, none of the text file queries fail; which is the area I *did* 
change.
   
   So, net status is that I'm stuck: can't reproduce the issue, can't inspect 
the data files, can't get access to the SF1 files, can't run the functional 
tests.
   
   Just to make sure I'm tracking down the correct issue: does the master 
branch pass these same tests? Using the same data files (that is, using the 
same cluster without rebuilding the functional tests?)
   
   Were the parquet files used in the tests rebuilt recently? Might there be a 
problem with the data itself?
   
   I can't tell what the framework is doing. Does it try to do a CSV query 
against the "golden" file to compare results? Though, the error seems to say 
that the Parquet query returned zero rows rather than that the Parquet results 
didn't match the "golden" CSV expected results.
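
The verification report posted on this PR lists matching, missing, and unexpected row counts, which suggests a multiset comparison of Drill's output against a baseline file. The sketch below shows one way such a comparison could work. It is an assumption about the approach, not the test framework's actual code, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a baseline check; not the drill-test-framework code.
final class BaselineDiff {
    final int matching;
    final int missing;
    final int unexpected;

    BaselineDiff(int matching, int missing, int unexpected) {
        this.matching = matching;
        this.missing = missing;
        this.unexpected = unexpected;
    }

    // Compares rows as multisets: duplicates count, order does not.
    static BaselineDiff compare(List<String> expected, List<String> actual) {
        // Count how many times each expected row occurs.
        Map<String, Integer> remaining = new HashMap<>();
        for (String row : expected) {
            remaining.merge(row, 1, Integer::sum);
        }
        int matching = 0;
        int unexpected = 0;
        for (String row : actual) {
            Integer count = remaining.get(row);
            if (count != null && count > 0) {
                // Consume one expected occurrence of this row.
                remaining.put(row, count - 1);
                matching++;
            } else {
                unexpected++;
            }
        }
        // Expected rows that were never consumed are missing.
        int missing = expected.size() - matching;
        return new BaselineDiff(matching, missing, unexpected);
    }

    public static void main(String[] args) {
        // Mirrors the join08-merge report: baseline row 9965, Drill returned 0.
        BaselineDiff d = compare(Arrays.asList("9965"), Arrays.asList("0"));
        System.out.println(d.matching + " matching, " + d.missing
            + " missing, " + d.unexpected + " unexpected");
    }
}
```

Under this model, a query that returns zero rows shows up as every expected row missing and none unexpected, which is the pattern in the union02 and query4 reports.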
   
   Any suggestions for how to proceed?
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878894#comment-16878894
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508583884
 
 
   Thanks, @arina-ielchiieva, for pointing me to the Parquet data sources. As 
it turns out, I don't think that is the correct set of files used by the test. 
If I manually count the matches for the "union03" query, I get three rows out 
of a total of 1500 rows in the customer table. The expected results shown in 
your earlier post show customer IDs beyond 1500, suggesting that the failed 
query ran against a larger file than the one in the directory you suggested.
   
   Unfortunately, I can't check the contents of the customer.parquet file 
because I can't get Parquet tools to work after several hours of fighting one 
thing after another. I seem to recall we discussed bundling that tool with 
Drill. Would sure be handy.
   
   Also, I'm completely mystified at how my changes could impact Parquet since 
the only changed source files are for the "new" scan, which Parquet does not 
use. Oddly, none of the text file queries fail, even though that is the area I 
*did* change.
   
   So, net status is that I'm stuck: can't reproduce the issue, can't inspect 
the data files, can't run the functional tests.
   
   Just to make sure I'm tracking down the correct issue: does the master 
branch pass these same tests when using the same data files (that is, using 
the same cluster without rebuilding the functional tests)?
   
   Were the parquet files used in the tests rebuilt recently? Might there be a 
problem with the data itself?
   
   I can't tell what the framework is doing. Does it try to do a CSV query 
against the "golden" file to compare results? Though, the error seems to say 
that the Parquet query returned zero rows rather than that the Parquet results 
didn't match the "golden" CSV expected results.
   
   Any suggestions for how to proceed?
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878893#comment-16878893
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-508583884
 
 
   Thanks, @arina-ielchiieva, for pointing me to the Parquet data sources. As 
it turns out, I don't think that is the correct set of files used by the test. 
If I manually count the matches for the "union03" query, I get three rows out 
of a total of 1500 rows in the customer table. The expected results shown in 
your earlier post show customer IDs beyond 1500, suggesting that the failed 
query ran against a larger file than the one in the directory you suggested.
   
   Unfortunately, I can't check the contents of the customer.parquet file 
because I can't get Parquet tools to work after several hours of fighting one 
thing after another. I seem to recall we discussed bundling that tool with 
Drill. Would sure be handy.
   
   Also, I'm completely mystified at how my changes could impact Parquet since 
the only changed source files are for the "new" scan, which Parquet does not 
use.
   
   So, net status is that I'm stuck: can't reproduce the issue, can't inspect 
the data files, can't run the functional tests.
   
   Just to make sure I'm tracking down the correct issue: does the master 
branch pass these same tests when using the same data files (that is, using 
the same cluster without rebuilding the functional tests)?
   
   Were the parquet files used in the tests rebuilt recently? Might there be a 
problem with the data itself?
   
   I can't tell what the framework is doing. Does it try to do a CSV query 
against the "golden" file to compare results? Though, the error seems to say 
that the Parquet query returned zero rows rather than that the Parquet results 
didn't match the "golden" CSV expected results.
   
   Any suggestions for how to proceed?
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876856#comment-16876856
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only 
batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-507619026
 
 
   Looks like datasets are stored here: 
https://github.com/mapr/drill-test-framework/tree/master/framework/resources/Datasources/Tpch0.01/parquet
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876615#comment-16876615
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-507493331
 
 
   Oh my. We're moving in the wrong direction. I'll need to take a deeper look 
in a day or two.
   
   I wonder, if I just check out the test suite, do I get the data files? Or, 
are they generated using Hive? Once I have the files, I should be able to 
reproduce the issues with a single-node Drill. @arina-ielchiieva, if the data 
files are generated, can you post several of the files somewhere so I can grab 
them? Thanks.
 



> Disable "fast schema" batch for new scan framework
> --
>
> Key: DRILL-7306
> URL: https://issues.apache.org/jira/browse/DRILL-7306
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
>  The EVF framework is set up to return a "fast schema" empty batch with only 
> schema as its first batch because, when the code was written, it seemed 
> that's how we wanted operators to work. However, DRILL-7305 notes that many 
> operators cannot handle empty batches.
> Since the empty-batch bugs show that Drill does not, in fact, provide a "fast 
> schema" batch, this ticket asks to disable the feature in the new scan 
> framework. The feature is disabled with a config option; it can be re-enabled 
> if ever it is needed.





[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876614#comment-16876614
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-507493331
 
 
   Oh my. We're moving in the wrong direction. I'll need to take a deeper look 
in a day or two.
   
   I wonder, if I just check out the test suite, do I get the data files? Or, 
are they generated using Hive? Once I have the files, I should be able to 
reproduce the issues with a single-node Drill. If the files are generated, can 
you post several of the files somewhere so I can grab them? Thanks.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-07-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876075#comment-16876075
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only 
batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-507203793
 
 
   @paul-rogers I re-ran the full test suite for the updated branch. The 
previous failures have been fixed, but new ones were added:
   ```
   /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union02.sql
   /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join08-merge.sql
   /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/original/parquet/query4.sql
   /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join09-hash.sql
   /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join08-hash.sql
   /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union01.sql
   /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/filter01.sql
   /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join01-hash.sql
   /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join09-merge.sql
   /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/failed04.sql
   /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union03.sql
   ```
   Looks like queries started to return zero results:
   ```
   Data Verification Failures:

   Query: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union02.sql
   SELECT C_CUSTKEY, C_NATIONKEY FROM customer C WHERE C_ACCTBAL BETWEEN 1000 AND 1200 AND C_NATIONKEY IN (1, 3)

   Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/union02.e_tsv
            Expected number of rows: 208
   Actual number of rows from Drill: 0
            Number of matching rows: 0
             Number of rows missing: 208
          Number of rows unexpected: 0

   These rows are missing (first 10):
   12282   3 (1 occurence(s))
   6151    1 (1 occurence(s))
   30727   1 (1 occurence(s))
   110087  3 (1 occurence(s))
   70662   3 (1 occurence(s))
   101382  3 (1 occurence(s))
   144393  1 (1 occurence(s))
   95255   3 (1 occurence(s))
   98331   1 (1 occurence(s))
   122904  3 (1 occurence(s))

   Query: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join08-merge.sql
   SELECT COUNT(*)
   FROM (SELECT L.L_ORDERKEY AS X, C.C_CUSTKEY AS Y
         FROM lineitem L
         LEFT OUTER JOIN customer C
         ON L.L_ORDERKEY = C.C_CUSTKEY) AS FOO
   WHERE X < 1

   Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join08-merge.e_tsv
            Expected number of rows: 1
   Actual number of rows from Drill: 1
            Number of matching rows: 0
             Number of rows missing: 1
          Number of rows unexpected: 1

   These rows are not expected (first 10):
   0

   These rows are missing (first 10):
   9965 (1 occurence(s))

   Query: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/original/parquet/query4.sql
   SELECT
     O.O_ORDERPRIORITY,
     COUNT(*) AS ORDER_COUNT
   FROM
     orders O
   WHERE
     O.O_ORDERDate >= DATE '1996-10-01'
     AND O.O_ORDERDate < DATE '1996-10-01' + INTERVAL '3' MONTH
     AND
     EXISTS (
       SELECT
         *
       FROM
         lineitem L
       WHERE
         L.L_ORDERKEY = O.O_ORDERKEY
         AND L.L_COMMITDate < L.L_RECEIPTDate
     )
   GROUP BY
     O.O_ORDERPRIORITY
   ORDER BY
     O.O_ORDERPRIORITY

   Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/original/parquet/query4.e_tsv
            Expected number of rows: 5
   Actual number of rows from Drill: 0
            Number of matching rows: 0
             Number of rows missing: 5
          Number of rows unexpected: 0

   These rows are missing (first 10):
   1-URGENT         10611 (1 occurence(s))
   2-HIGH           10538 (1 occurence(s))
   3-MEDIUM         10574 (1 occurence(s))
   4-NOT SPECIFIED  10511 (1 occurence(s))
   5-LOW            10568 (1 occurence(s))

   Query: /root/drillAutomation/drill-test-framework/framework/resources/Advanced/tpch/tpch_sf1/smoke/parquet/join09-hash.sql
   SELECT COUNT(*)
   FROM (SELECT L.L_ORDERKEY AS X, C.C_CUSTKEY AS Y
         FROM lineitem L
         LEFT OUTER JOIN 

[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875902#comment-16875902
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-507085743
 
 
   @arina-ielchiieva, please review the recent changes to this PR. If there 
are more comments, I'll go ahead and address them. Once you are satisfied and 
provide a +1, I'll create a single merge branch with this PR, DRILL-6951 and 
DRILL-7293.
   
   I will run the complete unit tests. But I'm afraid I must ask you to run 
the functional tests, as I don't have an MFS or Hadoop cluster available to run 
that test suite.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875884#comment-16875884
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on pull request #1813: DRILL-7306: Disable schema-only 
batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#discussion_r298853771
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/TestSchemaWithTableFunction.java
 ##
 @@ -17,6 +17,14 @@
  */
 package org.apache.drill;
 
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
 
 Review comment:
   Sorry, I've manually put the imports back and have updated the Eclipse rules 
to put static imports after the others.
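
For reference, the ordering discussed here, with static imports placed after the regular ones, looks like the following minimal sketch. It uses only JDK classes so that it compiles on its own. The class name is made up for illustration, and the exact grouping rules that Drill's style checks enforce are not spelled out in this thread.

```java
import java.util.Arrays;
import java.util.List;

import static java.util.Collections.unmodifiableList;

// Hypothetical example class; only the import layout matters here.
final class ImportOrderExample {
    // Uses the statically imported method so the import is not flagged as unused.
    static List<String> names() {
        return unmodifiableList(Arrays.asList("assertFalse", "assertTrue"));
    }

    public static void main(String[] args) {
        System.out.println(names());
    }
}
```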
   
   As it turns out, the Eclipse style rules on the Drill web site are not valid 
for the latest Eclipse: "Import failed. This is not a valid profile: Expected 
`CleanUpProfile` but encountered `CodeFormatterProfile`." So, I've been using 
the Eclipse default profile with a few adjustments per the [web 
site](http://drill.apache.org/docs/apache-drill-contribution-guidelines/). The 
web site does not mention a preferred import order.
   
   I have tried disabling the auto-update of imports so that files are left 
unchanged. However, this tends to leave unused imports which then cause the 
build to fail and take a long time to fix manually.
   
   Best solution: update the Eclipse settings file both to make it valid and to 
include any missing style rules. Then, I'll update my IDE to enforce the 
project's preferred standards.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875741#comment-16875741
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on pull request #1813: DRILL-7306: Disable 
schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#discussion_r298827401
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/TestSchemaWithTableFunction.java
 ##
 @@ -17,6 +17,14 @@
  */
 package org.apache.drill;
 
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
 
 Review comment:
   I am not sure why your IDE puts static imports at the top of the list; I 
think it's more common for them to be at the end. I think the whole Drill 
project follows this convention...
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875696#comment-16875696
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-507011340
 
 
   Rebased on master and resolved conflicts.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875692#comment-16875692
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-506999568
 
 
   When running the full tests, the following failed in `java-exec`:
   
   ```
   [ERROR] Errors: 
   [ERROR]   TestDynamicUDFSupport.testReRegisterTheSameJarWithDifferentContent:600->BaseTestQuery.testRunAndReturn:340 » Rpc
   ```
   
   When running this unit test in Eclipse, two tests failed: `testDropFunction` 
and `testSuccessfulRegistrationAfterSeveralRetryAttempts`. Then, 
`testConcurrentRemoteRegistryUpdateWithDuplicates` hung forever.
   
   I believe these tests (and one other) seem to fail about 50% of the time on 
my builds. For example:
   
   ```
   [ERROR] Errors: 
   [ERROR]   TestPStoreProviders.verifyZkStore:67 » NoSuchElement
   ```
   
   The workaround seems to be to rebuild all of Drill. The rough pattern is 
that this test will run once after a clean build, but will fail if run a second 
time or after a code change. I am not sure this is the exact pattern, but 
something like it happens.
   
   The result is that it is hard to tell if my code broke something or if the 
tests are just flaky. I wonder, is there something we can do to stabilize these 
tests? All other tests run fine if I rerun them a second time on the same build 
or after I make a small code change.
   
   Anyway, after doing a full rebuild and retest, this commit does pass all 
unit tests.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875632#comment-16875632
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-506989680
 
 
   Addressed the `TestEmptyInputSql` failure. The code now recognizes two cases:
   
   1. Empty results: the reader provided a schema, but had no rows. (This is 
the case that failed.)
   2. Null results: the reader provided neither rows nor schema. Previously, the 
code always followed this path, even when a schema was available.
   
   Changed the query builder row set code to return an empty row set if the 
output contains only an empty batch and contains a schema. The code continues 
to return no row set if the result is null. (Oddly, Drill will return a batch 
with no rows and no schema if the reader returns no batches at all.)
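
The two cases above can be sketched as follows. This is only an illustration under stated assumptions: `ResultKindSketch`, `Batch`, and `classify` are hypothetical names invented here, not Drill's query builder or row set classes, and a batch is reduced to a list of column names plus a row count.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical types for illustration only; these are not Drill's test classes.
class ResultKindSketch {

  enum ResultKind { NULL_RESULT, EMPTY_RESULT, DATA_RESULT }

  // A batch is modeled as a list of column names (the schema) plus a row count.
  static final class Batch {
    final List<String> schema;
    final int rowCount;

    Batch(List<String> schema, int rowCount) {
      this.schema = schema;
      this.rowCount = rowCount;
    }
  }

  // Classifies query output using the cases from the comment:
  // rows present -> data; no rows but a schema -> empty; neither -> null.
  static ResultKind classify(List<Batch> batches) {
    boolean sawSchema = false;
    for (Batch batch : batches) {
      if (batch.rowCount > 0) {
        return ResultKind.DATA_RESULT;
      }
      if (!batch.schema.isEmpty()) {
        sawSchema = true;
      }
    }
    return sawSchema ? ResultKind.EMPTY_RESULT : ResultKind.NULL_RESULT;
  }

  public static void main(String[] args) {
    // Schema but no rows: prints "EMPTY_RESULT"
    System.out.println(classify(
        Collections.singletonList(new Batch(Arrays.asList("a"), 0))));
    // One batch with no rows and no schema: prints "NULL_RESULT"
    System.out.println(classify(
        Collections.singletonList(new Batch(Collections.<String>emptyList(), 0))));
    // No batches at all: prints "NULL_RESULT"
    System.out.println(classify(Collections.<Batch>emptyList()));
  }
}
```

In this sketch a caller would return an empty row set for `EMPTY_RESULT` and no row set for `NULL_RESULT`, which mirrors the behavior described in the comment.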
   
   Will address other issues in separate commits.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875562#comment-16875562
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only 
batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-506972377
 
 
   @paul-rogers
   When running tests, there are unit and functional test failures. Please run 
the full unit test suite locally before opening the PR; Travis does not do that.
   UNIT TESTS
   ```
   [INFO] Running org.apache.drill.exec.TestEmptyInputSql
   05:53:50.840 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: 0 B(1.0 MiB), h: 6.3 MiB(863.9 MiB), nh: 32 B(324.6 MiB)): testQueryEmptyCsv(org.apache.drill.exec.TestEmptyInputSql)
   java.lang.Exception: Expected and actual numbers of columns do not match.
       at org.apache.drill.test.DrillTestWrapper.compareSchemaOnly(DrillTestWrapper.java:486) ~[test-classes/:1.17.0-SNAPSHOT]
       at org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:163) ~[test-classes/:1.17.0-SNAPSHOT]
       at org.apache.drill.exec.TestEmptyInputSql.testQueryEmptyCsv(TestEmptyInputSql.java:222) ~[test-classes/:na]
       at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_161].
   ```
   FUNCTIONAL TESTS 
   ```  
 Data Verification Failures:
   
 Query: 
/root/drillAutomation/drill-test-framework/framework/resources/Functional/limit0/union_all/prq_union_all/data/emptyLHS_CSV.q
 SELECT cast(columns[0] as int) FROM `emptyFiles/empty_1.csv` UNION ALL 
SELECT col1 FROM notEmpty_csv_v
   
 Baseline: 
/root/drillAutomation/drill-test-framework/framework/resources/Functional/limit0/union_all/prq_union_all/data/emptyLHS_CSV.e
  Expected number of rows: 10
 Actual number of rows from Drill: 10
  Number of matching rows: 0
   Number of rows missing: 10
Number of rows unexpected: 10
   
 These rows are not expected (first 10):
 null
   
 These rows are missing (first 10):
 1 (1 occurence(s))
 2 (1 occurence(s))
 3 (1 occurence(s))
 4 (1 occurence(s))
 5 (1 occurence(s))
 6 (1 occurence(s))
 7 (1 occurence(s))
 8 (1 occurence(s))
 9 (1 occurence(s))
 10 (1 occurence(s))
   ```
   
   Please fix the failures and rebase on the latest master.
   Also, when I was cherry-picking DRILL-7306 & DRILL-6951, there were conflicts.
   Consider creating a merge branch with the commits for these Jiras and 
resolving the conflicts to ease the merge process.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872877#comment-16872877
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-505698254
 
 
   Commits squashed. Note that we can commit either this PR, or DRILL-7293, but 
not both at the same time. I will need to add one line to DRILL-7293 either 
after committing that PR, OR after committing DRILL-7293 before this one.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872135#comment-16872135
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only 
batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-505333500
 
 
   @paul-rogers thanks, now it's much better.
   +1, please squash the commits.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872009#comment-16872009
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-505286184
 
 
   @arina-ielchiieva, regarding `enableSchemaBatch`, recall that Java boolean 
fields are, per the language spec, initialized to false by default. So, since we 
never set it to true, except in tests, it stays false everywhere else.
   
   Since this was confusing, added Javadoc to explain the problem and the 
default setting of the option. Does this new material answer your question?
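
As a minimal illustration of the default described above: a boolean field with no initializer reads as false until something sets it. This is a sketch, not Drill's code. The class `ScanOptionsSketch` and its methods are hypothetical; only the field name `enableSchemaBatch` is taken from the discussion.

```java
// Hypothetical stand-in for a scan options holder; not Drill's actual class.
class ScanOptionsSketch {
  // No initializer: the Java language spec gives boolean fields the default value false.
  private boolean enableSchemaBatch;

  // Tests (or a plugin that wants the old behavior) would call this explicitly.
  void setEnableSchemaBatch(boolean option) {
    this.enableSchemaBatch = option;
  }

  boolean isSchemaBatchEnabled() {
    return enableSchemaBatch;
  }

  public static void main(String[] args) {
    ScanOptionsSketch options = new ScanOptionsSketch();
    System.out.println(options.isSchemaBatchEnabled()); // prints "false"
    options.setEnableSchemaBatch(true);
    System.out.println(options.isSchemaBatchEnabled()); // prints "true"
  }
}
```

Note that this default applies to fields only; a local boolean variable has no default and must be assigned before use.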
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871188#comment-16871188
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on pull request #1813: DRILL-7306: Disable 
schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#discussion_r296713307
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
 ##
 @@ -398,6 +363,40 @@ public void addContext(UserException.Builder builder) {
 }
   }
 
+  /**
+   * Initialize the scan framework builder with standard options.
+   * Call this from the plugin-specific
+   * {@link #frameworkBuilder(OptionManager, EasySubScan)} method.
+   * The plugin can then customize/revise options as needed.
 
 Review comment:
   Please add two params to the Javadoc as well.
 



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870808#comment-16870808
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on pull request #1813: DRILL-7306: Disable schema-only 
batch for new scan framework
URL: https://github.com/apache/drill/pull/1813
 
 
   The EVF framework is set up to return a "fast schema" empty batch with only 
schema as its first batch because, when the code was written, it seemed that's 
how we wanted operators to work. However, DRILL-7305 notes that many operators 
cannot handle empty batches.
   
   Since the empty-batch bugs show that Drill does not, in fact, provide a 
"fast schema" batch, this ticket asks to disable the feature in the new scan 
framework. The feature is disabled with a config option; it can be re-enabled 
if ever it is needed.
   
   Existing tests validate the original schema-batch mode; new tests were added 
to validate the no-schema-batch mode.
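
The difference between the two modes can be sketched roughly as below. This is an assumption-laden illustration, not the scan operator's implementation: `SchemaBatchModeSketch` and `emittedBatchSizes` are hypothetical names, and each batch is reduced to its row count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two modes; it does not reproduce Drill's scan operator.
class SchemaBatchModeSketch {

  // Returns the row counts of the batches a scan would emit for the given
  // per-batch data row counts. In schema-batch mode an empty (0-row),
  // schema-only batch is emitted first; otherwise the first batch carries data.
  static List<Integer> emittedBatchSizes(boolean enableSchemaBatch, List<Integer> dataBatchSizes) {
    List<Integer> emitted = new ArrayList<>();
    if (enableSchemaBatch) {
      emitted.add(0); // the "fast schema" batch: schema only, no rows
    }
    emitted.addAll(dataBatchSizes);
    return emitted;
  }

  public static void main(String[] args) {
    List<Integer> data = Arrays.asList(100, 42);
    System.out.println(emittedBatchSizes(true, data));  // prints "[0, 100, 42]"
    System.out.println(emittedBatchSizes(false, data)); // prints "[100, 42]"
  }
}
```

With the option off, downstream operators in this sketch never see the leading zero-row batch, which is the case DRILL-7305 says many operators cannot handle.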
 
