[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-03-19 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200100#comment-15200100
 ] 

Victoria Markman commented on DRILL-4392:
-

This is fixed now; the test is passing in the latest nightly precommit run: 
http://10.10.104.91:8080/view/Nightly/job/Functional-Baseline-104.61/151/consoleFull

> CTAS with partition writes an internal field into generated parquet files
> -
>
> Key: DRILL-4392
> URL: https://issues.apache.org/jira/browse/DRILL-4392
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Blocker
> Fix For: 1.6.0
>
>
> On today's master branch:
> {code}
> select * from sys.version;
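> +----------------+------------------------------------------+--------------------------------------------------------------------+---------------------------+-----------------+---------------------------+
> | version        | commit_id                                | commit_message                                                     | commit_time               | build_email     | build_time                |
> +----------------+------------------------------------------+--------------------------------------------------------------------+---------------------------+-----------------+---------------------------+
> | 1.5.0-SNAPSHOT | 9a3a5c4ff670a50a49f61f97dd838da59a12f976 | DRILL-4382: Remove dependency on drill-logical from vector package | 16.02.2016 @ 11:58:48 PST | j...@apache.org | 16.02.2016 @ 17:40:44 PST |
> +----------------+------------------------------------------+--------------------------------------------------------------------+---------------------------+-----------------+---------------------------+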
> {code}
> A Parquet table created by Drill's CTAS statement with a partition clause 
> contains one internal field, "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This 
> additional field does not affect non-star queries, but it causes incorrect 
> results for star queries.
> {code}
> use dfs.tmp;
> create table nation_ctas partition by (n_regionkey) as select * from 
> cp.`tpch/nation.parquet`;
> select * from dfs.tmp.nation_ctas limit 6;
> +-------------+---------------+-------------+----------------------------------------------------------------------------------------------------------------+---------------------------------------+
> | n_nationkey | n_name        | n_regionkey | n_comment                                                                                                      | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R |
> +-------------+---------------+-------------+----------------------------------------------------------------------------------------------------------------+---------------------------------------+
> | 5           | ETHIOPIA      | 0           | ven packages wake quickly. regu                                                                                | true                                  |
> | 15          | MOROCCO       | 0           | rns. blithely bold courts among the closely regular packages use furiously bold platelets?                     | false                                 |
> | 14          | KENYA         | 0           |  pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t                  | false                                 |
> | 0           | ALGERIA       | 0           |  haggle. carefully final deposits detect slyly agai                                                            | false                                 |
> | 16          | MOZAMBIQUE    | 0           | s. ironic, unusual asymptotes wake blithely r                                                                  | false                                 |
> | 24          | UNITED STATES | 1           | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be | true                                  |
> {code}
> This basically breaks all the Parquet files created by Drill's CTAS with 
> partition support. 
> It also causes one of the pre-commit functional tests to fail [1].
> [1] 
> https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154554#comment-15154554
 ] 

ASF GitHub Bot commented on DRILL-4392:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/383


> CTAS with partition writes an internal field into generated parquet files
> -
>
> Key: DRILL-4392
> URL: https://issues.apache.org/jira/browse/DRILL-4392
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Steven Phillips
>Priority: Blocker
>


[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154553#comment-15154553
 ] 

ASF GitHub Bot commented on DRILL-4392:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/383#issuecomment-186332099
  
Right. The planner cannot remove that internal field through projection 
removal, because the Writer operator has to use that field. It is the writer's 
job to exclude that field from the generated files.  
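
To make the point above concrete, here is a minimal sketch of a writer that still receives the internal field but leaves it out of the columns it emits. The class and method names (`WriterFieldFilterSketch`, `columnsToWrite`) are hypothetical and do not come from Drill's code base; only the field name is taken from the issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Drill's actual writer code.
class WriterFieldFilterSketch {
    // Internal field name as reported in the issue description.
    static final String INTERNAL_FIELD = "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R";

    // Returns the incoming column names minus the internal field,
    // preserving the original column order.
    static List<String> columnsToWrite(List<String> incoming) {
        List<String> out = new ArrayList<>();
        for (String column : incoming) {
            if (!INTERNAL_FIELD.equals(column)) {
                out.add(column);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> incoming = Arrays.asList(
            "n_nationkey", "n_name", "n_regionkey", "n_comment", INTERNAL_FIELD);
        // prints [n_nationkey, n_name, n_regionkey, n_comment]
        System.out.println(columnsToWrite(incoming));
    }
}
```

The filter sits in the writer rather than in the plan because, as noted in the comment, the writer itself needs the field as input; only the generated files should omit it.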




[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153739#comment-15153739
 ] 

ASF GitHub Bot commented on DRILL-4392:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/383#issuecomment-186058545
  
Not projection removal but rather writer rewrite. 




[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153735#comment-15153735
 ] 

ASF GitHub Bot commented on DRILL-4392:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/383#issuecomment-186053195
  
Fix looks fine. +1

However, I think we should open a separate bug: this should be fixed in 
planning. When we add the partition column, we should also add a projection 
that removes it. 




[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-18 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153435#comment-15153435
 ] 

Jinfeng Ni commented on DRILL-4392:
---

I submitted a patch for this issue.

The issue seems to have been caused by the change that made 
MaterializedField.getPath() return a String instead of a SchemaPath. That makes 
the check for the internal partition-related field fail, since one side of the 
comparison is a String while the other is a SchemaPath. The fix is simply to 
compare two Strings. 

On a side note, I'm not sure it is the right approach to change 
MaterializedField.getPath() to return a String instead of a SchemaPath.  
Returning a String means we have to ensure that case sensitivity in comparisons 
is consistent across the code base, which seems harder to enforce.  
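
The failure mode described above can be illustrated with a small, self-contained sketch. `PathSketch` is a hypothetical stand-in for a structured path type and is not Drill's `SchemaPath`; the method names are likewise assumptions made for illustration. In Java, `equals` between an object of one type and a `String` compiles without complaint but returns false, so a check written that way silently stops matching.

```java
import java.util.Locale;

// Hypothetical stand-in for a structured path type; NOT Drill's SchemaPath.
final class PathSketch {
    private final String name;

    PathSketch(String name) { this.name = name; }

    String asString() { return name; }

    @Override
    public boolean equals(Object other) {
        // Equal only to another PathSketch; names are compared ignoring case.
        return other instanceof PathSketch
            && name.equalsIgnoreCase(((PathSketch) other).name);
    }

    @Override
    public int hashCode() { return name.toLowerCase(Locale.ROOT).hashCode(); }
}

class PartitionFieldCheckSketch {
    static final String INTERNAL_FIELD = "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R";
    static final PathSketch INTERNAL_PATH = new PathSketch(INTERNAL_FIELD);

    // Broken check: a path object is compared with a String, so it never matches.
    static boolean isInternalBroken(String fieldPath) {
        return INTERNAL_PATH.equals(fieldPath);
    }

    // Fixed check: both sides are Strings (case-sensitive).
    static boolean isInternalFixed(String fieldPath) {
        return INTERNAL_PATH.asString().equals(fieldPath);
    }

    // Variant showing the case-sensitivity choice that String comparison exposes.
    static boolean isInternalIgnoreCase(String fieldPath) {
        return INTERNAL_PATH.asString().equalsIgnoreCase(fieldPath);
    }

    public static void main(String[] args) {
        String lower = INTERNAL_FIELD.toLowerCase(Locale.ROOT);
        System.out.println(isInternalBroken(INTERNAL_FIELD)); // prints false
        System.out.println(isInternalFixed(INTERNAL_FIELD));  // prints true
        System.out.println(isInternalFixed(lower));           // prints false
        System.out.println(isInternalIgnoreCase(lower));      // prints true
    }
}
```

The last two calls show the side note's concern: once paths are plain Strings, every call site has to choose between `equals` and `equalsIgnoreCase`, and nothing in the type system keeps those choices consistent.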




[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153419#comment-15153419
 ] 

ASF GitHub Bot commented on DRILL-4392:
---

GitHub user jinfengni opened a pull request:

https://github.com/apache/drill/pull/383

DRILL-4392: Fix CTAS partition to remove one unnecessary internal fie…

…ld in generated parquet files.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinfengni/incubator-drill DRILL-4392

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/383.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #383


commit bc6427685b9b3a7846acbc177cfe6e7e1163ec6e
Author: Jinfeng Ni 
Date:   2016-02-18T23:38:42Z

DRILL-4392: Fix CTAS partition to remove one unnecessary internal field in 
generated parquet files.





[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-18 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152954#comment-15152954
 ] 

Deneche A. Hakim commented on DRILL-4392:
-

[~sphillips] Any ETA on when this could be fixed?



[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-17 Thread Chun Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151214#comment-15151214
 ] 

Chun Chang commented on DRILL-4392:
---

Our nightly automation run also hit this:

{noformat}
Summary

Execution Failures:
Verification Failures:
/root/drillAutomation/framework-master/framework/resources/Functional/interpreted_partition_pruning/ctas_auto_partition/hierarchical/data/textSelectStartFromPartition.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/q2.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/q7.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/existing_partition_pruning/parquet/data/drill3410.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/q4.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/q1.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/q5.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/existing_partition_pruning/json/data/jsonselectStarFromPartition.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/q9.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/q11.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/q8.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/q6.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/existing_partition_pruning/hierarchical/data/textSelectStartFromPartition.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/existing_partition_pruning/parquet/data/parquetselectStarFromPartition.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/q10.q
/root/drillAutomation/framework-master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/q3.q
Timeout Failures:


Passing tests: 4979
Execution Failures: 0
VerificationFailures: 17
Timeouts: 0
Canceled: 0
{noformat}


[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-16 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149901#comment-15149901
 ] 

Jinfeng Ni commented on DRILL-4392:
---

This issue seems to be a regression caused by DRILL-4382 [1]. I moved back one 
commit and did not see the problem.

Also, if the SELECT clause in the CTAS does not contain the * column, it hits 
this problem as well. That is, the additional internal field does not seem to 
be added because of the * column logic.

{code}
create table nation_ctas partition by (n_regionkey) as select n_nationkey, 
n_regionkey, n_name from cp.`tpch/nation.parquet`;
select * from dfs.tmp.nation_ctas;

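+-------------+-------------+----------+---------------------------------------+
| n_nationkey | n_regionkey | n_name   | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R |
+-------------+-------------+----------+---------------------------------------+
| 5           | 0           | ETHIOPIA | true                                  |
| 15          | 0           | MOROCCO  | false                                 |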
{code}


[1] 
https://github.com/apache/drill/commit/9a3a5c4ff670a50a49f61f97dd838da59a12f976

> CTAS with partition writes an internal field into generated parquet files
> -
>
> Key: DRILL-4392
> URL: https://issues.apache.org/jira/browse/DRILL-4392
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Priority: Blocker
>