[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-06-20 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16922:
--
Description: 
https://github.com/apache/hive/blob/master/serde/if/serde.thrift

Typo in serde.thrift: 
COLLECTION_DELIM = "colelction.delim"

(*colelction* instead of *collection*)








  was:
https://github.com/apache/hive/blob/master/serde/if/serde.thrift

Typo in serde.thrift: 
COLLECTION_DELIM = "colelction.delim"

(_colelction _instead of _collection_)









> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-3229) null values being loaded as non-null values into Hive

2017-06-04 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz resolved HIVE-3229.
--
Resolution: Not A Bug

> null values being loaded as non-null values into Hive
> -
>
> Key: HIVE-3229
> URL: https://issues.apache.org/jira/browse/HIVE-3229
> Project: Hive
>  Issue Type: Bug
>Reporter: N Campbell
> Attachments: CERT.TSET1.txt
>
>
> various tab delimited input files contain one or more columns that represent 
> null values in rows. the data appears to load (without an error such as in 
> JIRA 3228) however the resulting values are now non-null values which is 
> incorrect.
> create table if not exists CERT.TSET1_E ( RNUM int , C1 int, C2 string)
> row format delimited
> fields terminated by '\t'
> stored as textfile;
> create table if not exists CERT.TSET1 ( RNUM int , C1 int, C2 string)
> stored as sequencefile;
> load data local inpath 'CERT.TSET1.txt'
> overwrite into table CERT.TSET1_E;
> insert overwrite table CERT.TSET1  select * from CERT.TSET1_E;



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2017-05-28 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027766#comment-16027766
 ] 

Dudu Markovitz commented on HIVE-11531:
---

Done 

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-SelectSyntax
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-LIMITClause



> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.patch, HIVE-11531.WIP.1.patch, 
> HIVE-11531.WIP.2.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2017-05-26 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026629#comment-16026629
 ] 

Dudu Markovitz commented on HIVE-11531:
---

[~leftylev], It seems the offset feature is not documented. Would you like me 
to do it?

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.patch, HIVE-11531.WIP.1.patch, 
> HIVE-11531.WIP.2.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-12082) Null comparison for greatest and least operator

2017-05-25 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024606#comment-16024606
 ] 

Dudu Markovitz edited comment on HIVE-12082 at 5/25/17 11:45 AM:
-

[~szehon] [~sershe] -

_GREATEST_ and _LEAST_ return _NULL_ if one their argument is _NULL_, on every 
database I'm familiar with.
However, I strongly feel this is a bad design that does not fit realistic 
use-cases.
SQL users tends to use _GREATEST_ /_LEAST_ as an horizontal _MAX_/_MIN_, which 
in that case makes a lot of sense to ignore _NULL_ values .
The is a work-around _NULLS_ but it is cumbersome and error-prone, e.g. -
{code}
GREATEST (coalesce(x,-9),coalesce(y,-9),coalesce(z,-9))
{code}

I would like to suggest 2 possible options to handle this function differently:
1. configuration: something like _hive.greatest.least.ignore.null_ with default 
value of _false_.
2. Enhanced syntax : {code}GREATEST/LEAST (...) [IGNORE NULLS]{code}

What say you?






was (Author: dmarkovitz):
[~szehon] [~sershe] -

_GREATEST_ and _LEAST_ return _NULL_ if one their argument is _NULL_, on every 
database I'm familiar with.
However, I strongly feel this is a bad design that does not fit realistic 
use-cases.
SQL users tends to use _GREATEST_ /_LEAST_ as an horizontal _MAX_/_MIN_, which 
in that case makes a lot of sense to ignore _NULL_ values .
The is a work-around _NULLS_ but it is cumbersome and error-prone, e.g. -
{code}
GREATEST (coalesce(x,-9),coalesce(y,-9),coalesce(z,-9))
{code}

I would like to suggest 2 possible options to handle this function differently:
1. configuration: something like _hive.greatest.least.ignore.null_ with default 
value of _false_.
2. Enhanced the functions syntax : {code}GREATEST/LEAST (...) [IGNORE 
NULLS]{code}

What say you?





> Null comparison for greatest and least operator
> ---
>
> Key: HIVE-12082
> URL: https://issues.apache.org/jira/browse/HIVE-12082
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Fix For: 2.0.0
>
> Attachments: HIVE-12082.2.patch, HIVE-12082.patch
>
>
> In mysql comparisons if any of the entries are null, then the result is null.
> [https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html|https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html]
>  and 
> [https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html|https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html].
> This can be demonstrated by the following mysql query:
> {noformat}
> mysql> select greatest(1, null) from test;
> +---+
> | greatest(1, null) |
> +---+
> |  NULL |
> +---+
> 1 row in set (0.00 sec)
> mysql> select greatest(-1, null) from test;
> ++
> | greatest(-1, null) |
> ++
> |   NULL |
> ++
> 1 row in set (0.00 sec)
> {noformat}
> This is in contrast to Hive, where null are ignored in the comparisons.
> {noformat}
> hive> select greatest(null, 1) from test;
> OK
> 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12082) Null comparison for greatest and least operator

2017-05-25 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024606#comment-16024606
 ] 

Dudu Markovitz commented on HIVE-12082:
---

[~szehon] [~sershe] -

_GREATEST_ and _LEAST_ return _NULL_ if one their argument is _NULL_, on every 
database I'm familiar with.
However, I strongly feel this is a bad design that does not fit realistic 
use-cases.
SQL users tends to use _GREATEST_ /_LEAST_ as an horizontal _MAX_/_MIN_, which 
in that case makes a lot of sense to ignore _NULL_ values .
The is a work-around _NULLS_ but it is cumbersome and error-prone, e.g. -
{code}
GREATEST (coalesce(x,-9),coalesce(y,-9),coalesce(z,-9))
{code}

I would like to suggest 2 possible options to handle this function differently:
1. configuration: something like _hive.greatest.least.ignore.null_ with default 
value of _false_.
2. Enhanced the functions syntax : {code}GREATEST/LEAST (...) [IGNORE 
NULLS]{code}

What say you?





> Null comparison for greatest and least operator
> ---
>
> Key: HIVE-12082
> URL: https://issues.apache.org/jira/browse/HIVE-12082
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Fix For: 2.0.0
>
> Attachments: HIVE-12082.2.patch, HIVE-12082.patch
>
>
> In mysql comparisons if any of the entries are null, then the result is null.
> [https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html|https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html]
>  and 
> [https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html|https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html].
> This can be demonstrated by the following mysql query:
> {noformat}
> mysql> select greatest(1, null) from test;
> +---+
> | greatest(1, null) |
> +---+
> |  NULL |
> +---+
> 1 row in set (0.00 sec)
> mysql> select greatest(-1, null) from test;
> ++
> | greatest(-1, null) |
> ++
> |   NULL |
> ++
> 1 row in set (0.00 sec)
> {noformat}
> This is in contrast to Hive, where null are ignored in the comparisons.
> {noformat}
> hive> select greatest(null, 1) from test;
> OK
> 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-12082) Null comparison for greatest and least operator

2017-05-25 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024537#comment-16024537
 ] 

Dudu Markovitz edited comment on HIVE-12082 at 5/25/17 10:57 AM:
-

[~leftylev], you were actually right in the first place. 
The functions _least_ and _greatest_ are not (just) mathematical functions, 
they work for every primitive type and therefore should be documented in the 
conditional section.

e.g., 

{code}
select greatest ('goodbye','yellow','brick','road');   -- 
yellow
select least(date '2017-03-25',date '2017-01-17',date '2017-01-02');   -- 
2017-01-02
{code}



was (Author: dmarkovitz):
[~leftylev]], you were actually right in the first place. 
The functions _least_ and _greatest_ are not (just) mathematical functions, 
they work for every primitive type and therefore should be documented in the 
conditional section.

e.g., 

{code}
select greatest ('goodbye','yellow','brick','road');   -- 
yellow
select least(date '2017-03-25',date '2017-01-17',date '2017-01-02');   -- 
2017-01-02
{code}


> Null comparison for greatest and least operator
> ---
>
> Key: HIVE-12082
> URL: https://issues.apache.org/jira/browse/HIVE-12082
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Fix For: 2.0.0
>
> Attachments: HIVE-12082.2.patch, HIVE-12082.patch
>
>
> In mysql comparisons if any of the entries are null, then the result is null.
> [https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html|https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html]
>  and 
> [https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html|https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html].
> This can be demonstrated by the following mysql query:
> {noformat}
> mysql> select greatest(1, null) from test;
> +---+
> | greatest(1, null) |
> +---+
> |  NULL |
> +---+
> 1 row in set (0.00 sec)
> mysql> select greatest(-1, null) from test;
> ++
> | greatest(-1, null) |
> ++
> |   NULL |
> ++
> 1 row in set (0.00 sec)
> {noformat}
> This is in contrast to Hive, where null are ignored in the comparisons.
> {noformat}
> hive> select greatest(null, 1) from test;
> OK
> 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-12082) Null comparison for greatest and least operator

2017-05-25 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024537#comment-16024537
 ] 

Dudu Markovitz edited comment on HIVE-12082 at 5/25/17 10:57 AM:
-

[~leftylev]], you were actually right in the first place. 
The functions _least_ and _greatest_ are not (just) mathematical functions, 
they work for every primitive type and therefore should be documented in the 
conditional section.

e.g., 

{code}
select greatest ('goodbye','yellow','brick','road');   -- 
yellow
select least(date '2017-03-25',date '2017-01-17',date '2017-01-02');   -- 
2017-01-02
{code}



was (Author: dmarkovitz):
[~leftylev]], you were actually right in the first place. 
The functions _least_ and _greatest_ are not (just) mathematical functions, 
they work for every primitive type and therefore should be documented in the 
conditional section.

e.g., 

{code}
select greatest ('goodbye','yellow','brick','road');
   -- yellow

select least(date '2017-03-25',date '2017-01-17',date '2017-01-02');   -- 
2017-01-02
{code}


> Null comparison for greatest and least operator
> ---
>
> Key: HIVE-12082
> URL: https://issues.apache.org/jira/browse/HIVE-12082
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Fix For: 2.0.0
>
> Attachments: HIVE-12082.2.patch, HIVE-12082.patch
>
>
> In mysql comparisons if any of the entries are null, then the result is null.
> [https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html|https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html]
>  and 
> [https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html|https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html].
> This can be demonstrated by the following mysql query:
> {noformat}
> mysql> select greatest(1, null) from test;
> +---+
> | greatest(1, null) |
> +---+
> |  NULL |
> +---+
> 1 row in set (0.00 sec)
> mysql> select greatest(-1, null) from test;
> ++
> | greatest(-1, null) |
> ++
> |   NULL |
> ++
> 1 row in set (0.00 sec)
> {noformat}
> This is in contrast to Hive, where null are ignored in the comparisons.
> {noformat}
> hive> select greatest(null, 1) from test;
> OK
> 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12082) Null comparison for greatest and least operator

2017-05-25 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024537#comment-16024537
 ] 

Dudu Markovitz commented on HIVE-12082:
---

[~leftylev]], you were actually right in the first place. 
The functions _least_ and _greatest_ are not (just) mathematical functions, 
they work for every primitive type and therefore should be documented in the 
conditional section.

e.g., 

{code}
select greatest ('goodbye','yellow','brick','road');
   -- yellow

select least(date '2017-03-25',date '2017-01-17',date '2017-01-02');   -- 
2017-01-02
{code}


> Null comparison for greatest and least operator
> ---
>
> Key: HIVE-12082
> URL: https://issues.apache.org/jira/browse/HIVE-12082
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Fix For: 2.0.0
>
> Attachments: HIVE-12082.2.patch, HIVE-12082.patch
>
>
> In mysql comparisons if any of the entries are null, then the result is null.
> [https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html|https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html]
>  and 
> [https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html|https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html].
> This can be demonstrated by the following mysql query:
> {noformat}
> mysql> select greatest(1, null) from test;
> +---+
> | greatest(1, null) |
> +---+
> |  NULL |
> +---+
> 1 row in set (0.00 sec)
> mysql> select greatest(-1, null) from test;
> ++
> | greatest(-1, null) |
> ++
> |   NULL |
> ++
> 1 row in set (0.00 sec)
> {noformat}
> This is in contrast to Hive, where null are ignored in the comparisons.
> {noformat}
> hive> select greatest(null, 1) from test;
> OK
> 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-10120) Disallow create table with dot/colon in column name

2017-05-07 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999540#comment-15999540
 ] 

Dudu Markovitz edited comment on HIVE-10120 at 5/7/17 9:00 AM:
---

1. 
I'm truly puzzled here.
Why put a limitation on the *create* statement instead of removing the 
limitation from the *select* statement?
There seems to be no logical reason to prohibit queries such as *select 
`a.b.c`*.

2.
A)
*CTAS* was not handled and it still possible to create tables that could not be 
queried

{code}
hive> create table t as select 1 as `a.b.c`;
OK
hive> desc t;
OK
col_namedata_type   comment
a.b.c   int 
Time taken: 0.441 seconds, Fetched: 1 row(s)
hive> select * from t;
FAILED: RuntimeException java.lang.RuntimeException: cannot find field a from 
[0:a.b.c]
{code}

B)
*alter table ... change column* was not handled and it still possible to rename 
a column such that the table could not be queried

{code}
create table t (i int);
alter table t change column i `a.b.c` int
select * from t;

Error while compiling statement: FAILED: RuntimeException 
java.lang.RuntimeException: cannot find field a from [0:a.b.c]
{code}


was (Author: dmarkovitz):
1. 
I'm truly puzzled here.
Why put a limitation on the *create* statement instead of removing the 
limitation from the *select* statement?
There seems to be no logical reason to prohibit queries such as *select 
`a.b.c`*.

2.
A)
*CTAS* was not handled and it still possible to create tables that cannot be 
queried

{code}
hive> create table t as select 1 as `a.b.c`;
OK
hive> desc t;
OK
col_namedata_type   comment
a.b.c   int 
Time taken: 0.441 seconds, Fetched: 1 row(s)
hive> select * from t;
FAILED: RuntimeException java.lang.RuntimeException: cannot find field a from 
[0:a.b.c]
{code}

B)
*alter table ... change column* was not handled and it still possible to rename 
a column to unqueryable name.

{code}
create table t (i int);
alter table t change column i `a.b.c` int
select * from t;

Error while compiling statement: FAILED: RuntimeException 
java.lang.RuntimeException: cannot find field a from [0:a.b.c]
{code}

> Disallow create table with dot/colon in column name
> ---
>
> Key: HIVE-10120
> URL: https://issues.apache.org/jira/browse/HIVE-10120
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC1.2
> Fix For: 1.2.0
>
> Attachments: HIVE-10120.01.patch, HIVE-10120.02.patch
>
>
> Since we don't allow users to query column names with dot in the middle such 
> as emp.no, don't allow users to create tables with such columns that cannot 
> be queried. Fix the documentation to reflect this fix.
> Here is an example. Consider this table:
> {code}
> CREATE TABLE a (`emp.no` string);
> select `emp.no` from a; fails with this message:
> FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp 
> from [0:emp.no]
> {code}
> The hive documentation needs to be fixed:
> {code}
>  (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) seems 
> to  indicate that any Unicode character can go between the backticks in the 
> select statement, but it doesn’t like the dot/colon or even select * when 
> there is a column that has a dot/colon. 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-10120) Disallow create table with dot/colon in column name

2017-05-07 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999740#comment-15999740
 ] 

Dudu Markovitz commented on HIVE-10120:
---

[~leftylev] - Done


> Disallow create table with dot/colon in column name
> ---
>
> Key: HIVE-10120
> URL: https://issues.apache.org/jira/browse/HIVE-10120
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC1.2
> Fix For: 1.2.0
>
> Attachments: HIVE-10120.01.patch, HIVE-10120.02.patch
>
>
> Since we don't allow users to query column names with dot in the middle such 
> as emp.no, don't allow users to create tables with such columns that cannot 
> be queried. Fix the documentation to reflect this fix.
> Here is an example. Consider this table:
> {code}
> CREATE TABLE a (`emp.no` string);
> select `emp.no` from a; fails with this message:
> FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp 
> from [0:emp.no]
> {code}
> The hive documentation needs to be fixed:
> {code}
>  (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) seems 
> to  indicate that any Unicode character can go between the backticks in the 
> select statement, but it doesn’t like the dot/colon or even select * when 
> there is a column that has a dot/colon. 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-10120) Disallow create table with dot/colon in column name

2017-05-07 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999540#comment-15999540
 ] 

Dudu Markovitz edited comment on HIVE-10120 at 5/7/17 8:49 AM:
---

1. 
I'm truly puzzled here.
Why put a limitation on the *create* statement instead of removing the 
limitation from the *select* statement?
There seems to be no logical reason to prohibit queries such as *select 
`a.b.c`*.

2.
A)
*CTAS* was not handled and it still possible to create tables that cannot be 
queried

{code}
hive> create table t as select 1 as `a.b.c`;
OK
hive> desc t;
OK
col_namedata_type   comment
a.b.c   int 
Time taken: 0.441 seconds, Fetched: 1 row(s)
hive> select * from t;
FAILED: RuntimeException java.lang.RuntimeException: cannot find field a from 
[0:a.b.c]
{code}

B)
*alter table ... change column* was not handled and it still possible to rename 
a column to unqueryable name.

{code}
create table t (i int);
alter table t change column i `a.b.c` int
select * from t;

Error while compiling statement: FAILED: RuntimeException 
java.lang.RuntimeException: cannot find field a from [0:a.b.c]
{code}


was (Author: dmarkovitz):
1. 
I'm truly puzzled here.
Why put a limitation on the *create* statement instead of removing the 
limitation from the *select* statement?
There seems to be no logical reason to prohibit queries such as *select 
`a.b.c`*.

2.
*CTAS* was not handled and it still possible to create tables that cannot be 
queried

{code}
hive> create table t as select 1 as `a.b.c`;
OK
hive> desc t;
OK
col_namedata_type   comment
a.b.c   int 
Time taken: 0.441 seconds, Fetched: 1 row(s)
hive> select * from t;
FAILED: RuntimeException java.lang.RuntimeException: cannot find field a from 
[0:a.b.c]
{code}

> Disallow create table with dot/colon in column name
> ---
>
> Key: HIVE-10120
> URL: https://issues.apache.org/jira/browse/HIVE-10120
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC1.2
> Fix For: 1.2.0
>
> Attachments: HIVE-10120.01.patch, HIVE-10120.02.patch
>
>
> Since we don't allow users to query column names with dot in the middle such 
> as emp.no, don't allow users to create tables with such columns that cannot 
> be queried. Fix the documentation to reflect this fix.
> Here is an example. Consider this table:
> {code}
> CREATE TABLE a (`emp.no` string);
> select `emp.no` from a; fails with this message:
> FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp 
> from [0:emp.no]
> {code}
> The hive documentation needs to be fixed:
> {code}
>  (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) seems 
> to  indicate that any Unicode character can go between the backticks in the 
> select statement, but it doesn’t like the dot/colon or even select * when 
> there is a column that has a dot/colon. 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-10120) Disallow create table with dot/colon in column name

2017-05-06 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999540#comment-15999540
 ] 

Dudu Markovitz edited comment on HIVE-10120 at 5/6/17 7:18 PM:
---

1. 
I'm truly puzzled here.
Why put a limitation on the *create* statement instead of removing the 
limitation from the *select* statement?
There seems to be no logical reason to prohibit queries such as *select 
`a.b.c`*.

2.
*CTAS* was not handled and it still possible to create tables that cannot be 
queried

{code}
hive> create table t as select 1 as `a.b.c`;
OK
hive> desc t;
OK
col_namedata_type   comment
a.b.c   int 
Time taken: 0.441 seconds, Fetched: 1 row(s)
hive> select * from t;
FAILED: RuntimeException java.lang.RuntimeException: cannot find field a from 
[0:a.b.c]
{code}


was (Author: dmarkovitz):
1. 
I'm truly puzzled here.
Why put a limitation on the *create* statement instead of removing the 
limitation from the *select* statement?
There seems to be no logical reason to prohibit queries such as *select 
`a.b.c`*.

2.
*CTAS* was not handled and it still possible to create tables that cannot be 
queried


hive> create table t as select 1 as `a.b.c`;
OK
hive> desc t;
OK
col_namedata_type   comment
a.b.c   int 
Time taken: 0.441 seconds, Fetched: 1 row(s)
hive> select * from t;
FAILED: RuntimeException java.lang.RuntimeException: cannot find field a from 
[0:a.b.c]


> Disallow create table with dot/colon in column name
> ---
>
> Key: HIVE-10120
> URL: https://issues.apache.org/jira/browse/HIVE-10120
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC1.2
> Fix For: 1.2.0
>
> Attachments: HIVE-10120.01.patch, HIVE-10120.02.patch
>
>
> Since we don't allow users to query column names with dot in the middle such 
> as emp.no, don't allow users to create tables with such columns that cannot 
> be queried. Fix the documentation to reflect this fix.
> Here is an example. Consider this table:
> {code}
> CREATE TABLE a (`emp.no` string);
> select `emp.no` from a; fails with this message:
> FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp 
> from [0:emp.no]
> {code}
> The hive documentation needs to be fixed:
> {code}
>  (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) seems 
> to  indicate that any Unicode character can go between the backticks in the 
> select statement, but it doesn’t like the dot/colon or even select * when 
> there is a column that has a dot/colon. 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-10120) Disallow create table with dot/colon in column name

2017-05-06 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999540#comment-15999540
 ] 

Dudu Markovitz commented on HIVE-10120:
---

1. 
I'm truly puzzled here.
Why put a limitation on the *create* statement instead of removing the 
limitation from the *select* statement?
There seems to be no logical reason to prohibit queries such as *select 
`a.b.c`*.

2.
*CTAS* was not handled and it still possible to create tables that cannot be 
queried


hive> create table t as select 1 as `a.b.c`;
OK
hive> desc t;
OK
col_namedata_type   comment
a.b.c   int 
Time taken: 0.441 seconds, Fetched: 1 row(s)
hive> select * from t;
FAILED: RuntimeException java.lang.RuntimeException: cannot find field a from 
[0:a.b.c]


> Disallow create table with dot/colon in column name
> ---
>
> Key: HIVE-10120
> URL: https://issues.apache.org/jira/browse/HIVE-10120
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC1.2
> Fix For: 1.2.0
>
> Attachments: HIVE-10120.01.patch, HIVE-10120.02.patch
>
>
> Since we don't allow users to query column names with dot in the middle such 
> as emp.no, don't allow users to create tables with such columns that cannot 
> be queried. Fix the documentation to reflect this fix.
> Here is an example. Consider this table:
> {code}
> CREATE TABLE a (`emp.no` string);
> select `emp.no` from a; fails with this message:
> FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp 
> from [0:emp.no]
> {code}
> The hive documentation needs to be fixed:
> {code}
>  (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) seems 
> to  indicate that any Unicode character can go between the backticks in the 
> select statement, but it doesn’t like the dot/colon or even select * when 
> there is a column that has a dot/colon. 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-10120) Disallow create table with dot/colon in column name

2017-05-05 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998902#comment-15998902
 ] 

Dudu Markovitz commented on HIVE-10120:
---

[~leftylev]] - It seems the documentation was not updated. Would you like me to 
take care of that?

> Disallow create table with dot/colon in column name
> ---
>
> Key: HIVE-10120
> URL: https://issues.apache.org/jira/browse/HIVE-10120
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC1.2
> Fix For: 1.2.0
>
> Attachments: HIVE-10120.01.patch, HIVE-10120.02.patch
>
>
> Since we don't allow users to query column names with dot in the middle such 
> as emp.no, don't allow users to create tables with such columns that cannot 
> be queried. Fix the documentation to reflect this fix.
> Here is an example. Consider this table:
> {code}
> CREATE TABLE a (`emp.no` string);
> select `emp.no` from a; fails with this message:
> FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp 
> from [0:emp.no]
> {code}
> The hive documentation needs to be fixed:
> {code}
>  (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) seems 
> to  indicate that any Unicode character can go between the backticks in the 
> select statement, but it doesn’t like the dot/colon or even select * when 
> there is a column that has a dot/colon. 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-10120) Disallow create table with dot/colon in column name

2017-05-05 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998902#comment-15998902
 ] 

Dudu Markovitz edited comment on HIVE-10120 at 5/5/17 8:20 PM:
---

[~leftylev] - It seems the documentation was not updated. Would you like me to 
take care of that?


was (Author: dmarkovitz):
[~leftylev]] - It seems the documentation was not updated. Would you like me to 
take care of that?

> Disallow create table with dot/colon in column name
> ---
>
> Key: HIVE-10120
> URL: https://issues.apache.org/jira/browse/HIVE-10120
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC1.2
> Fix For: 1.2.0
>
> Attachments: HIVE-10120.01.patch, HIVE-10120.02.patch
>
>
> Since we don't allow users to query column names with dot in the middle such 
> as emp.no, don't allow users to create tables with such columns that cannot 
> be queried. Fix the documentation to reflect this fix.
> Here is an example. Consider this table:
> {code}
> CREATE TABLE a (`emp.no` string);
> select `emp.no` from a; fails with this message:
> FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp 
> from [0:emp.no]
> {code}
> The hive documentation needs to be fixed:
> {code}
>  (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) seems 
> to  indicate that any Unicode character can go between the backticks in the 
> select statement, but it doesn’t like the dot/colon or even select * when 
> there is a column that has a dot/colon. 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15402) LAG's PRECEDING does not work.

2017-04-25 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz reassigned HIVE-15402:
-

Assignee: (was: Dudu Markovitz)

> LAG's PRECEDING does not work.
> --
>
> Key: HIVE-15402
> URL: https://issues.apache.org/jira/browse/HIVE-15402
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Ryu Kobayashi
>
> The syntax in the following manual does not work: 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics#LanguageManualWindowingAndAnalytics-LAGspecifyingalagof3rowsanddefaultvalueof0
> {code}
> SELECT a, LAG(a, 3, 0) OVER (PARTITION BY b ORDER BY C ROWS 3 PRECEDING)
> FROM T;
> {code}
> {code}
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: Expecting left window frame boundary for function 
> LAG((tok_table_or_col a), 3, 0) Window 
> Spec=[PartitioningSpec=[partitionColumns=[(tok_table_or_col 
> b)]orderColumns=[(tok_table_or_col c) ASC NULLS_FIRST]]window(start=range(3 
> PRECEDING), end=currentRow)] as LAG_window_0 to be unbounded. Found : 3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-15402) LAG's PRECEDING does not work.

2017-04-25 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982494#comment-15982494
 ] 

Dudu Markovitz edited comment on HIVE-15402 at 4/25/17 7:27 AM:


Not a bug. 
The documentation needed to be fixed (done).
For LAG, as well for some other functions (ROW_NUMBER, RANK etc.) the 
ROWS/RANGE clause does not make any sense.


was (Author: dmarkovitz):
Not a bug. 
The syntax in the documentation was wrong and I've fixed it.
For LAG, as well for some other functions (ROW_NUMBER, RANK etc.) the 
ROWS/RANGE clause does not make any sense.

> LAG's PRECEDING does not work.
> --
>
> Key: HIVE-15402
> URL: https://issues.apache.org/jira/browse/HIVE-15402
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Ryu Kobayashi
>Assignee: Dudu Markovitz
>
> The syntax in the following manual does not work: 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics#LanguageManualWindowingAndAnalytics-LAGspecifyingalagof3rowsanddefaultvalueof0
> {code}
> SELECT a, LAG(a, 3, 0) OVER (PARTITION BY b ORDER BY C ROWS 3 PRECEDING)
> FROM T;
> {code}
> {code}
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: Expecting left window frame boundary for function 
> LAG((tok_table_or_col a), 3, 0) Window 
> Spec=[PartitioningSpec=[partitionColumns=[(tok_table_or_col 
> b)]orderColumns=[(tok_table_or_col c) ASC NULLS_FIRST]]window(start=range(3 
> PRECEDING), end=currentRow)] as LAG_window_0 to be unbounded. Found : 3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15402) LAG's PRECEDING does not work.

2017-04-25 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982494#comment-15982494
 ] 

Dudu Markovitz commented on HIVE-15402:
---

Not a bug. 
The syntax in the documentation was wrong and I've fixed it.
For LAG, as well for some other functions (ROW_NUMBER, RANK etc.) the 
ROWS/RANGE clause does not make any sense.

> LAG's PRECEDING does not work.
> --
>
> Key: HIVE-15402
> URL: https://issues.apache.org/jira/browse/HIVE-15402
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Ryu Kobayashi
>Assignee: Dudu Markovitz
>
> The syntax in the following manual does not work: 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics#LanguageManualWindowingAndAnalytics-LAGspecifyingalagof3rowsanddefaultvalueof0
> {code}
> SELECT a, LAG(a, 3, 0) OVER (PARTITION BY b ORDER BY C ROWS 3 PRECEDING)
> FROM T;
> {code}
> {code}
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: Expecting left window frame boundary for function 
> LAG((tok_table_or_col a), 3, 0) Window 
> Spec=[PartitioningSpec=[partitionColumns=[(tok_table_or_col 
> b)]orderColumns=[(tok_table_or_col c) ASC NULLS_FIRST]]window(start=range(3 
> PRECEDING), end=currentRow)] as LAG_window_0 to be unbounded. Found : 3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15402) LAG's PRECEDING does not work.

2017-04-25 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz reassigned HIVE-15402:
-

Assignee: Dudu Markovitz

> LAG's PRECEDING does not work.
> --
>
> Key: HIVE-15402
> URL: https://issues.apache.org/jira/browse/HIVE-15402
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Ryu Kobayashi
>Assignee: Dudu Markovitz
>
> The syntax in the following manual does not work: 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics#LanguageManualWindowingAndAnalytics-LAGspecifyingalagof3rowsanddefaultvalueof0
> {code}
> SELECT a, LAG(a, 3, 0) OVER (PARTITION BY b ORDER BY C ROWS 3 PRECEDING)
> FROM T;
> {code}
> {code}
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: Expecting left window frame boundary for function 
> LAG((tok_table_or_col a), 3, 0) Window 
> Spec=[PartitioningSpec=[partitionColumns=[(tok_table_or_col 
> b)]orderColumns=[(tok_table_or_col c) ASC NULLS_FIRST]]window(start=range(3 
> PRECEDING), end=currentRow)] as LAG_window_0 to be unbounded. Found : 3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-22 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979834#comment-15979834
 ] 

Dudu Markovitz commented on HIVE-16496:
---

[~cartershanklin], thanks for your eye opening comments. 

1. I've added some more use-case
2. You are right about the positional part - it is not much useful to this  
syntax enhancement (but I do have an idea for another syntax enhancement that 
intends to ease ad-hoc work and can use the positional concept. I'll open a new 
ticket for it)

P.s. 
I would strongly recommend against the use NATURAL JOIN. 
Columns can be renamed, dropped or added, which leads to a logically broken 
code.




> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> Please note that the EXCLUDE clause relates directly to its preceding 
> asterisk.
> Here are some useful use cases:
> h3. use-case 1: join
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> Considering a table may hold hundreds or even thousands of column, this can 
> be come very ugly and error prone.
> Often this require some scripting work.
> h3. use-case 2: view
> Creating views with all the tables columns except for some technical columns
>  
> {code}
> create myview as select * exclude (cre_ts,upd_ts) from t;
> {code}
> h3. use-case 3: row_number
> Remove computational columns that are not needed in the final row-set, e.g. -
> retrieve the last record for each customer
> {code}
> select  * exclude (rn)
> from   (select  t.*
>,row_number() over (partition by customer_id order by ts desc) 
> as rn
> fromt
> ) t
> 
> where   rn = 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-22 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

Here are some useful use cases:

h3. use-case 1: join

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h3. use-case 2: view
Creating views with all the tables columns except for some technical columns
 
{code}
create myview as select * exclude (cre_ts,upd_ts) from t;
{code}

h3. use-case 3: row_number
Remove computational columns that are not needed in the final row-set, e.g. -
retrieve the last record for each customer

{code}
select  * exclude (rn)

from   (select  t.*
   ,row_number() over (partition by customer_id order by ts desc) 
as rn

fromt
) t

where   rn = 1
{code}


  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support
The positional notation is similar to the one used but the *cut* unix command 

{code}
select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
select * exclude ($(1-3)) from t   -- exclude columns 1 to 3
select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 (same 
as previous example)
select * exclude ($(3,5-))from t   -- exclude the 3rd column and all 
columns from the 5th column to the end
select * exclude (-$(1,2))from t   -- exclude last 2 columns
select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
column from the end
select * exclude (-$(4-)) from t   -- exclude all columns except for the 
last 3
{code}


A complex example would look like:

{code}
select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> Please note that the EXCLUDE clause relates directly to its preceding 
> asterisk.
> Here are some useful use cases:
> h3. use-case 1: join
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support
The positional notation is similar to the one used but the *cut* unix command 

{code}
select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
select * exclude ($(1-3)) from t   -- exclude columns 1 to 3
select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 (same 
as previous example)
select * exclude ($(3,5-))from t   -- exclude the 3rd column and all 
columns from the 5th column to the end
select * exclude (-$(1,2))from t   -- exclude last 2 columns
select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
column from the end
select * exclude (-$(4-)) from t   -- exclude all columns except for the 
last 3
{code}


A complex example would look like:

{code}
select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support

{code}
select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
select * exclude ($(1-3)) from t   -- exclude columns 1 to 3
select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 (same 
as previous example)
select * exclude ($(3,5-))from t   -- exclude the 3rd column and all 
columns from the 5th column to the end
select * exclude (-$(1,2))from t   -- exclude last 2 columns
select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
column from the end
select * exclude (-$(4-)) from t   -- exclude all columns except for the 
last 3
{code}


A complex example would look like:

{code}
select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support

{code}
select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
select * exclude ($(1-3)) from t   -- exclude columns 1 to 3
select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 (same 
as previous example)
select * exclude ($(3,5-))from t   -- exclude the 3rd column and all 
columns from the 5th column to the end
select * exclude (-$(1,2))from t   -- exclude last 2 columns
select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
column from the end
select * exclude (-$(4-)) from t   -- exclude all columns except for the 
last 3
{code}


A complex example would look like:

{code}
select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support

{code}
select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
select * exclude ($(1-3)) from t   -- exclude columns 1 to 3
select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude ($(3,5-))from t   -- exclude the 3rd column and all 
columns from the 5th column to the end
select * exclude (-$(1,2))from t   -- exclude last 2 columns
select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
column from the end
select * exclude (-$(4-)) from t   -- exclude all columns except for the 
last 3
{code}


A complex example would look like:

{code}
select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support

{code}
select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
select * exclude ($(1-3)) from t   -- exclude columns 1 to 3
select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude ($(3,5-))from t   -- exclude the 3rd column and all 
columns from the 5th column to the end
select * exclude (-$(1,2))from t   -- exclude last 2 columns
select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
column from the end
select * exclude (-$(4-)) from t   -- exclude all columns except for the 
last 3
{code}


A complex example would look like:

{code}
select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support

{code}
select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
select * exclude ($(1-3)) from t   -- exclude columns 1 to 3
select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude ($(3,5-))from t   -- exclude the 3rd column and all 
columns from the 5th column to the end
select * exclude (-$(1,2))from t   -- exclude last 2 columns
select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
column from the end
select * exclude (-$(4-)) from t   -- exclude all column excpt for the last 
3
{code}


A complex example would look like:

{code}
select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support

{code}
select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
select * exclude ($(1-3)) from t   -- exclude columns 1 to 3
select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude ($(3,5-))from t   -- exclude the 3rd column and all 
columns from the 5th column to the end
select * exclude (-$(1,2))from t   -- exclude last 2 columns
select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
column from the end
select * exclude (-$(4-)) from t   -- exclude all column excpt for the last 
3
{code}


A complex example would look like:

{code}
select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support

(code)
select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
select * exclude ($(1-3)) from t   -- exclude columns 1 to 3
select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude ($(3,5-))from t   -- exclude the 3rd column and all 
columns from the 5th column to the end
select * exclude (-$(1,2))from t   -- exclude last 2 columns
select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
column from the end
select * exclude (-$(4-)) from t   -- exclude all column excpt for the last 
3
(code)


A complex example would look like:

{code}
select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support

(code)
select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
select * exclude ($(1-3)) from t   -- exclude columns 1 to 3
select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude ($(3,5-))from t   -- exclude the 3rd column and all 
columns from the 5th column to the end
select * exclude (-$(1,2))from t   -- exclude last 2 columns
select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
column from the end
select * exclude (-$(4-)) from t   -- exclude all column excpt for the last 
3
(code)


A complex example would look like:

{code}
select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support

{code}
select * exclude (${1,2,5})   from t   -- exclude columns 1, 2 and 5
select * exclude (${1-3}) from t   -- exclude columns 1 to 3
select * exclude (${1,3-5,7}) from t   -- exclude columns 1,3 to 5 and 7
select * exclude (${7,3-5,1}) from t   -- exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude (${3,5-})from t   -- exclude the 3rd column and all 
columns from the 5th column to the end
select * exclude (-${1,2})from t   -- exclude last 2 columns
select * exclude (-${1-3,7})  from t   -- exclude last 3 columns and the 7th 
column from the end
select * exclude (-${4-}) from t   -- exclude all column excpt for the last 
3
{code}


A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2}) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support

{code}
select * exclude (${1,2,5})   from t   -- exclude columns 1, 2 and 5
select * exclude (${1-3}) from t   -- exclude columns 1 to 3
select * exclude (${1,3-5,7}) from t   -- exclude columns 1,3 to 5 and 7
select * exclude (${7,3-5,1}) from t   -- exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude (${3,5-})from t   -- exclude the 3rd column and all 
columns from the 5th column to the end
select * exclude (-${1,2})from t   -- exclude last 2 columns
select * exclude (-${1-3,7})  from t   -- exclude last 3 columns and the 7th 
column from the end
select * exclude (-${4-}) from t   -- exclude all column excpt for the last 
3
{code}


A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2}) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support

{code}
select * exclude (${1,2,5})   from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${7,3-5,1}) from t   /* exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude (${3,5-})from t   /* exclude the 3rd column and all 
columns from the 5th column to the end */
select * exclude (-${1,2})from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})  from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-}) from t   /* exclude all column excpt for the last 
3 */
{code}


A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2}) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

h4. Extended syntax: positional notation support

{code}
select * exclude (${1,2,5})   from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${7,3-5,1}) from t   /* exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude (${3,5-})from t   /* exclude the 3rd column and all 
columns from the 5th column to the end */
select * exclude (-${1,2})from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})  from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-}) from t   /* exclude all column excpt for the last 
3 */
{code}


A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2}) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})   from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${7,3-5,1}) from t   /* exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude (${3,5-})from t   /* exclude the 3rd column and all 
columns from the 5th column to the end */
select * exclude (-${1,2})from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})  from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-}) from t   /* exclude all column excpt for the last 
3 */
{code}


A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2}) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})   from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${7,3-5,1}) from t   /* exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude (${3,5-})from t   /* exclude the 3rd column and all 
columns from the 5th column to the end */
select * exclude (-${1,2})from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})  from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-}) from t   /* exclude all column excpt for the last 
3 */
{code}


A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2}) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})   from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${7,3-5,1}) from t   /* exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude (${3,5-})from t   /* exclude the 3rd column and all 
columns from the 5th column to the end */
select * exclude (-${1,2})from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})  from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-}) from t   /* exclude all column excpt for the last 
3 */
{code}


A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})   from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${7,3-5,1}) from t   /* exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude (${3,5-})from t   /* exclude the 3rd column and all 
columns from the 5th column to the end */
select * exclude (-${1,2})from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})  from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-}) from t   /* exclude all column excpt for the last 
3 */
{code}


A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:


{code}
select * exclude (${1,2,5})   from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${3,5-})from t   /* exclude the 3rd column and all 
columns from the 5th column to the end */
select * exclude (-${1,2})from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})  from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-}) from t   /* exclude all column except for the 
last 3 */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:


{code}
select * exclude (${1,2,5})   from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${3,5-})from t   /* exclude the 3rd column and all 
columns from the 5th column to the end */
select * exclude (-${1,2})from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})  from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-}) from t   /* exclude all column except for the 
last 3 */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
from the 5th column to the end */
select * exclude (-${1,2}) from t   /* exclude last 2 columns */
select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-}) from t   /* exclude all column except for the last 3 
*/
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
>

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
from the 5th column to the end */
select * exclude (-${1,2}) from t   /* exclude last 2 columns */
select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-}) from t   /* exclude all column except for the last 3 
*/
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
from the 5th column to the end */
select * exclude (-${1,2}) from t   /* exclude last 2 columns */
select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-}) from t   /* exclude all column excpt for the last 3 */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
from the 5th column to the end */
select * exclude (-${1,2}) from t   /* exclude last 2 columns */
select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-}) from t   /* exclude all column excpt for the last 3 */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
from the 5th column to the end */
select * exclude (-${1,2}) from t   /* exclude last 2 columns */
select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: 

[jira] [Updated] (HIVE-16486) schema changes for partitioned tables (RESTRICT, not CASCADE) seems to lead to unreasonable behaviour

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16486:
--
Description: 
h2. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h2. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use

{code}
select i1,i2 from mytable;
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}
{quote}

h2. Use case 3: Insert to a specific partition after adding a column at the end 

{code}
create table t (i int) partitioned by (ver int);
alter table t add partition (ver=1);
alter table t add columns (j int);
alter table t add partition (ver=2);
{code}

I would expect that insert into partition ver=1 will require a single value 
(for column i)

{code}
insert into t partition (ver=1) values (1);
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10044]: Line 
1:12 Cannot insert into target table because column number/types are different 
'1': Table insclause-0 has 2 columns, but query has 1 columns.
{color}
{quote}

Apparently the number of columns id derived from the current table definition 
and not from the partition definition 

{code}
insert into t partition (ver=1) values (1,2);
insert into t partition (ver=2) values (3,4);
{code}

>From the other end, although we are forced to supply values for all columns 
>the columns (including column j which is not defined for partition ver=1) the 
>values are not displayed

{code}
select * from t;
{code}

||i||j||ver||
|1|(null)|1|
|3|4|2|

(j=2 for ver=1 does exist in the files)


  was:
h2. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h2. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use

{code}
select i1,i2 from mytable;
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}
{quote}

h2. Use case 3: Insert to a specific partition after adding a column at the end 

{code}
create table t (i int) partitioned by (ver int);
alter table t add partition (ver=1);
alter table t add columns (j int);
alter table t add partition (ver=2);
{code}

I would expect that insert into partition ver=1 will require a single value 
(for column i)

{code}
insert into t partition (ver=1) values (1);
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10044]: Line 
1:12 Cannot insert into target table because column number/types are different 
'1': Table insclause-0 has 2 columns, but query has 1 columns.

[jira] [Updated] (HIVE-16486) schema changes for partitioned tables (RESTRICT, not CASCADE) seems to lead to unreasonable behaviour

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16486:
--
Description: 
h2. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h2. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use

{code}
select i1,i2 from mytable;
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}
{quote}

h2. Use case 3: Insert to a specific partition after adding a column at the end 

{code}
create table t (i int) partitioned by (ver int);
alter table t add partition (ver=1);
alter table t add columns (j int);
alter table t add partition (ver=2);
{code}

I would expect that insert into partition ver=1 will require a single value 
(for column i)

{code}
insert into t partition (ver=1) values (1);
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10044]: Line 
1:12 Cannot insert into target table because column number/types are different 
'1': Table insclause-0 has 2 columns, but query has 1 columns.
{color}
{quote}

Apparently the number of columns id derived from the current table definition 
and not from the partition definition 

{code}
insert into t partition (ver=1) values (1,2);
insert into t partition (ver=2) values (3,4);

>From the other end, although we are forced to supply values for all columns 
>the columns (including column j which is not defined for partition ver=1) the 
>values are not displayed

{code}
select * from t;
{code}

||i||j||ver||
|1|(null)|1|
|3|4|2|

(j=2 for ver=1 does exist in the files)


  was:
h2. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h2. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use

{code}
select i1,i2 from mytable;
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}
{quote}

h2. Use case 3: Insert to a specific partition after adding a column at the end 

{code}
create table t (i int) partitioned by (ver int);
alter table t add partition (ver=1);
alter table t add columns (j int);
alter table t add partition (ver=2);

I would expect that insert into partition ver=1 will require a single value 
(for column i)
{code}
insert into t partition (ver=1) values (1);
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10044]: Line 
1:12 Cannot insert into target table because column number/types are different 
'1': Table insclause-0 has 2 columns, but query has 1 columns.
{color}

[jira] [Updated] (HIVE-16486) schema changes for partitioned tables (RESTRICT, not CASCADE) seems to lead to unreasonable behaviour

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16486:
--
Description: 
h2. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h2. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use

{code}
select i1,i2 from mytable;
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}
{quote}

h2. Use case 3: Insert to a specific partition after adding a column at the end 

{code}
create table t (i int) partitioned by (ver int);
alter table t add partition (ver=1);
alter table t add columns (j int);
alter table t add partition (ver=2);

I would expect that insert into partition ver=1 will require a single value 
(for column i)
{code}
insert into t partition (ver=1) values (1);
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10044]: Line 
1:12 Cannot insert into target table because column number/types are different 
'1': Table insclause-0 has 2 columns, but query has 1 columns.
{color}
{quote}

Apparently the number of columns id derived from the current table definition 
and not from the partition definition 

{code}
insert into t partition (ver=1) values (1,2);
insert into t partition (ver=2) values (3,4);

>From the other end, although we are forced to supply values for all columns 
>the columns (including column j which is not defined for partition ver=1) the 
>values are not displayed

{code}
select * from t;
{code}

||i||j||ver||
|1|(null)|1|
|3|4|2|

(j=2 for ver=1 does exist in the files)


  was:
h2. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h2. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use

{code}
select i1,i2 from mytable;
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}
{quote}

h2. Use case 3: Insert to a specific partition after adding a column at the end 

{code}
create table t (i int) partitioned by (ver int);
alter table t add partition (ver=1);
alter table t add columns (j int);
alter table t add partition (ver=2);

I would expect that insert into partition ver=1 will require a single value 
(for column i)
{code}
insert into t partition (ver=1) values (1);
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10044]: Line 
1:12 Cannot insert into target table because column number/types are different 
'1': Table insclause-0 has 2 columns, but query has 1 columns.
{color}
{quote}


[jira] [Updated] (HIVE-16486) schema changes for partitioned tables (RESTRICT, not CASCADE) seems to lead to unreasonable behaviour

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16486:
--
Description: 
h4. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h4. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use

{code}
select i1,i2 from mytable;
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}
{quote}

h4. Use case 1: Insert to a specific partition after adding a column at the end 

{code}
create table t (i int) partitioned by (ver int);
alter table t add partition (ver=1);
alter table t add columns (j int);
alter table t add partition (ver=2);

I would expect that insert into partition ver=1 will require a single value 
(for column i)
{code}
insert into t partition (ver=1) values (1);
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10044]: Line 
1:12 Cannot insert into target table because column number/types are different 
'1': Table insclause-0 has 2 columns, but query has 1 columns.
{color}
{quote}

Apperantly the number of columns id derived from the current table definition 
and not from the partition definition 

{code}
insert into t partition (ver=1) values (1,2);
insert into t partition (ver=2) values (3,4);

>From the other end, although we are forced to supply values for all columns 
>the columns (including column j which is not defined for partition ver=1) the 
>values are not displayed

{code}
select * from t;
{code}

||i||j||ver||
|1|(null)|1|
|3|4|2|

(The data does exist in the files)


  was:
h4. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h4. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use

{code}
select i1,i2 from mytable;
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}
{quote}


> schema changes for partitioned tables (RESTRICT, not CASCADE) seems to lead 
> to unreasonable behaviour
> -
>
> Key: HIVE-16486
> URL: https://issues.apache.org/jira/browse/HIVE-16486
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0
>Reporter: Dudu Markovitz
>
> h4. Use case 1: adding a column in the middle of a table
> {code}
> create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
> textfile;
> insert into 

[jira] [Updated] (HIVE-16486) schema changes for partitioned tables (RESTRICT, not CASCADE) seems to lead to unreasonable behaviour

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16486:
--
Description: 
h2. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h2. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use

{code}
select i1,i2 from mytable;
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}
{quote}

h2. Use case 3: Insert to a specific partition after adding a column at the end 

{code}
create table t (i int) partitioned by (ver int);
alter table t add partition (ver=1);
alter table t add columns (j int);
alter table t add partition (ver=2);

I would expect that insert into partition ver=1 will require a single value 
(for column i)
{code}
insert into t partition (ver=1) values (1);
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10044]: Line 
1:12 Cannot insert into target table because column number/types are different 
'1': Table insclause-0 has 2 columns, but query has 1 columns.
{color}
{quote}

Apparently the number of columns id derived from the current table definition 
and not from the partition definition 

{code}
insert into t partition (ver=1) values (1,2);
insert into t partition (ver=2) values (3,4);

>From the other end, although we are forced to supply values for all columns 
>the columns (including column j which is not defined for partition ver=1) the 
>values are not displayed

{code}
select * from t;
{code}

||i||j||ver||
|1|(null)|1|
|3|4|2|

(The data does exist in the files)


  was:
h4. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h4. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use

{code}
select i1,i2 from mytable;
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}
{quote}

h4. Use case 1: Insert to a specific partition after adding a column at the end 

{code}
create table t (i int) partitioned by (ver int);
alter table t add partition (ver=1);
alter table t add columns (j int);
alter table t add partition (ver=2);

I would expect that insert into partition ver=1 will require a single value 
(for column i)
{code}
insert into t partition (ver=1) values (1);
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10044]: Line 
1:12 Cannot insert into target table because column number/types are different 
'1': Table insclause-0 has 2 columns, but query has 1 columns.
{color}
{quote}

Apperantly 

[jira] [Updated] (HIVE-16486) schema changes for partitioned tables (RESTRICT, not CASCADE) seems to lead to unreasonable behaviour

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16486:
--
Summary: schema changes for partitioned tables (RESTRICT, not CASCADE) 
seems to lead to unreasonable behaviour  (was: schema changes for partitioned 
tables (RESTRICT, not CASCADE) other than adding columns at the end of a table, 
return wrong results)

> schema changes for partitioned tables (RESTRICT, not CASCADE) seems to lead 
> to unreasonable behaviour
> -
>
> Key: HIVE-16486
> URL: https://issues.apache.org/jira/browse/HIVE-16486
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0
>Reporter: Dudu Markovitz
>
> h4. Use case 1: adding a column in the middle of a table
> {code}
> create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
> textfile;
> insert into mytable partition(ver=1) values (1,2),(3,4);
> select * from mytable;
> {code}
> ||i1||i3||ver||
> |1|2|1|
> |3|4|1|
> {code}
> alter table mytable replace columns (i1 int,i2 int,i3 int);
> insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
> select * from mytable;
> {code}
> I would expect values 2 and 4 to appear under column i3 , In accordance with 
> the columns definition of partition ver=1, but they appear under column i2.
> ||i1||i2||i3||ver||
> |1|2|(null)|1|
> |3|4|(null)|1|
> |5|6|7|2|
> |8|9|10|2|
> h4. Use case 2: dropping a column
> {code}
> create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
> textfile;
> insert into mytable partition(ver=1) values (1,2),(3,4);
> select * from mytable;
> {code}
> ||i1||i2||ver||
> |1|2|1|
> |3|4|1|
> {code}
> alter table mytable replace columns (i1 int);
> insert into mytable partition(ver=2) values (5),(6);
> select * from mytable;
> {code}
> ||i1||ver||
> |1|1|
> |3|1|
> |5|2|
> |6|2|
> I would expect column i2 to be part of the columns list when asterisk is 
> being used or at least that is would be available for explicit use
> {code}
> select i1,i2 from mytable;
> {code}
> {quote}
> {color:red}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> Line 1:10 Invalid table alias or column reference 'i2': (possible column 
> names are: i1, ver)
> {color}
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16486) schema changes for partitioned tables (RESTRICT, not CASCADE) other than adding columns at the end of a table, return wrong results

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16486:
--
Description: 
h4. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h4. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use

{code}
select i1,i2 from mytable;
{code}

{quote}
{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}
{quote}

  was:
h4. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


*I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.*

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h4. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

*I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use*

{code}
select i1,i2 from mytable;
{code}

{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}


> schema changes for partitioned tables (RESTRICT, not CASCADE) other than 
> adding columns at the end of a table, return wrong results
> ---
>
> Key: HIVE-16486
> URL: https://issues.apache.org/jira/browse/HIVE-16486
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0
>Reporter: Dudu Markovitz
>
> h4. Use case 1: adding a column in the middle of a table
> {code}
> create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
> textfile;
> insert into mytable partition(ver=1) values (1,2),(3,4);
> select * from mytable;
> {code}
> ||i1||i3||ver||
> |1|2|1|
> |3|4|1|
> {code}
> alter table mytable replace columns (i1 int,i2 int,i3 int);
> insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
> select * from mytable;
> {code}
> I would expect values 2 and 4 to appear under column i3 , In accordance with 
> the columns definition of partition ver=1, but they appear under column i2.
> ||i1||i2||i3||ver||
> |1|2|(null)|1|
> |3|4|(null)|1|
> |5|6|7|2|
> |8|9|10|2|
> h4. Use case 2: dropping a column
> {code}
> create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
> textfile;
> insert into mytable partition(ver=1) values (1,2),(3,4);
> select * from mytable;
> {code}
> ||i1||i2||ver||
> |1|2|1|
> |3|4|1|
> {code}
> alter table mytable replace columns (i1 int);
> insert into mytable partition(ver=2) values (5),(6);
> select * from mytable;
> {code}
> ||i1||ver||
> |1|1|
> |3|1|
> |5|2|
> |6|2|
> I would expect column i2 to be part of the columns list when asterisk is 
> being used or at least that is would be available for explicit use

[jira] [Updated] (HIVE-16486) schema changes for partitioned tables (RESTRICT, not CASCADE) other than adding columns at the end of a table, return wrong results

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16486:
--
Description: 
h4. Use case 1: adding a column in the middle of a table

{code}
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i3||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;
{code}


*I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.*

||i1||i2||i3||ver||
|1|2|(null)|1|
|3|4|(null)|1|
|5|6|7|2|
|8|9|10|2|

h4. Use case 2: dropping a column

{code}
create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;
{code}

||i1||i2||ver||
|1|2|1|
|3|4|1|

{code}
alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);
select * from mytable;
{code}

||i1||ver||
|1|1|
|3|1|
|5|2|
|6|2|

*I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use*

{code}
select i1,i2 from mytable;
{code}

{color:red}
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)
{color}

  was:
-
-- Use case 1: adding a column in the middle of a table
-
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;

||i1||i3||ver||
|1|2|1|
|3|4|1|

alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;

**
I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.
**

i1  i2  i3  ver
1   2   (null)  1
3   4   (null)  1
5   6   7   2
8   9   10  2

-
-- Use case 2: dropping a column
-

create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;

i1  i2  ver
1   2   1
3   4   1

alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);

select * from mytable;

i1  ver
1   1
3   1
5   2
6   2


**
I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use
**

select i1,i2 from mytable;

Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)


> schema changes for partitioned tables (RESTRICT, not CASCADE) other than 
> adding columns at the end of a table, return wrong results
> ---
>
> Key: HIVE-16486
> URL: https://issues.apache.org/jira/browse/HIVE-16486
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0
>Reporter: Dudu Markovitz
>
> h4. Use case 1: adding a column in the middle of a table
> {code}
> create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
> textfile;
> insert into mytable partition(ver=1) values (1,2),(3,4);
> select * from mytable;
> {code}
> ||i1||i3||ver||
> |1|2|1|
> |3|4|1|
> {code}
> alter table mytable replace columns (i1 int,i2 int,i3 int);
> insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
> select * from mytable;
> {code}
> *I would expect values 2 and 4 to appear under column i3 , In accordance with 
> the columns definition of partition ver=1, but they appear under column i2.*
> ||i1||i2||i3||ver||
> |1|2|(null)|1|
> |3|4|(null)|1|
> |5|6|7|2|
> |8|9|10|2|
> h4. Use case 2: dropping a column
> {code}
> create 

[jira] [Updated] (HIVE-16486) schema changes for partitioned tables (RESTRICT, not CASCADE) other than adding columns at the end of a table, return wrong results

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16486:
--
Description: 
-
-- Use case 1: adding a column in the middle of a table
-
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;

||i1||i3||ver||
|1|2|1|
|3|4|1|

alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;

**
I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.
**

i1  i2  i3  ver
1   2   (null)  1
3   4   (null)  1
5   6   7   2
8   9   10  2

-
-- Use case 2: dropping a column
-

create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;

i1  i2  ver
1   2   1
3   4   1

alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);

select * from mytable;

i1  ver
1   1
3   1
5   2
6   2


**
I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use
**

select i1,i2 from mytable;

Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)

  was:
-
-- Use case 1: adding a column in the middle of a table
-
create table mytable  (i1 int,i3 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;

i1  i3  ver
1   2   1
3   4   1

alter table mytable replace columns (i1 int,i2 int,i3 int);
insert into mytable partition(ver=2) values (5,6,7),(8,9,10);
select * from mytable;

**
I would expect values 2 and 4 to appear under column i3 , In accordance with 
the columns definition of partition ver=1, but they appear under column i2.
**

i1  i2  i3  ver
1   2   (null)  1
3   4   (null)  1
5   6   7   2
8   9   10  2

-
-- Use case 2: dropping a column
-

create table mytable  (i1 int,i2 int) partitioned by (ver int) stored as 
textfile;
insert into mytable partition(ver=1) values (1,2),(3,4);
select * from mytable;

i1  i2  ver
1   2   1
3   4   1

alter table mytable replace columns (i1 int);
insert into mytable partition(ver=2) values (5),(6);

select * from mytable;

i1  ver
1   1
3   1
5   2
6   2


**
I would expect column i2 to be part of the columns list when asterisk is being 
used or at least that is would be available for explicit use
**

select i1,i2 from mytable;

Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 
1:10 Invalid table alias or column reference 'i2': (possible column names are: 
i1, ver)


> schema changes for partitioned tables (RESTRICT, not CASCADE) other than 
> adding columns at the end of a table, return wrong results
> ---
>
> Key: HIVE-16486
> URL: https://issues.apache.org/jira/browse/HIVE-16486
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0
>Reporter: Dudu Markovitz
>
> -
> -- Use case 1: adding a column in the middle of a table
> 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
from the 5th column to the end */
select * exclude (-${1,2}) from t   /* exclude last 2 columns */
select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering that a table may contain hundreds or even thousands of column this 
can be come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
from the 5th column to the end */
select * exclude (-${1,2}) from t   /* exclude last 2 columns */
select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude 

[jira] [Issue Comment Deleted] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Comment: was deleted

(was: AS far as I know, no SQL provider had yet dealt with this issue.
)

> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> Please note that the EXCLUDE clause relates directly to its preceding 
> asterisk.
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> Considering that a table may contain hundreds or even thousands of column 
> this can be come very ugly and error prone.
> Often this require some scripting work.
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> {code}
> select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
> from the 5th column to the end */
> select * exclude (-${1,2}) from t   /* exclude last 2 columns */
> select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
> column from the end */
> {code}
> A complex example would look like:
> {code}
> select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
> {code}
> exclude:
> - first 3 columns
> - 5th and 7th columns
> - x and y 
> - last 2 columns
> P.s. 1
> There should be *no* error raised for the following scenarios:
>  - Excluding a column that does not exists in the columns set
>  - Excluding the same column more than once (e.g. by name and by position).
> - Excluding all columns
> P.s. 2
> This enhancement answers a real need that is being raised again again in the 
> Hive users community as well in legacy RDBMS communities. 
> As far as I know, no provider had yet implemented something similar and we 
> have an opportunity here to lead the SQL ecosystem.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978414#comment-15978414
 ] 

Dudu Markovitz edited comment on HIVE-16496 at 4/21/17 9:53 AM:


AS far as I know, no SQL provider had yet dealt with this issue.



was (Author: dmarkovitz):
AS far as I know, no SQL provider had yet dealt with this issue.


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> Please note that the EXCLUDE clause relates directly to its preceding 
> asterisk.
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> Considering that a table may contain hundreds or even thousands of column 
> this can be come very ugly and error prone.
> Often this require some scripting work.
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> {code}
> select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
> from the 5th column to the end */
> select * exclude (-${1,2}) from t   /* exclude last 2 columns */
> select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
> column from the end */
> {code}
> A complex example would look like:
> {code}
> select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
> {code}
> exclude:
> - first 3 columns
> - 5th and 7th columns
> - x and y 
> - last 2 columns
> P.s. 1
> There should be *no* error raised for the following scenarios:
>  - Excluding a column that does not exists in the columns set
>  - Excluding the same column more than once (e.g. by name and by position).
> - Excluding all columns
> P.s. 2
> This enhancement answers a real need that is being raised again again in the 
> Hive users community as well in legacy RDBMS communities. 
> As far as I know, no provider had yet implemented something similar and we 
> have an opportunity here to lead the SQL ecosystem.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978414#comment-15978414
 ] 

Dudu Markovitz commented on HIVE-16496:
---

AS far as I know, no SQL provider had yet dealt with this issue.


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> Please note that the EXCLUDE clause relates directly to its preceding 
> asterisk.
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> Considering that a table may contain hundreds or even thousands of column 
> this can be come very ugly and error prone.
> Often this require some scripting work.
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> {code}
> select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
> from the 5th column to the end */
> select * exclude (-${1,2}) from t   /* exclude last 2 columns */
> select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
> column from the end */
> {code}
> A complex example would look like:
> {code}
> select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
> {code}
> exclude:
> - first 3 columns
> - 5th and 7th columns
> - x and y 
> - last 2 columns
> P.s. 1
> There should be *no* error raised for the following scenarios:
>  - Excluding a column that does not exists in the columns set
>  - Excluding the same column more than once (e.g. by name and by position).
> - Excluding all columns
> P.s. 2
> This enhancement answers a real need that is being raised again again in the 
> Hive users community as well in legacy RDBMS communities. 
> As far as I know, no provider had yet implemented something similar and we 
> have an opportunity here to lead the SQL ecosystem.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering that a table may contain hundreds or even thousands of column this 
can be come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
from the 5th column to the end */
select * exclude (-${1,2}) from t   /* exclude last 2 columns */
select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering that a table may contain hundreds or even thousands of column this 
can be come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
from the 5th column to the end */
select * exclude (-${1,2}) from t   /* exclude last 2 columns */
select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering that a table may contain hundreds or even thousands of column this 
can be come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5}) from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3}) from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${3,5-}) from t   /* exclude the 3rd column and all columns 
from the 5th column to the end */
select * exclude (-${1,2}) from t   /* exclude last 2 columns */
select * exclude (-${1-3,7}) from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${5-}) from t   /* exclude all columns from the 5th column to 
the end */
select * exclude (-${1,2})  from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s.
There should be *no* error raised for excluding a column that does not exists 
in the columns set or for excluding the same column more than once (e.g. by 
name and by position).


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> Considering that a 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${5-}) from t   /* exclude all columns from the 5th column to 
the end */
select * exclude (-${1,2})  from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s.
There should be *no* error raised for excluding a column that does not exists 
in the columns set or for excluding the same column more than once (e.g. by 
name and by position).

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (-${1,2})  from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s.
There should be *no* error raised for excluding a column that does not exists 
in the columns set or for excluding the same column more than once (e.g. by 
name and by position).


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> {code}
> select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (${5-}) from t   /* exclude all columns from the 5th column 
> to the end */
> select * exclude (-${1,2})  from t   /* exclude last 2 columns */
> select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
> column from the end */
> 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (-${1,2})  from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s.
There should be *no* error raised for excluding a column that does not exists 
in the columns set or for excluding the same column more than once (e.g. by 
name and by position).

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (-${1,2})  from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
{code}

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns

P.s.
There should be *no* error raised for excluding a column that does not exists 
in the columns set or for excluding the same column more than once (e.g. by 
name and by position).


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> {code}
> select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (-${1,2})  from t   /* exclude last 2 columns */
> select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
> column from the end */
> {code}
> A complex example would look like:
> {code}
> select * exclude (${1-3,5,7},x,y,-${1-2})  from t  
> {code}
> exclude:
> - first 3 columns
> - 5th and 7th columns
> - x and y 
> - 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (-${1,2})  from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
{code}

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns

P.s.
There should be *no* error raised for excluding a column that does not exists 
in the columns set or for excluding the same column more than once (e.g. by 
name and by position).

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (-${1,2})  from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
{code}

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns

P.s.
There should be **no** error raised for excluding a column that does not exists 
in the columns set or for excluding the same column more than once (e.g. by 
name and by position).


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> {code}
> select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (-${1,2})  from t   /* exclude last 2 columns */
> select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
> column from the end */
> {code}
> A complex example would look like:
> {code}
> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
> {code}
> exclude:
> - first 4 columns
> - 5th and 7th columns
> - x and y 
> 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (-${1,2})  from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
{code}

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns

P.s.
There should be **no** error raised for excluding a column that does not exists 
in the columns set or for excluding the same column more than once (e.g. by 
name and by position).

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (-${1,2})  from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
{code}

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns



> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> {code}
> select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (-${1,2})  from t   /* exclude last 2 columns */
> select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
> column from the end */
> {code}
> A complex example would look like:
> {code}
> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
> {code}
> exclude:
> - first 4 columns
> - 5th and 7th columns
> - x and y 
> - last 3 columns
> P.s.
> There should be **no** error raised for excluding a column that does not 
> exists in the columns set or for excluding the same column more than once 
> 

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (-${1,2})  from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
{code}

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns


  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (-${1,2})  from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
{code}

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns



> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> {code}
> select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (-${1,2})  from t   /* exclude last 2 columns */
> select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
> column from the end */
> {code}
> A complex example would look like:
> {code}
> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
> {code}
> exclude:
> - first 4 columns
> - 5th and 7th columns
> - x and y 
> - last 3 columns



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

A common use case would be:

{code}
select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (-${1,2})  from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
column from the end */
{code}

A complex example would look like:

{code}
select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
{code}

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns


  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

> select c,d from t

A common use case would be:

> select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;

An extension to this enhancement would be supporting positional column 
notation, e.g.:

> select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (\-${1,2})  from t   /* exclude last 2 columns */
> select * exclude (\-${1-3,7})   from t   /* exclude last 3 columns and the 
> 7th column from the end */

A complex example would look like:

> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns



> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> A common use case would be:
> {code}
> select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> {code}
> select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (-${1,2})  from t   /* exclude last 2 columns */
> select * exclude (-${1-3,7})   from t   /* exclude last 3 columns and the 7th 
> column from the end */
> {code}
> A complex example would look like:
> {code}
> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
> {code}
> exclude:
> - first 4 columns
> - 5th and 7th columns
> - x and y 
> - last 3 columns



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

> select c,d from t

A common use case would be:

> select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;

An extension to this enhancement would be supporting positional column 
notation, e.g.:

> select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (\-${1,2})  from t   /* exclude last 2 columns */
> select * exclude (\-${1-3,7})   from t   /* exclude last 3 columns and the 
> 7th column from the end */

A complex example would look like:

> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns


  was:
support the following syntax:

> select * exclude (a,b,e) from t

which for a table t with columns a,b,c,d,e would be equal to:

> select c,d from t

A common use case would be:

> select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;

An extension to this enhancement would be supporting positional column 
notation, e.g.:

> select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (\-${1,2})  from t   /* exclude last 2 columns */
> select * exclude (\-${1-3,7})   from t   /* exclude last 3 columns and the 
> 7th column from the end */

A complex example would look like:

> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns



> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> > select c,d from t
> A common use case would be:
> > select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> > select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join 
> > t2 on t1.x=t2.x;
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> > select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
> > select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
> > select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> > select * exclude (\-${1,2})  from t   /* exclude last 2 columns */
> > select * exclude (\-${1-3,7})   from t   /* exclude last 3 columns and the 
> > 7th column from the end */
> A complex example would look like:
> > select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
> exclude:
> - first 4 columns
> - 5th and 7th columns
> - x and y 
> - last 3 columns



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

> select * exclude (a,b,e) from t

which for a table t with columns a,b,c,d,e would be equal to:

> select c,d from t

A common use case would be:

> select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;

An extension to this enhancement would be supporting positional column 
notation, e.g.:

> select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (\-${1,2})  from t   /* exclude last 2 columns */
> select * exclude (\-${1-3,7})   from t   /* exclude last 3 columns and the 
> 7th column from the end */

A complex example would look like:

> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns


  was:
support the following syntax:

> select * exclude (a,b,e) from t

which for a table t with columns a,b,c,d,e would be equal to:

> select c,d from t

A common use case would be:

> select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;

An extension to this enhancement would be supporting positional column 
notation, e.g.:

> select * exclude (${1,2,5})from t   -- exclude columns 1, 2 and 5
> select * exclude (${1\-3})   from t   -- exclude columns 1 to 3
> select * exclude (${1,3\-5,7}) from t   -- exclude columns 1,3 to 5 and 7
> select * exclude (\-${1,2})  from t   -- exclude last 2 columns
> select * exclude (\-${1\-3,7})   from t   -- exclude last 3 columns and the 
> 7th column from the end

A complex example would look like:

> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns



> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> > select * exclude (a,b,e) from t
> which for a table t with columns a,b,c,d,e would be equal to:
> > select c,d from t
> A common use case would be:
> > select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> > select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join 
> > t2 on t1.x=t2.x;
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> > select * exclude (${1,2,5})from t   /* exclude columns 1, 2 and 5 */
> > select * exclude (${1-3})   from t   /* exclude columns 1 to 3 */
> > select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> > select * exclude (\-${1,2})  from t   /* exclude last 2 columns */
> > select * exclude (\-${1-3,7})   from t   /* exclude last 3 columns and the 
> > 7th column from the end */
> A complex example would look like:
> > select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
> exclude:
> - first 4 columns
> - 5th and 7th columns
> - x and y 
> - last 3 columns



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

> select * exclude (a,b,e) from t

which for a table t with columns a,b,c,d,e would be equal to:

> select c,d from t

A common use case would be:

> select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;

An extension to this enhancement would be supporting positional column 
notation, e.g.:

> select * exclude (${1,2,5})from t   -- exclude columns 1, 2 and 5
> select * exclude (${1\-3})   from t   -- exclude columns 1 to 3
> select * exclude (${1,3\-5,7}) from t   -- exclude columns 1,3 to 5 and 7
> select * exclude (\-${1,2})  from t   -- exclude last 2 columns
> select * exclude (\-${1\-3,7})   from t   -- exclude last 3 columns and the 
> 7th column from the end

A complex example would look like:

> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns


  was:
support the following syntax:

> select * exclude (a,b,e) from t

which for a table t with columns a,b,c,d,e would be equal to:

> select c,d from t

A common use case would be:

> select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;

An extension to this enhancement would be supporting positional column 
notation, e.g.:

> select * exclude (${1,2,5})from t\-- exclude columns 1, 2 and 5
> select * exclude (${1-3})   from t\-- exclude columns 1 to 3
> select * exclude (${1,3-5,7}) from t\-- exclude columns 1,3 to 5 and 7
> select * exclude (-${1,2})  from t\-- exclude last 2 columns
> select * exclude (-${1-3,7})   from t\-- exclude last 3 columns and the 
> 7th column from the end

A complex example would look like:

> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns



> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> > select * exclude (a,b,e) from t
> which for a table t with columns a,b,c,d,e would be equal to:
> > select c,d from t
> A common use case would be:
> > select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> > select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join 
> > t2 on t1.x=t2.x;
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> > select * exclude (${1,2,5})from t   -- exclude columns 1, 2 and 5
> > select * exclude (${1\-3})   from t   -- exclude columns 1 to 3
> > select * exclude (${1,3\-5,7}) from t   -- exclude columns 1,3 to 5 and 7
> > select * exclude (\-${1,2})  from t   -- exclude last 2 columns
> > select * exclude (\-${1\-3,7})   from t   -- exclude last 3 columns and the 
> > 7th column from the end
> A complex example would look like:
> > select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
> exclude:
> - first 4 columns
> - 5th and 7th columns
> - x and y 
> - last 3 columns



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

> select * exclude (a,b,e) from t

which for a table t with columns a,b,c,d,e would be equal to:

> select c,d from t

A common use case would be:

> select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;

An extension to this enhancement would be supporting positional column 
notation, e.g.:

> select * exclude (${1,2,5})from t\-- exclude columns 1, 2 and 5
> select * exclude (${1-3})   from t\-- exclude columns 1 to 3
> select * exclude (${1,3-5,7}) from t\-- exclude columns 1,3 to 5 and 7
> select * exclude (-${1,2})  from t\-- exclude last 2 columns
> select * exclude (-${1-3,7})   from t\-- exclude last 3 columns and the 
> 7th column from the end

A complex example would look like:

> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns


  was:
support the following syntax:

> select * exclude (a,b,e) from t

which for a table t with columns a,b,c,d,e would be equal to:

> select c,d from t

A common use case would be:

> select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;

An extension to this enhancement would be supporting positional column 
notation, e.g.:

> select * exclude (${1,2,5})from t-- exclude columns 1, 2 and 5
> select * exclude (${1-3})   from t-- exclude columns 1 to 3
> select * exclude (${1,3-5,7}) from t-- exclude columns 1,3 to 5 and 7
> select * exclude (-${1,2})  from t-- exclude last 2 columns
> select * exclude (-${1-3,7})   from t-- exclude last 3 columns and the 
> 7th column from the end

A complex example would look like:

> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns



> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> > select * exclude (a,b,e) from t
> which for a table t with columns a,b,c,d,e would be equal to:
> > select c,d from t
> A common use case would be:
> > select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> > select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join 
> > t2 on t1.x=t2.x;
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> > select * exclude (${1,2,5})from t\-- exclude columns 1, 2 and 5
> > select * exclude (${1-3})   from t\-- exclude columns 1 to 3
> > select * exclude (${1,3-5,7}) from t\-- exclude columns 1,3 to 5 and 7
> > select * exclude (-${1,2})  from t\-- exclude last 2 columns
> > select * exclude (-${1-3,7})   from t\-- exclude last 3 columns and the 
> > 7th column from the end
> A complex example would look like:
> > select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
> exclude:
> - first 4 columns
> - 5th and 7th columns
> - x and y 
> - last 3 columns



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-16496:
--
Description: 
support the following syntax:

> select * exclude (a,b,e) from t

which for a table t with columns a,b,c,d,e would be equal to:

> select c,d from t

A common use case would be:

> select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;

An extension to this enhancement would be supporting positional column 
notation, e.g.:

> select * exclude (${1,2,5})from t-- exclude columns 1, 2 and 5
> select * exclude (${1-3})   from t-- exclude columns 1 to 3
> select * exclude (${1,3-5,7}) from t-- exclude columns 1,3 to 5 and 7
> select * exclude (-${1,2})  from t-- exclude last 2 columns
> select * exclude (-${1-3,7})   from t-- exclude last 3 columns and the 
> 7th column from the end

A complex example would look like:

> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns


  was:
support the following syntax:

> select * exclude (a,b,e) from t

which for a table t with columns a,b,c,d,e would be equal to:

> select c,d from t

A common use case would be:

> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;

An extension to this enhancement would be supporting positional column 
notation, e.g.:

> select * exclude (${1,2,5})from t-- exclude columns 1, 2 and 5
> select * exclude (${1-3})   from t-- exclude columns 1 to 3
> select * exclude (${1,3-5,7}) from t-- exclude columns 1,3 to 5 and 7
> select * exclude (-${1,2})  from t-- exclude last 2 columns
> select * exclude (-${1-3,7})   from t-- exclude last 3 columns and the 
> 7th column from the end

A complex example would look like:

> select * exclude (${1-4,5,7},x,y,-${1-3})  from t  

exclude:
- first 4 columns
- 5th and 7th columns
- x and y 
- last 3 columns



> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> > select * exclude (a,b,e) from t
> which for a table t with columns a,b,c,d,e would be equal to:
> > select c,d from t
> A common use case would be:
> > select t1.* exclude \(x), t2.* from t1 join t2 on t1.x=t2.x;
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> > select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join 
> > t2 on t1.x=t2.x;
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> > select * exclude (${1,2,5})from t-- exclude columns 1, 2 and 5
> > select * exclude (${1-3})   from t-- exclude columns 1 to 3
> > select * exclude (${1,3-5,7}) from t-- exclude columns 1,3 to 5 and 7
> > select * exclude (-${1,2})  from t-- exclude last 2 columns
> > select * exclude (-${1-3,7})   from t-- exclude last 3 columns and the 
> > 7th column from the end
> A complex example would look like:
> > select * exclude (${1-4,5,7},x,y,-${1-3})  from t  
> exclude:
> - first 4 columns
> - 5th and 7th columns
> - x and y 
> - last 3 columns



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16158) Correct mistake in documentation for ALTER TABLE … ADD/REPLACE COLUMNS CASCADE

2017-04-20 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz reassigned HIVE-16158:
-

Assignee: Illya Yalovyy  (was: Lefty Leverenz)

Kind reminder to resolve the ticket


> Correct mistake in documentation for ALTER TABLE … ADD/REPLACE COLUMNS CASCADE
> --
>
> Key: HIVE-16158
> URL: https://issues.apache.org/jira/browse/HIVE-16158
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
>
> Current documentation says that key word CASCADE was introduced in Hive 0.15 
> release. That information is incorrect and confuses users. The feature was 
> actually released in Hive 1.1.0. (HIVE-8839) 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Add/ReplaceColumns



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16299) In case of partitioned table, MSCK REPAIR TABLE does not do a full validation of a FS paths and in result create false partitions and directories

2017-03-30 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948517#comment-15948517
 ] 

Dudu Markovitz commented on HIVE-16299:
---

Hi

1.
The 1st example could be created on hive 1.1.0-cdh5.7.0 but not on Hive 
1.2.1000.2.5.3.0-37.
The 2nd example could be created on both.

2.
I think that a situation where a partition is created built on directories 
structure and then directories are built based on the partitions that were just 
added, is not healthy and should be prevented.
It would make a lot of sense to validated a complete path and not each 
directory separately.










> In case of partitioned table, MSCK REPAIR TABLE does not do a full validation 
> of a FS paths and in result create false partitions and directories
> -
>
> Key: HIVE-16299
> URL: https://issues.apache.org/jira/browse/HIVE-16299
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: storage-2.2.0
>Reporter: Dudu Markovitz
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java
> static String getPartitionName(Path tablePath, Path partitionPath, 
> Set partCols)
> 
> MSCK REPAIR validates that any sub-directory is in the format col=val and 
> that there is indeed a partition column named "col".
> However, there is no validation of the partition column location and as a 
> result false partitions are being created and so are directories that match 
> those partitions. 
> e.g. 1
> hive> dfs -mkdir -p /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5;
> hive> create external table t (i int) partitioned by (a int,b int,c int) ;
> OK
> hive> msck repair table t;
> OK
> Partitions not in metastore:  t:a=1/a=2/a=3/b=4/c=5
> Repair: Added partition to metastore t:a=1/a=2/a=3/b=4/c=5
> Time taken: 0.563 seconds, Fetched: 2 row(s)
> hive> show partitions t;
> OK
> a=3/b=4/c=5
> hive> dfs -ls -R /user/hive/warehouse/t;
> drwxr-xr-x   - cloudera supergroup  0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=1
> drwxr-xr-x   - cloudera supergroup  0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=1/a=2
> drwxr-xr-x   - cloudera supergroup  0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=1/a=2/a=3
> drwxr-xr-x   - cloudera supergroup  0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=1/a=2/a=3/b=4
> drwxr-xr-x   - cloudera supergroup  0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5
> drwxrwxrwx   - cloudera supergroup  0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=3
> drwxrwxrwx   - cloudera supergroup  0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=3/b=4
> drwxrwxrwx   - cloudera supergroup  0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=3/b=4/c=5
> e.g. 2
> hive> dfs -mkdir -p /user/hive/warehouse/t/c=3/b=2/a=1;
> hive> create external table t (i int) partitioned by (a int,b int,c int);
> OK
> hive> msck repair table t;
> OK
> Partitions not in metastore:  t:c=3/b=2/a=1
> Repair: Added partition to metastore t:c=3/b=2/a=1
> Time taken: 0.512 seconds, Fetched: 2 row(s)
> hive> show partitions t;
> OK
> a=1/b=2/c=3
> hive> dfs -ls -R  /user/hive/warehouse/t;
> drwxrwxrwx   - cloudera supergroup  0 2017-03-26 13:13 
> /user/hive/warehouse/t/a=1
> drwxrwxrwx   - cloudera supergroup  0 2017-03-26 13:13 
> /user/hive/warehouse/t/a=1/b=2
> drwxrwxrwx   - cloudera supergroup  0 2017-03-26 13:13 
> /user/hive/warehouse/t/a=1/b=2/c=3
> drwxr-xr-x   - cloudera supergroup  0 2017-03-26 13:12 
> /user/hive/warehouse/t/c=3
> drwxr-xr-x   - cloudera supergroup  0 2017-03-26 13:12 
> /user/hive/warehouse/t/c=3/b=2
> drwxr-xr-x   - cloudera supergroup  0 2017-03-26 13:12 
> /user/hive/warehouse/t/c=3/b=2/a=1



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2017-03-22 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936661#comment-15936661
 ] 

Dudu Markovitz commented on HIVE-12541:
---

According to the examples is this ticket, the support is for GLOB, not for 
REGEX.

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 2.1.0
>
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch, 
> HIVE-12541.3.patch, HIVE-12541.4.patch
>
>
> 1, In fact,SybolicTextInputFormat supports the path with regex .I add some 
> test sql . 
> 2, But ,when using CombineHiveInputFormat to combine  input files , It cannot 
> resolve the path with regex ,so it will get a wrong result.I  give a example 
> ,and fix the problem.
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> it contains one path ,and the path contains a regex!
> Execute the sql : 
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It will get a wrong result :0 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16029) COLLECT_SET and COLLECT_LIST does not return NULL in the result

2017-02-25 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884595#comment-15884595
 ] 

Dudu Markovitz commented on HIVE-16029:
---

Not a bug. NULL values are being ignored by all aggregate functions.

> COLLECT_SET and COLLECT_LIST does not return NULL in the result
> ---
>
> Key: HIVE-16029
> URL: https://issues.apache.org/jira/browse/HIVE-16029
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Minor
> Attachments: HIVE-16029.patch
>
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:1/default> select * from collect_set_test;
> +-+
> | collect_set_test.a  |
> +-+
> | 1   |
> | 2   |
> | NULL|
> | 4   |
> | NULL|
> +-+
> 0: jdbc:hive2://localhost:1/default> select collect_set(a) from 
> collect_set_test;
> +---+
> |  _c0  |
> +---+
> | [1,2,4]  |
> +---+
> {code}
> The correct result should be:
> {code}
> 0: jdbc:hive2://localhost:1/default> select collect_set(a) from 
> collect_set_test;
> +---+
> |  _c0  |
> +---+
> | [1,2,null,4]  |
> +---+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-1683) Column aliases cannot be used in a group by clause

2016-12-05 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721729#comment-15721729
 ] 

Dudu Markovitz edited comment on HIVE-1683 at 12/5/16 9:37 AM:
---

There was no reason to close this issue.
The execution flow is not relevant, the important thing is that the full query 
is known at parsing time and therefore aliases can be expand to their source 
expressions no matter where they are located.
Teradata for example supports aliases in every part of the query including 
SELECT and WHERE.

e.g.
{code}
select  length(x) as len_x,case when len_x > 10 then 'Y' else 'N' end ...
select length(x) as len_x from ... where len_x > 10
{\code}



was (Author: dmarkovitz):
There was no reason to close this issue.
The execution flow is not relevant, the important thing is that the full query 
is known at parsing time and therefore aliases can be expand to their source 
expressions no matter where they are located.
Teradata for example supports aliases in every part of the query including 
SELECT and WHERE.

e.g.
select  length(x) as len_x,case when len_x > 10 then 'Y' else 'N' end ...
select length(x) as len_x from ... where len_x > 10


> Column aliases cannot be used in a group by clause
> --
>
> Key: HIVE-1683
> URL: https://issues.apache.org/jira/browse/HIVE-1683
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shrikrishna Lawande
>  Labels: SQL
>
> Column aliases cannot be used in a group by clause
> Following query would fail :
> select col1 as t, count(col2) from test group by t;
> FAILED: Error in semantic analysis: line 1:49 Invalid Table Alias or Column 
> Reference t



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-1683) Column aliases cannot be used in a group by clause

2016-12-05 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721729#comment-15721729
 ] 

Dudu Markovitz edited comment on HIVE-1683 at 12/5/16 9:35 AM:
---

There was no reason to close this issue.
The execution flow is not relevant, the important thing is that the full query 
is known at parsing time and therefore aliases can be expand to their source 
expressions no matter where they are located.
Teradata for example supports aliases in every part of the query including 
SELECT and WHERE.

e.g.
select  length(x) as len_x,case when len_x > 10 then 'Y' else 'N' end ...
select length(x) as len_x from ... where len_x > 10



was (Author: dmarkovitz):
There was no reason to close this issue.
The execution flow is not relevant, the important thing is that the full query 
is known at parsing time and therefore aliases can be expand to their source 
expressions no matter where they are located.
Teradata for example supports aliases in every part of the query including 
SELECT and WHERE.

e.g.
bq. select  length(x) as len_x,case when len_x > 10 then 'Y' else 'N' end ...
bq. select length(x) as len_x from ... where len_x > 10


> Column aliases cannot be used in a group by clause
> --
>
> Key: HIVE-1683
> URL: https://issues.apache.org/jira/browse/HIVE-1683
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shrikrishna Lawande
>  Labels: SQL
>
> Column aliases cannot be used in a group by clause
> Following query would fail :
> select col1 as t, count(col2) from test group by t;
> FAILED: Error in semantic analysis: line 1:49 Invalid Table Alias or Column 
> Reference t



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-1683) Column aliases cannot be used in a group by clause

2016-12-05 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721729#comment-15721729
 ] 

Dudu Markovitz edited comment on HIVE-1683 at 12/5/16 9:34 AM:
---

There was no reason to close this issue.
The execution flow is not relevant, the important thing is that the full query 
is known at parsing time and therefore aliases can be expand to their source 
expressions no matter where they are located.
Teradata for example supports aliases in every part of the query including 
SELECT and WHERE.

e.g.
bq. select  length(x) as len_x,case when len_x > 10 then 'Y' else 'N' end ...
bq. select length(x) as len_x from ... where len_x > 10



was (Author: dmarkovitz):
There was no reason to close this issue.
The execution flow is not relevant, the important thing is that the full query 
is known at parsing time and therefore aliases can be expand to their source 
expressions no matter where they are located.
Teradata for example supports aliases in every part of the query including 
SELECT and WHERE.

e.g.
select  length(x) as len_x,case when len_x > 10 then 'Y' else 'N' end ...
select length(x) as len_x from ... where len_x > 10


> Column aliases cannot be used in a group by clause
> --
>
> Key: HIVE-1683
> URL: https://issues.apache.org/jira/browse/HIVE-1683
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shrikrishna Lawande
>  Labels: SQL
>
> Column aliases cannot be used in a group by clause
> Following query would fail :
> select col1 as t, count(col2) from test group by t;
> FAILED: Error in semantic analysis: line 1:49 Invalid Table Alias or Column 
> Reference t



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-1683) Column aliases cannot be used in a group by clause

2016-12-05 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721729#comment-15721729
 ] 

Dudu Markovitz edited comment on HIVE-1683 at 12/5/16 9:34 AM:
---

There was no reason to close this issue.
The execution flow is not relevant, the important thing is that the full query 
is known at parsing time and therefore aliases can be expand to their source 
expressions no matter where they are located.
Teradata for example supports aliases in every part of the query including 
SELECT and WHERE.

e.g.
select  length(x) as len_x,case when len_x > 10 then 'Y' else 'N' end ...
select length(x) as len_x from ... where len_x > 10



was (Author: dmarkovitz):
There was no reason to close this issue.
The execution flow is not relevant, the important thing is that the full query 
is known at parsing time and therefore aliases can be expand to their source 
expressions no matter where they are located.
Teradata for example supports aliases in every part of the query including 
SELECT and WHERE.

e.g.
select  length(x) as len_x,case when len_x > 10 then 'Y' else 'N' end ...
select length(x) as len_x from ... where len_x > 10


> Column aliases cannot be used in a group by clause
> --
>
> Key: HIVE-1683
> URL: https://issues.apache.org/jira/browse/HIVE-1683
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shrikrishna Lawande
>  Labels: SQL
>
> Column aliases cannot be used in a group by clause
> Following query would fail :
> select col1 as t, count(col2) from test group by t;
> FAILED: Error in semantic analysis: line 1:49 Invalid Table Alias or Column 
> Reference t



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-1683) Column aliases cannot be used in a group by clause

2016-12-05 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721729#comment-15721729
 ] 

Dudu Markovitz commented on HIVE-1683:
--

There was no reason to close this issue.
The execution flow is not relevant, the important thing is that the full query 
is known at parsing time and therefore aliases can be expand to their source 
expressions no matter where they are located.
Teradata for example supports aliases in every part of the query including 
SELECT and WHERE.

e.g.
select  length(x) as len_x,case when len_x > 10 then 'Y' else 'N' end ...
select length(x) as len_x from ... where len_x > 10


> Column aliases cannot be used in a group by clause
> --
>
> Key: HIVE-1683
> URL: https://issues.apache.org/jira/browse/HIVE-1683
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shrikrishna Lawande
>  Labels: SQL
>
> Column aliases cannot be used in a group by clause
> Following query would fail :
> select col1 as t, count(col2) from test group by t;
> FAILED: Error in semantic analysis: line 1:49 Invalid Table Alias or Column 
> Reference t



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-5483) use metastore statistics to optimize max/min/etc. queries

2016-07-30 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400813#comment-15400813
 ] 

Dudu Markovitz commented on HIVE-5483:
--

Hi guys, perhaps I'm missing something but do I have any guarantee for the 
correctness of the metadata when someone can simply delete or replace a files 
directly in the file system without going through the metastore? 

> use metastore statistics to optimize max/min/etc. queries
> -
>
> Key: HIVE-5483
> URL: https://issues.apache.org/jira/browse/HIVE-5483
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Ashutosh Chauhan
> Fix For: 0.13.0
>
> Attachments: HIVE-5483.2.patch, HIVE-5483.3.patch, HIVE-5483.patch
>
>
> We have discussed this a little bit.
> Hive can answer queries such as select max(c1) from t purely from metastore 
> using partition statistics, provided that we know the statistics are up to 
> date.
> All data changes (e.g. adding new partitions) currently go thru metastore so 
> we can track up-to-date-ness. If they are not up-to-date, the queries will 
> have to read data (at least for outdated partitions) until someone runs 
> analyze table. We can also analyze new partitions after add, if that is 
> configured/specified in the command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2016-07-30 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400809#comment-15400809
 ] 

Dudu Markovitz commented on HIVE-6492:
--

Hi guys

Perhaps I'm missing something, but although I understand the the business 
scenario I can't say I understand the chosen solution.

1.
Does it make sense to limit the access to all tables by the number of 
partitions when the volume of a partition can vary rapidly from table to table? 

2.
Does it make sense to limit all users with a single parameter where there are 
different groups of users with different business justifications?

3.
What prevents the users from simply divide their queries to multiple smaller 
queries?  

4.
Can't a user just change the parameter for his session, removing the limitation?


For various reasons It is strongly recommended not to give the users access to 
tables themselves but only to views that masks the tables.
If that approach is taken, a simple filter within the view can solved the 
issue, e.g. -

create view mytable_v as select * from mytable where create_date >= date 
'2013-01-01';



> limit partition number involved in a table scan
> ---
>
> Key: HIVE-6492
> URL: https://issues.apache.org/jira/browse/HIVE-6492
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Selina Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
> HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
> HIVE-6492.5.patch.txt, HIVE-6492.6.patch.txt, HIVE-6492.7.parch.txt
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> To protect the cluster, a new configure variable 
> "hive.limit.query.max.table.partition" is added to hive configuration to
> limit the table partitions involved in a table scan. 
> The default value will be set to -1 which means there is no limit by default. 
> This variable will not affect "metadata only" query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6628) Use UDFs in create table statement

2016-07-30 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400794#comment-15400794
 ] 

Dudu Markovitz commented on HIVE-6628:
--

Hi Nicolas

I couldn't understand why a combination of external table + view is not a good 
fit for you, e.g. -

create external table mytable(
 userid int
,adate string
,listofthings string
)ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
;

create view mytable_v as
select userid int
 , from_utc_timestamp(adate,"Europe/Paris")
 ,split( list_of_things, ";")
from mytable
;

> Use UDFs in create table statement
> --
>
> Key: HIVE-6628
> URL: https://issues.apache.org/jira/browse/HIVE-6628
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Reporter: nicolas maillard
>Priority: Trivial
>
> It would be nice to be able to use UDFs in a create table statement
> Say my data is : userid, timestamp utc, list_of_things
> 123,1386716402,thing1;thing2:thing3
> Being able to say
> create external table mytable(
> userid int
> adate string as from_utc_timestamp(tilmestamp,"Europe/Paris")
> listofthings array as split( list_of_things, ";")
> )ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> this is like a much lighter serde or a simpler view I guess.
> It would allow to correct the view of certains fields on the fly  without 
> needing to do reproscessing. this is a use case we see happening a lot in our 
> inital data collections



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-10142) Calculating formula based on difference between each row's value and current row's in Windowing function

2016-07-30 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400487#comment-15400487
 ] 

Dudu Markovitz edited comment on HIVE-10142 at 7/30/16 7:23 PM:


Although I can relate to the request, I've never seen it implemented before, 
probably because it is an O(N^2) operation.

E.g.-
For every event I would like to count the number of events with higher values, 
that occurred before this event.
Assuming we have a new keyword  "CURRENT_ROW", the analytic function would look 
something like this:

count (case when val > CURRENT_ROW.val then 1 end) over (order by ts rows 
between unbounded preceding and current row)

The thing is that in order to implement this we would probably sort the data 
set by ts (so far so good) and then compare each record against its preceding 
records which is a O(N^2) operation.
That mean that for a table of 1M (1,000,000) record we are at the scale of 1T 
(1,000,000,000,000) operations.

I'm not sure we want to go there.



was (Author: dmarkovitz):
Although I can relate to the request, I've never seen it implemented before, 
probably because it is an O(N^2) operation.

Take this for example -
For every event I would like to count the number of events with higher higher 
values that occurred before it.
Assuming we have a new keyword  "CURRENT_ROW", the analytic function would look 
something like this:

count (case when val > CURRENT_ROW.val then 1 end) over (order by ts rows 
between unbounded preceding and current row)

The thing is that in order to implement this we would probably sort the data 
set by ts (so far so good) and then compare each record against its preceding 
records which is a O(N^2) operation.
That mean that for a table of 1M (1,000,000) record we are at the scale of 1T 
(1,000,000,000,000) operations.

I'm not sure want to go there.


> Calculating formula based on difference between each row's value and current 
> row's in Windowing function
> 
>
> Key: HIVE-10142
> URL: https://issues.apache.org/jira/browse/HIVE-10142
> Project: Hive
>  Issue Type: New Feature
>  Components: PTF-Windowing
>Affects Versions: 1.0.0
>Reporter: Yi Zhang
>Assignee: Aihua Xu
>
> For analytics with windowing function, the calculation formula sometimes 
> needs to perform over each row's value against current tow's value. The decay 
> value is a good example, such as sums of value with a decay function based on 
> difference of timestamp between each row and current row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11022) Support collecting lists in user defined order

2016-07-30 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400781#comment-15400781
 ] 

Dudu Markovitz commented on HIVE-11022:
---

In order to support complex ordering expressions I would suggest to add the 
sorting option as an enhanced syntax for the current functions and not as 
additional functions, in a similar way to MySQL's GROUP_CONCAT or Oracle's 
LISTAGG.
It would also be nice to add the SEPARATOR /DELIMITER option.

http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat
http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions089.htm#SQLRF30030




> Support collecting lists in user defined order
> --
>
> Key: HIVE-11022
> URL: https://issues.apache.org/jira/browse/HIVE-11022
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Reporter: Michael Haeusler
>
> Hive currently supports aggregation of lists "in order of input rows" with 
> the UDF collect_list. Unfortunately, the order is not well defined when 
> map-side aggregations are used.
> Hive could support collecting lists in user-defined order by providing a UDF
> COLLECT_LIST_SORTED(valueColumn, sortColumn[, limit]), that would return a 
> list of values sorted in a user defined order. An optional limit parameter 
> can restrict this to the n first values within that order.
> Especially in the limit case, this can be efficiently pre-aggregated and 
> reduces the amount of data transferred to reducers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7660) Hive to support qualify analytic filtering

2016-07-30 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400494#comment-15400494
 ] 

Dudu Markovitz commented on HIVE-7660:
--

This is a syntactic sugar supported by Teradata.

e.g.
In order to get the last 3 records of each buyer:

select * from purchase qualify row_number () over (partition by buyer order by 
ts desc) <= 3;

without the "qualify" it would look something like:

select * from (select *,row_number () over (partition by buyer order by ts 
desc) as rn from purchase) as t where t.rn <= 3;

> Hive to support qualify analytic filtering
> --
>
> Key: HIVE-7660
> URL: https://issues.apache.org/jira/browse/HIVE-7660
> Project: Hive
>  Issue Type: New Feature
>Reporter: Viji
>Priority: Trivial
>
> Currently, Hive does not support qualify analytic filtering. It would be 
> useful fi this feature were added in the future.
> As a workaround, since it is just a filter, we can replace it with a subquery 
> and filter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13733) CTE + "IS NULL" predicate + column aliasing as existing column leads to wrong results

2016-05-11 Thread Dudu Markovitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dudu Markovitz updated HIVE-13733:
--
Description: 
hive> create table t (i int,a string,b string);
hive> insert into t values (1,'hello','world'),(2,'bye',null);
hive> select * from t where t.b is null;
2   bye NULL

This is wrong, all 3 columns should return the same value - t.a.

hive> with cte as (select t.a as a,t.a as b,t.a as c from t where t.b is null) 
select * from cte;
bye NULLbye


However, these are right:

hive> select t.a as a,t.a as b,t.a as c from t where t.b is null;
bye bye bye


hive> with cte as (select t.a as a,t.a as b,t.a as c from t where t.b is not 
null) select * from cte;OK
hello   hello   hello


  was:
hive> create table t (i int,a string,b string);
hive> insert into t values (1,'hello','world'),(2,'bye',null);
hive> select * from t where t.b is null;
2   bye NULL

This is wrong, all 3 columns should return the same value - t.a.

hive> with cte as (select t.a as a,t.a as b,t.a as c from t where t.b is null) 
select * from cte;
bye NULLbye


However, this are right:

hive> select t.a as a,t.a as b,t.a as c from t where t.b is null;
bye bye bye


hive> with cte as (select t.a as a,t.a as b,t.a as c from t where t.b is not 
null) select * from cte;OK
hello   hello   hello



> CTE + "IS NULL" predicate + column aliasing as existing column leads to wrong 
> results 
> --
>
> Key: HIVE-13733
> URL: https://issues.apache.org/jira/browse/HIVE-13733
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Dudu Markovitz
>
> hive> create table t (i int,a string,b string);
> hive> insert into t values (1,'hello','world'),(2,'bye',null);
> hive> select * from t where t.b is null;
> 2 bye NULL
> This is wrong, all 3 columns should return the same value - t.a.
> hive> with cte as (select t.a as a,t.a as b,t.a as c from t where t.b is 
> null) select * from cte;
> bye   NULLbye
> However, these are right:
> hive> select t.a as a,t.a as b,t.a as c from t where t.b is null;
> bye   bye bye
> hive> with cte as (select t.a as a,t.a as b,t.a as c from t where t.b is not 
> null) select * from cte;OK
> hello hello   hello



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13412) External table - fields terminated by '\u0044' - 0044 is being interpreted as decimal and not hex

2016-04-25 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256258#comment-15256258
 ] 

Dudu Markovitz commented on HIVE-13412:
---

It appears as a general bug (the way '\u' is treated) and not specific for 
field terminators


> select ascii('\u0010');
10

-- should have been 16

> External table -  fields terminated by '\u0044' - 0044 is being interpreted 
> as decimal and not hex
> --
>
> Key: HIVE-13412
> URL: https://issues.apache.org/jira/browse/HIVE-13412
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 0.14.0
>Reporter: Dudu Markovitz
>Assignee: Pengcheng Xiong
>
> , (comma) as the decimal value of '44' and hex value of '2c'
> In the following example I'm using  '\u0044' as delimiter which is being 
> interpreted as comma.
> hive> create external table test_delimiter_dec_unicode (c1 int,c2 int,c3 int) 
> row format delimited fields terminated by '\u0044';
> OK
> Time taken: 0.035 seconds
> hive> show create table test_delimiter_dec_unicode;
> OK
> CREATE EXTERNAL TABLE `test_delimiter_dec_unicode`(
>   `c1` int,
>   `c2` int,
>   `c3` int)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY ','
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)