[jira] [Assigned] (HIVE-25652) Add constraints in result of “SHOW CREATE TABLE ”

2021-10-26 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das reassigned HIVE-25652:
--


> Add constraints in result of “SHOW CREATE TABLE ”
> -
>
> Key: HIVE-25652
> URL: https://issues.apache.org/jira/browse/HIVE-25652
> Project: Hive
>  Issue Type: Improvement
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>
> Currently {{SHOW CREATE TABLE}} doesn't pull any constraint info such as NOT 
> NULL, DEFAULT, or PRIMARY KEY.
> Example:
> Create table
>  
> {code:java}
> CREATE TABLE TEST(
>   col1 varchar(100) NOT NULL COMMENT "comment for column 1",
>   col2 timestamp DEFAULT CURRENT_TIMESTAMP() COMMENT "comment for column 2",
>   col3 decimal,
>   col4 varchar(512) NOT NULL,
>   col5 varchar(100),
>   primary key(col1, col2) disable novalidate)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
> {code}
> Currently {{SHOW CREATE TABLE TEST}} doesn't show the column constraints.
> {code:java}
> CREATE TABLE `test`(
>   `col1` varchar(100) COMMENT 'comment for column 1', 
>   `col2` timestamp COMMENT 'comment for column 2', 
>   `col3` decimal(10,0), 
>   `col4` varchar(512), 
>   `col5` varchar(100))
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> {code}
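For contrast, engines that persist the original DDL do round-trip constraints. A minimal sketch using SQLite (via Python's sqlite3, purely as a stand-in, not Hive) shows NOT NULL, DEFAULT, and PRIMARY KEY surviving in the stored CREATE statement:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE test (
      col1 VARCHAR(100) NOT NULL,
      col2 TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
      col3 DECIMAL,
      PRIMARY KEY (col1, col2)
    )""")
# SQLite stores the verbatim DDL, so the constraints survive the round trip.
ddl, = con.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'test'").fetchone()
assert "NOT NULL" in ddl and "DEFAULT" in ddl and "PRIMARY KEY" in ddl
```

The fix for this issue would make Hive's {{SHOW CREATE TABLE}} behave analogously by emitting the constraint clauses it already stores in the metastore.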



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25303) CTAS hive.create.as.external.legacy tries to place data files in managed WH path

2021-10-26 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala updated HIVE-25303:
-
Description: 
Under legacy table creation mode (hive.create.as.external.legacy=true), when a 
database has been created in a specific LOCATION, then in a session where that 
database is in use, a table created with the following command:
{code:java}
CREATE TABLE  AS SELECT {code}
should inherit the HDFS path from the database's location. Instead, Hive tries 
to write the table data into 
/warehouse/tablespace/managed/hive//

+Design+: 
 In a CTAS query, the data is first written into the target directory (this 
happens in HS2) and then the table is created (this happens in HMS). So two 
decisions are being made here: i) the target directory location, and ii) how 
the table should be created (table type, sd, etc.).
 When HS2 needs the target location to be set, it makes a create-table dry-run 
call to HMS (where table translation happens); decisions i) and ii) are made 
within HMS, which returns the table object. HS2 then uses the location set by 
HMS for placing the data.

The patch for this issue addresses the table location being incorrect and the 
table data being empty in the following cases: 1) when the external legacy 
config is set, i.e., hive.create.as.external.legacy=true, and 2) when the 
table is created with the transactional property set to false, i.e., 
TBLPROPERTIES ('transactional'='false')
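The expected resolution can be sketched in a few lines (an illustrative model with made-up names like `ctas_target`, not Hive's actual code): when the database was created with an explicit LOCATION, the table directory should be resolved under it, and the managed warehouse path should only be a fallback.

```python
from posixpath import join

MANAGED_WH = "/warehouse/tablespace/managed/hive"

def ctas_target(db_name, db_location, table_name):
    """Pick the CTAS target directory (expected behavior, illustrative only)."""
    if db_location:                      # database created with LOCATION
        return join(db_location, table_name)
    # Fallback: managed warehouse path -- where the reported bug always lands.
    return join(MANAGED_WH, db_name + ".db", table_name)

print(ctas_target("sales", "/data/external/sales", "t1"))
# -> /data/external/sales/t1  (not /warehouse/tablespace/managed/hive/...)
```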

  was:
Under legacy table creation mode (hive.create.as.external.legacy=true), when a 
database has been created in a specific LOCATION, then in a session where that 
database is in use, a table created with the following command:
{code:java}
CREATE TABLE  AS SELECT {code}
should inherit the HDFS path from the database's location. Instead, Hive tries 
to write the table data into 
/warehouse/tablespace/managed/hive//

+Design+: 
 In a CTAS query, the data is first written into the target directory (this 
happens in HS2) and then the table is created (this happens in HMS). So two 
decisions are being made here: i) the target directory location, and ii) how 
the table should be created (table type, sd, etc.).
 When HS2 needs the target location to be set, it makes a create-table dry-run 
call to HMS (where table translation happens); decisions i) and ii) are made 
within HMS, which returns the table object. HS2 then uses the location set by 
HMS for placing the data.

The patch for this issue addresses the table location


> CTAS hive.create.as.external.legacy tries to place data files in managed WH 
> path
> 
>
> Key: HIVE-25303
> URL: https://issues.apache.org/jira/browse/HIVE-25303
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Under legacy table creation mode (hive.create.as.external.legacy=true), when 
> a database has been created in a specific LOCATION, then in a session where 
> that database is in use, a table created with the following command:
> {code:java}
> CREATE TABLE  AS SELECT {code}
> should inherit the HDFS path from the database's location. Instead, Hive 
> tries to write the table data into 
> /warehouse/tablespace/managed/hive//
> +Design+: 
>  In a CTAS query, the data is first written into the target directory (this 
> happens in HS2) and then the table is created (this happens in HMS). So two 
> decisions are being made here: i) the target directory location, and ii) how 
> the table should be created (table type, sd, etc.).
>  When HS2 needs the target location to be set, it makes a create-table 
> dry-run call to HMS (where table translation happens); decisions i) and ii) 
> are made within HMS, which returns the table object. HS2 then uses the 
> location set by HMS for placing the data.
> The patch for this issue addresses the table location being incorrect and 
> the table data being empty in the following cases: 1) when the external 
> legacy config is set, i.e., hive.create.as.external.legacy=true, and 2) when 
> the table is created with the transactional property set to false, i.e., 
> TBLPROPERTIES ('transactional'='false')



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25303) CTAS hive.create.as.external.legacy tries to place data files in managed WH path

2021-10-26 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala updated HIVE-25303:
-
Description: 
Under legacy table creation mode (hive.create.as.external.legacy=true), when a 
database has been created in a specific LOCATION, then in a session where that 
database is in use, a table created with the following command:
{code:java}
CREATE TABLE  AS SELECT {code}
should inherit the HDFS path from the database's location. Instead, Hive tries 
to write the table data into 
/warehouse/tablespace/managed/hive//

+Design+: 
 In a CTAS query, the data is first written into the target directory (this 
happens in HS2) and then the table is created (this happens in HMS). So two 
decisions are being made here: i) the target directory location, and ii) how 
the table should be created (table type, sd, etc.).
 When HS2 needs the target location to be set, it makes a create-table dry-run 
call to HMS (where table translation happens); decisions i) and ii) are made 
within HMS, which returns the table object. HS2 then uses the location set by 
HMS for placing the data.

The patch for this issue addresses the table location

  was:
Under legacy table creation mode (hive.create.as.external.legacy=true), when a 
database has been created in a specific LOCATION, then in a session where that 
database is in use, a table created with the following command:
{code:java}
CREATE TABLE  AS SELECT {code}
should inherit the HDFS path from the database's location. Instead, Hive tries 
to write the table data into 
/warehouse/tablespace/managed/hive//

+Design+: 
In a CTAS query, the data is first written into the target directory (this 
happens in HS2) and then the table is created (this happens in HMS). So two 
decisions are being made here: i) the target directory location, and ii) how 
the table should be created (table type, sd, etc.).
When HS2 needs the target location to be set, it makes a create-table dry-run 
call to HMS (where table translation happens); decisions i) and ii) are made 
within HMS, which returns the table object. HS2 then uses the location set by 
HMS for placing the data.


> CTAS hive.create.as.external.legacy tries to place data files in managed WH 
> path
> 
>
> Key: HIVE-25303
> URL: https://issues.apache.org/jira/browse/HIVE-25303
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Under legacy table creation mode (hive.create.as.external.legacy=true), when 
> a database has been created in a specific LOCATION, then in a session where 
> that database is in use, a table created with the following command:
> {code:java}
> CREATE TABLE  AS SELECT {code}
> should inherit the HDFS path from the database's location. Instead, Hive 
> tries to write the table data into 
> /warehouse/tablespace/managed/hive//
> +Design+: 
>  In a CTAS query, the data is first written into the target directory (this 
> happens in HS2) and then the table is created (this happens in HMS). So two 
> decisions are being made here: i) the target directory location, and ii) how 
> the table should be created (table type, sd, etc.).
>  When HS2 needs the target location to be set, it makes a create-table 
> dry-run call to HMS (where table translation happens); decisions i) and ii) 
> are made within HMS, which returns the table object. HS2 then uses the 
> location set by HMS for placing the data.
> The patch for this issue addresses the table location



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-287) support count(*) and count distinct on multiple columns

2021-10-26 Thread Amelia Emma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434587#comment-17434587
 ] 

Amelia Emma edited comment on HIVE-287 at 10/26/21, 9:46 PM:
-

Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks coupons|https://fairycoupons.com/store-profile/thriftbooks-coupons] 
and [eberjey discount 
code|https://fairycoupons.com/store-profile/eberjey-coupons].


was (Author: amelii):
Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks 
coupons|http://[*fairycoupons.com/store-profile/thriftbooks-coupons*|https://fairycoupons.com/store-profile/thriftbooks-coupons].com]
 and [eberjey discount 
code|http://[*fairycoupons.com/store-profile/eberjey-coupons*|https://fairycoupons.com/store-profile/eberjey-coupons].com].

> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl
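Until multi-column {{count(distinct ...)}} is supported, the same result can be obtained by counting the rows of a DISTINCT subquery. A sketch using SQLite through Python's sqlite3, purely as a stand-in for Hive:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE Tbl (col1 TEXT, col2 TEXT)")
cur.executemany("INSERT INTO Tbl VALUES (?, ?)",
                [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")])
# Equivalent to count(distinct col1, col2): count rows of a DISTINCT subquery.
n, = cur.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM Tbl)").fetchone()
print(n)  # -> 3 distinct (col1, col2) pairs
```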



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-287) support count(*) and count distinct on multiple columns

2021-10-26 Thread Amelia Emma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434587#comment-17434587
 ] 

Amelia Emma commented on HIVE-287:
--

Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks coupons|https://fairycoupons.com/store-profile/thriftbooks-coupons] 
and [eberjey discount 
code|https://fairycoupons.com/store-profile/eberjey-coupons].

> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-287) support count(*) and count distinct on multiple columns

2021-10-26 Thread Amelia Emma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434583#comment-17434583
 ] 

Amelia Emma edited comment on HIVE-287 at 10/26/21, 9:38 PM:
-

Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks coupons|https://fairycoupons.com/store-profile/thriftbooks-coupons] 
and [coolwick coupons 
code|https://fairycoupons.com/store-profile/coolwick-coupons-code].


was (Author: amelii):
Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks 
coupons[http://fairycoupons.com/store-profile/thriftbooks-coupons.com] and 
[coolwick coupons 
code[[*https://fairycoupons.com/store-profile/coolwick-coupons-code.*|https://fairycoupons.com/store-profile/coolwick-coupons-code]com]com].

> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-287) support count(*) and count distinct on multiple columns

2021-10-26 Thread Amelia Emma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amelia Emma updated HIVE-287:
-
Comment: was deleted

(was: Magnificently written! I am astonished to see this thing. I will keep an 
eye on these types of content to improve my fairycoupons.com to sale coupons 
like [thriftbooks 
coupons][[http://fairycoupons.com/store-profile/thriftbooks-coupons.com]|http://fairycoupons.com/store-profile/thriftbooks-coupons.com]
 and [coolwick coupons 
code][[*https://fairycoupons.com/store-profile/coolwick-coupons-code.*|https://fairycoupons.com/store-profile/coolwick-coupons-code]com]com].)

> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-287) support count(*) and count distinct on multiple columns

2021-10-26 Thread Amelia Emma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434583#comment-17434583
 ] 

Amelia Emma edited comment on HIVE-287 at 10/26/21, 9:37 PM:
-

Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks coupons|https://fairycoupons.com/store-profile/thriftbooks-coupons] 
and [coolwick coupons 
code|https://fairycoupons.com/store-profile/coolwick-coupons-code].


was (Author: amelii):
Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks 
coupons|http://[%2Afairycoupons.com/store-profile/thriftbooks-coupons.*] and 
[coolwick coupons 
code[[*https://fairycoupons.com/store-profile/coolwick-coupons-code.*|https://fairycoupons.com/store-profile/coolwick-coupons-code]com].

> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-287) support count(*) and count distinct on multiple columns

2021-10-26 Thread Amelia Emma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434583#comment-17434583
 ] 

Amelia Emma edited comment on HIVE-287 at 10/26/21, 9:36 PM:
-

Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks coupons|https://fairycoupons.com/store-profile/thriftbooks-coupons] 
and [coolwick coupons 
code|https://fairycoupons.com/store-profile/coolwick-coupons-code].


was (Author: amelii):
Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks 
coupons|http://[*fairycoupons.com/store-profile/thriftbooks-coupons.*|https://fairycoupons.com/store-profile/thriftbooks-coupons]com]
 and [coolwick coupons 
code][https://fairycoupons.com/store-profile/coolwick-coupons-code].

> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-287) support count(*) and count distinct on multiple columns

2021-10-26 Thread Amelia Emma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434583#comment-17434583
 ] 

Amelia Emma edited comment on HIVE-287 at 10/26/21, 9:35 PM:
-

Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks coupons|https://fairycoupons.com/store-profile/thriftbooks-coupons] 
and [coolwick coupons 
code|https://fairycoupons.com/store-profile/coolwick-coupons-code].


was (Author: amelii):
Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks 
coupons][https://fairycoupons.com/store-profile/thriftbooks-coupons] and 
[coolwick coupons 
code][https://fairycoupons.com/store-profile/coolwick-coupons-code].

> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-287) support count(*) and count distinct on multiple columns

2021-10-26 Thread Amelia Emma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434583#comment-17434583
 ] 

Amelia Emma edited comment on HIVE-287 at 10/26/21, 9:33 PM:
-

Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks coupons|https://fairycoupons.com/store-profile/thriftbooks-coupons] 
and [coolwick coupons 
code|https://fairycoupons.com/store-profile/coolwick-coupons-code].


was (Author: amelii):
Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks 
coupons][*https://fairycoupons.com/store-profile/thriftbooks-coupons*] and 
[coolwick coupons 
code][*https://fairycoupons.com/store-profile/coolwick-coupons-code*].

> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-287) support count(*) and count distinct on multiple columns

2021-10-26 Thread Amelia Emma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434583#comment-17434583
 ] 

Amelia Emma commented on HIVE-287:
--

Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks coupons|https://fairycoupons.com/store-profile/thriftbooks-coupons] 
and [coolwick coupons 
code|https://fairycoupons.com/store-profile/coolwick-coupons-code].

> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-287) support count(*) and count distinct on multiple columns

2021-10-26 Thread Amelia Emma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434583#comment-17434583
 ] 

Amelia Emma edited comment on HIVE-287 at 10/26/21, 9:32 PM:
-

Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks coupons|https://fairycoupons.com/store-profile/thriftbooks-coupons] 
and [coolwick coupons 
code|https://fairycoupons.com/store-profile/coolwick-coupons-code].


was (Author: amelii):
Magnificently written! I am astonished to see this thing. I will keep an eye on 
these types of content to improve my fairycoupons.com to sale coupons like 
[thriftbooks 
coupons|[*https://fairycoupons.com/store-profile/thriftbooks-coupons*] and 
[coolwick coupons 
code|[*https://fairycoupons.com/store-profile/coolwick-coupons-code*]].

> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25643) Disable replace cols and change col commands for migrated Iceberg tables

2021-10-26 Thread Marton Bod (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434474#comment-17434474
 ] 

Marton Bod commented on HIVE-25643:
---

Pushed to master. Thanks [~szita] for the review!

> Disable replace cols and change col commands for migrated Iceberg tables
> 
>
> Key: HIVE-25643
> URL: https://issues.apache.org/jira/browse/HIVE-25643
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since the Iceberg table migration intentionally does not rewrite the data 
> files, the migrated table will end up with data files that do not contain the 
> Iceberg field IDs necessary for safe, reliable schema evolution. For this 
> reason, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands 
> for these migrated Iceberg tables. ADD COLUMNS is still permitted.
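Why missing field IDs make renames unsafe can be sketched with a toy model (plain dicts standing in for a file's column data and the table schema; all names here are hypothetical, this is not Iceberg's actual reader): a pre-migration file records columns by name only, so after a CHANGE COLUMN rename, name-based resolution silently loses data that ID-based resolution would still find.

```python
# Toy model of schema resolution; not Iceberg's implementation.
data_file = {"price": [9.99, 5.00]}      # pre-migration file: names only, no field IDs
schema = [{"id": 1, "name": "cost"}]     # table schema after CHANGE COLUMN price -> cost

def read_by_name(file_cols, schema):
    # Name-based resolution: the renamed column appears empty.
    return {c["name"]: file_cols.get(c["name"]) for c in schema}

print(read_by_name(data_file, schema))  # -> {'cost': None}: data silently lost
```

An ID-based reader would match the file column to field ID 1 regardless of the rename, which is exactly what the missing field IDs rule out for migrated files.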



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25643) Disable replace cols and change col commands for migrated Iceberg tables

2021-10-26 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25643.
---
Resolution: Fixed

> Disable replace cols and change col commands for migrated Iceberg tables
> 
>
> Key: HIVE-25643
> URL: https://issues.apache.org/jira/browse/HIVE-25643
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since the Iceberg table migration intentionally does not rewrite the data 
> files, the migrated table will end up with data files that do not contain the 
> Iceberg field IDs necessary for safe, reliable schema evolution. For this 
> reason, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands 
> for these migrated Iceberg tables. ADD COLUMNS is still permitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25643) Disable replace cols and change col commands for migrated Iceberg tables

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25643:
--
Labels: pull-request-available  (was: )

> Disable replace cols and change col commands for migrated Iceberg tables
> 
>
> Key: HIVE-25643
> URL: https://issues.apache.org/jira/browse/HIVE-25643
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since the Iceberg table migration intentionally does not rewrite the data 
> files, the migrated table will end up with data files that do not contain the 
> Iceberg field IDs necessary for safe, reliable schema evolution. For this 
> reason, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands 
> for these migrated Iceberg tables. ADD COLUMNS is still permitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25643) Disable replace cols and change col commands for migrated Iceberg tables

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25643?focusedWorklogId=670234=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-670234
 ]

ASF GitHub Bot logged work on HIVE-25643:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 16:59
Start Date: 26/Oct/21 16:59
Worklog Time Spent: 10m 
  Work Description: marton-bod merged pull request #2744:
URL: https://github.com/apache/hive/pull/2744


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 670234)
Remaining Estimate: 0h
Time Spent: 10m

> Disable replace cols and change col commands for migrated Iceberg tables
> 
>
> Key: HIVE-25643
> URL: https://issues.apache.org/jira/browse/HIVE-25643
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since the Iceberg table migration will intentionally not rewrite the data 
> files, the migrated table will end up with data files that do not contain the 
> Iceberg field IDs necessary for safe, reliable schema evolution. For this 
> reason, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands 
> for these migrated Iceberg tables. ADD COLUMNS is still permitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=670214=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-670214
 ]

ASF GitHub Bot logged work on HIVE-25553:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 16:34
Start Date: 26/Oct/21 16:34
Worklog Time Spent: 10m 
  Work Description: warriersruthi opened a new pull request #2751:
URL: https://github.com/apache/hive/pull/2751


   This covers the following sub-tasks as well:
   HIVE-25554: Upgrade arrow version to 0.15
   HIVE-2: ArrowColumnarBatchSerDe should store map natively instead of 
converting to list
   
   What changes were proposed in this pull request?
   a. Upgrading arrow version to version 0.15.0 (where map data-type is 
supported)
   b. Modifying ArrowColumnarBatchSerDe and corresponding 
Serializer/Deserializer to not use list as a workaround for map and use the 
arrow map data-type instead
   c. Taking care of creating non-nullable struct and non-nullable key type for 
the map data-type in ArrowColumnarBatchSerDe
   
   Why are the changes needed?
   Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs 
data-type (where the struct contains the key-value pair of the map).
   This causes issues when reading Map datatype using llap-ext-client as it 
reads a list of structs instead.
   HiveWarehouseConnector which uses the llap-ext-client throws exception when 
the schema (containing Map data type) is different from actual data (list of 
structs).
   This change includes the fix for this issue.
   
   Does this PR introduce any user-facing change?
   No
   
   How was this patch tested?
   Enabled back the Arrow specific tests in Hive code
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 670214)
Time Spent: 2h 50m  (was: 2h 40m)

> Support Map data-type natively in Arrow format
> --
>
> Key: HIVE-25553
> URL: https://issues.apache.org/jira/browse/HIVE-25553
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Serializers/Deserializers
>Reporter: Adesh Kumar Rao
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs 
> data-type (where the struct contains the key-value pair of the map). This 
> causes issues when reading Map datatype using llap-ext-client as it reads a 
> list of structs instead. 
> HiveWarehouseConnector which uses the llap-ext-client throws exception when 
> the schema (containing Map data type) is different from actual data (list of 
> structs).
>  
> Fixing this issue requires upgrading arrow version (where map data-type is 
> supported), modifying ArrowColumnarBatchSerDe and corresponding 
> Serializer/Deserializer to not use list as a workaround for map and use the 
> arrow map data-type instead. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25647) hadoop memo

2021-10-26 Thread St Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

St Li updated HIVE-25647:
-
Description: 
Do not care about this; it is just a test.

Stop the firewall: systemctl stop firewalld
Check firewall status: systemctl status firewalld
Select a timezone: tzselect
echo "TZ='Asia/Shanghai'; export TZ" >> /etc/profile && source /etc/profile
yum install -y ntp
vim /etc/ntp.conf  # comment out server 0-3
Add: fudge 127.127.1.0 stratum 10
/bin/systemctl restart ntpd.service
ntpdate master  # run on the slave nodes

service crond status
 /sbin/service crond start

 

 

 

 

  was:
Do not care about this; it is just a test.

Stop the firewall: systemctl stop firewalld
Check firewall status: systemctl status firewalld
Select a timezone: tzselect
echo "TZ='Asia/Shanghai'; export TZ" >> /etc/profile && source /etc/profile
yum install -y ntp
vim /etc/ntp.conf  # comment out server 0-3
Add: fudge 127.127.1.0 stratum 10
/bin/systemctl restart ntpd.service
ntpdate master  # run on the slave nodes

service crond status
/sbin/service crond start

 

 

 

 


> hadoop memo
> ---
>
> Key: HIVE-25647
> URL: https://issues.apache.org/jira/browse/HIVE-25647
> Project: Hive
>  Issue Type: Wish
>  Components: Configuration
>Affects Versions: 3.1.2
> Environment: hadoop 2.7.3
>Reporter: St Li
>Assignee: St Li
>Priority: Major
> Fix For: All Versions
>
> Attachments: worldip.csv
>
>
> Do not care about this; it is just a test.
> Stop the firewall: systemctl stop firewalld
> Check firewall status: systemctl status firewalld
> Select a timezone: tzselect
>  echo "TZ='Asia/Shanghai'; export TZ" >> /etc/profile && source /etc/profile
>  yum install -y ntp
>  vim /etc/ntp.conf  # comment out server 0-3
> Add: fudge 127.127.1.0 stratum 10
>  /bin/systemctl restart ntpd.service
>  ntpdate master  # run on the slave nodes
> service crond status
>  /sbin/service crond start
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25647) hadoop memo

2021-10-26 Thread St Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

St Li updated HIVE-25647:
-
Summary: hadoop memo  (was: hadoop command memo)

> hadoop memo
> ---
>
> Key: HIVE-25647
> URL: https://issues.apache.org/jira/browse/HIVE-25647
> Project: Hive
>  Issue Type: Wish
>  Components: Configuration
>Affects Versions: 3.1.2
> Environment: hadoop 2.7.3
>Reporter: St Li
>Assignee: St Li
>Priority: Major
> Fix For: All Versions
>
> Attachments: worldip.csv
>
>
> Do not care about this; it is just a test.
> Stop the firewall: systemctl stop firewalld
> Check firewall status: systemctl status firewalld
> Select a timezone: tzselect
> echo "TZ='Asia/Shanghai'; export TZ" >> /etc/profile && source /etc/profile
> yum install -y ntp
> vim /etc/ntp.conf  # comment out server 0-3
> Add: fudge 127.127.1.0 stratum 10
> /bin/systemctl restart ntpd.service
> ntpdate master  # run on the slave nodes
> service crond status
> /sbin/service crond start
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25628) Avoid unnecessary file ops if Iceberg table is LLAP cached

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25628?focusedWorklogId=670057=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-670057
 ]

ASF GitHub Bot logged work on HIVE-25628:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 13:04
Start Date: 26/Oct/21 13:04
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2748:
URL: https://github.com/apache/hive/pull/2748#discussion_r736498819



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/vector/HiveVectorizedReader.java
##
@@ -114,19 +115,24 @@ private HiveVectorizedReader() {
   // Need to turn positional schema evolution off since we use column 
name based schema evolution for projection
   // and Iceberg will make a mapping between the file schema and the 
current reading schema.
   job.setBoolean(OrcConf.FORCE_POSITIONAL_EVOLUTION.getHiveConfName(), 
false);
-  VectorizedReadUtils.handleIcebergProjection(inputFile, task, job);
+
+  // Iceberg currently does not track the last modification time of a 
file. Until that's added, we need to set
+  // Long.MIN_VALUE as last modification time in the fileId triplet.

Review comment:
   nit: the comment is good, but maybe add a TODO as well so we can grep 
for these tech debts later

##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/orc/VectorizedReadUtils.java
##
@@ -20,28 +20,81 @@
 package org.apache.iceberg.orc;
 
 import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.io.CacheTag;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.llap.LlapHiveUtils;
+import org.apache.hadoop.hive.llap.io.api.LlapProxy;
+import org.apache.hadoop.hive.ql.io.SyntheticFileId;
 import org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg;
+import org.apache.hadoop.hive.ql.plan.MapWork;
+import org.apache.hadoop.hive.ql.plan.PartitionDesc;
 import org.apache.hadoop.hive.ql.plan.TableScanDesc;
 import org.apache.hadoop.hive.serde2.ColumnProjectionUtils;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hive.iceberg.org.apache.orc.Reader;
 import org.apache.hive.iceberg.org.apache.orc.TypeDescription;
+import org.apache.hive.iceberg.org.apache.orc.impl.ReaderImpl;
 import org.apache.iceberg.FileScanTask;
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.expressions.Binder;
 import org.apache.iceberg.expressions.Expression;
 import org.apache.iceberg.io.InputFile;
 import org.apache.iceberg.mapping.MappingUtil;
+import org.apache.orc.impl.BufferChunk;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 /**
  * Utilities that rely on Iceberg code from org.apache.iceberg.orc package.
  */
 public class VectorizedReadUtils {
 
+  private static final Logger LOG = 
LoggerFactory.getLogger(VectorizedReadUtils.class);
+
   private VectorizedReadUtils() {
 
   }
 
+  private static TypeDescription getSchemaForFile(InputFile inputFile, 
SyntheticFileId fileId, JobConf job)
+  throws IOException {
+TypeDescription schema = null;
+
+if (HiveConf.getBoolVar(job, HiveConf.ConfVars.LLAP_IO_ENABLED, 
LlapProxy.isDaemon()) &&
+LlapProxy.getIo() != null) {
+  MapWork mapWork = LlapHiveUtils.findMapWork(job);
+  Path path = new Path(inputFile.location());
+  PartitionDesc partitionDesc = LlapHiveUtils.partitionDescForPath(path, 
mapWork.getPathToPartitionInfo());
+
+  // Note: Since Hive doesn't know about partition information of Iceberg 
tables, partitionDesc is only used to
+  // deduce the table (and DB) name here.
+  CacheTag cacheTag = HiveConf.getBoolVar(job, 
HiveConf.ConfVars.LLAP_TRACK_CACHE_USAGE) ?
+  LlapHiveUtils.getDbAndTableNameForMetrics(path, true, partitionDesc) 
: null;
+
+  try {
+// Schema has to be serialized and deserialized as it is passed 
between different packages of TypeDescription.
+BufferChunk tailBuffer = LlapProxy.getIo().getOrcTailFromCache(path, 
job, cacheTag, fileId).getTailBuffer();
+schema = ReaderImpl.extractFileTail(tailBuffer.getData()).getSchema();
+  } catch (IOException ioe) {
+LOG.warn("LLAP is turned on but was unable to get file metadata 
information through its cache.", ioe);

Review comment:
   Do we want to log the inputFile.location() here for debugging?

##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/orc/VectorizedReadUtils.java
##
@@ -50,43 +103,38 @@ private VectorizedReadUtils() {
* @param job - JobConf instance to adjust
* @throws IOException - errors relating to accessing the ORC file
*/
-  public static void handleIcebergProjection(InputFile inputFile, FileScanTask 
task, JobConf job)
-  throws IOException {
-Reader orcFileReader = 

[jira] [Assigned] (HIVE-25350) Replication fails for external tables on setting owner/groups

2021-10-26 Thread Sumit Verma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Verma reassigned HIVE-25350:
--

Assignee: Sumit Verma  (was: Ayush Saxena)

> Replication fails for external tables on setting owner/groups
> -
>
> Key: HIVE-25350
> URL: https://issues.apache.org/jira/browse/HIVE-25350
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Sumit Verma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> DirCopyTask tries to preserve user/group permissions, irrespective of whether 
> they have been specified to be preserved or not.
> Changing user/group requires SuperUser privileges, hence the task fails.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25449) datediff() gives wrong output when run in a tez task with some non-UTC timezone

2021-10-26 Thread Sumit Verma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Verma reassigned HIVE-25449:
--

Assignee: Sumit Verma  (was: Shubham Chaurasia)

> datediff() gives wrong output when run in a tez task with some non-UTC 
> timezone
> ---
>
> Key: HIVE-25449
> URL: https://issues.apache.org/jira/browse/HIVE-25449
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Shubham Chaurasia
>Assignee: Sumit Verma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Repro (thanks Qiaosong Dong) - 
> Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}}
> {code}
> create external table test_dt(id string, dt date);
> insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07');
> select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt 
> on dt1.id = dt.id;
> +--+
> | _c0  |
> +--+
> | 6|
> | 7|
> +--+
> {code}
> Expected output - 
> {code}
> +--+
> | _c0  |
> +--+
> | 5|
> | 6|
> +--+
> {code}
> *Cause*
> This happens because in the {{VectorUDFDateDiffColScalar}} class:
> 1. For the second argument (scalar), we use {{java.text.SimpleDateFormat}} to 
> parse the date strings, which interprets them in the local timezone.
> 2. For the first column we get a column vector which represents the date as an 
> epoch day. This is always in UTC.
> *Solution*
> We need to check other variants of datediff UDFs as well and change the 
> parsing mechanism to always interpret date strings in UTC. 
>  
> I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue.
> {code}
> -  date.setTime(formatter.parse(new String(bytesValue, 
> "UTF-8")).getTime());
> -  baseDate = DateWritableV2.dateToDays(date);
> +  org.apache.hadoop.hive.common.type.Date hiveDate
> +  = org.apache.hadoop.hive.common.type.Date.valueOf(new 
> String(bytesValue, "UTF-8"));
> +  date.setTime(hiveDate.toEpochMilli());
> +  baseDate = hiveDate.toEpochDay();
> {code}
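The local-timezone vs. UTC mismatch described above can be reproduced outside Hive with plain JDK classes. This is a minimal sketch, not Hive code: `DateDiffTzDemo` and its method names are illustrative, but the parsing behaviors mirror the two code paths the report contrasts.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.TimeZone;

public class DateDiffTzDemo {

    // Parses via SimpleDateFormat, which interprets the string in the JVM's
    // default timezone -- the behavior the report attributes to the scalar path.
    static long localEpochDay(String date) {
        try {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
            long millis = fmt.parse(date).getTime();
            return Math.floorDiv(millis, 86_400_000L); // millis per day
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    // Epoch day computed in UTC, matching how the date column vector stores days.
    static long utcEpochDay(String date) {
        return LocalDate.parse(date).toEpochDay();
    }

    public static void main(String[] args) {
        TimeZone.setDefault(TimeZone.getTimeZone("GMT+8"));
        // With -Duser.timezone=GMT+8 the two representations differ by one day,
        // which shifts every datediff result by one, as in the repro above.
        System.out.println("local parse: " + localEpochDay("2021-07-01"));
        System.out.println("UTC parse:   " + utcEpochDay("2021-07-01"));
    }
}
```

Switching the scalar side to a UTC-based epoch-day computation, as the quoted patch does with `Date.valueOf(...).toEpochDay()`, removes the off-by-one.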



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25614) Mapjoin then join left,the result is incorrect

2021-10-26 Thread Sumit Verma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Verma reassigned HIVE-25614:
--

Assignee: Sumit Verma

> Mapjoin then join left,the result is incorrect
> --
>
> Key: HIVE-25614
> URL: https://issues.apache.org/jira/browse/HIVE-25614
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.2
>Reporter: zengxl
>Assignee: Sumit Verma
>Priority: Critical
>
> Currently I join 3 tables and find that the result of the left join is 
> *{color:#de350b}null{color}*.
>  Here is my SQL; the result of this SQL is NULL:
> {code:java}
> // code placeholder
> CREATE TABLE 
> `pdwd.pdwd_pf_hive_ah3_metastore_tbl_partitions_d_test_0831_3_tbl`(
>   `tbl_id` bigint COMMENT 'TBL_ID', 
>   `tbl_create_time` bigint COMMENT 'TBL_CREATE_TIME', 
>   `db_id` bigint COMMENT 'DB_ID', 
>   `tbl_last_access_time` bigint COMMENT 'TBL_LAST_ACCESS_TIME', 
>   `owner` string COMMENT 'OWNER', 
>   `retention` bigint COMMENT 'RETENTION', 
>   `sd_id` bigint COMMENT 'SD_ID', 
>   `tbl_name` string COMMENT 'TBL_NAME', 
>   `tbl_type` string COMMENT 'TBL_TYPE', 
>   `view_expanded_text` string COMMENT 'VIEW_EXPANDED_TEXT', 
>   `view_original_text` string COMMENT 'VIEW_ORIGINAL_TEXT', 
>   `is_rewrite_enabled` bigint COMMENT 'IS_REWRITE_ENABLED', 
>   `tbl_owner_type` string COMMENT 'TBL_OWNER_TYPE', 
>   `cd_id` bigint COMMENT 'CD_ID', 
>   `input_format` string COMMENT 'INPUT_FORMAT', 
>   `is_compressed` bigint COMMENT 'IS_COMPRESSED', 
>   `is_storedassubdirectories` bigint COMMENT 'IS_STOREDASSUBDIRECTORIES', 
>   `tbl_or_part_location` string COMMENT 'tbl_or_part_location', 
>   `num_buckets` bigint COMMENT 'NUM_BUCKETS', 
>   `output_format` string COMMENT 'OUTPUT_FORMAT', 
>   `serde_id` bigint COMMENT 'SERDE_ID',
>   `part_id` bigint COMMENT 'PART_ID', 
>   `part_create_time` bigint COMMENT 'PART_CREATE_TIME', 
>   `part_last_access_time` bigint COMMENT 'PART_LAST_ACCESS_TIME', 
>   `part_name` string COMMENT 'PART_NAME')
> PARTITIONED BY ( 
>   `pt` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
>   
>   with tmp1 as (
> select
> part_id,
> create_time,
> last_access_time,
> part_name,
> sd_id,
> tbl_id
> from
> pods.pods_pf_hive_ah3_metastore_partitions_d
> where
> pt='2021-08-12'
> ),
> tmp2 as (
> select
> tbl_id,
> create_time,
> db_id,
> last_access_time,
> owner,
> retention,
> sd_id,
> tbl_name,
> tbl_type,
> view_expanded_text,
> view_original_text,
> is_rewrite_enabled,
> owner_type
> from
> pods.pods_pf_hive_ah3_metastore_tbls_d
> where
> pt='2021-08-12'
> ),
> tmp3 as (
> select
> sd_id,
> cd_id,
> input_format,
> is_compressed,
> is_storedassubdirectories,
> location,
> num_buckets,
> output_format,
> serde_id
> from
> pods.pods_pf_hive_ah3_metastore_sds_d
> where
> pt='2021-08-12'
> )insert overwrite table 
> pdwd.pdwd_pf_hive_ah3_metastore_tbl_partitions_d_test_0831_3_tbl 
> PARTITION(pt='2021-08-14')
> select
> a.tbl_id,
> b.create_time as tbl_create_time,
> b.db_id,
> b.last_access_time as tbl_last_access_time,
> b.owner,
> b.retention,
> a.sd_id,
> b.tbl_name,
> b.tbl_type,
> b.view_expanded_text,
> b.view_original_text,
> b.is_rewrite_enabled,
> b.owner_type as tbl_owner_type,
> d.cd_id,
> d.input_format,
> d.is_compressed,
> d.is_storedassubdirectories,
> d.location as tbl_location,
> d.num_buckets,
> d.output_format,
> d.serde_id,
> a.part_id,
> a.create_time as part_create_time,
> a.last_access_time as part_last_access_time,
> a.part_name
> from tmp1 a 
> left join tmp2 b on a.tbl_id=b.tbl_id
> left join tmp3 d on a.sd_id=d.sd_id;
> {code}
> pods.pods_pf_hive_ah3_metastore_partitions_d, pods.pods_pf_hive_ah3_metastore_tbls_d, and pods.pods_pf_hive_ah3_metastore_sds_d
>   are from the {color:#de350b}*Metastore*{color} partitions, tbls, and sds tables.
> The sizes of the three tables are as follows:
> 80.3 M 240.9 M 
> hdfs://ns4/apps/hive/warehouse/pods/pods_pf_hive_ah3_metastore_partitions_d/pt=2021-10-09/exchangis_hive_w__2585cbd4_8bf8_4fbb_8a90_f5a7939b62b3.snappy
>  179.8 K 539.5 K 
> hdfs://ns4/apps/hive/warehouse/pods/pods_pf_hive_ah3_metastore_tbls_d/pt=2021-10-09/exchangis_hive_w__8a62acaa_6f82_442e_97db_ce960833612f.snappy
>  94.3 M 282.9 M 
> hdfs://ns4/apps/hive/warehouse/pods/pods_pf_hive_ah3_metastore_sds_d/pt=2021-10-09/exchangis_hive_w__d25536e8_7018_4262_a00e_5af3b1f88925.snappy
> The result is as follows; selecting from the 
> pdwd.pdwd_pf_hive_ah3_metastore_tbl_partitions_d_test_0831_3_tbl table 
> *{color:#de350b}returns null{color}*:
> {code:java}
> hive> select * from 
> pdwd.pdwd_pf_hive_ah3_metastore_tbl_partitions_d_test_0831_3_tbl where 
> pt='2021-08-14' and sd_id=21229815;
> OK
> 721213 

[jira] [Work logged] (HIVE-25650) Make workerId and workerVersionId optional in the FindNextCompactRequest

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25650?focusedWorklogId=670044=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-670044
 ]

ASF GitHub Bot logged work on HIVE-25650:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 12:18
Start Date: 26/Oct/21 12:18
Worklog Time Spent: 10m 
  Work Description: vcsomor commented on pull request #2749:
URL: https://github.com/apache/hive/pull/2749#issuecomment-951881123


   @lcspinter please review this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 670044)
Time Spent: 20m  (was: 10m)

> Make workerId and workerVersionId optional in the FindNextCompactRequest
> 
>
> Key: HIVE-25650
> URL: https://issues.apache.org/jira/browse/HIVE-25650
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Viktor Csomor
>Assignee: Viktor Csomor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In hive_metastore.thrift the FindNextCompactRequest struct's fields are 
> required:
> {code}
> struct FindNextCompactRequest {
> 1: required string workerId,
> 2: required string workerVersion
> }{code}
> these should probably be made optional, to avoid breaking compaction if 
> they're not available.
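A sketch of the proposed relaxation, assuming the existing field IDs are kept (in Thrift, keeping the IDs and loosening `required` to `optional` preserves wire compatibility with older clients):

```thrift
struct FindNextCompactRequest {
1: optional string workerId,
2: optional string workerVersion
}
```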



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25553) Support Map data-type natively in Arrow format

2021-10-26 Thread Sankar Hariappan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434285#comment-17434285
 ] 

Sankar Hariappan edited comment on HIVE-25553 at 10/26/21, 11:33 AM:
-

[~kgyrtkirk] My bad, I noticed the green tick in the title and assumed the 
tests had passed, but missed the "tests-failed" tag. 
Thanks for reverting the patch!

[~warriersruthi], Could you pls resubmit the patch and fix those test failures?


was (Author: sankarh):
[~kgyrtkirk] My bad, I noticed the green tick in the title and missed the 
"tests-failed" tag. 
Thanks for reverting the patch!

[~warriersruthi], Could you pls resubmit the patch and fix those test failures?

> Support Map data-type natively in Arrow format
> --
>
> Key: HIVE-25553
> URL: https://issues.apache.org/jira/browse/HIVE-25553
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Serializers/Deserializers
>Reporter: Adesh Kumar Rao
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs 
> data-type (where the struct contains the key-value pair of the map). This 
> causes issues when reading Map datatype using llap-ext-client as it reads a 
> list of structs instead. 
> HiveWarehouseConnector which uses the llap-ext-client throws exception when 
> the schema (containing Map data type) is different from actual data (list of 
> structs).
>  
> Fixing this issue requires upgrading arrow version (where map data-type is 
> supported), modifying ArrowColumnarBatchSerDe and corresponding 
> Serializer/Deserializer to not use list as a workaround for map and use the 
> arrow map data-type instead. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25553) Support Map data-type natively in Arrow format

2021-10-26 Thread Sankar Hariappan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434285#comment-17434285
 ] 

Sankar Hariappan commented on HIVE-25553:
-

[~kgyrtkirk] My bad, I noticed the green tick in the title and missed the 
"tests-failed" tag. 
Thanks for reverting the patch!

[~warriersruthi], Could you pls resubmit the patch and fix those test failures?

> Support Map data-type natively in Arrow format
> --
>
> Key: HIVE-25553
> URL: https://issues.apache.org/jira/browse/HIVE-25553
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Serializers/Deserializers
>Reporter: Adesh Kumar Rao
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs 
> data-type (where the struct contains the key-value pair of the map). This 
> causes issues when reading Map datatype using llap-ext-client as it reads a 
> list of structs instead. 
> HiveWarehouseConnector which uses the llap-ext-client throws exception when 
> the schema (containing Map data type) is different from actual data (list of 
> structs).
>  
> Fixing this issue requires upgrading arrow version (where map data-type is 
> supported), modifying ArrowColumnarBatchSerDe and corresponding 
> Serializer/Deserializer to not use list as a workaround for map and use the 
> arrow map data-type instead. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=670004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-670004
 ]

ASF GitHub Bot logged work on HIVE-25553:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 10:45
Start Date: 26/Oct/21 10:45
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #2689:
URL: https://github.com/apache/hive/pull/2689#issuecomment-951811525


   reverted from master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 670004)
Time Spent: 2h 40m  (was: 2.5h)

> Support Map data-type natively in Arrow format
> --
>
> Key: HIVE-25553
> URL: https://issues.apache.org/jira/browse/HIVE-25553
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Serializers/Deserializers
>Reporter: Adesh Kumar Rao
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs 
> data-type (where the struct contains the key-value pair of the map). This 
> causes issues when reading Map datatype using llap-ext-client as it reads a 
> list of structs instead. 
> HiveWarehouseConnector which uses the llap-ext-client throws exception when 
> the schema (containing Map data type) is different from actual data (list of 
> structs).
>  
> Fixing this issue requires upgrading arrow version (where map data-type is 
> supported), modifying ArrowColumnarBatchSerDe and corresponding 
> Serializer/Deserializer to not use list as a workaround for map and use the 
> arrow map data-type instead. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HIVE-25553) Support Map data-type natively in Arrow format

2021-10-26 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reopened HIVE-25553:
-

reverted from master:
* it was committed without a clean testrun
* 5 tests were broken by these changes
** one of the tests is clearly Arrow 
related (org.apache.hadoop.hive.ql.io.arrow.TestSerializer)
http://ci.hive.apache.org/job/hive-precommit/job/master/lastCompletedBuild/testReport/junit/org.apache.hadoop.hive.ql.io.arrow/TestSerializer/Testing___split_06___PostProcess___testEmptyComplexStruct/

[~sankarh] Why did you merge the changes even though the PR was marked as 
tests-failed? It didn't even have a green testrun!
http://ci.hive.apache.org/job/hive-precommit/job/PR-2689/



> Support Map data-type natively in Arrow format
> --
>
> Key: HIVE-25553
> URL: https://issues.apache.org/jira/browse/HIVE-25553
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Serializers/Deserializers
>Reporter: Adesh Kumar Rao
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs 
> data-type (where the struct contains the key-value pair of the map). This 
> causes issues when reading Map datatype using llap-ext-client as it reads a 
> list of structs instead. 
> HiveWarehouseConnector which uses the llap-ext-client throws exception when 
> the schema (containing Map data type) is different from actual data (list of 
> structs).
>  
> Fixing this issue requires upgrading arrow version (where map data-type is 
> supported), modifying ArrowColumnarBatchSerDe and corresponding 
> Serializer/Deserializer to not use list as a workaround for map and use the 
> arrow map data-type instead. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24590) Operation Logging still leaks the log4j Appenders

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24590?focusedWorklogId=669979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669979
 ]

ASF GitHub Bot logged work on HIVE-24590:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 09:39
Start Date: 26/Oct/21 09:39
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #2432:
URL: https://github.com/apache/hive/pull/2432#issuecomment-951762656


   > So, what would the effect be if the time were to pass and another write 
request came in? Would it create a second file? Append to the end of an 
existing file? Are the original log files always deleted when the logger goes 
idle?
   
   The deletion of the log files is not handled by Log4j (not in this case at 
least). When the logger goes idle, the appender is closed along with the 
respective file descriptor but the file remains as is. When the appender comes 
back it will write to the same file; if for whatever reason the file is not 
there it will create a file with the same name and start writing there.
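The close-then-reopen behavior described above can be illustrated with plain java.io, independent of Log4j internals. This is a minimal sketch; `ReopenDemo` is illustrative and only mimics an appender that is closed when idle and re-created on the next write.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReopenDemo {

    // Open in append mode, write one line, close -- mimicking an appender
    // that is closed while idle and re-created on the next write request.
    static void append(Path log, String line) {
        try (FileWriter w = new FileWriter(log.toFile(), true)) {
            w.write(line + System.lineSeparator());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<String> runDemo() {
        try {
            Path log = Files.createTempDirectory("oplog").resolve("operation.log");
            append(log, "first");
            append(log, "second");       // reopened: appends to the same file
            Files.delete(log);           // file removed while the writer is closed
            append(log, "after-delete"); // a fresh file with the same name appears
            return Files.readAllLines(log);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Only the line written after the deletion survives in the new file.
        System.out.println(runDemo());
    }
}
```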


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 669979)
Time Spent: 4h  (was: 3h 50m)

> Operation Logging still leaks the log4j Appenders
> -
>
> Key: HIVE-24590
> URL: https://issues.apache.org/jira/browse/HIVE-24590
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Eugene Chung
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot 
> 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen 
> Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, 
> Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> I'm using Hive 3.1.2 with options below.
>  * hive.server2.logging.operation.enabled=true
>  * hive.server2.logging.operation.level=VERBOSE
>  * hive.async.log.enabled=false
> I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 
> but HS2 still leaks log4j RandomAccessFileManager.
> !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197!
> I checked the operation log file which is not closed/deleted properly.
> !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272!
> Then there's the log,
> {code:java}
> client.TezClient: Shutting down Tez Session, sessionName= {code}
> !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-19825) HiveServer2 leader selection shall use different zookeeper znode

2021-10-26 Thread Ranith Sardar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-19825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434228#comment-17434228
 ] 

Ranith Sardar commented on HIVE-19825:
--

Tested a similar scenario with the patch. Patch LGTM.

> HiveServer2 leader selection shall use different zookeeper znode
> 
>
> Key: HIVE-19825
> URL: https://issues.apache.org/jira/browse/HIVE-19825
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-19825.1.patch
>
>
> Currently, HiveServer2 leader selection (used only by privilegesynchronizer 
> now) reuses the /hiveserver2 parent znode, which is already used for HiveServer2 
> service discovery. This interferes with service discovery. I'd like to switch 
> to a different znode, /hiveserver2-leader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25651) Enable LLAP cache affinity for Iceberg ORC splits

2021-10-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita reassigned HIVE-25651:
-


> Enable LLAP cache affinity for Iceberg ORC splits
> -
>
> Key: HIVE-25651
> URL: https://issues.apache.org/jira/browse/HIVE-25651
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> Since HiveIcebergInputformat doesn't implement any LLAP marker interfaces, 
> cache affinity is never attempted, so any split containing ORC file parts may 
> go to a random LLAP daemon, causing a subpar cache hit ratio later.
> So we should:
>  * let HS2 know that cache affinity is required for this inputformat
>  * prevent Iceberg from grouping separate files together in one combined 
> split in case of LLAP execution
>  * provide proper getPath() result for HiveIcebergSplit, so that 
> HostAffinitySplitLocationProvider calculates different hashes for different 
> files (right now getPath() returns table location only)
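The affinity problem in the last bullet can be illustrated with generic hash-based split placement (an illustrative sketch, not HostAffinitySplitLocationProvider itself; the daemon names and paths are made up):

```python
import hashlib

DAEMONS = ["llap-0", "llap-1", "llap-2"]   # hypothetical daemon list

def pick_daemon(split_path: str) -> str:
    # Deterministically map a split's path to a daemon, as an
    # affinity-based split location provider would.
    digest = int(hashlib.md5(split_path.encode()).hexdigest(), 16)
    return DAEMONS[digest % len(DAEMONS)]

# getPath() returning only the table location: every split hashes identically,
# so all splits land on a single daemon and per-file affinity is lost.
collapsed = {pick_daemon("/warehouse/tbl") for _ in range(50)}

# Distinct per-file paths spread the splits, so each daemon caches "its" files.
spread = {pick_daemon(f"/warehouse/tbl/data/{i:05d}.orc") for i in range(50)}

print(len(collapsed), len(spread))   # collapsed -> 1 daemon; spread -> several
```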



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25650) Make workerId and workerVersionId optional in the FindNextCompactRequest

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25650:
--
Labels: pull-request-available  (was: )

> Make workerId and workerVersionId optional in the FindNextCompactRequest
> 
>
> Key: HIVE-25650
> URL: https://issues.apache.org/jira/browse/HIVE-25650
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Viktor Csomor
>Assignee: Viktor Csomor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In hive_metastore.thrift the FindNextCompactRequest struct's fields are 
> required:
> {code}
> struct FindNextCompactRequest {
> 1: required string workerId,
> 2: required string workerVersion
> }{code}
> these should probably be made optional, to avoid breaking compaction if 
> they're not available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25628) Avoid unnecessary file ops if Iceberg table is LLAP cached

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25628:
--
Labels: pull-request-available  (was: )

> Avoid unnecessary file ops if Iceberg table is LLAP cached
> --
>
> Key: HIVE-25628
> URL: https://issues.apache.org/jira/browse/HIVE-25628
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In case the query execution is vectorized for an Iceberg table, we need to 
> make an extra file open operation on the ORC file to learn what the file 
> schema is (to be matched later with the logical schema).
> In LLAP configuration the file schema could be retrieved through LLAP cache 
> as ORC metadata is cached, so we should avoid the file operation when 
> possible.
> Also: LLAP relies on cache keys that are usually triplets of file information 
> and are constructed by an FS.listStatus call. For Iceberg tables we should 
> rely on such file information provided by Iceberg's metadata to spare this 
> call too.
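The last point amounts to building the cache-key triplet from metadata already in hand instead of an extra listStatus round trip (a minimal sketch under assumed field names, not Hive's actual types):

```python
from typing import NamedTuple

class CacheKey(NamedTuple):
    # LLAP-style file identity: path plus attributes that change on rewrite.
    path: str
    mtime: int
    length: int

def key_via_liststatus(fs):
    # The pattern we want to avoid: a FileSystem round trip per file.
    st = fs.list_status("/warehouse/tbl/data/00000.orc")
    return CacheKey(st["path"], st["mtime"], st["len"])

def key_via_iceberg_meta(data_file: dict) -> CacheKey:
    # Same triplet, read from Iceberg's own manifest entry -- no FS call.
    return CacheKey(data_file["path"], data_file["mtime"], data_file["size"])

key = key_via_iceberg_meta(
    {"path": "/warehouse/tbl/data/00000.orc", "mtime": 1_635_000_000, "size": 4096})
print(key)
```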



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25650) Make workerId and workerVersionId optional in the FindNextCompactRequest

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25650?focusedWorklogId=669967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669967
 ]

ASF GitHub Bot logged work on HIVE-25650:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 09:13
Start Date: 26/Oct/21 09:13
Worklog Time Spent: 10m 
  Work Description: vcsomor opened a new pull request #2749:
URL: https://github.com/apache/hive/pull/2749


   The `workerId` and the `workerVersionId` have been made optional in the 
FindNextCompactRequest
   - The two fields in the hive_metastore.thrift have been made optional
   - Test and code adjusted accordingly
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 669967)
Remaining Estimate: 0h
Time Spent: 10m

> Make workerId and workerVersionId optional in the FindNextCompactRequest
> 
>
> Key: HIVE-25650
> URL: https://issues.apache.org/jira/browse/HIVE-25650
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Viktor Csomor
>Assignee: Viktor Csomor
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In hive_metastore.thrift the FindNextCompactRequest struct's fields are 
> required:
> {code}
> struct FindNextCompactRequest {
> 1: required string workerId,
> 2: required string workerVersion
> }{code}
> these should probably be made optional, to avoid breaking compaction if 
> they're not available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25628) Avoid unnecessary file ops if Iceberg table is LLAP cached

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25628?focusedWorklogId=669965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669965
 ]

ASF GitHub Bot logged work on HIVE-25628:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 09:13
Start Date: 26/Oct/21 09:13
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #2748:
URL: https://github.com/apache/hive/pull/2748


   In case the query execution is vectorized for an Iceberg table, we need to 
make an extra file open operation on the ORC file to learn what the file schema 
is (to be matched later with the logical schema).
   
   In LLAP configuration the file schema could be retrieved through LLAP cache 
as ORC metadata is cached, so we should avoid the file operation when possible.
   
   Also: LLAP relies on cache keys that are usually triplets of file 
information and are constructed by an FS.listStatus call. For Iceberg tables we 
should rely on such file information provided by Iceberg's metadata to spare 
this call too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 669965)
Remaining Estimate: 0h
Time Spent: 10m

> Avoid unnecessary file ops if Iceberg table is LLAP cached
> --
>
> Key: HIVE-25628
> URL: https://issues.apache.org/jira/browse/HIVE-25628
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In case the query execution is vectorized for an Iceberg table, we need to 
> make an extra file open operation on the ORC file to learn what the file 
> schema is (to be matched later with the logical schema).
> In LLAP configuration the file schema could be retrieved through LLAP cache 
> as ORC metadata is cached, so we should avoid the file operation when 
> possible.
> Also: LLAP relies on cache keys that are usually triplets of file information 
> and are constructed by an FS.listStatus call. For Iceberg tables we should 
> rely on such file information provided by Iceberg's metadata to spare this 
> call too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25642) Log a warning if multiple Compaction Worker versions are running compactions

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25642?focusedWorklogId=669963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669963
 ]

ASF GitHub Bot logged work on HIVE-25642:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 09:10
Start Date: 26/Oct/21 09:10
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2743:
URL: https://github.com/apache/hive/pull/2743#discussion_r736302385



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##
@@ -571,6 +571,12 @@ public static ConfVars getMetaConf(String name) {
 "tables or partitions to be compacted once they are determined to 
need compaction.\n" +
 "It will also increase the background load on the Hadoop cluster 
as more MapReduce jobs\n" +
 "will be running in the background."),
+
COMPACTOR_WORKER_DETECT_MULTIPLE_VERSION_THRESHOLD("metastore.compactor.worker.detect_multiple_versions.threshold",

Review comment:
   Could you please use `.` instead of `_` in the config parameter name? 
   `metastore.compactor.worker.detect.multiple.versions.threshold`

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/AcidMetricService.java
##
@@ -104,12 +110,43 @@ public void run() {
 }
   }
 
-  private void collectMetrics() throws MetaException {
-ShowCompactResponse currentCompactions = txnHandler.showCompact(new 
ShowCompactRequest());
+  private void detectMultipleWorkerVersions(ShowCompactResponse 
currentCompactions) {
+long workerVersionThresholdInHours = MetastoreConf.getLongVar(conf,
+
MetastoreConf.ConfVars.COMPACTOR_WORKER_DETECT_MULTIPLE_VERSION_THRESHOLD);
+long since = System.currentTimeMillis() - 
hoursInMillis(workerVersionThresholdInHours);
+
+List versions = 
collectWorkerVersions(currentCompactions.getCompacts(), since);
+if (versions.size() > 1) {
+  LOG.warn("Multiple Compaction Worker versions detected: {}", versions);
+}
+  }
+
+  private void updateMetrics(ShowCompactResponse currentCompactions) throws 
MetaException {
 updateMetricsFromShowCompact(currentCompactions, conf);
 updateDBMetrics();
   }
 
+  @VisibleForTesting
+  public static long hoursInMillis(long hours) {

Review comment:
   You don't need this method, if you get the conf value using 
   `MetastoreConf.getTimeVar(conf, 
MetastoreConf.ConfVars.COMPACTOR_WORKER_DETECT_MULTIPLE_VERSION_THRESHOLD, 
TimeUnit.MILLISECONDS)`

##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##
@@ -571,6 +571,12 @@ public static ConfVars getMetaConf(String name) {
 "tables or partitions to be compacted once they are determined to 
need compaction.\n" +
 "It will also increase the background load on the Hadoop cluster 
as more MapReduce jobs\n" +
 "will be running in the background."),
+
COMPACTOR_WORKER_DETECT_MULTIPLE_VERSION_THRESHOLD("metastore.compactor.worker.detect_multiple_versions.threshold",
+  "hive.metastore.compactor.worker.detect_versions.threshold", 24,

Review comment:
   You should define the default time unit. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 669963)
Time Spent: 0.5h  (was: 20m)

> Log a warning if multiple Compaction Worker versions are running compactions
> 
>
> Key: HIVE-25642
> URL: https://issues.apache.org/jira/browse/HIVE-25642
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Viktor Csomor
>Assignee: Viktor Csomor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Log a warning if multiple versions of the Compaction Worker are running 
> compactions.
> The start times of the individual HMS services are not stored at the moment; 
> however, this information could provide a good baseline for detecting multiple 
> Worker versions. 
> Due to the lack of this information, we can periodically check the past N 
> hours to detect the versions.
> The N hours can be configured by the 
> {{metastore.compactor.worker.detect_multiple_versions.threshold}} property.
> This periodic check only makes sense if compactions are enabled.
[jira] [Work started] (HIVE-25628) Avoid unnecessary file ops if Iceberg table is LLAP cached

2021-10-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25628 started by Ádám Szita.
-
> Avoid unnecessary file ops if Iceberg table is LLAP cached
> --
>
> Key: HIVE-25628
> URL: https://issues.apache.org/jira/browse/HIVE-25628
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> In case the query execution is vectorized for an Iceberg table, we need to 
> make an extra file open operation on the ORC file to learn what the file 
> schema is (to be matched later with the logical schema).
> In LLAP configuration the file schema could be retrieved through LLAP cache 
> as ORC metadata is cached, so we should avoid the file operation when 
> possible.
> Also: LLAP relies on cache keys that are usually triplets of file information 
> and are constructed by an FS.listStatus call. For Iceberg tables we should 
> rely on such file information provided by Iceberg's metadata to spare this 
> call too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25650) Make workerId and workerVersionId optional in the FindNextCompactRequest

2021-10-26 Thread Viktor Csomor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Csomor updated HIVE-25650:
-
Summary: Make workerId and workerVersionId optional in the 
FindNextCompactRequest  (was: Make WorkerId and WorkerVersionId optional in the 
FindNextCompactRequest)

> Make workerId and workerVersionId optional in the FindNextCompactRequest
> 
>
> Key: HIVE-25650
> URL: https://issues.apache.org/jira/browse/HIVE-25650
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Viktor Csomor
>Assignee: Viktor Csomor
>Priority: Minor
>
> In hive_metastore.thrift the FindNextCompactRequest struct's fields are 
> required:
> {code}
> struct FindNextCompactRequest {
> 1: required string workerId,
> 2: required string workerVersion
> }{code}
> these should probably be made optional, to avoid breaking compaction if 
> they're not available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25650) Make WorkerId and WorkerVersionId optional in the FindNextCompactRequest

2021-10-26 Thread Csomor Viktor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csomor Viktor reassigned HIVE-25650:


Assignee: Csomor Viktor

> Make WorkerId and WorkerVersionId optional in the FindNextCompactRequest
> 
>
> Key: HIVE-25650
> URL: https://issues.apache.org/jira/browse/HIVE-25650
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Csomor Viktor
>Assignee: Csomor Viktor
>Priority: Minor
>
> In hive_metastore.thrift the FindNextCompactRequest struct's fields are 
> required:
> {code}
> struct FindNextCompactRequest {
> 1: required string workerId,
> 2: required string workerVersion
> }{code}
> these should probably be made optional, to avoid breaking compaction if 
> they're not available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25642) Log a warning if multiple Compaction Worker versions are running compactions

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25642?focusedWorklogId=669952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669952
 ]

ASF GitHub Bot logged work on HIVE-25642:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 08:39
Start Date: 26/Oct/21 08:39
Worklog Time Spent: 10m 
  Work Description: vcsomor commented on pull request #2743:
URL: https://github.com/apache/hive/pull/2743#issuecomment-951714388


   @lcspinter could you please check


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 669952)
Time Spent: 20m  (was: 10m)

> Log a warning if multiple Compaction Worker versions are running compactions
> 
>
> Key: HIVE-25642
> URL: https://issues.apache.org/jira/browse/HIVE-25642
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Csomor Viktor
>Assignee: Csomor Viktor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Log a warning if multiple versions of the Compaction Worker are running 
> compactions.
> The start times of the individual HMS services are not stored at the moment; 
> however, this information could provide a good baseline for detecting multiple 
> Worker versions. 
> Due to the lack of this information, we can periodically check the past N 
> hours to detect the versions.
> The N hours can be configured by the 
> {{metastore.compactor.worker.detect_multiple_versions.threshold}} property.
> This periodic check only makes sense if compactions are enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25638) Select returns deleted records in Hive ACID table

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25638:
--
Labels: pull-request-available  (was: )

> Select returns deleted records in Hive ACID table
> -
>
> Key: HIVE-25638
> URL: https://issues.apache.org/jira/browse/HIVE-25638
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive stores the stripe stats in the ORC files. During select, these stats are 
> used to create the SARG. The SARG is used to reduce the records read from the 
> delete-delta files. Currently, when the number of stripes is more than one, 
> the generated SARG is incorrect: it uses the first stripe's index for both the 
> min and the max key interval, while the max key interval should be obtained 
> from the last stripe's index. This causes some valid deleted records to be 
> skipped, and those records are returned to the user.
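The faulty versus corrected key-interval derivation can be shown on toy per-stripe stats (an illustrative sketch with integer keys; Hive's real keys are ORC RecordIdentifiers):

```python
# Hypothetical (minKey, maxKey) stats per stripe, in stripe order.
stripe_stats = [(10, 45), (46, 90), (91, 130)]

def key_interval_buggy(stats):
    # Bug: both bounds taken from the first stripe, so the SARG's max key
    # is 45 and deletes for rows keyed 46..130 are wrongly filtered out.
    return stats[0][0], stats[0][1]

def key_interval_fixed(stats):
    # Fix: min from the first stripe, max from the last stripe.
    return stats[0][0], stats[-1][1]

print(key_interval_buggy(stripe_stats))  # (10, 45)
print(key_interval_fixed(stripe_stats))  # (10, 130)
```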



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25638) Select returns deleted records in Hive ACID table

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25638?focusedWorklogId=669941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669941
 ]

ASF GitHub Bot logged work on HIVE-25638:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 08:02
Start Date: 26/Oct/21 08:02
Worklog Time Spent: 10m 
  Work Description: maheshk114 opened a new pull request #2747:
URL: https://github.com/apache/hive/pull/2747


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 669941)
Remaining Estimate: 0h
Time Spent: 10m

> Select returns deleted records in Hive ACID table
> -
>
> Key: HIVE-25638
> URL: https://issues.apache.org/jira/browse/HIVE-25638
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive stores the stripe stats in the ORC files. During select, these stats are 
> used to create the SARG. The SARG is used to reduce the records read from the 
> delete-delta files. Currently, when the number of stripes is more than one, 
> the generated SARG is incorrect: it uses the first stripe's index for both the 
> min and the max key interval, while the max key interval should be obtained 
> from the last stripe's index. This causes some valid deleted records to be 
> skipped, and those records are returned to the user.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)