[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=612959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612959
 ]

ASF GitHub Bot logged work on HIVE-25268:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 05:29
Start Date: 22/Jun/21 05:29
Worklog Time Spent: 10m 
  Work Description: guptanikhil007 commented on a change in pull request 
#2409:
URL: https://github.com/apache/hive/pull/2409#discussion_r655886832



##
File path: ql/src/test/queries/clientpositive/udf_date_format.q
##
@@ -78,3 +78,16 @@ select date_format("2015-04-08 10:30:45","yyyy-MM-dd HH:mm:ss.SSS z");
 --julian date
 set hive.local.time.zone=UTC;
 select date_format("1001-01-05","dd---MM--yyyy");
+
+--dates prior to 1900
+set hive.local.time.zone=Asia/Bangkok;
+select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');
+select date_format('1800-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');
+
+set hive.local.time.zone=Europe/Berlin;
+select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');
+select date_format('1800-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');
+
+set hive.local.time.zone=Africa/Johannesburg;
+select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');

Review comment:
   Once this patch is merged, I will update the Hive wiki as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612959)
Time Spent: 3h 50m  (was: 3h 40m)

> date_format udf doesn't work for dates prior to 1900 if the timezone is 
> different from UTC
> --
>
> Key: HIVE-25268
> URL: https://issues.apache.org/jira/browse/HIVE-25268
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> *Hive 1.2.1*:
> {code:java}
>  select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+--+
> |           _c0            |
> +--------------------------+--+
> | 1400-01-14 01:00:00 ICT  |
> +--------------------------+--+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+--+
> |           _c0            |
> +--------------------------+--+
> | 1800-01-14 01:00:00 ICT  |
> +--------------------------+--+
> {code}
> *Hive 3.1, Hive 4.0:*
> {code:java}
> select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1400-01-06 01:17:56 ICT  |
> +--------------------------+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1800-01-14 01:17:56 ICT  |
> +--------------------------+
> {code}
> VM timezone is set to 'Asia/Bangkok'
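> For context: the 17m56s shift in the Hive 3.1/4.0 output matches the gap between Bangkok's modern +07:00 offset and the pre-1920 local-mean-time offset (+06:42:04) recorded in the JDK's tzdata, and the extra day shift for year 1400 is consistent with Julian vs proleptic-Gregorian calendar handling. A minimal java.time sketch (standard library only, not Hive code) showing the two offsets:
> {code:java}
> import java.time.Instant;
> import java.time.ZoneId;
>
> public class BangkokOffsetDemo {
>   public static void main(String[] args) {
>     ZoneId bangkok = ZoneId.of("Asia/Bangkok");
>     // Modern instant: the familiar +07:00 offset.
>     System.out.println(bangkok.getRules().getOffset(Instant.parse("2021-06-21T00:00:00Z")));
>     // Pre-1900 instant: local mean time +06:42:04, i.e. 17m56s behind +07:00,
>     // which is the shift visible in the Hive 3.1/4.0 output above.
>     System.out.println(bangkok.getRules().getOffset(Instant.parse("1400-01-14T00:00:00Z")));
>   }
> }
> {code}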



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=612958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612958
 ]

ASF GitHub Bot logged work on HIVE-25268:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 05:28
Start Date: 22/Jun/21 05:28
Worklog Time Spent: 10m 
  Work Description: guptanikhil007 commented on a change in pull request 
#2409:
URL: https://github.com/apache/hive/pull/2409#discussion_r655886389



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java
##
@@ -111,17 +123,18 @@ public Object evaluate(DeferredObject[] arguments) throws 
HiveException {
 // the function should support both short date and full timestamp format
 // time part of the timestamp should not be skipped
 Timestamp ts = getTimestampValue(arguments, 0, tsConverters);
+
 if (ts == null) {
   Date d = getDateValue(arguments, 0, dtInputTypes, dtConverters);
   if (d == null) {
 return null;
   }
   ts = Timestamp.ofEpochMilli(d.toEpochMilli(id), id);
 }
-
-
-date.setTime(ts.toEpochMilli(id));
-String res = formatter.format(date);
+Timestamp ts2 = TimestampTZUtil.convertTimestampToZone(ts, timeZone, 
ZoneId.of("UTC"));

Review comment:
   done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612958)
Time Spent: 3h 40m  (was: 3.5h)

> date_format udf doesn't work for dates prior to 1900 if the timezone is 
> different from UTC
> --
>
> Key: HIVE-25268
> URL: https://issues.apache.org/jira/browse/HIVE-25268
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> *Hive 1.2.1*:
> {code:java}
>  select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+--+
> |           _c0            |
> +--------------------------+--+
> | 1400-01-14 01:00:00 ICT  |
> +--------------------------+--+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+--+
> |           _c0            |
> +--------------------------+--+
> | 1800-01-14 01:00:00 ICT  |
> +--------------------------+--+
> {code}
> *Hive 3.1, Hive 4.0:*
> {code:java}
> select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1400-01-06 01:17:56 ICT  |
> +--------------------------+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1800-01-14 01:17:56 ICT  |
> +--------------------------+
> {code}
> VM timezone is set to 'Asia/Bangkok'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=612955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612955
 ]

ASF GitHub Bot logged work on HIVE-25268:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 04:42
Start Date: 22/Jun/21 04:42
Worklog Time Spent: 10m 
  Work Description: guptanikhil007 commented on a change in pull request 
#2409:
URL: https://github.com/apache/hive/pull/2409#discussion_r655870639



##
File path: ql/src/test/queries/clientpositive/udf_date_format.q
##
@@ -78,3 +78,16 @@ select date_format("2015-04-08 10:30:45","yyyy-MM-dd HH:mm:ss.SSS z");
 --julian date
 set hive.local.time.zone=UTC;
 select date_format("1001-01-05","dd---MM--yyyy");
+
+--dates prior to 1900
+set hive.local.time.zone=Asia/Bangkok;
+select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');
+select date_format('1800-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');
+
+set hive.local.time.zone=Europe/Berlin;
+select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');
+select date_format('1800-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');
+
+set hive.local.time.zone=Africa/Johannesburg;
+select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');

Review comment:
   All the existing tests with the SimpleDateFormat formatter are passing, except for the milliseconds change, which I have mentioned in my comment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612955)
Time Spent: 3.5h  (was: 3h 20m)

> date_format udf doesn't work for dates prior to 1900 if the timezone is 
> different from UTC
> --
>
> Key: HIVE-25268
> URL: https://issues.apache.org/jira/browse/HIVE-25268
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> *Hive 1.2.1*:
> {code:java}
>  select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+--+
> |           _c0            |
> +--------------------------+--+
> | 1400-01-14 01:00:00 ICT  |
> +--------------------------+--+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+--+
> |           _c0            |
> +--------------------------+--+
> | 1800-01-14 01:00:00 ICT  |
> +--------------------------+--+
> {code}
> *Hive 3.1, Hive 4.0:*
> {code:java}
> select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1400-01-06 01:17:56 ICT  |
> +--------------------------+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1800-01-14 01:17:56 ICT  |
> +--------------------------+
> {code}
> VM timezone is set to 'Asia/Bangkok'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25231) Add an ability to migrate CSV generated to hive table in replstats

2021-06-21 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-25231.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> Add an ability to migrate CSV generated to hive table in replstats
> --
>
> Key: HIVE-25231
> URL: https://issues.apache.org/jira/browse/HIVE-25231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add an option to replstats.sh to load the CSV generated using the replication 
> policy into a hive table/view.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25207) Expose incremental load statistics via JMX

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25207?focusedWorklogId=612952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612952
 ]

ASF GitHub Bot logged work on HIVE-25207:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 04:01
Start Date: 22/Jun/21 04:01
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #2356:
URL: https://github.com/apache/hive/pull/2356#issuecomment-865509568


   **First Snap:**
   
![image](https://user-images.githubusercontent.com/25608848/122861071-720ec080-d33c-11eb-99d3-fd03a10b0738.png)
   **Second Snap**
   
![image](https://user-images.githubusercontent.com/25608848/122861110-8226a000-d33c-11eb-92ec-8854aeb7c07c.png)
   
   **Third Snap**
   
![image](https://user-images.githubusercontent.com/25608848/122861138-8d79cb80-d33c-11eb-843a-c325e6efc48d.png)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612952)
Time Spent: 1h 20m  (was: 1h 10m)

> Expose incremental load statistics via JMX
> --
>
> Key: HIVE-25207
> URL: https://issues.apache.org/jira/browse/HIVE-25207
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Expose the incremental load details and statistics at per policy level in the 
> JMX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24078) result rows not equal in mr and tez

2021-06-21 Thread dailong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dailong updated HIVE-24078:
---
Summary: result rows not equal in mr and tez  (was: result rows not equal 
in mr and tez.)

> result rows not equal in mr and tez
> ---
>
> Key: HIVE-24078
> URL: https://issues.apache.org/jira/browse/HIVE-24078
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Tez
>Affects Versions: 3.1.2
>Reporter: kuqiqi
>Priority: Blocker
>
> select
> rank_num,
> province_name,
> programset_id,
> programset_name,
> programset_type,
> cv,
> uv,
> pt,
> rank_num2,
> rank_num3,
> city_name,
> level,
> cp_code,
> cp_name,
> version_type,
> zz.city_code,
> zz.province_alias,
> '20200815' dt
> from 
> (SELECT row_number() over(partition BY 
> a1.province_alias,a1.city_code,a1.version_type
>  ORDER BY cast(a1.cv AS bigint) DESC) AS rank_num,
>  province_name(a1.province_alias) AS province_name,
>  a1.program_set_id AS programset_id,
>  a2.programset_name,
>  a2.type_name AS programset_type,
>  a1.cv,
>  a1.uv,
>  cast(a1.pt/360 as decimal(20,2)) pt,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.uv as bigint) 
> desc ) as rank_num2,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.pt as bigint) 
> desc ) as rank_num3,
>  a1.city_code,
>  a1.city_name,
>  '3' as level,
>  a2.cp_code,
>  a2.cp_name,
>  '20200815'as dt,
>  a1.province_alias,
>  a1.version_type
> FROM temp.dmp_device_vod_valid_day_v1_20200815_hn a1
> LEFT JOIN temp.dmp_device_vod_valid_day_v2_20200815_hn a2 ON 
> a1.program_set_id=a2.programset_id
> WHERE a2.programset_name IS NOT NULL ) zz
> where rank_num<1000 or rank_num2<1000 or rank_num3<1000
> ;
>  
> This SQL gets 76742 rows in MR, but 76681 rows in Tez. How can this be fixed?
> I think the problem may lie in row_number.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24078) result rows not equal in mr and tez.

2021-06-21 Thread dailong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dailong updated HIVE-24078:
---
Summary: result rows not equal in mr and tez.  (was: result rows not equal 
in mr and tez)

> result rows not equal in mr and tez.
> 
>
> Key: HIVE-24078
> URL: https://issues.apache.org/jira/browse/HIVE-24078
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Tez
>Affects Versions: 3.1.2
>Reporter: kuqiqi
>Priority: Blocker
>
> select
> rank_num,
> province_name,
> programset_id,
> programset_name,
> programset_type,
> cv,
> uv,
> pt,
> rank_num2,
> rank_num3,
> city_name,
> level,
> cp_code,
> cp_name,
> version_type,
> zz.city_code,
> zz.province_alias,
> '20200815' dt
> from 
> (SELECT row_number() over(partition BY 
> a1.province_alias,a1.city_code,a1.version_type
>  ORDER BY cast(a1.cv AS bigint) DESC) AS rank_num,
>  province_name(a1.province_alias) AS province_name,
>  a1.program_set_id AS programset_id,
>  a2.programset_name,
>  a2.type_name AS programset_type,
>  a1.cv,
>  a1.uv,
>  cast(a1.pt/360 as decimal(20,2)) pt,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.uv as bigint) 
> desc ) as rank_num2,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.pt as bigint) 
> desc ) as rank_num3,
>  a1.city_code,
>  a1.city_name,
>  '3' as level,
>  a2.cp_code,
>  a2.cp_name,
>  '20200815'as dt,
>  a1.province_alias,
>  a1.version_type
> FROM temp.dmp_device_vod_valid_day_v1_20200815_hn a1
> LEFT JOIN temp.dmp_device_vod_valid_day_v2_20200815_hn a2 ON 
> a1.program_set_id=a2.programset_id
> WHERE a2.programset_name IS NOT NULL ) zz
> where rank_num<1000 or rank_num2<1000 or rank_num3<1000
> ;
>  
> This SQL gets 76742 rows in MR, but 76681 rows in Tez. How can this be fixed?
> I think the problem may lie in row_number.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=612942&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612942
 ]

ASF GitHub Bot logged work on HIVE-25246:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 03:23
Start Date: 22/Jun/21 03:23
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2396:
URL: https://github.com/apache/hive/pull/2396#discussion_r655847667



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -1339,7 +1340,8 @@ public void commitTxn(CommitTxnRequest rqst)
 // corresponding open txn event.
 LOG.info("Target txn id is missing for source txn id : " + 
sourceTxnId +
 " and repl policy " + rqst.getReplPolicy());
-return;
+throw new NoSuchTxnException("Source transaction: " + 
JavaUtils.txnIdToString(sourceTxnId)

Review comment:
   An "abort txn txnId" call would not have replPolicy set, so its execution would not reach here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612942)
Time Spent: 2h 10m  (was: 2h)

> Fix the clean up of open repl created transactions
> --
>
> Key: HIVE-25246
> URL: https://issues.apache.org/jira/browse/HIVE-25246
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=612939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612939
 ]

ASF GitHub Bot logged work on HIVE-25246:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 03:06
Start Date: 22/Jun/21 03:06
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2396:
URL: https://github.com/apache/hive/pull/2396#discussion_r655842392



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -1339,7 +1340,8 @@ public void commitTxn(CommitTxnRequest rqst)
 // corresponding open txn event.
 LOG.info("Target txn id is missing for source txn id : " + 
sourceTxnId +
 " and repl policy " + rqst.getReplPolicy());
-return;
+throw new NoSuchTxnException("Source transaction: " + 
JavaUtils.txnIdToString(sourceTxnId)

Review comment:
   Didn't get this. Don't we do "abort txn txnId" here? The txn id is of the target and not of the source, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612939)
Time Spent: 2h  (was: 1h 50m)

> Fix the clean up of open repl created transactions
> --
>
> Key: HIVE-25246
> URL: https://issues.apache.org/jira/browse/HIVE-25246
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=612938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612938
 ]

ASF GitHub Bot logged work on HIVE-25246:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 03:05
Start Date: 22/Jun/21 03:05
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2396:
URL: https://github.com/apache/hive/pull/2396#discussion_r655842392



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -1339,7 +1340,8 @@ public void commitTxn(CommitTxnRequest rqst)
 // corresponding open txn event.
 LOG.info("Target txn id is missing for source txn id : " + 
sourceTxnId +
 " and repl policy " + rqst.getReplPolicy());
-return;
+throw new NoSuchTxnException("Source transaction: " + 
JavaUtils.txnIdToString(sourceTxnId)

Review comment:
   Didn't get this. Don't we do "abort txn txnId" here? The txn id is of the target and not of the source, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612938)
Time Spent: 1h 50m  (was: 1h 40m)

> Fix the clean up of open repl created transactions
> --
>
> Key: HIVE-25246
> URL: https://issues.apache.org/jira/browse/HIVE-25246
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=612937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612937
 ]

ASF GitHub Bot logged work on HIVE-25272:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 03:03
Start Date: 22/Jun/21 03:03
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2413:
URL: https://github.com/apache/hive/pull/2413#discussion_r655841503



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -176,6 +177,28 @@ public void 
testReplOperationsNotCapturedInNotificationLog() throws Throwable {
 assert lastEventId == currentEventId;
   }
 
+  @Test
+  public void testREADOperationsNotCapturedInNotificationLog() throws 
Throwable {

Review comment:
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612937)
Time Spent: 1h 20m  (was: 1h 10m)

> READ transactions are getting logged in NOTIFICATION LOG
> 
>
> Key: HIVE-25272
> URL: https://issues.apache.org/jira/browse/HIVE-25272
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> While READ transactions are already skipped from getting logged in the
> NOTIFICATION log, a few are still getting logged. We need to skip those
> transactions as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=612934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612934
 ]

ASF GitHub Bot logged work on HIVE-25246:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 02:56
Start Date: 22/Jun/21 02:56
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2396:
URL: https://github.com/apache/hive/pull/2396#discussion_r655839436



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -1339,7 +1340,8 @@ public void commitTxn(CommitTxnRequest rqst)
 // corresponding open txn event.
 LOG.info("Target txn id is missing for source txn id : " + 
sourceTxnId +
 " and repl policy " + rqst.getReplPolicy());
-return;
+throw new NoSuchTxnException("Source transaction: " + 
JavaUtils.txnIdToString(sourceTxnId)

Review comment:
   No, we have only the source txn id and the replPolicy. Using this info, we query the REPL_TXN_MAP table, get the corresponding target txn id, and then commit this txn.
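
For readers following the thread: a minimal sketch of that REPL_TXN_MAP lookup, written as plain JDBC against the metastore backing database. The table and the RTM_SRC_TXN_ID/RTM_TARGET_TXN_ID column names appear in the TestTxnHandler query elsewhere in this thread; the RTM_REPL_POLICY column name and the helper itself are assumptions for illustration, not Hive's actual code path.

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class ReplTxnLookup {
  // Resolve the target txn id opened on the replica for a given
  // source txn id + replication policy, as commitTxn must do first.
  static long targetTxnId(Connection db, long sourceTxnId, String replPolicy) throws SQLException {
    String sql = "SELECT \"RTM_TARGET_TXN_ID\" FROM \"REPL_TXN_MAP\""
        + " WHERE \"RTM_SRC_TXN_ID\" = ? AND \"RTM_REPL_POLICY\" = ?";
    try (PreparedStatement ps = db.prepareStatement(sql)) {
      ps.setLong(1, sourceTxnId);
      ps.setString(2, replPolicy);
      try (ResultSet rs = ps.executeQuery()) {
        if (!rs.next()) {
          // Missing mapping: the patch under review now surfaces this
          // situation as a NoSuchTxnException instead of returning silently.
          throw new SQLException("No target txn for source txn " + sourceTxnId);
        }
        return rs.getLong(1);
      }
    }
  }
}
{code}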




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612934)
Time Spent: 1h 40m  (was: 1.5h)

> Fix the clean up of open repl created transactions
> --
>
> Key: HIVE-25246
> URL: https://issues.apache.org/jira/browse/HIVE-25246
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=612933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612933
 ]

ASF GitHub Bot logged work on HIVE-25246:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 02:54
Start Date: 22/Jun/21 02:54
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2396:
URL: https://github.com/apache/hive/pull/2396#discussion_r655838555



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -1339,7 +1340,8 @@ public void commitTxn(CommitTxnRequest rqst)
 // corresponding open txn event.
 LOG.info("Target txn id is missing for source txn id : " + 
sourceTxnId +
 " and repl policy " + rqst.getReplPolicy());
-return;
+throw new NoSuchTxnException("Source transaction: " + 
JavaUtils.txnIdToString(sourceTxnId)

Review comment:
   Aren't we aborting based on target txn id so we know which target txn id 
we are looking for?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612933)
Time Spent: 1.5h  (was: 1h 20m)

> Fix the clean up of open repl created transactions
> --
>
> Key: HIVE-25246
> URL: https://issues.apache.org/jira/browse/HIVE-25246
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25231) Add an ability to migrate CSV generated to hive table in replstats

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25231?focusedWorklogId=612931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612931
 ]

ASF GitHub Bot logged work on HIVE-25231:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 02:27
Start Date: 22/Jun/21 02:27
Worklog Time Spent: 10m 
  Work Description: aasha merged pull request #2379:
URL: https://github.com/apache/hive/pull/2379


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612931)
Time Spent: 40m  (was: 0.5h)

> Add an ability to migrate CSV generated to hive table in replstats
> --
>
> Key: HIVE-25231
> URL: https://issues.apache.org/jira/browse/HIVE-25231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add an option to replstats.sh to load the CSV generated using the replication 
> policy into a hive table/view.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25243) Llap external client - Handle nested values when the parent struct is null

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25243?focusedWorklogId=612928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612928
 ]

ASF GitHub Bot logged work on HIVE-25243:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 01:51
Start Date: 22/Jun/21 01:51
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2391:
URL: https://github.com/apache/hive/pull/2391#discussion_r655817539



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/Serializer.java
##
@@ -347,6 +347,21 @@ private void writeStruct(NonNullableStructVector 
arrowVector, StructColumnVector
 final ColumnVector[] hiveFieldVectors = hiveVector == null ? null : 
hiveVector.fields;
 final int fieldSize = fieldTypeInfos.size();
 
+// This is to handle the following scenario -
+// if any struct value itself is NULL, we get structVector.isNull[i]=true
+// but we don't get the same for its child fields, which later causes exceptions while setting to arrow vectors
+// see - https://issues.apache.org/jira/browse/HIVE-25243
+if (hiveVector != null && hiveFieldVectors != null) {
+  for (int i = 0; i < size; i++) {
+if (hiveVector.isNull[i]) {
+  for (ColumnVector fieldVector : hiveFieldVectors) {
+fieldVector.isNull[i] = true;

Review comment:
   To me it looks like if one of the fields is null, then all fields are set to null.
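
For reference, the guarded propagation in the diff above can be read as this minimal standalone sketch (assuming Hive's vectorization API, where ColumnVector exposes public isNull/noNulls fields): the outer check is on the parent struct's isNull flag, so child fields are nulled only for rows whose parent struct value is null, not when a sibling field happens to be null.

{code:java}
import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.StructColumnVector;

final class StructNullPropagation {
  // For every row where the parent struct is NULL, mark each child field
  // NULL as well so the arrow writers skip them instead of dereferencing
  // uninitialized values (the NPE in the issue description).
  static void propagateParentNulls(StructColumnVector struct, int size) {
    for (int row = 0; row < size; row++) {
      if (struct.isNull[row]) {
        for (ColumnVector child : struct.fields) {
          child.isNull[row] = true;
          child.noNulls = false;
        }
      }
    }
  }
}
{code}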




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612928)
Time Spent: 1h  (was: 50m)

> Llap external client - Handle nested values when the parent struct is null
> --
>
> Key: HIVE-25243
> URL: https://issues.apache.org/jira/browse/HIVE-25243
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Consider the following table in text format - 
> {code}
> +--------------------------------+
> |               c8               |
> +--------------------------------+
> | NULL                           |
> | {"r":null,"s":null,"t":null}   |
> | {"r":"a","s":9,"t":2.2}        |
> +--------------------------------+
> {code}
> When we query the above table via the LLAP external client, it throws the following exception -
> {code:java}
> Caused by: java.lang.NullPointerException: src
> at io.netty.util.internal.ObjectUtil.checkNotNull(ObjectUtil.java:33)
> at 
> io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:537)
> at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:199)
> at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:486)
> at 
> io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34)
> at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:933)
> at 
> org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1191)
> at 
> org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1026)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.lambda$static$15(Serializer.java:834)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:777)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:581)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:290)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:359)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:296)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:213)
> at 
> org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135)
> {code}
> Created a test to repro it - 
> {code:java}
> /**
>  * TestMiniLlapVectorArrowWithLlapIODisabled - turns off LLAP IO while testing the LLAP external client flow.
>  * The aim of turning off LLAP IO is: when we create tables through this test, LLAP caches them
>  * and returns the same when we do a read query; due to this, we miss some code paths
>  * that may have been hit otherwise.
>  */
> public class TestMiniLlapVectorArrowWithLlapIODisabled extends 
> BaseJdbcWithMiniLlap 

[jira] [Commented] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2021-06-21 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366918#comment-17366918
 ] 

Toshihiko Uchida commented on HIVE-23556:
-

[~kgyrtkirk] Thanks for taking a look at the issue! Got it.

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, 
> HIVE-23556.4.patch, HIVE-23556.patch
>
>
> HIVE-13884 added the configuration hive.metastore.limit.partition.request to 
> limit the number of partitions that can be requested.
> Currently, it takes in effect for the following MetaStore APIs
> * get_partitions,
> * get_partitions_with_auth,
> * get_partitions_by_filter,
> * get_partitions_spec_by_filter,
> * get_partitions_by_expr,
> but not for
> * get_partitions_ps,
> * get_partitions_ps_with_auth.
> This issue proposes to apply the configuration also to get_partitions_ps and 
> get_partitions_ps_with_auth.
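> As a sketch of the kind of guard HIVE-13884 introduced (names below are illustrative, not the actual HiveMetaStore methods):
> {code:java}
> // Hypothetical limit check: reject a request whose partition count exceeds
> // hive.metastore.limit.partition.request (a negative limit means unlimited).
> final class PartitionLimitGuard {
>   static void check(String tableName, int requested, int limit) {
>     if (limit >= 0 && requested > limit) {
>       throw new IllegalStateException("Partition request on " + tableName
>           + " would return " + requested + " partitions, above the limit of " + limit);
>     }
>   }
> }
> {code}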



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=612908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612908
 ]

ASF GitHub Bot logged work on HIVE-25246:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 23:10
Start Date: 21/Jun/21 23:10
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2396:
URL: https://github.com/apache/hive/pull/2396#discussion_r655760894



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager.java
##
@@ -516,10 +516,17 @@ public void testHeartbeaterReplicationTxn() throws 
Exception {
 } catch (LockException e) {
   exception = e;
 }
-Assert.assertNotNull("Txn should have been aborted", exception);
-Assert.assertEquals(ErrorMsg.TXN_ABORTED, 
exception.getCanonicalErrorMsg());
+Assert.assertNotNull("Source transaction with txnId: 1, missing from 
REPL_TXN_MAP", exception);

Review comment:
   If this entry is missing, the exception will be thrown and caught at line 517, so e would not be null.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612908)
Time Spent: 1h 20m  (was: 1h 10m)

> Fix the clean up of open repl created transactions
> --
>
> Key: HIVE-25246
> URL: https://issues.apache.org/jira/browse/HIVE-25246
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=612905&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612905
 ]

ASF GitHub Bot logged work on HIVE-25246:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 23:05
Start Date: 21/Jun/21 23:05
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2396:
URL: https://github.com/apache/hive/pull/2396#discussion_r655759104



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -1339,7 +1340,8 @@ public void commitTxn(CommitTxnRequest rqst)
 // corresponding open txn event.
 LOG.info("Target txn id is missing for source txn id : " + 
sourceTxnId +
 " and repl policy " + rqst.getReplPolicy());
-return;
+throw new NoSuchTxnException("Source transaction: " + 
JavaUtils.txnIdToString(sourceTxnId)

Review comment:
   The target txn id is not present; that's the reason this exception is being thrown.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612905)
Time Spent: 1h 10m  (was: 1h)

> Fix the clean up of open repl created transactions
> --
>
> Key: HIVE-25246
> URL: https://issues.apache.org/jira/browse/HIVE-25246
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25229) Hive lineage is not generated for columns on CREATE MATERIALIZED VIEW

2021-06-21 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-25229.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~soumyakanti.das]!

> Hive lineage is not generated for columns on CREATE MATERIALIZED VIEW
> -
>
> Key: HIVE-25229
> URL: https://issues.apache.org/jira/browse/HIVE-25229
> Project: Hive
>  Issue Type: Bug
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While creating a materialized view, HookContext is supposed to send lineage info, which is missing.
> CREATE MATERIALIZED VIEW tbl1_view as select * from tbl1;
> The HookContext passed from hive.ql.Driver to the Atlas Hive hook through the
> hookRunner.runPostExecHooks call doesn't have lineage info.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25229) Hive lineage is not generated for columns on CREATE MATERIALIZED VIEW

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25229?focusedWorklogId=612867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612867
 ]

ASF GitHub Bot logged work on HIVE-25229:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 20:45
Start Date: 21/Jun/21 20:45
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #2377:
URL: https://github.com/apache/hive/pull/2377


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612867)
Time Spent: 20m  (was: 10m)

> Hive lineage is not generated for columns on CREATE MATERIALIZED VIEW
> -
>
> Key: HIVE-25229
> URL: https://issues.apache.org/jira/browse/HIVE-25229
> Project: Hive
>  Issue Type: Bug
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While creating a materialized view, HookContext is supposed to send lineage info, which is missing.
> CREATE MATERIALIZED VIEW tbl1_view as select * from tbl1;
> The HookContext passed from hive.ql.Driver to the Atlas Hive hook through the
> hookRunner.runPostExecHooks call doesn't have lineage info.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24425) Create table in REMOTE db should fail

2021-06-21 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-24425.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

This fix has been merged to master. Closing the jira.
Thank you for the contribution [~dantongdong], and welcome to the Hive community.

> Create table in REMOTE db should fail
> -
>
> Key: HIVE-24425
> URL: https://issues.apache.org/jira/browse/HIVE-24425
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Dantong Dong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently it creates the table in that DB, but show tables does not show
> anything. Preventing the creation of the table will resolve this inconsistency too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24425) Create table in REMOTE db should fail

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24425?focusedWorklogId=612828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612828
 ]

ASF GitHub Bot logged work on HIVE-24425:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 19:00
Start Date: 21/Jun/21 19:00
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #2393:
URL: https://github.com/apache/hive/pull/2393#issuecomment-865270113


   The fix has been committed to master. Please close this PR. Thank you for your work on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612828)
Time Spent: 40m  (was: 0.5h)

> Create table in REMOTE db should fail
> -
>
> Key: HIVE-24425
> URL: https://issues.apache.org/jira/browse/HIVE-24425
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Dantong Dong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently it creates the table in that DB, but show tables does not show
> anything. Preventing the creation of the table will resolve this inconsistency too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=612817&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612817
 ]

ASF GitHub Bot logged work on HIVE-25246:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 18:26
Start Date: 21/Jun/21 18:26
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2396:
URL: https://github.com/apache/hive/pull/2396#discussion_r655604672



##
File path: ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java
##
@@ -1690,6 +1701,19 @@ private void checkReplTxnForTest(Long startTxnId, Long 
endTxnId, String replPoli
 }
   }
 
+  private boolean targetTxnsPresentInReplTxnMap(Long startTxnId, Long 
endTxnId, List targetTxnId) throws Exception {
+String[] output = TestTxnDbUtil.queryToString(conf, "SELECT 
\"RTM_TARGET_TXN_ID\" FROM \"REPL_TXN_MAP\" WHERE " +
+" \"RTM_SRC_TXN_ID\" >=  " + startTxnId + "AND \"RTM_SRC_TXN_ID\" 
<=  " + endTxnId).split("\n");
+List replayedTxns = new ArrayList<>();
+for (int idx = 1; idx < output.length; idx++) {
+  Long txnId = Long.parseLong(output[idx].trim());
+  if (targetTxnId.contains(txnId)) {

Review comment:
   Do you really need this check?

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -1339,7 +1340,8 @@ public void commitTxn(CommitTxnRequest rqst)
 // corresponding open txn event.
 LOG.info("Target txn id is missing for source txn id : " + 
sourceTxnId +

Review comment:
   This isn't an info-level log. We can actually remove it, as the exception is being thrown and that would appear anyway.

##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager.java
##
@@ -516,10 +516,17 @@ public void testHeartbeaterReplicationTxn() throws 
Exception {
 } catch (LockException e) {
   exception = e;
 }
-Assert.assertNotNull("Txn should have been aborted", exception);
-Assert.assertEquals(ErrorMsg.TXN_ABORTED, 
exception.getCanonicalErrorMsg());
+Assert.assertNotNull("Source transaction with txnId: 1, missing from 
REPL_TXN_MAP", exception);

Review comment:
   This is the message you get if the assertion fails. If this entry is missing, the exception object would be null. Is that expected?

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -1339,7 +1340,8 @@ public void commitTxn(CommitTxnRequest rqst)
 // corresponding open txn event.
 LOG.info("Target txn id is missing for source txn id : " + 
sourceTxnId +
 " and repl policy " + rqst.getReplPolicy());
-return;
+throw new NoSuchTxnException("Source transaction: " + 
JavaUtils.txnIdToString(sourceTxnId)

Review comment:
   Add target txn id as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612817)
Time Spent: 1h  (was: 50m)

> Fix the clean up of open repl created transactions
> --
>
> Key: HIVE-25246
> URL: https://issues.apache.org/jira/browse/HIVE-25246
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=612797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612797
 ]

ASF GitHub Bot logged work on HIVE-25268:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 17:14
Start Date: 21/Jun/21 17:14
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #2409:
URL: https://github.com/apache/hive/pull/2409#discussion_r655565647



##
File path: ql/src/test/queries/clientpositive/udf_date_format.q
##
@@ -78,3 +78,16 @@ select date_format("2015-04-08 10:30:45","yyyy-MM-dd HH:mm:ss.SSS z");
 --julian date
 set hive.local.time.zone=UTC;
 select date_format("1001-01-05","dd---MM--yyyy");
+
+--dates prior to 1900
+set hive.local.time.zone=Asia/Bangkok;
+select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');
+select date_format('1800-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');
+
+set hive.local.time.zone=Europe/Berlin;
+select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');
+select date_format('1800-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');
+
+set hive.local.time.zone=Africa/Johannesburg;
+select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z');

Review comment:
   Do we have tests for other format patterns (to ensure DateTimeFormat 
doesn't break anything)? The wiki doc also needs an update.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java
##
@@ -111,17 +123,18 @@ public Object evaluate(DeferredObject[] arguments) throws 
HiveException {
 // the function should support both short date and full timestamp format
 // time part of the timestamp should not be skipped
 Timestamp ts = getTimestampValue(arguments, 0, tsConverters);
+
 if (ts == null) {
   Date d = getDateValue(arguments, 0, dtInputTypes, dtConverters);
   if (d == null) {
 return null;
   }
   ts = Timestamp.ofEpochMilli(d.toEpochMilli(id), id);
 }
-
-
-date.setTime(ts.toEpochMilli(id));
-String res = formatter.format(date);
+Timestamp ts2 = TimestampTZUtil.convertTimestampToZone(ts, timeZone, 
ZoneId.of("UTC"));

Review comment:
   Add comments on why this conversion is needed.
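
   For context, the shift comes from historical zone rules: before zones 
standardized, tzdata uses local mean time offsets, so converting pre-1900 
values through epoch millis moves the wall clock. A standalone java.time 
snippet (not part of the patch) showing the Asia/Bangkok offsets involved:

{code:java}
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

public class BangkokOffsetDemo {
  public static void main(String[] args) {
    ZoneRules rules = ZoneId.of("Asia/Bangkok").getRules();
    // With a typical tzdata this prints +06:42:04 (local mean time) for 1800
    // and +07:00 for 2000; the 17m56s gap matches the 01:17:56 in the bug.
    System.out.println(rules.getOffset(LocalDateTime.of(1800, 1, 14, 1, 0)));
    System.out.println(rules.getOffset(LocalDateTime.of(2000, 1, 1, 0, 0)));
  }
}
{code}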




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612797)
Time Spent: 3h 20m  (was: 3h 10m)

> date_format udf doesn't work for dates prior to 1900 if the timezone is 
> different from UTC
> --
>
> Key: HIVE-25268
> URL: https://issues.apache.org/jira/browse/HIVE-25268
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> *Hive 1.2.1*:
> {code:java}
>  select date_format('1400-01-14 01:00:00', '-MM-dd HH:mm:ss z');
> +--+--+
> |   _c0|
> +--+--+
> | 1400-01-14 01:00:00 ICT  |
> +--+--+
> select date_format('1800-01-14 01:00:00', '-MM-dd HH:mm:ss z');
> +--+--+
> |   _c0|
> +--+--+
> | 1800-01-14 01:00:00 ICT  |
> +--+--+
> {code}
> *Hive 3.1, Hive 4.0:*
> {code:java}
> select date_format('1400-01-14 01:00:00', '-MM-dd HH:mm:ss z');
> +--+
> |   _c0|
> +--+
> | 1400-01-06 01:17:56 ICT  |
> +--+
> select date_format('1800-01-14 01:00:00', '-MM-dd HH:mm:ss z');
> +--+
> |   _c0|
> +--+
> | 1800-01-14 01:17:56 ICT  |
> +--+
> {code}
> VM timezone is set to 'Asia/Bangkok'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook

2021-06-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25235.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks [~mgergely] and [~dengzh] for the reviews!

> Remove ThreadPoolExecutorWithOomHook
> 
>
> Key: HIVE-25235
> URL: https://issues.apache.org/jira/browse/HIVE-25235
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> While I was looking at [HIVE-24846] to improve OOM logging, I realized that 
> this is not a good way to handle OOM.
> https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java
> bq. there's likely no easy way for you to recover from it if you do catch it
> If we want to handle OOM, it's best to do it from outside the process, using 
> the JVM's own facilities:
> {{-XX:+ExitOnOutOfMemoryError}}
> {{-XX:OnOutOfMemoryError}}
> It seems odd that the OOM handler attempts to load a handler and then do more 
> work: the server is clearly hosed at this point, and requesting more work 
> only adds to the memory pressure.
> The current OOM logic in {{HiveServer2OomHookRunner}} causes HiveServer2 to 
> shut down, but we already have that with the JVM shutdown hook. This JVM 
> shutdown hook is triggered if {{-XX:OnOutOfMemoryError="kill -9 %p"}} exists 
> and is the appropriate thing to do.
> https://github.com/apache/hive/blob/328d197431b2ff1000fd9c56ce758013eff81ad8/service/src/java/org/apache/hive/service/server/HiveServer2.java#L443-L444
> https://github.com/apache/hive/blob/cb0541a31b87016fae8e4c0e7130532c6e5f8de7/service/src/java/org/apache/hive/service/server/HiveServer2OomHookRunner.java#L42-L44
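
To illustrate why catching the error in-process is unreliable, a standalone 
sketch (not Hive code): the handler runs under the same memory pressure that 
triggered the error, so anything it attempts can fail too.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class OomCatchDemo {
  public static void main(String[] args) {
    try {
      List<long[]> hog = new ArrayList<>();
      while (true) {
        hog.add(new long[1_000_000]); // exhaust the heap
      }
    } catch (OutOfMemoryError e) {
      // Reaching this handler proves little: logging or string building here
      // can throw OOM again, and other threads may already have died.
      // Hence the preference for -XX:+ExitOnOutOfMemoryError.
      System.err.println("caught OOM, but the JVM state is now unreliable");
    }
  }
}
{code}

Run with a small heap (e.g. {{java -Xmx64m OomCatchDemo}}) to trigger the path quickly.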



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25235?focusedWorklogId=612791=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612791
 ]

ASF GitHub Bot logged work on HIVE-25235:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 17:07
Start Date: 21/Jun/21 17:07
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #2383:
URL: https://github.com/apache/hive/pull/2383


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612791)
Time Spent: 1h  (was: 50m)

> Remove ThreadPoolExecutorWithOomHook
> 
>
> Key: HIVE-25235
> URL: https://issues.apache.org/jira/browse/HIVE-25235
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> While I was looking at [HIVE-24846] to improve OOM logging, I realized that 
> this is not a good way to handle OOM.
> https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java
> bq. there's likely no easy way for you to recover from it if you do catch it
> If we want to handle OOM, it's best to do it from outside the process, using 
> the JVM's own facilities:
> {{-XX:+ExitOnOutOfMemoryError}}
> {{-XX:OnOutOfMemoryError}}
> It seems odd that the OOM handler attempts to load a handler and then do more 
> work: the server is clearly hosed at this point, and requesting more work 
> only adds to the memory pressure.
> The current OOM logic in {{HiveServer2OomHookRunner}} causes HiveServer2 to 
> shut down, but we already have that with the JVM shutdown hook. This JVM 
> shutdown hook is triggered if {{-XX:OnOutOfMemoryError="kill -9 %p"}} exists 
> and is the appropriate thing to do.
> https://github.com/apache/hive/blob/328d197431b2ff1000fd9c56ce758013eff81ad8/service/src/java/org/apache/hive/service/server/HiveServer2.java#L443-L444
> https://github.com/apache/hive/blob/cb0541a31b87016fae8e4c0e7130532c6e5f8de7/service/src/java/org/apache/hive/service/server/HiveServer2OomHookRunner.java#L42-L44



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24951) Table created with Uppercase name using CTAS does not produce result for select queries

2021-06-21 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-24951.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

The fix has been committed to master. Closing the Jira. Thank you for the fix, 
[~Rajkumar Singh]!

> Table created with Uppercase name using CTAS does not produce result for 
> select queries
> ---
>
> Key: HIVE-24951
> URL: https://issues.apache.org/jira/browse/HIVE-24951
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Steps to repro:
> {code:java}
> CREATE EXTERNAL TABLE MY_TEST AS SELECT * FROM source
> The table is created with a Location, but no data is moved to it.
> /warehouse/tablespace/external/hive/MY_TEST
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24951) Table created with Uppercase name using CTAS does not produce result for select queries

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24951?focusedWorklogId=612775=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612775
 ]

ASF GitHub Bot logged work on HIVE-24951:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 16:46
Start Date: 21/Jun/21 16:46
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #2125:
URL: https://github.com/apache/hive/pull/2125#issuecomment-865187692


   Fix has been merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612775)
Time Spent: 0.5h  (was: 20m)

> Table created with Uppercase name using CTAS does not produce result for 
> select queries
> ---
>
> Key: HIVE-24951
> URL: https://issues.apache.org/jira/browse/HIVE-24951
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Steps to repro:
> {code:java}
> CREATE EXTERNAL TABLE MY_TEST AS SELECT * FROM source
> The table is created with a Location, but no data is moved to it.
> /warehouse/tablespace/external/hive/MY_TEST
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=612747=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612747
 ]

ASF GitHub Bot logged work on HIVE-25272:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 16:07
Start Date: 21/Jun/21 16:07
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2413:
URL: https://github.com/apache/hive/pull/2413#issuecomment-865160700


   @deniskuzZ: This PR extends the scope of the READ-ONLY transactions. If you 
have time, could you please take a look?
   
   Thanks,
   Peter


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612747)
Time Spent: 1h  (was: 50m)

> READ transactions are getting logged in NOTIFICATION LOG
> 
>
> Key: HIVE-25272
> URL: https://issues.apache.org/jira/browse/HIVE-25272
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> While READ transactions are already skipped from being logged in the 
> NOTIFICATION log, a few are still getting logged. Those transactions need 
> to be skipped as well.
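
Schematically, the kind of guard the fix implies; a sketch only, with assumed 
method and parameter names (the real change is in the linked PR), using the 
thrift-generated TxnType:

{code:java}
import org.apache.hadoop.hive.metastore.api.TxnType;

public class NotificationGuard {
  // Hypothetical helper: read-only transactions must not produce
  // NOTIFICATION_LOG entries.
  static boolean shouldLogNotification(TxnType txnType) {
    return txnType != TxnType.READ_ONLY;
  }
}
{code}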



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=612745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612745
 ]

ASF GitHub Bot logged work on HIVE-25272:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 16:05
Start Date: 21/Jun/21 16:05
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2413:
URL: https://github.com/apache/hive/pull/2413#discussion_r655517089



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -176,6 +177,28 @@ public void 
testReplOperationsNotCapturedInNotificationLog() throws Throwable {
 assert lastEventId == currentEventId;
   }
 
+  @Test
+  public void testREADOperationsNotCapturedInNotificationLog() throws 
Throwable {
+//Perform empty bootstrap dump and load
+primary.hiveConf.set("hive.txn.readonly.enabled", "true");
+primary.run("create table " + primaryDbName + ".t1 (id int)");
+primary.dump(primaryDbName);
+replica.run("REPL LOAD " + primaryDbName + " INTO " + replicatedDbName);
+//Perform empty incremental dump and load so that all db level properties 
are altered.
+primary.dump(primaryDbName);
+replica.run("REPL LOAD " + primaryDbName + " INTO " + replicatedDbName);
+primary.run("insert into " + primaryDbName + ".t1 values(1)");
+long lastEventId = primary.getCurrentNotificationEventId().getEventId();
+primary.run("DESCRIBE DATABASE " + primaryDbName );
+primary.run("SELECT * from " + primaryDbName + ".t1");
+primary.run("SHOW tables " + primaryDbName);

Review comment:
   What is the reason behind running these commands but discarding the 
results?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612745)
Time Spent: 40m  (was: 0.5h)

> READ transactions are getting logged in NOTIFICATION LOG
> 
>
> Key: HIVE-25272
> URL: https://issues.apache.org/jira/browse/HIVE-25272
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While READ transactions are already skipped from being logged in the 
> NOTIFICATION log, a few are still getting logged. Those transactions need 
> to be skipped as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=612746=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612746
 ]

ASF GitHub Bot logged work on HIVE-25272:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 16:05
Start Date: 21/Jun/21 16:05
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2413:
URL: https://github.com/apache/hive/pull/2413#discussion_r655517694



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -176,6 +177,28 @@ public void 
testReplOperationsNotCapturedInNotificationLog() throws Throwable {
 assert lastEventId == currentEventId;
   }
 
+  @Test
+  public void testREADOperationsNotCapturedInNotificationLog() throws 
Throwable {
+//Perform empty bootstrap dump and load
+primary.hiveConf.set("hive.txn.readonly.enabled", "true");
+primary.run("create table " + primaryDbName + ".t1 (id int)");
+primary.dump(primaryDbName);
+replica.run("REPL LOAD " + primaryDbName + " INTO " + replicatedDbName);
+//Perform empty incremental dump and load so that all db level properties 
are altered.
+primary.dump(primaryDbName);
+replica.run("REPL LOAD " + primaryDbName + " INTO " + replicatedDbName);
+primary.run("insert into " + primaryDbName + ".t1 values(1)");
+long lastEventId = primary.getCurrentNotificationEventId().getEventId();
+primary.run("DESCRIBE DATABASE " + primaryDbName );
+primary.run("SELECT * from " + primaryDbName + ".t1");
+primary.run("SHOW tables " + primaryDbName);

Review comment:
   never mind, I got it  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612746)
Time Spent: 50m  (was: 40m)

> READ transactions are getting logged in NOTIFICATION LOG
> 
>
> Key: HIVE-25272
> URL: https://issues.apache.org/jira/browse/HIVE-25272
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> While READ transactions are already skipped from being logged in the 
> NOTIFICATION log, a few are still getting logged. Those transactions need 
> to be skipped as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25173) Fix build failure of hive-pre-upgrade due to missing dependency on pentaho-aggdesigner-algorithm

2021-06-21 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366693#comment-17366693
 ] 

Stamatis Zampetakis commented on HIVE-25173:


From the Hive side, it's fine, no need to bring back the conjars repo. I just 
wanted to understand the root cause and the implications. Thanks, Julian!

> Fix build failure of hive-pre-upgrade due to missing dependency on 
> pentaho-aggdesigner-algorithm
> 
>
> Key: HIVE-25173
> URL: https://issues.apache.org/jira/browse/HIVE-25173
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.2
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> [ERROR] Failed to execute goal on project hive-pre-upgrade: Could not resolve 
> dependencies for project org.apache.hive:hive-pre-upgrade:jar:4.0.0-SNAPSHOT: 
> Failure to find org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde in 
> https://repo.maven.apache.org/maven2 was cached in the local repository, 
> resolution will not be reattempted until the update interval of central has 
> elapsed or updates are forced
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=612742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612742
 ]

ASF GitHub Bot logged work on HIVE-25272:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 16:01
Start Date: 21/Jun/21 16:01
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2413:
URL: https://github.com/apache/hive/pull/2413#discussion_r655513679



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -74,7 +74,7 @@
 /**
  * TestReplicationScenariosAcidTables - test replication for ACID tables.
  */
-@org.junit.Ignore("HIVE-25267")
+//@org.junit.Ignore("HIVE-25267")

Review comment:
   Does this fix the flakiness issue mentioned in HIVE-25267?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612742)
Time Spent: 20m  (was: 10m)

> READ transactions are getting logged in NOTIFICATION LOG
> 
>
> Key: HIVE-25272
> URL: https://issues.apache.org/jira/browse/HIVE-25272
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While READ transactions are already skipped from being logged in the 
> NOTIFICATION log, a few are still getting logged. Those transactions need 
> to be skipped as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=612743=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612743
 ]

ASF GitHub Bot logged work on HIVE-25272:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 16:01
Start Date: 21/Jun/21 16:01
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2413:
URL: https://github.com/apache/hive/pull/2413#discussion_r655514162



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -176,6 +177,28 @@ public void 
testReplOperationsNotCapturedInNotificationLog() throws Throwable {
 assert lastEventId == currentEventId;
   }
 
+  @Test
+  public void testREADOperationsNotCapturedInNotificationLog() throws 
Throwable {

Review comment:
   nit: we should use camelCase for method names




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612743)
Time Spent: 0.5h  (was: 20m)

> READ transactions are getting logged in NOTIFICATION LOG
> 
>
> Key: HIVE-25272
> URL: https://issues.apache.org/jira/browse/HIVE-25272
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While READ transactions are already skipped from being logged in the 
> NOTIFICATION log, a few are still getting logged. Those transactions need 
> to be skipped as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25269) When the skew and parallel parameters are true simultaneously, the result is less data

2021-06-21 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25269:

Attachment: P10IDS_RISKLIST.zip
p10ids_riskcon.zip
p10ids_realpayrc_ygz.zip
p10ids_prerec_split_ygz.zip
comb_classcode.zip

> When the skew and parallel parameters are true simultaneously, the result is 
> less data
> --
>
> Key: HIVE-25269
> URL: https://issues.apache.org/jira/browse/HIVE-25269
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
> Attachments: P10IDS_RISKLIST.zip, comb_classcode.zip, 
> p10ids_prerec_split_ygz.zip, p10ids_realpayrc_ygz.zip, p10ids_riskcon.zip, 
> test.sql
>
>
> When hive.optimize.skewjoin, hive.groupby.skewindata and hive.exec.parallel 
> are all true, and a SQL statement such as 'INSERT... FROM (SUBQUERY 
> UNIONALL ...GROUP BY...) A JOIN/LEFT JOIN A.expression' is executed, the 
> result contains less data than expected. Details of the SQL and test data 
> can be found in the attachments.
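
Schematically, the failing query shape looks like the following (table names 
here are hypothetical; the real SQL and data are in the attachments):

{noformat}
set hive.optimize.skewjoin=true;
set hive.groupby.skewindata=true;
set hive.exec.parallel=true;

INSERT OVERWRITE TABLE result_t
SELECT a.k, a.cnt
FROM (
  SELECT k, count(*) AS cnt FROM src1 GROUP BY k
  UNION ALL
  SELECT k, count(*) AS cnt FROM src2 GROUP BY k
) a
LEFT JOIN dim_t b ON a.k = b.k;
{noformat}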



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25269) When the skew and parallel parameters are true simultaneously, the result is less data

2021-06-21 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25269:

Attachment: test.sql

> When the skew and parallel parameters are true simultaneously, the result is 
> less data
> --
>
> Key: HIVE-25269
> URL: https://issues.apache.org/jira/browse/HIVE-25269
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
> Attachments: test.sql
>
>
> When hive.optimize.skewjoin, hive.groupby.skewindata and hive.exec.parallel 
> are all true, and a SQL statement such as 'INSERT... FROM (SUBQUERY 
> UNIONALL ...GROUP BY...) A JOIN/LEFT JOIN A.expression' is executed, the 
> result contains less data than expected. Details of the SQL and test data 
> can be found in the attachments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25269) When the skew and parallel parameters are true simultaneously, the result is less data

2021-06-21 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25269:

Attachment: (was: comb_classcode.data)

> When the skew and parallel parameters are true simultaneously, the result is 
> less data
> --
>
> Key: HIVE-25269
> URL: https://issues.apache.org/jira/browse/HIVE-25269
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
>
> When hive.optimize.skewjoin, hive.groupby.skewindata and hive.exec.parallel 
> are all true, and a SQL statement such as 'INSERT... FROM (SUBQUERY 
> UNIONALL ...GROUP BY...) A JOIN/LEFT JOIN A.expression' is executed, the 
> result contains less data than expected. Details of the SQL and test data 
> can be found in the attachments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25269) When the skew and parallel parameters are true simultaneously, the result is less data

2021-06-21 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25269:

Attachment: (was: 样例分析-表入数据.sql)

> When the skew and parallel parameters are true simultaneously, the result is 
> less data
> --
>
> Key: HIVE-25269
> URL: https://issues.apache.org/jira/browse/HIVE-25269
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
>
> When hive.optimize.skewjoin, hive.groupby.skewindata and hive.exec.parallel 
> are all true, and a SQL statement such as 'INSERT... FROM (SUBQUERY 
> UNIONALL ...GROUP BY...) A JOIN/LEFT JOIN A.expression' is executed, the 
> result contains less data than expected. Details of the SQL and test data 
> can be found in the attachments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25269) When the skew and parallel parameters are true simultaneously, the result is less data

2021-06-21 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25269:

Attachment: 样例分析-表入数据.sql

> When the skew and parallel parameters are true simultaneously, the result is 
> less data
> --
>
> Key: HIVE-25269
> URL: https://issues.apache.org/jira/browse/HIVE-25269
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
> Attachments: comb_classcode.data, 样例分析-表入数据.sql
>
>
> When hive.optimize.skewjoin, hive.groupby.skewindata and hive.exec.parallel 
> are all true, and a SQL statement such as 'INSERT... FROM (SUBQUERY 
> UNIONALL ...GROUP BY...) A JOIN/LEFT JOIN A.expression' is executed, the 
> result contains less data than expected. Details of the SQL and test data 
> can be found in the attachments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25269) When the skew and parallel parameters are true simultaneously, the result is less data

2021-06-21 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25269:

Attachment: (was: table_b_data.orc)

> When the skew and parallel parameters are true simultaneously, the result is 
> less data
> --
>
> Key: HIVE-25269
> URL: https://issues.apache.org/jira/browse/HIVE-25269
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
> Attachments: comb_classcode.data, 样例分析-表入数据.sql
>
>
> When hive.optimize.skewjoin, hive.groupby.skewindata and hive.exec.parallel 
> are all true, and a SQL statement such as 'INSERT... FROM (SUBQUERY 
> UNIONALL ...GROUP BY...) A JOIN/LEFT JOIN A.expression' is executed, the 
> result contains less data than expected. Details of the SQL and test data 
> can be found in the attachments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25269) When the skew and parallel parameters are true simultaneously, the result is less data

2021-06-21 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25269:

Attachment: (was: test.sql)

> When the skew and parallel parameters are true simultaneously, the result is 
> less data
> --
>
> Key: HIVE-25269
> URL: https://issues.apache.org/jira/browse/HIVE-25269
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
> Attachments: comb_classcode.data, 样例分析-表入数据.sql
>
>
> When hive.optimize.skewjoin, hive.groupby.skewindata and hive.exec.parallel 
> are all true, and a SQL statement such as 'INSERT... FROM (SUBQUERY 
> UNIONALL ...GROUP BY...) A JOIN/LEFT JOIN A.expression' is executed, the 
> result contains less data than expected. Details of the SQL and test data 
> can be found in the attachments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25269) When the skew and parallel parameters are true simultaneously, the result is less data

2021-06-21 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25269:

Attachment: (was: table_d_data.orc)

> When the skew and parallel parameters are true simultaneously, the result is 
> less data
> --
>
> Key: HIVE-25269
> URL: https://issues.apache.org/jira/browse/HIVE-25269
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
> Attachments: comb_classcode.data, 样例分析-表入数据.sql
>
>
> When hive.optimize.skewjoin, hive.groupby.skewindata and hive.exec.parallel 
> are all true, and a SQL statement such as 'INSERT... FROM (SUBQUERY 
> UNIONALL ...GROUP BY...) A JOIN/LEFT JOIN A.expression' is executed, the 
> result contains less data than expected. Details of the SQL and test data 
> can be found in the attachments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25269) When the skew and parallel parameters are true simultaneously, the result is less data

2021-06-21 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25269:

Attachment: comb_classcode.data

> When the skew and parallel parameters are true simultaneously, the result is 
> less data
> --
>
> Key: HIVE-25269
> URL: https://issues.apache.org/jira/browse/HIVE-25269
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
> Attachments: comb_classcode.data, 样例分析-表入数据.sql
>
>
> When hive.optimize.skewjoin, hive.groupby.skewindata and hive.exec.parallel 
> are all true, and a SQL statement such as 'INSERT... FROM (SUBQUERY 
> UNIONALL ...GROUP BY...) A JOIN/LEFT JOIN A.expression' is executed, the 
> result contains less data than expected. Details of the SQL and test data 
> can be found in the attachments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25269) When the skew and parallel parameters are true simultaneously, the result is less data

2021-06-21 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25269:

Attachment: (was: table_c_data.orc)

> When the skew and parallel parameters are true simultaneously, the result is 
> less data
> --
>
> Key: HIVE-25269
> URL: https://issues.apache.org/jira/browse/HIVE-25269
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
> Attachments: comb_classcode.data, 样例分析-表入数据.sql
>
>
> When hive.optimize.skewjoin, hive.groupby.skewindata and hive.exec.parallel 
> are all true, and a SQL statement such as 'INSERT... FROM (SUBQUERY 
> UNIONALL ...GROUP BY...) A JOIN/LEFT JOIN A.expression' is executed, the 
> result contains less data than expected. Details of the SQL and test data 
> can be found in the attachments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25269) When the skew and parallel parameters are true simultaneously, the result is less data

2021-06-21 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25269:

Attachment: (was: table_a_data.orc)

> When the skew and parallel parameters are true simultaneously, the result is 
> less data
> --
>
> Key: HIVE-25269
> URL: https://issues.apache.org/jira/browse/HIVE-25269
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
> Attachments: comb_classcode.data, 样例分析-表入数据.sql
>
>
> When hive.optimize.skewjoin, hive.groupby.skewindata and hive.exec.parallel 
> are all true, and a SQL statement such as 'INSERT... FROM (SUBQUERY 
> UNIONALL ...GROUP BY...) A JOIN/LEFT JOIN A.expression' is executed, the 
> result contains less data than expected. Details of the SQL and test data 
> can be found in the attachments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25209) SELECT query with SUM function producing unexpected result

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25209?focusedWorklogId=612731=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612731
 ]

ASF GitHub Bot logged work on HIVE-25209:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 15:39
Start Date: 21/Jun/21 15:39
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2360:
URL: https://github.com/apache/hive/pull/2360#discussion_r655492864



##
File path: ql/src/test/queries/clientpositive/select_sum.q
##
@@ -0,0 +1,14 @@
+DROP DATABASE IF EXISTS db5 CASCADE;
+CREATE DATABASE db5;
+use db5;
+CREATE TABLE IF NOT EXISTS t1(c0 boolean, c1 boolean);
+
+SELECT SUM(1) FROM t1;

Review comment:
   could you please add an explain to see the optimization in action?

##
File path: ql/src/test/queries/clientpositive/select_sum.q
##
@@ -0,0 +1,14 @@
+DROP DATABASE IF EXISTS db5 CASCADE;
+CREATE DATABASE db5;
+use db5;

Review comment:
   these lines could be removed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612731)
Time Spent: 20m  (was: 10m)

> SELECT query with SUM function producing unexpected result
> --
>
> Key: HIVE-25209
> URL: https://issues.apache.org/jira/browse/HIVE-25209
> Project: Hive
>  Issue Type: Bug
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hive: SELECT query with SUM function producing unexpected result
> Problem Statement:
> {noformat}
> SELECT SUM(1) FROM t1;
>  result: 0
> SELECT SUM(agg0) FROM (
> SELECT SUM(1) as agg0 FROM t1 WHERE t1.c0 UNION ALL 
> SELECT SUM(1) as agg0 FROM t1 WHERE NOT (t1.c0) UNION ALL 
> SELECT SUM(1) as agg0 FROM t1 WHERE (t1.c0) IS NULL
> ) as asdf;
>  result: null {noformat}
> Steps to reproduce:
> {noformat}
> DROP DATABASE IF EXISTS db5 CASCADE;
> CREATE DATABASE db5;
> use db5;
> CREATE TABLE IF NOT EXISTS t1(c0 boolean, c1 boolean);
> SELECT SUM(1) FROM t1;
> -- result: 0
> SELECT SUM(agg0) FROM (
> SELECT SUM(1) as agg0 FROM t1 WHERE t1.c0 UNION ALL 
> SELECT SUM(1) as agg0 FROM t1 WHERE NOT (t1.c0) UNION ALL 
> SELECT SUM(1) as agg0 FROM t1 WHERE (t1.c0) IS NULL
> ) as asdf;
> -- result: null {noformat}
> Observations:
> SELECT SUM(1) as agg0 FROM t1 WHERE t1.c0 = t1.c1; -- will result in null
> Similarity with PostgreSQL:
>  both of the queries above result in null
> Similarity with Impala:
>  both of the queries above result in null
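
For reference, the semantics being compared against: SUM over zero input rows 
yields NULL in standard SQL, which is what PostgreSQL and Impala return here.

{noformat}
-- t1 is empty, so no rows reach the aggregate:
SELECT SUM(1) FROM t1;
-- standard SQL (and PostgreSQL/Impala) result: NULL, not 0
{noformat}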



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25202) Support decimal64 operations for PTF operators

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25202?focusedWorklogId=612730=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612730
 ]

ASF GitHub Bot logged work on HIVE-25202:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 15:39
Start Date: 21/Jun/21 15:39
Worklog Time Spent: 10m 
  Work Description: ramesh0201 commented on pull request #2416:
URL: https://github.com/apache/hive/pull/2416#issuecomment-865134370


   Initial patch; more improvements will follow in successive commits. Doing 
a qtest run now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612730)
Time Spent: 20m  (was: 10m)

> Support decimal64 operations for PTF operators
> --
>
> Key: HIVE-25202
> URL: https://issues.apache.org/jira/browse/HIVE-25202
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> After decimal64 vectorization was added for multiple operators, PTF 
> operators were found to break the decimal64 chain if they happen to occur 
> between two such operators. As a result, they introduce an unnecessary cast 
> to decimal. To prevent this, we will add decimal64 support to PTF operators 
> as well.
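
For background, a standalone illustration of the decimal64 encoding itself 
(plain JDK, not Hive's Decimal64ColumnVector): a decimal of precision up to 18 
fits in a long holding the unscaled digits, plus a fixed scale.

{code:java}
import java.math.BigDecimal;

public class Decimal64Demo {
  public static void main(String[] args) {
    long unscaled = 12345L; // the scaled-long form of 123.45
    int scale = 2;
    // Reconstructing the logical value from the scaled long:
    BigDecimal value = BigDecimal.valueOf(unscaled, scale);
    System.out.println(value); // prints 123.45
  }
}
{code}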



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25202) Support decimal64 operations for PTF operators

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25202?focusedWorklogId=612728=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612728
 ]

ASF GitHub Bot logged work on HIVE-25202:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 15:38
Start Date: 21/Jun/21 15:38
Worklog Time Spent: 10m 
  Work Description: ramesh0201 opened a new pull request #2416:
URL: https://github.com/apache/hive/pull/2416


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612728)
Remaining Estimate: 0h
Time Spent: 10m

> Support decimal64 operations for PTF operators
> --
>
> Key: HIVE-25202
> URL: https://issues.apache.org/jira/browse/HIVE-25202
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After decimal64 vectorization was added for multiple operators, PTF 
> operators were found to break the decimal64 chain if they happen to occur 
> between two such operators. As a result, they introduce an unnecessary cast 
> to decimal. To prevent this, we will add decimal64 support to PTF operators 
> as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25202) Support decimal64 operations for PTF operators

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25202:
--
Labels: pull-request-available  (was: )

> Support decimal64 operations for PTF operators
> --
>
> Key: HIVE-25202
> URL: https://issues.apache.org/jira/browse/HIVE-25202
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After decimal64 vectorization was added for multiple operators, PTF 
> operators were found to break the decimal64 chain if they happen to occur 
> between two such operators. As a result, they introduce an unnecessary cast 
> to decimal. To prevent this, we will add decimal64 support to PTF operators 
> as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25216) Vectorized reading of ORC tables via Iceberg

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25216:
--
Labels: pull-request-available  (was: )

> Vectorized reading of ORC tables via Iceberg
> 
>
> Key: HIVE-25216
> URL: https://issues.apache.org/jira/browse/HIVE-25216
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As [https://github.com/apache/iceberg/pull/2613] is resolved, we should port 
> it to the Hive codebase to enable vectorized ORC reads on Iceberg-backed tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25216) Vectorized reading of ORC tables via Iceberg

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25216?focusedWorklogId=612720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612720
 ]

ASF GitHub Bot logged work on HIVE-25216:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 15:24
Start Date: 21/Jun/21 15:24
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #2415:
URL: https://github.com/apache/hive/pull/2415


   As https://github.com/apache/iceberg/pull/2613 is resolved, we should port 
it to the Hive codebase to enable vectorized ORC reads on Iceberg-backed tables.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612720)
Remaining Estimate: 0h
Time Spent: 10m

> Vectorized reading of ORC tables via Iceberg
> 
>
> Key: HIVE-25216
> URL: https://issues.apache.org/jira/browse/HIVE-25216
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As [https://github.com/apache/iceberg/pull/2613] is resolved, we should port 
> it to the Hive codebase to enable vectorized ORC reads on Iceberg-backed tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25242?focusedWorklogId=612675=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612675
 ]

ASF GitHub Bot logged work on HIVE-25242:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 14:41
Start Date: 21/Jun/21 14:41
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on pull request #2390:
URL: https://github.com/apache/hive/pull/2390#issuecomment-865088192


   @pgaref yes, I added a note on the ticket about whitelisting the concat. A 
customer tested the patch and did not report any side effects.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612675)
Time Spent: 0.5h  (was: 20m)

>  Query performs extremely slow with hive.vectorized.adaptor.usage.mode = 
> chosen
> ---
>
> Key: HIVE-25242
> URL: https://issues.apache.org/jira/browse/HIVE-25242
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs 
> are vectorized through the vectorized adaptor.
> Queries like this one perform very slowly because concat is not chosen 
> to be vectorized.
> {code:java}
> select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) 
> between to_date('2018-12-01') and to_date('2021-03-01');  {code}
> The patch whitelists the concat udf so that it uses the vectorized adaptor in 
> chosen mode.
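
For context, a sketch of the knob involved (it accepts none/chosen/all): with 
'all', UDFs without a native vectorized form still run through the adaptor, 
so the query stays vectorized even without this patch.

{noformat}
set hive.vectorized.adaptor.usage.mode=all;
select count(*) from tbl
where to_date(concat(year, '-', month, '-', day))
      between to_date('2018-12-01') and to_date('2021-03-01');
{noformat}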



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen

2021-06-21 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-25242:
-
Description: 
If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are 
vectorized through the vectorized adaptor.

Queries like this one perform very slowly because concat is not chosen to 
be vectorized.
{code:java}
select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) 
between to_date('2018-12-01') and to_date('2021-03-01');  {code}
The patch whitelists the concat udf so that it uses the vectorized adaptor in 
chosen mode.

  was:
If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are 
vectorized through the vectorized adaptor.

Queries like this one perform very slowly because concat is not chosen to 
be vectorized.
{code:java}
select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) 
between to_date('2018-12-01') and to_date('2021-03-01');  {code}


>  Query performs extremely slow with hive.vectorized.adaptor.usage.mode = 
> chosen
> ---
>
> Key: HIVE-25242
> URL: https://issues.apache.org/jira/browse/HIVE-25242
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs 
> are vectorized through the vectorized adaptor.
> Queries like this one perform very slowly because concat is not chosen 
> to be vectorized.
> {code:java}
> select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) 
> between to_date('2018-12-01') and to_date('2021-03-01');  {code}
> The patch whitelists the concat udf so that it uses the vectorized adaptor in 
> chosen mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25243) Llap external client - Handle nested values when the parent struct is null

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25243?focusedWorklogId=612668=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612668
 ]

ASF GitHub Bot logged work on HIVE-25243:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 14:35
Start Date: 21/Jun/21 14:35
Worklog Time Spent: 10m 
  Work Description: ShubhamChaurasia commented on a change in pull request 
#2391:
URL: https://github.com/apache/hive/pull/2391#discussion_r655433343



##
File path: 
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestMiniLlapVectorArrowWithLlapIODisabled.java
##
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.jdbc;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
+import org.apache.hadoop.hive.llap.LlapArrowRowInputFormat;
+import org.apache.hadoop.hive.llap.Row;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapred.InputFormat;
+import org.junit.BeforeClass;
+import org.junit.Ignore;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+/**
+ * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while testing 
LLAP external client flow.
+ * The aim of turning off LLAP IO is -
+ * when we create table through this test, LLAP caches them and returns the 
same

Review comment:
   By default, LLAP IO is ON in the default mini-llap conf; this issue is 
only seen when we turn it off.
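
   For completeness, a sketch of how a test conf can switch LLAP IO off 
(where exactly this PR sets it may differ):

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

public class LlapIoOffConf {
  public static HiveConf create() {
    HiveConf conf = new HiveConf();
    // Disable LLAP IO so reads bypass the cache; the mini-llap test
    // configuration enables it by default.
    conf.set("hive.llap.io.enabled", "false");
    return conf;
  }
}
{code}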




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612668)
Time Spent: 50m  (was: 40m)

> Llap external client - Handle nested values when the parent struct is null
> --
>
> Key: HIVE-25243
> URL: https://issues.apache.org/jira/browse/HIVE-25243
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Consider the following table in text format - 
> {code}
> +---+
> |  c8   |
> +---+
> | NULL  |
> | {"r":null,"s":null,"t":null}  |
> | {"r":"a","s":9,"t":2.2}   |
> +---+
> {code}
> When we query the above table via the LLAP external client, it throws the 
> following exception - 
> {code:java}
> Caused by: java.lang.NullPointerException: src
> at io.netty.util.internal.ObjectUtil.checkNotNull(ObjectUtil.java:33)
> at 
> io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:537)
> at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:199)
> at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:486)
> at 
> io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34)
> at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:933)
> at 
> org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1191)
> at 
> org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1026)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.lambda$static$15(Serializer.java:834)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:777)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:581)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:290)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:359)
> at 
> 

[jira] [Work logged] (HIVE-25243) Llap external client - Handle nested values when the parent struct is null

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25243?focusedWorklogId=612665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612665
 ]

ASF GitHub Bot logged work on HIVE-25243:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 14:35
Start Date: 21/Jun/21 14:35
Worklog Time Spent: 10m 
  Work Description: ShubhamChaurasia commented on a change in pull request 
#2391:
URL: https://github.com/apache/hive/pull/2391#discussion_r655432446



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/Serializer.java
##
@@ -347,6 +347,21 @@ private void writeStruct(NonNullableStructVector 
arrowVector, StructColumnVector
 final ColumnVector[] hiveFieldVectors = hiveVector == null ? null : 
hiveVector.fields;
 final int fieldSize = fieldTypeInfos.size();
 
+// This is to handle following scenario -
+// if any struct value itself is NULL, we get structVector.isNull[i]=true
+// but we don't get the same for it's child fields which later causes 
exceptions while setting to arrow vectors
+// see - https://issues.apache.org/jira/browse/HIVE-25243
+if (hiveVector != null && hiveFieldVectors != null) {
+  for (int i = 0; i < size; i++) {
+if (hiveVector.isNull[i]) {
+  for (ColumnVector fieldVector : hiveFieldVectors) {
+fieldVector.isNull[i] = true;

Review comment:
   The idea is: if the outer vector (the struct here) has isNull[i] = true, 
all the nested vectors (hiveFieldVectors) should also have isNull[i] = true. 
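
   The invariant, stripped of Hive's vector classes (a standalone sketch with 
plain arrays):

{code:java}
public class NullPropagationDemo {
  public static void main(String[] args) {
    // isNull flags for a struct column and its two child field columns.
    boolean[] structIsNull = { true, false, false };
    boolean[][] fieldIsNull = { new boolean[3], new boolean[3] };

    for (int row = 0; row < structIsNull.length; row++) {
      if (structIsNull[row]) {
        for (boolean[] field : fieldIsNull) {
          field[row] = true; // a null struct implies all its fields are null
        }
      }
    }
    System.out.println(fieldIsNull[0][0]); // prints true
  }
}
{code}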




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612665)
Time Spent: 40m  (was: 0.5h)

> Llap external client - Handle nested values when the parent struct is null
> --
>
> Key: HIVE-25243
> URL: https://issues.apache.org/jira/browse/HIVE-25243
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Consider the following table in text format - 
> {code}
> +---+
> |  c8   |
> +---+
> | NULL  |
> | {"r":null,"s":null,"t":null}  |
> | {"r":"a","s":9,"t":2.2}   |
> +---+
> {code}
> When we query the above table via the LLAP external client, it throws the 
> following exception - 
> {code:java}
> Caused by: java.lang.NullPointerException: src
> at io.netty.util.internal.ObjectUtil.checkNotNull(ObjectUtil.java:33)
> at 
> io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:537)
> at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:199)
> at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:486)
> at 
> io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34)
> at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:933)
> at 
> org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1191)
> at 
> org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1026)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.lambda$static$15(Serializer.java:834)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:777)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:581)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:290)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:359)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:296)
> at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:213)
> at 
> org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135)
> {code}
> Created a test to repro it - 
> {code:java}
> /**
>  * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while 
> testing LLAP external client flow.
>  * The aim of turning off LLAP IO is:
>  * when we create a table through this test, LLAP caches it and returns the
>  * cached data when we do a read query; due to this we miss some code paths
>  * which would have been hit otherwise.
>  */
> public class 

[jira] [Work logged] (HIVE-25243) Llap external client - Handle nested values when the parent struct is null

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25243?focusedWorklogId=612660=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612660
 ]

ASF GitHub Bot logged work on HIVE-25243:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 14:26
Start Date: 21/Jun/21 14:26
Worklog Time Spent: 10m 
  Work Description: ShubhamChaurasia commented on a change in pull request 
#2391:
URL: https://github.com/apache/hive/pull/2391#discussion_r655425247



##
File path: 
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniLlapVectorArrow.java
##
@@ -238,7 +238,10 @@ public void testDataTypes() throws Exception {
 assertEquals(Date.valueOf("2013-01-01"), rowValues[19]);
 assertEquals("abc123", rowValues[20]);
 assertEquals("abc123 ", rowValues[21]);
-assertArrayEquals("X'01FF'".getBytes("UTF-8"), (byte[]) rowValues[22]);
+
+// one of the above assertions already has assertEquals(null, 
rowValues[22])
+// and below assertion fails with - java.lang.AssertionError: actual array 
was null
+// assertArrayEquals("X'01FF'".getBytes("UTF-8"), (byte[]) rowValues[22]);

Review comment:
   Ah yes, right, I missed that; I will create a new test altogether for this, 
as suggested in the other comments too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612660)
Time Spent: 0.5h  (was: 20m)

> Llap external client - Handle nested values when the parent struct is null
> --
>
> Key: HIVE-25243
> URL: https://issues.apache.org/jira/browse/HIVE-25243
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h

[jira] [Work logged] (HIVE-21489) EXPLAIN command throws ClassCastException in Hive

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21489?focusedWorklogId=612646=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612646
 ]

ASF GitHub Bot logged work on HIVE-21489:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 14:16
Start Date: 21/Jun/21 14:16
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on pull request #2373:
URL: https://github.com/apache/hive/pull/2373#issuecomment-865069136


   LGTM, with one note: both SemanticAnalyzer and ExplainSemanticAnalyzer 
extend BaseSemanticAnalyzer, so wouldn't it have made sense to put the getter 
there?
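
   As a hedged illustration of that note (hypothetical class bodies, not Hive's 
actual code): declaring the getter on the shared base class lets callers work 
with either analyzer without the downcast that triggers the ClassCastException.

{code:java}
// Hypothetical sketch only - the real Hive classes have far more members.
abstract class BaseSemanticAnalyzer {
  // A getter on the base type is visible for every analyzer subtype.
  public String getAnalyzerName() {
    return getClass().getSimpleName();
  }
}

class SemanticAnalyzer extends BaseSemanticAnalyzer { }

class ExplainSemanticAnalyzer extends BaseSemanticAnalyzer { }

class Caller {
  static String describe(BaseSemanticAnalyzer sem) {
    // No "(SemanticAnalyzer) sem" cast needed, so an ExplainSemanticAnalyzer
    // instance can never cause a ClassCastException here.
    return sem.getAnalyzerName();
  }
}
{code}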


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612646)
Time Spent: 20m  (was: 10m)

> EXPLAIN command throws ClassCastException in Hive
> -
>
> Key: HIVE-21489
> URL: https://issues.apache.org/jira/browse/HIVE-21489
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.4
>Reporter: Ping Lu
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21489.1.patch, HIVE-21489.2.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'm trying to run commands like explain select * from src in hive-2.3.4, but 
> it fails with the ClassCastException: 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer cannot be cast to 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer
> Steps to reproduce:
> 1) hive.execution.engine is the default value mr
> 2) hive.security.authorization.enabled is set to true, and 
> hive.security.authorization.manager is set to 
> org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider
> 3) start hivecli to run the command: explain select * from src
> I debugged the code and found that HIVE-18778 causes the above 
> ClassCastException. If I set hive.in.test to true, the explain command 
> executes successfully.
> Now I have one question: since hive.in.test cannot be modified at runtime, how 
> can I run the explain command with the default authorization provider in 
> hive-2.3.4?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25243) Llap external client - Handle nested values when the parent struct is null

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25243?focusedWorklogId=612624=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612624
 ]

ASF GitHub Bot logged work on HIVE-25243:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 13:34
Start Date: 21/Jun/21 13:34
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2391:
URL: https://github.com/apache/hive/pull/2391#discussion_r655374537



##
File path: 
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestMiniLlapVectorArrowWithLlapIODisabled.java
##
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.jdbc;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
+import org.apache.hadoop.hive.llap.LlapArrowRowInputFormat;
+import org.apache.hadoop.hive.llap.Row;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapred.InputFormat;
+import org.junit.BeforeClass;
+import org.junit.Ignore;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+/**
+ * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while testing 
LLAP external client flow.
+ * The aim of turning off LLAP IO is -
+ * when we create table through this test, LLAP caches them and returns the 
same

Review comment:
   You mean, when tested with LLAP IO turned on?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestMiniLlapVectorArrowWithLlapIODisabled.java
##
@@ -0,0 +1,124 @@
+/**
+ * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while testing 
LLAP external client flow.
+ * The aim of turning off LLAP IO is:
+ * when we create a table through this test, LLAP caches it and returns the
+ * cached data when we do a read query; due to this we miss some code paths
+ * which would have been hit otherwise.
+ */
+public class TestMiniLlapVectorArrowWithLlapIODisabled extends 
BaseJdbcWithMiniLlap {
+
+  @BeforeClass
+  public static void beforeTest() throws Exception {
+HiveConf conf = defaultConf();
+conf.setBoolVar(ConfVars.LLAP_OUTPUT_FORMAT_ARROW, true);
+conf.setBoolVar(ConfVars.HIVE_VECTORIZATION_FILESINK_ARROW_NATIVE_ENABLED, 
true);
+conf.set(ConfVars.LLAP_IO_ENABLED.varname, "false");
+BaseJdbcWithMiniLlap.beforeTest(conf);
+  }
+
+  @Override
+  protected InputFormat getInputFormat() {
+//For unit testing, no harm in hard-coding allocator ceiling to 
LONG.MAX_VALUE
+return new LlapArrowRowInputFormat(Long.MAX_VALUE);
+  }
+
+  @Test
+  public void testNullsInStructFields() throws Exception {
+createDataTypesTable("datatypes");
+RowCollector2 rowCollector = new RowCollector2();
+// c8 struct<r:string,s:int,t:double>
+// c15 struct<r:int,s:struct<a:int,b:string>>
+// c16 array<struct<m:map<string,string>,n:int>>
+String query = "select c8, c15, c16 

[jira] [Updated] (HIVE-25265) Fix TestHiveIcebergStorageHandlerWithEngine

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25265:
--
Labels: pull-request-available  (was: )

> Fix TestHiveIcebergStorageHandlerWithEngine
> ---
>
> Key: HIVE-25265
> URL: https://issues.apache.org/jira/browse/HIVE-25265
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> test is unstable:
> http://ci.hive.apache.org/job/hive-flaky-check/251/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25265) Fix TestHiveIcebergStorageHandlerWithEngine

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25265?focusedWorklogId=612610=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612610
 ]

ASF GitHub Bot logged work on HIVE-25265:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 12:54
Start Date: 21/Jun/21 12:54
Worklog Time Spent: 10m 
  Work Description: marton-bod opened a new pull request #2414:
URL: https://github.com/apache/hive/pull/2414


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612610)
Remaining Estimate: 0h
Time Spent: 10m

> Fix TestHiveIcebergStorageHandlerWithEngine
> ---
>
> Key: HIVE-25265
> URL: https://issues.apache.org/jira/browse/HIVE-25265
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Marton Bod
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> test is unstable:
> http://ci.hive.apache.org/job/hive-flaky-check/251/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25264) Add tests to verify Hive can read/write after schema change on Iceberg table

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25264?focusedWorklogId=612607=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612607
 ]

ASF GitHub Bot logged work on HIVE-25264:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 12:50
Start Date: 21/Jun/21 12:50
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on pull request #2407:
URL: https://github.com/apache/hive/pull/2407#issuecomment-865006033


   Thanks @kuczoram, looks great! Just a few questions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612607)
Time Spent: 2h 10m  (was: 2h)

> Add tests to verify Hive can read/write after schema change on Iceberg table
> 
>
> Key: HIVE-25264
> URL: https://issues.apache.org/jira/browse/HIVE-25264
> Project: Hive
>  Issue Type: Test
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We should verify that Hive can properly read/write Iceberg tables after their 
> schema was modified through the Iceberg API (as when another engine, 
> like Spark, has modified the schema). 
> Unit tests should be added for the following operations offered by the 
> UpdateSchema interface in the Iceberg API:
> - adding new top level column
> - adding new nested column
> - adding required column
> - adding required nested column
> - renaming a column
> - updating a column
> - making a column required
> - deleting a column
> - changing the order of the columns in the schema



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25264) Add tests to verify Hive can read/write after schema change on Iceberg table

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25264?focusedWorklogId=612604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612604
 ]

ASF GitHub Bot logged work on HIVE-25264:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 12:48
Start Date: 21/Jun/21 12:48
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655345983



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws 
IOException {
 Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+// Create an Iceberg table with the columns customer_id, first_name and 
last_name with some initial data.
+Table icebergTable = testTables.createTable(shell, "customers", 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+// Add a new column (age long) to the Iceberg table.
+icebergTable.updateSchema().addColumn("age", 
Types.LongType.get()).commit();
+
+Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", 
Types.LongType.get()),
+optional(2, "first_name", Types.StringType.get(), "This is first 
name"),
+optional(3, "last_name", Types.StringType.get(), "This is last name"),
+optional(4, "age", Types.LongType.get()));
+
+Schema customerSchemaWithAgeOnly =
+new Schema(optional(1, "customer_id", Types.LongType.get()), 
optional(4, "age", Types.LongType.get()));
+
+// Also add a new entry to the table where the age column is set.
+icebergTable = testTables.loadTable(TableIdentifier.of("default", 
"customers"));
+List<Record> newCustomerWithAge = 
TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+.add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, 
fileFormat, null, newCustomerWithAge);
+
+// Do a 'select *' from Hive and check if the age column appears in the 
result.
+// It should be null for the old data and should be filled for the data 
added after the column addition.
+TestHelper.RecordsBuilder customersWithAgeBuilder = 
TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+.add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, 
"Trudy", "Pink", null)
+.add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+List<Object[]> rows = shell.executeStatement("SELECT * FROM 
default.customers");
+HiveIcebergTestUtils.validateData(customersWithAge, 
HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+0);
+
+// Do a 'select customer_id, age' from Hive to check if the new column can 
be queried from Hive.
+// The customer_id is needed because of the result sorting.
+TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = 
TestHelper.RecordsBuilder
+.newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, 
null).add(2L, null).add(3L, 34L).add(4L, null);
+List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+rows = shell.executeStatement("SELECT customer_id, age FROM 
default.customers");
+HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+// Insert some data with age column from Hive. Insert an entry with null 
age and an entry with filled age.
+shell.executeStatement(
+"INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), 
(6L, 'Roni', 'Purple', 23L)");
+
+customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", 
"Purple", 23L);
+customersWithAge = customersWithAgeBuilder.build();
+rows = shell.executeStatement("SELECT * FROM default.customers");
+HiveIcebergTestUtils.validateData(customersWithAge, 
HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+0);
+
+customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+rows = shell.executeStatement("SELECT customer_id, age FROM 
default.customers");
+HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+// Create an Iceberg table with the columns customer_id, first_name and 
last_name with some initial data.
+Table icebergTable = testTables.createTable(shell, "customers", 

[jira] [Work logged] (HIVE-25264) Add tests to verify Hive can read/write after schema change on Iceberg table

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25264?focusedWorklogId=612603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612603
 ]

ASF GitHub Bot logged work on HIVE-25264:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 12:47
Start Date: 21/Jun/21 12:47
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655344122



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##

[jira] [Work logged] (HIVE-25264) Add tests to verify Hive can read/write after schema change on Iceberg table

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25264?focusedWorklogId=612598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612598
 ]

ASF GitHub Bot logged work on HIVE-25264:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 12:33
Start Date: 21/Jun/21 12:33
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655335482



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##

[jira] [Work logged] (HIVE-25264) Add tests to verify Hive can read/write after schema change on Iceberg table

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25264?focusedWorklogId=612589=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612589
 ]

ASF GitHub Bot logged work on HIVE-25264:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 12:06
Start Date: 21/Jun/21 12:06
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655317847



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##

[jira] [Work logged] (HIVE-25264) Add tests to verify Hive can read/write after schema change on Iceberg table

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25264?focusedWorklogId=612584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612584
 ]

ASF GitHub Bot logged work on HIVE-25264:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 11:57
Start Date: 21/Jun/21 11:57
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655312262



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+// Create an Iceberg table with the columns customer_id, first_name and 
last_name with some initial data.

Review comment:
   This is not actually filled with initial data in this 

[jira] [Work logged] (HIVE-25264) Add tests to verify Hive can read/write after schema change on Iceberg table

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25264?focusedWorklogId=612580=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612580
 ]

ASF GitHub Bot logged work on HIVE-25264:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 11:54
Start Date: 21/Jun/21 11:54
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655310601



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
+customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", 
"Purple", 23L);
+customersWithAge = customersWithAgeBuilder.build();

Review comment:
   Oh okay, I see now why you kept the builder :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612577)
Time Spent: 40m  (was: 0.5h)

> Add tests to verify Hive can read/write after schema change on Iceberg table
> 
>
> Key: HIVE-25264
> URL: https://issues.apache.org/jira/browse/HIVE-25264
> Project: Hive
>  Issue Type: Test
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora

[jira] [Work logged] (HIVE-25264) Add tests to verify Hive can read/write after schema change on Iceberg table

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25264?focusedWorklogId=612574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612574
 ]

ASF GitHub Bot logged work on HIVE-25264:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 11:46
Start Date: 21/Jun/21 11:46
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655305917



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)

Review comment:
   nit: can we merge these two declarations by not calling the `.build()` method separately (and elsewhere where it's the same pattern)? No strong feelings, so we can keep it as is, but at least in my opinion it would make it a bit more streamlined.
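
A minimal sketch of the suggested merge (names taken from the diff above):

{code:java}
// fold the builder into the declaration by chaining .build() directly,
// instead of keeping a separate RecordsBuilder variable:
List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
    .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
    .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null)
    .build();
{code}

(A reply logged earlier in this digest explains why the builder is kept separate in this particular test: it is extended with two more rows and rebuilt after the Hive INSERT.)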




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612574)
Time Spent: 0.5h  (was: 20m)

> Add tests to verify Hive can read/write after schema change on Iceberg table
> 
>
> Key: HIVE-25264
> URL: https://issues.apache.org/jira/browse/HIVE-25264
> Project: Hive
>  Issue Type: Test
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We should verify that Hive can properly read/write Iceberg tables after their 
> schema was modified through the Iceberg API (i.e. when another engine, such as 
> Spark, has modified the schema). 
> Unit tests should be added for the following operations offered by the 
> UpdateSchema interface in the Iceberg API:
> - adding new top level column
> - adding new nested column
> - adding required column
> - adding required nested column
> - renaming a column
> - updating a column
> - making a column required
> - deleting a column
> - changing the order of the columns in the schema



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25264) Add tests to verify Hive can read/write after schema change on Iceberg table

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25264?focusedWorklogId=612572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612572
 ]

ASF GitHub Bot logged work on HIVE-25264:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 11:44
Start Date: 21/Jun/21 11:44
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655304838



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =

Review comment:
   Can we move this closer to where it's first used?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612572)
Time Spent: 20m  (was: 10m)

> Add tests to verify Hive can read/write after schema change on Iceberg table
> 
>
> Key: HIVE-25264
> URL: https://issues.apache.org/jira/browse/HIVE-25264
> Project: Hive
>  Issue Type: Test
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should verify that Hive can properly read/write Iceberg tables after their 
> schema was modified through the Iceberg API (i.e. when another engine, such as 
> Spark, has modified the schema). 
> Unit tests should be added for the following operations offered by the 
> UpdateSchema interface in the Iceberg API:
> - adding new top level column
> - adding new nested column
> - adding required column
> - adding required nested column
> - renaming a column
> - updating a column
> - making a column required
> - deleting a column
> - changing the order of the columns in the schema



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25234) Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on Iceberg tables

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25234?focusedWorklogId=612545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612545
 ]

ASF GitHub Bot logged work on HIVE-25234:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 10:59
Start Date: 21/Jun/21 10:59
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2382:
URL: https://github.com/apache/hive/pull/2382#discussion_r655278621



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##
@@ -155,6 +155,93 @@ public void after() throws Exception {
 HiveIcebergStorageHandlerTestUtils.close(shell);
   }
 
+  @Test
+  public void testPartitionEvolution() {

Review comment:
   What happens when I try to alter an HBase table with a partition spec, or a normal Hive (non-Iceberg) table?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612545)
Time Spent: 1h 40m  (was: 1.5h)

> Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on 
> Iceberg tables
> -
>
> Key: HIVE-25234
> URL: https://issues.apache.org/jira/browse/HIVE-25234
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Provide a way to change the schema and the Iceberg partitioning specification 
> using Hive syntax.
> {code:sql}
> ALTER TABLE tbl SET PARTITION SPEC(...)
> {code}
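
A hedged, concrete instance of the proposed syntax (table and column names are made up, and the accepted transform set is assumed to follow Iceberg's year/bucket transforms):

{code:java}
// issued through the test shell used elsewhere in these threads:
shell.executeStatement(
    "ALTER TABLE tbl SET PARTITION SPEC (year(event_ts), bucket(16, customer_id))");
{code}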



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25234) Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on Iceberg tables

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25234?focusedWorklogId=612544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612544
 ]

ASF GitHub Bot logged work on HIVE-25234:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 10:59
Start Date: 21/Jun/21 10:59
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2382:
URL: https://github.com/apache/hive/pull/2382#discussion_r655278190



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java
##
@@ -106,4 +111,61 @@ public static PartitionSpec spec(Configuration configuration, Schema schema) {
     });
     return builder.build();
   }
+
+  public static void updateSpec(Configuration configuration, Table table) {
+    // get the new partition transform spec
+    PartitionSpec newPartitionSpec = spec(configuration, table.schema());
+    if (newPartitionSpec == null) {
+      LOG.debug("Iceberg Partition spec is not updated due to empty partition spec definition.");
+      return;
+    }
+
+    List<String> newPartitionNames =
+        newPartitionSpec.fields().stream().map(PartitionField::name).collect(Collectors.toList());
+    List<String> currentPartitionNames = table.spec().fields().stream().map(PartitionField::name)
+        .collect(Collectors.toList());
+    List<String> intersectingPartitionNames =
+        currentPartitionNames.stream().filter(newPartitionNames::contains).collect(Collectors.toList());
+
+    // delete those partitions which are not present among the new partition spec
+    UpdatePartitionSpec updatePartitionSpec = table.updateSpec();
+    currentPartitionNames.stream().filter(p -> !intersectingPartitionNames.contains(p))
+        .forEach(updatePartitionSpec::removeField);
+    updatePartitionSpec.apply();
+
+    // add new partitions which are not yet present
+    List<PartitionTransform.PartitionTransformSpec> partitionTransformSpecList = SessionStateUtil
+        .getResource(configuration, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC)
+        .map(o -> (List<PartitionTransform.PartitionTransformSpec>) o).orElseGet(() -> null);
+    IntStream.range(0, partitionTransformSpecList.size())
+        .filter(i -> !intersectingPartitionNames.contains(newPartitionSpec.fields().get(i).name()))
+        .forEach(i -> {
+          PartitionTransform.PartitionTransformSpec spec = partitionTransformSpecList.get(i);
+          switch (spec.transformType) {

Review comment:
   This `switch` is very similar to the one where we are converting the json to the spec. Do we have a way to reuse code?
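
A sketch of one possible answer: pull the shared mapping into a helper that both switch sites call. The method and the spec's field names below are hypothetical (modeled on the diff above); Expressions.year/bucket are Iceberg's term factories:

{code:java}
// hypothetical helper shared by the JSON-to-spec converter and updateSpec():
// maps one PartitionTransformSpec onto the UpdatePartitionSpec being built.
private static void addTransformField(UpdatePartitionSpec update, PartitionTransform.PartitionTransformSpec spec) {
  switch (spec.transformType) {
    case YEAR:
      update.addField(Expressions.year(spec.name));
      break;
    case BUCKET:
      update.addField(Expressions.bucket(spec.name, spec.numBuckets));
      break;
    // ... remaining transform types elided
    default:
      throw new IllegalArgumentException("Unknown transform type: " + spec.transformType);
  }
}
{code}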




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612544)
Time Spent: 1.5h  (was: 1h 20m)

> Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on 
> Iceberg tables
> -
>
> Key: HIVE-25234
> URL: https://issues.apache.org/jira/browse/HIVE-25234
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Provide a way to change the schema and the Iceberg partitioning specification 
> using Hive syntax.
> {code:sql}
> ALTER TABLE tbl SET PARTITION SPEC(...)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25234) Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on Iceberg tables

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25234?focusedWorklogId=612542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612542
 ]

ASF GitHub Bot logged work on HIVE-25234:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 10:48
Start Date: 21/Jun/21 10:48
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2382:
URL: https://github.com/apache/hive/pull/2382#discussion_r655272146



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -214,7 +219,7 @@ public void commitDropTable(org.apache.hadoop.hive.metastore.api.Table hmsTable,
   @Override
   public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
       throws MetaException {
-    super.preAlterTable(hmsTable, context);
+    setupAlterOperationType(hmsTable, context);

Review comment:
   We removed the `super.preAlterTable`. Is this intentional?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612542)
Time Spent: 1h 20m  (was: 1h 10m)

> Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on 
> Iceberg tables
> -
>
> Key: HIVE-25234
> URL: https://issues.apache.org/jira/browse/HIVE-25234
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Provide a way to change the schema and the Iceberg partitioning specification 
> using Hive syntax.
> {code:sql}
> ALTER TABLE tbl SET PARTITION SPEC(...)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25265) Fix TestHiveIcebergStorageHandlerWithEngine

2021-06-21 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25265:
-

Assignee: Marton Bod

> Fix TestHiveIcebergStorageHandlerWithEngine
> ---
>
> Key: HIVE-25265
> URL: https://issues.apache.org/jira/browse/HIVE-25265
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Marton Bod
>Priority: Major
>
> test is unstable:
> http://ci.hive.apache.org/job/hive-flaky-check/251/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25215) tables_with_x_aborted_transactions should count partition/unpartitioned tables

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25215?focusedWorklogId=612498&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612498
 ]

ASF GitHub Bot logged work on HIVE-25215:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 08:32
Start Date: 21/Jun/21 08:32
Worklog Time Spent: 10m 
  Work Description: klcopp merged pull request #2363:
URL: https://github.com/apache/hive/pull/2363


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612498)
Time Spent: 20m  (was: 10m)

> tables_with_x_aborted_transactions should count partition/unpartitioned tables
> --
>
> Key: HIVE-25215
> URL: https://issues.apache.org/jira/browse/HIVE-25215
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Initiator compares each partition's number of aborts to 
> hive.compactor.abortedtxn.threshold, so tables_with_x_aborted_transactions 
> should reflect the number of partitions/unpartitioned tables with >x aborts, 
> instead of the number of tables with >x aborts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25215) tables_with_x_aborted_transactions should count partition/unpartitioned tables

2021-06-21 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-25215.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master branch. Thanks for the contribution [~asinkovits]!

> tables_with_x_aborted_transactions should count partition/unpartitioned tables
> --
>
> Key: HIVE-25215
> URL: https://issues.apache.org/jira/browse/HIVE-25215
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Initiator compares each partition's number of aborts to 
> hive.compactor.abortedtxn.threshold, so tables_with_x_aborted_transactions 
> should reflect the number of partitions/unpartitioned tables with >x aborts, 
> instead of the number of tables with >x aborts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25081) Put metrics collection behind a feature flag

2021-06-21 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-25081.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master branch. Thanks for the contribution [~asinkovits]!

> Put metrics collection behind a feature flag
> 
>
> Key: HIVE-25081
> URL: https://issues.apache.org/jira/browse/HIVE-25081
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Most metrics we're creating are collected in AcidMetricsService, which is 
> behind a feature flag. However, there are some metrics that are collected 
> outside of the service. These should be behind a feature flag in addition to 
> hive.metastore.metrics.enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25081) Put metrics collection behind a feature flag

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25081?focusedWorklogId=612497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612497
 ]

ASF GitHub Bot logged work on HIVE-25081:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 08:29
Start Date: 21/Jun/21 08:29
Worklog Time Spent: 10m 
  Work Description: klcopp merged pull request #2332:
URL: https://github.com/apache/hive/pull/2332


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612497)
Time Spent: 2h  (was: 1h 50m)

> Put metrics collection behind a feature flag
> 
>
> Key: HIVE-25081
> URL: https://issues.apache.org/jira/browse/HIVE-25081
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Most metrics we're creating are collected in AcidMetricsService, which is 
> behind a feature flag. However, there are some metrics that are collected 
> outside of the service. These should be behind a feature flag in addition to 
> hive.metastore.metrics.enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=612474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612474
 ]

ASF GitHub Bot logged work on HIVE-25268:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 07:24
Start Date: 21/Jun/21 07:24
Worklog Time Spent: 10m 
  Work Description: guptanikhil007 commented on a change in pull request #2409:
URL: https://github.com/apache/hive/pull/2409#discussion_r655134815



##
File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java
##
@@ -111,17 +123,18 @@ public Object evaluate(DeferredObject[] arguments) throws HiveException {
     // the function should support both short date and full timestamp format
     // time part of the timestamp should not be skipped
     Timestamp ts = getTimestampValue(arguments, 0, tsConverters);
+
     if (ts == null) {
       Date d = getDateValue(arguments, 0, dtInputTypes, dtConverters);
       if (d == null) {
         return null;
       }
       ts = Timestamp.ofEpochMilli(d.toEpochMilli(id), id);
     }
-
-
-    date.setTime(ts.toEpochMilli(id));
-    String res = formatter.format(date);
+    Timestamp ts2 = TimestampTZUtil.convertTimestampToZone(ts, timeZone, ZoneId.of("UTC"));
+    Instant instant = Instant.ofEpochSecond(ts2.toEpochSecond(), ts2.getNanos());
+    ZonedDateTime zonedDateTime = ZonedDateTime.ofInstant(instant, ZoneOffset.UTC);
+    String res = formatter.format(zonedDateTime);

Review comment:
   Without this conversion, the timezone abbreviation gets changed for some specific locales:
   PDT -> PT
   CST -> CT
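
For context, a minimal, self-contained java.time sketch of the formatting path the patch moves to (plain JDK classes only; the Hive Timestamp/TimestampTZUtil plumbing is omitted). java.time uses the proleptic Gregorian calendar throughout, so a pre-1900 wall-clock value survives formatting unchanged:

{code:java}
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class Pre1900FormatSketch {
  public static void main(String[] args) {
    ZoneId zone = ZoneId.of("Asia/Bangkok");
    LocalDateTime wallClock = LocalDateTime.parse("1400-01-14T01:00:00");
    DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    // formatting the zoned value keeps the wall-clock fields intact:
    System.out.println(fmt.format(wallClock.atZone(zone))); // 1400-01-14 01:00:00
  }
}
{code}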
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612474)
Time Spent: 3h 10m  (was: 3h)

> date_format udf doesn't work for dates prior to 1900 if the timezone is 
> different from UTC
> --
>
> Key: HIVE-25268
> URL: https://issues.apache.org/jira/browse/HIVE-25268
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> *Hive 1.2.1*:
> {code:java}
>  select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+--+
> |           _c0            |
> +--------------------------+--+
> | 1400-01-14 01:00:00 ICT  |
> +--------------------------+--+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+--+
> |           _c0            |
> +--------------------------+--+
> | 1800-01-14 01:00:00 ICT  |
> +--------------------------+--+
> {code}
> *Hive 3.1, Hive 4.0:*
> {code:java}
> select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1400-01-06 01:17:56 ICT  |
> +--------------------------+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1800-01-14 01:17:56 ICT  |
> +--------------------------+
> {code}
> VM timezone is set to 'Asia/Bangkok'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=612472&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612472
 ]

ASF GitHub Bot logged work on HIVE-25246:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 07:21
Start Date: 21/Jun/21 07:21
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2396:
URL: https://github.com/apache/hive/pull/2396#discussion_r655133158



##
File path: ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java
##
@@ -244,8 +245,16 @@ public void testAbortTxns() throws Exception {
     List<Long> txnList = openedTxns.getTxn_ids();
     txnHandler.abortTxns(new AbortTxnsRequest(txnList));
 
+    OpenTxnRequest replRqst = new OpenTxnRequest(2, "me", "localhost");
+    replRqst.setReplPolicy("default.*");
+    replRqst.setReplSrcTxnIds(Arrays.asList(1L, 2L));
+    List<Long> targetTxns = txnHandler.openTxns(replRqst).getTxn_ids();
+    txnHandler.abortTxns(new AbortTxnsRequest(targetTxns));
+
+    assertFalse(targetTxnsPresentInReplTxnMap(targetTxns));

Review comment:
   This test is included in TestDbTxnManager.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612472)
Time Spent: 50m  (was: 40m)

> Fix the clean up of open repl created transactions
> --
>
> Key: HIVE-25246
> URL: https://issues.apache.org/jira/browse/HIVE-25246
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=612471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612471
 ]

ASF GitHub Bot logged work on HIVE-25246:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 07:21
Start Date: 21/Jun/21 07:21
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2396:
URL: https://github.com/apache/hive/pull/2396#discussion_r655105296



##
File path: ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java
##
@@ -244,8 +245,16 @@ public void testAbortTxns() throws Exception {
     List<Long> txnList = openedTxns.getTxn_ids();
     txnHandler.abortTxns(new AbortTxnsRequest(txnList));
 
+    OpenTxnRequest replRqst = new OpenTxnRequest(2, "me", "localhost");
+    replRqst.setReplPolicy("default.*");
+    replRqst.setReplSrcTxnIds(Arrays.asList(1L, 2L));
+    List<Long> targetTxns = txnHandler.openTxns(replRqst).getTxn_ids();
+    txnHandler.abortTxns(new AbortTxnsRequest(targetTxns));
+
+    assertFalse(targetTxnsPresentInReplTxnMap(targetTxns));

Review comment:
   This test is included in TestDbTxnManager.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612471)
Time Spent: 40m  (was: 0.5h)

> Fix the clean up of open repl created transactions
> --
>
> Key: HIVE-25246
> URL: https://issues.apache.org/jira/browse/HIVE-25246
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25222) Fix reading Iceberg tables with a comma in column names

2021-06-21 Thread Ádám Szita (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita resolved HIVE-25222.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master. Thanks [~Marton Bod] for the fix,  [~lpinter] and 
[~belugabehr] for the review.

> Fix reading Iceberg tables with a comma in column names
> ---
>
> Key: HIVE-25222
> URL: https://issues.apache.org/jira/browse/HIVE-25222
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When using a table with a column name containing a comma (e.g. `employ,ee`), 
> reading an Iceberg table fails because we rely on the property 
> "hive.io.file.readcolumn.names" which encodes the read columns in a 
> comma-separated list, put together by the ColumnProjectionUtils class.
> Because it's comma-separated in all cases, it will produce a string like: 
> "id,birth_date,employ,ee" which can cause problems for Iceberg readers which 
> use this string list to construct their expected read schema.
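
A minimal sketch of the failure mode described above (column names are illustrative):

{code:java}
// hive.io.file.readcolumn.names joins the projected columns with commas,
// so a column whose own name contains a comma can no longer be recovered:
String readColumnNames = "id,birth_date,employ,ee";
String[] cols = readColumnNames.split(",");
// cols is [id, birth_date, employ, ee] -- four entries for three columns,
// so the reader's expected read schema no longer matches the table.
{code}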



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25222) Fix reading Iceberg tables with a comma in column names

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25222?focusedWorklogId=612469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612469
 ]

ASF GitHub Bot logged work on HIVE-25222:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 07:15
Start Date: 21/Jun/21 07:15
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #2368:
URL: https://github.com/apache/hive/pull/2368


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612469)
Time Spent: 0.5h  (was: 20m)

> Fix reading Iceberg tables with a comma in column names
> ---
>
> Key: HIVE-25222
> URL: https://issues.apache.org/jira/browse/HIVE-25222
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When using a table with a column name containing a comma (e.g. `employ,ee`), 
> reading an Iceberg table fails because we rely on the property 
> "hive.io.file.readcolumn.names" which encodes the read columns in a 
> comma-separated list, put together by the ColumnProjectionUtils class.
> Because it's comma-separated in all cases, it will produce a string like: 
> "id,birth_date,employ,ee" which can cause problems for Iceberg readers which 
> use this string list to construct their expected read schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25272:
--
Labels: pull-request-available  (was: )

> READ transactions are getting logged in NOTIFICATION LOG
> 
>
> Key: HIVE-25272
> URL: https://issues.apache.org/jira/browse/HIVE-25272
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While READ transactions are already skipped from getting logged in 
> NOTIFICATION logs, a few are still getting logged. We need to skip those 
> transactions as well.
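
A hedged sketch of the kind of guard this implies (the surrounding method is hypothetical; TxnType.READ_ONLY is the metastore's marker for read-only transactions):

{code:java}
// skip writing a NOTIFICATION_LOG entry for read-only transactions:
if (txnType == TxnType.READ_ONLY) {
  return;
}
{code}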



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=612468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612468
 ]

ASF GitHub Bot logged work on HIVE-25272:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 07:15
Start Date: 21/Jun/21 07:15
Worklog Time Spent: 10m 
  Work Description: pkumarsinha opened a new pull request #2413:
URL: https://github.com/apache/hive/pull/2413


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612468)
Remaining Estimate: 0h
Time Spent: 10m

> READ transactions are getting logged in NOTIFICATION LOG
> 
>
> Key: HIVE-25272
> URL: https://issues.apache.org/jira/browse/HIVE-25272
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While READ transactions are already skipped from getting logged in 
> NOTIFICATION logs, a few are still getting logged. We need to skip those 
> transactions as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25233) Removing deprecated unix_timestamp UDF

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25233?focusedWorklogId=612460&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612460
 ]

ASF GitHub Bot logged work on HIVE-25233:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 06:31
Start Date: 21/Jun/21 06:31
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on pull request #2380:
URL: https://github.com/apache/hive/pull/2380#issuecomment-864766093


   @kgyrtkirk  Could you please review the PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612460)
Time Spent: 0.5h  (was: 20m)

> Removing deprecated unix_timestamp UDF
> --
>
> Key: HIVE-25233
> URL: https://issues.apache.org/jira/browse/HIVE-25233
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Description
> The unix_timestamp() UDF was deprecated as part of 
> https://issues.apache.org/jira/browse/HIVE-10728. Internally, 
> GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
> to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
> date, string pattern).
> unix_timestamp()   => CURRENT_TIMESTAMP
> unix_timestamp(string date) => to_unix_timestamp()
> unix_timestamp(string date, string pattern) => to_unix_timestamp()
> We should clean up unix_timestamp() and point it to to_unix_timestamp().
>
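
Restating the mapping above as concrete queries (illustrative only):

{code:java}
String q1 = "select unix_timestamp()";                            // behaves as current_timestamp
String q2 = "select unix_timestamp('2021-06-21 00:00:00')";       // delegates to to_unix_timestamp(date)
String q3 = "select unix_timestamp('21/06/2021', 'dd/MM/yyyy')";  // delegates to to_unix_timestamp(date, pattern)
{code}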



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24804) Introduce check: RANGE with offset PRECEDING/FOLLOWING requires at least one ORDER BY column

2021-06-21 Thread László Bodor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24804:

Fix Version/s: 4.0.0

> Introduce check: RANGE with offset PRECEDING/FOLLOWING requires at least one 
> ORDER BY column
> 
>
> Key: HIVE-24804
> URL: https://issues.apache.org/jira/browse/HIVE-24804
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, in Hive, we can run a windowing function with range specification 
> but without an ORDER BY clause:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr string, p_name string, 
> p_retailprice double, rowindex string);
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row) 
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row) 
> as cs2
> from vector_ptf_part_simple_text;
> {code}
> This is confusing, because without an order by clause, the range is out of 
> context: we don't know by which column we should calculate the range.
> Tested on Postgres, it throws an exception:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name 
> varchar(10), p_retailprice integer, rowindex varchar(10));
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row) 
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row) 
> as cs2
> from vector_ptf_part_simple_text;
> *RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column*
> {code}
> further references:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
> {code}
> RANGE: Computes the window frame based on a logical range of rows around the 
> current row, based on the current row’s ORDER BY key value. The provided 
> range value is added or subtracted to the current row's key value to define a 
> starting or ending range boundary for the window frame. In a range-based 
> window frame, there must be exactly one expression in the ORDER BY clause, 
> and the expression must have a numeric type.
> {code}
> https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/window-functions-frames.html
> {code}
> Without ORDER BY: The default frame includes all partition rows (because, 
> without ORDER BY, all partition rows are peers). The default is equivalent to 
> this frame specification:
> RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> {code}
> I believe this one could only make sense if you don't specify a range; 
> otherwise the SQL statement states something different from what is actually 
> returned by the engine.
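
A hedged example of the query shape that passes the proposed check (column names taken from the description; the ORDER BY column supplies the axis the range is measured on):

{code:java}
String fixed =
    "select p_mfgr, p_name, rowindex, "
    + "count(*) over(partition by p_mfgr order by p_retailprice "
    + "range between 1 preceding and current row) as cs1 "
    + "from vector_ptf_part_simple_text";
{code}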



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=612458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612458
 ]

ASF GitHub Bot logged work on HIVE-25246:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 06:26
Start Date: 21/Jun/21 06:26
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2396:
URL: https://github.com/apache/hive/pull/2396#discussion_r655105296



##
File path: ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java
##
@@ -244,8 +245,16 @@ public void testAbortTxns() throws Exception {
     List<Long> txnList = openedTxns.getTxn_ids();
     txnHandler.abortTxns(new AbortTxnsRequest(txnList));
 
+    OpenTxnRequest replRqst = new OpenTxnRequest(2, "me", "localhost");
+    replRqst.setReplPolicy("default.*");
+    replRqst.setReplSrcTxnIds(Arrays.asList(1L, 2L));
+    List<Long> targetTxns = txnHandler.openTxns(replRqst).getTxn_ids();
+    txnHandler.abortTxns(new AbortTxnsRequest(targetTxns));
+
+    assertFalse(targetTxnsPresentInReplTxnMap(targetTxns));

Review comment:
   This test is included in TestDbTxnManager.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612458)
Time Spent: 0.5h  (was: 20m)

> Fix the clean up of open repl created transactions
> --
>
> Key: HIVE-25246
> URL: https://issues.apache.org/jira/browse/HIVE-25246
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24804) Introduce check: RANGE with offset PRECEDING/FOLLOWING requires at least one ORDER BY column

2021-06-21 Thread László Bodor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-24804.
-
Resolution: Fixed

> Introduce check: RANGE with offset PRECEDING/FOLLOWING requires at least one 
> ORDER BY column
> 
>
> Key: HIVE-24804
> URL: https://issues.apache.org/jira/browse/HIVE-24804
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, in Hive, we can run a windowing function with range specification 
> but without an ORDER BY clause:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr string, p_name string, 
> p_retailprice double, rowindex string);
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row) 
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row) 
> as cs2
> from vector_ptf_part_simple_text;
> {code}
> This is confusing, because without an order by clause, the range is out of 
> context: we don't know by which column we should calculate the range.
> Tested on Postgres, it throws an exception:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name 
> varchar(10), p_retailprice integer, rowindex varchar(10));
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row) 
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row) 
> as cs2
> from vector_ptf_part_simple_text;
> *RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column*
> {code}
> further references:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
> {code}
> RANGE: Computes the window frame based on a logical range of rows around the 
> current row, based on the current row’s ORDER BY key value. The provided 
> range value is added or subtracted to the current row's key value to define a 
> starting or ending range boundary for the window frame. In a range-based 
> window frame, there must be exactly one expression in the ORDER BY clause, 
> and the expression must have a numeric type.
> {code}
> https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/window-functions-frames.html
> {code}
> Without ORDER BY: The default frame includes all partition rows (because, 
> without ORDER BY, all partition rows are peers). The default is equivalent to 
> this frame specification:
> RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> {code}
> I believe this one could only make sense if you don't specify a range; 
> otherwise the SQL statement states something different from what is actually 
> returned by the engine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24804) Introduce check: RANGE with offset PRECEDING/FOLLOWING requires at least one ORDER BY column

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24804?focusedWorklogId=612457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612457
 ]

ASF GitHub Bot logged work on HIVE-24804:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 06:23
Start Date: 21/Jun/21 06:23
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2000:
URL: https://github.com/apache/hive/pull/2000#issuecomment-864762115


   merged, thanks @kasakrisz for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612457)
Time Spent: 1h 10m  (was: 1h)

> Introduce check: RANGE with offset PRECEDING/FOLLOWING requires at least one 
> ORDER BY column
> 
>
> Key: HIVE-24804
> URL: https://issues.apache.org/jira/browse/HIVE-24804
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, in Hive, we can run a windowing function with range specification 
> but without an ORDER BY clause:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr string, p_name string, 
> p_retailprice double, rowindex string);
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row) 
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row) 
> as cs2
> from vector_ptf_part_simple_text;
> {code}
> This is confusing, because without an order by clause, the range is out of 
> context: we don't know by which column we should calculate the range.
> Tested on Postgres, it throws an exception:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name 
> varchar(10), p_retailprice integer, rowindex varchar(10));
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row) 
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row) 
> as cs2
> from vector_ptf_part_simple_text;
> *RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column*
> {code}
> further references:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
> {code}
> RANGE: Computes the window frame based on a logical range of rows around the 
> current row, based on the current row’s ORDER BY key value. The provided 
> range value is added or subtracted to the current row's key value to define a 
> starting or ending range boundary for the window frame. In a range-based 
> window frame, there must be exactly one expression in the ORDER BY clause, 
> and the expression must have a numeric type.
> {code}
> https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/window-functions-frames.html
> {code}
> Without ORDER BY: The default frame includes all partition rows (because, 
> without ORDER BY, all partition rows are peers). The default is equivalent to 
> this frame specification:
> RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> {code}
> I believe this one could only make sense if you don't specify a range; 
> otherwise the SQL statement states something different from what is actually 
> returned by the engine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24804) Introduce check: RANGE with offset PRECEDING/FOLLOWING requires at least one ORDER BY column

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24804?focusedWorklogId=612456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612456
 ]

ASF GitHub Bot logged work on HIVE-24804:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 06:23
Start Date: 21/Jun/21 06:23
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #2000:
URL: https://github.com/apache/hive/pull/2000


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612456)
Time Spent: 1h  (was: 50m)

> Introduce check: RANGE with offset PRECEDING/FOLLOWING requires at least one 
> ORDER BY column
> 
>
> Key: HIVE-24804
> URL: https://issues.apache.org/jira/browse/HIVE-24804
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, in Hive, we can run a windowing function with range specification 
> but without an ORDER BY clause:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr string, p_name string, 
> p_retailprice double, rowindex string);
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row) 
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row) 
> as cs2
> from vector_ptf_part_simple_text;
> {code}
> This is confusing, because without an order by clause, the range is out of 
> context: we don't know by which column we should calculate the range.
> Tested on Postgres, it throws an exception:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name 
> varchar(10), p_retailprice integer, rowindex varchar(10));
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row) 
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row) 
> as cs2
> from vector_ptf_part_simple_text;
> *RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column*
> {code}
> further references:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
> {code}
> RANGE: Computes the window frame based on a logical range of rows around the 
> current row, based on the current row’s ORDER BY key value. The provided 
> range value is added or subtracted to the current row's key value to define a 
> starting or ending range boundary for the window frame. In a range-based 
> window frame, there must be exactly one expression in the ORDER BY clause, 
> and the expression must have a numeric type.
> {code}
> https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/window-functions-frames.html
> {code}
> Without ORDER BY: The default frame includes all partition rows (because, 
> without ORDER BY, all partition rows are peers). The default is equivalent to 
> this frame specification:
> RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> {code}
> I believe this one could only make sense if you don't specify a range; 
> otherwise the SQL statement states something different from what is actually 
> returned by the engine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=612455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612455
 ]

ASF GitHub Bot logged work on HIVE-25268:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 06:18
Start Date: 21/Jun/21 06:18
Worklog Time Spent: 10m 
  Work Description: guptanikhil007 commented on a change in pull request #2409:
URL: https://github.com/apache/hive/pull/2409#discussion_r655101614



##
File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java
##
@@ -85,21 +87,31 @@ public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumen
       String fmtStr = getConstantStringValue(arguments, 1);
       if (fmtStr != null) {
         try {
-          formatter = new SimpleDateFormat(fmtStr);
-          formatter.setCalendar(DateTimeMath.getTimeZonedProlepticGregorianCalendar());
+          if (timeZone == null) {
+            timeZone = SessionState.get() == null ? new HiveConf().getLocalTimeZone() : SessionState.get().getConf()
+                .getLocalTimeZone();
+          }
+          formatter = DateTimeFormatter.ofPattern(fmtStr).withZone(timeZone);
         } catch (IllegalArgumentException e) {
           // ignore
         }
       }
     } else {
-      throw new UDFArgumentTypeException(1, getFuncName() + " only takes constant as "
-          + getArgOrder(1) + " argument");
+      throw new UDFArgumentTypeException(1, getFuncName() + " only takes constant as " + getArgOrder(1) + " argument");
     }
 
     ObjectInspector outputOI = PrimitiveObjectInspectorFactory.writableStringObjectInspector;
     return outputOI;
   }
 
+  @Override

Review comment:
   Removed and verified




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612455)
Time Spent: 3h  (was: 2h 50m)

> date_format udf doesn't work for dates prior to 1900 if the timezone is 
> different from UTC
> --
>
> Key: HIVE-25268
> URL: https://issues.apache.org/jira/browse/HIVE-25268
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> *Hive 1.2.1*:
> {code:java}
>  select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+--+
> |           _c0            |
> +--------------------------+--+
> | 1400-01-14 01:00:00 ICT  |
> +--------------------------+--+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+--+
> |           _c0            |
> +--------------------------+--+
> | 1800-01-14 01:00:00 ICT  |
> +--------------------------+--+
> {code}
> *Hive 3.1, Hive 4.0:*
> {code:java}
> select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1400-01-06 01:17:56 ICT  |
> +--------------------------+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1800-01-14 01:17:56 ICT  |
> +--------------------------+
> {code}
> VM timezone is set to 'Asia/Bangkok'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG

2021-06-21 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-25272:

Status: Patch Available  (was: Open)

> READ transactions are getting logged in NOTIFICATION LOG
> 
>
> Key: HIVE-25272
> URL: https://issues.apache.org/jira/browse/HIVE-25272
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>
> While READ transactions are already skipped from getting logged in 
> NOTIFICATION logs, a few are still getting logged. We need to skip those 
> transactions as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC

2021-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=612453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612453
 ]

ASF GitHub Bot logged work on HIVE-25268:
-

Author: ASF GitHub Bot
Created on: 21/Jun/21 06:17
Start Date: 21/Jun/21 06:17
Worklog Time Spent: 10m 
  Work Description: guptanikhil007 commented on a change in pull request #2409:
URL: https://github.com/apache/hive/pull/2409#discussion_r655101432



##
File path: ql/src/test/queries/clientpositive/udf_date_format.q
##
@@ -78,3 +78,8 @@ select date_format("2015-04-08 10:30:45","yyyy-MM-dd HH:mm:ss.SSS z");
 --julian date
 set hive.local.time.zone=UTC;
 select date_format("1001-01-05","dd---MM--");
+
+--dates prior to 1900 in another time zone
+set hive.local.time.zone=Asia/Bangkok;

Review comment:
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 612453)
Time Spent: 2h 40m  (was: 2.5h)

> date_format udf doesn't work for dates prior to 1900 if the timezone is 
> different from UTC
> --
>
> Key: HIVE-25268
> URL: https://issues.apache.org/jira/browse/HIVE-25268
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> *Hive 1.2.1*:
> {code:java}
>  select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+--+
> |           _c0            |
> +--------------------------+--+
> | 1400-01-14 01:00:00 ICT  |
> +--------------------------+--+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+--+
> |           _c0            |
> +--------------------------+--+
> | 1800-01-14 01:00:00 ICT  |
> +--------------------------+--+
> {code}
> *Hive 3.1, Hive 4.0:*
> {code:java}
> select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1400-01-06 01:17:56 ICT  |
> +--------------------------+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1800-01-14 01:17:56 ICT  |
> +--------------------------+
> {code}
> VM timezone is set to 'Asia/Bangkok'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

