[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=566659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566659
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 16/Mar/21 00:50
Start Date: 16/Mar/21 00:50
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1778:
URL: https://github.com/apache/hive/pull/1778


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 566659)
Time Spent: 0.5h  (was: 20m)

> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4.
> Netty4 is reported to offer a performance improvement over Netty3.
> However, the refactor is not trivial; TEZ-4157 more or less covers it (the
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 has been EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.1

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=566657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566657
 ]

ASF GitHub Bot logged work on HIVE-21737:
-

Author: ASF GitHub Bot
Created on: 16/Mar/21 00:50
Start Date: 16/Mar/21 00:50
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1806:
URL: https://github.com/apache/hive/pull/1806


   





Issue Time Tracking
---

Worklog Id: (was: 566657)
Time Spent: 9h  (was: 8h 50m)

> Upgrade Avro to version 1.10.1
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without
> Jackson in the public API and without Guava as a dependency. Worth the update.





[jira] [Work logged] (HIVE-24594) results_cache_invalidation2.q is flaky

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24594?focusedWorklogId=566655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566655
 ]

ASF GitHub Bot logged work on HIVE-24594:
-

Author: ASF GitHub Bot
Created on: 16/Mar/21 00:49
Start Date: 16/Mar/21 00:49
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1837:
URL: https://github.com/apache/hive/pull/1837


   





Issue Time Tracking
---

Worklog Id: (was: 566655)
Time Spent: 0.5h  (was: 20m)

> results_cache_invalidation2.q is flaky
> --
>
> Key: HIVE-24594
> URL: https://issues.apache.org/jira/browse/HIVE-24594
> Project: Hive
>  Issue Type: Test
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> results_cache_invalidation2.q failed for me a couple of times on an unrelated
> PR. Here is the error log.
> {noformat}
> ---
> Test set: org.apache.hadoop.hive.cli.split19.TestMiniLlapLocalCliDriver
> ---
> Tests run: 90, Failures: 1, Errors: 0, Skipped: 6, Time elapsed: 450.54 s <<< 
> FAILURE! - in org.apache.hadoop.hive.cli.split19.TestMiniLlapLocalCliDriver
> org.apache.hadoop.hive.cli.split19.TestMiniLlapLocalCliDriver.testCliDriver[results_cache_invalidation2]
>   Time elapsed: 15.087 s  <<< FAILURE!
> java.lang.AssertionError:
> Client Execution succeeded but contained differences (error code = 1) after 
> executing results_cache_invalidation2.q
> 266a267
> >  A masked pattern was here 
> 271a273
> >  A masked pattern was here 
> 273c275,276
> <   Stage-0 is a root stage
> ---
> >   Stage-1 is a root stage
> >   Stage-0 depends on stages: Stage-1
> 275a279,365
> >   Stage: Stage-1
> > Tez
> >  A masked pattern was here 
> >   Edges:
> > Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
> > Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE)
> >  A masked pattern was here 
> >   Vertices:
> > Map 1
> > Map Operator Tree:
> > TableScan
> >   alias: tab1
> >   filterExpr: key is not null (type: boolean)
> >   Statistics: Num rows: 1500 Data size: 130500 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> >   Filter Operator
> > predicate: key is not null (type: boolean)
> > Statistics: Num rows: 1500 Data size: 130500 Basic 
> > stats: COMPLETE Column stats: COMPLETE
> > Select Operator
> >   expressions: key (type: string)
> >   outputColumnNames: _col0
> >   Statistics: Num rows: 1500 Data size: 130500 Basic 
> > stats: COMPLETE Column stats: COMPLETE
> >   Reduce Output Operator
> > key expressions: _col0 (type: string)
> > null sort order: z
> > sort order: +
> > Map-reduce partition columns: _col0 (type: string)
> > Statistics: Num rows: 1500 Data size: 130500 Basic 
> > stats: COMPLETE Column stats: COMPLETE
> > Execution mode: vectorized, llap
> > LLAP IO: all inputs
> > Map 4
> > Map Operator Tree:
> > TableScan
> >   alias: tab2
> >   filterExpr: key is not null (type: boolean)
> >   Statistics: Num rows: 500 Data size: 43500 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> >   Fil
> {noformat}
> The test works for me locally. In fact the same PR had a successful run of 
> this test in a previous commit. I think we should disable this and re-enable 
> it after fixing the flakiness.





[jira] [Work logged] (HIVE-24595) Vectorization causing incorrect results for scalar subquery

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24595?focusedWorklogId=566656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566656
 ]

ASF GitHub Bot logged work on HIVE-24595:
-

Author: ASF GitHub Bot
Created on: 16/Mar/21 00:49
Start Date: 16/Mar/21 00:49
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1867:
URL: https://github.com/apache/hive/pull/1867#issuecomment-799860346


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 566656)
Time Spent: 20m  (was: 10m)

> Vectorization causing incorrect results for scalar subquery
> ---
>
> Key: HIVE-24595
> URL: https://issues.apache.org/jira/browse/HIVE-24595
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Vineet Garg
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *Repro*
> {code:sql}
>  CREATE EXTERNAL TABLE `alltypessmall`( 
>`id` int,
>`bool_col` boolean,  
>`tinyint_col` tinyint,   
>`smallint_col` smallint, 
>`int_col` int,   
>`bigint_col` bigint, 
>`float_col` float,   
>`double_col` double, 
>`date_string_col` string,
>`string_col` string, 
>`timestamp_col` timestamp)   
>  PARTITIONED BY (   
>`year` int,  
>`month` int) 
>  ROW FORMAT SERDE   
>'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
>  WITH SERDEPROPERTIES ( 
>'escape.delim'='\\', 
>'field.delim'=',',   
>'serialization.format'=',')  
>  STORED AS INPUTFORMAT  
>'org.apache.hadoop.mapred.TextInputFormat'   
>  OUTPUTFORMAT   
>'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
>  TBLPROPERTIES (
>'DO_NOT_UPDATE_STATS'='true',
>'OBJCAPABILITIES'='EXTREAD,EXTWRITE',
>'STATS_GENERATED'='TASK',
>'impala.lastComputeStatsTime'='1608312793',  
>'transient_lastDdlTime'='1608310442');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,40,3434,5.4,44.3,'str1','str2', '01-01-2001');
> {code}
> The following query should fail, because the scalar subquery returns more than one row, but it succeeds:
> {code:sql}
> SELECT id FROM alltypessmall
> WHERE int_col =
>   (SELECT int_col
>FROM alltypessmall)
> ORDER BY id;
> {code}
> *Explain plan*
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17
>   Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE)
>   DagName: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: alltypessmall
>   filterExpr: int_col is not null (type: boolean)
>   Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Filter Operator
> predicate: int_col is not null (type: boolean)
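
The repro in HIVE-24595 depends on the rule that a scalar subquery may yield at most one row. A minimal Java sketch of that rule follows. `ScalarSubqueryCheck` and `singleValue` are hypothetical names chosen for illustration; this is not Hive's operator code and says nothing about how the vectorized path enforces (or misses) the check.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical helper, not Hive code: models the "at most one row" rule. */
final class ScalarSubqueryCheck {
    private ScalarSubqueryCheck() {}

    /**
     * Returns the single value produced by a scalar subquery, or empty when
     * the subquery produced no rows. Throws when more than one row came back.
     */
    static <T> Optional<T> singleValue(List<T> subqueryRows) {
        if (subqueryRows.size() > 1) {
            throw new IllegalStateException(
                "Scalar subquery returned " + subqueryRows.size()
                    + " rows; at most 1 expected");
        }
        return subqueryRows.isEmpty()
            ? Optional.empty()
            : Optional.ofNullable(subqueryRows.get(0));
    }
}
```

With the three rows inserted in the repro, the subquery yields the int_col values 4, 4 and 40, so a check of this kind would reject the query instead of returning a result.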

[jira] [Commented] (HIVE-24718) Moving to file based iteration for copying data

2021-03-15 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302080#comment-17302080
 ] 

Pravin Sinha commented on HIVE-24718:
-

+1

> Moving to file based iteration for copying data
> ---
>
> Key: HIVE-24718
> URL: https://issues.apache.org/jira/browse/HIVE-24718
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24718.01.patch, HIVE-24718.02.patch, 
> HIVE-24718.04.patch, HIVE-24718.05.patch, HIVE-24718.06.patch
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24887) getDatabase() to call translation code even if client has no capabilities

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24887?focusedWorklogId=566593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566593
 ]

ASF GitHub Bot logged work on HIVE-24887:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 22:33
Start Date: 15/Mar/21 22:33
Worklog Time Spent: 10m 
  Work Description: nrg4878 opened a new pull request #2076:
URL: https://github.com/apache/hive/pull/2076


   …nslation (Naveen Gangam)
   
   ### What changes were proposed in this pull request?
   A minor change to have getDatabase() call the translation layer even 
when the client does not set the capabilities.
   Another change makes checkDeletePermissions in the storage-based 
authorizer package-visible, much like the other check* methods in this 
class. This allows subclasses to use the method as well.
   
   ### Why are the changes needed?
   Mostly consistency with other methods like createTable() etc.
   
   ### Does this PR introduce _any_ user-facing change?
   Potentially, clients can see a different locationUri if their original 
database object had a location in the managed warehouse. This old location will 
now be set as managedLocationUri.
   
   ### How was this patch tested?
   Manually.
   Failed unit tests.





Issue Time Tracking
---

Worklog Id: (was: 566593)
Remaining Estimate: 0h
Time Spent: 10m

> getDatabase() to call translation code even if client has no capabilities
> -
>
> Key: HIVE-24887
> URL: https://issues.apache.org/jira/browse/HIVE-24887
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We do this for other calls that go thru translation layer. For some reason, 
> the current code only calls it when the client sets the capabilities.





[jira] [Updated] (HIVE-24887) getDatabase() to call translation code even if client has no capabilities

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24887:
--
Labels: pull-request-available  (was: )

> getDatabase() to call translation code even if client has no capabilities
> -
>
> Key: HIVE-24887
> URL: https://issues.apache.org/jira/browse/HIVE-24887
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We do this for other calls that go thru translation layer. For some reason, 
> the current code only calls it when the client sets the capabilities.





[jira] [Assigned] (HIVE-24887) getDatabase() to call translation code even if client has no capabilities

2021-03-15 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-24887:



> getDatabase() to call translation code even if client has no capabilities
> -
>
> Key: HIVE-24887
> URL: https://issues.apache.org/jira/browse/HIVE-24887
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>
> We do this for other calls that go thru translation layer. For some reason, 
> the current code only calls it when the client sets the capabilities.





[jira] [Updated] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2021-03-15 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-23779:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}





[jira] [Commented] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2021-03-15 Thread Naresh P R (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302014#comment-17302014
 ] 

Naresh P R commented on HIVE-23779:
---

Thanks for the review & merge [~mgergely]

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}





[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=566553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566553
 ]

ASF GitHub Bot logged work on HIVE-23779:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 21:17
Start Date: 15/Mar/21 21:17
Worklog Time Spent: 10m 
  Work Description: miklosgergely merged pull request #2064:
URL: https://github.com/apache/hive/pull/2064


   





Issue Time Tracking
---

Worklog Id: (was: 566553)
Time Spent: 1h 50m  (was: 1h 40m)

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}





[jira] [Commented] (HIVE-24883) Add support for array type columns in Hive Joins

2021-03-15 Thread Suprith (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301925#comment-17301925
 ] 

Suprith commented on HIVE-24883:


Looks like this addresses a subset (array types) of the problem reported here:
https://issues.apache.org/jira/browse/HIVE-20962

> Add support for array type columns in Hive Joins
> 
>
> Key: HIVE-24883
> URL: https://issues.apache.org/jira/browse/HIVE-24883
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive fails to execute joins on array type columns as the comparison functions 
> are not able to handle array type columns.   





[jira] [Work logged] (HIVE-24816) Upgrade jackson to 2.10.5.1 or 2.11.0+ due to CVE-2020-25649

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24816?focusedWorklogId=566514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566514
 ]

ASF GitHub Bot logged work on HIVE-24816:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 19:50
Start Date: 15/Mar/21 19:50
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #2075:
URL: https://github.com/apache/hive/pull/2075


   
   
   ### What changes were proposed in this pull request?
   Jackson version changed to 2.11.0 in the pom files.
   
   
   
   ### Why are the changes needed?
   To avoid CVE-2020-25649
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   
   ### How was this patch tested?
   Local machine.
   
   





Issue Time Tracking
---

Worklog Id: (was: 566514)
Time Spent: 20m  (was: 10m)

> Upgrade jackson to 2.10.5.1 or 2.11.0+ due to CVE-2020-25649
> 
>
> Key: HIVE-24816
> URL: https://issues.apache.org/jira/browse/HIVE-24816
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, Hive pulls in the Jackson 2.10.5 jar. Please upgrade to
> 2.10.5.1 or 2.11.0+ due to CVE-2020-25649.





[jira] [Work logged] (HIVE-24853) HMS leaks queries in case of timeout

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24853?focusedWorklogId=566472&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566472
 ]

ASF GitHub Bot logged work on HIVE-24853:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 18:58
Start Date: 15/Mar/21 18:58
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2044:
URL: https://github.com/apache/hive/pull/2044#discussion_r594605627



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
##
@@ -1783,12 +1783,16 @@ private long partsFoundForPartitions(
   MetastoreDirectSqlUtils.timingTrace(doTrace, queryText, start, end);
   List list = MetastoreDirectSqlUtils.ensureList(qResult);
   List colStats = new 
ArrayList(list.size());
-  for (Object[] row : list) {
-colStats.add(prepareCSObjWithAdjustedNDV(row, 0, 
useDensityFunctionForNDVEstimation, ndvTuner));
-Deadline.checkTimeout();
+for (Object[] row : list) {

Review comment:
   Done







Issue Time Tracking
---

Worklog Id: (was: 566472)
Time Spent: 3h 20m  (was: 3h 10m)

> HMS leaks queries in case of timeout
> 
>
> Key: HIVE-24853
> URL: https://issues.apache.org/jira/browse/HIVE-24853
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The queries aren't closed in case of timeout.
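
One common way to avoid this kind of leak is to tie the query's lifetime to a try-with-resources block, so that it is closed even when a timeout check throws partway through result processing. The sketch below only illustrates that idea. `FakeQuery`, `checkTimeout` and `run` are hypothetical stand-ins, not the metastore's classes, and the actual patch in PR #2044 may take a different approach.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseOnTimeoutDemo {
    private CloseOnTimeoutDemo() {}

    /** Hypothetical stand-in for a metastore query handle. */
    static final class FakeQuery implements AutoCloseable {
        boolean closed;

        List<Integer> execute() {
            return List.of(1, 2, 3);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical deadline check: throws once the deadline has passed. */
    static void checkTimeout(long deadlineNanos) {
        if (System.nanoTime() - deadlineNanos >= 0) {
            throw new IllegalStateException("timeout");
        }
    }

    /** Processes rows; the query is closed on both the normal and the timeout path. */
    static List<Integer> run(FakeQuery query, long deadlineNanos) {
        try (FakeQuery q = query) {
            List<Integer> out = new ArrayList<>();
            for (Integer row : q.execute()) {
                out.add(row);
                checkTimeout(deadlineNanos);
            }
            return out;
        }
    }
}
```

The exception from the timeout check still propagates to the caller; the only difference is that `close()` runs first.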





[jira] [Work logged] (HIVE-24853) HMS leaks queries in case of timeout

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24853?focusedWorklogId=566470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566470
 ]

ASF GitHub Bot logged work on HIVE-24853:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 18:57
Start Date: 15/Mar/21 18:57
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2044:
URL: https://github.com/apache/hive/pull/2044#discussion_r594604694



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
##
@@ -1783,12 +1783,16 @@ private long partsFoundForPartitions(
   MetastoreDirectSqlUtils.timingTrace(doTrace, queryText, start, end);
   List list = MetastoreDirectSqlUtils.ensureList(qResult);
   List colStats = new 
ArrayList(list.size());
-  for (Object[] row : list) {
-colStats.add(prepareCSObjWithAdjustedNDV(row, 0, 
useDensityFunctionForNDVEstimation, ndvTuner));
-Deadline.checkTimeout();
+for (Object[] row : list) {
+  colStats.add(prepareCSObjWithAdjustedNDV(row, 0,
+  useDensityFunctionForNDVEstimation, ndvTuner));
+  Deadline.checkTimeout();
+}
+return colStats;
+  } catch (Exception e) {
+throwMetaOrRuntimeException(e);
+return Collections.emptyList();

Review comment:
   Compiler wants it. :-(
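
For readers wondering why the compiler insists on that return: javac cannot tell from a `void` signature that a helper always throws, so it treats the catch block as able to complete normally. The sketch below shows the effect and one common alternative. All names (`AlwaysThrowsDemo`, `rethrow`, `wrap`) are hypothetical and unrelated to the real MetaStoreDirectSql code.

```java
import java.util.Collections;
import java.util.List;

/** Illustrative only; not taken from the Hive code base. */
final class AlwaysThrowsDemo {
    private AlwaysThrowsDemo() {}

    /** Always throws, but the void signature does not tell the compiler so. */
    static void rethrow(Exception e) {
        throw new IllegalStateException(e);
    }

    /** Variant that returns the exception so the caller can write "throw wrap(e)". */
    static RuntimeException wrap(Exception e) {
        return new IllegalStateException(e);
    }

    static List<String> withUnreachableReturn(boolean fail) {
        try {
            if (fail) {
                throw new Exception("boom");
            }
            return List.of("ok");
        } catch (Exception e) {
            rethrow(e);
            // Never reached at run time, yet required: without it javac
            // reports "missing return statement".
            return Collections.emptyList();
        }
    }

    static List<String> withThrowStatement(boolean fail) {
        try {
            if (fail) {
                throw new Exception("boom");
            }
            return List.of("ok");
        } catch (Exception e) {
            // A throw statement cannot complete normally, so no return is needed.
            throw wrap(e);
        }
    }
}
```

The second form needs a helper that returns the exception rather than throwing it, which is a signature change and may not be worth it for an existing utility method.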







Issue Time Tracking
---

Worklog Id: (was: 566470)
Time Spent: 3h 10m  (was: 3h)

> HMS leaks queries in case of timeout
> 
>
> Key: HIVE-24853
> URL: https://issues.apache.org/jira/browse/HIVE-24853
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The queries aren't closed in case of timeout.





[jira] [Work logged] (HIVE-24828) [HMS] Provide new HMS API to return latest committed compaction record for a given table

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24828?focusedWorklogId=566358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566358
 ]

ASF GitHub Bot logged work on HIVE-24828:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 16:15
Start Date: 15/Mar/21 16:15
Worklog Time Spent: 10m 
  Work Description: hsnusonic commented on pull request #2073:
URL: https://github.com/apache/hive/pull/2073#issuecomment-799547557


   @kishendas @pvargacl Could you please review the changes? Thanks





Issue Time Tracking
---

Worklog Id: (was: 566358)
Time Spent: 1h  (was: 50m)

> [HMS] Provide new HMS API to return latest committed compaction record for a 
> given table
> 
>
> Key: HIVE-24828
> URL: https://issues.apache.org/jira/browse/HIVE-24828
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We need a new HMS API to return the latest committed compaction record for a 
> given table. This can be used by a remote cache to decide whether a given 
> table's file metadata has been compacted or not, in order to decide whether 
> file metadata has to be refreshed from the file system before serving or it 
> can serve the current data from the cache. 
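
The cache-side decision described above could look roughly like the following sketch. It assumes the new API yields a monotonically increasing compaction id per table; the issue text does not confirm that. `CompactionAwareCache` and `needsRefresh` are hypothetical names, not part of HMS.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical remote-cache helper; not part of the HMS API. */
final class CompactionAwareCache {
    // Last committed compaction id this cache has already accounted for, per table.
    private final Map<String, Long> seenCompactionId = new HashMap<>();

    /**
     * Returns true when cached file metadata for the table should be reloaded
     * from the file system, i.e. a newer committed compaction was reported.
     * Assumption: an empty value means no committed compaction is on record.
     */
    boolean needsRefresh(String table, OptionalLong latestCommittedCompactionId) {
        if (!latestCommittedCompactionId.isPresent()) {
            return false;
        }
        long latest = latestCommittedCompactionId.getAsLong();
        Long seen = seenCompactionId.get(table);
        if (seen != null && seen >= latest) {
            return false; // nothing newer than what was already handled
        }
        seenCompactionId.put(table, latest);
        return true;
    }
}
```

A real cache would also need an initial load and an eviction policy; the sketch covers only the compaction comparison.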





[jira] [Work logged] (HIVE-24879) Create new metric about ACID metadata size

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24879?focusedWorklogId=566339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566339
 ]

ASF GitHub Bot logged work on HIVE-24879:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 15:59
Start Date: 15/Mar/21 15:59
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #2074:
URL: https://github.com/apache/hive/pull/2074#discussion_r594465250



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/MetricsInfo.java
##
@@ -0,0 +1,26 @@
+package org.apache.hadoop.hive.metastore.txn;

Review comment:
   Missing Apache license header







Issue Time Tracking
---

Worklog Id: (was: 566339)
Time Spent: 20m  (was: 10m)

> Create new metric about ACID metadata size
> --
>
> Key: HIVE-24879
> URL: https://issues.apache.org/jira/browse/HIVE-24879
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 2 new metrics should be created:
>  * Number of rows in txn_to_writeid table
>  * Number of rows in completed_txns table





[jira] [Work logged] (HIVE-24879) Create new metric about ACID metadata size

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24879?focusedWorklogId=566314&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566314
 ]

ASF GitHub Bot logged work on HIVE-24879:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 15:38
Start Date: 15/Mar/21 15:38
Worklog Time Spent: 10m 
  Work Description: deniskuzZ opened a new pull request #2074:
URL: https://github.com/apache/hive/pull/2074


   
   
   ### What changes were proposed in this pull request?
   
   Introduced ACID metadata related metrics
   
   
   ### Why are the changes needed?
   
   Compaction observability
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit tests





Issue Time Tracking
---

Worklog Id: (was: 566314)
Remaining Estimate: 0h
Time Spent: 10m

> Create new metric about ACID metadata size
> --
>
> Key: HIVE-24879
> URL: https://issues.apache.org/jira/browse/HIVE-24879
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 2 new metrics should be created:
>  * Number of rows in txn_to_writeid table
>  * Number of rows in completed_txns table





[jira] [Updated] (HIVE-24879) Create new metric about ACID metadata size

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24879:
--
Labels: pull-request-available  (was: )

> Create new metric about ACID metadata size
> --
>
> Key: HIVE-24879
> URL: https://issues.apache.org/jira/browse/HIVE-24879
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 2 new metrics should be created:
>  * Number of rows in txn_to_writeid table
>  * Number of rows in completed_txns table



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24874) Worker performance metric

2021-03-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24874 started by Denys Kuzmenko.
-
> Worker performance metric
> -
>
> Key: HIVE-24874
> URL: https://issues.apache.org/jira/browse/HIVE-24874
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> Wrap Compaction Worker with PerformanceLogger.
> Major and Minor compactions should be measured with separate metrics.
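
As a rough illustration of the idea in this description, the sketch below times a unit of work and records it under a different metric name depending on the compaction type. It is not Hive's PerformanceLogger API: the class CompactionTimers, the enum CompactionType, and both metric names are hypothetical and exist only for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a metrics sink; this is NOT Hive's PerformanceLogger.
final class CompactionTimers {
  enum CompactionType { MAJOR, MINOR }

  // Accumulated nanoseconds and number of runs, keyed by metric name.
  private final Map<String, LongAdder> nanosByMetric = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> runsByMetric = new ConcurrentHashMap<>();

  // Metric names are invented for this sketch.
  static String metricName(CompactionType type) {
    return type == CompactionType.MAJOR
        ? "compaction_worker_major" : "compaction_worker_minor";
  }

  // Runs the compaction body and records its duration, even if the body throws.
  void timed(CompactionType type, Runnable compaction) {
    String name = metricName(type);
    long start = System.nanoTime();
    try {
      compaction.run();
    } finally {
      long elapsed = System.nanoTime() - start;
      nanosByMetric.computeIfAbsent(name, k -> new LongAdder()).add(elapsed);
      runsByMetric.computeIfAbsent(name, k -> new LongAdder()).increment();
    }
  }

  long runCount(CompactionType type) {
    LongAdder adder = runsByMetric.get(metricName(type));
    return adder == null ? 0L : adder.sum();
  }

  long totalNanos(CompactionType type) {
    LongAdder adder = nanosByMetric.get(metricName(type));
    return adder == null ? 0L : adder.sum();
  }

  public static void main(String[] args) {
    CompactionTimers timers = new CompactionTimers();
    timers.timed(CompactionType.MAJOR, () -> { /* major compaction work */ });
    timers.timed(CompactionType.MINOR, () -> { /* minor compaction work */ });
    System.out.println("major runs: " + timers.runCount(CompactionType.MAJOR));
    System.out.println("minor runs: " + timers.runCount(CompactionType.MINOR));
  }
}
```

Recording in a finally block means a failed compaction is still counted, which may or may not be what the real change wants.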



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24879) Create new metric about ACID metadata size

2021-03-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24879 started by Denys Kuzmenko.
-
> Create new metric about ACID metadata size
> --
>
> Key: HIVE-24879
> URL: https://issues.apache.org/jira/browse/HIVE-24879
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> 2 new metrics should be created:
>  * Number of rows in txn_to_writeid table
>  * Number of rows in completed_txns table



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24879) Create new metric about ACID metadata size

2021-03-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-24879:
-

Assignee: Denys Kuzmenko

> Create new metric about ACID metadata size
> --
>
> Key: HIVE-24879
> URL: https://issues.apache.org/jira/browse/HIVE-24879
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> 2 new metrics should be created:
>  * Number of rows in txn_to_writeid table
>  * Number of rows in completed_txns table



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24871) Initiator / Cleaner performance metrics

2021-03-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-24871.
---
Resolution: Fixed

> Initiator / Cleaner performance metrics
> ---
>
> Key: HIVE-24871
> URL: https://issues.apache.org/jira/browse/HIVE-24871
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The PerformanceLogger should be used in the Initiator and Cleaner services.
>  * One cycle of Initiator should be measured, ignoring the time spent 
> waiting on the lock for the AUX table
>  * One compaction cleanup should be measured in Cleaner (using a different 
> metric for major and minor compaction cleanup)
> Important note: the PerformanceLogger implementation from metastore should be 
> used (not the ql one); otherwise the metric won't be published in HMS.
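
The first bullet above (measure one Initiator cycle but ignore the time spent waiting on the lock) can be sketched as follows. This is only an illustration under stated assumptions: InitiatorCycleTimer is a hypothetical class, a ReentrantLock stands in for the AUX table lock, and the injectable clock is there purely to make the behaviour easy to check. None of this is Hive code.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

// Hypothetical sketch: time one cycle, excluding the wait for a lock.
final class InitiatorCycleTimer {
  // Stand-in for the lock on the AUX table mentioned in the description.
  private final ReentrantLock auxLock = new ReentrantLock();
  // Injectable nanosecond clock, e.g. System::nanoTime.
  private final LongSupplier clock;
  private long lastCycleNanos = -1L;

  InitiatorCycleTimer(LongSupplier clock) {
    this.clock = clock;
  }

  void runCycle(Runnable cycleBody) {
    auxLock.lock(); // time spent blocked here is deliberately not measured
    try {
      long start = clock.getAsLong(); // the timer starts only once the lock is held
      try {
        cycleBody.run();
      } finally {
        lastCycleNanos = clock.getAsLong() - start;
      }
    } finally {
      auxLock.unlock();
    }
  }

  // Duration of the most recent cycle in nanoseconds, or -1 if none has run.
  long lastCycleNanos() {
    return lastCycleNanos;
  }

  public static void main(String[] args) {
    InitiatorCycleTimer timer = new InitiatorCycleTimer(System::nanoTime);
    timer.runCycle(() -> { /* one Initiator cycle */ });
    System.out.println("cycle took " + timer.lastCycleNanos() + " ns");
  }
}
```

Starting the timer after lock acquisition is the simplest way to exclude the wait; subtracting a separately measured wait time would be an equivalent alternative.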



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24874) Worker performance metric

2021-03-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-24874:
-

Assignee: Denys Kuzmenko

> Worker performance metric
> -
>
> Key: HIVE-24874
> URL: https://issues.apache.org/jira/browse/HIVE-24874
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> Wrap Compaction Worker with PerformanceLogger.
> Major and Minor compactions should be measured with separate metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24871) Initiator / Cleaner performance metrics

2021-03-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-24871:
-

Assignee: Denys Kuzmenko

> Initiator / Cleaner performance metrics
> ---
>
> Key: HIVE-24871
> URL: https://issues.apache.org/jira/browse/HIVE-24871
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The PerformanceLogger should be used in the Initiator and Cleaner services.
>  * One cycle of Initiator should be measured, ignoring the time spent 
> waiting on the lock for the AUX table
>  * One compaction cleanup should be measured in Cleaner (using a different 
> metric for major and minor compaction cleanup)
> Important note: the PerformanceLogger implementation from metastore should be 
> used (not the ql one); otherwise the metric won't be published in HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24886) Support simple equality operations between MAP data types

2021-03-15 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-24886:
--


> Support simple equality operations between MAP data types
> -
>
> Key: HIVE-24886
> URL: https://issues.apache.org/jira/browse/HIVE-24886
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning, Query Processor
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Currently, equality operations between MAP data types work only in a few 
> limited cases, e.g.:
> {code:sql}
> create table table_map_types (id int, c1 map, c2 map);
> select id from table_map_types where map(1,1) IN (map(1,1), map(1,2), 
> map(1,3)); 
> {code}
> but this feature was never introduced explicitly (there are no tests or JIRAs on 
> the subject), and the vast majority of queries involving comparisons between 
> MAP types currently fail at compile time.
> The goal of this issue is to support simple equality operations:
> * EQUALS (=)
> * NOT_EQUALS (<>)
> * IN
> * IS DISTINCT FROM
> * IS NOT DISTINCT FROM
> between MAP data types when the compared (MAP) types are identical.
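
To make the intended behaviour of the five operators listed above more concrete, here is a small Java model of how they are commonly defined in SQL when applied to two maps of the same type. It is a sketch under assumptions, not Hive's implementation: the class MapComparisons is hypothetical, a null Boolean stands for SQL's "unknown", and entry-wise, order-insensitive equality (java.util.Map.equals) is assumed for two non-NULL maps. The issue itself does not spell out these details.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of SQL-style equality operators over maps of identical type.
final class MapComparisons {
  // "=": unknown (null) when either side is NULL, otherwise entry-wise equality.
  static <K, V> Boolean equalsOp(Map<K, V> a, Map<K, V> b) {
    if (a == null || b == null) {
      return null;
    }
    return a.equals(b);
  }

  // "<>": the negation of "=", still unknown when either side is NULL.
  static <K, V> Boolean notEqualsOp(Map<K, V> a, Map<K, V> b) {
    Boolean eq = equalsOp(a, b);
    return eq == null ? null : !eq;
  }

  // "IS NOT DISTINCT FROM": NULL-safe equality, never unknown.
  static <K, V> boolean isNotDistinctFrom(Map<K, V> a, Map<K, V> b) {
    if (a == null || b == null) {
      return a == b; // true only when both sides are NULL
    }
    return a.equals(b);
  }

  // "IS DISTINCT FROM": the negation of the NULL-safe equality.
  static <K, V> boolean isDistinctFrom(Map<K, V> a, Map<K, V> b) {
    return !isNotDistinctFrom(a, b);
  }

  // "IN": true on any match; unknown if there is no match but a NULL was seen.
  static <K, V> Boolean in(Map<K, V> needle, List<Map<K, V>> candidates) {
    boolean sawUnknown = false;
    for (Map<K, V> candidate : candidates) {
      Boolean eq = equalsOp(needle, candidate);
      if (eq == null) {
        sawUnknown = true;
      } else if (eq) {
        return true;
      }
    }
    return sawUnknown ? null : false;
  }

  public static void main(String[] args) {
    // Mirrors the query in the description: map(1,1) IN (map(1,1), map(1,2), map(1,3)).
    Boolean result = in(Map.of(1, 1), List.of(Map.of(1, 1), Map.of(1, 2), Map.of(1, 3)));
    System.out.println("map(1,1) IN (...) = " + result);
  }
}
```

The distinction that matters most for users is between "=" and IS NOT DISTINCT FROM: only the latter treats two NULL maps as equal.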



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24885) The state of unset low or high value in LongColumnStatsData can not be retrieved

2021-03-15 Thread Qifan Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qifan Chen updated HIVE-24885:
--
Description: 
While working on improving Impala column stats to compute min/max for columns, 
we found that the unset state of the low or high value in LongColumnStatsData 
cannot be read back. This is illustrated by the following Impala test case 
added to MetastoreEventsProcessorTest.

{code:java}
  @Test
  public void testUnsetAndCheckUnsetLowHighValue() throws CatalogException {
    try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {
      List<String> colNames = new ArrayList<>();
      colNames.add("id");
      colNames.add("int_col");
      colNames.add("bigint_col");
      List<ColumnStatisticsObj> colStatsObjs =
          msClient.getHiveClient().getTableColumnStatistics(
              "unique_database", "alltypes", colNames, "impala");
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();
        longColStatsData.unsetLowValue();
        longColStatsData.unsetHighValue();
        colStatsData.setLongStats(longColStatsData);
      }
      assertTrue("All good!", true);
      colStatsObjs = msClient.getHiveClient().getTableColumnStatistics(
          "unique_database", "alltypes", colNames, "impala");
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();
        assertFalse("isSetLowValue() should be false",
            longColStatsData.isSetLowValue());
        assertFalse(
            "isSetHighValue() should be false",
            longColStatsData.isSetHighValue());
      }
      assertTrue("All good!", true);
    } catch (NoSuchObjectException e) {
      assertFalse(String.format("No such object exception: %s", e), false);
    } catch (MetaException e) {
      assertFalse(String.format("Metadata exception: %s", e), false);
    } catch (TException e) {
      assertFalse(String.format("TException: %s", e), false);
    }
  }
{code}

Both isSetLowValue() and isSetHighValue() should return false, since 
unsetLowValue() and unsetHighValue() are called in the first loop.

To build and run the test:

{code:java}
mvn -f $IMPALA_HOME/fe/pom.xml test -e -Djava.compiler=NONE -ff 
-Dtest=MetastoreEventsProcessorTest#testUnsetAndCheckUnsetLowHighValue
{code}

Table unique_database.alltypes is defined as follows.

{code:java}
 CREATE EXTERNAL TABLE unique_database.alltypes (
   id INT,
   bool_col BOOLEAN,
   tinyint_col TINYINT,
   smallint_col SMALLINT,
   int_col INT,
   bigint_col BIGINT,

[jira] [Updated] (HIVE-24885) The state of unset low or high value in LongColumnStatsData can not be retrieved

2021-03-15 Thread Qifan Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qifan Chen updated HIVE-24885:
--
Description: 
While working on improving Impala column stats to compute min/max for columns, 
we found that the unset state of the low or high value in LongColumnStatsData 
cannot be read back. This is illustrated by the following Impala test case 
added to MetastoreEventsProcessorTest.

{code:java}
  @Test
  public void testUnsetAndCheckUnsetLowHighValue() throws CatalogException {
    try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {
      List<String> colNames = new ArrayList<>();
      colNames.add("id");
      colNames.add("int_col");
      colNames.add("bigint_col");
      List<ColumnStatisticsObj> colStatsObjs =
          msClient.getHiveClient().getTableColumnStatistics(
              "unique_database", "alltypes", colNames, "impala");
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();
        longColStatsData.unsetLowValue();
        longColStatsData.unsetHighValue();
        colStatsData.setLongStats(longColStatsData);
      }
      assertTrue("All good!", true);
      colStatsObjs = msClient.getHiveClient().getTableColumnStatistics(
          "unique_database", "alltypes", colNames, "impala");
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();
        assertFalse("isSetLowValue() should be false",
            longColStatsData.isSetLowValue());
        assertFalse(
            "isSetHighValue() should be false",
            longColStatsData.isSetHighValue());
      }
      assertTrue("All good!", true);
    } catch (NoSuchObjectException e) {
      assertFalse(String.format("No such object exception: %s", e), false);
    } catch (MetaException e) {
      assertFalse(String.format("Metadata exception: %s", e), false);
    } catch (TException e) {
      assertFalse(String.format("TException: %s", e), false);
    }
  }
{code}

Both isSetLowValue() and isSetHighValue() should return false, since 
unsetLowValue() and unsetHighValue() are called in the first loop.

To build and run the test:

mvn -f $IMPALA_HOME/fe/pom.xml test -e -Djava.compiler=NONE -ff 
-Dtest=MetastoreEventsProcessorTest#testUnsetAndCheckUnsetLowHighValue


Table unique_database.alltypes is defined as follows.

 CREATE EXTERNAL TABLE unique_database.alltypes (
   id INT,
   bool_col BOOLEAN,
   tinyint_col TINYINT,
   smallint_col SMALLINT,
   int_col INT,
   bigint_col BIGINT,
   float_col FLOAT,

[jira] [Updated] (HIVE-24885) The state of unset low or high value in LongColumnStatsData can not be retrieved

2021-03-15 Thread Qifan Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qifan Chen updated HIVE-24885:
--
Description: 
While working on improving Impala column stats to compute min/max for columns, 
we found that the unset state of the low or high value in LongColumnStatsData 
cannot be read back. This is illustrated by the following Impala test case 
added to MetastoreEventsProcessorTest.

{code:java}
  @Test
  public void testUnsetAndCheckUnsetLowHighValue() throws CatalogException {
    try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {
      List<String> colNames = new ArrayList<>();
      colNames.add("id");
      colNames.add("int_col");
      colNames.add("bigint_col");
      List<ColumnStatisticsObj> colStatsObjs =
          msClient.getHiveClient().getTableColumnStatistics(
              "unique_database", "alltypes", colNames, "impala");
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();
        longColStatsData.unsetLowValue();
        longColStatsData.unsetHighValue();
        colStatsData.setLongStats(longColStatsData);
      }
      assertTrue("All good!", true);
      colStatsObjs = msClient.getHiveClient().getTableColumnStatistics(
          "unique_database", "alltypes", colNames, "impala");
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();
        assertFalse("isSetLowValue() should be false",
            longColStatsData.isSetLowValue());
        assertFalse(
            "isSetHighValue() should be false",
            longColStatsData.isSetHighValue());
      }
      assertTrue("All good!", true);
    } catch (NoSuchObjectException e) {
      assertFalse(String.format("No such object exception: %s", e), false);
    } catch (MetaException e) {
      assertFalse(String.format("Metadata exception: %s", e), false);
    } catch (TException e) {
      assertFalse(String.format("TException: %s", e), false);
    }
  }
{code}

Both isSetLowValue() and isSetHighValue() should return false, since 
unsetLowValue() and unsetHighValue() are called in the first loop.

To build and run the test:

mvn -f $IMPALA_HOME/fe/pom.xml test -e -Djava.compiler=NONE -ff 
-Dtest=MetastoreEventsProcessorTest#testUnsetAndCheckUnsetLowHighValue


Table unique_database.alltypes is defined as follows.

 CREATE EXTERNAL TABLE unique_database.alltypes (
   id INT,
   bool_col BOOLEAN,
   tinyint_col TINYINT,
   smallint_col SMALLINT,
   int_col INT,
   bigint_col BIGINT,
   float_col FLOAT,

[jira] [Updated] (HIVE-24885) The state of unset low or high value in LongColumnStatsData can not be retrieved

2021-03-15 Thread Qifan Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qifan Chen updated HIVE-24885:
--
Description: 
While working on improving Impala column stats to compute min/max for columns, 
we found that the unset state of the low or high value in LongColumnStatsData 
cannot be read back. This is illustrated by the following Impala test case 
added to MetastoreEventsProcessorTest.

  @Test
  public void testUnsetAndCheckUnsetLowHighValue() throws CatalogException {
    try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {
      List<String> colNames = new ArrayList<>();
      colNames.add("id");
      colNames.add("int_col");
      colNames.add("bigint_col");
      List<ColumnStatisticsObj> colStatsObjs =
          msClient.getHiveClient().getTableColumnStatistics(
              "unique_database", "alltypes", colNames, "impala");
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();
        longColStatsData.unsetLowValue();
        longColStatsData.unsetHighValue();
        colStatsData.setLongStats(longColStatsData);
      }
      assertTrue("All good!", true);
      colStatsObjs = msClient.getHiveClient().getTableColumnStatistics(
          "unique_database", "alltypes", colNames, "impala");
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();
        assertFalse("isSetLowValue() should be false",
            longColStatsData.isSetLowValue());
        assertFalse(
            "isSetHighValue() should be false",
            longColStatsData.isSetHighValue());
      }
      assertTrue("All good!", true);
    } catch (NoSuchObjectException e) {
      assertFalse(String.format("No such object exception: %s", e), false);
    } catch (MetaException e) {
      assertFalse(String.format("Metadata exception: %s", e), false);
    } catch (TException e) {
      assertFalse(String.format("TException: %s", e), false);
    }
  }

Both isSetLowValue() and isSetHighValue() should return false, since 
unsetLowValue() and unsetHighValue() are called in the first loop.

To build and run the test:

mvn -f $IMPALA_HOME/fe/pom.xml test -e -Djava.compiler=NONE -ff 
-Dtest=MetastoreEventsProcessorTest#testUnsetAndCheckUnsetLowHighValue


Table unique_database.alltypes is defined as follows.

 CREATE EXTERNAL TABLE unique_database.alltypes (
   id INT,
   bool_col BOOLEAN,
   tinyint_col TINYINT,
   smallint_col SMALLINT,
   int_col INT,
   bigint_col BIGINT,
   float_col FLOAT,

[jira] [Updated] (HIVE-24885) The state of unset low or high value in LongColumnStatsData can not be retrieved

2021-03-15 Thread Qifan Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qifan Chen updated HIVE-24885:
--
Description: 
While working on improving Impala column stats to compute min/max for columns, 
we found that the unset state of the low or high value in LongColumnStatsData 
cannot be read back. This is illustrated by the following Impala test case 
added to MetastoreEventsProcessorTest.

  /**
   * Unset the low and the high value first and then check.
   */
  @Test
  public void testUnsetAndCheckUnsetLowHighValue() throws CatalogException {
    try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {
      List<String> colNames = new ArrayList<>();
      colNames.add("id");
      colNames.add("int_col");
      colNames.add("bigint_col");
      List<ColumnStatisticsObj> colStatsObjs =
          msClient.getHiveClient().getTableColumnStatistics(
              "unique_database", "alltypes", colNames, "impala");
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();
        longColStatsData.unsetLowValue();
        longColStatsData.unsetHighValue();
        colStatsData.setLongStats(longColStatsData);
      }
      assertTrue("All good!", true);
      colStatsObjs = msClient.getHiveClient().getTableColumnStatistics(
          "unique_database", "alltypes", colNames, "impala");
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();
        assertFalse("isSetLowValue() should be false",
            longColStatsData.isSetLowValue());
        assertFalse(
            "isSetHighValue() should be false",
            longColStatsData.isSetHighValue());
      }
      assertTrue("All good!", true);
    } catch (NoSuchObjectException e) {
      assertFalse(String.format("No such object exception: %s", e), false);
    } catch (MetaException e) {
      assertFalse(String.format("Metadata exception: %s", e), false);
    } catch (TException e) {
      assertFalse(String.format("TException: %s", e), false);
    }
  }

Both isSetLowValue() and isSetHighValue() should return false, since 
unsetLowValue() and unsetHighValue() are called in the first loop.

To build and run the test:

mvn -f $IMPALA_HOME/fe/pom.xml test -e -Djava.compiler=NONE -ff 
-Dtest=MetastoreEventsProcessorTest#testUnsetAndCheckUnsetLowHighValue


Table unique_database.alltypes is defined as follows.

 CREATE EXTERNAL TABLE unique_database.alltypes (
   id INT,
   bool_col BOOLEAN,
   tinyint_col TINYINT,
   smallint_col SMALLINT,
   int_col INT,

[jira] [Work logged] (HIVE-24590) Operation Logging still leaks the log4j Appenders

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24590?focusedWorklogId=566165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566165
 ]

ASF GitHub Bot logged work on HIVE-24590:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 12:54
Start Date: 15/Mar/21 12:54
Worklog Time Spent: 10m 
  Work Description: EugeneChung edited a comment on pull request #1849:
URL: https://github.com/apache/hive/pull/1849#issuecomment-799393602


   @zabetak Yes, I forgot to let you know. It worked well, but in my company's 
repo. I chose to clear all the log4j MDC. The leak and the incorrect operation 
logging have gone away.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 566165)
Time Spent: 1.5h  (was: 1h 20m)

> Operation Logging still leaks the log4j Appenders
> -
>
> Key: HIVE-24590
> URL: https://issues.apache.org/jira/browse/HIVE-24590
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Eugene Chung
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot 
> 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen 
> Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, 
> Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I'm using Hive 3.1.2 with options below.
>  * hive.server2.logging.operation.enabled=true
>  * hive.server2.logging.operation.level=VERBOSE
>  * hive.async.log.enabled=false
> I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 
> but HS2 still leaks log4j RandomAccessFileManager.
> !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197!
> I checked the operation log file which is not closed/deleted properly.
> !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272!
> Then there's the log,
> {code:java}
> client.TezClient: Shutting down Tez Session, sessionName= {code}
> !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24590) Operation Logging still leaks the log4j Appenders

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24590?focusedWorklogId=566163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566163
 ]

ASF GitHub Bot logged work on HIVE-24590:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 12:51
Start Date: 15/Mar/21 12:51
Worklog Time Spent: 10m 
  Work Description: EugeneChung commented on pull request #1849:
URL: https://github.com/apache/hive/pull/1849#issuecomment-799393602


   @zabetak Yes, I forgot to let you know. It worked well, but in my company's 
repo. I chose to clear all the log4j MDC.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 566163)
Time Spent: 1h 20m  (was: 1h 10m)

> Operation Logging still leaks the log4j Appenders
> -
>
> Key: HIVE-24590
> URL: https://issues.apache.org/jira/browse/HIVE-24590
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Eugene Chung
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot 
> 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen 
> Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, 
> Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I'm using Hive 3.1.2 with options below.
>  * hive.server2.logging.operation.enabled=true
>  * hive.server2.logging.operation.level=VERBOSE
>  * hive.async.log.enabled=false
> I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 
> but HS2 still leaks log4j RandomAccessFileManager.
> !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197!
> I checked the operation log file which is not closed/deleted properly.
> !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272!
> Then there's the log,
> {code:java}
> client.TezClient: Shutting down Tez Session, sessionName= {code}
> !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24590) Operation Logging still leaks the log4j Appenders

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24590?focusedWorklogId=566128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566128
 ]

ASF GitHub Bot logged work on HIVE-24590:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 11:16
Start Date: 15/Mar/21 11:16
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #1849:
URL: https://github.com/apache/hive/pull/1849#issuecomment-799336894


   Hey @EugeneChung did you have a change to try out this fix?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 566128)
Time Spent: 1h  (was: 50m)

> Operation Logging still leaks the log4j Appenders
> -
>
> Key: HIVE-24590
> URL: https://issues.apache.org/jira/browse/HIVE-24590
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Eugene Chung
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot 
> 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen 
> Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, 
> Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I'm using Hive 3.1.2 with options below.
>  * hive.server2.logging.operation.enabled=true
>  * hive.server2.logging.operation.level=VERBOSE
>  * hive.async.log.enabled=false
> I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 
> but HS2 still leaks log4j RandomAccessFileManager.
> !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197!
> I checked the operation log file which is not closed/deleted properly.
> !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272!
> Then there's the log,
> {code:java}
> client.TezClient: Shutting down Tez Session, sessionName= {code}
> !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24590) Operation Logging still leaks the log4j Appenders

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24590?focusedWorklogId=566129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566129
 ]

ASF GitHub Bot logged work on HIVE-24590:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 11:16
Start Date: 15/Mar/21 11:16
Worklog Time Spent: 10m 
  Work Description: zabetak edited a comment on pull request #1849:
URL: https://github.com/apache/hive/pull/1849#issuecomment-799336894


   Hey @EugeneChung did you have a chance to try out this fix?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 566129)
Time Spent: 1h 10m  (was: 1h)

> Operation Logging still leaks the log4j Appenders
> -
>
> Key: HIVE-24590
> URL: https://issues.apache.org/jira/browse/HIVE-24590
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Eugene Chung
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot 
> 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen 
> Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, 
> Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I'm using Hive 3.1.2 with options below.
>  * hive.server2.logging.operation.enabled=true
>  * hive.server2.logging.operation.level=VERBOSE
>  * hive.async.log.enabled=false
> I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 
> but HS2 still leaks log4j RandomAccessFileManager.
> !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197!
> I checked the operation log file which is not closed/deleted properly.
> !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272!
> Then there's the log,
> {code:java}
> client.TezClient: Shutting down Tez Session, sessionName= {code}
> !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24871) Initiator / Cleaner performance metrics

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24871?focusedWorklogId=566057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566057
 ]

ASF GitHub Bot logged work on HIVE-24871:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 08:01
Start Date: 15/Mar/21 08:01
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #2061:
URL: https://github.com/apache/hive/pull/2061


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 566057)
Time Spent: 1h  (was: 50m)

> Initiator / Cleaner performance metrics
> ---
>
> Key: HIVE-24871
> URL: https://issues.apache.org/jira/browse/HIVE-24871
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The PerformanceLogger should be used in the Initiator and Cleaner services.
>  * One cycle of the Initiator should be measured, ignoring the time spent 
> waiting on the lock for the AUX table
>  * One compaction cleanup should be measured in the Cleaner (using a different 
> metric for major and minor compaction cleanup)
> Important note: the PerformanceLogger implementation from the metastore should be 
> used (not the ql one); otherwise the metric won't be published in HMS.
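
The measurement rules described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code merged in the pull request: the class CompactionTimingSketch, its methods, and the three metric names are all hypothetical, and it deliberately does not use Hive's real PerformanceLogger API. It only shows the two ideas from the description: start the Initiator timer after the lock has been acquired, and pick a separate metric for major and minor cleanup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a metastore-side performance logger.
// It is NOT Hive's real PerformanceLogger; every name here is illustrative.
class CompactionTimingSketch {
    static final String INITIATOR_CYCLE = "initiator_cycle";      // assumed metric name
    static final String CLEANER_MAJOR = "cleaner_cleanup_major";  // assumed metric name
    static final String CLEANER_MINOR = "cleaner_cleanup_minor";  // assumed metric name

    private final LongSupplier clockMillis;
    private final Map<String, Long> startTimes = new HashMap<>();
    private final Map<String, Long> lastDurations = new HashMap<>();

    // The clock is injected so the sketch can be exercised deterministically.
    CompactionTimingSketch(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    // Record the start time for a metric.
    void begin(String metric) {
        startTimes.put(metric, clockMillis.getAsLong());
    }

    // Record and return the elapsed time since the matching begin() call.
    long end(String metric) {
        Long start = startTimes.remove(metric);
        if (start == null) {
            throw new IllegalStateException("begin() was not called for " + metric);
        }
        long elapsed = clockMillis.getAsLong() - start;
        lastDurations.put(metric, elapsed);
        return elapsed;
    }

    // One Initiator cycle: the timer starts only once the lock is held,
    // so the time spent waiting for the lock is excluded.
    long runInitiatorCycle(Runnable acquireLock, Runnable cycleWork) {
        acquireLock.run();
        begin(INITIATOR_CYCLE);
        cycleWork.run();
        return end(INITIATOR_CYCLE);
    }

    // One compaction cleanup: major and minor cleanups use different metrics.
    long runCleanup(boolean majorCompaction, Runnable cleanupWork) {
        String metric = majorCompaction ? CLEANER_MAJOR : CLEANER_MINOR;
        begin(metric);
        cleanupWork.run();
        return end(metric);
    }

    // Last recorded duration for a metric, or null if none was recorded.
    Long lastDuration(String metric) {
        return lastDurations.get(metric);
    }
}
```

In a real service the clock would presumably be System::currentTimeMillis and the recorded durations would be handed to whatever metrics registry the metastore publishes from; both of those are assumptions here.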





[jira] [Work logged] (HIVE-24718) Moving to file based iteration for copying data

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24718?focusedWorklogId=566051=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566051
 ]

ASF GitHub Bot logged work on HIVE-24718:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 07:39
Start Date: 15/Mar/21 07:39
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #1936:
URL: https://github.com/apache/hive/pull/1936#discussion_r594106033



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -2225,17 +2224,11 @@ private void setupUDFJarOnHDFS(Path identityUdfLocalPath, Path identityUdfHdfsPa
   /*
* Method used from TestReplicationScenariosExclusiveReplica
*/
-  private void assertExternalFileInfo(List expected, String dumplocation, boolean isIncremental,
+  private void assertExternalFileList(List expected, String dumplocation,
   WarehouseInstance warehouseInstance)
   throws IOException {
 Path hivePath = new Path(dumplocation, ReplUtils.REPL_HIVE_BASE_DIR);
-Path metadataPath = new Path(hivePath, EximUtil.METADATA_PATH_NAME);
-Path externalTableInfoFile;
-if (isIncremental) {
-  externalTableInfoFile = new Path(hivePath, FILE_NAME);
-} else {
-  externalTableInfoFile = new Path(metadataPath, primaryDbName.toLowerCase() + File.separator + FILE_NAME);
-}
-ReplicationTestUtils.assertExternalFileInfo(warehouseInstance, expected, externalTableInfoFile);
+Path externalTblFileList = new Path(hivePath, EximUtil.FILE_LIST_EXTERNAL);

Review comment:
   Done







Issue Time Tracking
---

Worklog Id: (was: 566051)
Time Spent: 6.5h  (was: 6h 20m)

> Moving to file based iteration for copying data
> ---
>
> Key: HIVE-24718
> URL: https://issues.apache.org/jira/browse/HIVE-24718
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24718.01.patch, HIVE-24718.02.patch, 
> HIVE-24718.04.patch, HIVE-24718.05.patch, HIVE-24718.06.patch
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24718) Moving to file based iteration for copying data

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24718?focusedWorklogId=566050=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566050
 ]

ASF GitHub Bot logged work on HIVE-24718:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 07:39
Start Date: 15/Mar/21 07:39
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #1936:
URL: https://github.com/apache/hive/pull/1936#discussion_r594105835



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTablesMetaDataOnly.java
##
@@ -639,9 +629,11 @@ public void testIncrementalDumpEmptyDumpDirectory() throws Throwable {
 .verifyResult(inc2Tuple.lastReplicationId);
   }
 
-  private void assertFalseExternalFileList(Path externalTableFileList)
-  throws IOException {
+  private void assertFalseExternalFileList(String dumpLocation)

Review comment:
   Done







Issue Time Tracking
---

Worklog Id: (was: 566050)
Time Spent: 6h 20m  (was: 6h 10m)

> Moving to file based iteration for copying data
> ---
>
> Key: HIVE-24718
> URL: https://issues.apache.org/jira/browse/HIVE-24718
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24718.01.patch, HIVE-24718.02.patch, 
> HIVE-24718.04.patch, HIVE-24718.05.patch, HIVE-24718.06.patch
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24718) Moving to file based iteration for copying data

2021-03-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24718?focusedWorklogId=566049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-566049
 ]

ASF GitHub Bot logged work on HIVE-24718:
-

Author: ASF GitHub Bot
Created on: 15/Mar/21 07:38
Start Date: 15/Mar/21 07:38
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #1936:
URL: https://github.com/apache/hive/pull/1936#discussion_r594105742



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTablesMetaDataOnly.java
##
@@ -639,9 +629,11 @@ public void testIncrementalDumpEmptyDumpDirectory() throws Throwable {
 .verifyResult(inc2Tuple.lastReplicationId);
   }
 
-  private void assertFalseExternalFileList(Path externalTableFileList)
-  throws IOException {
+  private void assertFalseExternalFileList(String dumpLocation)
+  throws IOException {

Review comment:
   getFileSystem() call throws IOException.







Issue Time Tracking
---

Worklog Id: (was: 566049)
Time Spent: 6h 10m  (was: 6h)

> Moving to file based iteration for copying data
> ---
>
> Key: HIVE-24718
> URL: https://issues.apache.org/jira/browse/HIVE-24718
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24718.01.patch, HIVE-24718.02.patch, 
> HIVE-24718.04.patch, HIVE-24718.05.patch, HIVE-24718.06.patch
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>



