[jira] [Work logged] (HIVE-24624) Repl Load should detect the compatible staging dir

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24624?focusedWorklogId=550071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-550071
 ]

ASF GitHub Bot logged work on HIVE-24624:
-

Author: ASF GitHub Bot
Created on: 09/Feb/21 07:08
Start Date: 09/Feb/21 07:08
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1855:
URL: https://github.com/apache/hive/pull/1855#discussion_r572639940



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java
##
@@ -318,27 +317,38 @@ private void analyzeReplLoad(ASTNode ast) throws 
SemanticException {
 .getEncodedDumpRootPath(conf, sourceDbNameOrPattern.toLowerCase()), 
conf), conf)) {
 throw new 
Exception(ErrorMsg.REPL_FAILED_WITH_NON_RECOVERABLE_ERROR.getMsg());
   }
-  if (loadPath != null) {
-DumpMetaData dmd = new DumpMetaData(loadPath, conf);
-
-boolean evDump = false;
-// we will decide what hdfs locations needs to be copied over here as 
well.
-if (dmd.isIncrementalDump()) {
-  LOG.debug("{} contains an incremental dump", loadPath);
-  evDump = true;
-} else {
-  LOG.debug("{} contains an bootstrap dump", loadPath);
-}
-ReplLoadWork replLoadWork = new ReplLoadWork(conf, 
loadPath.toString(), sourceDbNameOrPattern,
-replScope.getDbName(),
-dmd.getReplScope(),
-queryState.getLineageState(), evDump, dmd.getEventTo(), 
dmd.getDumpExecutionId(),
-initMetricCollection(!evDump, loadPath.toString(), 
replScope.getDbName(),
-  dmd.getDumpExecutionId()), dmd.isReplScopeModified());
-rootTasks.add(TaskFactory.get(replLoadWork, conf));
-  } else {
+
+  if (loadPath == null) {
+return;

Review comment:
   Add a log line.
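   A minimal sketch of the suggested change (the log message text is illustrative, not from the patch):
   
   ```java
   if (loadPath == null) {
     // Make the no-op visible in the logs before returning early.
     LOG.info("No dump to load at the configured path, skipping REPL LOAD");
     return;
   }
   ```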





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 550071)
Time Spent: 1h 10m  (was: 1h)

> Repl Load should detect the compatible staging dir
> --
>
> Key: HIVE-24624
> URL: https://issues.apache.org/jira/browse/HIVE-24624
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pratyushotpal Madhukar
>Assignee: Pratyushotpal Madhukar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24624.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Repl load in CDP, when pointed to a staging dir, should be able to detect 
> whether the staging dir has the dump structure in a compatible format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24624) Repl Load should detect the compatible staging dir

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24624?focusedWorklogId=550070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-550070
 ]

ASF GitHub Bot logged work on HIVE-24624:
-

Author: ASF GitHub Bot
Created on: 09/Feb/21 07:07
Start Date: 09/Feb/21 07:07
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1855:
URL: https://github.com/apache/hive/pull/1855#discussion_r572639339



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java
##
@@ -372,16 +382,22 @@ private Path getCurrentLoadPath() throws IOException, 
SemanticException {
   }
 }
 Path hiveDumpPath = new Path(latestUpdatedStatus.getPath(), 
ReplUtils.REPL_HIVE_BASE_DIR);
-if (loadPathBase.getFileSystem(conf).exists(new Path(hiveDumpPath,

Review comment:
   why is this check removed?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 550070)
Time Spent: 1h  (was: 50m)

> Repl Load should detect the compatible staging dir
> --
>
> Key: HIVE-24624
> URL: https://issues.apache.org/jira/browse/HIVE-24624
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pratyushotpal Madhukar
>Assignee: Pratyushotpal Madhukar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24624.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Repl load in CDP, when pointed to a staging dir, should be able to detect 
> whether the staging dir has the dump structure in a compatible format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24752) Returned operation's drilldown link may be broken

2021-02-08 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-24752:
---
Description: 
Since HIVE-23625, the path spec for the query page has changed from 
_query_page_ to _query_page.html_,
{code:java}
webServer.addServlet("query_page", "/query_page.html", 
QueryProfileServlet.class);{code}
the drilldown link of the operation returned may be broken if 
hive.server2.show.operation.drilldown.link is enabled...

  was:
The path spec for the query page has changed from _query_page_ to 
_query_page.html_,
{code:java}
webServer.addServlet("query_page", "/query_page.html", 
QueryProfileServlet.class);{code}
the drilldown link of the operation returned may be broken if 
hive.server2.show.operation.drilldown.link is enabled...


> Returned operation's drilldown link may be broken
> -
>
> Key: HIVE-24752
> URL: https://issues.apache.org/jira/browse/HIVE-24752
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Priority: Major
>
> Since HIVE-23625, the path spec for the query page has changed from 
> _query_page_ to _query_page.html_,
> {code:java}
> webServer.addServlet("query_page", "/query_page.html", 
> QueryProfileServlet.class);{code}
> the drilldown link of the operation returned may be broken if 
> hive.server2.show.operation.drilldown.link is enabled...
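A minimal sketch of one possible fix, assuming the old extension-less path should 
stay reachable; the second registration is an illustration, not the actual patch 
(updating the generated drilldown link to query_page.html would also work):
{code:java}
// Existing registration (from HIVE-23625):
webServer.addServlet("query_page", "/query_page.html", QueryProfileServlet.class);
// Hypothetical extra registration so old /query_page links keep resolving:
webServer.addServlet("query_page_compat", "/query_page", QueryProfileServlet.class);
{code}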



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24752) Returned operation's drilldown link may be broken

2021-02-08 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-24752:
---
Summary: Returned operation's drilldown link may be broken  (was: Returned 
operation's drilldown link may be broken since HIVE-23625)

> Returned operation's drilldown link may be broken
> -
>
> Key: HIVE-24752
> URL: https://issues.apache.org/jira/browse/HIVE-24752
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Priority: Major
>
> The path spec for the query page has changed from _query_page_ to 
> _query_page.html_,
> {code:java}
> webServer.addServlet("query_page", "/query_page.html", 
> QueryProfileServlet.class);{code}
> the drilldown link of the operation returned may be broken if 
> hive.server2.show.operation.drilldown.link is enabled...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24693) Parquet Timestamp Values Read/Write Very Slow

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24693?focusedWorklogId=549989&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549989
 ]

ASF GitHub Bot logged work on HIVE-24693:
-

Author: ASF GitHub Bot
Created on: 09/Feb/21 00:50
Start Date: 09/Feb/21 00:50
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1938:
URL: https://github.com/apache/hive/pull/1938#issuecomment-775569790


   @klcopp Where do you see that valid date ranges start at year 1?  I found 
this on the Hive Wiki:
   
   https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
   
   > The range of values supported for the Date type is 0000-01-01 to 
9999-12-31



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549989)
Time Spent: 2h 50m  (was: 2h 40m)

> Parquet Timestamp Values Read/Write Very Slow
> -
>
> Key: HIVE-24693
> URL: https://issues.apache.org/jira/browse/HIVE-24693
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Parquet {{DataWriteableWriter}} relies on {{NanoTimeUtils}} to convert a 
> timestamp object into a binary value.  To do this, it calls {{toString()}} on 
> the timestamp object and then parses the String.  These timestamps do not 
> carry a timezone, so the string is something like:
> {{2021-21-03 12:32:23....}}
> The parse code tries to parse the string assuming there is a time zone, and 
> if not, falls back and applies the provided "default time zone".  As was 
> noted in [HIVE-24353], if something fails to parse, it is very expensive to 
> try to parse again.  So, for each timestamp in the Parquet file, it:
> * Builds a string from the timestamp
> * Parses it (throws an exception, parses again)
> There is no need for this kind of string manipulation and parsing; it should 
> just be using the epoch millis/seconds/time stored internally in the 
> Timestamp object.
> {code:java}
>   // Converts Timestamp to TimestampTZ.
>   public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) {
>     return parse(ts.toString(), defaultTimeZone);
>   }
> {code}
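A minimal sketch of a string-free conversion, assuming Hive's Timestamp exposes 
its value via toEpochSecond()/getNanos() with UTC-based wall-clock semantics 
(verify against the actual API before relying on this):
{code:java}
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Recover the wall-clock value and attach the default zone directly,
// mirroring parse(ts.toString(), defaultTimeZone) without building
// and re-parsing a string.
public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) {
  LocalDateTime local =
      LocalDateTime.ofEpochSecond(ts.toEpochSecond(), ts.getNanos(), ZoneOffset.UTC);
  return new TimestampTZ(local.atZone(defaultTimeZone));
}
{code}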



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24742) Support router path or view fs path in Hive table location

2021-02-08 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281469#comment-17281469
 ] 

Aihua Xu commented on HIVE-24742:
-

[~ngangam], [~ychena] FYI. Can you help take a look? 

> Support router path or view fs path in Hive table location
> --
>
> Key: HIVE-24742
> URL: https://issues.apache.org/jira/browse/HIVE-24742
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.2
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: HIVE-24742.patch
>
>
> In 
> [FileUtils.java|https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L747],
>  the equalsFileSystem function checks the base URL to determine whether 
> source and destination are on the same cluster, and decides whether to copy 
> or move the data. That will not work for viewfs or router-based file systems, 
> since viewfs://ns-default/a and viewfs://ns-default/b may be on different 
> physical clusters.
> FileSystem in HDFS provides a resolvePath() function to resolve to the 
> physical path; we can support viewfs and router through this function.
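A minimal sketch of a resolvePath()-based check (the method shape is 
illustrative, not the exact FileUtils change):
{code:java}
import java.io.IOException;
import java.net.URI;
import java.util.Objects;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Compare the resolved physical locations, so viewfs://ns-default/a and
// viewfs://ns-default/b are mapped to their backing clusters first.
public static boolean onSameCluster(FileSystem fs1, Path p1, FileSystem fs2, Path p2)
    throws IOException {
  URI u1 = fs1.resolvePath(p1).toUri();
  URI u2 = fs2.resolvePath(p2).toUri();
  return Objects.equals(u1.getScheme(), u2.getScheme())
      && Objects.equals(u1.getAuthority(), u2.getAuthority());
}
{code}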



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24742) Support router path or view fs path in Hive table location

2021-02-08 Thread Aihua Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-24742:

Status: Patch Available  (was: Open)

Attached the patch: a simple change that makes a resolvePath() call to resolve 
the path to the physical path. 

> Support router path or view fs path in Hive table location
> --
>
> Key: HIVE-24742
> URL: https://issues.apache.org/jira/browse/HIVE-24742
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.2
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: HIVE-24742.patch
>
>
> In 
> [FileUtils.java|https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L747],
>  the equalsFileSystem function checks the base URL to determine whether 
> source and destination are on the same cluster, and decides whether to copy 
> or move the data. That will not work for viewfs or router-based file systems, 
> since viewfs://ns-default/a and viewfs://ns-default/b may be on different 
> physical clusters.
> FileSystem in HDFS provides a resolvePath() function to resolve to the 
> physical path; we can support viewfs and router through this function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24742) Support router path or view fs path in Hive table location

2021-02-08 Thread Aihua Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-24742:

Attachment: HIVE-24742.patch

> Support router path or view fs path in Hive table location
> --
>
> Key: HIVE-24742
> URL: https://issues.apache.org/jira/browse/HIVE-24742
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.2
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: HIVE-24742.patch
>
>
> In 
> [FileUtils.java|https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L747],
>  the equalsFileSystem function checks the base URL to determine whether 
> source and destination are on the same cluster, and decides whether to copy 
> or move the data. That will not work for viewfs or router-based file systems, 
> since viewfs://ns-default/a and viewfs://ns-default/b may be on different 
> physical clusters.
> FileSystem in HDFS provides a resolvePath() function to resolve to the 
> physical path; we can support viewfs and router through this function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava

2021-02-08 Thread Dhirendra Pandit (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281467#comment-17281467
 ] 

Dhirendra Pandit commented on HIVE-22126:
-

I am also having a similar problem with Hive 3.1.1; the Tez job is failing with 
the error message 
"com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V"

> hive-exec packaging should shade guava
> --
>
> Key: HIVE-22126
> URL: https://issues.apache.org/jira/browse/HIVE-22126
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Eugene Chung
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, 
> HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, 
> HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, 
> HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, 
> HIVE-22126.09.patch, HIVE-22126.09.patch
>
>
> The ql/pom.xml includes the complete guava library in hive-exec.jar 
> https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes 
> problems for downstream clients of Hive which have hive-exec.jar in their 
> classpath, since they are pinned to the same guava version as that of Hive. 
> We should shade the guava classes so that other components which depend on 
> hive-exec can independently use a different version of guava as needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24625) CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24625?focusedWorklogId=549958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549958
 ]

ASF GitHub Bot logged work on HIVE-24625:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 22:47
Start Date: 08/Feb/21 22:47
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on pull request #1856:
URL: https://github.com/apache/hive/pull/1856#issuecomment-775514851


   @zeroflag @nrg4878 is this ready to merge?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549958)
Time Spent: 1h  (was: 50m)

> CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect 
> directory
> -
>
> Key: HIVE-24625
> URL: https://issues.apache.org/jira/browse/HIVE-24625
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> MetastoreDefaultTransformer in HMS converts a managed non-transactional table 
> to an external table. MoveTask still uses the managed path when loading the 
> data, resulting in an always-empty table.
> {code:java}
> create table tbl1 TBLPROPERTIES ('transactional'='false') as select * from 
> other;{code}
> After the conversion the table location points to an external directory:
> Location: | 
> hdfs://c670-node2.coelab.cloudera.com:8020/warehouse/tablespace/external/hive/tbl1
> The move task uses the managed location:
> {code:java}
> INFO : Moving data to directory 
> hdfs://...:8020/warehouse/tablespace/managed/hive/tbl1 from 
> hdfs://...:8020/warehouse/tablespace/managed/hive/.hive-staging_hive_2021-01-05_16-10-39_973_41005081081760609-4/-ext-1000
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24705) Create/Alter/Drop tables based on storage handlers in HS2 should be authorized by Ranger/Sentry

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24705?focusedWorklogId=549933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549933
 ]

ASF GitHub Bot logged work on HIVE-24705:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 22:10
Start Date: 08/Feb/21 22:10
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera closed pull request #1931:
URL: https://github.com/apache/hive/pull/1931


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549933)
Time Spent: 0.5h  (was: 20m)

> Create/Alter/Drop tables based on storage handlers in HS2 should be 
> authorized by Ranger/Sentry
> ---
>
> Key: HIVE-24705
> URL: https://issues.apache.org/jira/browse/HIVE-24705
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With doAs=false in Hive 3.x, whenever a user tries to create a table based on 
> a storage handler over external storage (for example, an HBase table), the 
> end user we see is hive, so we cannot really enforce the condition on the 
> actual end user in Apache Ranger/Sentry. We therefore need to enforce this 
> condition in Hive for create/alter/drop operations on tables based on storage 
> handlers.
> Built-in Hive storage handlers like HbaseStorageHandler, KafkaStorageHandler, 
> etc. should implement a getURIForAuthentication() method which returns a URI 
> formed from table properties. This URI can be sent to Ranger/Sentry for 
> authorization.
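A hypothetical sketch of the proposed method for an HBase-backed table (the 
property key and URI scheme are assumptions for illustration, not the final API):
{code:java}
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;

// Build an authorizable URI from table properties; Ranger/Sentry can then
// apply policies to this URI rather than to the hive service user.
public URI getURIForAuthentication(Map<String, String> tableProperties)
    throws URISyntaxException {
  String hbaseTableName = tableProperties.get("hbase.table.name"); // assumed key
  return new URI("hbase", null, "/" + hbaseTableName, null);
}
{code}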



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24726) Track required data for cache hydration

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24726:
--
Labels: pull-request-available  (was: )

> Track required data for cache hydration
> ---
>
> Key: HIVE-24726
> URL: https://issues.apache.org/jira/browse/HIVE-24726
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24726) Track required data for cache hydration

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24726?focusedWorklogId=549931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549931
 ]

ASF GitHub Bot logged work on HIVE-24726:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 22:08
Start Date: 08/Feb/21 22:08
Worklog Time Spent: 10m 
  Work Description: asinkovits opened a new pull request #1961:
URL: https://github.com/apache/hive/pull/1961


   
   
   ### What changes were proposed in this pull request?
   This is a subtask for the cache hydration feature, collecting the required 
data for storing and restoring the cache content.
   
   
   
   
   
   ### Why are the changes needed?
   LLAP cache hydration will enable saving and loading the cache contents. To 
do this, some additional metadata is required.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   
   ### How was this patch tested?
   
   Unit tests were added.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549931)
Remaining Estimate: 0h
Time Spent: 10m

> Track required data for cache hydration
> ---
>
> Key: HIVE-24726
> URL: https://issues.apache.org/jira/browse/HIVE-24726
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24739?focusedWorklogId=549899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549899
 ]

ASF GitHub Bot logged work on HIVE-24739:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 21:30
Start Date: 08/Feb/21 21:30
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1946:
URL: https://github.com/apache/hive/pull/1946#issuecomment-775475736


   @miklosgergely Are you available for review on this? :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549899)
Time Spent: 2h 50m  (was: 2h 40m)

> Clarify Usage of Thrift TServerEventHandler and Count Number of Messages 
> Processed
> --
>
> Key: HIVE-24739
> URL: https://issues.apache.org/jira/browse/HIVE-24739
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Make the messages emitted from {{TServerEventHandler}} more meaningful.  
> Also, track the number of messages that each client sends to aid in 
> troubleshooting.
> I run into this issue all the time, and this would greatly help clarify the 
> logging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24705) Create/Alter/Drop tables based on storage handlers in HS2 should be authorized by Ranger/Sentry

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24705?focusedWorklogId=549887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549887
 ]

ASF GitHub Bot logged work on HIVE-24705:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 21:17
Start Date: 08/Feb/21 21:17
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #1960:
URL: https://github.com/apache/hive/pull/1960


   … should be authorized by Ranger/Sentry
   
   
   
   ### What changes were proposed in this pull request?
   Built-in Hive storage handlers like HbaseStorageHandler, KafkaStorageHandler, 
etc. should implement a getURIForAuthentication() method which returns a URI 
formed from table properties.
   
   
   
   ### Why are the changes needed?
   With doAs=false in Hive 3.x, whenever a user tries to create a table based 
on a storage handler over external storage (for example, an HBase table), the 
end user we see is hive, so we cannot really enforce the condition on the 
actual end user in Apache Ranger/Sentry. We therefore need to enforce this 
condition in Hive for create/alter/drop operations on tables based on storage 
handlers.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   By enforcing authorization, only users granted create/alter/drop privileges 
in Ranger/Sentry can perform these operations on tables based on storage 
handlers.
   
   
   
   ### How was this patch tested?
   Local machine, Remote cluster.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549887)
Time Spent: 20m  (was: 10m)

> Create/Alter/Drop tables based on storage handlers in HS2 should be 
> authorized by Ranger/Sentry
> ---
>
> Key: HIVE-24705
> URL: https://issues.apache.org/jira/browse/HIVE-24705
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With doAs=false in Hive 3.x, whenever a user tries to create a table based on 
> a storage handler over external storage (for example, an HBase table), the 
> end user we see is hive, so we cannot really enforce the condition on the 
> actual end user in Apache Ranger/Sentry. We therefore need to enforce this 
> condition in Hive for create/alter/drop operations on tables based on storage 
> handlers.
> Built-in Hive storage handlers like HbaseStorageHandler, KafkaStorageHandler, 
> etc. should implement a getURIForAuthentication() method which returns a URI 
> formed from table properties. This URI can be sent to Ranger/Sentry for 
> authorization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23036) ORC PPD eval with sub-millisecond timestamps

2021-02-08 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281283#comment-17281283
 ] 

Panagiotis Garefalakis commented on HIVE-23036:
---

Thank you [~abstractdog]  :) 

> ORC PPD eval with sub-millisecond timestamps
> 
>
> Key: HIVE-23036
> URL: https://issues.apache.org/jira/browse/HIVE-23036
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> See [ORC-611|https://issues.apache.org/jira/browse/ORC-611] for more details
> ORC stores timestamps with:
>  - nanosecond precision for the data itself
>  - milliseconds precision for min-max statistics
> As both min and max are rounded to the same value,  timestamps with ns 
> precision will not pass the PPD evaluator.
> {code:java}
> create table tsstat (ts timestamp) stored as orc;
> insert into tsstat values ("1970-01-01 00:00:00.0005");
> select * from tsstat where ts = "1970-01-01 00:00:00.0005";
> -- returned 0 rows{code}
> ORC PPD evaluation currently happens as part of OrcInputFormat 
> [https://github.com/apache/hive/blob/7e39a2c13711f9377c9ce1edb4224880421b1ea5/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2314]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23036) ORC PPD eval with sub-millisecond timestamps

2021-02-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23036:

Fix Version/s: 4.0.0

> ORC PPD eval with sub-millisecond timestamps
> 
>
> Key: HIVE-23036
> URL: https://issues.apache.org/jira/browse/HIVE-23036
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> See [ORC-611|https://issues.apache.org/jira/browse/ORC-611] for more details
> ORC stores timestamps with:
>  - nanosecond precision for the data itself
>  - milliseconds precision for min-max statistics
> As both min and max are rounded to the same value,  timestamps with ns 
> precision will not pass the PPD evaluator.
> {code:java}
> create table tsstat (ts timestamp) stored as orc;
> insert into tsstat values ("1970-01-01 00:00:00.0005");
> select * from tsstat where ts = "1970-01-01 00:00:00.0005";
> -- returned 0 rows{code}
> ORC PPD evaluation currently happens as part of OrcInputFormat 
> [https://github.com/apache/hive/blob/7e39a2c13711f9377c9ce1edb4224880421b1ea5/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2314]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23036) ORC PPD eval with sub-millisecond timestamps

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23036?focusedWorklogId=549763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549763
 ]

ASF GitHub Bot logged work on HIVE-23036:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 18:30
Start Date: 08/Feb/21 18:30
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1943:
URL: https://github.com/apache/hive/pull/1943#issuecomment-775351617


   merged to master, thanks @pgaref for the patch!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549763)
Time Spent: 1h 50m  (was: 1h 40m)

> ORC PPD eval with sub-millisecond timestamps
> 
>
> Key: HIVE-23036
> URL: https://issues.apache.org/jira/browse/HIVE-23036
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> See [ORC-611|https://issues.apache.org/jira/browse/ORC-611] for more details
> ORC stores timestamps with:
>  - nanosecond precision for the data itself
>  - milliseconds precision for min-max statistics
> As both min and max are rounded to the same value,  timestamps with ns 
> precision will not pass the PPD evaluator.
> {code:java}
> create table tsstat (ts timestamp) stored as orc;
> insert into tsstat values ("1970-01-01 00:00:00.0005");
> select * from tsstat where ts = "1970-01-01 00:00:00.0005";
> -- returned 0 rows{code}
> ORC PPD evaluation currently happens as part of OrcInputFormat 
> [https://github.com/apache/hive/blob/7e39a2c13711f9377c9ce1edb4224880421b1ea5/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2314]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23036) ORC PPD eval with sub-millisecond timestamps

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23036?focusedWorklogId=549762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549762
 ]

ASF GitHub Bot logged work on HIVE-23036:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 18:30
Start Date: 08/Feb/21 18:30
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #1943:
URL: https://github.com/apache/hive/pull/1943


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549762)
Time Spent: 1h 40m  (was: 1.5h)

> ORC PPD eval with sub-millisecond timestamps
> 
>
> Key: HIVE-23036
> URL: https://issues.apache.org/jira/browse/HIVE-23036
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> See [ORC-611|https://issues.apache.org/jira/browse/ORC-611] for more details
> ORC stores timestamps with:
>  - nanosecond precision for the data itself
>  - milliseconds precision for min-max statistics
> As both min and max are rounded to the same value,  timestamps with ns 
> precision will not pass the PPD evaluator.
> {code:java}
> create table tsstat (ts timestamp) stored as orc;
> insert into tsstat values ("1970-01-01 00:00:00.0005");
> select * from tsstat where ts = "1970-01-01 00:00:00.0005";
> -- returned 0 rows{code}
> ORC PPD evaluation currently happens as part of OrcInputFormat 
> [https://github.com/apache/hive/blob/7e39a2c13711f9377c9ce1edb4224880421b1ea5/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2314]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23036) ORC PPD eval with sub-millisecond timestamps

2021-02-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-23036.
-
Resolution: Fixed

> ORC PPD eval with sub-millisecond timestamps
> 
>
> Key: HIVE-23036
> URL: https://issues.apache.org/jira/browse/HIVE-23036
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> See [ORC-611|https://issues.apache.org/jira/browse/ORC-611] for more details
> ORC stores timestamps with:
>  - nanosecond precision for the data itself
>  - milliseconds precision for min-max statistics
> As both min and max are rounded to the same value,  timestamps with ns 
> precision will not pass the PPD evaluator.
> {code:java}
> create table tsstat (ts timestamp) stored as orc;
> insert into tsstat values ("1970-01-01 00:00:00.0005");
> select * from tsstat where ts = "1970-01-01 00:00:00.0005";
> -- returned 0 rows{code}
> ORC PPD evaluation currently happens as part of OrcInputFormat 
> [https://github.com/apache/hive/blob/7e39a2c13711f9377c9ce1edb4224880421b1ea5/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2314]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24313) Optimise stats collection for file sizes on cloud storage

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24313?focusedWorklogId=549759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549759
 ]

ASF GitHub Bot logged work on HIVE-24313:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 18:27
Start Date: 08/Feb/21 18:27
Worklog Time Spent: 10m 
  Work Description: okumin commented on a change in pull request #1636:
URL: https://github.com/apache/hive/pull/1636#discussion_r572272405



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
##
@@ -316,12 +319,11 @@ private static Statistics collectStatistics(HiveConf 
conf, PrunedPartitionList p
 
   basicStatsFactory.addEnhancer(new 
BasicStats.RowNumEstimator(estimateRowSizeFromSchema(conf, schema)));
 
-  List partStats = new ArrayList<>();
+  List partStats =
+  partList.getNotDeniedPartns().parallelStream().

Review comment:
   The point is that `basicStatsFactory.build(Partish.buildFor(table, p))` can 
block the common pool shared by the whole JVM, since it executes blocking I/O?
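   A sketch of one way to isolate the blocking stat calls in a dedicated pool 
instead of the JVM-wide common pool (pool size and surrounding names are 
illustrative):
   
   ```java
   ForkJoinPool statsPool = new ForkJoinPool(16);
   try {
     // Tasks submitted to a custom ForkJoinPool run their parallel stream on
     // that pool, keeping blocking IO off ForkJoinPool.commonPool().
     // get() throws InterruptedException/ExecutionException; handle as needed.
     List<BasicStats> partStats = statsPool.submit(() ->
         partList.getNotDeniedPartns().parallelStream()
             .map(p -> basicStatsFactory.build(Partish.buildFor(table, p)))
             .collect(Collectors.toList())).get();
   } finally {
     statsPool.shutdown();
   }
   ```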





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549759)
Time Spent: 1.5h  (was: 1h 20m)

> Optimise stats collection for file sizes on cloud storage
> -
>
> Key: HIVE-24313
> URL: https://issues.apache.org/jira/browse/HIVE-24313
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When stats information is not present (e.g external table), RelOptHiveTable 
> computes basic stats at runtime.
> Following is the codepath.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L598]
> {code:java}
> Statistics stats = StatsUtils.collectStatistics(hiveConf, partitionList,
> hiveTblMetadata, hiveNonPartitionCols, 
> nonPartColNamesThatRqrStats, colStatsCached,
> nonPartColNamesThatRqrStats, true);
>  {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L322]
> {code:java}
> for (Partition p : partList.getNotDeniedPartns()) {
>   BasicStats basicStats = basicStatsFactory.build(Partish.buildFor(table, p));
>   partStats.add(basicStats);
> }
>  {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStats.java#L205]
>  
> {code:java}
> try {
>   ds = getFileSizeForPath(path);
> } catch (IOException e) {
>   ds = 0L;
> }
> {code}
>  
> For a table & query with a large number of partitions, this takes a long time 
> to compute statistics and increases compilation time.  It would be good to 
> fix it with a "ForkJoinPool" ( 
> partList.getNotDeniedPartns().parallelStream().forEach((p) )
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24749) Disable user's UDF use SystemExit

2021-02-08 Thread okumin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281262#comment-17281262
 ] 

okumin commented on HIVE-24749:
---

I wonder if it's possible to make the SecurityManager pluggable so that 
administrators can disallow users from running potentially dangerous UDFs.

I know this is too much for this ticket, though.

> Disable user's UDF use SystemExit
> -
>
> Key: HIVE-24749
> URL: https://issues.apache.org/jira/browse/HIVE-24749
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: All Versions
>Reporter: xiepengjie
>Assignee: xiepengjie
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If System.exit() is executed in a user's UDF with the default 
> SecurityManager, it will cause the HS2 service process to exit, which is very 
> bad.
> It is safer to use a NoExitSecurityManager, which can intercept System.exit().
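A minimal sketch of such a NoExitSecurityManager (assuming a SecurityManager can 
still be installed in the target JVM):
{code:java}
import java.security.Permission;

public class NoExitSecurityManager extends SecurityManager {
  @Override
  public void checkExit(int status) {
    // Turn the UDF's System.exit() into a catchable exception instead of
    // letting it kill the HS2 process.
    throw new SecurityException("System.exit(" + status + ") intercepted");
  }

  @Override
  public void checkPermission(Permission perm) {
    // Permit everything else; only exiting is blocked.
  }
}

// Installed once at startup:
// System.setSecurityManager(new NoExitSecurityManager());
{code}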



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24693) Parquet Timestamp Values Read/Write Very Slow

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24693?focusedWorklogId=549701&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549701
 ]

ASF GitHub Bot logged work on HIVE-24693:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 16:53
Start Date: 08/Feb/21 16:53
Worklog Time Spent: 10m 
  Work Description: klcopp commented on pull request #1938:
URL: https://github.com/apache/hive/pull/1938#issuecomment-775286514


   I've seen a couple of users come back and ask why 0000 shows up as 0001, but 
they seemed satisfied with the explanation that year 0 doesn't exist. I mean, 
they don't represent all other users, but... since it's not a real year I'm not 
sure we should let users use it.
   
   According to the wiki, Hive doesn't support dates/timestamps outside of 
years 0001–9999. AFAIK Hive currently accepts negative years (though I'm not 
sure they're displayed/stored correctly?) and auto-converts 0000 to 0001, since 
0000 doesn't exist. I think we should decide what we want and change either the 
wiki or Hive's behavior to not accept pre-0001 and post-9999 dates – don't know 
how feasible this is though.
   
   @jcamachor  I'd be interested in what you think too!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549701)
Time Spent: 2h 40m  (was: 2.5h)

> Parquet Timestamp Values Read/Write Very Slow
> -
>
> Key: HIVE-24693
> URL: https://issues.apache.org/jira/browse/HIVE-24693
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Parquet {{DataWriteableWriter}} relies on {{NanoTimeUtils}} to convert a 
> timestamp object into a binary value.  To do this, it calls {{toString()}} on 
> the timestamp object and then parses the String.  These timestamps do not 
> carry a timezone, so the string is something like:
> {{2021-21-03 12:32:23....}}
> The parse code tries to parse the string assuming there is a time zone, and 
> if not, falls back and applies the provided "default time zone".  As was 
> noted in [HIVE-24353], if something fails to parse, it is very expensive to 
> try to parse again.  So, for each timestamp in the Parquet file, it:
> * Builds a string from the timestamp
> * Parses it (throws an exception, parses again)
> There is no need for this kind of string manipulation and parsing; it should 
> just be using the epoch millis/seconds/time stored internally in the 
> Timestamp object.
> {code:java}
>   // Converts Timestamp to TimestampTZ.
>   public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) {
>     return parse(ts.toString(), defaultTimeZone);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24747) Backport HIVE-24569 to branch-3.1

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24747?focusedWorklogId=549686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549686
 ]

ASF GitHub Bot logged work on HIVE-24747:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 16:41
Start Date: 08/Feb/21 16:41
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #1952:
URL: https://github.com/apache/hive/pull/1952#issuecomment-775278193


   Hey @jcamachor, can you please have a look at this PR?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549686)
Time Spent: 20m  (was: 10m)

> Backport HIVE-24569 to branch-3.1
> -
>
> Key: HIVE-24747
> URL: https://issues.apache.org/jira/browse/HIVE-24747
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24259) [CachedStore] Optimise get constraints call by removing redundant table check

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24259?focusedWorklogId=549679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549679
 ]

ASF GitHub Bot logged work on HIVE-24259:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 16:20
Start Date: 08/Feb/21 16:20
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1610:
URL: https://github.com/apache/hive/pull/1610#discussion_r572168261



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -1325,6 +1204,7 @@ private void validateTableType(Table tbl) {
 }
 validateTableType(tbl);
 sharedCache.addTableToCache(catName, dbName, tblName, tbl);
+sharedCache.addTableConstraintsToCache(catName,dbName,tblName,new 
SQLAllTableConstraints());

Review comment:
   There is a separate API for createTableWithConstraints. 
   
   Is this API also used when a table is created with constraints?

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2814,21 +2649,12 @@ long getPartsFound() {
 catName = StringUtils.normalizeIdentifier(catName);
 dbName = StringUtils.normalizeIdentifier(dbName);
 tblName = StringUtils.normalizeIdentifier(tblName);
-if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction()) || !sharedCache
+.isTableConstraintValid(catName, dbName, tblName)) {
   return rawStore.getCheckConstraints(catName, dbName, tblName);
 }
+return sharedCache.listCachedCheckConstraint(catName, dbName, tblName);

Review comment:
   The below check was present earlier:
   
   ```
   if (CollectionUtils.isEmpty(keys)) { 
     return rawStore.getCheckConstraints(catName, dbName, tblName); 
   }
   ```
   
   Why was this removed? Has this check moved somewhere else?

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -418,12 +424,12 @@ private void updateMemberSize(MemberName mn, Integer 
size, SizeMode mode) {
   }
 
   switch (mode) {
-case Delta:
-  this.memberObjectsSize[mn.ordinal()] += size;
-  break;
-case Snapshot:
-  this.memberObjectsSize[mn.ordinal()] = size;
-  break;
+  case Delta:

Review comment:
   nit: need extra spaces before case?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549679)
Time Spent: 2.5h  (was: 2h 20m)

> [CachedStore] Optimise get constraints call by removing redundant table check 
> --
>
> Key: HIVE-24259
> URL: https://issues.apache.org/jira/browse/HIVE-24259
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Description:
> Problem:
> 1. Redundant check whether the table is present or not.
> 2. Currently, in order to get all constraints from the cachedstore, 6 
> different calls are made within the cached store, which leads to 6 different 
> calls to the raw store.
> DOD:
> 1. Check only once whether the table exists in the cached store.
> 2. Instead of fetching individual constraints from the cached store, add a 
> method which returns all constraints at once; if the data is not consistent, 
> fall back to the rawstore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24753) Non blocking DROP PARTITION implementation

2021-02-08 Thread Zoltan Chovan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Chovan reassigned HIVE-24753:


Assignee: Zoltan Chovan

> Non blocking DROP PARTITION implementation
> --
>
> Key: HIVE-24753
> URL: https://issues.apache.org/jira/browse/HIVE-24753
> Project: Hive
>  Issue Type: New Feature
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>
> Implement a way to execute drop partition operations in a way that doesn't 
> have to wait for currently running read operations to be finished.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23625) HS2 Web UI displays query drill-down results in plain text, not html

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23625?focusedWorklogId=549574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549574
 ]

ASF GitHub Bot logged work on HIVE-23625:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 13:21
Start Date: 08/Feb/21 13:21
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1958:
URL: https://github.com/apache/hive/pull/1958


   https://issues.apache.org/jira/browse/HIVE-24752
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   Local machine
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549574)
Time Spent: 0.5h  (was: 20m)

> HS2 Web UI displays query drill-down results in plain text, not html
> 
>
> Key: HIVE-23625
> URL: https://issues.apache.org/jira/browse/HIVE-23625
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23625.1.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Opening a drilldown link on the HS2 Web UI, you are directed to the following 
> URL: /query_page?operationId=
> Since the path /query_page contains no file extension, Jetty cannot 
> determine the mimetype, and therefore the Hive HttpServer returns the response 
> header Content-Type: text/plain;charset=utf-8, so the information does not 
> render as HTML in the browser. This should be corrected to return 
> text/html;charset=utf-8.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24752) Returned operation's drilldown link may be broken since HIVE-23625

2021-02-08 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-24752:
---
Description: 
The path spec for the query page has changed from _query_page_ to 
_query_page.html_,
{code:java}
webServer.addServlet("query_page", "/query_page.html", 
QueryProfileServlet.class);{code}
the drilldown link of the operation returned may be broken if 
hive.server2.show.operation.drilldown.link is enabled...

  was:
The path spec for the query page has changed from _query_page_ to 
_query_page.html_,
{code:java}
webServer.addServlet("query_page", "/query_page.html", 
QueryProfileServlet.class);{code}
the drilldown link of the operation returned may be broken if 
hive.server2.show.operation.drilldown.link is enabled...

 

 


> Returned operation's drilldown link may be broken since HIVE-23625
> --
>
> Key: HIVE-24752
> URL: https://issues.apache.org/jira/browse/HIVE-24752
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Priority: Major
>
> The path spec for the query page has changed from _query_page_ to 
> _query_page.html_,
> {code:java}
> webServer.addServlet("query_page", "/query_page.html", 
> QueryProfileServlet.class);{code}
> the drilldown link of the operation returned may be broken if 
> hive.server2.show.operation.drilldown.link is enabled...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24752) Returned operation's drilldown link may be broken since HIVE-23625

2021-02-08 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-24752:
---
Description: 
The path spec for the query page has changed from _query_page_ to 
_query_page.html_,
{code:java}
webServer.addServlet("query_page", "/query_page.html", 
QueryProfileServlet.class);{code}
the drilldown link of the operation returned may be broken if 
hive.server2.show.operation.drilldown.link is enabled...

 

 

  was:
The path spec for the query page has changed from _query_page_ to 
_query_page.html_,

 
{code:java}
webServer.addServlet("query_page", "/query_page.html", 
QueryProfileServlet.class);{code}
 

the drilldown link of the operation returned may be broken if 
hive.server2.show.operation.drilldown.link is enabled...

 

 


> Returned operation's drilldown link may be broken since HIVE-23625
> --
>
> Key: HIVE-24752
> URL: https://issues.apache.org/jira/browse/HIVE-24752
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Priority: Major
>
> The path spec for the query page has changed from _query_page_ to 
> _query_page.html_,
> {code:java}
> webServer.addServlet("query_page", "/query_page.html", 
> QueryProfileServlet.class);{code}
> the drilldown link of the operation returned may be broken if 
> hive.server2.show.operation.drilldown.link is enabled...
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24746) PTF: TimestampValueBoundaryScanner can be optimised during range computation

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24746?focusedWorklogId=549548&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549548
 ]

ASF GitHub Bot logged work on HIVE-24746:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 12:41
Start Date: 08/Feb/21 12:41
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1950:
URL: https://github.com/apache/hive/pull/1950#discussion_r572010803



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/udf/ptf/TestValueBoundaryScanner.java
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.ptf;
+
+import java.time.ZoneId;
+
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.common.type.Timestamp;
+import org.apache.hadoop.hive.common.type.TimestampTZ;
+import org.apache.hadoop.hive.ql.plan.ptf.OrderExpressionDef;
+import org.apache.hadoop.hive.ql.plan.ptf.PTFExpressionDef;
+import org.apache.hadoop.hive.serde2.io.DateWritableV2;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.io.TimestampLocalTZWritable;
+import org.apache.hadoop.hive.serde2.io.TimestampWritableV2;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.Text;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestValueBoundaryScanner {
+
+  @Test
+  public void testLongEquals() {
+PTFExpressionDef argDef = new PTFExpressionDef();
+argDef.setOI(PrimitiveObjectInspectorFactory.writableLongObjectInspector);
+
+LongValueBoundaryScanner scanner =
+new LongValueBoundaryScanner(null, null, new OrderExpressionDef(argDef), false);
+LongWritable w1 = new LongWritable(1);
+LongWritable w2 = new LongWritable(2);
+
+Assert.assertTrue(scanner.isEqual(w1, w1));
+
+Assert.assertFalse(scanner.isEqual(w1, w2));
+Assert.assertFalse(scanner.isEqual(w2, w1));
+
+Assert.assertFalse(scanner.isEqual(null, w2));
+Assert.assertFalse(scanner.isEqual(w1, null));
+
+Assert.assertTrue(scanner.isEqual(null, null));
+  }
+
+  @Test
+  public void testHiveDecimalEquals() {
+PTFExpressionDef argDef = new PTFExpressionDef();
+argDef.setOI(PrimitiveObjectInspectorFactory.writableHiveDecimalObjectInspector);
+
+HiveDecimalValueBoundaryScanner scanner =
+new HiveDecimalValueBoundaryScanner(null, null, new OrderExpressionDef(argDef), false);
+HiveDecimalWritable w1 = new HiveDecimalWritable(1);
+HiveDecimalWritable w2 = new HiveDecimalWritable(2);
+
+Assert.assertTrue(scanner.isEqual(w1, w1));
+
+Assert.assertFalse(scanner.isEqual(w1, w2));
+Assert.assertFalse(scanner.isEqual(w2, w1));
+
+Assert.assertFalse(scanner.isEqual(null, w2));
+Assert.assertFalse(scanner.isEqual(w1, null));
+
+Assert.assertTrue(scanner.isEqual(null, null));
+  }
+
+  @Test
+  public void testDateEquals() {
+PTFExpressionDef argDef = new PTFExpressionDef();
+argDef.setOI(PrimitiveObjectInspectorFactory.writableDateObjectInspector);
+
+DateValueBoundaryScanner scanner =
+new DateValueBoundaryScanner(null, null, new OrderExpressionDef(argDef), false);
+Date date = new Date();
+date.setTimeInMillis(1000);
+DateWritableV2 w1 = new DateWritableV2(date);
+DateWritableV2 w2 = new DateWritableV2(date);
+DateWritableV2 w3 = new DateWritableV2(); // empty
+
+Assert.assertTrue(scanner.isEqual(w1, w2));
+Assert.assertTrue(scanner.isEqual(w2, w1));
+
+// empty == epoch
+Assert.assertTrue(scanner.isEqual(w3, new DateWritableV2(new Date(;
+// empty == another
+Assert.assertTrue(scanner.isEqual(w1, w3));

Review comment:
   Is this expected?
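
   For context, a sketch of why "empty == epoch" holds here (inferred from the assertions in the quoted test, not from any authoritative javadoc):
```java
import org.apache.hadoop.hive.common.type.Date;
import org.apache.hadoop.hive.serde2.io.DateWritableV2;

public class EmptyVsEpochSketch {
  public static void main(String[] args) {
    DateWritableV2 empty = new DateWritableV2();           // no value set, day offset stays 0
    DateWritableV2 epoch = new DateWritableV2(new Date()); // epoch day, judging by the test
    // DateValueBoundaryScanner compares the underlying day values, so both read as equal.
    System.out.println(empty.getDays() == epoch.getDays()); // prints true
  }
}
```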

##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/udf/ptf/TestValueBoundaryScanner.java
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information

[jira] [Work logged] (HIVE-24698) Sync ACL's for the table directory during external table replication.

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24698?focusedWorklogId=549542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549542
 ]

ASF GitHub Bot logged work on HIVE-24698:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 12:14
Start Date: 08/Feb/21 12:14
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #1925:
URL: https://github.com/apache/hive/pull/1925#issuecomment-775103259


   Thanx Aasha for the review.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549542)
Time Spent: 50m  (was: 40m)

> Sync ACL's for the table directory during external table replication.
> -
>
> Key: HIVE-24698
> URL: https://issues.apache.org/jira/browse/HIVE-24698
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Set similar ACL's to destination table directory in case the source has ACL's 
> enabled or set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24736) Make buffer tracking in LLAP cache with BP wrapper more accurate

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24736?focusedWorklogId=549521&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549521
 ]

ASF GitHub Bot logged work on HIVE-24736:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 11:35
Start Date: 08/Feb/21 11:35
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1951:
URL: https://github.com/apache/hive/pull/1951#discussion_r571977069



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
##
@@ -229,11 +230,20 @@ public void debugDumpShort(StringBuilder sb) {
 new LinkedBlockingQueue<Runnable>(),
 new ThreadFactoryBuilder().setNameFormat("IO-Elevator-Thread-%d").setDaemon(true).build());
 FixedSizedObjectPool<IoTrace> tracePool = IoTrace.createTracePool(conf);
+if (isEncodeEnabled) {

Review comment:
   Making the encode threads pooled has little advantage on its own, but it is 
required for the accurate buffer accounting part - moreover, the two changes 
are quite small, so I thought it made sense to combine them.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549521)
Time Spent: 1h 50m  (was: 1h 40m)

> Make buffer tracking in LLAP cache with BP wrapper more accurate
> 
>
> Key: HIVE-24736
> URL: https://issues.apache.org/jira/browse/HIVE-24736
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> HIVE-22492 has introduced threadlocal buffers in which LlapCachableBuffer 
> instances are stored before entering LRFU's heap - so that lock contention is 
> eased up.
> This is a nice performance improvement, but comes at the cost of losing the 
> exact accounting of llap buffer instances - e.g. if a user gives a purge 
> command, not all the cache space is freed up as one would expect, because purge 
> only considers buffers that the policy knows about. In this case we'd see in 
> LLAP's iomem servlet that the LRFU policy is empty, but a table may still 
> have the full content loaded.
> Also, if we use text based tables, during cache load, a set of -OrcEncode 
> threads are used that are ephemeral in nature. Buffers attached to these 
> threads' thread-local structures are ultimately lost. In an edge case we 
> could load lots of data into the cache by reading in many distinct smaller 
> text tables, whose buffers never reach the LRFU policy, and hence the cache 
> hit ratio will suffer as a consequence (the memory manager will give up asking 
> LRFU to evict, and will free up random buffers).
> I propose we try and track the amount of data stored in the BP wrapper 
> threadlocals, and flush them into the heap as a first step of a purge 
> request. This will enhance supportability.
> We should also replace the ephemeral OrcEncode threads with a thread pool, 
> which could actually serve as a small performance improvement on its own by 
> saving the time and memory spent on thread lifecycle management.
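
For illustration, a minimal sketch of the purge-time flush proposed above; the bpWrappers map, the per-wrapper lock and flush() come from the LRFU policy code quoted later in this thread, while the method name here is an assumption:
```java
void flushBpWrappersBeforePurge() {
  for (BPWrapper bpWrapper : bpWrappers.values()) {
    bpWrapper.lock.lock();
    try {
      bpWrapper.flush(); // push thread-local buffers into the LRFU heap so purge can see them
    } finally {
      bpWrapper.lock.unlock();
    }
  }
}
```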



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24736) Make buffer tracking in LLAP cache with BP wrapper more accurate

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24736?focusedWorklogId=549520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549520
 ]

ASF GitHub Bot logged work on HIVE-24736:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 11:32
Start Date: 08/Feb/21 11:32
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1951:
URL: https://github.com/apache/hive/pull/1951#discussion_r571975616



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
##
@@ -229,11 +230,20 @@ public void debugDumpShort(StringBuilder sb) {
 new LinkedBlockingQueue<Runnable>(),
 new ThreadFactoryBuilder().setNameFormat("IO-Elevator-Thread-%d").setDaemon(true).build());
 FixedSizedObjectPool<IoTrace> tracePool = IoTrace.createTracePool(conf);
+if (isEncodeEnabled) {
+  int encodeThreads = numThreads * 2;

Review comment:
   Yeah, quite arbitrary, I'll give you that.
   Text reading is actually started by the "regular" IO threads, and if 
encoding is needed for cache insertion, then one of these threads can produce 
more async encode tasks. This may end up being bursty, and according to what 
I've seen a bigger thread pool could come in handy.
   Anyway, I made this configurable now.
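
   For illustration, a minimal sketch of such a pooled encoder executor, modelled on the IO-Elevator pool construction quoted above; the pool name and the sizing are illustrative, not the final patch:
```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import com.google.common.util.concurrent.ThreadFactoryBuilder;

public class EncodePoolSketch {
  // Bounded, daemon-threaded pool for encode tasks, mirroring the IO-Elevator pool.
  public static ExecutorService newEncodeExecutor(int numThreads) {
    int encodeThreads = numThreads * 2; // illustrative sizing; made configurable in the PR
    return new ThreadPoolExecutor(encodeThreads, encodeThreads, 0L, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>(),
        new ThreadFactoryBuilder().setNameFormat("IO-Encode-Thread-%d").setDaemon(true).build());
  }
}
```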





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549520)
Time Spent: 1h 40m  (was: 1.5h)

> Make buffer tracking in LLAP cache with BP wrapper more accurate
> 
>
> Key: HIVE-24736
> URL: https://issues.apache.org/jira/browse/HIVE-24736
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> HIVE-22492 has introduced threadlocal buffers in which LlapCachableBuffer 
> instances are stored before entering LRFU's heap - so that lock contention is 
> eased up.
> This is a nice performance improvement, but comes at the cost of losing the 
> exact accounting of llap buffer instances - e.g. if a user gives a purge 
> command, not all the cache space is freed up as one would expect, because purge 
> only considers buffers that the policy knows about. In this case we'd see in 
> LLAP's iomem servlet that the LRFU policy is empty, but a table may still 
> have the full content loaded.
> Also, if we use text based tables, during cache load, a set of -OrcEncode 
> threads are used that are ephemeral in nature. Buffers attached to these 
> threads' thread-local structures are ultimately lost. In an edge case we 
> could load lots of data into the cache by reading in many distinct smaller 
> text tables, whose buffers never reach the LRFU policy, and hence the cache 
> hit ratio will suffer as a consequence (the memory manager will give up asking 
> LRFU to evict, and will free up random buffers).
> I propose we try and track the amount of data stored in the BP wrapper 
> threadlocals, and flush them into the heap as a first step of a purge 
> request. This will enhance supportability.
> We should also replace the ephemeral OrcEncode threads with a thread pool, 
> which could actually serve as a small performance improvement on its own by 
> saving the time and memory spent on thread lifecycle management.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24736) Make buffer tracking in LLAP cache with BP wrapper more accurate

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24736?focusedWorklogId=549518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549518
 ]

ASF GitHub Bot logged work on HIVE-24736:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 11:29
Start Date: 08/Feb/21 11:29
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1951:
URL: https://github.com/apache/hive/pull/1951#discussion_r571973578



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelLrfuCachePolicy.java
##
@@ -836,13 +912,33 @@ public String description() {
  * @return long array with LRFU stats
  */
 public long[] getUsageStats() {
-  long dataOnHeap = 0L;   // all non-meta related buffers on min-heap
-  long dataOnList = 0L;   // all non-meta related buffers on eviction list
-  long metaOnHeap = 0L;   // meta data buffers on min-heap
-  long metaOnList = 0L;   // meta data buffers on eviction list
-  long listSize   = 0L;   // number of entries on eviction list
-  long lockedData = 0L;   // number of bytes in locked data buffers
-  long lockedMeta = 0L;   // number of bytes in locked metadata buffers
+  long dataOnHeap     = 0L;   // all non-meta related buffers on min-heap
+  long dataOnList     = 0L;   // all non-meta related buffers on eviction list
+  long metaOnHeap     = 0L;   // meta data buffers on min-heap
+  long metaOnList     = 0L;   // meta data buffers on eviction list
+  long listSize       = 0L;   // number of entries on eviction list
+  long lockedData     = 0L;   // number of bytes in locked data buffers
+  long lockedMeta     = 0L;   // number of bytes in locked metadata buffers
+  long bpWrapCount    = 0L;   // number of buffers in BP wrapper threadlocals
+  long bpWrapDistinct = 0L;   // number of distinct buffers in BP wrapper threadlocals
+  long bpWrapData     = 0L;   // number of bytes stored in BP wrapper data buffers
+  long bpWrapMeta     = 0L;   // number of bytes stored in BP wrapper metadata buffers
+
+  // Using set to produce result of distinct buffers only
+  // (same buffer may be present in multiple thread local bp wrappers, or even
+  // inside heap/list, but ultimately it uses the same cache space)
+  Set<LlapCacheableBuffer> bpWrapperBuffers = new HashSet<>();
+  for (BPWrapper bpWrapper : bpWrappers.values()) {
+bpWrapper.lock.lock();

Review comment:
   It is only called when someone (e.g. a cluster admin) asks for memory 
stats through the web UI, hence it is okay to block all structures in these cases.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549518)
Time Spent: 1h 20m  (was: 1h 10m)

> Make buffer tracking in LLAP cache with BP wrapper more accurate
> 
>
> Key: HIVE-24736
> URL: https://issues.apache.org/jira/browse/HIVE-24736
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HIVE-22492 has introduced threadlocal buffers in which LlapCachableBuffer 
> instances are stored before entering LRFU's heap - so that lock contention is 
> eased up.
> This is a nice performance improvement, but comes at the cost of losing the 
> exact accounting of llap buffer instances - e.g. if a user gives a purge 
> command, not all the cache space is freed up as one would expect, because purge 
> only considers buffers that the policy knows about. In this case we'd see in 
> LLAP's iomem servlet that the LRFU policy is empty, but a table may still 
> have the full content loaded.
> Also, if we use text based tables, during cache load, a set of -OrcEncode 
> threads are used that are ephemeral in nature. Buffers attached to these 
> threads' thread-local structures are ultimately lost. In an edge case we 
> could load lots of data into the cache by reading in many distinct smaller 
> text tables, whose buffers never reach the LRFU policy, and hence the cache 
> hit ratio will suffer as a consequence (the memory manager will give up asking 
> LRFU to evict, and will free up random buffers).
> I propose we try and track the amount of data stored in the BP wrapper 
> threadlocals, and flush them into the heap as a first step of a purge 
> request. This will enhance supportability.
> We should also replace the ephemeral OrcEncode threads with a thread pool, 
> which could actually serve as a small performance improvement on its own by 
> saving the time and memory spent on thread lifecycle management.

[jira] [Work logged] (HIVE-24736) Make buffer tracking in LLAP cache with BP wrapper more accurate

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24736?focusedWorklogId=549519&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549519
 ]

ASF GitHub Bot logged work on HIVE-24736:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 11:29
Start Date: 08/Feb/21 11:29
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1951:
URL: https://github.com/apache/hive/pull/1951#discussion_r571973759



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelLrfuCachePolicy.java
##
@@ -909,19 +1017,24 @@ public synchronized void getMetrics(MetricsCollector 
collector, boolean all) {
   .tag(MsInfo.SessionId, session);
 
   // add the values to the new record
-  mrb.addCounter(PolicyInformation.DataOnHeap,   usageStats[DATAONHEAP])
-  .addCounter(PolicyInformation.DataOnList,  usageStats[DATAONLIST])
-  .addCounter(PolicyInformation.MetaOnHeap,  usageStats[METAONHEAP])
-  .addCounter(PolicyInformation.MetaOnList,  usageStats[METAONLIST])
-  .addCounter(PolicyInformation.DataLocked,  usageStats[LOCKEDDATA])
-  .addCounter(PolicyInformation.MetaLocked,  usageStats[LOCKEDMETA])
-  .addCounter(PolicyInformation.HeapSize,heapSize)
-  .addCounter(PolicyInformation.HeapSizeMax, maxHeapSize)
-  .addCounter(PolicyInformation.ListSize,usageStats[LISTSIZE])
-  .addCounter(PolicyInformation.TotalData,   usageStats[DATAONHEAP]
- + usageStats[DATAONLIST])
-  .addCounter(PolicyInformation.TotalMeta,   usageStats[METAONHEAP]
- + usageStats[METAONLIST]);
+  mrb.addCounter(PolicyInformation.DataOnHeap,       usageStats[DATAONHEAP])
+  .addCounter(PolicyInformation.DataOnList,          usageStats[DATAONLIST])
+  .addCounter(PolicyInformation.MetaOnHeap,          usageStats[METAONHEAP])
+  .addCounter(PolicyInformation.MetaOnList,          usageStats[METAONLIST])
+  .addCounter(PolicyInformation.DataLocked,          usageStats[LOCKEDDATA])
+  .addCounter(PolicyInformation.MetaLocked,          usageStats[LOCKEDMETA])
+  .addCounter(PolicyInformation.HeapSize,            heapSize)
+  .addCounter(PolicyInformation.HeapSizeMax,         maxHeapSize)
+  .addCounter(PolicyInformation.ListSize,            usageStats[LISTSIZE])
+  .addCounter(PolicyInformation.BPWrapperCount,      usageStats[BPWRAPCNT])
+  .addCounter(PolicyInformation.BPWrapperDistinct,   usageStats[BPWRAPDISTINCT])
+  .addCounter(PolicyInformation.BPWrapperData,       usageStats[BPWRAPDATA])
+  .addCounter(PolicyInformation.TotalData,           usageStats[DATAONHEAP]

Review comment:
   Fixed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549519)
Time Spent: 1.5h  (was: 1h 20m)

> Make buffer tracking in LLAP cache with BP wrapper more accurate
> 
>
> Key: HIVE-24736
> URL: https://issues.apache.org/jira/browse/HIVE-24736
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> HIVE-22492 has introduced threadlocal buffers in which LlapCachableBuffer 
> instances are stored before entering LRFU's heap - so that lock contention is 
> eased up.
> This is a nice performance improvement, but comes at the cost of losing the 
> exact accounting of llap buffer instances - e.g. if a user gives a purge 
> command, not all the cache space is freed up as one would expect, because purge 
> only considers buffers that the policy knows about. In this case we'd see in 
> LLAP's iomem servlet that the LRFU policy is empty, but a table may still 
> have the full content loaded.
> Also, if we use text based tables, during cache load, a set of -OrcEncode 
> threads are used that are ephemeral in nature. Buffers attached to these 
> threads' thread-local structures are ultimately lost. In an edge case we 
> could load lots of data into the cache by reading in many distinct smaller 
> text tables, whose buffers never reach the LRFU policy, and hence the cache 
> hit ratio will suffer as a consequence (the memory manager will give up asking 
> LRFU to evict, and will free up random buffers).
> I propose we try and track the amount of data stored in the BP wrapper 
> threadlocals, and flush them into the heap as a first step of a purge 
> request. This will enhance supportability.
> We should also replace the ephemeral OrcEncode threads with a thread pool, 
> which could actually serve as a small performance improvement on its own by 
> saving the time and memory spent on thread lifecycle management.

[jira] [Resolved] (HIVE-24698) Sync ACL's for the table directory during external table replication.

2021-02-08 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi resolved HIVE-24698.

Resolution: Fixed

> Sync ACL's for the table directory during external table replication.
> -
>
> Key: HIVE-24698
> URL: https://issues.apache.org/jira/browse/HIVE-24698
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Set similar ACL's to destination table directory in case the source has ACL's 
> enabled or set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24698) Sync ACL's for the table directory during external table replication.

2021-02-08 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280965#comment-17280965
 ] 

Aasha Medhi commented on HIVE-24698:


+1

> Sync ACL's for the table directory during external table replication.
> -
>
> Key: HIVE-24698
> URL: https://issues.apache.org/jira/browse/HIVE-24698
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Set similar ACL's to destination table directory in case the source has ACL's 
> enabled or set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24698) Sync ACL's for the table directory during external table replication.

2021-02-08 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280966#comment-17280966
 ] 

Aasha Medhi commented on HIVE-24698:


Committed to master. Thank you for the patch [~ayushtkn]

> Sync ACL's for the table directory during external table replication.
> -
>
> Key: HIVE-24698
> URL: https://issues.apache.org/jira/browse/HIVE-24698
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Set similar ACL's to destination table directory in case the source has ACL's 
> enabled or set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24698) Sync ACL's for the table directory during external table replication.

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24698?focusedWorklogId=549515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549515
 ]

ASF GitHub Bot logged work on HIVE-24698:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 11:27
Start Date: 08/Feb/21 11:27
Worklog Time Spent: 10m 
  Work Description: aasha merged pull request #1925:
URL: https://github.com/apache/hive/pull/1925


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549515)
Time Spent: 40m  (was: 0.5h)

> Sync ACL's for the table directory during external table replication.
> -
>
> Key: HIVE-24698
> URL: https://issues.apache.org/jira/browse/HIVE-24698
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Set similar ACL's to destination table directory in case the source has ACL's 
> enabled or set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24625) CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24625?focusedWorklogId=549513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549513
 ]

ASF GitHub Bot logged work on HIVE-24625:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 11:21
Start Date: 08/Feb/21 11:21
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on a change in pull request #1856:
URL: https://github.com/apache/hive/pull/1856#discussion_r571968371



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -13272,6 +13274,7 @@ private void updateDefaultTblProps(Map<String, String> source, Map<String, String>

> CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect 
> directory
> -
>
> Key: HIVE-24625
> URL: https://issues.apache.org/jira/browse/HIVE-24625
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> MetastoreDefaultTransformer in HMS converts a managed non-transactional table 
> to an external table. MoveTask still uses the managed path when loading the 
> data, resulting in an always-empty table.
> {code:java}
> create table tbl1 TBLPROPERTIES ('transactional'='false') as select * from 
> other;{code}
> After the conversion the table location points to an external directory:
> Location: | 
> hdfs://c670-node2.coelab.cloudera.com:8020/warehouse/tablespace/external/hive/tbl1
> Move task uses the managed location:
> {code:java}
> INFO : Moving data to directory 
> hdfs://...:8020/warehouse/tablespace/managed/hive/tbl1 from 
> hdfs://...:8020/warehouse/tablespace/managed/hive/.hive-staging_hive_2021-01-05_16-10-39_973_41005081081760609-4/-ext-1000
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24625) CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24625?focusedWorklogId=549507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549507
 ]

ASF GitHub Bot logged work on HIVE-24625:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 10:55
Start Date: 08/Feb/21 10:55
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on a change in pull request #1856:
URL: https://github.com/apache/hive/pull/1856#discussion_r571952378



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java
##
@@ -583,29 +584,34 @@ public Table transformCreateTable(Table table, List<String> processorCapabilities)
   throw new MetaException("Database " + dbName + " for table " + table.getTableName() + " could not be found");
 }
 
-  if (TableType.MANAGED_TABLE.name().equals(tableType)) {
+if (TableType.MANAGED_TABLE.name().equals(tableType)) {
   LOG.debug("Table is a MANAGED_TABLE");
   txnal = params.get(TABLE_IS_TRANSACTIONAL);
   txn_properties = params.get(TABLE_TRANSACTIONAL_PROPERTIES);
+  boolean ctas = Boolean.valueOf(params.getOrDefault(TABLE_IS_CTAS, "false"));
   isInsertAcid = (txn_properties != null && txn_properties.equalsIgnoreCase("insert_only"));
   if ((txnal == null || txnal.equalsIgnoreCase("FALSE")) && !isInsertAcid) { // non-ACID MANAGED TABLE
-LOG.info("Converting " + newTable.getTableName() + " to EXTERNAL 
tableType for " + processorId);
-newTable.setTableType(TableType.EXTERNAL_TABLE.toString());
-params.remove(TABLE_IS_TRANSACTIONAL);
-params.remove(TABLE_TRANSACTIONAL_PROPERTIES);
-params.put("EXTERNAL", "TRUE");
-params.put(EXTERNAL_TABLE_PURGE, "TRUE");
-params.put("TRANSLATED_TO_EXTERNAL", "TRUE");
-newTable.setParameters(params);
-LOG.info("Modified table params are:" + params.toString());
-
-if (!table.isSetSd() || table.getSd().getLocation() == null) {
-  try {
-  Path newPath = hmsHandler.getWh().getDefaultTablePath(db, table.getTableName(), true);
-newTable.getSd().setLocation(newPath.toString());
-LOG.info("Modified location from null to " + newPath);
-  } catch (Exception e) {
-LOG.warn("Exception determining external table location:" + 
e.getMessage());
+if (ctas) {

Review comment:
   We remove it from the outside, at HiveMetaStore>>create_table_core, since 
transformCreateTable is not always called (only when the transformer is not null).
   
   ```
   if (tbl.getParameters() != null) {
     tbl.getParameters().remove(TABLE_IS_CTAS);
   }
   ```
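
   For context, a hedged sketch of how that call site is shaped (paraphrased from the description above, not copied from HiveMetaStore; the transformer variable name is an assumption):
```java
// In create_table_core: the transformer may rewrite the table, but the internal
// CTAS marker is stripped unconditionally afterwards, since the transformer can be null.
if (transformer != null) {
  tbl = transformer.transformCreateTable(tbl, processorCapabilities, processorId);
}
if (tbl.getParameters() != null) {
  tbl.getParameters().remove(TABLE_IS_CTAS);
}
```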





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549507)
Time Spent: 40m  (was: 0.5h)

> CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect 
> directory
> -
>
> Key: HIVE-24625
> URL: https://issues.apache.org/jira/browse/HIVE-24625
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> MetastoreDefaultTransformer in HMS converts a managed non-transactional table 
> to an external table. MoveTask still uses the managed path when loading the 
> data, resulting in an always-empty table.
> {code:java}
> create table tbl1 TBLPROPERTIES ('transactional'='false') as select * from 
> other;{code}
> After the conversion the table location points to an external directory:
> Location: | 
> hdfs://c670-node2.coelab.cloudera.com:8020/warehouse/tablespace/external/hive/tbl1
> Move task uses the managed location:
> {code:java}
> INFO : Moving data to directory 
> hdfs://...:8020/warehouse/tablespace/managed/hive/tbl1 from 
> hdfs://...:8020/warehouse/tablespace/managed/hive/.hive-staging_hive_2021-01-05_16-10-39_973_41005081081760609-4/-ext-1000
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24725) Collect top priority items from llap cache policy

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24725?focusedWorklogId=549490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549490
 ]

ASF GitHub Bot logged work on HIVE-24725:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 10:13
Start Date: 08/Feb/21 10:13
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1947:
URL: https://github.com/apache/hive/pull/1947#discussion_r571923784



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4564,6 +4564,9 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "The meaning of this parameter is the inverse of the number of time 
ticks (cache\n" +
 " operations, currently) that cause the combined recency-frequency of 
a block in cache\n" +
 " to be halved."),
+LLAP_LRFU_CUTOFF_PERCENTAGE("hive.llap.io.lrfu.cutoff.percentage", 0.10f,

Review comment:
   I think the naming here could make a connection with the API: I'm missing 
"hot buffers".
   Something like hive.llap.io.lrfu.hotbuffers.cutoff.percentage or 
hive.llap.io.lrfu.hotbuffers.percentage, just so that we use similar terminology 
in the conf name and the code.
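
   Purely as an illustration of the suggestion, in the declaration style of the surrounding HiveConf entries (the name and description here are hypothetical until the PR settles them):
```java
LLAP_LRFU_HOTBUFFERS_PERCENTAGE("hive.llap.io.lrfu.hotbuffers.percentage", 0.10f,
    "Percentage of the hottest buffers the LRFU policy reports as top priority."),
```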





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549490)
Time Spent: 0.5h  (was: 20m)

> Collect top priority items from llap cache policy
> -
>
> Key: HIVE-24725
> URL: https://issues.apache.org/jira/browse/HIVE-24725
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24698) Sync ACL's for the table directory during external table replication.

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24698?focusedWorklogId=549488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549488
 ]

ASF GitHub Bot logged work on HIVE-24698:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 10:12
Start Date: 08/Feb/21 10:12
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #1925:
URL: https://github.com/apache/hive/pull/1925#discussion_r571923493



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -70,9 +76,40 @@ private boolean createAndSetPathOwner(Path destPath, Path 
sourcePath) throws IOE
 destPath, sourcePath, status.getOwner(), status.getGroup(), 
status.getPermission());
 destPath.getFileSystem(conf).setOwner(destPath, status.getOwner(), 
status.getGroup());
 destPath.getFileSystem(conf).setPermission(destPath, 
status.getPermission());
+setAclsToTarget(status, sourcePath, destPath);
 return createdDir;
   }
 
+  private void setAclsToTarget(FileStatus sourceStatus, Path sourcePath,
+  Path destPath) throws IOException {
+// Check if distCp options contains preserve ACL.
+if (isPreserveAcl()) {
+  AclStatus sourceAcls =
+  sourcePath.getFileSystem(conf).getAclStatus(sourcePath);
+  if (sourceAcls != null && sourceAcls.getEntries().size() > 0) {
+destPath.getFileSystem(conf).removeAcl(destPath);

Review comment:
   I followed the same semantics as DistCp, which also does removeAcl 
first, added as part of HADOOP-16032.
   The reason being: if the directory at the target has an ACCESS-scope ACL and 
the directory at the source has only a Default ACL, setAcl will not remove the 
ACCESS-scope ACL, hence the ACLs on source & dest won't be the same. So, to 
counter that, removeAcl is called first.
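
   The remove-then-set pattern, as a self-contained sketch built from the FileSystem calls visible in the diff above (the helper method and its parameters are assumptions):
```java
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclStatus;

public class AclSyncSketch {
  // Clear the destination's existing entries first, then mirror the source's,
  // per the HADOOP-16032 semantics described above.
  static void syncAcls(FileSystem srcFs, Path src, FileSystem dstFs, Path dst) throws IOException {
    AclStatus sourceAcls = srcFs.getAclStatus(src);
    if (sourceAcls != null && !sourceAcls.getEntries().isEmpty()) {
      dstFs.removeAcl(dst);
      dstFs.setAcl(dst, sourceAcls.getEntries());
    }
  }
}
```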





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549488)
Time Spent: 0.5h  (was: 20m)

> Sync ACL's for the table directory during external table replication.
> -
>
> Key: HIVE-24698
> URL: https://issues.apache.org/jira/browse/HIVE-24698
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Set similar ACL's to destination table directory in case the source has ACL's 
> enabled or set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24736) Make buffer tracking in LLAP cache with BP wrapper more accurate

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24736?focusedWorklogId=549487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549487
 ]

ASF GitHub Bot logged work on HIVE-24736:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 10:11
Start Date: 08/Feb/21 10:11
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1951:
URL: https://github.com/apache/hive/pull/1951#discussion_r571923233



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelLrfuCachePolicy.java
##
@@ -157,35 +206,29 @@ public void notifyUnlock(LlapCacheableBuffer buffer) {
 if (proactiveEvictionEnabled && !instantProactiveEviction) {
   buffer.removeProactiveEvictionMark();
 }
-int count = threadLocalCount.get();
-final LlapCacheableBuffer[] cacheableBuffers = threadLocalBuffers.get() ;
-if (count < maxQueueSize) {
-  cacheableBuffers[count] = buffer;
-  threadLocalCount.set(++count);
-}
-if (count <= maxQueueSize / 2) {
-  // case too early to flush
-  return;
-}
+BPWrapper bpWrapper = threadLocalBPWrapper.get();
 
-if (count == maxQueueSize) {
-  // case we have to flush thus block on heap lock
-  heapLock.lock();
-  try {
-doNotifyUnderHeapLock(count, cacheableBuffers);
-  } finally {
-threadLocalCount.set(0);
-heapLock.unlock();
+// This will only block in a very very rare scenario only.
+bpWrapper.lock.lock();
+try {
+  final LlapCacheableBuffer[] cacheableBuffers = bpWrapper.buffers;
+  if (bpWrapper.count < maxQueueSize) {
+cacheableBuffers[bpWrapper.count] = buffer;
+++bpWrapper.count;
   }
-  return;
-}
-if (heapLock.tryLock()) {
-  try {
-doNotifyUnderHeapLock(count, cacheableBuffers);
-  } finally {
-threadLocalCount.set(0);
-heapLock.unlock();
+  if (bpWrapper.count <= maxQueueSize / 2) {
+// case too early to flush
+return;
+  }
+
+  if (bpWrapper.count == maxQueueSize) {
+// case we have to flush thus block on heap lock
+bpWrapper.flush();
+return;
   }
+  bpWrapper.tryFlush(); // case 50% < queue usage < 100%, flush is preferred but not required yet

Review comment:
   That is a broken link now 😄
   But fair enough





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549487)
Time Spent: 1h 10m  (was: 1h)

> Make buffer tracking in LLAP cache with BP wrapper more accurate
> 
>
> Key: HIVE-24736
> URL: https://issues.apache.org/jira/browse/HIVE-24736
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HIVE-22492 has introduced threadlocal buffers in which LlapCachableBuffer 
> instances are stored before entering LRFU's heap - so that lock contention is 
> eased up.
> This is a nice performance improvement, but comes at the cost of losing the 
> exact accounting of llap buffer instances - e.g. if a user gives a purge 
> command, not all the cache space is freed up as one would expect, because purge 
> only considers buffers that the policy knows about. In this case we'd see in 
> LLAP's iomem servlet that the LRFU policy is empty, but a table may still 
> have the full content loaded.
> Also, if we use text based tables, during cache load, a set of -OrcEncode 
> threads are used that are ephemeral in nature. Buffers attached to these 
> threads' thread-local structures are ultimately lost. In an edge case we 
> could load lots of data into the cache by reading in many distinct smaller 
> text tables, whose buffers never reach the LRFU policy, and hence the cache 
> hit ratio will suffer as a consequence (the memory manager will give up asking 
> LRFU to evict, and will free up random buffers).
> I propose we try and track the amount of data stored in the BP wrapper 
> threadlocals, and flush them into the heap as a first step of a purge 
> request. This will enhance supportability.
> We should also replace the ephemeral OrcEncode threads with a thread pool, 
> which could actually serve as a small performance improvement on its own by 
> saving the time and memory spent on thread lifecycle management.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (HIVE-24736) Make buffer tracking in LLAP cache with BP wrapper more accurate

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24736?focusedWorklogId=549473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549473
 ]

ASF GitHub Bot logged work on HIVE-24736:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 10:02
Start Date: 08/Feb/21 10:02
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1951:
URL: https://github.com/apache/hive/pull/1951#discussion_r571915485



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelLrfuCachePolicy.java
##
@@ -836,13 +912,33 @@ public String description() {
  * @return long array with LRFU stats
  */
 public long[] getUsageStats() {
-  long dataOnHeap = 0L;   // all non-meta related buffers on min-heap
-  long dataOnList = 0L;   // all non-meta related buffers on eviction list
-  long metaOnHeap = 0L;   // meta data buffers on min-heap
-  long metaOnList = 0L;   // meta data buffers on eviction list
-  long listSize   = 0L;   // number of entries on eviction list
-  long lockedData = 0L;   // number of bytes in locked data buffers
-  long lockedMeta = 0L;   // number of bytes in locked metadata buffers
+  long dataOnHeap     = 0L;   // all non-meta related buffers on min-heap
+  long dataOnList     = 0L;   // all non-meta related buffers on eviction list
+  long metaOnHeap     = 0L;   // meta data buffers on min-heap
+  long metaOnList     = 0L;   // meta data buffers on eviction list
+  long listSize       = 0L;   // number of entries on eviction list
+  long lockedData     = 0L;   // number of bytes in locked data buffers
+  long lockedMeta     = 0L;   // number of bytes in locked metadata buffers
+  long bpWrapCount    = 0L;   // number of buffers in BP wrapper threadlocals
+  long bpWrapDistinct = 0L;   // number of distinct buffers in BP wrapper threadlocals
+  long bpWrapData     = 0L;   // number of bytes stored in BP wrapper data buffers
+  long bpWrapMeta     = 0L;   // number of bytes stored in BP wrapper metadata buffers
+
+  // Using set to produce result of distinct buffers only
+  // (same buffer may be present in multiple thread local bp wrappers, or even
+  // inside heap/list, but ultimately it uses the same cache space)
+  Set<LlapCacheableBuffer> bpWrapperBuffers = new HashSet<>();
+  for (BPWrapper bpWrapper : bpWrappers.values()) {
+bpWrapper.lock.lock();

Review comment:
   How often is `getUsageStats()` called? Is it ok to lock all of 
the buffers here?

##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelLrfuCachePolicy.java
##
@@ -909,19 +1017,24 @@ public synchronized void getMetrics(MetricsCollector 
collector, boolean all) {
   .tag(MsInfo.SessionId, session);
 
   // add the values to the new record
-  mrb.addCounter(PolicyInformation.DataOnHeap,   usageStats[DATAONHEAP])
-  .addCounter(PolicyInformation.DataOnList,  usageStats[DATAONLIST])
-  .addCounter(PolicyInformation.MetaOnHeap,  usageStats[METAONHEAP])
-  .addCounter(PolicyInformation.MetaOnList,  usageStats[METAONLIST])
-  .addCounter(PolicyInformation.DataLocked,  usageStats[LOCKEDDATA])
-  .addCounter(PolicyInformation.MetaLocked,  usageStats[LOCKEDMETA])
-  .addCounter(PolicyInformation.HeapSize,heapSize)
-  .addCounter(PolicyInformation.HeapSizeMax, maxHeapSize)
-  .addCounter(PolicyInformation.ListSize,usageStats[LISTSIZE])
-  .addCounter(PolicyInformation.TotalData,   usageStats[DATAONHEAP]
- + usageStats[DATAONLIST])
-  .addCounter(PolicyInformation.TotalMeta,   usageStats[METAONHEAP]
- + usageStats[METAONLIST]);
+  mrb.addCounter(PolicyInformation.DataOnHeap,       usageStats[DATAONHEAP])
+  .addCounter(PolicyInformation.DataOnList,          usageStats[DATAONLIST])
+  .addCounter(PolicyInformation.MetaOnHeap,          usageStats[METAONHEAP])
+  .addCounter(PolicyInformation.MetaOnList,          usageStats[METAONLIST])
+  .addCounter(PolicyInformation.DataLocked,          usageStats[LOCKEDDATA])
+  .addCounter(PolicyInformation.MetaLocked,          usageStats[LOCKEDMETA])
+  .addCounter(PolicyInformation.HeapSize,            heapSize)
+  .addCounter(PolicyInformation.HeapSizeMax,         maxHeapSize)
+  .addCounter(PolicyInformation.ListSize,            usageStats[LISTSIZE])
+  .addCounter(PolicyInformation.BPWrapperCount,      usageStats[BPWRAPCNT])
+  .addCounter(PolicyInformation.BPWrapperDistinct,   usageStats[BPWRAPDISTINCT])
+  .addCounter(PolicyInformation.BPWrapperData,       usageStats[BPWRAPDATA])

[jira] [Work logged] (HIVE-24746) PTF: TimestampValueBoundaryScanner can be optimised during range computation

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24746?focusedWorklogId=549485&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549485
 ]

ASF GitHub Bot logged work on HIVE-24746:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 10:09
Start Date: 08/Feb/21 10:09
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1950:
URL: https://github.com/apache/hive/pull/1950#issuecomment-775030048


   @rbalamohan 
   ```
   Benchmark                                                                               Mode  Cnt  Score   Error  Units
   ValueBoundaryScannerBench.BaseBench.testTimestampEqualsWithInspector                    avgt    5  3.439 ± 0.638  ms/op
   ValueBoundaryScannerBench.BaseBench.testTimestampEqualsWithoutInspector                 avgt    5  0.015 ± 0.001  ms/op
   ValueBoundaryScannerBench.BaseBench.testTimestampEqualsWithoutInspectorWithTypeCheck    avgt    5  0.015 ± 0.004  ms/op
   ```
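
   For readers unfamiliar with the harness shape behind such output, a minimal JMH sketch in the same spirit (class, method and setup values follow the reported benchmark IDs but are assumptions, not the actual ValueBoundaryScannerBench source; it also assumes TimestampWritableV2 exposes getSeconds()/getNanos() like its V1 counterpart):
```java
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.hive.common.type.Timestamp;
import org.apache.hadoop.hive.serde2.io.TimestampWritableV2;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class BaseBench {
  TimestampWritableV2 a, b;

  @Setup
  public void setup() {
    a = new TimestampWritableV2(Timestamp.ofEpochMilli(1000));
    b = new TimestampWritableV2(Timestamp.ofEpochMilli(1000));
  }

  @Benchmark
  public boolean testTimestampEqualsWithoutInspector() {
    // the cheap path: raw seconds/nanos, no Timestamp materialisation via the OI
    return a.getSeconds() == b.getSeconds() && a.getNanos() == b.getNanos();
  }
}
```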



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549485)
Time Spent: 20m  (was: 10m)

> PTF: TimestampValueBoundaryScanner can be optimised during range computation
> 
>
> Key: HIVE-24746
> URL: https://issues.apache.org/jira/browse/HIVE-24746
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During range computation, timestamp ranges become a hotspot due to 
> "Timestamp" comparisons: the scanner has to construct the entire Timestamp 
> object via the OI (which incurs LocalTime computation etc. internally).
>  
> All this is done for an "equals" comparison, which can be done with the 
> "seconds & nanoseconds" present in the Timestamp.
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java#L852]
>  
>  
> The request is to explore optimising this code path so that equals() can be 
> performed with "seconds/nanoseconds" instead of the entire timestamp; a sketch 
> follows after the stack trace below.
>  
> {noformat}
> at 
> org.apache.hadoop.hive.common.type.Timestamp.setTimeInSeconds(Timestamp.java:133)
>   at 
> org.apache.hadoop.hive.serde2.io.TimestampWritableV2.populateTimestamp(TimestampWritableV2.java:401)
>   at 
> org.apache.hadoop.hive.serde2.io.TimestampWritableV2.getTimestamp(TimestampWritableV2.java:210)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1239)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1181)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.TimestampValueBoundaryScanner.isEqual(ValueBoundaryScanner.java:848)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeEndCurrentRow(ValueBoundaryScanner.java:593)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeEnd(ValueBoundaryScanner.java:530)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.getRange(BasePartitionEvaluator.java:273)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.iterate(BasePartitionEvaluator.java:219)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.evaluateWindowFunction(WindowingTableFunction.java:147)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.access$100(WindowingTableFunction.java:61)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction$WindowingIterator.next(WindowingTableFunction.java:755)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:373)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:104)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:732)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>  {noformat}
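
As referenced above, a hedged sketch of the optimised comparison: equality on raw seconds and nanos, with the null semantics the quoted tests expect (two nulls equal, a single null unequal). The helper name is illustrative, and it assumes the getSeconds()/getNanos() accessors of TimestampWritableV2:

{code:java}
static boolean timestampsEqual(TimestampWritableV2 a, TimestampWritableV2 b) {
  if (a == null || b == null) {
    return a == b; // both null counts as equal, a single null does not
  }
  // no Timestamp object is materialised via the ObjectInspector here
  return a.getSeconds() == b.getSeconds() && a.getNanos() == b.getNanos();
}
{code}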



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24736) Make buffer tracking in LLAP cache with BP wrapper more accurate

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24736?focusedWorklogId=549479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549479
 ]

ASF GitHub Bot logged work on HIVE-24736:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 10:03
Start Date: 08/Feb/21 10:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1951:
URL: https://github.com/apache/hive/pull/1951#discussion_r571917546



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
##
@@ -229,11 +230,20 @@ public void debugDumpShort(StringBuilder sb) {
  new LinkedBlockingQueue<Runnable>(),
  new ThreadFactoryBuilder().setNameFormat("IO-Elevator-Thread-%d").setDaemon(true).build());
  FixedSizedObjectPool<IoTrace> tracePool = IoTrace.createTracePool(conf);
+if (isEncodeEnabled) {

Review comment:
   Why did we combine the 2 changes to a single jira?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549479)
Time Spent: 50m  (was: 40m)

> Make buffer tracking in LLAP cache with BP wrapper more accurate
> 
>
> Key: HIVE-24736
> URL: https://issues.apache.org/jira/browse/HIVE-24736
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HIVE-22492 has introduced threadlocal buffers in which LlapCachableBuffer 
> instances are stored before entering LRFU's heap - so that lock contention is 
> eased up.
> This is a nice performance improvement, but comes at the cost of losing the 
> exact accounting of llap buffer instances - e.g. if a user gives a purge 
> command, not all the cache space is freed up as one would expect, because purge 
> only considers buffers that the policy knows about. In this case we'd see in 
> LLAP's iomem servlet that the LRFU policy is empty, but a table may still 
> have the full content loaded.
> Also, if we use text based tables, during cache load, a set of -OrcEncode 
> threads are used that are ephemeral in nature. Buffers attached to these 
> threads' thread-local structures are ultimately lost. In an edge case we 
> could load lots of data into the cache by reading in many distinct smaller 
> text tables, whose buffers never reach the LRFU policy, and hence the cache 
> hit ratio will suffer as a consequence (the memory manager will give up asking 
> LRFU to evict, and will free up random buffers).
> I propose we try and track the amount of data stored in the BP wrapper 
> threadlocals, and flush them into the heap as a first step of a purge 
> request. This will enhance supportability.
> We should also replace the ephemeral OrcEncode threads with a thread pool, 
> which could actually serve as a small performance improvement on its own by 
> saving the time and memory spent on thread lifecycle management.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24736) Make buffer tracking in LLAP cache with BP wrapper more accurate

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24736?focusedWorklogId=549478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549478
 ]

ASF GitHub Bot logged work on HIVE-24736:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 10:03
Start Date: 08/Feb/21 10:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1951:
URL: https://github.com/apache/hive/pull/1951#discussion_r571917349



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
##
@@ -229,11 +230,20 @@ public void debugDumpShort(StringBuilder sb) {
  new LinkedBlockingQueue<Runnable>(),
  new ThreadFactoryBuilder().setNameFormat("IO-Elevator-Thread-%d").setDaemon(true).build());
  FixedSizedObjectPool<IoTrace> tracePool = IoTrace.createTracePool(conf);
+if (isEncodeEnabled) {
+  int encodeThreads = numThreads * 2;

Review comment:
   Why exactly 2 times?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549478)
Time Spent: 40m  (was: 0.5h)

> Make buffer tracking in LLAP cache with BP wrapper more accurate
> 
>
> Key: HIVE-24736
> URL: https://issues.apache.org/jira/browse/HIVE-24736
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVE-22492 has introduced threadlocal buffers in which LlapCachableBuffer 
> instances are stored before entering LRFU's heap - so that lock contention is 
> eased up.
> This is a nice performance improvement, but comes at the cost of losing the 
> exact accounting of llap buffer instances - e.g. if a user gives a purge 
> command, not all the cache space is freed up as one would expect, because purge 
> only considers buffers that the policy knows about. In this case we'd see in 
> LLAP's iomem servlet that the LRFU policy is empty, but a table may still 
> have the full content loaded.
> Also, if we use text based tables, during cache load, a set of -OrcEncode 
> threads are used that are ephemeral in nature. Buffers attached to these 
> threads' thread-local structures are ultimately lost. In an edge case we 
> could load lots of data into the cache by reading in many distinct smaller 
> text tables, whose buffers never reach the LRFU policy, and hence the cache 
> hit ratio will suffer as a consequence (the memory manager will give up asking 
> LRFU to evict, and will free up random buffers).
> I propose we try and track the amount of data stored in the BP wrapper 
> threadlocals, and flush them into the heap as a first step of a purge 
> request. This will enhance supportability.
> We should also replace the ephemeral OrcEncode threads with a thread pool, 
> which could actually serve as a small performance improvement on its own by 
> saving the time and memory spent on thread lifecycle management.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24736) Make buffer tracking in LLAP cache with BP wrapper more accurate

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24736?focusedWorklogId=549482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549482
 ]

ASF GitHub Bot logged work on HIVE-24736:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 10:05
Start Date: 08/Feb/21 10:05
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #1951:
URL: https://github.com/apache/hive/pull/1951#issuecomment-775027304


   LGTM. Some questions only on sizing, and locking



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549482)
Time Spent: 1h  (was: 50m)

> Make buffer tracking in LLAP cache with BP wrapper more accurate
> 
>
> Key: HIVE-24736
> URL: https://issues.apache.org/jira/browse/HIVE-24736
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> HIVE-22492 introduced threadlocal buffers in which LlapCacheableBuffer 
> instances are stored before entering LRFU's heap, so that lock contention 
> is eased.
> This is a nice performance improvement, but it comes at the cost of losing 
> exact accounting of LLAP buffer instances. For example, if a user issues a 
> purge command, not all of the cache space is freed up as one would expect, 
> because purge only considers buffers that the policy knows about. In this 
> case LLAP's iomem servlet shows the LRFU policy as empty, while a table may 
> still have its full content loaded.
> Also, for text-based tables, cache loading uses a set of -OrcEncode threads 
> that are ephemeral in nature; buffers attached to these threads' 
> thread-local structures are ultimately lost. In an edge case we could load 
> lots of data into the cache by reading many distinct smaller text tables 
> whose buffers never reach the LRFU policy, so the cache hit ratio suffers 
> as a consequence (the memory manager gives up asking LRFU to evict and 
> frees up random buffers instead).
> I propose we track the amount of data stored in the BP wrapper threadlocals 
> and flush it into the heap as the first step of a purge request. This will 
> improve supportability.
> We should also replace the ephemeral OrcEncode threads with a thread pool, 
> which could serve as a small performance improvement on its own by saving 
> the time and memory spent on thread lifecycle management.
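A sketch of the thread pool idea, with made-up pool size and thread names 
(not what Hive actually ships): a fixed pool keeps worker threads, and hence 
their thread-local BP wrappers, alive across encode tasks, so batched 
buffers are no longer orphaned and thread creation cost is paid only once.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class OrcEncodePoolSketch {
  private static final AtomicInteger COUNTER = new AtomicInteger();

  // Long-lived daemon threads replace the ephemeral per-task threads.
  public static final ExecutorService ENCODE_POOL = Executors.newFixedThreadPool(4, r -> {
    Thread t = new Thread(r, "OrcEncode-" + COUNTER.incrementAndGet());
    t.setDaemon(true);
    return t;
  });

  public static void main(String[] args) {
    ENCODE_POOL.submit(() ->
        System.out.println("encode task ran on " + Thread.currentThread().getName()));
    ENCODE_POOL.shutdown();
  }
}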



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24736) Make buffer tracking in LLAP cache with BP wrapper more accurate

2021-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24736?focusedWorklogId=549470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549470
 ]

ASF GitHub Bot logged work on HIVE-24736:
-

Author: ASF GitHub Bot
Created on: 08/Feb/21 09:55
Start Date: 08/Feb/21 09:55
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1951:
URL: https://github.com/apache/hive/pull/1951#discussion_r571912244



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelLrfuCachePolicy.java
##
@@ -157,35 +206,29 @@ public void notifyUnlock(LlapCacheableBuffer buffer) {
 if (proactiveEvictionEnabled && !instantProactiveEviction) {
   buffer.removeProactiveEvictionMark();
 }
-int count = threadLocalCount.get();
-final LlapCacheableBuffer[] cacheableBuffers = threadLocalBuffers.get() ;
-if (count < maxQueueSize) {
-  cacheableBuffers[count] = buffer;
-  threadLocalCount.set(++count);
-}
-if (count <= maxQueueSize / 2) {
-  // case too early to flush
-  return;
-}
+BPWrapper bpWrapper = threadLocalBPWrapper.get();
 
-if (count == maxQueueSize) {
-  // case we have to flush thus block on heap lock
-  heapLock.lock();
-  try {
-doNotifyUnderHeapLock(count, cacheableBuffers);
-  } finally {
-threadLocalCount.set(0);
-heapLock.unlock();
+// This will only block in a very rare scenario.
+bpWrapper.lock.lock();
+try {
+  final LlapCacheableBuffer[] cacheableBuffers = bpWrapper.buffers;
+  if (bpWrapper.count < maxQueueSize) {
+cacheableBuffers[bpWrapper.count] = buffer;
+++bpWrapper.count;
   }
-  return;
-}
-if (heapLock.tryLock()) {
-  try {
-doNotifyUnderHeapLock(count, cacheableBuffers);
-  } finally {
-threadLocalCount.set(0);
-heapLock.unlock();
+  if (bpWrapper.count <= maxQueueSize / 2) {
+// case too early to flush
+return;
+  }
+
+  if (bpWrapper.count == maxQueueSize) {
+// case we have to flush thus block on heap lock
+bpWrapper.flush();
+return;
   }
+  bpWrapper.tryFlush(); // case 50% < queue usage < 100%: flush is preferred but not required yet

Review comment:
   QQ: How did you come up with these numbers? Are there any tests showing 
that these are good values?
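For context, a self-contained restatement of the threshold logic in the 
quoted hunk (the capacity is illustrative, and whether these thresholds are 
optimal is exactly the question raised above): below half capacity keep 
batching, at full capacity block on the heap lock, and in between flush 
only if the heap lock happens to be free.

import java.util.concurrent.locks.ReentrantLock;

class FlushPolicySketch {
  private final ReentrantLock heapLock = new ReentrantLock();
  private final int maxQueueSize = 8; // illustrative

  void maybeFlush(int count, Runnable doFlush) {
    if (count <= maxQueueSize / 2) {
      return; // too early: amortize heap-lock traffic over a larger batch
    }
    if (count == maxQueueSize) {
      heapLock.lock(); // queue is full, flushing is mandatory, so block
      try {
        doFlush.run();
      } finally {
        heapLock.unlock();
      }
      return;
    }
    if (heapLock.tryLock()) { // 50% < usage < 100%: flush opportunistically
      try {
        doFlush.run();
      } finally {
        heapLock.unlock();
      }
    }
  }
}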





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549470)
Time Spent: 20m  (was: 10m)

> Make buffer tracking in LLAP cache with BP wrapper more accurate
> 
>
> Key: HIVE-24736
> URL: https://issues.apache.org/jira/browse/HIVE-24736
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-22492 introduced threadlocal buffers in which LlapCacheableBuffer 
> instances are stored before entering LRFU's heap, so that lock contention 
> is eased.
> This is a nice performance improvement, but it comes at the cost of losing 
> exact accounting of LLAP buffer instances. For example, if a user issues a 
> purge command, not all of the cache space is freed up as one would expect, 
> because purge only considers buffers that the policy knows about. In this 
> case LLAP's iomem servlet shows the LRFU policy as empty, while a table may 
> still have its full content loaded.
> Also, for text-based tables, cache loading uses a set of -OrcEncode threads 
> that are ephemeral in nature; buffers attached to these threads' 
> thread-local structures are ultimately lost. In an edge case we could load 
> lots of data into the cache by reading many distinct smaller text tables 
> whose buffers never reach the LRFU policy, so the cache hit ratio suffers 
> as a consequence (the memory manager gives up asking LRFU to evict and 
> frees up random buffers instead).
> I propose we track the amount of data stored in the BP wrapper threadlocals 
> and flush it into the heap as the first step of a purge request. This will 
> improve supportability.
> We should also replace the ephemeral OrcEncode threads with a thread pool, 
> which could serve as a small performance improvement on its own by saving 
> the time and memory spent on thread lifecycle management.

[jira] [Commented] (HIVE-24711) hive metastore memory leak

2021-02-08 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280830#comment-17280830
 ] 

Karen Coppage commented on HIVE-24711:
--

I can't diagnose anything based on just these log snippets, but please check 
whether your version contains the fix from HIVE-22700; if not, it might help you.

> hive metastore memory leak
> --
>
> Key: HIVE-24711
> URL: https://issues.apache.org/jira/browse/HIVE-24711
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 3.1.0
>Reporter: LinZhongwei
>Priority: Major
>
> hdp version: 3.1.5.31-1
> hive version: 3.1.0.3.1.5.31-1
> hadoop version: 3.1.1.3.1.5.31-1
> We find that the hive metastore has a memory leak if we set 
> hive.compactor.initiator.on to true.
> If we disable the configuration, the memory leak disappears.
> How can we resolve this problem?
> Even if we set the heap size of the hive metastore to 40 GB, after 1 month 
> the hive metastore service goes down with an OutOfMemoryError.
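The report implies a possible stop-gap rather than a fix: since the leak was 
only observed with the compaction initiator enabled, it can be disabled on 
the affected metastore via a standard hive-site.xml property (shown here as 
an assumption drawn from the report, not a confirmed remedy; compactions 
must then be triggered some other way):

<property>
  <!-- Stop-gap from the report above: the leak was not seen with the initiator off. -->
  <name>hive.compactor.initiator.on</name>
  <value>false</value>
</property>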



--
This message was sent by Atlassian Jira
(v8.3.4#803005)