[jira] [Work logged] (HIVE-24511) Fix typo in SerDeStorageSchemaReader

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24511?focusedWorklogId=531742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531742
 ]

ASF GitHub Bot logged work on HIVE-24511:
-

Author: ASF GitHub Bot
Created on: 06/Jan/21 07:52
Start Date: 06/Jan/21 07:52
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1757:
URL: https://github.com/apache/hive/pull/1757#discussion_r552419455



##
File path: 
metastore/src/java/org/apache/hadoop/hive/metastore/SerDeStorageSchemaReader.java
##
@@ -47,10 +48,10 @@
   Deserializer s = HiveMetaStoreUtils.getDeserializer(conf, tbl, false);
   return HiveMetaStoreUtils.getFieldsFromDeserializer(tbl.getTableName(), 
s);
 } catch (Exception e) {
-  StringUtils.stringifyException(e);
-  throw new MetaException(e.getMessage());
+  throw new MetaException(StringUtils.stringifyException(e));

Review comment:
   Hi @belugabehr, could you please take another look at the changes? Thank 
you
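For reference, the behavioural difference this patch makes can be sketched with a minimal stand-in for Hadoop's `StringUtils.stringifyException` (the class below is illustrative, not Hive code): before the change, the stringified trace was computed and discarded, so only `e.getMessage()` reached the `MetaException`; after it, the full trace travels in the exception message.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Illustrative stand-in for org.apache.hadoop.util.StringUtils.stringifyException:
// render a Throwable, including its stack trace, as a single String so it can
// travel inside an exception message.
public class ExceptionStringifier {

    public static String stringify(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    public static void main(String[] args) {
        try {
            throw new IllegalStateException("deserializer failed");
        } catch (Exception e) {
            // Pre-patch shape: stringify(e) was computed and thrown away,
            // and only the bare message survived.
            String messageOnly = e.getMessage(); // "deserializer failed"

            // Post-patch shape: the stack frames are kept in the message.
            String detailed = stringify(e);
            System.out.println(detailed.contains("at ")
                && !messageOnly.contains("at ")); // prints "true"
        }
    }
}
```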





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531742)
Time Spent: 40m  (was: 0.5h)

> Fix typo in SerDeStorageSchemaReader
> 
>
> Key: HIVE-24511
> URL: https://issues.apache.org/jira/browse/HIVE-24511
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> 1. Close the created classloader to release resources.
> 2. More detailed error messages on MetaException when throwing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24558) Handle update in table level regular expression.

2021-01-05 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24558:
---
Attachment: HIVE-24558.05.patch
Status: Patch Available  (was: In Progress)

> Handle update in table level regular expression.
> 
>
> Key: HIVE-24558
> URL: https://issues.apache.org/jira/browse/HIVE-24558
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24558.01.patch, HIVE-24558.02.patch, 
> HIVE-24558.03.patch, HIVE-24558.04.patch, HIVE-24558.05.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24558) Handle update in table level regular expression.

2021-01-05 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24558:
---
Status: In Progress  (was: Patch Available)

> Handle update in table level regular expression.
> 
>
> Key: HIVE-24558
> URL: https://issues.apache.org/jira/browse/HIVE-24558
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24558.01.patch, HIVE-24558.02.patch, 
> HIVE-24558.03.patch, HIVE-24558.04.patch, HIVE-24558.05.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531672
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 06/Jan/21 04:13
Start Date: 06/Jan/21 04:13
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552359001



##
File path: 
service/src/java/org/apache/hive/service/auth/AuthenticationProviderFactory.java
##
@@ -76,6 +77,9 @@ public static PasswdAuthenticationProvider 
getAuthenticationProvider(AuthMethods
   return new CustomAuthenticationProviderImpl((conf == null) ? 
AuthMethods.CUSTOM.getConf() : conf);
 } else if (authMethod == AuthMethods.NONE) {
   return new AnonymousAuthenticationProviderImpl();
+} else if (authMethod == AuthMethods.SAML) {
+  //TODO right thing to do?
+  return new AnonymousAuthenticationProviderImpl();

Review comment:
   not sure I understand how this works.
   When auth is set to SAML on server, is this provider even called?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531672)
Time Spent: 4h  (was: 3h 50m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> With cloud based deployments, having SAML 2.0 based authentication support 
> in HS2 will be greatly useful for federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531669
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 06/Jan/21 04:00
Start Date: 06/Jan/21 04:00
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552356196



##
File path: service/pom.xml
##
@@ -179,6 +179,37 @@
 
   
 
+
+  org.pac4j
+  pac4j-saml-opensamlv3
+  4.0.3

Review comment:
   should define the scope for this dependency. Might pull in transitive 
dependencies that might be undesirable.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531669)
Time Spent: 3h 50m  (was: 3h 40m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having SAML 2.0 based authentication support 
> in HS2 will be greatly useful for federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-22753) Fix gradual mem leak: Operationlog related appenders should be cleared up on errors

2021-01-05 Thread Eugene Chung (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259396#comment-17259396
 ] 

Eugene Chung edited comment on HIVE-22753 at 1/6/21, 3:59 AM:
--

[~chengyong] I also applied this patch to Hive 3.1.2 and the 'closed recently' 
logs were found, but RandomAccessFileAppender instances were still leaked.

 

 


was (Author: euigeun_chung):
[~chengyong] I also applied this patch to Hive 3.1.2 and the 'closed recently' 
logs were found, but RandomAccessFileAppender instances were still leaked.

!image-2021-01-06-12-57-25-705.png|width=890,height=59!

!image-2021-01-06-12-58-34-088.png|width=851,height=395!

 

> Fix gradual mem leak: Operationlog related appenders should be cleared up on 
> errors 
> 
>
> Key: HIVE-22753
> URL: https://issues.apache.org/jira/browse/HIVE-22753
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22753.1.patch, HIVE-22753.2.patch, 
> HIVE-22753.3.patch, HIVE-22753.4.patch, image-2020-01-21-11-14-37-911.png, 
> image-2020-01-21-11-17-59-279.png, image-2020-01-21-11-18-37-294.png
>
>
> In case of exception in SQLOperation, operational log does not get cleared 
> up. This causes gradual build up of HushableRandomAccessFileAppender causing 
> HS2 to OOM after some time.
> !image-2020-01-21-11-14-37-911.png|width=431,height=267!
>  
> Allocation tree
> !image-2020-01-21-11-18-37-294.png|width=425,height=178!
>  
> Prod instance mem
> !image-2020-01-21-11-17-59-279.png|width=671,height=201!
>  
> Each HushableRandomAccessFileAppender holds internal ref to 
> RandomAccessFileAppender which holds a 256 KB bytebuffer, causing the mem 
> leak.
> Related ticket: HIVE-18820
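The leak described above reduces to a familiar shape: a per-operation appender registered at query start must be deregistered on every exit path, not only on success. `OperationLogRegistry` below is a hypothetical stand-in for HS2's log4j appender bookkeeping, used only to contrast the leaky shape with the try/finally fix; it is not Hive code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry standing in for HS2's operation-log appender tracking.
public class OperationLogRegistry {

    private final Map<String, byte[]> appenderBuffers = new HashMap<>();

    public void register(String queryId) {
        // Each appender pins a sizeable buffer (256 KB in the real leak).
        appenderBuffers.put(queryId, new byte[256 * 1024]);
    }

    public void unregister(String queryId) {
        appenderBuffers.remove(queryId);
    }

    public int liveAppenders() {
        return appenderBuffers.size();
    }

    // Buggy shape: an exception in the operation skips cleanup,
    // so the appender (and its buffer) leaks.
    public void runOperationLeaky(String queryId, Runnable op) {
        register(queryId);
        op.run();              // throws -> unregister never happens
        unregister(queryId);
    }

    // Fixed shape: cleanup runs on success and on error alike.
    public void runOperationSafe(String queryId, Runnable op) {
        register(queryId);
        try {
            op.run();
        } finally {
            unregister(queryId);
        }
    }
}
```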



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22753) Fix gradual mem leak: Operationlog related appenders should be cleared up on errors

2021-01-05 Thread Eugene Chung (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259396#comment-17259396
 ] 

Eugene Chung commented on HIVE-22753:
-

[~chengyong] I also applied this patch to Hive 3.1.2 and the 'closed recently' 
logs were found, but RandomAccessFileAppender instances were still leaked.

!image-2021-01-06-12-57-25-705.png|width=890,height=59!

!image-2021-01-06-12-58-34-088.png|width=851,height=395!

 

> Fix gradual mem leak: Operationlog related appenders should be cleared up on 
> errors 
> 
>
> Key: HIVE-22753
> URL: https://issues.apache.org/jira/browse/HIVE-22753
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22753.1.patch, HIVE-22753.2.patch, 
> HIVE-22753.3.patch, HIVE-22753.4.patch, image-2020-01-21-11-14-37-911.png, 
> image-2020-01-21-11-17-59-279.png, image-2020-01-21-11-18-37-294.png
>
>
> In case of exception in SQLOperation, operational log does not get cleared 
> up. This causes gradual build up of HushableRandomAccessFileAppender causing 
> HS2 to OOM after some time.
> !image-2020-01-21-11-14-37-911.png|width=431,height=267!
>  
> Allocation tree
> !image-2020-01-21-11-18-37-294.png|width=425,height=178!
>  
> Prod instance mem
> !image-2020-01-21-11-17-59-279.png|width=671,height=201!
>  
> Each HushableRandomAccessFileAppender holds internal ref to 
> RandomAccessFileAppender which holds a 256 KB bytebuffer, causing the mem 
> leak.
> Related ticket: HIVE-18820



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24589) Drop catalog failing with deadlock error for Oracle backend dbms.

2021-01-05 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24589:
--


> Drop catalog failing with deadlock error for Oracle backend dbms.
> -
>
> Key: HIVE-24589
> URL: https://issues.apache.org/jira/browse/HIVE-24589
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> When we do a drop catalog we drop the catalog from the CTLGS table. The DBS 
> table has a foreign key reference on CTLGS for CTLG_NAME. This is causing the 
> DBS table to be locked exclusively and causing deadlocks. This can be avoided 
> by creating an index in the DBS table on CTLG_NAME.
> {code:java}
> CREATE INDEX CTLG_NAME_DBS ON DBS(CTLG_NAME); {code}
> {code:java}
>  Oracle Database maximizes the concurrency control of parent keys in relation 
> to dependent foreign keys. Locking behaviour depends on whether foreign key 
> columns are indexed. If foreign keys are not indexed, then the child table 
> will probably be locked more frequently, deadlocks will occur, and 
> concurrency will be decreased. For this reason foreign keys should almost 
> always be indexed. The only exception is when the matching unique or primary 
> key is never updated or deleted.{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531664&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531664
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 06/Jan/21 03:47
Start Date: 06/Jan/21 03:47
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552353157



##
File path: jdbc/src/java/org/apache/hive/jdbc/saml/HiveJdbcBrowserClient.java
##
@@ -0,0 +1,322 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.jdbc.saml;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.base.Strings;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+import java.awt.Desktop;
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.PrintWriter;
+import java.io.UnsupportedEncodingException;
+import java.net.InetAddress;
+import java.net.ServerSocket;
+import java.net.Socket;
+import java.net.SocketTimeoutException;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.net.URLDecoder;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import org.apache.hive.jdbc.Utils.JdbcConnectionParams;
+import org.apache.hive.service.auth.saml.HiveSamlUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This class is used to execute a browser-based SSO workflow when the
+ * authentication mode is browser.
+ */
+public class HiveJdbcBrowserClient implements IJdbcBrowserClient {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveJdbcBrowserClient.class);
+  // error message when the socket times out.
+  @VisibleForTesting
+  public static final String TIMEOUT_ERROR_MSG = "Timed out while waiting for 
server response";
+  private final ServerSocket serverSocket;
+  private HiveJdbcBrowserServerResponse serverResponse;
+  protected JdbcBrowserClientContext clientContext;
+  // By default we wait for 2 min unless overridden by a JDBC connection param
+  // browserResponseTimeout
+  private static final int DEFAULT_SOCKET_TIMEOUT_SECS = 120;
+  private final ExecutorService serverResponseThread = 
Executors.newSingleThreadExecutor(
+  new ThreadFactoryBuilder().setNameFormat("Hive-Jdbc-Browser-Client-%d")
+  .setDaemon(true).build());
+
+  HiveJdbcBrowserClient(JdbcConnectionParams connectionParams)
+  throws HiveJdbcBrowserException {
+serverSocket = getServerSocket(connectionParams.getSessionVars());
+  }
+
+  private ServerSocket getServerSocket(Map<String, String> sessionConf)
+  throws HiveJdbcBrowserException {
+final ServerSocket serverSocket;
+int port = Integer.parseInt(sessionConf
+.getOrDefault(JdbcConnectionParams.AUTH_BROWSER_RESPONSE_PORT, "0"));
+int timeout = Integer.parseInt(
+
sessionConf.getOrDefault(JdbcConnectionParams.AUTH_BROWSER_RESPONSE_TIMEOUT_SECS,
+String.valueOf(DEFAULT_SOCKET_TIMEOUT_SECS)));
+try {
+  serverSocket = new ServerSocket(port, 0,
+  InetAddress.getByName(HiveSamlUtils.LOOP_BACK_INTERFACE));
+  LOG.debug("Browser response timeout is set to {} seconds", timeout);
+  serverSocket.setSoTimeout(timeout * 1000);
+} catch (IOException e) {
+  throw new HiveJdbcBrowserException("Unable to bind to the localhost");
+}
+return serverSocket;
+  }
+
+  public Integer getPort() {
+return serverSocket.getLocalPort();
+  }
+
+  @Override
+  public void close() throws IOException {
+if (serverSocket != null) {
+  serverSocket.close();
+}
+  }
+
+  public void init(JdbcBrowserClientContext clientContext) {
+// everytime we 

[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531663
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 06/Jan/21 03:46
Start Date: 06/Jan/21 03:46
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552352951



##
File path: jdbc/src/java/org/apache/hive/jdbc/saml/HiveJdbcBrowserClient.java
##
@@ -0,0 +1,322 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.jdbc.saml;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.base.Strings;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+import java.awt.Desktop;
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.PrintWriter;
+import java.io.UnsupportedEncodingException;
+import java.net.InetAddress;
+import java.net.ServerSocket;
+import java.net.Socket;
+import java.net.SocketTimeoutException;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.net.URLDecoder;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import org.apache.hive.jdbc.Utils.JdbcConnectionParams;
+import org.apache.hive.service.auth.saml.HiveSamlUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This class is used to execute a browser-based SSO workflow when the
+ * authentication mode is browser.
+ */
+public class HiveJdbcBrowserClient implements IJdbcBrowserClient {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveJdbcBrowserClient.class);
+  // error message when the socket times out.
+  @VisibleForTesting
+  public static final String TIMEOUT_ERROR_MSG = "Timed out while waiting for 
server response";
+  private final ServerSocket serverSocket;
+  private HiveJdbcBrowserServerResponse serverResponse;
+  protected JdbcBrowserClientContext clientContext;
+  // By default we wait for 2 min unless overridden by a JDBC connection param
+  // browserResponseTimeout
+  private static final int DEFAULT_SOCKET_TIMEOUT_SECS = 120;
+  private final ExecutorService serverResponseThread = 
Executors.newSingleThreadExecutor(
+  new ThreadFactoryBuilder().setNameFormat("Hive-Jdbc-Browser-Client-%d")
+  .setDaemon(true).build());
+
+  HiveJdbcBrowserClient(JdbcConnectionParams connectionParams)
+  throws HiveJdbcBrowserException {
+serverSocket = getServerSocket(connectionParams.getSessionVars());
+  }
+
+  private ServerSocket getServerSocket(Map<String, String> sessionConf)
+  throws HiveJdbcBrowserException {
+final ServerSocket serverSocket;
+int port = Integer.parseInt(sessionConf
+.getOrDefault(JdbcConnectionParams.AUTH_BROWSER_RESPONSE_PORT, "0"));
+int timeout = Integer.parseInt(
+
sessionConf.getOrDefault(JdbcConnectionParams.AUTH_BROWSER_RESPONSE_TIMEOUT_SECS,
+String.valueOf(DEFAULT_SOCKET_TIMEOUT_SECS)));
+try {
+  serverSocket = new ServerSocket(port, 0,
+  InetAddress.getByName(HiveSamlUtils.LOOP_BACK_INTERFACE));
+  LOG.debug("Browser response timeout is set to {} seconds", timeout);
+  serverSocket.setSoTimeout(timeout * 1000);
+} catch (IOException e) {
+  throw new HiveJdbcBrowserException("Unable to bind to the localhost");
+}
+return serverSocket;
+  }
+
+  public Integer getPort() {
+return serverSocket.getLocalPort();

Review comment:
   should this account for serverSocket being null in case of an exception 
in getServerSocket() ?
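A minimal sketch of the null-guard the reviewer is asking about (the class and field names mirror the quoted code, but this is an illustration, not the actual patch): fail fast with a descriptive error rather than a NullPointerException if the socket was never bound.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Illustrative holder mirroring HiveJdbcBrowserClient's serverSocket field.
public class BrowserPortHolder {

    private final ServerSocket serverSocket;

    public BrowserPortHolder(ServerSocket serverSocket) {
        this.serverSocket = serverSocket;
    }

    // Return the bound local port; guard against a socket left null by a
    // construction failure instead of surfacing a NullPointerException.
    public Integer getPort() {
        if (serverSocket == null) {
            throw new IllegalStateException(
                "Browser client socket was never bound; cannot report a port");
        }
        return serverSocket.getLocalPort();
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket s = new ServerSocket(0, 0, InetAddress.getLoopbackAddress())) {
            System.out.println(new BrowserPortHolder(s).getPort() > 0); // prints "true"
        }
    }
}
```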





This is an automated message from 

[jira] [Work logged] (HIVE-24075) Optimise KeyValuesInputMerger

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24075?focusedWorklogId=531662&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531662
 ]

ASF GitHub Bot logged work on HIVE-24075:
-

Author: ASF GitHub Bot
Created on: 06/Jan/21 03:42
Start Date: 06/Jan/21 03:42
Worklog Time Spent: 10m 
  Work Description: rbalamohan opened a new pull request #1463:
URL: https://github.com/apache/hive/pull/1463


   https://issues.apache.org/jira/browse/HIVE-24075
   
   When the reader comparisons in the queue are the same, we could reuse 
"nextKVReaders" in the next iteration instead of doing the comparison 
all over again.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531662)
Time Spent: 40m  (was: 0.5h)

> Optimise KeyValuesInputMerger
> -
>
> Key: HIVE-24075
> URL: https://issues.apache.org/jira/browse/HIVE-24075
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Comparisons in KeyValueInputMerger can be reduced.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L165]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L150]
> If the reader comparisons in the queue are the same, we could reuse 
> "{{nextKVReaders}}" in the next iteration instead of doing the 
> comparison all over again.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L178]
>  
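The reuse idea can be sketched outside Hive with a toy k-way merger over integer keys. This is an illustrative simplification under assumed names (`BatchedMerger`, `Reader`), not the actual KeyValuesInputMerger: readers whose current keys tie for the minimum are kept in a batch, and if their advanced heads are still mutually equal (and no smaller key waits in the queue), the batch is reused directly instead of re-comparing every reader through the priority queue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class BatchedMerger {

    // A "reader" is modelled as an iterator with a peekable head key.
    static final class Reader {
        private final Iterator<Integer> it;
        Integer head;

        Reader(List<Integer> keys) {
            it = keys.iterator();
            advance();
        }

        void advance() {
            head = it.hasNext() ? it.next() : null;
        }
    }

    public static List<Integer> merge(List<List<Integer>> inputs) {
        PriorityQueue<Reader> queue =
            new PriorityQueue<>(Comparator.comparingInt((Reader r) -> r.head));
        for (List<Integer> in : inputs) {
            Reader r = new Reader(in);
            if (r.head != null) {
                queue.add(r);
            }
        }
        List<Integer> out = new ArrayList<>();
        List<Reader> batch = new ArrayList<>(); // readers tied on the min key
        while (!queue.isEmpty() || !batch.isEmpty()) {
            if (batch.isEmpty()) {
                // Pull the minimum reader plus every reader tied with it.
                batch.add(queue.poll());
                int key = batch.get(0).head;
                while (!queue.isEmpty() && queue.peek().head == key) {
                    batch.add(queue.poll());
                }
            }
            for (Reader r : batch) {
                out.add(r.head);
            }
            for (Reader r : batch) {
                r.advance();
            }
            batch.removeIf(r -> r.head == null); // drop exhausted readers
            // Reuse check: all advanced heads equal, and no strictly smaller
            // key waiting in the queue -> skip the queue comparisons entirely.
            boolean reusable = !batch.isEmpty();
            if (reusable) {
                int newKey = batch.get(0).head;
                for (Reader r : batch) {
                    if (r.head != newKey) {
                        reusable = false;
                        break;
                    }
                }
                if (reusable && !queue.isEmpty() && queue.peek().head < newKey) {
                    reusable = false;
                }
            }
            if (!reusable) {
                queue.addAll(batch);
                batch.clear();
            }
        }
        return out;
    }
}
```

When inputs advance in lockstep (equal keys on every step), the batch is never re-inserted into the queue, which is the saving the Jira describes.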



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24570) Hive on spark tmp file should be delete when driver process finished

2021-01-05 Thread zhaolong (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254814#comment-17254814
 ] 

zhaolong edited comment on HIVE-24570 at 1/6/21, 1:41 AM:
--

can someone review this patch?  [~vihangk1]  [~ychena]


was (Author: fsilent):
can someone review this patch?

> Hive on spark tmp file should be delete when driver process finished
> 
>
> Key: HIVE-24570
> URL: https://issues.apache.org/jira/browse/HIVE-24570
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 4.0.0
>Reporter: zhaolong
>Assignee: zhaolong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 0001-HIVE-24570.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hive on Spark tmp files should be deleted when the driver process finishes; 
> currently they stay in the java.io.tmpdir (default /tmp) directory until the 
> HiveServer2 JVM stops.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24242) Relax safety checks in SharedWorkOptimizer

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24242?focusedWorklogId=531615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531615
 ]

ASF GitHub Bot logged work on HIVE-24242:
-

Author: ASF GitHub Bot
Created on: 06/Jan/21 01:09
Start Date: 06/Jan/21 01:09
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1564:
URL: https://github.com/apache/hive/pull/1564


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531615)
Time Spent: 40m  (was: 0.5h)

> Relax safety checks in SharedWorkOptimizer
> --
>
> Key: HIVE-24242
> URL: https://issues.apache.org/jira/browse/HIVE-24242
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There are some checks to lock out problematic cases.
> For UnionOperator 
> [here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571]
> This check could prevent the optimization even if the Union is visible 
> from only 1 of the TS ops.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24334) SyntheticJoinPredicate creation may be missed when ReduceSink has Join input

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24334?focusedWorklogId=531613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531613
 ]

ASF GitHub Bot logged work on HIVE-24334:
-

Author: ASF GitHub Bot
Created on: 06/Jan/21 01:09
Start Date: 06/Jan/21 01:09
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1631:
URL: https://github.com/apache/hive/pull/1631


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531613)
Time Spent: 0.5h  (was: 20m)

> SyntheticJoinPredicate creation may be missed when ReduceSink has Join input
> 
>
> Key: HIVE-24334
> URL: https://issues.apache.org/jira/browse/HIVE-24334
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Let's assume we have a plan where some Reduce Sink operators have a Join 
> operator as input:
> {code}
> TS[33]-FIL[34]-SEL[35]-RS[42]-JOIN[44]-RS[45]-JOIN[47]
> TS[36]-FIL[37]-SEL[38]-RS[43]-JOIN[44]
> TS[39]-FIL[40]-SEL[41]-RS[46]-JOIN[47]
> {code}
> RS[45]'s input is JOIN[44].
> When searching for opportunities to create additional 
> SyntheticJoinPredicates, _ExprNodeDescUtils.backtrack_ does not return the 
> input expression of the expression in the join operator but the expression 
> itself.
> This is caused by
> - if the operator is a join operator where we create the join predicate 
> derivatives, the expression is not resolved
> https://github.com/apache/hive/blob/375433510b73c5a22bde4e13485dfc16eaa24706/ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java#L400
> - later, the backtrack algorithm that resolves the expression doesn't do any 
> iterations, since the expression is already in a terminal state when it is called
> https://github.com/apache/hive/blob/375433510b73c5a22bde4e13485dfc16eaa24706/ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java#L414



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22415) Upgrade to Java 11

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=531614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531614
 ]

ASF GitHub Bot logged work on HIVE-22415:
-

Author: ASF GitHub Bot
Created on: 06/Jan/21 01:09
Start Date: 06/Jan/21 01:09
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1624:
URL: https://github.com/apache/hive/pull/1624


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531614)
Time Spent: 4h 20m  (was: 4h 10m)

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531595
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 06/Jan/21 00:12
Start Date: 06/Jan/21 00:12
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552276586



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -592,6 +638,10 @@ public boolean retryRequest(IOException exception, int 
executionCount, HttpContext
   }
 });
 
+if (isBrowserAuthMode()) {
+  httpClientBuilder

Review comment:
   This is however a good point. Having browser mode without cookie auth 
doesn't make sense, since without cookie auth each http call will open a 
browser (although it will be automatically authenticated). I think for the 
first version maybe we should disallow browser auth if cookie-auth is not 
enabled on the server side. Thoughts?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531595)
Time Spent: 3h 20m  (was: 3h 10m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531594
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 06/Jan/21 00:09
Start Date: 06/Jan/21 00:09
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552274366



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -896,39 +946,103 @@ private void openSession() throws SQLException {
   openReq.setPassword(sessConfMap.get(JdbcConnectionParams.AUTH_PASSWD));
 }
 
+//TODO(Vihang): This is a bit hacky. We piggy back on a dummy OpenSession 
call
+// to get the redirect response from the server. Instead its probably 
cleaner to
+// explicitly do a HTTP post request and get the response.
+int numRetry = isBrowserAuthMode() ? 2 : 1;
+for (int i=0; i Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-15820) comment at the head of beeline -e

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15820?focusedWorklogId=531587&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531587
 ]

ASF GitHub Bot logged work on HIVE-15820:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 23:50
Start Date: 05/Jan/21 23:50
Worklog Time Spent: 10m 
  Work Description: ujc714 commented on pull request #1814:
URL: https://github.com/apache/hive/pull/1814#issuecomment-754971040


   I moved trim() from HiveStringUtils.removeComments(String, int[]) to 
HiveStringUtils.removeComments(String) so that I only need to change 
TestHiveStringUtils.java instead of 6 other test result files.
   
   In my opinion, HiveStringUtils.removeComments(String, int[]) shouldn't trim 
the spaces. Otherwise, we'll lose the indents in a formatted SQL statement. 
Trimming the leading and trailing spaces of the whole statement should be 
enough.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531587)
Time Spent: 0.5h  (was: 20m)

> comment at the head of beeline -e
> -
>
> Key: HIVE-15820
> URL: https://issues.apache.org/jira/browse/HIVE-15820
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.1, 2.1.1
>Reporter: muxin
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: patch, pull-request-available
> Attachments: HIVE-15820.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> $ beeline -u jdbc:hive2://localhost:1 -n test -e "
> > --asdfasdfasdfasdf
> > select * from test_table;
> > "
> The expected result of the above command should be all rows of test_table 
> (same as when run in beeline interactive mode), but it does not output anything.
> The cause is that the -e option reads the commands as one string, and the 
> method dispatch(String line) calls isComment(String line) first, which uses
>  'lineTrimmed.startsWith("#") || lineTrimmed.startsWith("--")' 
> to regard the commands as a comment.
> Two ways can be considered to fix this problem:
> 1. in method initArgs(String[] args), split command by '\n' into command list 
> before dispatch when cl.getOptionValues('e') != null
> 2. in method dispatch(String line), remove comments using this:
> static String removeComments(String line) {
> if (line == null || line.isEmpty()) {
> return line;
> }
> StringBuilder builder = new StringBuilder();
> int escape = -1;
> for (int index = 0; index < line.length(); index++) {
> if (index < line.length() - 1 && line.charAt(index) == 
> line.charAt(index + 1)) {
> if (escape == -1 && line.charAt(index) == '-') {
> //find \n as the end of comment
> index = line.indexOf('\n',index+1);
> //there is no sql after this comment,so just break out
> if (-1==index){
> break;
> }
> }
> }
> char letter = line.charAt(index);
> if (letter == escape) {
> escape = -1; // Turn escape off.
> } else if (escape == -1 && (letter == '\'' || letter == '"')) {
> escape = letter; // Turn escape on.
> }
> builder.append(letter);
> }
> return builder.toString();
>   }
> the second way can be a general solution to remove all comments start with 
> '--'  in a sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
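The removeComments method quoted in the issue description above can be exercised standalone. The class below is a lightly restructured, runnable version of that same sketch; the class name and the demo in main are mine, not part of Hive:

```java
public class RemoveCommentsDemo {
    // Standalone version of the removeComments sketch from the issue
    // description: strips "--" line comments that are not inside
    // single- or double-quoted strings; the terminating '\n' is kept.
    static String removeComments(String line) {
        if (line == null || line.isEmpty()) {
            return line;
        }
        StringBuilder builder = new StringBuilder();
        int escape = -1; // current quote char, or -1 when outside quotes
        for (int index = 0; index < line.length(); index++) {
            if (index < line.length() - 1
                    && line.charAt(index) == line.charAt(index + 1)
                    && escape == -1 && line.charAt(index) == '-') {
                // find \n as the end of the comment
                index = line.indexOf('\n', index + 1);
                if (index == -1) {
                    break; // no SQL after this comment, so just stop
                }
            }
            char letter = line.charAt(index);
            if (letter == escape) {
                escape = -1; // turn escape off
            } else if (escape == -1 && (letter == '\'' || letter == '"')) {
                escape = letter; // turn escape on
            }
            builder.append(letter);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // A leading comment no longer swallows the statement that follows it.
        System.out.println(removeComments("--asdfasdfasdfasdf\nselect * from test_table;"));
        // "--" inside a string literal is preserved.
        System.out.println(removeComments("select '--not a comment' from t;"));
    }
}
```

With this, the leading `--` comment in the `beeline -e` example above no longer causes the whole string to be treated as a comment.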


[jira] [Assigned] (HIVE-15820) comment at the head of beeline -e

2021-01-05 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang reassigned HIVE-15820:
---

Assignee: Robbie Zhang  (was: muxin)

> comment at the head of beeline -e
> -
>
> Key: HIVE-15820
> URL: https://issues.apache.org/jira/browse/HIVE-15820
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.1, 2.1.1
>Reporter: muxin
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: patch, pull-request-available
> Attachments: HIVE-15820.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> $ beeline -u jdbc:hive2://localhost:1 -n test -e "
> > --asdfasdfasdfasdf
> > select * from test_table;
> > "
> The expected result of the above command should be all rows of test_table 
> (same as when run in beeline interactive mode), but it does not output anything.
> The cause is that the -e option reads the commands as one string, and the 
> method dispatch(String line) calls isComment(String line) first, which uses
>  'lineTrimmed.startsWith("#") || lineTrimmed.startsWith("--")' 
> to regard the commands as a comment.
> Two ways can be considered to fix this problem:
> 1. in method initArgs(String[] args), split command by '\n' into command list 
> before dispatch when cl.getOptionValues('e') != null
> 2. in method dispatch(String line), remove comments using this:
> static String removeComments(String line) {
> if (line == null || line.isEmpty()) {
> return line;
> }
> StringBuilder builder = new StringBuilder();
> int escape = -1;
> for (int index = 0; index < line.length(); index++) {
> if (index < line.length() - 1 && line.charAt(index) == 
> line.charAt(index + 1)) {
> if (escape == -1 && line.charAt(index) == '-') {
> //find \n as the end of comment
> index = line.indexOf('\n',index+1);
> //there is no sql after this comment,so just break out
> if (-1==index){
> break;
> }
> }
> }
> char letter = line.charAt(index);
> if (letter == escape) {
> escape = -1; // Turn escape off.
> } else if (escape == -1 && (letter == '\'' || letter == '"')) {
> escape = letter; // Turn escape on.
> }
> builder.append(letter);
> }
> return builder.toString();
>   }
> the second way can be a general solution to remove all comments start with 
> '--'  in a sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531579
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 23:30
Start Date: 05/Jan/21 23:30
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552260225



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -896,39 +946,103 @@ private void openSession() throws SQLException {
   openReq.setPassword(sessConfMap.get(JdbcConnectionParams.AUTH_PASSWD));
 }
 
+//TODO(Vihang): This is a bit hacky. We piggy back on a dummy OpenSession 
call
+// to get the redirect response from the server. Instead its probably 
cleaner to
+// explicitly do a HTTP post request and get the response.
+int numRetry = isBrowserAuthMode() ? 2 : 1;
+for (int i=0; i<numRetry; i++) {
+  @VisibleForTesting
+  public IJdbcBrowserClient getBrowserClient() {
+return browserClient;
+  }
 
-  updateServerHiveConf(serverHiveConf, connParams);
+  private void openSession(TOpenSessionReq openReq) throws TException, 
SQLException {
+TOpenSessionResp openResp = client.OpenSession(openReq);
 
-  // validate connection
-  Utils.verifySuccess(openResp.getStatus());
-  if (!supportedProtocols.contains(openResp.getServerProtocolVersion())) {
-throw new TException("Unsupported Hive2 protocol");
-  }
-  protocol = openResp.getServerProtocolVersion();
-  sessHandle = openResp.getSessionHandle();
-
-  final String serverFetchSizeString =
-  
openResp.getConfiguration().get(ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.varname);
-  if (serverFetchSizeString == null) {
-throw new IllegalStateException("Server returned a null default fetch 
size. Check that "
-+ 
ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.varname + " is 
configured correctly.");
-  }
+// Populate a given configuration from HS2 server HiveConf, only if that 
configuration
+// is not already present in Connection parameter HiveConf i.e., client 
side configuration
+// takes precedence over the server side configuration.
+Map<String, String> serverHiveConf = openResp.getConfiguration();
+
+updateServerHiveConf(serverHiveConf, connParams);
+
+// validate connection
+Utils.verifySuccess(openResp.getStatus());
+if (!supportedProtocols.contains(openResp.getServerProtocolVersion())) {
+  throw new TException("Unsupported Hive2 protocol");
+}
+protocol = openResp.getServerProtocolVersion();
+sessHandle = openResp.getSessionHandle();
 
-  this.defaultFetchSize = Integer.parseInt(serverFetchSizeString);
-  if (this.defaultFetchSize <= 0) {
-throw new IllegalStateException("Default fetch size must be greater 
than 0");
+final String serverFetchSizeString =
+
openResp.getConfiguration().get(ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.varname);
+if (serverFetchSizeString == null) {
+  throw new IllegalStateException("Server returned a null default fetch 
size. Check that "
+  + ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.varname 
+ " is configured correctly.");
+}
+
+this.defaultFetchSize = Integer.parseInt(serverFetchSizeString);
+if (this.defaultFetchSize <= 0) {
+  throw new IllegalStateException("Default fetch size must be greater than 
0");
+}
+  }
+
+  private boolean isSamlRedirect(TException e) {

Review comment:
   makes sense. Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531579)
Time Spent: 3h  (was: 2h 50m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.

[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531577
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 23:28
Start Date: 05/Jan/21 23:28
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552259425



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -896,39 +946,103 @@ private void openSession() throws SQLException {
   openReq.setPassword(sessConfMap.get(JdbcConnectionParams.AUTH_PASSWD));
 }
 
+//TODO(Vihang): This is a bit hacky. We piggy back on a dummy OpenSession 
call
+// to get the redirect response from the server. Instead its probably 
cleaner to
+// explicitly do a HTTP post request and get the response.
+int numRetry = isBrowserAuthMode() ? 2 : 1;
+for (int i=0; i<numRetry; i++) {
+  @VisibleForTesting
+  public IJdbcBrowserClient getBrowserClient() {
+return browserClient;
+  }
 
-  updateServerHiveConf(serverHiveConf, connParams);
+  private void openSession(TOpenSessionReq openReq) throws TException, 
SQLException {
+TOpenSessionResp openResp = client.OpenSession(openReq);
 
-  // validate connection
-  Utils.verifySuccess(openResp.getStatus());
-  if (!supportedProtocols.contains(openResp.getServerProtocolVersion())) {
-throw new TException("Unsupported Hive2 protocol");
-  }
-  protocol = openResp.getServerProtocolVersion();
-  sessHandle = openResp.getSessionHandle();
-
-  final String serverFetchSizeString =
-  
openResp.getConfiguration().get(ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.varname);
-  if (serverFetchSizeString == null) {
-throw new IllegalStateException("Server returned a null default fetch 
size. Check that "
-+ 
ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.varname + " is 
configured correctly.");
-  }
+// Populate a given configuration from HS2 server HiveConf, only if that 
configuration

Review comment:
   The patch moves these lines (900-926 before the patch) into a separate 
method so that we can retry this code: the first attempt gets the redirect URL 
and the second one actually opens the session.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531577)
Time Spent: 2h 50m  (was: 2h 40m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
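The retry shape described in the comment above (a dummy OpenSession call whose first attempt is expected to fail with a SAML redirect, followed by a second attempt once the browser flow has completed) can be sketched as follows. This is a simplified illustration, not the actual HiveConnection code; openSessionOnce is a hypothetical stand-in for client.OpenSession():

```java
public class OpenSessionRetryDemo {
    static int calls = 0;

    // Hypothetical stand-in for client.OpenSession(): the first call fails
    // (simulating the SAML redirect response), subsequent calls succeed.
    static String openSessionOnce() throws Exception {
        if (++calls == 1) {
            throw new Exception("SAML redirect");
        }
        return "session-handle";
    }

    // In browser-auth mode we allow two attempts; otherwise just one.
    static String openSessionWithRetry(boolean browserAuthMode) throws Exception {
        int numRetry = browserAuthMode ? 2 : 1;
        Exception last = null;
        for (int i = 0; i < numRetry; i++) {
            try {
                return openSessionOnce();
            } catch (Exception e) {
                // On the first failure a real client would run the browser
                // authentication flow before retrying.
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(openSessionWithRetry(true));
    }
}
```

As the TODO in the diff notes, an explicit HTTP POST to fetch the redirect would be cleaner than piggybacking on a dummy OpenSession call.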


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531571&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531571
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 23:16
Start Date: 05/Jan/21 23:16
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552255324



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -896,39 +946,103 @@ private void openSession() throws SQLException {
   openReq.setPassword(sessConfMap.get(JdbcConnectionParams.AUTH_PASSWD));
 }
 
+//TODO(Vihang): This is a bit hacky. We piggy back on a dummy OpenSession 
call
+// to get the redirect response from the server. Instead its probably 
cleaner to
+// explicitly do a HTTP post request and get the response.
+int numRetry = isBrowserAuthMode() ? 2 : 1;
+for (int i=0; i<numRetry; i++) {

Review comment:
   https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/appendix-a-odbc-error-codes?view=sql-server-ver15
 which lists 08S01 as "communication link failure". I didn't see anything 
which suggests an authentication error. The closest was perhaps 28000 but I am 
not sure if this is the right place to look for these codes. Do you know where 
I can find this for JDBC?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531571)
Time Spent: 2h 40m  (was: 2.5h)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531564
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 22:56
Start Date: 05/Jan/21 22:56
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552248192



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -896,39 +946,103 @@ private void openSession() throws SQLException {
   openReq.setPassword(sessConfMap.get(JdbcConnectionParams.AUTH_PASSWD));
 }
 
+//TODO(Vihang): This is a bit hacky. We piggy back on a dummy OpenSession 
call

Review comment:
   yeah, sorry. I removed my name. I had it there to track all the TODOs.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531564)
Time Spent: 2.5h  (was: 2h 20m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531563
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 22:55
Start Date: 05/Jan/21 22:55
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552247841



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -592,6 +638,10 @@ public boolean retryRequest(IOException exception, int 
executionCount, HttpContext
   }
 });
 
+if (isBrowserAuthMode()) {
+  httpClientBuilder

Review comment:
   I am not sure if you are reviewing this one commit at a time or by 
combining all the commits. Line 626 in the latest code on this branch creates 
an httpClientBuilder if cookieAuth is disabled.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531563)
Time Spent: 2h 20m  (was: 2h 10m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531558&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531558
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 22:53
Start Date: 05/Jan/21 22:53
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552246893



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -476,12 +511,18 @@ private String getServerHttpUrl(boolean useSsl) {
   private TTransport createHttpTransport() throws SQLException, 
TTransportException {
 CloseableHttpClient httpClient;
 boolean useSsl = isSslConnection();
-// Create an http client from the configs
+validateSslForBrowserMode();
 httpClient = getHttpClient(useSsl);
 transport = new THttpClient(getServerHttpUrl(useSsl), httpClient);
 return transport;
   }
 
+  protected void validateSslForBrowserMode() throws SQLException {
+if (isBrowserAuthMode() && !isSslConnection()) {
+  throw new SQLException("Browser mode is only supported when SSL is 
enabled");

Review comment:
   I updated the code to throw a new SQLException(new 
IllegalArgumentException()) here. However, I don't see the real benefit of 
doing it this way, since the constructor of HiveConnection only allows for 
SQLException to be thrown.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531558)
Time Spent: 2h 10m  (was: 2h)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531557
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 22:49
Start Date: 05/Jan/21 22:49
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552245742



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -301,10 +326,20 @@ public HiveConnection(String uri, Properties info) throws 
SQLException {
 supportedProtocols.add(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9);
 supportedProtocols.add(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V10);
 
+if (isBrowserAuthMode()) {
+  try {
+browserClient = browserClientFactory.create(connParams);
+  } catch (HiveJdbcBrowserException e) {
+throw new SQLException("");
+  }
+} else {
+  browserClient = null;
+}
 if (isEmbeddedMode) {
   client = EmbeddedCLIServicePortal.get(connParams.getHiveConfs());
   connParams.getHiveConfs().clear();
   // open client session
+  // TODO(Vihang) need to throw here if saml auth?

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531557)
Time Spent: 2h  (was: 1h 50m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24588) Run tests using specific log4j2 configuration conveniently

2021-01-05 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-24588:
--


> Run tests using specific log4j2 configuration conveniently
> --
>
> Key: HIVE-24588
> URL: https://issues.apache.org/jira/browse/HIVE-24588
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> In order to reproduce a problem (e.g., HIVE-24569) or validate that a log4j2 
> configuration is working as expected it is necessary to run a test and 
> explicitly specify which configuration should be used. Moreover, after the 
> end of the test in question it is desirable to restore the old logging 
> configuration that was used before launching the test to avoid affecting the 
> overall logging output.
> The goal of this issue is to introduce a convenient & declarative way of 
> running tests with log4j2 configurations based on Jupiter extensions and 
> annotations. The test could like below:
> {code:java}
>   @Test
>   @Log4jConfig("test-log4j2.properties")
>   void testUseExplicitConfig() {
> // Do something and assert
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
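At its core, a Jupiter extension like the proposed @Log4jConfig could swap the log4j2.configurationFile system property before the test and restore the previous value afterwards. The sketch below shows only that save/restore mechanic in plain Java; a real extension would implement BeforeEachCallback/AfterEachCallback and also force a LoggerContext reconfigure, and all names here are illustrative, not part of the actual patch:

```java
public class Log4jConfigDemo {
    // Previous value of the property, saved so it can be restored
    // after the test, as the issue description requires.
    static String previous;

    // Point log4j2 at the configuration resource for the duration of a test.
    static void useConfig(String resource) {
        previous = System.getProperty("log4j2.configurationFile");
        System.setProperty("log4j2.configurationFile", resource);
    }

    // Restore the logging configuration that was in place before the test.
    static void restoreConfig() {
        if (previous == null) {
            System.clearProperty("log4j2.configurationFile");
        } else {
            System.setProperty("log4j2.configurationFile", previous);
        }
    }

    public static void main(String[] args) {
        useConfig("test-log4j2.properties");
        System.out.println(System.getProperty("log4j2.configurationFile"));
        restoreConfig();
    }
}
```

The annotation-driven form in the issue would wrap this logic so each @Log4jConfig-annotated test gets the swap and restore automatically.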


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531545
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 22:24
Start Date: 05/Jan/21 22:24
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552235108



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3871,15 +3870,16 @@ private static void 
populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 
 // HiveServer2 auth configuration
 HIVE_SERVER2_AUTHENTICATION("hive.server2.authentication", "NONE",
-  new StringSet("NOSASL", "NONE", "LDAP", "KERBEROS", "PAM", "CUSTOM"),
+  new StringSet("NOSASL", "NONE", "LDAP", "KERBEROS", "PAM", "CUSTOM", 
"SAML"),
 "Client authentication types.\n" +
 "  NONE: no authentication check\n" +
 "  LDAP: LDAP/AD based authentication\n" +
 "  KERBEROS: Kerberos/GSSAPI authentication\n" +
 "  CUSTOM: Custom authentication provider\n" +
 "  (Use with property 
hive.server2.custom.authentication.class)\n" +
 "  PAM: Pluggable authentication module\n" +
-"  NOSASL:  Raw transport"),
+"  NOSASL:  Raw transport\n" +
+"  SAML2: SAML 2.0 compliant authentication. This is only supported in 
http transport mode."),

Review comment:
   Fixed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531545)
Time Spent: 1h 50m  (was: 1h 40m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.





[jira] [Work logged] (HIVE-24415) HiveSplitGenerator blocks Tez dispatcher

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24415?focusedWorklogId=531526=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531526
 ]

ASF GitHub Bot logged work on HIVE-24415:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 21:50
Start Date: 05/Jan/21 21:50
Worklog Time Spent: 10m 
  Work Description: ramesh0201 commented on pull request #1701:
URL: https://github.com/apache/hive/pull/1701#issuecomment-754923611


   +1. LGTM.





Issue Time Tracking
---

Worklog Id: (was: 531526)
Time Spent: 1h 10m  (was: 1h)

> HiveSplitGenerator blocks Tez dispatcher
> 
>
> Key: HIVE-24415
> URL: https://issues.apache.org/jira/browse/HIVE-24415
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HiveSplitGenerator does a lot of heavyweight operations in its constructor. 
> These operations block AsyncDispatcher in Tez 
> [https://github.com/apache/tez/blob/989d286d09cac7c4e4c5a0e06dd75ea5a6f15478/tez-dag/src/main/java/org/apache/tez/dag/app/dag/RootInputInitializerManager.java#L141]
>  . We should move bulk of initialization out of constructor.
> The only reason of setting up everything in constructor is 
> DynamicPartitionPruner. We can buffer incoming events in HiveSplitGenerator 
> until dynamic partition pruner is initialized.
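The buffering approach described above can be sketched as a small stand-alone example. This is a hypothetical illustration of the pattern (queue events that arrive before initialization, drain them once initialization completes), not the actual HiveSplitGenerator code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: events arriving before initialization are buffered,
// then replayed in arrival order once initialization completes.
public class EventBuffer {
  private final Queue<String> pending = new ArrayDeque<>();
  private boolean initialized = false;
  private int processed = 0;

  public synchronized void onEvent(String event) {
    if (!initialized) {
      pending.add(event);        // buffer until initialization completes
    } else {
      process(event);
    }
  }

  public synchronized void markInitialized() {
    initialized = true;
    while (!pending.isEmpty()) {
      process(pending.poll());   // replay buffered events in arrival order
    }
  }

  private void process(String event) {
    processed++;
  }

  public synchronized int processedCount() {
    return processed;
  }

  public static void main(String[] args) {
    EventBuffer b = new EventBuffer();
    b.onEvent("e1");             // arrives early, gets buffered
    b.onEvent("e2");
    b.markInitialized();         // drains the buffer
    b.onEvent("e3");             // processed immediately
    System.out.println(b.processedCount()); // prints 3
  }
}
```

This keeps the constructor cheap: the heavyweight setup can run asynchronously while early events are safely parked.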





[jira] [Work logged] (HIVE-24415) HiveSplitGenerator blocks Tez dispatcher

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24415?focusedWorklogId=531525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531525
 ]

ASF GitHub Bot logged work on HIVE-24415:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 21:50
Start Date: 05/Jan/21 21:50
Worklog Time Spent: 10m 
  Work Description: ramesh0201 commented on a change in pull request #1701:
URL: https://github.com/apache/hive/pull/1701#discussion_r552220196



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DynamicPartitionPruner.java
##
@@ -100,37 +99,19 @@
 
   private int sourceInfoCount = 0;
 
-  private final Object endOfEvents = new Object();
-
   private int totalEventCount = 0;
 
-  public DynamicPartitionPruner(InputInitializerContext context, MapWork work, 
JobConf jobConf) throws
-  SerDeException {
-this.context = context;
-this.work = work;
-this.jobConf = jobConf;
-synchronized (this) {
-  initialize();
+  public void prune() throws SerDeException, IOException, 
InterruptedException, HiveException {
+if (sourcesWaitingForEvents.isEmpty()) {
+  return;
 }
-  }
-
-  public void prune()
-  throws SerDeException, IOException,
-  InterruptedException, HiveException {
 
-synchronized(sourcesWaitingForEvents) {

Review comment:
   Makes sense to me now. Thank you very much for helping me understand.







Issue Time Tracking
---

Worklog Id: (was: 531525)
Time Spent: 1h  (was: 50m)

> HiveSplitGenerator blocks Tez dispatcher
> 
>
> Key: HIVE-24415
> URL: https://issues.apache.org/jira/browse/HIVE-24415
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> HiveSplitGenerator does a lot of heavyweight operations in its constructor. 
> These operations block AsyncDispatcher in Tez 
> [https://github.com/apache/tez/blob/989d286d09cac7c4e4c5a0e06dd75ea5a6f15478/tez-dag/src/main/java/org/apache/tez/dag/app/dag/RootInputInitializerManager.java#L141]
>  . We should move bulk of initialization out of constructor.
> The only reason of setting up everything in constructor is 
> DynamicPartitionPruner. We can buffer incoming events in HiveSplitGenerator 
> until dynamic partition pruner is initialized.





[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531433=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531433
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 19:34
Start Date: 05/Jan/21 19:34
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552149040



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -896,39 +946,103 @@ private void openSession() throws SQLException {
   openReq.setPassword(sessConfMap.get(JdbcConnectionParams.AUTH_PASSWD));
 }
 
+//TODO(Vihang): This is a bit hacky. We piggy back on a dummy OpenSession 
call
+// to get the redirect response from the server. Instead its probably 
cleaner to
+// explicitly do a HTTP post request and get the response.
+int numRetry = isBrowserAuthMode() ? 2 : 1;
+for (int i=0; i serverHiveConf = openResp.getConfiguration();
+  @VisibleForTesting
+  public IJdbcBrowserClient getBrowserClient() {
+return browserClient;
+  }
 
-  updateServerHiveConf(serverHiveConf, connParams);
+  private void openSession(TOpenSessionReq openReq) throws TException, 
SQLException {
+TOpenSessionResp openResp = client.OpenSession(openReq);
 
-  // validate connection
-  Utils.verifySuccess(openResp.getStatus());
-  if (!supportedProtocols.contains(openResp.getServerProtocolVersion())) {
-throw new TException("Unsupported Hive2 protocol");
-  }
-  protocol = openResp.getServerProtocolVersion();
-  sessHandle = openResp.getSessionHandle();
-
-  final String serverFetchSizeString =
-  
openResp.getConfiguration().get(ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.varname);
-  if (serverFetchSizeString == null) {
-throw new IllegalStateException("Server returned a null default fetch 
size. Check that "
-+ 
ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.varname + " is 
configured correctly.");
-  }
+// Populate a given configuration from HS2 server HiveConf, only if that 
configuration
+// is not already present in Connection parameter HiveConf i.e., client 
side configuration
+// takes precedence over the server side configuration.
+Map<String, String> serverHiveConf = openResp.getConfiguration();
+
+updateServerHiveConf(serverHiveConf, connParams);
+
+// validate connection
+Utils.verifySuccess(openResp.getStatus());
+if (!supportedProtocols.contains(openResp.getServerProtocolVersion())) {
+  throw new TException("Unsupported Hive2 protocol");
+}
+protocol = openResp.getServerProtocolVersion();
+sessHandle = openResp.getSessionHandle();
 
-  this.defaultFetchSize = Integer.parseInt(serverFetchSizeString);
-  if (this.defaultFetchSize <= 0) {
-throw new IllegalStateException("Default fetch size must be greater 
than 0");
+final String serverFetchSizeString =
+
openResp.getConfiguration().get(ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.varname);
+if (serverFetchSizeString == null) {
+  throw new IllegalStateException("Server returned a null default fetch 
size. Check that "
+  + ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.varname 
+ " is configured correctly.");
+}
+
+this.defaultFetchSize = Integer.parseInt(serverFetchSizeString);
+if (this.defaultFetchSize <= 0) {
+  throw new IllegalStateException("Default fetch size must be greater than 
0");
+}
+  }
+
+  private boolean isSamlRedirect(TException e) {

Review comment:
   @vihangk1 Could you please add some comments to the method explaining why we 
look for 302 and 303 in the error message? This will make the logic easier to follow.







Issue Time Tracking
---

Worklog Id: (was: 531433)
Time Spent: 1h 40m  (was: 1.5h)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in 

[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531427
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 19:21
Start Date: 05/Jan/21 19:21
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552141828



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -896,39 +946,103 @@ private void openSession() throws SQLException {
   openReq.setPassword(sessConfMap.get(JdbcConnectionParams.AUTH_PASSWD));
 }
 
+//TODO(Vihang): This is a bit hacky. We piggy back on a dummy OpenSession 
call
+// to get the redirect response from the server. Instead its probably 
cleaner to
+// explicitly do a HTTP post request and get the response.
+int numRetry = isBrowserAuthMode() ? 2 : 1;
+for (int i=0; i serverHiveConf = openResp.getConfiguration();
+  @VisibleForTesting
+  public IJdbcBrowserClient getBrowserClient() {
+return browserClient;
+  }
 
-  updateServerHiveConf(serverHiveConf, connParams);
+  private void openSession(TOpenSessionReq openReq) throws TException, 
SQLException {
+TOpenSessionResp openResp = client.OpenSession(openReq);
 
-  // validate connection
-  Utils.verifySuccess(openResp.getStatus());
-  if (!supportedProtocols.contains(openResp.getServerProtocolVersion())) {
-throw new TException("Unsupported Hive2 protocol");
-  }
-  protocol = openResp.getServerProtocolVersion();
-  sessHandle = openResp.getSessionHandle();
-
-  final String serverFetchSizeString =
-  
openResp.getConfiguration().get(ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.varname);
-  if (serverFetchSizeString == null) {
-throw new IllegalStateException("Server returned a null default fetch 
size. Check that "
-+ 
ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.varname + " is 
configured correctly.");
-  }
+// Populate a given configuration from HS2 server HiveConf, only if that 
configuration

Review comment:
   It's not clear what the change is here. The proposed code looks similar 
to the existing code in the OpenSession() method. What has changed?







Issue Time Tracking
---

Worklog Id: (was: 531427)
Time Spent: 1.5h  (was: 1h 20m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.





[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531398=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531398
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 18:21
Start Date: 05/Jan/21 18:21
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552110846



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -896,39 +946,103 @@ private void openSession() throws SQLException {
   openReq.setPassword(sessConfMap.get(JdbcConnectionParams.AUTH_PASSWD));
 }
 
+//TODO(Vihang): This is a bit hacky. We piggy back on a dummy OpenSession 
call
+// to get the redirect response from the server. Instead its probably 
cleaner to
+// explicitly do a HTTP post request and get the response.
+int numRetry = isBrowserAuthMode() ? 2 : 1;
+for (int i=0; i Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.





[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531397=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531397
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 18:20
Start Date: 05/Jan/21 18:20
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552110364



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -896,39 +946,103 @@ private void openSession() throws SQLException {
   openReq.setPassword(sessConfMap.get(JdbcConnectionParams.AUTH_PASSWD));
 }
 
+//TODO(Vihang): This is a bit hacky. We piggy back on a dummy OpenSession 
call
+// to get the redirect response from the server. Instead its probably 
cleaner to
+// explicitly do a HTTP post request and get the response.
+int numRetry = isBrowserAuthMode() ? 2 : 1;
+for (int i=0; i Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.





[jira] [Commented] (HIVE-24574) Add DIAGNOSE Statement

2021-01-05 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259102#comment-17259102
 ] 

David Mollitor commented on HIVE-24574:
---

Results could form a huge result set, so it's not always practical to collect 
them; I therefore propose making that part optional.

> Add DIAGNOSE Statement
> --
>
> Key: HIVE-24574
> URL: https://issues.apache.org/jira/browse/HIVE-24574
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>
> Add a new statement to Hive called {{DIAGNOSE}}
> {code:sql}
> DIAGNOSE [WITH (PARQUET|ORC|JSON|AVRO) RESULTS] ...
> {code}
> Returns a single binary (BLOB) column which contains a TAR-GZ file comprised 
> of several other files:
> * A JSON file containing HS2 version information, HS2 host name, date of 
> query submission, query id(s), etc.
> * The query itself (file name is MD5 of the query)
> * EXPLAIN plan (file name is MD5 of the explain plan)
> * SHOW CREATE for each table in the query ()
> * The configuration of the session (set)
> * The Hive logs generated by the query
> * The processing engine logs generated by the query
> * Any counters associated with the processing engine
> * Optionally, the results of the query in a single file (file name is MD5 of 
> the results)
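Several of the proposed file names are the MD5 of the file's contents. A minimal sketch of deriving such a content-addressed name with the JDK's MessageDigest follows; the helper name and its use are hypothetical illustrations, not part of the proposal.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derive an MD5-hex file name for an artifact such as
// the query text or the EXPLAIN output, as the proposal suggests.
public class DiagnoseFileName {
  public static String md5Hex(String content) throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
    // Left-pad to 32 hex characters so leading zero bytes are preserved.
    return String.format("%032x", new BigInteger(1, digest));
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    // e.g. the query file inside the TAR-GZ could be named after its content.
    System.out.println(md5Hex("SELECT 1") + ".sql");
  }
}
```

Content-addressed names make the archive self-describing: the same query or plan always maps to the same file name.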





[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531390=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531390
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 18:11
Start Date: 05/Jan/21 18:11
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552105043



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -896,39 +946,103 @@ private void openSession() throws SQLException {
   openReq.setPassword(sessConfMap.get(JdbcConnectionParams.AUTH_PASSWD));
 }
 
+//TODO(Vihang): This is a bit hacky. We piggy back on a dummy OpenSession 
call

Review comment:
   TODO: If there is nothing to do, should we remove your name from the comment?







Issue Time Tracking
---

Worklog Id: (was: 531390)
Time Spent: 1h  (was: 50m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.





[jira] [Commented] (HIVE-24574) Add DIAGNOSE Statement

2021-01-05 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259100#comment-17259100
 ] 

David Mollitor commented on HIVE-24574:
---

Keep in mind that a query can be made up of nested views, so each 'table' in 
the query needs a SHOW CREATE for itself and for all of the sub-views/tables 
that compose it.

> Add DIAGNOSE Statement
> --
>
> Key: HIVE-24574
> URL: https://issues.apache.org/jira/browse/HIVE-24574
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>
> Add a new statement to Hive called {{DIAGNOSE}}
> {code:sql}
> DIAGNOSE [WITH (PARQUET|ORC|JSON|AVRO) RESULTS] ...
> {code}
> Returns a single binary (BLOB) column which contains a TAR-GZ file comprised 
> of several other files:
> * A JSON file containing HS2 version information, HS2 host name, date of 
> query submission, query id(s), etc.
> * The query itself (file name is MD5 of the query)
> * EXPLAIN plan (file name is MD5 of the explain plan)
> * SHOW CREATE for each table in the query ()
> * The configuration of the session (set)
> * The Hive logs generated by the query
> * The processing engine logs generated by the query
> * Any counters associated with the processing engine
> * Optionally, the results of the query in a single file (file name is MD5 of 
> the results)





[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531389=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531389
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 18:10
Start Date: 05/Jan/21 18:10
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552104632



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -592,6 +638,10 @@ public boolean retryRequest(IOException exception, int 
executionCount, HttpContext
   }
 });
 
+if (isBrowserAuthMode()) {
+  httpClientBuilder

Review comment:
   Wouldn't httpClientBuilder be null if cookieAuth is disabled? We could 
use http transport with cookieAuth disabled, right?







Issue Time Tracking
---

Worklog Id: (was: 531389)
Time Spent: 50m  (was: 40m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.





[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531388
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 18:07
Start Date: 05/Jan/21 18:07
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552102961



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -476,12 +511,18 @@ private String getServerHttpUrl(boolean useSsl) {
   private TTransport createHttpTransport() throws SQLException, 
TTransportException {
 CloseableHttpClient httpClient;
 boolean useSsl = isSslConnection();
-// Create an http client from the configs
+validateSslForBrowserMode();
 httpClient = getHttpClient(useSsl);
 transport = new THttpClient(getServerHttpUrl(useSsl), httpClient);
 return transport;
   }
 
+  protected void validateSslForBrowserMode() throws SQLException {
+if (isBrowserAuthMode() && !isSslConnection()) {
+  throw new SQLException("Browser mode is only supported with SSL is 
enabled");

Review comment:
   I understand why this is done, but it feels odd that this throws a 
SQLException. Can we throw something else here and then wrap it in a 
SQLException in the calling method? Even a TTransportException would make sense 
over SQLException (though it is not thrown from thrift).







Issue Time Tracking
---

Worklog Id: (was: 531388)
Time Spent: 40m  (was: 0.5h)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.





[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531384
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 18:02
Start Date: 05/Jan/21 18:02
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552100514



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -301,10 +326,20 @@ public HiveConnection(String uri, Properties info) throws 
SQLException {
 supportedProtocols.add(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9);
 supportedProtocols.add(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V10);
 
+if (isBrowserAuthMode()) {
+  try {
+browserClient = browserClientFactory.create(connParams);
+  } catch (HiveJdbcBrowserException e) {
+throw new SQLException("");
+  }
+} else {
+  browserClient = null;
+}
 if (isEmbeddedMode) {
   client = EmbeddedCLIServicePortal.get(connParams.getHiveConfs());
   connParams.getHiveConfs().clear();
   // open client session
+  // TODO(Vihang) need to throw here if saml auth?

Review comment:
   TODO item 







Issue Time Tracking
---

Worklog Id: (was: 531384)
Time Spent: 0.5h  (was: 20m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With cloud based deployments, having a SAML 2.0 based authentication support 
> in HS2 will be greatly useful in case of federated or external identity 
> providers like Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be only supported on http 
> transport mode in HiveServer2 since the SAML 2.0 protocol is primarily 
> designed for web clients.





[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=531376=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531376
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 17:42
Start Date: 05/Jan/21 17:42
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1791:
URL: https://github.com/apache/hive/pull/1791#discussion_r552088626



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3871,15 +3870,16 @@ private static void 
populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
 // HiveServer2 auth configuration
 HIVE_SERVER2_AUTHENTICATION("hive.server2.authentication", "NONE",
-  new StringSet("NOSASL", "NONE", "LDAP", "KERBEROS", "PAM", "CUSTOM"),
+  new StringSet("NOSASL", "NONE", "LDAP", "KERBEROS", "PAM", "CUSTOM", 
"SAML"),
 "Client authentication types.\n" +
 "  NONE: no authentication check\n" +
 "  LDAP: LDAP/AD based authentication\n" +
 "  KERBEROS: Kerberos/GSSAPI authentication\n" +
 "  CUSTOM: Custom authentication provider\n" +
 "  (Use with property 
hive.server2.custom.authentication.class)\n" +
 "  PAM: Pluggable authentication module\n" +
-"  NOSASL:  Raw transport"),
+"  NOSASL:  Raw transport\n" +
+"  SAML2: SAML 2.0 compliant authentication. This is only supported in 
http transport mode."),

Review comment:
   nit: SAML2 should be SAML to match the value in the set





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531376)
Time Spent: 20m  (was: 10m)

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With cloud-based deployments, SAML 2.0-based authentication support in HS2 
> would be greatly useful for federated or external identity providers such as 
> Okta, PingIdentity, or Azure AD.
> This authentication mechanism can initially be supported only in HTTP 
> transport mode in HiveServer2, since the SAML 2.0 protocol is primarily 
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24394) Enable printing explain to console at query start

2021-01-05 Thread Johan Gustavsson (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Gustavsson reassigned HIVE-24394:
---

Assignee: Zoltan Haindrich  (was: Jesus Camacho Rodriguez)

> Enable printing explain to console at query start
> -
>
> Key: HIVE-24394
> URL: https://issues.apache.org/jira/browse/HIVE-24394
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor
>Affects Versions: 2.3.7, 3.1.2
>Reporter: Johan Gustavsson
>Assignee: Zoltan Haindrich
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently there is a hive.log.explain.output option that prints extended 
> explain to log. While this is helpful for internal investigations, it limits 
> the information that is available to users. So we should add options to make 
> this print non-extended explain to the console, for general user consumption, 
> to make it easier for users to debug queries and workflows without having to 
> resubmit queries with explain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24386) Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24386?focusedWorklogId=531371=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531371
 ]

ASF GitHub Bot logged work on HIVE-24386:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 17:31
Start Date: 05/Jan/21 17:31
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1694:
URL: https://github.com/apache/hive/pull/1694#discussion_r552082468



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestTablesGetExists.java
##
@@ -555,6 +546,74 @@ public void 
testGetTableObjectsWithProjectionOfMultiValuedFields() throws Except
 }
   }
 
+  @Test
+  public void testGetTableProjectionSpecification() throws Exception {
+List<String> tableNames = new ArrayList<>();
+tableNames.add(testTables[0].getTableName());
+tableNames.add(testTables[1].getTableName());
+
+GetProjectionsSpec projectSpec = (new GetTableProjectionsSpecBuilder())
+.includeTableName()
+.includeDatabase()
+.includeSdCdColsName()
+.includeSdCdColsType()
+.includeSdCdColsComment()
+.includeSdLocation()
+.includeSdInputFormat()
+.includeSdOutputFormat()
+.includeSdIsCompressed()
+.includeSdNumBuckets()
+.includeSdSerDeInfoName()
+.includeSdSerDeInfoSerializationLib()
+.includeSdSerDeInfoParameters()
+.includeSdSerDeInfoDescription()
+.includeSdSerDeInfoSerializerClass()
+.includeSdSerDeInfoDeserializerClass()
+.includeSdSerDeInfoSerdeType()
+.includeSdBucketCols()
+.includeSdSortColsCol()
+.includeSdSortColsOrder()
+.includeSdparameters()
+.includeSdSkewedColNames()
+.includeSdSkewedColValues()
+.includeSdSkewedColValueLocationMaps()
+.includeSdIsStoredAsSubDirectories()
+.includeOwner()
+.includeOwnerType()
+.includeCreateTime()
+.includeLastAccessTime()
+.includeRetention()
+.includePartitionKeysName()
+.includePartitionKeysType()
+.includePartitionKeysComment()
+.includeParameters()
+.includeViewOriginalText()
+.includeRewriteEnabled()
+.includeTableType()
+.build();
+
+List<Table> tables = client.getTables(null, DEFAULT_DATABASE, tableNames, 
projectSpec);
+
+Assert.assertEquals("Found tables", 2, tables.size());
+
+for(Table table : tables) {
+  Assert.assertTrue(table.isSetDbName());
+  Assert.assertTrue(table.isSetCatName());
+  Assert.assertTrue(table.isSetTableName());
+  Assert.assertTrue(table.isSetLastAccessTime());
+  Assert.assertTrue(table.isSetSd());
+  StorageDescriptor sd = table.getSd();
+  Assert.assertTrue(sd.isSetCols());
+  Assert.assertTrue(sd.isSetSerdeInfo());
+  Assert.assertTrue(sd.isSetBucketCols());
+  Assert.assertTrue(sd.isSetCompressed());
+  Assert.assertTrue(sd.isSetInputFormat());
+  Assert.assertTrue(sd.isSetSerdeInfo());
+  SerDeInfo serDeInfo = sd.getSerdeInfo();
+  Assert.assertTrue(serDeInfo.isSetSerializationLib());
+}
+  }
+

Review comment:
   Could you please add a negative test here as well? Include only a couple 
of fields on the projection spec and test to ensure that some of the other 
fields are not set on the returned Table objects? Thanks





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531371)
Time Spent: 40m  (was: 0.5h)

> Add builder methods for GetTablesRequest and GetPartitionsRequest to 
> HiveMetaStoreClient
> 
>
> Key: HIVE-24386
> URL: https://issues.apache.org/jira/browse/HIVE-24386
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Builder methods for GetTablesRequest and GetPartitionsRequest should be added 
> to the HiveMetaStoreClient class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
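The builder methods this issue requests can be sketched generically. `ToyGetTablesRequest` and its builder below are hypothetical simplifications, not the actual HiveMetaStoreClient API; they only show the fluent-with-defaults shape the thread discusses, where callers set just the fields they need:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simplification of a GetTablesRequest-style value object.
class ToyGetTablesRequest {
    final String catName;
    final String dbName;
    final List<String> tableNames;

    ToyGetTablesRequest(String catName, String dbName, List<String> tableNames) {
        this.catName = catName;
        this.dbName = dbName;
        this.tableNames = tableNames;
    }
}

// Fluent builder with defaults, so callers avoid long positional constructors.
class ToyGetTablesRequestBuilder {
    private String catName = "hive";     // assumed default catalog name
    private String dbName = "default";
    private final List<String> tableNames = new ArrayList<>();

    ToyGetTablesRequestBuilder catName(String c) { this.catName = c; return this; }
    ToyGetTablesRequestBuilder dbName(String d) { this.dbName = d; return this; }
    ToyGetTablesRequestBuilder addTable(String t) { this.tableNames.add(t); return this; }

    ToyGetTablesRequest build() {
        // Defensive copy so the built request is immutable.
        return new ToyGetTablesRequest(catName, dbName,
                Collections.unmodifiableList(new ArrayList<>(tableNames)));
    }
}
```

A caller would then write `new ToyGetTablesRequestBuilder().dbName("db1").addTable("t1").build()` instead of threading every field through a constructor.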


[jira] [Work logged] (HIVE-24386) Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24386?focusedWorklogId=531369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531369
 ]

ASF GitHub Bot logged work on HIVE-24386:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 17:28
Start Date: 05/Jan/21 17:28
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1694:
URL: https://github.com/apache/hive/pull/1694#discussion_r552081081



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/client/builder/GetPartitionsRequestBuilder.java
##
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore.client.builder;
+
+import org.apache.hadoop.hive.metastore.api.GetPartitionsFilterSpec;
+import org.apache.hadoop.hive.metastore.api.GetPartitionsRequest;
+import org.apache.hadoop.hive.metastore.api.GetProjectionsSpec;
+
+import java.util.List;
+
+public class GetPartitionsRequestBuilder {
+private String catName = null;
+private String dbName = null;
+private String tblName = null;
+private boolean withAuth = true;
+private String user = null;
+private List<String> groupNames = null;
+private GetProjectionsSpec projectionSpec = null;
+private GetPartitionsFilterSpec filterSpec = null;
+private List<String> processorCapabilities = null;
+private String processorIdentifier = null;
+private String validWriteIdList = null;
+
+public GetPartitionsRequestBuilder(String catName, String dbName, String 
tblName, boolean withAuth, String user,
+   List<String> groupNames, 
GetProjectionsSpec projectionSpec,
+   GetPartitionsFilterSpec filterSpec, 
List<String> processorCapabilities,
+   String processorIdentifier, String 
validWriteIdList) {
+this.catName = catName;
+this.dbName = dbName;
+this.tblName = tblName;
+this.withAuth = withAuth;
+this.user = user;
+this.groupNames = groupNames;
+this.projectionSpec = projectionSpec;
+this.filterSpec = filterSpec;
+this.processorCapabilities = processorCapabilities;

Review comment:
   the processorCapabilities and the processorIdentifier are meant to be 
static values for a given HMSClient instance. They are not meant to 
change between different calls, so having them on the builder class might 
be confusing. Should we remove them? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531369)
Time Spent: 0.5h  (was: 20m)

> Add builder methods for GetTablesRequest and GetPartitionsRequest to 
> HiveMetaStoreClient
> 
>
> Key: HIVE-24386
> URL: https://issues.apache.org/jira/browse/HIVE-24386
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Builder methods for GetTablesRequest and GetPartitionsRequest should be added 
> to the HiveMetaStoreClient class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24386) Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24386?focusedWorklogId=531357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531357
 ]

ASF GitHub Bot logged work on HIVE-24386:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 17:18
Start Date: 05/Jan/21 17:18
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1694:
URL: https://github.com/apache/hive/pull/1694#discussion_r552074646



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/TableFields.java
##
@@ -58,9 +60,9 @@
 
 private static final ImmutableSet<String> allMultiValuedFields = new 
ImmutableSet.Builder<String>()
 .add("values")
-.add("sd.cols.name")
-.add("sd.cols.type")
-.add("sd.cols.comment")
+.add("sd.cd.cols.name")

Review comment:
   @vnhive StorageDescriptor seems to have a variable "cols" but not the 
actual ColumnDescriptor "cd". How does this work? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531357)
Time Spent: 20m  (was: 10m)

> Add builder methods for GetTablesRequest and GetPartitionsRequest to 
> HiveMetaStoreClient
> 
>
> Key: HIVE-24386
> URL: https://issues.apache.org/jira/browse/HIVE-24386
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Builder methods for GetTablesRequest and GetPartitionsRequest should be added 
> to the HiveMetaStoreClient class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24586) Rename compaction "attempted" status

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24586:
--
Labels: pull-request-available  (was: )

> Rename compaction "attempted" status
> 
>
> Key: HIVE-24586
> URL: https://issues.apache.org/jira/browse/HIVE-24586
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A compaction with "attempted" status sounds like the compactor tried to compact 
> the table/partition and failed. In reality it means one of these:
>  * the Initiator did not queue compaction because the number of previously 
> failed compactions has passed a threshold
>  * the Initiator did not queue compaction because of an error
> In both these cases the user is still able to initiate compaction manually. This 
> should be made clearer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24586) Rename compaction "attempted" status

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24586?focusedWorklogId=531347=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531347
 ]

ASF GitHub Bot logged work on HIVE-24586:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 17:10
Start Date: 05/Jan/21 17:10
Worklog Time Spent: 10m 
  Work Description: klcopp opened a new pull request #1831:
URL: https://github.com/apache/hive/pull/1831


   ### What changes were proposed in this pull request?
   Rename compaction 'attempted' status to 'did not initiate'
   
   
   ### Why are the changes needed?
   See HIVE-24587.
   
   ### Does this PR introduce _any_ user-facing change?
   SHOW COMPACTIONS and the sys.COMPACTIONS table will now list the "attempted" 
status as "did not initiate".
   
   ### How was this patch tested?
   Unit test for case of initiator failure.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531347)
Remaining Estimate: 0h
Time Spent: 10m

> Rename compaction "attempted" status
> 
>
> Key: HIVE-24586
> URL: https://issues.apache.org/jira/browse/HIVE-24586
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A compaction with "attempted" status sounds like the compactor tried to compact 
> the table/partition and failed. In reality it means one of these:
>  * the Initiator did not queue compaction because the number of previously 
> failed compactions has passed a threshold
>  * the Initiator did not queue compaction because of an error
> In both these cases the user is still able to initiate compaction manually. This 
> should be made clearer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24519) Optimize MV: Materialized views should not rebuild when tables are not modified

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24519?focusedWorklogId=531335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531335
 ]

ASF GitHub Bot logged work on HIVE-24519:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 16:48
Start Date: 05/Jan/21 16:48
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1772:
URL: https://github.com/apache/hive/pull/1772#discussion_r552045065



##
File path: 
ql/src/test/results/clientnegative/materialized_view_authorization_rebuild_no_grant.q.out
##
@@ -33,4 +33,4 @@ POSTHOOK: type: CREATE_MATERIALIZED_VIEW
 POSTHOOK: Input: default@amvrng_table
 POSTHOOK: Output: database:default
 POSTHOOK: Output: default@amvrng_mat_view
-FAILED: HiveAccessControlException Permission denied: Principal [name=user1, 
type=USER] does not have following privileges for operation QUERY [[INSERT, 
DELETE] on Object [type=TABLE_OR_VIEW, name=default.amvrng_mat_view, 
action=INSERT_OVERWRITE]]
+FAILED: HiveAccessControlException Permission denied: Principal [name=user1, 
type=USER] does not have following privileges for operation 
ALTER_MATERIALIZED_VIEW_REBUILD [[OBJECT OWNERSHIP] on Object 
[type=TABLE_OR_VIEW, name=default.amvrng_mat_view, action=INSERT_OVERWRITE]]

Review comment:
   It seems we need object ownership for rebuild now (this is probably 
because of the changes in Operation2Privilege.java). We should mimic 
HiveOperationType.QUERY privileges instead of other ALTER statements (we 
probably need SELECT, INSERT, and DELETE on the MV).

##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -28,11 +28,14 @@
 import com.google.common.util.concurrent.MoreExecutors;
 import com.google.common.util.concurrent.ThreadFactoryBuilder;
 
+import static 
org.apache.hadoop.hive.conf.Constants.MATERIALIZED_VIEW_REWRITING_TIME_WINDOW;
 import static 
org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_STORAGE;
 import static 
org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.getDefaultCatalog;
 import static org.apache.hadoop.hive.ql.io.AcidUtils.getFullTableName;
 import static 
org.apache.hadoop.hive.ql.metadata.HiveRelOptMaterialization.RewriteAlgorithm.CALCITE;
 import static 
org.apache.hadoop.hive.ql.metadata.HiveRelOptMaterialization.RewriteAlgorithm.ALL;
+import static 
org.apache.hadoop.hive.ql.metadata.HiveRelOptMaterialization.RewriteAlgorithm.CALCITE;

Review comment:
   nit. repeated import

##
File path: 
ql/src/test/results/clientpositive/llap/insert1_overwrite_partitions.q.out
##
@@ -198,9 +198,7 @@ PREHOOK: type: QUERY
 POSTHOOK: query: EXPLAIN INSERT OVERWRITE TABLE destinTable PARTITION 
(ds='2011-11-11', hr='11') if not exists
 SELECT one,two FROM sourceTable WHERE ds='2011-11-11' AND hr='12' order by one 
desc, two desc limit 5
 POSTHOOK: type: QUERY
-STAGE DEPENDENCIES:

Review comment:
   I'm going to back down from this change. It seems there were occasions 
where this was empty. Let's just keep it as it was for backwards compatibility. 
If we want to remove them, we can do it in a separate JIRA. Sorry about the 
confusion.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531335)
Time Spent: 1h 50m  (was: 1h 40m)

> Optimize MV: Materialized views should not rebuild when tables are not 
> modified
> ---
>
> Key: HIVE-24519
> URL: https://issues.apache.org/jira/browse/HIVE-24519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> e.g
> {noformat}
> create materialized view c_c_address as 
> select c_customer_sk from customer c, customer_address ca where 
> c_current_addr_sk = ca.ca_address_id;
> ALTER MATERIALIZED VIEW c_c_address REBUILD; <-- This shouldn't trigger 
> rebuild, when source tables are not modified
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24337) Cache delete delta files in LLAP cache

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24337?focusedWorklogId=531311=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531311
 ]

ASF GitHub Bot logged work on HIVE-24337:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 15:50
Start Date: 05/Jan/21 15:50
Worklog Time Spent: 10m 
  Work Description: szlta commented on pull request #1776:
URL: https://github.com/apache/hive/pull/1776#issuecomment-754720731


   Tested with hive.llap.io.cache.deletedeltas=metadata.
   All tests passed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531311)
Time Spent: 4.5h  (was: 4h 20m)

> Cache delete delta files in LLAP cache
> --
>
> Key: HIVE-24337
> URL: https://issues.apache.org/jira/browse/HIVE-24337
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> HIVE-23824 added the functionality of caching metadata part of orc files in 
> LLAP cache, so that ACID reads can be faster. However the content itself 
> still needs to be read in every single time. If this could be cached too, 
> additional time could be saved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24585) NPE in VectorizedOrcAcidRowBatchReader if LLAP is used with IO disabled

2021-01-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-24585:
--
Status: Patch Available  (was: In Progress)

> NPE in VectorizedOrcAcidRowBatchReader if LLAP is used with IO disabled
> ---
>
> Key: HIVE-24585
> URL: https://issues.apache.org/jira/browse/HIVE-24585
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> NPE is thrown when LLAP mode is turned on and an LLAP daemon executes a query 
> on an ACID table while LLAP IO is disabled. Although this doesn't seem to be a 
> very useful LLAP environment setup, we'll need to cover this edge case too.
> {code:java}
> Caused by: java.lang.RuntimeException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
>   at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
>   at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:543)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   ... 15 more
> Caused by: java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:431)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
>   ... 26 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:680)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.findMinMaxKeys(VectorizedOrcAcidRowBatchReader.java:426)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:273)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:159)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:154)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:2074)
>   at 
> org.apache.hadoop.hive.ql.io.RecordReaderWrapper.create(RecordReaderWrapper.java:72)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:428)
>   ... 27 more {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24585) NPE in VectorizedOrcAcidRowBatchReader if LLAP is used with IO disabled

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24585?focusedWorklogId=531310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531310
 ]

ASF GitHub Bot logged work on HIVE-24585:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 15:48
Start Date: 05/Jan/21 15:48
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #1830:
URL: https://github.com/apache/hive/pull/1830


   NPE is thrown when LLAP mode is turned on and an LLAP daemon executes a query 
on an ACID table while LLAP IO is disabled. Although this doesn't seem to be a 
very useful LLAP environment setup, we'll need to cover this edge case too.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531310)
Remaining Estimate: 0h
Time Spent: 10m

> NPE in VectorizedOrcAcidRowBatchReader if LLAP is used with IO disabled
> ---
>
> Key: HIVE-24585
> URL: https://issues.apache.org/jira/browse/HIVE-24585
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> NPE is thrown when LLAP mode is turned on and an LLAP daemon executes a query 
> on an ACID table while LLAP IO is disabled. Although this doesn't seem to be a 
> very useful LLAP environment setup, we'll need to cover this edge case too.
> {code:java}
> Caused by: java.lang.RuntimeException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
>   at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
>   at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:543)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   ... 15 more
> Caused by: java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:431)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
>   ... 26 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:680)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.findMinMaxKeys(VectorizedOrcAcidRowBatchReader.java:426)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:273)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:159)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:154)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:2074)
>   at 
> org.apache.hadoop.hive.ql.io.RecordReaderWrapper.create(RecordReaderWrapper.java:72)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:428)
>   ... 27 more {code}



[jira] [Updated] (HIVE-24585) NPE in VectorizedOrcAcidRowBatchReader if LLAP is used with IO disabled

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24585:
--
Labels: pull-request-available  (was: )

> NPE in VectorizedOrcAcidRowBatchReader if LLAP is used with IO disabled
> ---
>
> Key: HIVE-24585
> URL: https://issues.apache.org/jira/browse/HIVE-24585
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> NPE is thrown when LLAP mode is turned on and an LLAP daemon executes a query 
> on an ACID table while LLAP IO is disabled. Although this doesn't seem to be a 
> very useful LLAP environment setup, we'll need to cover this edge case too.
> {code:java}
> Caused by: java.lang.RuntimeException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
>   at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
>   at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:543)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   ... 15 more
> Caused by: java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:431)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
>   ... 26 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:680)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.findMinMaxKeys(VectorizedOrcAcidRowBatchReader.java:426)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:273)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:159)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:154)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:2074)
>   at 
> org.apache.hadoop.hive.ql.io.RecordReaderWrapper.create(RecordReaderWrapper.java:72)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:428)
>   ... 27 more {code}
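The NPE above originates in {{getOrcTail}}, which dereferences state that only exists when LLAP IO is enabled. A minimal sketch of the defensive fall-back pattern (the class, interface, and method names below are illustrative assumptions, not Hive's actual code):

```java
public class OrcTailGuardSketch {
    // Illustrative stand-in for the LLAP IO metadata cache; not Hive's API.
    interface TailCache {
        String getCachedTail(String path);
    }

    // When LLAP IO is disabled the cache handle is null, so fall back to a
    // direct read instead of dereferencing it (which is what triggers the NPE).
    static String getOrcTail(String path, TailCache cache) {
        if (cache != null) {
            String cached = cache.getCachedTail(path);
            if (cached != null) {
                return cached;
            }
        }
        return "tail-read-from-" + path; // stand-in for reading the file tail
    }
}
```

The guard costs nothing on the common path (cache present) and makes the "LLAP daemon with IO disabled" setup work instead of failing.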



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24587) DataFileReader is not closed in AvroGenericRecordReader#extractWriterProlepticFromMetadata

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24587:
--
Labels: pull-request-available  (was: )

> DataFileReader is not closed in 
> AvroGenericRecordReader#extractWriterProlepticFromMetadata 
> ---
>
> Key: HIVE-24587
> URL: https://issues.apache.org/jira/browse/HIVE-24587
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Same problem as HIVE-22981 appears in 
> AvroGenericRecordReader#extractWriterProlepticFromMetadata.
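The fix for this kind of leak is the standard try-with-resources pattern, sketched below with a stand-in reader class (the names are illustrative; Avro's real {{DataFileReader}} implements {{Closeable}}, so the same pattern applies):

```java
import java.io.Closeable;

public class ProlepticExtractSketch {
    // Stand-in for Avro's DataFileReader; illustrative only.
    static class FakeDataFileReader implements Closeable {
        boolean closed = false;
        String getMetaString(String key) { return "true"; }
        @Override public void close() { closed = true; }
    }

    // try-with-resources guarantees close() runs even if reading throws,
    // which is exactly what the leaking code was missing.
    static String extractWriterProleptic(FakeDataFileReader reader) {
        try (FakeDataFileReader r = reader) {
            return r.getMetaString("writer.time.proleptic");
        }
    }
}
```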



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24587) DataFileReader is not closed in AvroGenericRecordReader#extractWriterProlepticFromMetadata

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24587?focusedWorklogId=531305&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531305
 ]

ASF GitHub Bot logged work on HIVE-24587:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 15:34
Start Date: 05/Jan/21 15:34
Worklog Time Spent: 10m 
  Work Description: klcopp opened a new pull request #1829:
URL: https://github.com/apache/hive/pull/1829


   See HIVE-24587.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531305)
Remaining Estimate: 0h
Time Spent: 10m

> DataFileReader is not closed in 
> AvroGenericRecordReader#extractWriterProlepticFromMetadata 
> ---
>
> Key: HIVE-24587
> URL: https://issues.apache.org/jira/browse/HIVE-24587
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Same problem as HIVE-22981 appears in 
> AvroGenericRecordReader#extractWriterProlepticFromMetadata.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24587) DataFileReader is not closed in AvroGenericRecordReader#extractWriterProlepticFromMetadata

2021-01-05 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-24587:



> DataFileReader is not closed in 
> AvroGenericRecordReader#extractWriterProlepticFromMetadata 
> ---
>
> Key: HIVE-24587
> URL: https://issues.apache.org/jira/browse/HIVE-24587
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>
> Same problem as HIVE-22981 appears in 
> AvroGenericRecordReader#extractWriterProlepticFromMetadata.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24586) Rename compaction "attempted" status

2021-01-05 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-24586:



> Rename compaction "attempted" status
> 
>
> Key: HIVE-24586
> URL: https://issues.apache.org/jira/browse/HIVE-24586
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
>
> A compaction with "attempted" status sounds like the compactor tried to 
> compact the table/partition and failed. In reality it means one of these:
>  * the Initiator did not queue compaction because the number of previously 
> failed compactions has passed a threshold
>  * the Initiator did not queue compaction because of an error
> In both these cases the user is still able to initiate compaction manually. 
> This should be made clearer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24565) Implement standard trim function

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24565?focusedWorklogId=531237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531237
 ]

ASF GitHub Bot logged work on HIVE-24565:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 14:16
Start Date: 05/Jan/21 14:16
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1810:
URL: https://github.com/apache/hive/pull/1810#discussion_r551958137



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java
##
@@ -68,11 +82,24 @@ public Object evaluate(DeferredObject[] arguments) throws 
HiveException {
 if (valObject == null) {
   return null;
 }
-String val = ((Text) converter.convert(valObject)).toString();
+String val = stringToTrimConverter.convert(valObject).toString();
 if (val == null) {
   return null;
 }
-result.set(performOp(val.toString()));
+
+String trimChars = " ";

Review comment:
   I added support for the vectorized two-parameter version of the trim 
functions when the first parameter is a column and the second is a literal. 
Example: 
   ```
   create table t1 (col0 string);
   select trim(col0, 'xy') from t1 group by col0;
   ```
   In case the trim chars parameter is also a column, we fall back to the 
non-vectorized version of trim. I guess this use case is not as common as the 
previous one, but it can be implemented in a follow-up patch.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531237)
Time Spent: 50m  (was: 40m)

> Implement standard trim function
> 
>
> Key: HIVE-24565
> URL: https://issues.apache.org/jira/browse/HIVE-24565
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser, UDF
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code}
> <trim function> ::=
> TRIM <left paren> <trim operands> <right paren>
> <trim operands> ::=
> [ [ <trim specification> ] [ <trim character> ] FROM ] <trim source>
> <trim source> ::=
> <character value expression>
> <trim specification> ::=
> LEADING
> | TRAILING
> | BOTH
> <trim character> ::=
> <character value expression>
> {code}
> Example
> {code}
> SELECT TRIM(LEADING '0' FROM '000123');
> 123
> {code}
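The standard semantics quoted above can be sketched as a small reference model (illustrative only, not Hive's implementation; per the standard, the specification defaults to BOTH and the trim character to a space):

```java
public class StdTrimSketch {
    // Minimal model of SQL-standard TRIM; spec is LEADING, TRAILING, or BOTH.
    public static String trim(String spec, char trimChar, String src) {
        int start = 0;
        int end = src.length();
        // Strip matching characters from the front for LEADING/BOTH.
        if (spec.equals("LEADING") || spec.equals("BOTH")) {
            while (start < end && src.charAt(start) == trimChar) {
                start++;
            }
        }
        // Strip matching characters from the back for TRAILING/BOTH.
        if (spec.equals("TRAILING") || spec.equals("BOTH")) {
            while (end > start && src.charAt(end - 1) == trimChar) {
                end--;
            }
        }
        return src.substring(start, end);
    }
}
```

With this model, `trim("LEADING", '0', "000123")` yields `"123"`, matching the example in the ticket.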



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24532) Reduce sink vectorization mixes column types

2021-01-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman reassigned HIVE-24532:
---

Assignee: Mustafa İman

> Reduce sink vectorization mixes column types
> 
>
> Key: HIVE-24532
> URL: https://issues.apache.org/jira/browse/HIVE-24532
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
> Attachments: castexception.txt, explainplan.txt
>
>
> I run an insert overwrite select on a partitioned table. The partition column 
> is specified dynamically from the select query. The "ceil" function is 
> applied to a string column to determine the partition for each row. The 
> reduce sink gets confused about the type of the partition column, which leads 
> to the following cast exception at runtime:
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializePrimitiveWrite(VectorSerializeRow.java:452)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:279)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:258)
> at 
> org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkObjectHashOperator.processKey(VectorReduceSinkObjectHashOperator.java:305)
> ... 28 more
> {code}
> The problem is reproducible by running mvn test 
> -Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert0.q with "set 
> hive.stats.autogather=false". The additional config option causes insert 
> statements to be vectorized so the vectorization bug appears.
> insert0.q: 
> [https://github.com/apache/hive/blob/fb046c77257d648d0ee232356bdf665772b28bdd/ql/src/test/queries/clientpositive/insert0.q]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24532) Reduce sink vectorization mixes column types

2021-01-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258913#comment-17258913
 ] 

Mustafa İman commented on HIVE-24532:
-

Minimal reproducer:

CREATE TABLE src (key string COMMENT 'default') STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" INTO TABLE src;

set hive.stats.autogather=false;
set hive.explain.user=false;
set hive.exec.dynamic.partition=true;

create table ctas_part (key int) partitioned by (modkey bigint);

insert overwrite table ctas_part partition (modkey)
select key, ceil(key / 100) from src order by key limit 10;

> Reduce sink vectorization mixes column types
> 
>
> Key: HIVE-24532
> URL: https://issues.apache.org/jira/browse/HIVE-24532
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Priority: Major
> Attachments: castexception.txt, explainplan.txt
>
>
> I run an insert overwrite select on a partitioned table. The partition column 
> is specified dynamically from the select query. The "ceil" function is 
> applied to a string column to determine the partition for each row. The 
> reduce sink gets confused about the type of the partition column, which leads 
> to the following cast exception at runtime:
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializePrimitiveWrite(VectorSerializeRow.java:452)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:279)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:258)
> at 
> org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkObjectHashOperator.processKey(VectorReduceSinkObjectHashOperator.java:305)
> ... 28 more
> {code}
> The problem is reproducible by running mvn test 
> -Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert0.q with "set 
> hive.stats.autogather=false". The additional config option causes insert 
> statements to be vectorized so the vectorization bug appears.
> insert0.q: 
> [https://github.com/apache/hive/blob/fb046c77257d648d0ee232356bdf665772b28bdd/ql/src/test/queries/clientpositive/insert0.q]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24585) NPE in VectorizedOrcAcidRowBatchReader if LLAP is used with IO disabled

2021-01-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24585 started by Ádám Szita.
-
> NPE in VectorizedOrcAcidRowBatchReader if LLAP is used with IO disabled
> ---
>
> Key: HIVE-24585
> URL: https://issues.apache.org/jira/browse/HIVE-24585
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> An NPE is thrown if LLAP mode is turned on and an LLAP daemon executes a 
> query on an ACID table while LLAP IO is disabled. Although this doesn't seem 
> to be a very useful LLAP environment setup, we'll need to cover this edge 
> case too.
> {code:java}
> Caused by: java.lang.RuntimeException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
>   at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
>   at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:543)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   ... 15 more
> Caused by: java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:431)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
>   ... 26 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:680)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.findMinMaxKeys(VectorizedOrcAcidRowBatchReader.java:426)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:273)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:159)
>   at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:154)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:2074)
>   at 
> org.apache.hadoop.hive.ql.io.RecordReaderWrapper.create(RecordReaderWrapper.java:72)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:428)
>   ... 27 more {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24585) NPE in VectorizedOrcAcidRowBatchReader if LLAP is used with IO disabled

2021-01-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-24585:
--
Description: 
An NPE is thrown if LLAP mode is turned on and an LLAP daemon executes a query 
on an ACID table while LLAP IO is disabled. Although this doesn't seem to be a 
very useful LLAP environment setup, we'll need to cover this edge case too.
{code:java}
Caused by: java.lang.RuntimeException: java.io.IOException: 
java.lang.NullPointerException
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
at 
org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
at 
org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:543)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:189)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
... 15 more
Caused by: java.io.IOException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:431)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
... 26 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:680)
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.findMinMaxKeys(VectorizedOrcAcidRowBatchReader.java:426)
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:273)
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:159)
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.<init>(VectorizedOrcAcidRowBatchReader.java:154)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:2074)
at 
org.apache.hadoop.hive.ql.io.RecordReaderWrapper.create(RecordReaderWrapper.java:72)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:428)
... 27 more {code}

> NPE in VectorizedOrcAcidRowBatchReader if LLAP is used with IO disabled
> ---
>
> Key: HIVE-24585
> URL: https://issues.apache.org/jira/browse/HIVE-24585
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> An NPE is thrown if LLAP mode is turned on and an LLAP daemon executes a 
> query on an ACID table while LLAP IO is disabled. Although this doesn't seem 
> to be a very useful LLAP environment setup, we'll need to cover this edge 
> case too.
> {code:java}
> Caused by: java.lang.RuntimeException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
>   at 
> 

[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-05 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258905#comment-17258905
 ] 

Attila Magyar commented on HIVE-24584:
--

cc: [~srahman]

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following exception is thrown when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24585) NPE in VectorizedOrcAcidRowBatchReader if LLAP is used with IO disabled

2021-01-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita reassigned HIVE-24585:
-


> NPE in VectorizedOrcAcidRowBatchReader if LLAP is used with IO disabled
> ---
>
> Key: HIVE-24585
> URL: https://issues.apache.org/jira/browse/HIVE-24585
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24531) Vectorized table scan ignores binary column

2021-01-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258865#comment-17258865
 ] 

Mustafa İman commented on HIVE-24531:
-

This happens only when doing a vectorized scan over a table which was stored 
as TEXTFILE.

> Vectorized table scan ignores binary column
> ---
>
> Key: HIVE-24531
> URL: https://issues.apache.org/jira/browse/HIVE-24531
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Priority: Major
>
> There is a binary field in the over1k dataset in the Hive codebase. The 
> vectorized table scan ignores the binary field and passes it as null in all 
> rows. The issue also affects insert queries with external tables and managed 
> tables when "hive.stats.autogather=false". 
> To reproduce:
> Add "set hive.stats.autogather=false;" on top of "vector_data_types.q"
> Run mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=vector_data_types.q"
> Observe that "bin" column is all NULL when querying any of the tables.
>  
> Below is a simplified version of the same test:
> {code:java}
> set hive.mapred.mode=nonstrict;
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set hive.stats.autogather=false;
> DROP TABLE over1k_n8;
> DROP TABLE over1korc_n1;
> -- data setup
> CREATE TABLE over1k_n8(t tinyint,
>si smallint,
>i int,
>b bigint,
>f float,
>d double,
>bo boolean,
>s string,
>ts timestamp,
>`dec` decimal(4,2),
>bin binary)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../../data/files/over1k' OVERWRITE INTO TABLE 
> over1k_n8;
> analyze table over1k_n8 compute statistics;
> analyze table over1k_n8 compute statistics for columns;
> select * from over1k_n8 limit 10;
> select count(1) from over1k_n8 where bin is null;
> CREATE TABLE over1korc_n1(t tinyint,
>si smallint,
>i int,
>b bigint,
>f float,
>d double,
>bo boolean,
>s string,
>ts timestamp,
>`dec` decimal(4,2),
>bin binary)
> STORED AS ORC;
> explain vectorization detail
> INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;
> INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;
> select count(1) from over1korc_n1 where bin is null;
> select * from over1korc_n1 limit 10;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.6

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=531142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531142
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 11:57
Start Date: 05/Jan/21 11:57
Worklog Time Spent: 10m 
  Work Description: pgaref edited a comment on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-754556917


   > note: I tend to use distinct version numbers because snapshots might get 
cached and not updated - but that was an issue with the old ptest infra; I 
guess the current setup will handle that better...
   > 
   > anything will do which can serve a web page - I wanted to add 
https://raw.githubusercontent.com/pgaref/mave-repo/main/ it for you - however 
that page returns error 400 for everything...
   
   Hey Zoltan -- I noticed the 400 myself for listing and non-existing files 
(GitHub policy?) but maven pulling seems to work, for example, check 
https://raw.githubusercontent.com/pgaref/mave-repo/main/org/apache/orc/orc-core/maven-metadata-local.xml
   
   To be on the safe side though I created this Repsy public repo: 
https://repo.repsy.io/mvn/pgaref/repository
   Feel free to add this instead :) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531142)
Time Spent: 1h 50m  (was: 1h 40m)

> Upgrade ORC version to 1.6.6
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on the 1.5.X line, and in order to take advantage 
> of the latest ORC improvements, such as column encryption, we have to bump 
> to 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store file footers, 
> tails, and streams, and to un/compress RG data. As there were many internal 
> changes from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), 
> the upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24584?focusedWorklogId=531140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531140
 ]

ASF GitHub Bot logged work on HIVE-24584:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 11:51
Start Date: 05/Jan/21 11:51
Worklog Time Spent: 10m 
  Work Description: zeroflag opened a new pull request #1828:
URL: https://github.com/apache/hive/pull/1828


   wip



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531140)
Remaining Estimate: 0h
Time Spent: 10m

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following exception is thrown when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}
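The stack trace above ends in Kryo's MapReferenceResolver.getReadObject, which resolves a back-reference by index into a list of previously read objects. As a rough, self-contained analogy (plain ArrayList, not the real Kryo internals), this sketch shows how a resolver whose object list is empty or out of sync with the stream produces exactly an "Index: 97, Size: 0" style failure:

```java
import java.util.ArrayList;
import java.util.List;

// Analogy only: Kryo's reference resolver keeps a list of objects seen so far
// and resolves a back-reference id by indexing into it. If the resolver state
// is empty (e.g. stale or shared across deserializations), resolving id 97
// against an empty list throws IndexOutOfBoundsException, as in the report.
public class ReferenceResolverSketch {
    static Object resolve(List<Object> seen, int refId) {
        // Throws IndexOutOfBoundsException when refId >= seen.size()
        return seen.get(refId);
    }

    public static void main(String[] args) {
        List<Object> seen = new ArrayList<>(); // fresh resolver: nothing registered
        try {
            resolve(seen, 97);
            throw new AssertionError("expected IndexOutOfBoundsException");
        } catch (IndexOutOfBoundsException e) {
            // On JDK 8 the message reads "Index: 97, Size: 0"; newer JDKs word it differently
            System.out.println("caught: " + e);
        }
    }
}
```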





[jira] [Updated] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24584:
--
Labels: pull-request-available  (was: )

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following exception occurs when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}





[jira] [Work started] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-05 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24584 started by Attila Magyar.

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> The following exception occurs when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}





[jira] [Assigned] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-05 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24584:



> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> The following exception occurs when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}





[jira] [Updated] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-01-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24524:

Description: 
Tez already has a WIP patch for upgrading its shuffle handler to netty4. Netty4 
is reported to offer performance improvements over Netty3. However, the 
refactor is not trivial; TEZ-4157 covers it more or less (the code bases 
are very similar).

Background:
netty4 migration guideline: https://netty.io/wiki/new-and-noteworthy-in-4.0.html
articles of possible performance improvement:
https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/

Another note: Netty3 has been EOL since 2016:
https://netty.io/news/2016/06/29/3-10-6-Final.html

  was:
Tez already has a WIP patch for upgrading its shuffle handler to netty4. Netty4 
is reported to offer performance improvements over Netty3. However, the 
refactor is not trivial; TEZ-4157 covers it more or less (the code bases 
are very similar).

Background:
netty4 migration guideline: https://netty.io/wiki/new-and-noteworthy-in-4.0.html
articles of possible performance improvement:
https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/


> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is reported to offer performance improvements over Netty3. However, 
> the refactor is not trivial; TEZ-4157 covers it more or less (the 
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> Another note: Netty3 has been EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html





[jira] [Resolved] (HIVE-24526) Get grouped locations of external table data using metatool.

2021-01-05 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal resolved HIVE-24526.

Resolution: Fixed

merged to master, thanks for the patch and review.

> Get grouped locations of external table data using metatool.
> 
>
> Key: HIVE-24526
> URL: https://issues.apache.org/jira/browse/HIVE-24526
> Project: Hive
>  Issue Type: Task
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24526.01.patch, HIVE-24526.02.patch, 
> HIVE-24526.03.patch, HIVE-24526.04.patch, HIVE-24526.05.patch, 
> HIVE-24526.06.patch, HIVE-24526.07.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This task adds two new functionalities to metatool.
> The first option, -listExtTblLocs, generates a JSON file containing a set of 
> locations that covers all external-table data locations for a user-specified 
> database.
> The second option, -diffExtTblLocs, creates a diff of two JSON files 
> generated using the first option.
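As an illustration of what the -diffExtTblLocs step amounts to (the location strings and method name below are hypothetical, not the real metatool output schema), diffing two location snapshots reduces to two set differences, one for added and one for removed locations:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: two JSON files from -listExtTblLocs each describe a set
// of covering locations; the diff is the locations present in one but not the
// other, in both directions.
public class ExtTblLocDiff {
    // Elements of a that are not in b, in sorted order.
    static Set<String> minus(Set<String> a, Set<String> b) {
        Set<String> out = new TreeSet<>(a);
        out.removeAll(b);
        return out;
    }

    public static void main(String[] args) {
        Set<String> before = new TreeSet<>(Set.of("s3a://bucket/db1/t1", "s3a://bucket/db1/t2"));
        Set<String> after  = new TreeSet<>(Set.of("s3a://bucket/db1/t2", "s3a://bucket/db1/t3"));
        System.out.println("added:   " + minus(after, before));  // [s3a://bucket/db1/t3]
        System.out.println("removed: " + minus(before, after));  // [s3a://bucket/db1/t1]
    }
}
```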





[jira] [Updated] (HIVE-24526) Get grouped locations of external table data using metatool.

2021-01-05 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24526:
---
Description: 
This task adds two new functionalities to metatool.

The first option, -listExtTblLocs, generates a JSON file containing a set of 
locations that covers all external-table data locations for a user-specified 
database.

The second option, -diffExtTblLocs, creates a diff of two JSON files generated 
using the first option.

  was:Add a functionality to metatool to get a list of locations that covers 
all external-table data locations for a user-specified database.


> Get grouped locations of external table data using metatool.
> 
>
> Key: HIVE-24526
> URL: https://issues.apache.org/jira/browse/HIVE-24526
> Project: Hive
>  Issue Type: Task
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24526.01.patch, HIVE-24526.02.patch, 
> HIVE-24526.03.patch, HIVE-24526.04.patch, HIVE-24526.05.patch, 
> HIVE-24526.06.patch, HIVE-24526.07.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This task adds two new functionalities to metatool.
> The first option, -listExtTblLocs, generates a JSON file containing a set of 
> locations that covers all external-table data locations for a user-specified 
> database.
> The second option, -diffExtTblLocs, creates a diff of two JSON files 
> generated using the first option.





[jira] [Work logged] (HIVE-24575) VectorGroupByOperator reusing keys can lead to wrong results

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24575?focusedWorklogId=531120=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531120
 ]

ASF GitHub Bot logged work on HIVE-24575:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 10:58
Start Date: 05/Jan/21 10:58
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on pull request #1822:
URL: https://github.com/apache/hive/pull/1822#issuecomment-754564513


   +1 pending tests





Issue Time Tracking
---

Worklog Id: (was: 531120)
Time Spent: 40m  (was: 0.5h)

> VectorGroupByOperator reusing keys can lead to wrong results
> 
>
> Key: HIVE-24575
> URL: https://issues.apache.org/jira/browse/HIVE-24575
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  A common SQL query like
> {code:java}
> select category as category, count(distinct maskdid) as uv from 
> dwd_internal_inc_d group by category{code}
> can produce wrong results on trunk: the values of the category column can 
> get mixed up, and the count(distinct maskdid) aggregate is also wrong.
> After some debugging, we found that the problem is caused by a wrong 
> byteStarts[i] when it is used to copy the current keys to the reusable keys: 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneral.java#L351-L362]
> byteStarts[i] is always 0 because of Arrays.fill(byteStarts, 0), so whenever 
> clone.byteValues[i].length >= byteValues[i].length, the copy reads from 
> offset 0 instead of the key's real start index, which causes the problem.
>  
>  
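The defect matters because each hash key is effectively a (bytes, start, length) slice over a shared buffer. A simplified, self-contained sketch (buffer contents are made up for illustration, not taken from Hive) of what zeroing the source start before the copy does to the reused key:

```java
import java.nio.charset.StandardCharsets;

// Simplified model of the copyKey defect: copying a key slice with its start
// forced to 0 (as Arrays.fill(byteStarts, 0) on the *source* array did) reads
// the wrong range of the backing buffer instead of the key's real bytes.
public class ByteStartBugSketch {
    static byte[] copy(byte[] src, int start, int len, boolean zeroStartFirst) {
        if (zeroStartFirst) {
            start = 0; // the bug: the real start index was clobbered to 0
        }
        byte[] dst = new byte[len];
        System.arraycopy(src, start, dst, 0, len);
        return dst;
    }

    public static void main(String[] args) {
        byte[] buf = "xxcategoryA".getBytes(StandardCharsets.UTF_8);
        // the real key is the 9-byte slice starting at offset 2
        System.out.println(new String(copy(buf, 2, 9, false), StandardCharsets.UTF_8)); // categoryA
        System.out.println(new String(copy(buf, 2, 9, true),  StandardCharsets.UTF_8)); // xxcategor
    }
}
```

The PR's fix (filling clone.byteStarts instead of the source byteStarts) corresponds to keeping the real start in the copy above.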





[jira] [Commented] (HIVE-24526) Get grouped locations of external table data using metatool.

2021-01-05 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258825#comment-17258825
 ] 

Pravin Sinha commented on HIVE-24526:
-

+1

> Get grouped locations of external table data using metatool.
> 
>
> Key: HIVE-24526
> URL: https://issues.apache.org/jira/browse/HIVE-24526
> Project: Hive
>  Issue Type: Task
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24526.01.patch, HIVE-24526.02.patch, 
> HIVE-24526.03.patch, HIVE-24526.04.patch, HIVE-24526.05.patch, 
> HIVE-24526.06.patch, HIVE-24526.07.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add a functionality to metatool to get a list of locations that covers all 
> external-table data locations for a user-specified database.





[jira] [Work logged] (HIVE-24575) VectorGroupByOperator reusing keys can lead to wrong results

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24575?focusedWorklogId=531116=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531116
 ]

ASF GitHub Bot logged work on HIVE-24575:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 10:47
Start Date: 05/Jan/21 10:47
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1822:
URL: https://github.com/apache/hive/pull/1822#discussion_r551854615



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneral.java
##
@@ -348,8 +348,8 @@ public void copyKey(KeyWrapper oldWrapper) {
 }
   } else {
 System.arraycopy(byteLengths, 0, clone.byteLengths, 0, 
byteValues.length);
-Arrays.fill(byteStarts, 0);
-System.arraycopy(byteStarts, 0, clone.byteStarts, 0, 
byteValues.length);
+Arrays.fill(clone.byteStarts, 0);
+// System.arraycopy(byteStarts, 0, clone.byteStarts, 0, 
byteValues.length);

Review comment:
   Fixed and added a test for this. Thanks very much for the review!







Issue Time Tracking
---

Worklog Id: (was: 531116)
Time Spent: 0.5h  (was: 20m)

> VectorGroupByOperator reusing keys can lead to wrong results
> 
>
> Key: HIVE-24575
> URL: https://issues.apache.org/jira/browse/HIVE-24575
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  A common SQL query like
> {code:java}
> select category as category, count(distinct maskdid) as uv from 
> dwd_internal_inc_d group by category{code}
> can produce wrong results on trunk: the values of the category column can 
> get mixed up, and the count(distinct maskdid) aggregate is also wrong.
> After some debugging, we found that the problem is caused by a wrong 
> byteStarts[i] when it is used to copy the current keys to the reusable keys: 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneral.java#L351-L362]
> byteStarts[i] is always 0 because of Arrays.fill(byteStarts, 0), so whenever 
> clone.byteValues[i].length >= byteValues[i].length, the copy reads from 
> offset 0 instead of the key's real start index, which causes the problem.
>  
>  





[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.6

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=53=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-53
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 10:43
Start Date: 05/Jan/21 10:43
Worklog Time Spent: 10m 
  Work Description: pgaref edited a comment on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-754556917


   > note: I tend to use distinct version numbers because snapshots might get 
cached and not updated - but that was an issue with the old ptest infra; I 
guess the current setup will handle that better...
   > 
   > anything will do which can serve a web page - I wanted to add 
https://raw.githubusercontent.com/pgaref/mave-repo/main/ it for you - however 
that page returns error 400 for everything...
   
   Hey Zoltan -- I noticed the 400 myself for listing and non-existing files 
(GitHub policy?) but maven pulling seems to work, for example, check 
https://raw.githubusercontent.com/pgaref/mave-repo/main/org/apache/orc/orc-core/maven-metadata-local.xml
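For context, resolving artifacts from a raw GitHub URL like the one above works by declaring it as an additional Maven repository; a minimal, hypothetical sketch (the repository id below is arbitrary, not something from the PR) in the consuming pom.xml:

```xml
<!-- Hypothetical snippet: lets Maven resolve snapshot artifacts from a
     raw.githubusercontent.com "repository". The id is arbitrary. -->
<repositories>
  <repository>
    <id>pgaref-github-snapshots</id>
    <url>https://raw.githubusercontent.com/pgaref/mave-repo/main/</url>
  </repository>
</repositories>
```

This also explains the observed behavior: raw.githubusercontent.com returns an error for directory listings, but direct file GETs (which is all Maven needs) succeed.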





Issue Time Tracking
---

Worklog Id: (was: 53)
Time Spent: 1h 40m  (was: 1.5h)

> Upgrade ORC version to 1.6.6
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on ORC version 1.5.X, and in order to take 
> advantage of the latest ORC improvements, such as column encryption, we have 
> to bump to 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, HIVE LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store file footers, 
> tails, and streams, and to un/compress RG data. As there were many internal 
> changes from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), 
> the upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.





[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.6

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=531109=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531109
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 10:42
Start Date: 05/Jan/21 10:42
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-754556917


   > note: I tend to use distinct version numbers because snapshots might get 
cached and not updated - but that was an issue with the old ptest infra; I 
guess the current setup will handle that better...
   > 
   > anything will do which can serve a web page - I wanted to add 
https://raw.githubusercontent.com/pgaref/mave-repo/main/ it for you - however 
that page returns error 400 for everything...
   
   Hey Zoltan -- I noticed the 400 myself (for listing and non-existing files) 
but maven pulling seems to work, for example, check 
https://raw.githubusercontent.com/pgaref/mave-repo/main/org/apache/orc/orc-core/maven-metadata-local.xml





Issue Time Tracking
---

Worklog Id: (was: 531109)
Time Spent: 1h 20m  (was: 1h 10m)

> Upgrade ORC version to 1.6.6
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on ORC version 1.5.X, and in order to take 
> advantage of the latest ORC improvements, such as column encryption, we have 
> to bump to 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, HIVE LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store file footers, 
> tails, and streams, and to un/compress RG data. As there were many internal 
> changes from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), 
> the upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.





[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.6

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=531110=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531110
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 10:42
Start Date: 05/Jan/21 10:42
Worklog Time Spent: 10m 
  Work Description: pgaref edited a comment on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-754556917


   > note: I tend to use distinct version numbers because snapshots might get 
cached and not updated - but that was an issue with the old ptest infra; I 
guess the current setup will handle that better...
   > 
   > anything will do which can serve a web page - I wanted to add 
https://raw.githubusercontent.com/pgaref/mave-repo/main/ it for you - however 
that page returns error 400 for everything...
   
   Hey Zoltan -- I noticed the 400 myself for listing and non-existing files 
but maven pulling seems to work, for example, check 
https://raw.githubusercontent.com/pgaref/mave-repo/main/org/apache/orc/orc-core/maven-metadata-local.xml





Issue Time Tracking
---

Worklog Id: (was: 531110)
Time Spent: 1.5h  (was: 1h 20m)

> Upgrade ORC version to 1.6.6
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on ORC version 1.5.X, and in order to take 
> advantage of the latest ORC improvements, such as column encryption, we have 
> to bump to 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, HIVE LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store file footers, 
> tails, and streams, and to un/compress RG data. As there were many internal 
> changes from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), 
> the upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.





[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.6

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=531105=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531105
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 10:30
Start Date: 05/Jan/21 10:30
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-754550816


   note: I tend to use distinct version numbers because snapshots might get 
cached and not updated - but that was an issue with the old ptest infra; I 
guess the current setup will handle that better...
   
   anything will do which can serve a web page - I wanted to add 
https://raw.githubusercontent.com/pgaref/mave-repo/main/  it for you - however 
that page returns error 400 for everything...
   





Issue Time Tracking
---

Worklog Id: (was: 531105)
Time Spent: 1h 10m  (was: 1h)

> Upgrade ORC version to 1.6.6
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on ORC version 1.5.X, and in order to take 
> advantage of the latest ORC improvements, such as column encryption, we have 
> to bump to 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, HIVE LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store file footers, 
> tails, and streams, and to un/compress RG data. As there were many internal 
> changes from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), 
> the upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.





[jira] [Work logged] (HIVE-24581) Remove AcidUtils call from OrcInputformat for non transactional tables

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24581?focusedWorklogId=531104=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531104
 ]

ASF GitHub Bot logged work on HIVE-24581:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 10:14
Start Date: 05/Jan/21 10:14
Worklog Time Spent: 10m 
  Work Description: pvargacl opened a new pull request #1826:
URL: https://github.com/apache/hive/pull/1826


   
   
   ### What changes were proposed in this pull request?
   
   - Remove unnecessary AcidUtils.getAcidState call from OrcInputformat when 
the table is not transactional
   - Move redundant filesystem utility functions from AcidUtils to HdfsUtils
   
   ### Why are the changes needed?
   Make the code more readable
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Current unit tests.
   





Issue Time Tracking
---

Worklog Id: (was: 531104)
Remaining Estimate: 0h
Time Spent: 10m

> Remove AcidUtils call from OrcInputformat for non transactional tables
> --
>
> Key: HIVE-24581
> URL: https://issues.apache.org/jira/browse/HIVE-24581
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the split generation in OrcInputformat is tightly coupled with ACID, 
> and AcidUtils.getAcidState is called even if the table is not transactional. 
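The idea described above can be sketched as a simple guard: only run the ACID directory-state scan when the table is actually transactional. This is an illustrative simplification, not Hive's real API — the class, `Table`, and `getAcidState` stand-in below are hypothetical stubs for the actual `AcidUtils`/`OrcInputFormat` code paths.

```java
// Hypothetical sketch of skipping the ACID state scan for plain tables.
// All names here are illustrative stand-ins, not Hive's actual classes.
public class SplitGenerationSketch {

    static class Table {
        private final boolean transactional;
        Table(boolean transactional) { this.transactional = transactional; }
        boolean isTransactional() { return transactional; }
    }

    // Counts invocations of the (expensive) directory scan stand-in.
    static int acidStateCalls = 0;

    // Stand-in for AcidUtils.getAcidState: lists deltas/base dirs for ACID reads.
    static void getAcidState(Table t) { acidStateCalls++; }

    static String generateSplits(Table t) {
        if (t.isTransactional()) {
            getAcidState(t);       // only ACID tables need the delta/base scan
            return "acid-splits";
        }
        return "plain-splits";     // non-transactional path: no scan at all
    }
}
```

The point of the guard is that non-transactional tables never pay for the directory listing, which is the coupling the issue aims to remove.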



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24581) Remove AcidUtils call from OrcInputformat for non transactional tables

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24581:
--
Labels: pull-request-available  (was: )

> Remove AcidUtils call from OrcInputformat for non transactional tables
> --
>
> Key: HIVE-24581
> URL: https://issues.apache.org/jira/browse/HIVE-24581
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the split generation in OrcInputformat is tightly coupled with ACID, 
> and AcidUtils.getAcidState is called even if the table is not transactional. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24337) Cache delete delta files in LLAP cache

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24337?focusedWorklogId=531103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531103
 ]

ASF GitHub Bot logged work on HIVE-24337:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 10:14
Start Date: 05/Jan/21 10:14
Worklog Time Spent: 10m 
  Work Description: szlta commented on pull request #1776:
URL: https://github.com/apache/hive/pull/1776#issuecomment-754542630


   Tested with hive.llap.io.cache.deletedeltas=all.
   Jenkins found 1 failed test:
   
org.apache.hadoop.hive.cli.split19.TestMiniLlapLocalCliDriver.testCliDriver[results_cache_invalidation2]
   Re-ran it manually and it passed, so it is flaky.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531103)
Time Spent: 4h 20m  (was: 4h 10m)

> Cache delete delta files in LLAP cache
> --
>
> Key: HIVE-24337
> URL: https://issues.apache.org/jira/browse/HIVE-24337
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> HIVE-23824 added the functionality of caching the metadata part of ORC files 
> in the LLAP cache, so that ACID reads can be faster. However, the content 
> itself still has to be read every single time. If this could be cached too, 
> additional time could be saved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24570) Hive on spark tmp file should be delete when driver process finished

2021-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24570?focusedWorklogId=531098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531098
 ]

ASF GitHub Bot logged work on HIVE-24570:
-

Author: ASF GitHub Bot
Created on: 05/Jan/21 09:36
Start Date: 05/Jan/21 09:36
Worklog Time Spent: 10m 
  Work Description: fsilent commented on pull request #1816:
URL: https://github.com/apache/hive/pull/1816#issuecomment-754522248


   Can someone review this PR? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531098)
Time Spent: 40m  (was: 0.5h)

> Hive on spark tmp file should be delete when driver process finished
> 
>
> Key: HIVE-24570
> URL: https://issues.apache.org/jira/browse/HIVE-24570
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 4.0.0
>Reporter: zhaolong
>Assignee: zhaolong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 0001-HIVE-24570.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hive on Spark tmp files should be deleted when the driver process finishes; 
> currently they stay in the java.io.tmpdir (default /tmp) directory until the 
> HiveServer JVM stops.
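One standard JDK mechanism for the behavior the issue asks for — tying a tmp file's lifetime to the owning process rather than leaving it behind — is to mark the file for deletion at JVM exit when it is created. The sketch below shows that idea only; the class and method names are hypothetical and not taken from the actual patch.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hedged sketch: create a per-session tmp file that the JVM removes on
// normal shutdown, instead of leaving it in java.io.tmpdir indefinitely.
// Names are illustrative, not from the HIVE-24570 patch itself.
public class TmpFileCleanupSketch {

    public static File createSessionTmpFile(String prefix) {
        try {
            File tmp = Files.createTempFile(prefix, ".tmp").toFile();
            tmp.deleteOnExit();   // registered for removal at JVM shutdown
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that `deleteOnExit` only fires on normal JVM termination, so long-lived services often pair it with explicit cleanup when the owning session (here, the Spark driver) ends.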



--
This message was sent by Atlassian Jira
(v8.3.4#803005)