[jira] [Resolved] (HIVE-25431) Enable CBO for null safe equality operator.

2021-08-05 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-25431.

Resolution: Fixed

> Enable CBO for null safe equality operator.
> ---
>
> Key: HIVE-25431
> URL: https://issues.apache.org/jira/browse/HIVE-25431
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The CBO is disabled for the null safe equality (<=>) operator. This causes 
> sub-optimal join execution for some queries. As null safe equality is 
> supported by joins, the CBO can be enabled for it. There will still be issues 
> with join reordering, as Hive does not support join reordering for the null 
> safe equality operator, but with the CBO enabled the join plan will be better.
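
For reference, a minimal HiveQL sketch of the operator's semantics (the table
and column names are hypothetical):

{code:sql}
-- <=> is null safe: NULL <=> NULL evaluates to true, whereas NULL = NULL is NULL
SELECT NULL <=> NULL;   -- true
SELECT NULL = NULL;     -- NULL

-- a join that also matches rows whose keys are NULL on both sides; with this
-- change such a query goes through the CBO
SELECT t1.*, t2.*
FROM t1 JOIN t2 ON t1.key <=> t2.key;
{code}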



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=634950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634950
 ]

ASF GitHub Bot logged work on HIVE-25403:
-

Author: ASF GitHub Bot
Created on: 06/Aug/21 05:10
Start Date: 06/Aug/21 05:10
Worklog Time Spent: 10m 
  Work Description: warriersruthi commented on a change in pull request 
#2550:
URL: https://github.com/apache/hive/pull/2550#discussion_r683601489



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFromUnixTime.java
##
@@ -87,89 +75,58 @@ public ObjectInspector initialize(ObjectInspector[] 
arguments) throws UDFArgumen
 inputLongOI = (LongObjectInspector) arguments[0];
 break;
   default:
-throw new UDFArgumentException("The function " + 
getName().toUpperCase()
-+ " takes only int/long types for first argument. Got Type:" + 
arg0OI.getPrimitiveCategory().name());
+throw new UDFArgumentException("The function from_unixtime takes only 
int/long types for first argument. Got Type:"
++ arg0OI.getPrimitiveCategory().name());
 }
 
 if (arguments.length == 2) {
-  PrimitiveObjectInspector arg1OI = (PrimitiveObjectInspector) 
arguments[1];
-  switch (arg1OI.getPrimitiveCategory()) {
-case CHAR:
-case VARCHAR:
-case STRING:
-  inputTextConverter = ObjectInspectorConverters.getConverter(arg1OI,
-  PrimitiveObjectInspectorFactory.javaStringObjectInspector);
-  break;
-default:
-  throw new UDFArgumentException("The function " + 
getName().toUpperCase()
-  + " takes only string type for second argument. Got Type:" + 
arg1OI.getPrimitiveCategory().name());
-  }
+  checkArgGroups(arguments, 1, inputTypes, STRING_GROUP);
+  obtainStringConverter(arguments, 1, inputTypes, converters);
 }
 
-if (timeZone == null) {
-  timeZone = SessionState.get() == null ? new 
HiveConf().getLocalTimeZone() : SessionState.get().getConf()
-  .getLocalTimeZone();
-  formatter.setTimeZone(TimeZone.getTimeZone(timeZone));
-}
+timeZone = SessionState.get() == null ? new HiveConf().getLocalTimeZone() 
: SessionState.get().getConf()
+  .getLocalTimeZone();
+FORMATTER.withZone(timeZone);
 
 return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
   }
 
-  @Override
-  public void configure(MapredContext context) {
-if (context != null) {
-  String timeZoneStr = HiveConf.getVar(context.getJobConf(), 
HiveConf.ConfVars.HIVE_LOCAL_TIME_ZONE);
-  timeZone = TimestampTZUtil.parseTimeZone(timeZoneStr);
-  formatter.setTimeZone(TimeZone.getTimeZone(timeZone));
-}
-  }
-
   @Override
   public Object evaluate(DeferredObject[] arguments) throws HiveException {
 if (arguments[0].get() == null) {
   return null;
 }
 
-if (inputTextConverter != null) {
-  if (arguments[1].get() == null) {
-return null;
-  }
-  String format = (String) inputTextConverter.convert(arguments[1].get());
+if(arguments.length == 2) {
+  String format = getStringValue(arguments, 1, converters);
   if (format == null) {
 return null;
   }
   if (!format.equals(lastFormat)) {
-formatter = new SimpleDateFormat(format);
-formatter.setTimeZone(TimeZone.getTimeZone(timeZone));
+FORMATTER = DateTimeFormatter.ofPattern(format);
 lastFormat = format;
   }
 }
 
 // convert seconds to milliseconds
 long unixtime;
+Instant i;

Review comment:
   inputIntOI is not required, so the whole check was removed. 

##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFFromUnixTime.java
##
@@ -0,0 +1,132 @@
+/* 


+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import 

[jira] [Updated] (HIVE-25432) Support Join reordering for null safe equality operator.

2021-08-05 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-25432:
---
Parent: (was: HIVE-25431)
Issue Type: Bug  (was: Sub-task)

> Support Join reordering for null safe equality operator.
> 
>
> Key: HIVE-25432
> URL: https://issues.apache.org/jira/browse/HIVE-25432
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: mahesh kumar behera
>Priority: Major
>
> Support Join reordering for null safe equality operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25431) Enable CBO for null safe equality operator.

2021-08-05 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-25431:
--


> Enable CBO for null safe equality operator.
> ---
>
> Key: HIVE-25431
> URL: https://issues.apache.org/jira/browse/HIVE-25431
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The CBO is disabled for the null safe equality (<=>) operator. This causes 
> sub-optimal join execution for some queries. As null safe equality is 
> supported by joins, the CBO can be enabled for it. There will still be issues 
> with join reordering, as Hive does not support join reordering for the null 
> safe equality operator, but with the CBO enabled the join plan will be better.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25397) Snapshot support for controlled failover

2021-08-05 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-25397:
---
Description: If the same locations are used for external tables on the source 
and target, then the snapshots created during replication can be re-used during 
reverse replication. This patch enables re-using these snapshots during reverse 
replication via a configuration.

> Snapshot support for controlled failover
> 
>
> Key: HIVE-25397
> URL: https://issues.apache.org/jira/browse/HIVE-25397
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If the same locations are used for external tables on the source and target, 
> then the snapshots created during replication can be re-used during reverse 
> replication. This patch enables re-using these snapshots during reverse 
> replication via a configuration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25341) Reduce FileSystem calls in case drop database cascade

2021-08-05 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-25341.
-
Resolution: Fixed

> Reduce FileSystem calls in case drop database cascade
> -
>
> Key: HIVE-25341
> URL: https://issues.apache.org/jira/browse/HIVE-25341
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Reduce the number of FileSystem calls made in case of drop database cascade
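
A minimal sketch of the operation in question (the database name is
hypothetical):

{code:sql}
-- drops the database and all of its tables in one statement; previously each
-- table and partition contributed its own FileSystem calls during the drop
DROP DATABASE IF EXISTS demo_db CASCADE;
{code}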



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25341) Reduce FileSystem calls in case drop database cascade

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25341?focusedWorklogId=634837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634837
 ]

ASF GitHub Bot logged work on HIVE-25341:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 22:13
Start Date: 05/Aug/21 22:13
Worklog Time Spent: 10m 
  Work Description: rbalamohan merged pull request #2491:
URL: https://github.com/apache/hive/pull/2491


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634837)
Time Spent: 1h 10m  (was: 1h)

> Reduce FileSystem calls in case drop database cascade
> -
>
> Key: HIVE-25341
> URL: https://issues.apache.org/jira/browse/HIVE-25341
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Reduce the number of FileSystem calls made in case of drop database cascade



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=634661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634661
 ]

ASF GitHub Bot logged work on HIVE-25403:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 16:33
Start Date: 05/Aug/21 16:33
Worklog Time Spent: 10m 
  Work Description: warriersruthi commented on pull request #2550:
URL: https://github.com/apache/hive/pull/2550#issuecomment-893598856


   > +1 LGTM
   
   Thanks, Ashish for your detailed review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634661)
Time Spent: 2h 20m  (was: 2h 10m)

>  from_unixtime() does not consider leap seconds
> ---
>
> Key: HIVE-25403
> URL: https://issues.apache.org/jira/browse/HIVE-25403
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sruthi Mooriyathvariam
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.2
>
> Attachments: image-2021-07-29-14-42-49-806.png
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> unix_timestamp() considers leap seconds while from_unixtime() does not, 
> which produces wrong results, as below:
> !image-2021-07-29-14-42-49-806.png!
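
As an illustration of the mismatch, a minimal HiveQL round trip (the exact
outputs depend on the session time zone and on this fix):

{code:sql}
-- unix_timestamp() parses a timestamp string into seconds since the epoch;
-- from_unixtime() formats seconds since the epoch back into a string.
SELECT unix_timestamp('2021-07-29 14:42:49');
-- Before this fix the round trip below could be off, because the two
-- functions disagreed on leap-second handling:
SELECT from_unixtime(unix_timestamp('2021-07-29 14:42:49'));
{code}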



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=634643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634643
 ]

ASF GitHub Bot logged work on HIVE-25403:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 16:18
Start Date: 05/Aug/21 16:18
Worklog Time Spent: 10m 
  Work Description: warriersruthi commented on a change in pull request 
#2550:
URL: https://github.com/apache/hive/pull/2550#discussion_r683602325



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFFromUnixTime.java
##
@@ -0,0 +1,132 @@
+/* 


+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import java.time.ZoneId;
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.common.type.Timestamp;
+import org.apache.hadoop.hive.common.type.TimestampTZ;
+import org.apache.hadoop.hive.common.type.TimestampTZUtil;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.MapredContext;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredJavaObject;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
+import org.apache.hadoop.hive.serde2.io.DateWritableV2;
+import org.apache.hadoop.hive.serde2.io.TimestampWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.Text;
+
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertEquals;
+import static org.mockito.Mockito.when;
+
+import org.apache.hadoop.mapred.JobConf;
+import org.joda.time.format.DateTimeFormatter;
+import org.joda.time.format.DateTimeFormatterBuilder;
+import org.junit.Test;
+import org.mockito.Mockito;
+
+/**
+ * TestGenericUDFFromUnixTime.
+ */
+public class TestGenericUDFFromUnixTime {
+
+  public static void runAndVerify(GenericUDFFromUnixTime udf,
+  Object arg, Object expected) throws HiveException {
+DeferredObject[] args = { new DeferredJavaObject(arg) };
+Object result = udf.evaluate(args);
+if (expected == null) {
+  assertNull(result);
+} else {
+  assertEquals(expected.toString(), result.toString());
+}
+  }
+
+  public static void runAndVerify(GenericUDFFromUnixTime udf,
+  Object arg1, Object arg2, Object expected) throws HiveException {
+DeferredObject[] args = { new DeferredJavaObject(arg1), new 
DeferredJavaObject(arg2) };
+Object result = udf.evaluate(args);
+
+if (expected == null) {
+  assertNull(result);
+} else {
+  assertEquals(expected.toString(), result.toString());
+}
+  }
+
+  @Test
+  public void testTimestampDefaultTimezone() throws HiveException {
+ObjectInspector valueLongOI = 
PrimitiveObjectInspectorFactory.writableLongObjectInspector;
+GenericUDFFromUnixTime udf = new GenericUDFFromUnixTime();
+ObjectInspector args[] = {valueLongOI};
+udf.initialize(args);
+
+Timestamp ts = Timestamp.valueOf("1470-01-01 00:00:00");
+TimestampTZ tstz = TimestampTZUtil.convert(ts, ZoneId.systemDefault());
+
+runAndVerify(udf,
+new LongWritable(tstz.getEpochSecond()), new Text("1470-01-01 
00:00:00"));
+
+// test null values
+runAndVerify(udf, null, null);
+  }
+
+  @Test
+  public void testTimestampOtherTimezone() throws HiveException {
+ObjectInspector valueLongOI = 
PrimitiveObjectInspectorFactory.writableLongObjectInspector;
+GenericUDFFromUnixTime udf = new GenericUDFFromUnixTime();
+ObjectInspector args[] = {valueLongOI};
+udf.initialize(args);
+
+Timestamp ts = Timestamp.valueOf("2010-01-13 11:57:40");
+TimestampTZ tstz1 = TimestampTZUtil.convert(ts, 
ZoneId.of("America/Los_Angeles"));
+TimestampTZ tstz2 = 

[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=634641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634641
 ]

ASF GitHub Bot logged work on HIVE-25403:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 16:18
Start Date: 05/Aug/21 16:18
Worklog Time Spent: 10m 
  Work Description: warriersruthi commented on a change in pull request 
#2550:
URL: https://github.com/apache/hive/pull/2550#discussion_r683601956



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFFromUnixTime.java
##
@@ -0,0 +1,132 @@
+/* 


+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import java.time.ZoneId;
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.common.type.Timestamp;
+import org.apache.hadoop.hive.common.type.TimestampTZ;
+import org.apache.hadoop.hive.common.type.TimestampTZUtil;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.MapredContext;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredJavaObject;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
+import org.apache.hadoop.hive.serde2.io.DateWritableV2;
+import org.apache.hadoop.hive.serde2.io.TimestampWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.Text;
+
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertEquals;
+import static org.mockito.Mockito.when;
+
+import org.apache.hadoop.mapred.JobConf;
+import org.joda.time.format.DateTimeFormatter;
+import org.joda.time.format.DateTimeFormatterBuilder;
+import org.junit.Test;
+import org.mockito.Mockito;
+
+/**
+ * TestGenericUDFFromUnixTime.
+ */
+public class TestGenericUDFFromUnixTime {
+
+  public static void runAndVerify(GenericUDFFromUnixTime udf,
+  Object arg, Object expected) throws HiveException {
+DeferredObject[] args = { new DeferredJavaObject(arg) };
+Object result = udf.evaluate(args);
+if (expected == null) {
+  assertNull(result);
+} else {
+  assertEquals(expected.toString(), result.toString());
+}
+  }
+
+  public static void runAndVerify(GenericUDFFromUnixTime udf,
+  Object arg1, Object arg2, Object expected) throws HiveException {
+DeferredObject[] args = { new DeferredJavaObject(arg1), new 
DeferredJavaObject(arg2) };
+Object result = udf.evaluate(args);
+
+if (expected == null) {
+  assertNull(result);
+} else {
+  assertEquals(expected.toString(), result.toString());
+}
+  }
+
+  @Test
+  public void testTimestampDefaultTimezone() throws HiveException {
+ObjectInspector valueLongOI = 
PrimitiveObjectInspectorFactory.writableLongObjectInspector;

Review comment:
   IntObjectInspector is removed from the code, so there is no need to add 
anything for it in the test.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634641)
Time Spent: 2h  (was: 1h 50m)

>  from_unixtime() does not consider leap seconds
> ---
>
> Key: HIVE-25403
> URL: https://issues.apache.org/jira/browse/HIVE-25403
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sruthi Mooriyathvariam
>Assignee: Sruthi Mooriyathvariam
>

[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=634640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634640
 ]

ASF GitHub Bot logged work on HIVE-25403:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 16:17
Start Date: 05/Aug/21 16:17
Worklog Time Spent: 10m 
  Work Description: warriersruthi commented on a change in pull request 
#2550:
URL: https://github.com/apache/hive/pull/2550#discussion_r683601489



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFromUnixTime.java
##
@@ -87,89 +75,58 @@ public ObjectInspector initialize(ObjectInspector[] 
arguments) throws UDFArgumen
 inputLongOI = (LongObjectInspector) arguments[0];
 break;
   default:
-throw new UDFArgumentException("The function " + 
getName().toUpperCase()
-+ " takes only int/long types for first argument. Got Type:" + 
arg0OI.getPrimitiveCategory().name());
+throw new UDFArgumentException("The function from_unixtime takes only 
int/long types for first argument. Got Type:"
++ arg0OI.getPrimitiveCategory().name());
 }
 
 if (arguments.length == 2) {
-  PrimitiveObjectInspector arg1OI = (PrimitiveObjectInspector) 
arguments[1];
-  switch (arg1OI.getPrimitiveCategory()) {
-case CHAR:
-case VARCHAR:
-case STRING:
-  inputTextConverter = ObjectInspectorConverters.getConverter(arg1OI,
-  PrimitiveObjectInspectorFactory.javaStringObjectInspector);
-  break;
-default:
-  throw new UDFArgumentException("The function " + 
getName().toUpperCase()
-  + " takes only string type for second argument. Got Type:" + 
arg1OI.getPrimitiveCategory().name());
-  }
+  checkArgGroups(arguments, 1, inputTypes, STRING_GROUP);
+  obtainStringConverter(arguments, 1, inputTypes, converters);
 }
 
-if (timeZone == null) {
-  timeZone = SessionState.get() == null ? new 
HiveConf().getLocalTimeZone() : SessionState.get().getConf()
-  .getLocalTimeZone();
-  formatter.setTimeZone(TimeZone.getTimeZone(timeZone));
-}
+timeZone = SessionState.get() == null ? new HiveConf().getLocalTimeZone() 
: SessionState.get().getConf()
+  .getLocalTimeZone();
+FORMATTER.withZone(timeZone);
 
 return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
   }
 
-  @Override
-  public void configure(MapredContext context) {
-if (context != null) {
-  String timeZoneStr = HiveConf.getVar(context.getJobConf(), 
HiveConf.ConfVars.HIVE_LOCAL_TIME_ZONE);
-  timeZone = TimestampTZUtil.parseTimeZone(timeZoneStr);
-  formatter.setTimeZone(TimeZone.getTimeZone(timeZone));
-}
-  }
-
   @Override
   public Object evaluate(DeferredObject[] arguments) throws HiveException {
 if (arguments[0].get() == null) {
   return null;
 }
 
-if (inputTextConverter != null) {
-  if (arguments[1].get() == null) {
-return null;
-  }
-  String format = (String) inputTextConverter.convert(arguments[1].get());
+if(arguments.length == 2) {
+  String format = getStringValue(arguments, 1, converters);
   if (format == null) {
 return null;
   }
   if (!format.equals(lastFormat)) {
-formatter = new SimpleDateFormat(format);
-formatter.setTimeZone(TimeZone.getTimeZone(timeZone));
+FORMATTER = DateTimeFormatter.ofPattern(format);
 lastFormat = format;
   }
 }
 
 // convert seconds to milliseconds
 long unixtime;
+Instant i;

Review comment:
   inputIntOI is not required, so the whole check was removed. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634640)
Time Spent: 1h 50m  (was: 1h 40m)

>  from_unixtime() does not consider leap seconds
> ---
>
> Key: HIVE-25403
> URL: https://issues.apache.org/jira/browse/HIVE-25403
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sruthi Mooriyathvariam
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.2
>
> Attachments: image-2021-07-29-14-42-49-806.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> unix_timestamp() considers leap seconds while from_unixtime() does not, 
> which produces wrong results, as below:
> !image-2021-07-29-14-42-49-806.png!




[jira] [Updated] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit

2021-08-05 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-25429:
-
Description: 
There's a limit to the number of Tez counters allowed (tez.counters.max). Delta 
metrics collection (i.e. DeltaFileMetricsReporter) was creating 3 counters for 
each partition touched by a given query, which can result in a huge number of 
counters. This is unnecessary because we're only interested in the n partitions 
with the most deltas. This change limits the number of counters created to 
hive.txn.acid.metrics.max.cache.size*3.

Also, when tez.counters.max is reached, a LimitExceededException is thrown but 
isn't caught on the Hive side, causing the query to fail. We should catch this 
and skip delta metrics collection in this case.

Also make sure that metrics are only collected if 
hive.metastore.acidmetrics.ext.on=true.

  was:
There's a limit to the number of Tez counters allowed (tez.counters.max). Delta 
metrics collection (i.e. DeltaFileMetricsReporter) was creating 3 counters for 
each partition touched by a given query, which can result in a huge number of 
counters. This is unnecessary because we're only interested in the n partitions 
with the most deltas. This change limits the number of counters created to 
hive.txn.acid.metrics.max.cache.size*3.

Also, when tez.counters.max is reached, a LimitExceededException is thrown but 
isn't caught on the Hive side, causing the query to fail. We should catch this 
and skip delta metrics collection in this case.


> Delta metrics collection may cause number of tez counters to exceed 
> tez.counters.max limit
> --
>
> Key: HIVE-25429
> URL: https://issues.apache.org/jira/browse/HIVE-25429
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There's a limit to the number of Tez counters allowed (tez.counters.max). 
> Delta metrics collection (i.e. DeltaFileMetricsReporter) was creating 3 
> counters for each partition touched by a given query, which can result in a 
> huge number of counters. This is unnecessary because we're only interested in 
> the n partitions with the most deltas. This change limits the number of 
> counters created to hive.txn.acid.metrics.max.cache.size*3.
> Also, when tez.counters.max is reached, a LimitExceededException is thrown but 
> isn't caught on the Hive side, causing the query to fail. We should catch this 
> and skip delta metrics collection in this case.
> Also make sure that metrics are only collected if 
> hive.metastore.acidmetrics.ext.on=true
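
A minimal sketch of the configuration knobs named above (the values are
illustrative, not recommendations):

{code:sql}
-- enable extended ACID metrics collection at all
SET hive.metastore.acidmetrics.ext.on=true;
-- with this change, delta metrics create at most max.cache.size * 3 counters
SET hive.txn.acid.metrics.max.cache.size=100;
-- the Tez-side ceiling that the created counters must stay under
SET tez.counters.max=1200;
{code}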



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25429:
--
Labels: pull-request-available  (was: )

> Delta metrics collection may cause number of tez counters to exceed 
> tez.counters.max limit
> --
>
> Key: HIVE-25429
> URL: https://issues.apache.org/jira/browse/HIVE-25429
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There's a limit to the number of Tez counters allowed (tez.counters.max). 
> Delta metrics collection (i.e. DeltaFileMetricsReporter) was creating 3 
> counters for each partition touched by a given query, which can result in a 
> huge number of counters. This is unnecessary because we're only interested in 
> the n partitions with the most deltas. This change limits the number of 
> counters created to hive.txn.acid.metrics.max.cache.size*3.
> Also, when tez.counters.max is reached, a LimitExceededException is thrown but 
> isn't caught on the Hive side, causing the query to fail. We should catch this 
> and skip delta metrics collection in this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25429?focusedWorklogId=634630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634630
 ]

ASF GitHub Bot logged work on HIVE-25429:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 15:53
Start Date: 05/Aug/21 15:53
Worklog Time Spent: 10m 
  Work Description: klcopp opened a new pull request #2563:
URL: https://github.com/apache/hive/pull/2563


   See HIVE-25429


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634630)
Remaining Estimate: 0h
Time Spent: 10m

> Delta metrics collection may cause number of tez counters to exceed 
> tez.counters.max limit
> --
>
> Key: HIVE-25429
> URL: https://issues.apache.org/jira/browse/HIVE-25429
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There's a limit to the number of Tez counters allowed (tez.counters.max). 
> Delta metrics collection (i.e. DeltaFileMetricsReporter) was creating 3 
> counters for each partition touched by a given query, which can result in a 
> huge number of counters. This is unnecessary because we're only interested in 
> the n partitions with the most deltas. This change limits the number of 
> counters created to hive.txn.acid.metrics.max.cache.size*3.
> Also, when tez.counters.max is reached, a LimitExceededException is thrown but 
> isn't caught on the Hive side, causing the query to fail. We should catch this 
> and skip delta metrics collection in this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25423) Add new test driver to automatically launch and load external database

2021-08-05 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393972#comment-17393972
 ] 

Zoltan Haindrich commented on HIVE-25423:
-

hey [~dantongdong]!

This sounds like an interesting thing - I wanted to do something similar but 
never got time for it.
Right now we are running many of these CliDriver tests - which gives us some 
level of coverage - but we don't really have many smoke tests 
which exercise the normal usage flow and don't work through the hacks of 
QTestUtil and similar things.

In the new test system I've already tried to prepare things for that - and I've 
configured it to run the schema initialization in a few databases.
We could change the way it works - but it would be very valuable to have tests 
which do not rely on itests or on the HIVE_IN_TEST and similar confs.

let me know what you think!

> Add new test driver to automatically launch and load external database
> --
>
> Key: HIVE-25423
> URL: https://issues.apache.org/jira/browse/HIVE-25423
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure, Tests
>Affects Versions: 3.1.2
>Reporter: Dantong Dong
>Assignee: Dantong Dong
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-08-04 at 2.32.35 PM.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add a new test driver (TestMiniLlapExtDBCliDriver) to automatically launch 
> and load an external database with a specified custom script during tests. 
> This issue originated from HIVE-24396. Will add docs later.
> !Screen Shot 2021-08-04 at 2.32.35 PM.png|width=500,height=262!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24235) Drop and recreate table during MR compaction leaves behind base/delta directory

2021-08-05 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388018#comment-17388018
 ] 

Karen Coppage edited comment on HIVE-24235 at 8/5/21, 12:41 PM:


This is incomplete because if the table is dropped, all related compactions are 
cleared from metadata. So here we're trying to fail a compaction that doesn't 
exist.

HIVE-24900 and HIVE-25430 are needed to complete this fix.


was (Author: klcopp):
This is useless because if the table is dropped, all related compactions are 
cleared from metadata. So here we're trying to fail a compaction that doesn't 
exist.

The actual fix for this issue is HIVE-25393.

> Drop and recreate table during MR compaction leaves behind base/delta 
> directory
> ---
>
> Key: HIVE-24235
> URL: https://issues.apache.org/jira/browse/HIVE-24235
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> If a table is dropped and recreated during MR compaction, the table directory 
> and a base (or delta, if minor compaction) directory could be created, with 
> or without data, while the table "does not exist".
> E.g.
> {code:java}
> create table c (i int) stored as orc tblproperties 
> ("NO_AUTO_COMPACTION"="true", "transactional"="true");
> insert into c values (9);
> insert into c values (9);
> alter table c compact 'major';
> While compaction job is running: {
> drop table c;
> create table c (i int) stored as orc tblproperties 
> ("NO_AUTO_COMPACTION"="true", "transactional"="true");
> }
> {code}
> The table directory should be empty, but the table directory could look like 
> this after the job is finished:
> {code:java}
> Oct  6 14:23 c/base_002_v101/._orc_acid_version.crc
> Oct  6 14:23 c/base_002_v101/.bucket_0.crc
> Oct  6 14:23 c/base_002_v101/_orc_acid_version
> Oct  6 14:23 c/base_002_v101/bucket_0
> {code}
> or perhaps just: 
> {code:java}
> Oct  6 14:23 c/base_002_v101/._orc_acid_version.crc
> Oct  6 14:23 c/base_002_v101/_orc_acid_version
> {code}
> Insert another row and you have:
> {code:java}
> Oct  6 14:33 base_002_v101/
> Oct  6 14:33 base_002_v101/._orc_acid_version.crc
> Oct  6 14:33 base_002_v101/.bucket_0.crc
> Oct  6 14:33 base_002_v101/_orc_acid_version
> Oct  6 14:33 base_002_v101/bucket_0
> Oct  6 14:35 delta_001_001_/._orc_acid_version.crc
> Oct  6 14:35 delta_001_001_/.bucket_0_0.crc
> Oct  6 14:35 delta_001_001_/_orc_acid_version
> Oct  6 14:35 delta_001_001_/bucket_0_0
> {code}
> Selecting from the table will result in this error because the highest valid 
> writeId for this table is 1:
> {code:java}
> thrift.ThriftCLIService: Error fetching results: 
> org.apache.hive.service.cli.HiveSQLException: Unable to get the next row set
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> ...
> Caused by: java.io.IOException: java.lang.RuntimeException: ORC split 
> generation failed with exception: java.io.IOException: Not enough history 
> available for (1,x).  Oldest available base: 
> .../warehouse/b/base_004_v092
> {code}
> Solution: Resolve the table again after compaction is finished; compare the 
> id with the table id from when compaction began. If the ids do not match, 
> abort the compaction's transaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25430) compactor.Worker.markFailed should catch and log any kind of exception

2021-08-05 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-25430:



> compactor.Worker.markFailed should catch and log any kind of exception
> --
>
> Key: HIVE-25430
> URL: https://issues.apache.org/jira/browse/HIVE-25430
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25394) Enable vectorization for TestIcebergCliDriver dynamic_partition_pruning.q

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25394:
--
Labels: pull-request-available  (was: )

> Enable vectorization for TestIcebergCliDriver dynamic_partition_pruning.q
> -
>
> Key: HIVE-25394
> URL: https://issues.apache.org/jira/browse/HIVE-25394
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we turn on vectorization for {{dynamic_partition_pruning.q}} we will get 
> the following exception:
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1627387142352_0001_11_01, diagnostics=[Task 
> failed, taskId=task_1627387142352_0001_11_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1627387142352_0001_11_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:277)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:89)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:311)
>   ... 16 more
> Caused by: java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:374)
>   at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:82)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:119)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:59)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:145)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:75)
>   ... 18 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addPartitionColsToBatch(VectorizedRowBatchCtx.java:595)
>   at 
> 

[jira] [Work logged] (HIVE-25394) Enable vectorization for TestIcebergCliDriver dynamic_partition_pruning.q

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25394?focusedWorklogId=634516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634516
 ]

ASF GitHub Bot logged work on HIVE-25394:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 12:13
Start Date: 05/Aug/21 12:13
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #2561:
URL: https://github.com/apache/hive/pull/2561


   Enable vectorization for TestIcebergCliDriver dynamic_partition_pruning.q


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634516)
Remaining Estimate: 0h
Time Spent: 10m

> Enable vectorization for TestIcebergCliDriver dynamic_partition_pruning.q
> -
>
> Key: HIVE-25394
> URL: https://issues.apache.org/jira/browse/HIVE-25394
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Ádám Szita
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we turn on vectorization for {{dynamic_partition_pruning.q}} we will get 
> the following exception:
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1627387142352_0001_11_01, diagnostics=[Task 
> failed, taskId=task_1627387142352_0001_11_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1627387142352_0001_11_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:277)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:89)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:311)
>   ... 16 more
> Caused by: java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:374)
>   at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:82)
>   at 
> 

[jira] [Work logged] (HIVE-25408) AlterTableSetOwnerAnalyzer should send Hive Privilege Objects for Authorization.

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25408?focusedWorklogId=634486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634486
 ]

ASF GitHub Bot logged work on HIVE-25408:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:57
Start Date: 05/Aug/21 11:57
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #2560:
URL: https://github.com/apache/hive/pull/2560


   …ects for Authorization
   
   
   
   ### What changes were proposed in this pull request?
   Added hive privilege objects that contain table information that can be 
authorized in ranger/sentry.
   
   
   
   ### Why are the changes needed?
   Otherwise, any user can alter the table owner information.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   
   ### How was this patch tested?
   Local machine, Remote cluster.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634486)
Time Spent: 20m  (was: 10m)

> AlterTableSetOwnerAnalyzer should send Hive Privilege Objects for 
> Authorization. 
> -
>
> Key: HIVE-25408
> URL: https://issues.apache.org/jira/browse/HIVE-25408
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, Hive is sending an empty list in the Hive Privilege Objects for 
> authorization when a user does the following operation: alter table foo set 
> owner user user_name;
> We should be sending the input/output objects related to the table in the 
> Hive privilege objects for authorization.
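
For context, a minimal sketch of the statement in question (the table and user
names are placeholders):

{code:sql}
-- previously this passed an empty privilege-object list to the authorizer
-- (Ranger/Sentry), so any user could change the owner
ALTER TABLE foo SET OWNER USER user_name;
{code}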



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25328) Limit scope of REPLACE COLUMNS for Iceberg tables

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25328?focusedWorklogId=634479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634479
 ]

ASF GitHub Bot logged work on HIVE-25328:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:56
Start Date: 05/Aug/21 11:56
Worklog Time Spent: 10m 
  Work Description: marton-bod merged pull request #2475:
URL: https://github.com/apache/hive/pull/2475


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634479)
Time Spent: 1.5h  (was: 1h 20m)

> Limit scope of REPLACE COLUMNS for Iceberg tables
> -
>
> Key: HIVE-25328
> URL: https://issues.apache.org/jira/browse/HIVE-25328
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> REPLACE COLUMNS is a rather wildcard-like operation which can make 
> heavy-weight schema changes. We only want to allow this operation for 
> dropping columns of Iceberg tables. For other changes (adding columns, 
> renaming, type promotion, etc.), the CHANGE COLUMN command should be used.
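
A short HiveQL sketch of the intended split (the table and column names are
hypothetical):

{code:sql}
-- allowed on Iceberg tables: REPLACE COLUMNS used purely to drop a column
-- (the new column list omits a previously existing column)
ALTER TABLE ice_t REPLACE COLUMNS (id INT, name STRING);

-- everything else goes through CHANGE COLUMN, e.g. renaming a column
ALTER TABLE ice_t CHANGE COLUMN name full_name STRING;
{code}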



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25423) Add new test driver to automatically launch and load external database

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25423?focusedWorklogId=634449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634449
 ]

ASF GitHub Bot logged work on HIVE-25423:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:52
Start Date: 05/Aug/21 11:52
Worklog Time Spent: 10m 
  Work Description: dantongdong opened a new pull request #2559:
URL: https://github.com/apache/hive/pull/2559


   [HIVE-25423](https://issues.apache.org/jira/browse/HIVE-25423): Add new test 
driver to automatically launch and load external database.
   
   Add a new test driver - **TestMiniLlapExtDBCliDriver** - which can 
automatically launch an external database with a custom SQL script (the default 
init and cleanup scripts are located at data/script/[databaseType]). Moved the 
dataconnector.q test to the test suite **externalDB.llap.query.files**.
   
   Add a new test, dataconnector_mysql.q, to benchmark the new test driver's 
loading ability; the desired output is located in dataconnector_mysql.q.out.
   
   Currently this test driver only supports launching a MySQL external 
database. Postgres and Derby support will follow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634449)
Time Spent: 20m  (was: 10m)

> Add new test driver to automatically launch and load external database
> --
>
> Key: HIVE-25423
> URL: https://issues.apache.org/jira/browse/HIVE-25423
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure, Tests
>Affects Versions: 3.1.2
>Reporter: Dantong Dong
>Assignee: Dantong Dong
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-08-04 at 2.32.35 PM.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add a new test driver (TestMiniLlapExtDBCliDriver) to automatically launch 
> and load an external database with a specified custom script during tests. 
> This issue originated from HIVE-24396. Will add docs later.
> !Screen Shot 2021-08-04 at 2.32.35 PM.png|width=500,height=262!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25422) Break up TestHiveIcebergStorageHandlerWithEngine test

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25422?focusedWorklogId=634440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634440
 ]

ASF GitHub Bot logged work on HIVE-25422:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:50
Start Date: 05/Aug/21 11:50
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #2558:
URL: https://github.com/apache/hive/pull/2558


   TestHiveIcebergStorageHandlerWithEngine tests the Iceberg-Hive integration 
by running queries against Iceberg-backed tables. It is a parameterized test, 
so each file format, each table type, and vectorization on/off scenarios are 
covered, while it also tests different functionalities.
   
   This ticket will track the effort to break this test class into smaller 
chunks, as in the recent past we have observed cases where this test couldn't 
even be executed due to memory/process problems.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634440)
Time Spent: 1h  (was: 50m)

> Break up TestHiveIcebergStorageHandlerWithEngine test
> -
>
> Key: HIVE-25422
> URL: https://issues.apache.org/jira/browse/HIVE-25422
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> TestHiveIcebergStorageHandlerWithEngine tests the Iceberg-Hive integration by 
> running queries against Iceberg-backed tables. It is a parameterized test, so 
> each file format, each table type, and vectorization on/off scenarios are 
> covered, while it also tests different functionalities.
> This ticket will track the effort to break this test class into smaller 
> chunks, as in the recent past we have observed cases where this test couldn't 
> even be executed due to memory/process problems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25422) Break up TestHiveIcebergStorageHandlerWithEngine test

2021-08-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-25422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393902#comment-17393902
 ] 

Ádám Szita commented on HIVE-25422:
---

The following break up is proposed:
{code:java}
TestHiveIcebergCTAS.java
   3
  public void testCTASFromHiveTable() {
  public void testCTASPartitionedFromHiveTable() throws TException, 
InterruptedException {
  public void testCTASFailureRollback() throws IOException {
TestHiveIcebergComplexTypeWrites.java
  12
  public void testWriteArrayOfPrimitivesInTable() throws IOException {
  public void testWriteArrayOfArraysInTable() throws IOException {
  public void testWriteArrayOfMapsInTable() throws IOException {
  public void testWriteArrayOfStructsInTable() throws IOException {
  public void testWriteMapOfPrimitivesInTable() throws IOException {
  public void testWriteMapOfArraysInTable() throws IOException {
  public void testWriteMapOfMapsInTable() throws IOException {
  public void testWriteMapOfStructsInTable() throws IOException {
  public void testWriteStructOfPrimitivesInTable() throws IOException {
  public void testWriteStructOfArraysInTable() throws IOException {
  public void testWriteStructOfMapsInTable() throws IOException {
  public void testWriteStructOfStructsInTable() throws IOException {
TestHiveIcebergInserts.java
  12
  public void testInsert() throws IOException {
  public void testInsertSupportedTypes() throws IOException {
  public void testInsertFromSelect() throws IOException {
  public void testInsertOverwriteNonPartitionedTable() throws IOException {
  public void testInsertOverwritePartitionedTable() throws IOException {
  public void testInsertFromSelectWithOrderBy() throws IOException {
  public void testInsertFromSelectWithProjection() throws IOException {
  public void testInsertUsingSourceTableWithSharedColumnsNames() throws 
IOException {
  public void testInsertFromJoiningTwoIcebergTables() throws IOException {
  public void testMultiTableInsert() throws IOException {
  public void testWriteWithDefaultWriteFormat() {
  public void testInsertEmptyResultSet() throws IOException {
TestHiveIcebergMigration.java
   7
  public void testMigrateHiveTableToIceberg() throws TException, 
InterruptedException {
  public void testMigratePartitionedHiveTableToIceberg() throws TException, 
InterruptedException {
  public void testMigratePartitionedBucketedHiveTableToIceberg() throws 
TException, InterruptedException {
  public void testRollbackMigrateHiveTableToIceberg() throws TException, 
InterruptedException {
  public void testRollbackMigratePartitionedHiveTableToIceberg() throws 
TException, InterruptedException {
  public void testRollbackMultiPartitionedHiveTableToIceberg() throws 
TException, InterruptedException {
  public void testRollbackMigratePartitionedBucketedHiveTableToIceberg() throws 
TException, InterruptedException {
TestHiveIcebergPartitions.java
  10
  public void testPartitionPruning() throws IOException {
  public void testPartitionedWrite() throws IOException {
  public void testIdentityPartitionedWrite() throws IOException {
  public void testMultilevelIdentityPartitionedWrite() throws IOException {
  public void testYearTransform() throws IOException {
  public void testMonthTransform() throws IOException {
  public void testDayTransform() throws IOException {
  public void testHourTransform() throws IOException {
  public void testBucketTransform() throws IOException {
  public void testTruncateTransform() throws IOException {
TestHiveIcebergSchemaEvolution.java
  14
  public void testDescribeTable() throws IOException {
  public void testAlterChangeColumn() throws IOException {
  public void testSchemaEvolutionOnVectorizedReads() throws Exception {
  public void testAddColumnToIcebergTable() throws IOException {
  public void testAddRequiredColumnToIcebergTable() throws IOException {
  public void testAddColumnIntoStructToIcebergTable() throws IOException {
  public void testMakeColumnRequiredInIcebergTable() throws IOException {
  public void testRemoveColumnFromIcebergTable() throws IOException {
  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
  public void testRenameColumnInIcebergTable() throws IOException {
  public void testMoveLastNameToFirstInIcebergTable() throws IOException {
  public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws 
IOException {
  public void testMoveCustomerIdAfterFirstNameInIcebergTable() throws 
IOException {
  public void testUpdateColumnTypeInIcebergTable() throws IOException, 
TException, InterruptedException {
TestHiveIcebergSelects.java
   8
  public void testScanTable() throws IOException {
  public void testCBOWithSelectedColumnsNonOverlapJoin() throws IOException {
  public void testCBOWithSelectedColumnsOverlapJoin() throws IOException {
  public void testCBOWithSelfJoin() throws IOException {
  public void 

[jira] [Work logged] (HIVE-25420) Ignore time type column in Iceberg testing for vectorized runs

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25420?focusedWorklogId=634447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634447
 ]

ASF GitHub Bot logged work on HIVE-25420:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:51
Start Date: 05/Aug/21 11:51
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #2557:
URL: https://github.com/apache/hive/pull/2557


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634447)
Time Spent: 0.5h  (was: 20m)

> Ignore time type column in Iceberg testing for vectorized runs
> --
>
> Key: HIVE-25420
> URL: https://issues.apache.org/jira/browse/HIVE-25420
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Time is a valid type in Iceberg but not in Hive. In Hive it is represented as 
> a string type column, while (at least if ORC is used as the underlying file 
> format) a long type is written out to data files.
> This requires two translations: long@ORC -> LocalDate@Iceberg -> 
> toString()@Hive, and this works well for non-vectorized reads, but when 
> vectorization is turned on, we will get:
> {code:java}
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector {code}
> Thus, for now, the time type is not supported with vectorization, and the relevant 
> test cases should be ignored in such test configs.
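> A minimal sketch of how such a case can be skipped in a parameterized JUnit run; 
> the guard condition and the isVectorized parameter below are illustrative 
> assumptions, not the actual patch:
> {code:java}
> // skip time-type cases when the run is vectorized (isVectorized is assumed
> // to be a test parameter; the type comes from the Iceberg schema under test)
> Assume.assumeFalse("Time type is not yet supported with vectorized reads",
>     isVectorized && type.typeId() == Type.TypeID.TIME);
> {code}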



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25048) Refine the start/end functions in HMSHandler

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25048?focusedWorklogId=634345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634345
 ]

ASF GitHub Bot logged work on HIVE-25048:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:40
Start Date: 05/Aug/21 11:40
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2441:
URL: https://github.com/apache/hive/pull/2441#discussion_r682695432



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java
##
@@ -61,6 +66,7 @@ public Result(Object result, int numRetries) {
 
   private final Configuration origConf;// base configuration
   private final Configuration activeConf;  // active configuration
+  private final List<MetaStoreEndFunctionListener> endFunctionListeners; // the end function listeners

Review comment:
   didn't know we had something like this :)
   
   but having only `MetaStoreEndFunctionListener` seems a bit odd;
   
   how about:
   * introduce a `MetaStoreFunctionListener`
   * keep the `MetaStoreEndFunctionListener` interface but have it extend the new 
interface, with a default implementation of `startFunction` (sketched below)
   * move the old logging/counting stuff into a `MetaStoreFunctionListener`
   
   just an idea; let me know what you think!
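   
   For illustration, one possible shape of that split (a sketch with illustrative 
names and signatures, not existing Hive APIs):
   ```java
   // hypothetical new interface covering both ends of a call
   public interface MetaStoreFunctionListener {
     void onStartFunction(String functionName, Object... args);
     void onEndFunction(String functionName, Exception error, Object... args);
   }
   
   // old listener kept for compatibility, with a no-op default start hook
   public interface MetaStoreEndFunctionListener extends MetaStoreFunctionListener {
     @Override
     default void onStartFunction(String functionName, Object... args) {
       // no-op so existing end-only listeners keep working unchanged
     }
   }
   ```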

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java
##
@@ -202,6 +215,8 @@ public Result invokeInternal(final Object proxy, final 
Method method, final Obje
   LOG.error(ExceptionUtils.getStackTrace(e.getCause()));
   throw e.getCause();
 }
+  } finally {
+endFunction(method, object, ex, args);

Review comment:
   I think we could place the `startFunctions` here as well - right now we have one 
part here and the other there... 
   I don't think we would break many things if we did that - or do you think it 
could cause some trouble?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634345)
Time Spent: 4h  (was: 3h 50m)

> Refine the start/end functions in HMSHandler
> 
>
> Key: HIVE-25048
> URL: https://issues.apache.org/jira/browse/HIVE-25048
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Some start/end functions are incomplete or wrong in the HMSHandler, these 
> functions audit actions, monitor the performance, and notify the end function 
> listeners. We have already measured the performance of the HMSHandler in 
> PerfLogger,  and covered more methods than these functions that have done, so 
> we can remove the monitoring from the start/end functions, move the end 
> function listeners to the RetryingHMSHandler to eliminate the try-finally 
> blocks that spread across many different methods. After these, we can try to 
> cleanup the functions to make HMSHandler be more simplified.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25422) Break up TestHiveIcebergStorageHandlerWithEngine test

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25422?focusedWorklogId=634324=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634324
 ]

ASF GitHub Bot logged work on HIVE-25422:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:38
Start Date: 05/Aug/21 11:38
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2558:
URL: https://github.com/apache/hive/pull/2558#discussion_r683216106



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -20,116 +20,81 @@
 package org.apache.iceberg.mr.hive;
 
 import java.io.IOException;
-import java.math.BigDecimal;
-import java.text.SimpleDateFormat;
-import java.time.LocalDate;
-import java.time.LocalDateTime;
-import java.time.OffsetDateTime;
-import java.time.ZoneOffset;
 import java.util.ArrayList;
-import java.util.Arrays;
 import java.util.Collection;
-import java.util.Comparator;
-import java.util.Date;
 import java.util.HashMap;
 import java.util.List;
-import java.util.Locale;
 import java.util.Map;
-import java.util.Properties;
 import java.util.concurrent.TimeUnit;
-import java.util.stream.Collectors;
-import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hive.common.StatsSetupConst;
 import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
-import org.apache.hadoop.hive.metastore.partition.spec.PartitionSpecProxy;
 import org.apache.hadoop.hive.ql.exec.mr.ExecMapper;
-import org.apache.iceberg.AssertHelpers;
 import org.apache.iceberg.FileFormat;
-import org.apache.iceberg.HistoryEntry;
-import org.apache.iceberg.PartitionKey;
-import org.apache.iceberg.PartitionSpec;
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.SnapshotSummary;
 import org.apache.iceberg.Table;
-import org.apache.iceberg.TableProperties;
-import org.apache.iceberg.catalog.TableIdentifier;
-import org.apache.iceberg.data.GenericRecord;
 import org.apache.iceberg.data.Record;
-import org.apache.iceberg.exceptions.NoSuchTableException;
-import org.apache.iceberg.hive.HiveSchemaUtil;
 import org.apache.iceberg.hive.MetastoreUtil;
-import org.apache.iceberg.mr.Catalogs;
-import org.apache.iceberg.mr.InputFormatConfig;
 import org.apache.iceberg.mr.TestHelper;
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
-import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.types.Type;
 import org.apache.iceberg.types.Types;
 import org.apache.thrift.TException;
 import org.junit.After;
 import org.junit.AfterClass;
 import org.junit.Assert;
-import org.junit.Assume;
 import org.junit.Before;
 import org.junit.BeforeClass;
 import org.junit.Rule;
-import org.junit.Test;
 import org.junit.rules.TemporaryFolder;
 import org.junit.rules.Timeout;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
-import org.mockito.ArgumentMatchers;
-import org.mockito.MockedStatic;
-import org.mockito.Mockito;
 
 import static org.apache.iceberg.types.Types.NestedField.optional;
 import static org.apache.iceberg.types.Types.NestedField.required;
 import static org.junit.runners.Parameterized.Parameter;
 import static org.junit.runners.Parameterized.Parameters;
 
-
 @RunWith(Parameterized.class)
-public class TestHiveIcebergStorageHandlerWithEngine {
+public abstract class TestHiveIcebergStorageHandlerWithEngine {

Review comment:
   Should we rename this to `HiveIcebergStorageHandlerWithEngineBase` or 
something, so it does not start with Test?
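   
   For illustration, the kind of shape this would take with JUnit 4 Parameterized 
(a sketch; class and member names are illustrative):
   ```java
   // abstract base: not named Test*, so it is not picked up as a test itself
   @RunWith(Parameterized.class)
   public abstract class HiveIcebergStorageHandlerWithEngineBase {
     @Parameterized.Parameter(0)
     public FileFormat fileFormat;
     @Parameterized.Parameter(1)
     public String executionEngine;
     // shared @Parameters factory and @Before/@After fixtures live here
   }
   
   // focused subclass: inherits parameters and fixtures, keeps only related tests
   public class TestHiveIcebergInserts extends HiveIcebergStorageHandlerWithEngineBase {
     @Test
     public void testInsert() throws IOException {
       // ...
     }
   }
   ```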

##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergTypes.java
##
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import 

[jira] [Work logged] (HIVE-25417) Null bit vector is not handled while getting the stats for Postgres backend

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25417?focusedWorklogId=634286=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634286
 ]

ASF GitHub Bot logged work on HIVE-25417:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:34
Start Date: 05/Aug/21 11:34
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2556:
URL: https://github.com/apache/hive/pull/2556#discussion_r682520895



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java
##
@@ -525,6 +531,16 @@ public static MPartitionColumnStatistics 
convertToMPartitionColumnStatistics(
 return mColStats;
   }
 
+  private static byte[] getBitVector(byte[] bytes) {

Review comment:
   you could move this logic into `MTableColumnStatistics#getBitVector` ;
   

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -9538,7 +9538,7 @@ private void writeMPartitionColumnStatistics(Table table, 
Partition partition,
 if (oldStats != null) {
   StatObjectConverter.setFieldsIntoOldStats(mStatsObj, oldStats);
 } else {
-  if (sqlGenerator.getDbProduct().isPOSTGRES() && mStatsObj.getBitVector() 
== null) {

Review comment:
   you could also move this into the `setBitVector` / defaults stuff into 
the `MTableColumnStatistics`

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java
##
@@ -281,11 +281,20 @@ public void setDecimalHighValue(String decimalHighValue) {
   }
 
   public byte[] getBitVector() {
+// workaround for DN bug in persisting nulls in pg bytea column
+// instead set empty bit vector with header.
+// https://issues.apache.org/jira/browse/HIVE-17836
+if (bitVector != null && bitVector.length == 2 && bitVector[0] == 'H' && 
bitVector[1] == 'L') {
+  return null;
+}
 return bitVector;
   }
 
   public void setBitVector(byte[] bitVector) {
-this.bitVector = bitVector;
+// workaround for DN bug in persisting nulls in pg bytea column

Review comment:
   does the DN serialization happen through the getters or through the fields? 
   if it reads the fields, what happens if we create an instance of this 
class and never call `setBitVector(null)`? will that be okay?
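   
   For reference, a minimal sketch of the "HL" null-marker round trip being 
discussed (condensed from the snippets above; `NULL_MARKER` is an illustrative name):
   ```java
   private static final byte[] NULL_MARKER = new byte[] {'H', 'L'};
   
   public void setBitVector(byte[] bitVector) {
     // DN cannot persist null in a Postgres bytea column, so store the marker instead
     this.bitVector = (bitVector == null) ? NULL_MARKER : bitVector;
   }
   
   public byte[] getBitVector() {
     // translate the marker back to null so HyperLogLog deserialization never sees it
     return java.util.Arrays.equals(bitVector, NULL_MARKER) ? null : bitVector;
   }
   ```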
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634286)
Time Spent: 40m  (was: 0.5h)

> Null bit vector is not handled while getting the stats for Postgres backend
> ---
>
> Key: HIVE-25417
> URL: https://issues.apache.org/jira/browse/HIVE-25417
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While adding stats with a null bit vector, a special string "HL" is stored, as 
> Postgres does not support null values for byte columns. But while getting the 
> stats, the conversion back to null is not done. This causes a failure during 
> deserialisation of the bit vector field if the existing stats are used for a merge.
>  
> {code:java}
>  The input stream is not a HyperLogLog stream.  7276-1 instead of 727676 or 7077
>   at org.apache.hadoop.hive.common.ndv.hll.HyperLogLogUtils.checkMagicString(HyperLogLogUtils.java:349)
>   at org.apache.hadoop.hive.common.ndv.hll.HyperLogLogUtils.deserializeHLL(HyperLogLogUtils.java:139)
>   at org.apache.hadoop.hive.common.ndv.hll.HyperLogLogUtils.deserializeHLL(HyperLogLogUtils.java:213)
>   at org.apache.hadoop.hive.common.ndv.hll.HyperLogLogUtils.deserializeHLL(HyperLogLogUtils.java:227)
>   at org.apache.hadoop.hive.common.ndv.NumDistinctValueEstimatorFactory.getNumDistinctValueEstimator(NumDistinctValueEstimatorFactory.java:53)
>   at org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector.updateNdvEstimator(LongColumnStatsDataInspector.java:124)
>   at org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector.getNdvEstimator(LongColumnStatsDataInspector.java:107)
>   at org.apache.hadoop.hive.metastore.columnstats.merge.LongColumnStatsMerger.merge(LongColumnStatsMerger.java:36)
>

[jira] [Work logged] (HIVE-25048) Refine the start/end functions in HMSHandler

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25048?focusedWorklogId=634254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634254
 ]

ASF GitHub Bot logged work on HIVE-25048:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:30
Start Date: 05/Aug/21 11:30
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2441:
URL: https://github.com/apache/hive/pull/2441#issuecomment-892688659


   @nrg4878 @kgyrtkirk any thoughts or comments? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634254)
Time Spent: 3h 50m  (was: 3h 40m)

> Refine the start/end functions in HMSHandler
> 
>
> Key: HIVE-25048
> URL: https://issues.apache.org/jira/browse/HIVE-25048
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Some start/end functions are incomplete or wrong in the HMSHandler, these 
> functions audit actions, monitor the performance, and notify the end function 
> listeners. We have already measured the performance of the HMSHandler in 
> PerfLogger,  and covered more methods than these functions that have done, so 
> we can remove the monitoring from the start/end functions, move the end 
> function listeners to the RetryingHMSHandler to eliminate the try-finally 
> blocks that spread across many different methods. After these, we can try to 
> cleanup the functions to make HMSHandler be more simplified.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25061) PTF: Improve ValueBoundaryScanner

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25061?focusedWorklogId=634250=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634250
 ]

ASF GitHub Bot logged work on HIVE-25061:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:30
Start Date: 05/Aug/21 11:30
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2225:
URL: https://github.com/apache/hive/pull/2225#issuecomment-893248100


   merged to master, thanks @pgaref for your comments and review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634250)
Time Spent: 5.5h  (was: 5h 20m)

> PTF: Improve ValueBoundaryScanner
> -
>
> Key: HIVE-25061
> URL: https://issues.apache.org/jira/browse/HIVE-25061
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> -First, I need to check whether TreeMap is really needed for our case.-
> It turned out a binary-ish search approach could help in range calculation, 
> as we're searching in an ordered set of values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25048) Refine the start/end functions in HMSHandler

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25048?focusedWorklogId=634243=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634243
 ]

ASF GitHub Bot logged work on HIVE-25048:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:29
Start Date: 05/Aug/21 11:29
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2441:
URL: https://github.com/apache/hive/pull/2441#discussion_r683067748



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java
##
@@ -202,6 +215,8 @@ public Result invokeInternal(final Object proxy, final 
Method method, final Obje
   LOG.error(ExceptionUtils.getStackTrace(e.getCause()));
   throw e.getCause();
 }
+  } finally {
+endFunction(method, object, ex, args);

Review comment:
   The `startFunctions` take care of these:
   1.  API Metrics, which are duplicated by 
[PerfLogger](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/PerfLogger.java#L188-L207)
 in RetryingHMSHandler; this part could be eliminated from the `startFunctions`.
   2. Audit Logs: it is difficult to move these to a standalone function, as 
the log is related to the input of the method. For example, we have two 
`startPartitionFunction` variants for the partition-related methods:
   ```
  private void startPartitionFunction(String function, String cat, String db, String tbl,
      List<String> partVals) {
    startFunction(function, " : tbl=" +
        TableName.getQualified(cat, db, tbl) + "[" + join(partVals, ",") + "]");
  }
   
  private void startPartitionFunction(String function, String catName, String db, String tbl,
      Map<String, String> partName) {
    startFunction(function, " : tbl=" +
        TableName.getQualified(catName, db, tbl) + "partition=" + partName);
  }
   ```
   
   In some cases we also log only the table name, using `startTableFunction`, as in 
[get_partitions](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L5453).
   These functions make the audit log of partitions inconsistent; another concern 
is that if we add/remove a method, the audit log has to be changed elsewhere, 
which may be upsetting.
   

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java
##
@@ -202,6 +215,8 @@ public Result invokeInternal(final Object proxy, final 
Method method, final Obje
   LOG.error(ExceptionUtils.getStackTrace(e.getCause()));
   throw e.getCause();
 }
+  } finally {
+endFunction(method, object, ex, args);

Review comment:
   The `startFunctions` take care of these:
   1.  API Metrics, which are duplicated by 
[PerfLogger](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/PerfLogger.java#L188-L207)
 in RetryingHMSHandler; this part could be eliminated from the `startFunctions`.
   2. Audit Logs: it is difficult to move these to a standalone function, as 
the log is related to the input of the method and the input varies among 
different methods. For example, we have two `startPartitionFunction` variants for the 
partition-related methods:
   ```
  private void startPartitionFunction(String function, String cat, String db, String tbl,
      List<String> partVals) {
    startFunction(function, " : tbl=" +
        TableName.getQualified(cat, db, tbl) + "[" + join(partVals, ",") + "]");
  }
   
  private void startPartitionFunction(String function, String catName, String db, String tbl,
      Map<String, String> partName) {
    startFunction(function, " : tbl=" +
        TableName.getQualified(catName, db, tbl) + "partition=" + partName);
  }
   ```
   
   In some cases we also log only the table name, using `startTableFunction`, as in 
[get_partitions](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L5453).
   These functions make the audit log of partitions inconsistent; another concern 
is that if we add/remove a method, the audit log has to be changed elsewhere, 
which may be upsetting.
   

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java
##
@@ -202,6 +215,8 @@ public Result invokeInternal(final Object proxy, final 
Method method, final Obje
   LOG.error(ExceptionUtils.getStackTrace(e.getCause()));
  

[jira] [Work started] (HIVE-25394) Enable vectorization for TestIcebergCliDriver dynamic_partition_pruning.q

2021-08-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25394 started by Ádám Szita.
-
> Enable vectorization for TestIcebergCliDriver dynamic_partition_pruning.q
> -
>
> Key: HIVE-25394
> URL: https://issues.apache.org/jira/browse/HIVE-25394
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Ádám Szita
>Priority: Major
>
> If we turn on vectorization for {{dynamic_partition_pruning.q}} we will get 
> the following exception:
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1627387142352_0001_11_01, diagnostics=[Task 
> failed, taskId=task_1627387142352_0001_11_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1627387142352_0001_11_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:277)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:89)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:311)
>   ... 16 more
> Caused by: java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:374)
>   at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:82)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:119)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:59)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:145)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:75)
>   ... 18 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addPartitionColsToBatch(VectorizedRowBatchCtx.java:595)
>   at 
> org.apache.iceberg.mr.hive.vector.VectorizedRowBatchIterator.advance(VectorizedRowBatchIterator.java:69)
>   at 
> 

[jira] [Assigned] (HIVE-25404) Inserts inside merge statements are rewritten incorrectly for partitioned tables

2021-08-05 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25404:
---

Assignee: Zoltan Haindrich

> Inserts inside merge statements are rewritten incorrectly for partitioned 
> tables
> 
>
> Key: HIVE-25404
> URL: https://issues.apache.org/jira/browse/HIVE-25404
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> {code}
> drop table u;drop table t;
> create table t(value string default 'def') partitioned by (id integer);
> create table u(id integer);
> {code}
> #1 when id is specified, the rewritten query is:
> {code}
> FROM
>   `default`.`t`
>   RIGHT OUTER JOIN
>   `default`.`u`
>   ON `t`.`id`=`u`.`id`
> INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)-- insert clause
>   SELECT `u`.`id`,'x'
>WHERE `t`.`id` IS NULL
> {code}
> #2 when value is not specified
> {code}
> merge into t using u on t.id=u.id when not matched then insert (id) values 
> (u.id);
> {code}
> rewritten query:
> {code}
> FROM
>   `default`.`t`
>   RIGHT OUTER JOIN
>   `default`.`u`
>   ON `t`.`id`=`u`.`id`
> INSERT INTO `default`.`t` (`id`) partition (`id`)-- insert clause
>   SELECT `u`.`id`
>WHERE `t`.`id` IS NULL
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25048) Refine the start/end functions in HMSHandler

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25048?focusedWorklogId=634135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634135
 ]

ASF GitHub Bot logged work on HIVE-25048:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:17
Start Date: 05/Aug/21 11:17
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 edited a comment on pull request #2441:
URL: https://github.com/apache/hive/pull/2441#issuecomment-892688659


   Hi @nrg4878 @kgyrtkirk, any thoughts or comments? Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634135)
Time Spent: 3.5h  (was: 3h 20m)

> Refine the start/end functions in HMSHandler
> 
>
> Key: HIVE-25048
> URL: https://issues.apache.org/jira/browse/HIVE-25048
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Some start/end functions are incomplete or wrong in the HMSHandler, these 
> functions audit actions, monitor the performance, and notify the end function 
> listeners. We have already measured the performance of the HMSHandler in 
> PerfLogger,  and covered more methods than these functions that have done, so 
> we can remove the monitoring from the start/end functions, move the end 
> function listeners to the RetryingHMSHandler to eliminate the try-finally 
> blocks that spread across many different methods. After these, we can try to 
> cleanup the functions to make HMSHandler be more simplified.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24969) Predicates may be removed when decorrelating subqueries with lateral

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24969?focusedWorklogId=634128=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634128
 ]

ASF GitHub Bot logged work on HIVE-24969:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:16
Start Date: 05/Aug/21 11:16
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 removed a comment on pull request #2145:
URL: https://github.com/apache/hive/pull/2145#issuecomment-833295319


   Hi @jcamachor, the RS has only one JOIN as its child when parsing the join tree, 
so it looks safe here to push down all predicates through the RS if they belong to 
the inputs of the RS. What do you think?
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634128)
Time Spent: 2h 10m  (was: 2h)

> Predicates may be removed when decorrelating subqueries with lateral
> 
>
> Key: HIVE-24969
> URL: https://issues.apache.org/jira/browse/HIVE-24969
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> {code:java}
> select count(distinct logItem.triggerId)
> from service_stat_log LATERAL VIEW explode(logItems) LogItemTable AS logItem
> where logItem.dsp in ('delivery', 'ocpa')
> and logItem.iswin = true
> and logItem.adid in (
>  select distinct adId
>  from ad_info
>  where subAccountId in (16010, 14863));  {code}
> The predicates _logItem.dsp in ('delivery', 'ocpa')_ and _logItem.iswin = 
> true_ are removed when doing PPD: JOIN ->   RS  -> LVJ. The JOIN has 
> candidates: logitem -> [logItem.dsp in ('delivery', 'ocpa'), logItem.iswin = 
> true]. When pushing them to the RS followed by the LVJ, none of them are pushed, 
> and the candidates of logitem are finally removed by default, which causes the 
> wrong result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25061) PTF: Improve ValueBoundaryScanner

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25061?focusedWorklogId=634115=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634115
 ]

ASF GitHub Bot logged work on HIVE-25061:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:14
Start Date: 05/Aug/21 11:14
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #2225:
URL: https://github.com/apache/hive/pull/2225


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634115)
Time Spent: 5h 20m  (was: 5h 10m)

> PTF: Improve ValueBoundaryScanner
> -
>
> Key: HIVE-25061
> URL: https://issues.apache.org/jira/browse/HIVE-25061
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> -First, I need to check whether TreeMap is really needed for our case.-
> It turned out a binary-ish search approach could help in range calculation, 
> as we're searching in an ordered set of values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24705) Create/Alter/Drop tables based on storage handlers in HS2 should be authorized by Ranger/Sentry

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24705?focusedWorklogId=634073=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634073
 ]

ASF GitHub Bot logged work on HIVE-24705:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:09
Start Date: 05/Aug/21 11:09
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #1960:
URL: https://github.com/apache/hive/pull/1960#issuecomment-892743187


   fix has been merged to master. Please close this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634073)
Time Spent: 1h 40m  (was: 1.5h)

> Create/Alter/Drop tables based on storage handlers in HS2 should be 
> authorized by Ranger/Sentry
> ---
>
> Key: HIVE-24705
> URL: https://issues.apache.org/jira/browse/HIVE-24705
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> With doAs=false in Hive 3.x, whenever a user tries to create a table based 
> on a storage handler over external storage (for example, an HBase table), the end 
> user we see is hive, so we cannot really enforce the policy in Apache 
> Ranger/Sentry on the actual end user. So we need to enforce this condition in 
> Hive in the event of create/alter/drop of tables based on storage handlers.
> Built-in Hive storage handlers like HbaseStorageHandler, KafkaStorageHandler, 
> etc. should implement a method getURIForAuthentication() which returns a URI 
> formed from table properties. This URI can then be sent for authorization 
> to Ranger/Sentry.
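> A minimal sketch of what such a method could look like for an HBase-backed 
> table; the property keys and the URI scheme below are illustrative assumptions:
> {code:java}
> public URI getURIForAuth(Map<String, String> tableProperties) throws URISyntaxException {
>   // e.g. authorize on hbase://<zookeeper quorum>/<hbase table name>
>   return new URI("hbase",
>       tableProperties.get("hbase.zookeeper.quorum"),   // host part
>       "/" + tableProperties.get("hbase.table.name"),   // path part
>       null);
> }
> {code}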



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25328) Limit scope of REPLACE COLUMNS for Iceberg tables

2021-08-05 Thread Marton Bod (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393777#comment-17393777
 ] 

Marton Bod commented on HIVE-25328:
---

Pushed to master. Thanks for the reviews, [~szita], [~pvary]!

> Limit scope of REPLACE COLUMNS for Iceberg tables
> -
>
> Key: HIVE-25328
> URL: https://issues.apache.org/jira/browse/HIVE-25328
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Replace columns is rather a wildcard operation which can make heavy-weight 
> schema changes. For Iceberg tables we only want to allow this operation for 
> dropping columns. For other changes (adding columns, renaming, type 
> promotion, etc.), we should use the CHANGE COLUMN command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit

2021-08-05 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-25429:



> Delta metrics collection may cause number of tez counters to exceed 
> tez.counters.max limit
> --
>
> Key: HIVE-25429
> URL: https://issues.apache.org/jira/browse/HIVE-25429
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>
> There's a limit to the number of tez counters allowed (tez.counters.max). 
> Delta metrics collection (i.e. DeltaFileMetricsReporter) was creating 3 
> counters for each partition touched by a given query, which can result in a 
> huge number of counters. That is unnecessary, because we're only interested 
> in the n partitions with the most deltas. This change limits the number 
> of counters created to hive.txn.acid.metrics.max.cache.size*3.
> Also, when tez.counters.max is reached, a LimitExceededException is thrown but 
> isn't caught on the Hive side, causing the query to fail. We should catch 
> this and skip delta metrics collection in that case.
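> A sketch of the described guard; the counter names and the flag are 
> illustrative, not the actual patch:
> {code:java}
> try {
>   tezCounters.findCounter(groupName, counterName).setValue(deltaCount);
> } catch (LimitExceededException e) {
>   // tez.counters.max reached: skip delta metrics instead of failing the query
>   LOG.warn("Tez counter limit exceeded, disabling delta file metrics", e);
>   deltaMetricsDisabled = true;
> }
> {code}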



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25328) Limit scope of REPLACE COLUMNS for Iceberg tables

2021-08-05 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25328.
---
Resolution: Fixed

> Limit scope of REPLACE COLUMNS for Iceberg tables
> -
>
> Key: HIVE-25328
> URL: https://issues.apache.org/jira/browse/HIVE-25328
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Replace columns is rather a wildcard operation which can make heavy-weight 
> schema changes. For Iceberg tables we only want to allow this operation for 
> dropping columns. For other changes (adding columns, renaming, type 
> promotion, etc.), we should use the CHANGE COLUMN command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25328) Limit scope of REPLACE COLUMNS for Iceberg tables

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25328?focusedWorklogId=634035=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634035
 ]

ASF GitHub Bot logged work on HIVE-25328:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 10:05
Start Date: 05/Aug/21 10:05
Worklog Time Spent: 10m 
  Work Description: marton-bod merged pull request #2475:
URL: https://github.com/apache/hive/pull/2475


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634035)
Time Spent: 1h 20m  (was: 1h 10m)

> Limit scope of REPLACE COLUMNS for Iceberg tables
> -
>
> Key: HIVE-25328
> URL: https://issues.apache.org/jira/browse/HIVE-25328
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Replace columns is rather a wildcard operation which can make heavy-weight 
> schema changes. For Iceberg tables we only want to allow this operation for 
> dropping columns. For other changes (adding columns, renaming, type 
> promotion, etc.), we should use the CHANGE COLUMN command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25420) Ignore time type column in Iceberg testing for vectorized runs

2021-08-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita resolved HIVE-25420.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master, thanks for the quick review [~pvary].

> Ignore time type column in Iceberg testing for vectorized runs
> --
>
> Key: HIVE-25420
> URL: https://issues.apache.org/jira/browse/HIVE-25420
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Time is a valid type in Iceberg but not in Hive. In Hive it is represented as 
> a string type column, while (at least if ORC is used as the underlying file 
> format) a long type is written out to data files.
> This requires two translations: long@ORC -> LocalDate@Iceberg -> 
> toString()@Hive, and this works well for non-vectorized reads, but when 
> vectorization is turned on, we will get:
> {code:java}
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector {code}
> Thus, for now, the time type is not supported with vectorization, and the relevant 
> test cases should be ignored in such test configs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25373) Modify buildColumnStatsDesc to send configured number of stats for updation

2021-08-05 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-25373.

Resolution: Fixed

> Modify buildColumnStatsDesc to send configured number of stats for updation
> ---
>
> Key: HIVE-25373
> URL: https://issues.apache.org/jira/browse/HIVE-25373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The number of stats sent for update should be limited to avoid a Thrift 
> error when the request size exceeds the limit.
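> A sketch of the chunking idea; the config variable and helper names are 
> illustrative assumptions, not the actual patch:
> {code:java}
> int batchSize = MetastoreConf.getIntVar(conf, ConfVars.STATS_BATCH_SIZE); // hypothetical config
> for (List<ColumnStatisticsObj> chunk : Lists.partition(statsObjs, batchSize)) {
>   // one ColumnStatisticsDesc per chunk keeps each Thrift request under the limit
>   updateStats(buildColumnStatsDesc(table, chunk)); // updateStats is hypothetical
> }
> {code}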



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25342) Optimize set_aggr_stats_for for mergeColStats path.

2021-08-05 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-25342.

Resolution: Fixed

> Optimize set_aggr_stats_for for mergeColStats path. 
> 
>
> Key: HIVE-25342
> URL: https://issues.apache.org/jira/browse/HIVE-25342
> Project: Hive
>  Issue Type: Sub-task
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The direct-SQL optimisation used for the normal path can also be used for the 
> mergeColStats
> path. The stats to be updated can be accumulated in a temporary list, and that list 
> can then be used to update the stats in a batch.
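> A sketch of the accumulate-then-flush idea; the merge helper and batch method 
> names are illustrative assumptions:
> {code:java}
> List<ColumnStatistics> pendingStats = new ArrayList<>();
> for (ColumnStatistics incoming : newStats) {
>   pendingStats.add(mergeWithExisting(incoming)); // mergeWithExisting is hypothetical
> }
> // flush once via direct SQL instead of one update per partition
> directSql.updatePartitionColumnStatisticsBatch(pendingStats);
> {code}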



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25205) Reduce overhead of adding write notification log during batch loading of partition..

2021-08-05 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-25205.

Resolution: Fixed

> Reduce overhead of adding write notification log during batch loading of 
> partition..
> 
>
> Key: HIVE-25205
> URL: https://issues.apache.org/jira/browse/HIVE-25205
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, HiveServer2
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: performance
>
> During batch loading of partitions, a write notification log is added for 
> each partition. This causes a delay in execution, as a call to HMS 
> is made for each partition. This can be optimised by adding a new API in HMS 
> that accepts a batch of partitions, so the batch can be added together to the 
> backend database. Once we have a batch of notification logs at the HMS side, the code 
> can be optimised to add the logs using a single call to the backend RDBMS. 
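> A sketch of the batched flow described above; the request builder and client 
> method names are illustrative assumptions:
> {code:java}
> List<WriteNotificationLogRequest> requests = new ArrayList<>();
> for (Partition p : loadedPartitions) {
>   requests.add(buildWriteNotificationRequest(txnId, writeId, table, p)); // hypothetical builder
> }
> // one HMS round trip, and one backend RDBMS write, for the whole batch
> msClient.addWriteNotificationLogInBatch(requests);
> {code}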



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25061) PTF: Improve ValueBoundaryScanner

2021-08-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-25061.
-
Resolution: Fixed

> PTF: Improve ValueBoundaryScanner
> -
>
> Key: HIVE-25061
> URL: https://issues.apache.org/jira/browse/HIVE-25061
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> -First, I need to check whether TreeMap is really needed for our case.-
> It turned out a binary-ish search approach could help in range calculation, 
> as we're searching in an ordered set of values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25061) PTF: Improve ValueBoundaryScanner

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25061?focusedWorklogId=633984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633984
 ]

ASF GitHub Bot logged work on HIVE-25061:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 07:56
Start Date: 05/Aug/21 07:56
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2225:
URL: https://github.com/apache/hive/pull/2225#issuecomment-893248100


   merged to master, thanks @pgaref for your comments and review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 633984)
Time Spent: 5h 10m  (was: 5h)

> PTF: Improve ValueBoundaryScanner
> -
>
> Key: HIVE-25061
> URL: https://issues.apache.org/jira/browse/HIVE-25061
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> -First, I need to check whether TreeMap is really needed for our case.-
> It turned out a binary-ish search approach could help in range calculation, 
> as we're searching in an ordered set of values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25061) PTF: Improve ValueBoundaryScanner

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25061?focusedWorklogId=633983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633983
 ]

ASF GitHub Bot logged work on HIVE-25061:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 07:56
Start Date: 05/Aug/21 07:56
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #2225:
URL: https://github.com/apache/hive/pull/2225


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 633983)
Time Spent: 5h  (was: 4h 50m)

> PTF: Improve ValueBoundaryScanner
> -
>
> Key: HIVE-25061
> URL: https://issues.apache.org/jira/browse/HIVE-25061
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> -First, I need to check whether TreeMap is really needed for our case.-
> It turned out a binary-ish search approach could help in range calculation, 
> as we're searching in an ordered set of values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25422) Break up TestHiveIcebergStorageHandlerWithEngine test

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25422?focusedWorklogId=633982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633982
 ]

ASF GitHub Bot logged work on HIVE-25422:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 07:54
Start Date: 05/Aug/21 07:54
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2558:
URL: https://github.com/apache/hive/pull/2558#discussion_r683216671



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergTypes.java
##
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.mr.TestHelper;
+import org.apache.iceberg.types.Types;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+public class TestHiveIcebergTypes extends 
TestHiveIcebergStorageHandlerWithEngine {

Review comment:
   Maybe add a few lines of comments describing what tests we expect, for most 
of the new classes?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 633982)
Time Spent: 40m  (was: 0.5h)

> Break up TestHiveIcebergStorageHandlerWithEngine test
> -
>
> Key: HIVE-25422
> URL: https://issues.apache.org/jira/browse/HIVE-25422
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> TestHiveIcebergStorageHandlerWithEngine tests the Iceberg-Hive integration by 
> running queries against Iceberg-backed tables. It is a parameterized test, so 
> each file format, each table type, and vectorization on/off scenarios are 
> covered, while it also exercises different functionalities.
> This ticket will track the effort to break this test class into smaller 
> chunks, as in the recent past we have observed cases where this test couldn't 
> even be executed due to memory/process problems.
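
For illustration, a hypothetical JUnit 4 sketch of how such a parameter matrix
multiplies; the parameter names and values below are illustrative, not the ones
used by the actual class:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameter;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class ParameterMatrixSketch {

  // Every @Test method in the class runs once per parameter combination,
  // which is why one large test class becomes expensive to execute.
  @Parameters(name = "format={0}, vectorized={1}")
  public static Collection<Object[]> parameters() {
    List<Object[]> params = new ArrayList<>();
    for (String format : new String[] {"orc", "parquet", "avro"}) {
      for (boolean vectorized : new boolean[] {true, false}) {
        params.add(new Object[] {format, vectorized});
      }
    }
    return params; // 3 formats x 2 vectorization modes = 6 runs per test
  }

  @Parameter(0)
  public String format;

  @Parameter(1)
  public boolean vectorized;

  @Test
  public void testEachCombinationRuns() {
    Assert.assertNotNull(format);
  }
}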



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25422) Break up TestHiveIcebergStorageHandlerWithEngine test

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25422?focusedWorklogId=633981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633981
 ]

ASF GitHub Bot logged work on HIVE-25422:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 07:54
Start Date: 05/Aug/21 07:54
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2558:
URL: https://github.com/apache/hive/pull/2558#discussion_r683216671



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergTypes.java
##
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.mr.TestHelper;
+import org.apache.iceberg.types.Types;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+public class TestHiveIcebergTypes extends 
TestHiveIcebergStorageHandlerWithEngine {

Review comment:
   Maybe add a few lines of comments describing what tests we expect here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 633981)
Time Spent: 0.5h  (was: 20m)

> Break up TestHiveIcebergStorageHandlerWithEngine test
> -
>
> Key: HIVE-25422
> URL: https://issues.apache.org/jira/browse/HIVE-25422
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> TestHiveIcebergStorageHandlerWithEngine tests the Iceberg-Hive integration by 
> running queries against Iceberg-backed tables. It is a parameterized test, so 
> each file format, each table type, and vectorization on/off scenarios are 
> covered, while it also exercises different functionalities.
> This ticket will track the effort to break this test class into smaller 
> chunks, as in the recent past we have observed cases where this test couldn't 
> even be executed due to memory/process problems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25422) Break up TestHiveIcebergStorageHandlerWithEngine test

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25422?focusedWorklogId=633980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633980
 ]

ASF GitHub Bot logged work on HIVE-25422:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 07:53
Start Date: 05/Aug/21 07:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2558:
URL: https://github.com/apache/hive/pull/2558#discussion_r683216106



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -20,116 +20,81 @@
 package org.apache.iceberg.mr.hive;
 
 import java.io.IOException;
-import java.math.BigDecimal;
-import java.text.SimpleDateFormat;
-import java.time.LocalDate;
-import java.time.LocalDateTime;
-import java.time.OffsetDateTime;
-import java.time.ZoneOffset;
 import java.util.ArrayList;
-import java.util.Arrays;
 import java.util.Collection;
-import java.util.Comparator;
-import java.util.Date;
 import java.util.HashMap;
 import java.util.List;
-import java.util.Locale;
 import java.util.Map;
-import java.util.Properties;
 import java.util.concurrent.TimeUnit;
-import java.util.stream.Collectors;
-import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hive.common.StatsSetupConst;
 import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
-import org.apache.hadoop.hive.metastore.partition.spec.PartitionSpecProxy;
 import org.apache.hadoop.hive.ql.exec.mr.ExecMapper;
-import org.apache.iceberg.AssertHelpers;
 import org.apache.iceberg.FileFormat;
-import org.apache.iceberg.HistoryEntry;
-import org.apache.iceberg.PartitionKey;
-import org.apache.iceberg.PartitionSpec;
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.SnapshotSummary;
 import org.apache.iceberg.Table;
-import org.apache.iceberg.TableProperties;
-import org.apache.iceberg.catalog.TableIdentifier;
-import org.apache.iceberg.data.GenericRecord;
 import org.apache.iceberg.data.Record;
-import org.apache.iceberg.exceptions.NoSuchTableException;
-import org.apache.iceberg.hive.HiveSchemaUtil;
 import org.apache.iceberg.hive.MetastoreUtil;
-import org.apache.iceberg.mr.Catalogs;
-import org.apache.iceberg.mr.InputFormatConfig;
 import org.apache.iceberg.mr.TestHelper;
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
-import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.types.Type;
 import org.apache.iceberg.types.Types;
 import org.apache.thrift.TException;
 import org.junit.After;
 import org.junit.AfterClass;
 import org.junit.Assert;
-import org.junit.Assume;
 import org.junit.Before;
 import org.junit.BeforeClass;
 import org.junit.Rule;
-import org.junit.Test;
 import org.junit.rules.TemporaryFolder;
 import org.junit.rules.Timeout;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
-import org.mockito.ArgumentMatchers;
-import org.mockito.MockedStatic;
-import org.mockito.Mockito;
 
 import static org.apache.iceberg.types.Types.NestedField.optional;
 import static org.apache.iceberg.types.Types.NestedField.required;
 import static org.junit.runners.Parameterized.Parameter;
 import static org.junit.runners.Parameterized.Parameters;
 
-
 @RunWith(Parameterized.class)
-public class TestHiveIcebergStorageHandlerWithEngine {
+public abstract class TestHiveIcebergStorageHandlerWithEngine {

Review comment:
   Should we rename this to `HiveIcebergStorageHandlerWithEngineBase` or 
something, so it does not start with Test?
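
For illustration, a minimal sketch of the suggested convention: an abstract
parameterized base whose name avoids the Test prefix, with split-out test
classes extending it; the names are illustrative, not the actual Hive classes:

import java.util.Arrays;
import java.util.Collection;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameter;
import org.junit.runners.Parameterized.Parameters;

// Base class named without the "Test" prefix so the build's TestXxx
// pattern does not try to execute the abstract class itself.
@RunWith(Parameterized.class)
public abstract class HiveIcebergStorageHandlerWithEngineBase {

  // Subclasses inherit the parameter matrix: JUnit resolves @Parameters
  // and @Parameter members through the superclass chain.
  @Parameters(name = "fileFormat={0}")
  public static Collection<Object[]> parameters() {
    return Arrays.asList(new Object[][] {{"orc"}, {"parquet"}, {"avro"}});
  }

  @Parameter
  public String fileFormat;
}

// In its own file: a split-out test class covering one functional area.
// public class TestHiveIcebergTypes extends HiveIcebergStorageHandlerWithEngineBase {
//   @org.junit.Test
//   public void testTypes() { org.junit.Assert.assertNotNull(fileFormat); }
// }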




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 633980)
Time Spent: 20m  (was: 10m)

> Break up TestHiveIcebergStorageHandlerWithEngine test
> -
>
> Key: HIVE-25422
> URL: https://issues.apache.org/jira/browse/HIVE-25422
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TestHiveIcebergStorageHandlerWithEngine tests the Iceberg-Hive integration by 
> running queries against Iceberg-backed tables. It is a parameterized test, so 
> each file format, each table type, and vectorization on/off scenarios are 
> covered, while it also exercises different functionalities.
> This ticket will track the effort to break this test class into smaller 
> chunks, as in the recent past we have observed cases where this test couldn't 
> even be executed due to memory/process problems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (HIVE-25048) Refine the start/end functions in HMSHandler

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25048?focusedWorklogId=633955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633955
 ]

ASF GitHub Bot logged work on HIVE-25048:
-

Author: ASF GitHub Bot
Created on: 05/Aug/21 06:10
Start Date: 05/Aug/21 06:10
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2441:
URL: https://github.com/apache/hive/pull/2441#discussion_r683080884



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java
##
@@ -61,6 +66,7 @@ public Result(Object result, int numRetries) {
 
   private final Configuration origConf;// base configuration
   private final Configuration activeConf;  // active configuration
+  private final List<MetaStoreEndFunctionListener> endFunctionListeners; // the end function listeners

Review comment:
   Thank you for the comments!
   The `startFunctions` could be eliminated for the reasons listed above, 
leaving only the `MetaStoreEndFunctionListener` on `endFunction` to be taken 
care of. Introducing a `MetaStoreFunctionListener` would make things much more 
straightforward, but we can already get the context of the current method from 
`MetaStoreEndFunctionContext`, which makes an extra start-function listener 
unnecessary.
   This change also introduces some incompatibility in 
[MetaStoreEndFunctionContext](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreEndFunctionContext.java#L55-L57):
 `getInputTableName` would return null where the old implementation populated it.
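
For illustration, a minimal sketch of a custom end-function listener; the
MetaStoreEndFunctionListener and MetaStoreEndFunctionContext signatures are
assumed from current master and should be verified against the branch in
question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.MetaStoreEndFunctionContext;
import org.apache.hadoop.hive.metastore.MetaStoreEndFunctionListener;

// Sketch of a listener that logs failed metastore calls; it would be
// registered through the hive.metastore.end.function.listeners property.
public class LoggingEndFunctionListener extends MetaStoreEndFunctionListener {

  public LoggingEndFunctionListener(Configuration config) {
    super(config);
  }

  @Override
  public void onEndFunction(String functionName, MetaStoreEndFunctionContext context) {
    if (!context.isSuccess()) {
      // getInputTableName() may be null, e.g. for non-table operations or,
      // per the discussion above, under the relocated listener invocation.
      System.err.println("Metastore call failed: " + functionName
          + ", table=" + context.getInputTableName());
    }
  }
}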
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 633955)
Time Spent: 3h 20m  (was: 3h 10m)

> Refine the start/end functions in HMSHandler
> 
>
> Key: HIVE-25048
> URL: https://issues.apache.org/jira/browse/HIVE-25048
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Some start/end functions are incomplete or wrong in the HMSHandler; these 
> functions audit actions, monitor performance, and notify the end-function 
> listeners. We already measure the performance of the HMSHandler in the 
> PerfLogger, which covers more methods than these functions do, so we can 
> remove the monitoring from the start/end functions and move the end-function 
> listeners to the RetryingHMSHandler, eliminating the try-finally blocks 
> spread across many different methods. After that, we can clean up the 
> functions to simplify the HMSHandler.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)