[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=678934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-678934
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 09/Nov/21 08:13
Start Date: 09/Nov/21 08:13
Worklog Time Spent: 10m 
  Work Description: pkumarsinha merged pull request #2724:
URL: https://github.com/apache/hive/pull/2724


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 678934)
Time Spent: 6h  (was: 5h 50m)

> Compress Hive Replication Metrics while storing
> ---
>
> Key: HIVE-25596
> URL: https://issues.apache.org/jira/browse/HIVE-25596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
> Attachments: CompressedRM_Progress(k=10), CompressedRM_Progress(k=5), 
> PlainTextRM_Progress(k=10), PlainTextRM_Progress(k=5)
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Compress the JSON fields of the sys.replication_metrics table to optimise RDBMS space usage.
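
For context, the gzip-plus-base64 scheme this ticket applies to the stored JSON can be sketched as below. This is a minimal illustration, not Hive's actual GzipJSONMessageEncoder; the class name and payload are made up:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipJsonCodec {
    // Gzip the JSON text, then base64-encode it so the result is plain ASCII
    // and can live in a varchar column such as RM_PROGRESS.
    static String serialize(String json) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Reverse the steps: base64-decode, then gunzip back to the JSON text.
    static String deserialize(String encoded) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"status\":\"SUCCESS\",\"stages\":[]}";
        System.out.println(deserialize(serialize(json)).equals(json)); // prints true
    }
}
```

The deserialize UDF reviewed later in this thread performs the second half of this round trip on query results.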



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=678350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-678350
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 08/Nov/21 06:57
Start Date: 08/Nov/21 06:57
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r76628



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/ReplicationMetricCollector.java
##
@@ -105,6 +110,21 @@ public void reportFailoverStart(String stageName, Map metricMap,
     }
   }
 
+  private void updateRMProgressIfLimitExceeds(Progress progress, Stage stage) throws SemanticException {
+    MessageSerializer serializer = MessageFactory.getDefaultInstanceForReplMetrics(conf).getSerializer();
+    ObjectMapper mapper = new ObjectMapper();
+    String serializedProgress = null;
+    try {
+      serializedProgress = serializer.serialize(mapper.writeValueAsString(progress));
+    } catch (Exception e) {
+      throw new SemanticException(e);
+    }
+    if (serializedProgress.length() > ReplStatsTracker.RM_PROGRESS_LENGTH) {
+      stage.setReplStats("Error: RM_PROGRESS limit exceeded to " + serializedProgress.length());

Review comment:
   Also, please add the dropped progress as part of the log.
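
A sketch of the guard plus the logging requested above might look like the following. The RM_PROGRESS_LENGTH value and the class names here are illustrative stand-ins, not Hive's actual ones:

```java
import java.util.logging.Logger;

public class ProgressGuard {
    // Stand-in for ReplStatsTracker.RM_PROGRESS_LENGTH; the real value is the
    // RM_PROGRESS column width configured in the metastore schema.
    static final int RM_PROGRESS_LENGTH = 24000;
    private static final Logger LOG = Logger.getLogger(ProgressGuard.class.getName());

    // Returns the error marker to store instead of the oversized progress,
    // or null when the serialized progress fits the column. The dropped
    // payload is logged so it is not lost entirely.
    static String checkLimit(String serializedProgress) {
        if (serializedProgress.length() > RM_PROGRESS_LENGTH) {
            LOG.warning("Dropping RM_PROGRESS of size " + serializedProgress.length()
                    + "; dropped content: " + serializedProgress);
            return "Error: RM_PROGRESS limit exceeded to " + serializedProgress.length();
        }
        return null;
    }
}
```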






Issue Time Tracking
---

Worklog Id: (was: 678350)
Time Spent: 5h 50m  (was: 5h 40m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=678347&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-678347
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 08/Nov/21 06:47
Start Date: 08/Nov/21 06:47
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r744418558



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFDeserialize.java
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.messaging.MessageEncoder;
+import org.apache.hadoop.hive.metastore.messaging.MessageFactory;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.io.Text;
+import org.junit.Test;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+
+/**
+ * TestGenericUDFDeserialize.

Review comment:
   Add description

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -696,7 +696,8 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
         + "attempted using the snapshot based approach. If disabled, the replication will fail in case the target is "
         + "modified."),
     REPL_STATS_TOP_EVENTS_COUNTS("hive.repl.stats.events.count", 5,
-        "Number of top costliest events that needs to maintained per event type for the replication statistics."),
+        "Number of top costliest events that needs to maintained per event type for the replication statistics." +
+        " Maximum permissible limit is 10."),

Review comment:
   fix typo

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/ReplicationMetricCollector.java
##
@@ -105,6 +110,21 @@ public void reportFailoverStart(String stageName, Map metricMap,
     }
   }
 
+  private void updateRMProgressIfLimitExceeds(Progress progress, Stage stage) throws SemanticException {
+    MessageSerializer serializer = MessageFactory.getDefaultInstanceForReplMetrics(conf).getSerializer();
+    ObjectMapper mapper = new ObjectMapper();
+    String serializedProgress = null;
+    try {
+      serializedProgress = serializer.serialize(mapper.writeValueAsString(progress));
+    } catch (Exception e) {
+      throw new SemanticException(e);
+    }
+    if (serializedProgress.length() > ReplStatsTracker.RM_PROGRESS_LENGTH) {
+      stage.setReplStats("Error: RM_PROGRESS limit exceeded to " + serializedProgress.length());

Review comment:
   Add a log line too 

##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFDeserialize.java
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+

[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=676240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-676240
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 04/Nov/21 07:27
Start Date: 04/Nov/21 07:27
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r742588986



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/event/Stage.java
##
@@ -129,12 +127,7 @@ public String getReplStats() {
   }
 
   public void setReplStats(String replStats) {
-// Check the stat string doesn't surpass the RM_PROGRESS column length.
-if (replStats.length() >= RM_PROGRESS_LENGTH - 2000) {

Review comment:
   2k chars are consumed by variables other than replStats.






Issue Time Tracking
---

Worklog Id: (was: 676240)
Time Spent: 5.5h  (was: 5h 20m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=676239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-676239
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 04/Nov/21 07:26
Start Date: 04/Nov/21 07:26
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r742588517



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/event/Stage.java
##
@@ -129,12 +127,7 @@ public String getReplStats() {
   }
 
   public void setReplStats(String replStats) {
-// Check the stat string doesn't surpass the RM_PROGRESS column length.
-if (replStats.length() >= RM_PROGRESS_LENGTH - 2000) {

Review comment:
   why was it 2k less than RM_PROGRESS_LENGTH?






Issue Time Tracking
---

Worklog Id: (was: 676239)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=676237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-676237
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 04/Nov/21 07:25
Start Date: 04/Nov/21 07:25
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r742587970



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/ReplicationMetricCollector.java
##
@@ -116,14 +136,16 @@ public void reportStageEnd(String stageName, Status status, long lastReplId,
     stage = new Stage(stageName, status, -1L);
   }
   stage.setStatus(status);
-  stage.setEndTime(System.currentTimeMillis());
+  stage.setEndTime(getCurrentTimeInMillis());
   stage.setReplSnapshotsCount(replSnapshotCount);
   if (replStatsTracker != null && !(replStatsTracker instanceof NoOpReplStatsTracker)) {
     String replStatString = replStatsTracker.toString();
     LOG.info("Replication Statistics are: {}", replStatString);
     stage.setReplStats(replStatString);
   }
   progress.addStage(stage);
+  // Check the progress string doesn't surpass the RM_PROGRESS column width.
+  checkRMProgressLimit(progress, stage);

Review comment:
   nit: it's not just a check; it updates the state as well. We should name it accordingly.






Issue Time Tracking
---

Worklog Id: (was: 676237)
Time Spent: 5h 10m  (was: 5h)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=676235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-676235
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 04/Nov/21 07:23
Start Date: 04/Nov/21 07:23
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r742587136



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/ReplicationMetricCollector.java
##
@@ -105,6 +110,21 @@ public void reportFailoverStart(String stageName, Map metricMap,
     }
   }
 
+  private void checkRMProgressLimit(Progress progress, Stage stage) throws SemanticException {
+    MessageSerializer serializer = MessageFactory.getDefaultInstanceForReplMetrics(conf).getSerializer();
+    ObjectMapper mapper = new ObjectMapper();
+    String serializedProgress = null;
+    try {
+      serializedProgress = serializer.serialize(mapper.writeValueAsString(progress));
+    } catch (Exception e) {
+      throw new SemanticException(e);
+    }
+    if (serializedProgress.length() > ReplStatsTracker.RM_PROGRESS_LENGTH) {
+      stage.setReplStats("RM_PROGRESS LIMIT EXCEEDED TO " + serializedProgress.length());

Review comment:
   Add a keyword prefix like "ERROR" or "WARN".






Issue Time Tracking
---

Worklog Id: (was: 676235)
Time Spent: 5h  (was: 4h 50m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=675955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675955
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 04/Nov/21 01:44
Start Date: 04/Nov/21 01:44
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741594555



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/MetricSink.java
##
@@ -116,14 +117,17 @@ public void run() {
   int totalMetricsSize = metrics.size();
   List replicationMetricsList = new ArrayList<>(totalMetricsSize);
   ObjectMapper mapper = new ObjectMapper();
+  MessageEncoder encoder = MessageFactory.getDefaultInstanceForReplMetrics(conf);
+  MessageSerializer serializer = encoder.getSerializer();
   for (int index = 0; index < totalMetricsSize; index++) {
     ReplicationMetric metric = metrics.removeFirst();
     ReplicationMetrics persistentMetric = new ReplicationMetrics();
     persistentMetric.setDumpExecutionId(metric.getDumpExecutionId());
     persistentMetric.setScheduledExecutionId(metric.getScheduledExecutionId());
     persistentMetric.setPolicy(metric.getPolicy());
-    persistentMetric.setProgress(mapper.writeValueAsString(metric.getProgress()));
-    persistentMetric.setMetadata(mapper.writeValueAsString(metric.getMetadata()));
+    persistentMetric.setProgress(serializer.serialize(mapper.writeValueAsString(metric.getProgress())));
+    persistentMetric.setMetadata(serializer.serialize(mapper.writeValueAsString(metric.getMetadata())));

Review comment:
   How does this justify the need to compress the metadata field in that case? I think we should focus on the worst-case size and then see the change post compression. That way we can decide on:
   a) whether we really need compression for the metadata column
   b) if so, how large the column should be.
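
The suggested measurement could be sketched like this: compare a worst-case payload's plain length with its gzip-plus-base64 length before deciding on the column width. The payload below is made up; real numbers would come from dumps such as the attached RM_Progress files:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

public class SizeProbe {
    // Length of the gzip-compressed, base64-encoded form of the JSON text,
    // i.e. what would actually be stored in the varchar column.
    static int compressedLength(String json) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray()).length();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical worst-case metadata payload: many repetitive event entries.
        StringBuilder sb = new StringBuilder("{\"events\":[");
        for (int i = 0; i < 500; i++) {
            sb.append("{\"id\":").append(i).append(",\"type\":\"EVENT_ADD_PARTITION\"},");
        }
        sb.setLength(sb.length() - 1);
        String json = sb.append("]}").toString();
        System.out.println("plain=" + json.length() + " gzip+b64=" + compressedLength(json));
    }
}
```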

##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFDeserialize.java
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.messaging.MessageEncoder;
+import org.apache.hadoop.hive.metastore.messaging.MessageFactory;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.io.Text;
+import org.junit.Test;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertEquals;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+
+/**
+ * TestGenericUDFGzipJsonDeserialize.
+ */
+public class TestGenericUDFDeserialize {
+
+    @Test
+    public void testOneArg() throws HiveException {
+        GenericUDFDeserialize udf = new GenericUDFDeserialize();
+        ObjectInspector valueOI1 = PrimitiveObjectInspectorFactory.writableStringObjectInspector;
+        ObjectInspector valueOI2 = PrimitiveObjectInspectorFactory.writableStringObjectInspector;
+        UDFArgumentException ex = null;
+        try {
+            udf.initialize(new ObjectInspector[]{valueOI1});
+        } catch (UDFArgumentException e) {
+            ex = e;
+        }
+        assertNotNull("The function deserialize() accepts 2 arguments.", ex);
+        ex = null;
+        try {
+            udf.initialize(new ObjectInspector[]{valueOI2, valueOI1});
+        } catch (UDFArgumentException e) {
+            ex = e;
+        }
+        assertNull("The function deserialize() accepts 2 arguments.", ex);
+    }
+
+    @Test
+    public void testGZIPJsonDeserializeString() throws HiveException {
+        GenericUDFDeserialize udf = new GenericUDFDeser

[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=675792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675792
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 04/Nov/21 01:28
Start Date: 04/Nov/21 01:28
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741578189



##
File path: 
standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql
##
@@ -51,6 +51,10 @@ CREATE TABLE "REPLICATION_METRICS" (
 --Increase the size of RM_PROGRESS to accomodate the replication statistics
 ALTER TABLE "REPLICATION_METRICS" ALTER "RM_PROGRESS" TYPE varchar(24000);
 
+ALTER TABLE "REPLICATION_METRICS" ALTER "RM_PROGRESS" TYPE varchar(1);

Review comment:
   This is tested as part of ITestPostgres.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDeserialize.java
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.metastore.messaging.json.JSONMessageEncoder;
+import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
+
+/**
+ * GenericUDFDeserializeString.
+ *
+ */
+@Description(name = "deserialize",
+    value = "_FUNC_(message, encodingFormat) - Returns deserialized string of encoded message.",
+    extended = "Example:\n"
+        + "  > SELECT _FUNC_('H4sI/ytJLS4BAAx+f9gE', 'gzip(json-2.0)') FROM src LIMIT 1;\n"
+        + "  test")
+public class GenericUDFDeserialize extends GenericUDF {
+
+private static final int ARG_COUNT = 2; // Number of arguments to this UDF
+private static final String FUNC_NAME = "deserialize"; // External Name
+
+private transient PrimitiveObjectInspector stringOI = null;
+private transient PrimitiveObjectInspector encodingFormat = null;
+
+    @Override
+    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
+        if (arguments.length != ARG_COUNT) {
+            throw new UDFArgumentException("The function " + FUNC_NAME + " accepts " + ARG_COUNT + " arguments.");
+        }
+        for (ObjectInspector arg : arguments) {
+            if (arg.getCategory() != ObjectInspector.Category.PRIMITIVE ||
+                    PrimitiveObjectInspectorUtils.PrimitiveGrouping.STRING_GROUP != PrimitiveObjectInspectorUtils.getPrimitiveGrouping(
+                            ((PrimitiveObjectInspector) arg).getPrimitiveCategory())) {
+                throw new UDFArgumentTypeException(0, "The arguments to " + FUNC_NAME + " must be a string/varchar");
+            }
+        }
+        stringOI = (PrimitiveObjectInspector) arguments[0];
+        encodingFormat = (PrimitiveObjectInspector) arguments[1];
+        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
+    }
+
+    @Override
+    public Object evaluate(DeferredObject[] arguments) throws HiveException {
+        String value = PrimitiveObjectInspectorUtils.getString(arguments[0].get(), stringOI);
+        String messageFormat = PrimitiveObjectInspectorUtils.getString(arguments[1].get(), encodingFormat);
+        if (value == null) {
+            return null;
+        } else if (messageFormat == null || messageFormat.isEmpty() || JSONMessageEncoder.FORMAT.equalsIgnoreCase(value))
[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=675609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675609
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 04/Nov/21 01:07
Start Date: 04/Nov/21 01:07
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741567981



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/ReplicationMetricCollector.java
##
@@ -58,14 +59,15 @@ public void setMetricsMBean(ObjectName metricsMBean) {
 
   public ReplicationMetricCollector(String dbName, Metadata.ReplicationType replicationType,
                  String stagingDir, long dumpExecutionId, HiveConf conf) {
+    this.conf = conf;
     checkEnabledForTests(conf);
     String policy = conf.get(Constants.SCHEDULED_QUERY_SCHEDULENAME);
     long executionId = conf.getLong(Constants.SCHEDULED_QUERY_EXECUTIONID, 0L);
     if (!StringUtils.isEmpty(policy) && executionId > 0) {
       isEnabled = true;
       metricCollector = MetricCollector.getInstance().init(conf);
       MetricSink.getInstance().init(conf);
-      Metadata metadata = new Metadata(dbName, replicationType, stagingDir);
+      Metadata metadata = new Metadata(dbName, replicationType, testingModeEnabled() ? "dummyDir" :stagingDir);

Review comment:
   nit: shift this staging logic calculation to a separate method

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDeserialize.java
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.metastore.messaging.json.JSONMessageEncoder;
+import 
org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
+
+/**
+ * GenericUDFDeserializeString.
+ *
+ */
+@Description(name = "deserialize",
+    value = "_FUNC_(message, encodingFormat) - Returns deserialized string of encoded message.",
+    extended = "Example:\n"
+        + "  > SELECT _FUNC_('H4sI/ytJLS4BAAx+f9gE', 'gzip(json-2.0)') FROM src LIMIT 1;\n"

Review comment:
   The base64 encoding is missed in the description, i.e. even though the passed parameter is "gzip(json-2.0)", it won't work unless the passed content is base64 encoded.
   Moreover, the "json" part of "gzip(json-2.0)" is confusing at the UDF level: even if the underlying string is not JSON, the UDF works just fine, so mandating the user to pass "json" when the content isn't JSON doesn't sit well. At the UDF level, wouldn't just "gzip" have been fine?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDeserialize.java
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND

[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=675592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675592
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 04/Nov/21 01:05
Start Date: 04/Nov/21 01:05
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741578189



##
File path: standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql
##
@@ -51,6 +51,10 @@ CREATE TABLE "REPLICATION_METRICS" (
 --Increase the size of RM_PROGRESS to accomodate the replication statistics
 ALTER TABLE "REPLICATION_METRICS" ALTER "RM_PROGRESS" TYPE varchar(24000);
 
+ALTER TABLE "REPLICATION_METRICS" ALTER "RM_PROGRESS" TYPE varchar(1);

Review comment:
   This is tested as part of ITestPostgres.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDeserialize.java
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.metastore.messaging.json.JSONMessageEncoder;
+import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
+
+/**
+ * GenericUDFDeserializeString.
+ *
+ */
+@Description(name = "deserialize",
+value="_FUNC_(message, encodingFormat) - Returns deserialized string of encoded message.",
+extended="Example:\n"
++ "  > SELECT _FUNC_('H4sI/ytJLS4BAAx+f9gE', 'gzip(json-2.0)') FROM src LIMIT 1;\n"
++ "  test")
+public class GenericUDFDeserialize extends GenericUDF {
+
+private static final int ARG_COUNT = 2; // Number of arguments to this UDF
+private static final String FUNC_NAME = "deserialize"; // External Name
+
+private transient PrimitiveObjectInspector stringOI = null;
+private transient PrimitiveObjectInspector encodingFormat = null;
+
+@Override
+public ObjectInspector initialize(ObjectInspector[] arguments)
+throws UDFArgumentException {
+if (arguments.length != ARG_COUNT) {
+throw new UDFArgumentException("The function " + FUNC_NAME + " accepts " + ARG_COUNT + " arguments.");
+}
+for (ObjectInspector arg: arguments) {
+if (arg.getCategory() != ObjectInspector.Category.PRIMITIVE ||
+PrimitiveObjectInspectorUtils.PrimitiveGrouping.STRING_GROUP != PrimitiveObjectInspectorUtils.getPrimitiveGrouping(
+((PrimitiveObjectInspector)arg).getPrimitiveCategory())){
+throw new UDFArgumentTypeException(0, "The arguments to " + FUNC_NAME + " must be a string/varchar");
+}
+}
+stringOI = (PrimitiveObjectInspector) arguments[0];
+encodingFormat = (PrimitiveObjectInspector) arguments[1];
+return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
+}
+
+@Override
+public Object evaluate(DeferredObject[] arguments) throws HiveException {
+String value = PrimitiveObjectInspectorUtils.getString(arguments[0].get(), stringOI);
+String messageFormat = PrimitiveObjectInspectorUtils.getString(arguments[1].get(), encodingFormat);
+if (value == null) {
+return null;
+} else if (messageFormat == null || messageFormat.isEmpty() || JSONMessageEncoder.FORMAT.equalsIgnoreCase(value))

[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=675098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675098
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 04/Nov/21 00:11
Start Date: 04/Nov/21 00:11
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741567981



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/ReplicationMetricCollector.java
##
@@ -58,14 +59,15 @@ public void setMetricsMBean(ObjectName metricsMBean) {
 
+  public ReplicationMetricCollector(String dbName, Metadata.ReplicationType replicationType,
+   String stagingDir, long dumpExecutionId, HiveConf conf) {
+this.conf = conf;
 checkEnabledForTests(conf);
 String policy = conf.get(Constants.SCHEDULED_QUERY_SCHEDULENAME);
 long executionId = conf.getLong(Constants.SCHEDULED_QUERY_EXECUTIONID, 0L);
 if (!StringUtils.isEmpty(policy) && executionId > 0) {
   isEnabled = true;
   metricCollector = MetricCollector.getInstance().init(conf);
   MetricSink.getInstance().init(conf);
-  Metadata metadata = new Metadata(dbName, replicationType, stagingDir);
+  Metadata metadata = new Metadata(dbName, replicationType, testingModeEnabled() ? "dummyDir" :stagingDir);

Review comment:
   nit: shift this staging logic calculation to a separate method
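A minimal sketch of the refactor this nit asks for, pulling the ternary into a named method. Everything here is hypothetical stand-in code (the real collector's `testingModeEnabled()` check and surrounding class are not reproduced):

```java
public class StagingDirResolver {
    // Hypothetical stand-in for the collector's testing-mode flag.
    private final boolean testingModeEnabled;

    public StagingDirResolver(boolean testingModeEnabled) {
        this.testingModeEnabled = testingModeEnabled;
    }

    // The staging-dir decision, extracted into a method with a descriptive name
    // instead of an inline ternary in the constructor.
    public String resolveStagingDir(String stagingDir) {
        return testingModeEnabled ? "dummyDir" : stagingDir;
    }

    public static void main(String[] args) {
        System.out.println(new StagingDirResolver(true).resolveStagingDir("/staging/repl"));  // prints: dummyDir
        System.out.println(new StagingDirResolver(false).resolveStagingDir("/staging/repl")); // prints: /staging/repl
    }
}
```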

##
File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDeserialize.java
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.metastore.messaging.json.JSONMessageEncoder;
+import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
+
+/**
+ * GenericUDFDeserializeString.
+ *
+ */
+@Description(name = "deserialize",
+value="_FUNC_(message, encodingFormat) - Returns deserialized string of encoded message.",
+extended="Example:\n"
++ "  > SELECT _FUNC_('H4sI/ytJLS4BAAx+f9gE', 'gzip(json-2.0)') FROM src LIMIT 1;\n"

Review comment:
   The base64 encoding is missed out in the description here, i.e. even though the passed parameter is "gzip(json-2.0)", it won't work unless the passed content is also base64 encoded.
   Moreover, the json part in "gzip(json-2.0)" would be confusing at the UDF level, since the UDF will work just fine even if the underlying string is not JSON. Mandating that the user pass json even when it is not doesn't sit well; for a UDF, wouldn't just gzip have been fine?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDeserialize.java
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND

[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=675079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-675079
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 04/Nov/21 00:08
Start Date: 04/Nov/21 00:08
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741578189




[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=674520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674520
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 03/Nov/21 10:42
Start Date: 03/Nov/21 10:42
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741814503



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplStatsTracker.java
##
@@ -36,7 +36,7 @@
 public class ReplStatsTracker {
 
  // Maintains the length of the RM_Progress column in the RDBMS, which stores the ReplStats
-  public static int RM_PROGRESS_LENGTH = 24000;
+  public static int RM_PROGRESS_LENGTH = 1;

Review comment:
   Attached the sample outputs and size comparisons in the comments section.






Issue Time Tracking
---

Worklog Id: (was: 674520)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=674365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674365
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 03/Nov/21 03:29
Start Date: 03/Nov/21 03:29
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741597141



##
File path: ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFDeserialize.java
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.messaging.MessageEncoder;
+import org.apache.hadoop.hive.metastore.messaging.MessageFactory;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.io.Text;
+import org.junit.Test;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertEquals;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+
+/**
+ * TestGenericUDFGzipJsonDeserialize.
+ */
+public class TestGenericUDFDeserialize {
+
+@Test
+public void testOneArg() throws HiveException {
+GenericUDFDeserialize udf = new GenericUDFDeserialize();
+ObjectInspector valueOI1 = PrimitiveObjectInspectorFactory.writableStringObjectInspector;
+ObjectInspector valueOI2 = PrimitiveObjectInspectorFactory.writableStringObjectInspector;
+UDFArgumentException ex = null;
+try {
+udf.initialize(new ObjectInspector[]{valueOI1});
+} catch (UDFArgumentException e) {
+ex = e;
+}
+assertNotNull("The function deserialize() accepts 2 argument.", ex);
+ex = null;
+try {
+udf.initialize(new ObjectInspector[]{valueOI2, valueOI1});
+} catch (UDFArgumentException e) {
+ex = e;
+}
+assertNull("The function deserialize() accepts 2 argument.", ex);
+}
+
+@Test
+public void testGZIPJsonDeserializeString() throws HiveException {
+GenericUDFDeserialize udf = new GenericUDFDeserialize();
+udf.initialize(new ObjectInspector[]{PrimitiveObjectInspectorFactory.writableStringObjectInspector,
+PrimitiveObjectInspectorFactory.writableStringObjectInspector});
+GenericUDF.DeferredObject[] args = new GenericUDF.DeferredObject[2];
+String expectedOutput = "test";
+MessageEncoder encoder = MessageFactory.getDefaultInstanceForReplMetrics(new HiveConf());
+String serializedMsg = encoder.getSerializer().serialize(expectedOutput);
+args[0] = new GenericUDF.DeferredJavaObject(new Text(serializedMsg));
+args[1] = new GenericUDF.DeferredJavaObject(new Text(encoder.getMessageFormat()));
+Object actualOutput = udf.evaluate(args).toString();
+assertEquals("deserialize() test", expectedOutput, actualOutput != null ? actualOutput : null);
+}
+
+@Test
+public void testInvalidMessageString() throws HiveException {
+GenericUDFDeserialize udf = new GenericUDFDeserialize();
+udf.initialize(new ObjectInspector[]{PrimitiveObjectInspectorFactory.writableStringObjectInspector,

Review comment:
   nit: format






Issue Time Tracking
---

Worklog Id: (was: 674365)
Time Spent: 3h 40m  (was: 3.5h)


[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=674364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674364
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 03/Nov/21 03:28
Start Date: 03/Nov/21 03:28
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741597021



##
File path: ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFDeserialize.java
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.messaging.MessageEncoder;
+import org.apache.hadoop.hive.metastore.messaging.MessageFactory;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.io.Text;
+import org.junit.Test;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertEquals;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+
+/**
+ * TestGenericUDFGzipJsonDeserialize.
+ */
+public class TestGenericUDFDeserialize {
+
+@Test
+public void testOneArg() throws HiveException {
+GenericUDFDeserialize udf = new GenericUDFDeserialize();
+ObjectInspector valueOI1 = PrimitiveObjectInspectorFactory.writableStringObjectInspector;
+ObjectInspector valueOI2 = PrimitiveObjectInspectorFactory.writableStringObjectInspector;
+UDFArgumentException ex = null;
+try {
+udf.initialize(new ObjectInspector[]{valueOI1});
+} catch (UDFArgumentException e) {
+ex = e;
+}
+assertNotNull("The function deserialize() accepts 2 argument.", ex);
+ex = null;
+try {
+udf.initialize(new ObjectInspector[]{valueOI2, valueOI1});
+} catch (UDFArgumentException e) {
+ex = e;
+}
+assertNull("The function deserialize() accepts 2 argument.", ex);
+}
+
+@Test
+public void testGZIPJsonDeserializeString() throws HiveException {
+GenericUDFDeserialize udf = new GenericUDFDeserialize();
+udf.initialize(new ObjectInspector[]{PrimitiveObjectInspectorFactory.writableStringObjectInspector,
+PrimitiveObjectInspectorFactory.writableStringObjectInspector});
+GenericUDF.DeferredObject[] args = new GenericUDF.DeferredObject[2];
+String expectedOutput = "test";
+MessageEncoder encoder = MessageFactory.getDefaultInstanceForReplMetrics(new HiveConf());
+String serializedMsg = encoder.getSerializer().serialize(expectedOutput);
+args[0] = new GenericUDF.DeferredJavaObject(new Text(serializedMsg));
+args[1] = new GenericUDF.DeferredJavaObject(new Text(encoder.getMessageFormat()));
+Object actualOutput = udf.evaluate(args).toString();
+assertEquals("deserialize() test", expectedOutput, actualOutput != null ? actualOutput : null);
+}
+
+@Test
+public void testInvalidMessageString() throws HiveException {
+GenericUDFDeserialize udf = new GenericUDFDeserialize();
+udf.initialize(new ObjectInspector[]{PrimitiveObjectInspectorFactory.writableStringObjectInspector,
+PrimitiveObjectInspectorFactory.writableStringObjectInspector});
+GenericUDF.DeferredObject[] args = new GenericUDF.DeferredObject[2];
+String expectedOutput = "test";
+MessageEncoder encoder = MessageFactory.getDefaultInstanceForReplMetrics(new HiveConf());
+String serializedMsg = encoder.getSerializer().serialize(expectedOutput);
+args[0] = new GenericUDF.DeferredJavaObject(new Text(serializedMsg));
+args[1] = new GenericUDF.DeferredJavaObject(new Text("randomSerialization"));
+   

[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=674358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674358
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 03/Nov/21 03:18
Start Date: 03/Nov/21 03:18
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741594555



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/MetricSink.java
##
@@ -116,14 +117,17 @@ public void run() {
   int totalMetricsSize = metrics.size();
  List<ReplicationMetrics> replicationMetricsList = new ArrayList<>(totalMetricsSize);
   ObjectMapper mapper = new ObjectMapper();
+  MessageEncoder encoder = MessageFactory.getDefaultInstanceForReplMetrics(conf);
+  MessageSerializer serializer = encoder.getSerializer();
   for (int index = 0; index < totalMetricsSize; index++) {
 ReplicationMetric metric = metrics.removeFirst();
 ReplicationMetrics persistentMetric = new ReplicationMetrics();
 persistentMetric.setDumpExecutionId(metric.getDumpExecutionId());
 
persistentMetric.setScheduledExecutionId(metric.getScheduledExecutionId());
 persistentMetric.setPolicy(metric.getPolicy());
-persistentMetric.setProgress(mapper.writeValueAsString(metric.getProgress()));
-persistentMetric.setMetadata(mapper.writeValueAsString(metric.getMetadata()));
+persistentMetric.setProgress(serializer.serialize(mapper.writeValueAsString(metric.getProgress())));
+persistentMetric.setMetadata(serializer.serialize(mapper.writeValueAsString(metric.getMetadata())));

Review comment:
   How does this justify the need to compress the metadata field in that case? I think we should focus on the worst-case size and then see the change post compression. That way we can decide on:
   a) whether we really need compression for the metadata column
   b) if so, how large the column size should be.






Issue Time Tracking
---

Worklog Id: (was: 674358)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=674347&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674347
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 03/Nov/21 02:52
Start Date: 03/Nov/21 02:52
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741586076



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/MetricSink.java
##
@@ -116,14 +117,17 @@ public void run() {
   int totalMetricsSize = metrics.size();
  List<ReplicationMetrics> replicationMetricsList = new ArrayList<>(totalMetricsSize);
   ObjectMapper mapper = new ObjectMapper();
+  MessageEncoder encoder = MessageFactory.getDefaultInstanceForReplMetrics(conf);
+  MessageSerializer serializer = encoder.getSerializer();
   for (int index = 0; index < totalMetricsSize; index++) {
 ReplicationMetric metric = metrics.removeFirst();
 ReplicationMetrics persistentMetric = new ReplicationMetrics();
 persistentMetric.setDumpExecutionId(metric.getDumpExecutionId());
 
persistentMetric.setScheduledExecutionId(metric.getScheduledExecutionId());
 persistentMetric.setPolicy(metric.getPolicy());
-persistentMetric.setProgress(mapper.writeValueAsString(metric.getProgress()));
-persistentMetric.setMetadata(mapper.writeValueAsString(metric.getMetadata()));
+persistentMetric.setProgress(serializer.serialize(mapper.writeValueAsString(metric.getProgress())));
+persistentMetric.setMetadata(serializer.serialize(mapper.writeValueAsString(metric.getMetadata())));

Review comment:
   Tested with one such sample metadata entry.
   The plain text was 234 bytes; with compression, the output string was 209 bytes.
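The modest saving on a short metadata entry is expected: gzip adds a fixed header and trailer of roughly 18 bytes, and base64 re-expands the compressed bytes by 4/3, so compression only pays off once the JSON is long or repetitive. A standalone sketch of that trade-off (this is not the PR's encoder; plain gzip + base64 is assumed here as an approximation of `GzipJSONMessageEncoder`, and the JSON strings are made up):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

public class MetricSizeDemo {
    // gzip the string, then base64-encode it, and report the length that
    // would actually be stored in the varchar column.
    static int storedLength(String json) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray()).length();
    }

    public static void main(String[] args) throws IOException {
        // A short metadata-like JSON barely benefits: fixed gzip overhead
        // plus base64 expansion can even exceed the original size.
        String small = "{\"dbName\":\"sales\",\"replicationType\":\"INCREMENTAL\"}";
        // A long, repetitive progress-like blob compresses very well.
        StringBuilder big = new StringBuilder("{\"stages\":[");
        for (int i = 0; i < 500; i++) {
            big.append("{\"name\":\"REPL_DUMP\",\"status\":\"SUCCESS\"},");
        }
        big.append("]}");
        System.out.println(small.length() + " -> " + storedLength(small));
        System.out.println(big.length() + " -> " + storedLength(big));
    }
}
```

Running this shows the stored form of the short string staying in the same ballpark as the plain text, while the repetitive blob shrinks by orders of magnitude, which supports sizing the column for the worst (incompressible) case rather than the average.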






Issue Time Tracking
---

Worklog Id: (was: 674347)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=674329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674329
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 03/Nov/21 02:34
Start Date: 03/Nov/21 02:34
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741581141



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDeserialize.java
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.metastore.messaging.json.JSONMessageEncoder;
+import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
+
+/**
+ * GenericUDFDeserializeString.
+ *
+ */
+@Description(name = "deserialize",
+value="_FUNC_(message, encodingFormat) - Returns deserialized string of encoded message.",
+extended="Example:\n"
++ "  > SELECT _FUNC_('H4sI/ytJLS4BAAx+f9gE', 'gzip(json-2.0)') FROM src LIMIT 1;\n"
++ "  test")
+public class GenericUDFDeserialize extends GenericUDF {
+
+private static final int ARG_COUNT = 2; // Number of arguments to this UDF
+private static final String FUNC_NAME = "deserialize"; // External Name
+
+private transient PrimitiveObjectInspector stringOI = null;
+private transient PrimitiveObjectInspector encodingFormat = null;
+
+@Override
+public ObjectInspector initialize(ObjectInspector[] arguments)
+throws UDFArgumentException {
+if (arguments.length != ARG_COUNT) {
+throw new UDFArgumentException("The function " + FUNC_NAME + " accepts " + ARG_COUNT + " arguments.");
+}
+for (ObjectInspector arg: arguments) {
+if (arg.getCategory() != ObjectInspector.Category.PRIMITIVE ||
+PrimitiveObjectInspectorUtils.PrimitiveGrouping.STRING_GROUP != PrimitiveObjectInspectorUtils.getPrimitiveGrouping(
+((PrimitiveObjectInspector)arg).getPrimitiveCategory())){
+throw new UDFArgumentTypeException(0, "The arguments to " + FUNC_NAME + " must be a string/varchar");
+}
+}
+stringOI = (PrimitiveObjectInspector) arguments[0];
+encodingFormat = (PrimitiveObjectInspector) arguments[1];
+return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
+}
+
+@Override
+public Object evaluate(DeferredObject[] arguments) throws HiveException {
+String value = PrimitiveObjectInspectorUtils.getString(arguments[0].get(), stringOI);
+String messageFormat = PrimitiveObjectInspectorUtils.getString(arguments[1].get(), encodingFormat);
+if (value == null) {
+return null;
+} else if (messageFormat == null || messageFormat.isEmpty() || JSONMessageEncoder.FORMAT.equalsIgnoreCase(value)) {
+return value;
+} else if (GzipJSONMessageEncoder.FORMAT.equalsIgnoreCase(messageFormat)) {
+return GzipJSONMessageEncoder.getInstance().getDeserializer().deSerializeGenericString(value);
+} else {
+throw new HiveException("Invalid message format provided: " + messageFormat + " for message: " + value);

Review comment:
   Already included in TestGenericUDFDeserialize#testInvalidMessageString





[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=674317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674317
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 03/Nov/21 02:22
Start Date: 03/Nov/21 02:22
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741578189



##
File path: 
standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql
##
@@ -51,6 +51,10 @@ CREATE TABLE "REPLICATION_METRICS" (
 --Increase the size of RM_PROGRESS to accomodate the replication statistics
 ALTER TABLE "REPLICATION_METRICS" ALTER "RM_PROGRESS" TYPE varchar(24000);
 
+ALTER TABLE "REPLICATION_METRICS" ALTER "RM_PROGRESS" TYPE varchar(1);

Review comment:
   This is tested as part of ITestPostgres.






Issue Time Tracking
---

Worklog Id: (was: 674317)
Time Spent: 2h 50m  (was: 2h 40m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=674312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674312
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 03/Nov/21 02:12
Start Date: 03/Nov/21 02:12
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r741567981



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/ReplicationMetricCollector.java
##
@@ -58,14 +59,15 @@ public void setMetricsMBean(ObjectName metricsMBean) {
 
  public ReplicationMetricCollector(String dbName, Metadata.ReplicationType replicationType,
 String stagingDir, long dumpExecutionId, HiveConf conf) {
+this.conf = conf;
 checkEnabledForTests(conf);
 String policy = conf.get(Constants.SCHEDULED_QUERY_SCHEDULENAME);
 long executionId = conf.getLong(Constants.SCHEDULED_QUERY_EXECUTIONID, 0L);
 if (!StringUtils.isEmpty(policy) && executionId > 0) {
   isEnabled = true;
   metricCollector = MetricCollector.getInstance().init(conf);
   MetricSink.getInstance().init(conf);
-  Metadata metadata = new Metadata(dbName, replicationType, stagingDir);
+  Metadata metadata = new Metadata(dbName, replicationType, testingModeEnabled() ? "dummyDir" :stagingDir);

Review comment:
   nit: shift this staging logic calculation to a separate method
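   The extraction the nit asks for could look roughly like this (an illustrative sketch, not the PR's code: the `StagingDirResolver` wrapper class is invented here; `testingModeEnabled` and the "dummyDir" constant come from the quoted diff):

```java
public class StagingDirResolver {
    private final boolean testingModeEnabled;

    StagingDirResolver(boolean testingModeEnabled) {
        this.testingModeEnabled = testingModeEnabled;
    }

    // The inline ternary from the constructor, pulled into a named method
    // so the intent (mask the real staging dir under test) is self-documenting.
    String resolveStagingDir(String stagingDir) {
        return testingModeEnabled ? "dummyDir" : stagingDir;
    }
}
```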

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDeserialize.java
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.metastore.messaging.json.JSONMessageEncoder;
+import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
+
+/**
+ * GenericUDFDeserializeString.
+ *
+ */
+@Description(name = "deserialize",
+value="_FUNC_(message, encodingFormat) - Returns deserialized string of encoded message.",
+extended="Example:\n"
++ "  > SELECT _FUNC_('H4sI/ytJLS4BAAx+f9gE', 'gzip(json-2.0)') FROM src LIMIT 1;\n"

Review comment:
   The base64 encoding is missed out in the description: even though the 
passed parameter is "gzip(json-2.0)", it won't work unless the passed content 
is base64 encoded.
   Moreover, the json part in "gzip(json-2.0)" is confusing at the UDF level: 
even if the underlying string is not JSON, the UDF works just fine, so 
mandating that the user pass json when the content may not be JSON doesn't sit 
well. For a UDF, wouldn't just gzip have been fine?
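   The point above — that the stored value is Base64 text wrapping a gzip stream, so decoding must Base64-decode first — can be sketched as a round trip (a standalone illustration; the PR's actual `GzipJSONMessageEncoder` wire format may differ):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBase64RoundTrip {
    // Storage direction: gzip the JSON, then Base64-encode for a varchar column.
    static String encode(String plain) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(plain.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Read direction: Base64-decode FIRST, then gunzip. Feeding raw
    // (non-Base64) text to Base64.getDecoder() throws, which is why the
    // input must be Base64-encoded gzip.
    static String decode(String encoded) throws IOException {
        byte[] gzBytes = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzBytes))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"dbName\":\"sales\"}";
        System.out.println(decode(encode(json)).equals(json));
    }
}
```

Note the round trip says nothing about the payload being JSON — any string survives it — which is the reviewer's second objection to naming the format "gzip(json-2.0)".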

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDeserialize.java
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND

[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=674178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-674178
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 02/Nov/21 21:48
Start Date: 02/Nov/21 21:48
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r740743883



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplStatsTracker.java
##
@@ -36,7 +36,7 @@
 public class ReplStatsTracker {
 
  // Maintains the length of the RM_Progress column in the RDBMS, which stores the ReplStats
-  public static int RM_PROGRESS_LENGTH = 24000;
+  public static int RM_PROGRESS_LENGTH = 1;

Review comment:
   Were you able to justify the size?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplStatsTracker.java
##
@@ -36,7 +36,7 @@
 public class ReplStatsTracker {
 
  // Maintains the length of the RM_Progress column in the RDBMS, which stores the ReplStats
-  public static int RM_PROGRESS_LENGTH = 24000;
+  public static int RM_PROGRESS_LENGTH = 1;

Review comment:
   Can you please add a sample metric with top K=5/K=10 both






Issue Time Tracking
---

Worklog Id: (was: 674178)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=673450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-673450
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 02/Nov/21 17:58
Start Date: 02/Nov/21 17:58
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r740743883



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplStatsTracker.java
##
@@ -36,7 +36,7 @@
 public class ReplStatsTracker {
 
  // Maintains the length of the RM_Progress column in the RDBMS, which stores the ReplStats
-  public static int RM_PROGRESS_LENGTH = 24000;
+  public static int RM_PROGRESS_LENGTH = 1;

Review comment:
   Were you able to justify the size?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplStatsTracker.java
##
@@ -36,7 +36,7 @@
 public class ReplStatsTracker {
 
  // Maintains the length of the RM_Progress column in the RDBMS, which stores the ReplStats
-  public static int RM_PROGRESS_LENGTH = 24000;
+  public static int RM_PROGRESS_LENGTH = 1;

Review comment:
   Can you please add a sample metric with top K=5/K=10 both






Issue Time Tracking
---

Worklog Id: (was: 673450)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=672977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-672977
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 02/Nov/21 06:12
Start Date: 02/Nov/21 06:12
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r740744473



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplStatsTracker.java
##
@@ -36,7 +36,7 @@
 public class ReplStatsTracker {
 
  // Maintains the length of the RM_Progress column in the RDBMS, which stores the ReplStats
-  public static int RM_PROGRESS_LENGTH = 24000;
+  public static int RM_PROGRESS_LENGTH = 1;

Review comment:
   Can you please add a sample metric with top K=5/K=10 both






Issue Time Tracking
---

Worklog Id: (was: 672977)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=672976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-672976
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 02/Nov/21 06:11
Start Date: 02/Nov/21 06:11
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r740743883



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplStatsTracker.java
##
@@ -36,7 +36,7 @@
 public class ReplStatsTracker {
 
  // Maintains the length of the RM_Progress column in the RDBMS, which stores the ReplStats
-  public static int RM_PROGRESS_LENGTH = 24000;
+  public static int RM_PROGRESS_LENGTH = 1;

Review comment:
   Were you able to justify the size?






Issue Time Tracking
---

Worklog Id: (was: 672976)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=671426&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671426
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 13:15
Start Date: 28/Oct/21 13:15
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r73835



##
File path: metastore/scripts/upgrade/hive/hive-schema-4.0.0.hive.sql
##
@@ -1466,7 +1466,8 @@ CREATE EXTERNAL TABLE IF NOT EXISTS `REPLICATION_METRICS` 
(
 `POLICY_NAME` string,
 `DUMP_EXECUTION_ID` bigint,
 `METADATA` string,
-`PROGRESS` string
+`PROGRESS` string,
+`MESSAGE_FORMAT` varchar(16)

Review comment:
   Yes, varchar is supported in hive-schema files. Actually, we use the same 
syntax for the Notification_log table, so I kept it the same for both tables.






Issue Time Tracking
---

Worklog Id: (was: 671426)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=671407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671407
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 12:26
Start Date: 28/Oct/21 12:26
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r738331249



##
File path: metastore/scripts/upgrade/hive/upgrade-3.1.0-to-4.0.0.hive.sql
##
@@ -527,7 +527,8 @@ CREATE EXTERNAL TABLE IF NOT EXISTS `REPLICATION_METRICS` (
 `POLICY_NAME` string,
 `DUMP_EXECUTION_ID` bigint,
 `METADATA` string,
-`PROGRESS` string
+`PROGRESS` string,
+`MESSAGE_FORMAT` varchar(16)

Review comment:
   varchar or string?

##
File path: 
standalone-metastore/metastore-server/src/main/sql/mssql/hive-schema-4.0.0.mssql.sql
##
@@ -1367,7 +1367,8 @@ CREATE TABLE "REPLICATION_METRICS" (
   "RM_DUMP_EXECUTION_ID" bigint NOT NULL,
   "RM_METADATA" varchar(max),
   "RM_PROGRESS" varchar(max),
-  "RM_START_TIME" integer NOT NULL
+  "RM_START_TIME" integer NOT NULL,
+  MESSAGE_FORMAT nvarchar(16),

Review comment:
   typo nvarchar

##
File path: metastore/scripts/upgrade/hive/hive-schema-4.0.0.hive.sql
##
@@ -1466,7 +1466,8 @@ CREATE EXTERNAL TABLE IF NOT EXISTS `REPLICATION_METRICS` 
(
 `POLICY_NAME` string,
 `DUMP_EXECUTION_ID` bigint,
 `METADATA` string,
-`PROGRESS` string
+`PROGRESS` string,
+`MESSAGE_FORMAT` varchar(16)

Review comment:
   is varchar supported? can this be string?






Issue Time Tracking
---

Worklog Id: (was: 671407)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=668453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-668453
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 21/Oct/21 14:07
Start Date: 21/Oct/21 14:07
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r733716884



##
File path: 
standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql
##
@@ -803,11 +803,13 @@ CREATE TABLE "APP"."REPLICATION_METRICS" (
   "RM_POLICY" varchar(256) NOT NULL,
   "RM_DUMP_EXECUTION_ID" bigint NOT NULL,
   "RM_METADATA" varchar(4000),
-  "RM_PROGRESS" varchar(1),
+  "RM_PROGRESS" varchar(24000),
   "RM_START_TIME" integer not null,
   PRIMARY KEY("RM_SCHEDULED_EXECUTION_ID")
 );
 
+ALTER TABLE "APP"."REPLICATION_METRICS" ALTER "RM_PROGRESS" SET DATA TYPE VARCHAR(1);

Review comment:
   Run the Itests for all RDBMS






Issue Time Tracking
---

Worklog Id: (was: 668453)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=668452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-668452
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 21/Oct/21 14:06
Start Date: 21/Oct/21 14:06
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r733716884



##
File path: 
standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql
##
@@ -803,11 +803,13 @@ CREATE TABLE "APP"."REPLICATION_METRICS" (
   "RM_POLICY" varchar(256) NOT NULL,
   "RM_DUMP_EXECUTION_ID" bigint NOT NULL,
   "RM_METADATA" varchar(4000),
-  "RM_PROGRESS" varchar(1),
+  "RM_PROGRESS" varchar(24000),
   "RM_START_TIME" integer not null,
   PRIMARY KEY("RM_SCHEDULED_EXECUTION_ID")
 );
 
+ALTER TABLE "APP"."REPLICATION_METRICS" ALTER "RM_PROGRESS" SET DATA TYPE VARCHAR(1);

Review comment:
   Run the Itests






Issue Time Tracking
---

Worklog Id: (was: 668452)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=668451&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-668451
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 21/Oct/21 14:06
Start Date: 21/Oct/21 14:06
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r733716632



##
File path: ql/src/test/queries/clientpositive/udf_gzip_json_deserialize.q
##
@@ -0,0 +1,7 @@
+
+DESCRIBE FUNCTION gzip_json_deserialize;
+DESCRIBE FUNCTION EXTENDED gzip_json_deserialize;
+
+-- evalutes function for array of primitives
+SELECT gzip_json_deserialize("H4sI/ytJLS4BAAx+f9gE");

Review comment:
   the validation is missing






Issue Time Tracking
---

Worklog Id: (was: 668451)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=668450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-668450
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 21/Oct/21 14:04
Start Date: 21/Oct/21 14:04
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r733714401



##
File path: 
standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql
##
@@ -803,11 +803,13 @@ CREATE TABLE "APP"."REPLICATION_METRICS" (
   "RM_POLICY" varchar(256) NOT NULL,
   "RM_DUMP_EXECUTION_ID" bigint NOT NULL,
   "RM_METADATA" varchar(4000),
-  "RM_PROGRESS" varchar(1),
+  "RM_PROGRESS" varchar(24000),
   "RM_START_TIME" integer not null,
   PRIMARY KEY("RM_SCHEDULED_EXECUTION_ID")
 );
 
+ALTER TABLE "APP"."REPLICATION_METRICS" ALTER "RM_PROGRESS" SET DATA TYPE VARCHAR(1);

Review comment:
   Is the syntax validated?






Issue Time Tracking
---

Worklog Id: (was: 668450)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=668398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-668398
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 21/Oct/21 13:09
Start Date: 21/Oct/21 13:09
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r733658593



##
File path: 
standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql
##
@@ -98,7 +98,7 @@ CREATE TABLE "APP"."REPLICATION_METRICS" (
 );
 
 --Increase the size of RM_PROGRESS to accomodate the replication statistics
-ALTER TABLE "APP"."REPLICATION_METRICS" ALTER "RM_PROGRESS" SET DATA TYPE VARCHAR(24000);
+ALTER TABLE "APP"."REPLICATION_METRICS" ALTER "RM_PROGRESS" SET DATA TYPE VARCHAR(1);

Review comment:
   Don't modify this; do an ALTER TABLE to reduce the size instead.

##
File path: 
standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql
##
@@ -1269,7 +1269,7 @@ CREATE TABLE IF NOT EXISTS REPLICATION_METRICS (
   RM_POLICY varchar(256) NOT NULL,
   RM_DUMP_EXECUTION_ID bigint NOT NULL,
   RM_METADATA varchar(4000),
-  RM_PROGRESS varchar(24000),
+  RM_PROGRESS varchar(1),

Review comment:
   Don't modify this; do an ALTER TABLE to reduce the size instead.

##
File path: 
standalone-metastore/metastore-server/src/main/sql/mysql/upgrade-3.2.0-to-4.0.0.mysql.sql
##
@@ -105,7 +105,7 @@ CREATE TABLE IF NOT EXISTS REPLICATION_METRICS (
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
 
 --Increase the size of RM_PROGRESS to accomodate the replication statistics
-ALTER TABLE REPLICATION_METRICS MODIFY RM_PROGRESS varchar(24000);
+ALTER TABLE REPLICATION_METRICS MODIFY RM_PROGRESS varchar(1);

Review comment:
   Don't modify this; do an ALTER TABLE to reduce the size instead.

##
File path: 
standalone-metastore/metastore-server/src/main/sql/postgres/hive-schema-4.0.0.postgres.sql
##
@@ -1976,7 +1976,7 @@ CREATE TABLE "REPLICATION_METRICS" (
   "RM_POLICY" varchar(256) NOT NULL,
   "RM_DUMP_EXECUTION_ID" bigint NOT NULL,
   "RM_METADATA" varchar(4000),
-  "RM_PROGRESS" varchar(24000),
+  "RM_PROGRESS" varchar(1),

Review comment:
   Don't modify this; do an ALTER TABLE to reduce the size instead.

##
File path: 
standalone-metastore/metastore-server/src/main/sql/postgres/upgrade-3.2.0-to-4.0.0.postgres.sql
##
@@ -229,7 +229,7 @@ CREATE TABLE "REPLICATION_METRICS" (
 );
 
 --Increase the size of RM_PROGRESS to accomodate the replication statistics
-ALTER TABLE "REPLICATION_METRICS" ALTER "RM_PROGRESS" TYPE varchar(24000);
+ALTER TABLE "REPLICATION_METRICS" ALTER "RM_PROGRESS" TYPE varchar(1);

Review comment:
   Don't modify this; do an ALTER TABLE to reduce the size instead.

##
File path: 
standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql
##
@@ -49,7 +49,7 @@ CREATE TABLE "REPLICATION_METRICS" (
 );
 
 --Increase the size of RM_PROGRESS to accomodate the replication statistics
-ALTER TABLE "REPLICATION_METRICS" ALTER "RM_PROGRESS" TYPE varchar(24000);
+ALTER TABLE "REPLICATION_METRICS" ALTER "RM_PROGRESS" TYPE varchar(1);

Review comment:
   Don't modify this; do an ALTER TABLE to reduce the size instead.






Issue Time Tracking
---

Worklog Id: (was: 668398)
Time Spent: 50m  (was: 40m)

> Compress Hive Replication Metrics while storing
> ---
>
> Key: HIVE-25596
> URL: https://issues.apache.org/jira/browse/HIVE-25596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Compress the json fields of sys.replication_metrics table to optimise RDBMS 
> space usage.





[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=668397&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-668397
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 21/Oct/21 13:08
Start Date: 21/Oct/21 13:08
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r733658209



##
File path: 
standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql
##
@@ -803,7 +803,7 @@ CREATE TABLE "APP"."REPLICATION_METRICS" (
   "RM_POLICY" varchar(256) NOT NULL,
   "RM_DUMP_EXECUTION_ID" bigint NOT NULL,
   "RM_METADATA" varchar(4000),
-  "RM_PROGRESS" varchar(24000),
+  "RM_PROGRESS" varchar(1),

Review comment:
   Don't modify this; do an ALTER TABLE to reduce the size instead.






Issue Time Tracking
---

Worklog Id: (was: 668397)
Time Spent: 40m  (was: 0.5h)

> Compress Hive Replication Metrics while storing
> ---
>
> Key: HIVE-25596
> URL: https://issues.apache.org/jira/browse/HIVE-25596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Compress the json fields of sys.replication_metrics table to optimise RDBMS 
> space usage.





[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=667473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-667473
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 20/Oct/21 07:15
Start Date: 20/Oct/21 07:15
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r732477821



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/MetricSink.java
##
@@ -116,14 +118,15 @@ public void run() {
   int totalMetricsSize = metrics.size();
   List<ReplicationMetrics> replicationMetricsList = new ArrayList<>(totalMetricsSize);
   ObjectMapper mapper = new ObjectMapper();
+  MessageSerializer serializer = GzipJSONMessageEncoder.getInstance().getSerializer();
   for (int index = 0; index < totalMetricsSize; index++) {
     ReplicationMetric metric = metrics.removeFirst();
     ReplicationMetrics persistentMetric = new ReplicationMetrics();
     persistentMetric.setDumpExecutionId(metric.getDumpExecutionId());
     persistentMetric.setScheduledExecutionId(metric.getScheduledExecutionId());
     persistentMetric.setPolicy(metric.getPolicy());
-    persistentMetric.setProgress(mapper.writeValueAsString(metric.getProgress()));
-    persistentMetric.setMetadata(mapper.writeValueAsString(metric.getMetadata()));
+    persistentMetric.setProgress(serializer.serialize(mapper.writeValueAsString(metric.getProgress())));
+    persistentMetric.setMetadata(serializer.serialize(mapper.writeValueAsString(metric.getMetadata())));

Review comment:
   I tried with a string of 100 characters (100 bytes); the compressed string was 24 bytes, a 76% reduction from the original.
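As a sanity check on that kind of ratio, here is a minimal standalone sketch using plain `java.util.zip` (not Hive's `GzipJSONMessageEncoder`) that gzips a repetitive JSON-style payload and prints both sizes; the payload contents are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipRatioDemo {
  // Gzip a string and return the compressed bytes.
  public static byte[] gzip(String s) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(s.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) {
    // Made-up, repetitive progress payload; real RM_PROGRESS JSON is similarly repetitive.
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < 200; i++) {
      sb.append("{\"stage\":\"REPL_DUMP\",\"status\":\"SUCCESS\",\"currentTask\":")
        .append(i).append("},");
    }
    sb.setCharAt(sb.length() - 1, ']');
    String json = sb.toString();

    int original = json.getBytes(StandardCharsets.UTF_8).length;
    int compressed = gzip(json).length;
    System.out.println("original=" + original + " bytes, gzip=" + compressed + " bytes");
  }
}
```

Highly repetitive JSON like the replication progress payload is close to a best case for gzip, which is why the measured reduction is so large.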

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
##
@@ -510,6 +510,7 @@
 system.registerGenericUDF("sort_array", GenericUDFSortArray.class);
 system.registerGenericUDF("sort_array_by", GenericUDFSortArrayByField.class);
 system.registerGenericUDF("array_contains", GenericUDFArrayContains.class);
+system.registerGenericUDF("deserialize", GenericUDFDeserialize.class);

Review comment:
   Done






Issue Time Tracking
---

Worklog Id: (was: 667473)
Time Spent: 0.5h  (was: 20m)

> Compress Hive Replication Metrics while storing
> ---
>
> Key: HIVE-25596
> URL: https://issues.apache.org/jira/browse/HIVE-25596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Compress the json fields of sys.replication_metrics table to optimise RDBMS 
> space usage.





[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=667449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-667449
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 20/Oct/21 05:05
Start Date: 20/Oct/21 05:05
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r732417367



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
##
@@ -510,6 +510,7 @@
 system.registerGenericUDF("sort_array", GenericUDFSortArray.class);
 system.registerGenericUDF("sort_array_by", GenericUDFSortArrayByField.class);
 system.registerGenericUDF("array_contains", GenericUDFArrayContains.class);
+system.registerGenericUDF("deserialize", GenericUDFDeserialize.class);

Review comment:
   rename to gzip_json_deserialize?
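As context for the naming suggestion: the operation such a UDF exposes is the inverse of a gzip-then-Base64 serialization. The sketch below is illustrative only, not the actual `GenericUDFDeserialize` implementation; the class and method names here are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipJsonCodec {
  // Gzip the JSON, then Base64-encode it so the result can be stored in a varchar column.
  public static String serialize(String json) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(json.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Base64.getEncoder().encodeToString(bos.toByteArray());
  }

  // The inverse: Base64-decode, then gunzip back to the original JSON text.
  public static String deserialize(String stored) {
    byte[] compressed = Base64.getDecoder().decode(stored);
    try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String json = "{\"status\":\"SUCCESS\",\"stages\":[{\"name\":\"REPL_DUMP\"}]}";
    String stored = serialize(json);
    System.out.println(deserialize(stored)); // prints the original JSON back
  }
}
```

A name like `gzip_json_deserialize` encodes both the compression scheme and the payload type, which matters once the stored column no longer holds human-readable JSON.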






Issue Time Tracking
---

Worklog Id: (was: 667449)
Time Spent: 20m  (was: 10m)

> Compress Hive Replication Metrics while storing
> ---
>
> Key: HIVE-25596
> URL: https://issues.apache.org/jira/browse/HIVE-25596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Compress the json fields of sys.replication_metrics table to optimise RDBMS 
> space usage.





[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=667448&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-667448
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 20/Oct/21 05:02
Start Date: 20/Oct/21 05:02
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r732416554



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/MetricSink.java
##
@@ -116,14 +118,15 @@ public void run() {
   int totalMetricsSize = metrics.size();
   List<ReplicationMetrics> replicationMetricsList = new ArrayList<>(totalMetricsSize);
   ObjectMapper mapper = new ObjectMapper();
+  MessageSerializer serializer = GzipJSONMessageEncoder.getInstance().getSerializer();
   for (int index = 0; index < totalMetricsSize; index++) {
     ReplicationMetric metric = metrics.removeFirst();
     ReplicationMetrics persistentMetric = new ReplicationMetrics();
     persistentMetric.setDumpExecutionId(metric.getDumpExecutionId());
     persistentMetric.setScheduledExecutionId(metric.getScheduledExecutionId());
     persistentMetric.setPolicy(metric.getPolicy());
-    persistentMetric.setProgress(mapper.writeValueAsString(metric.getProgress()));
-    persistentMetric.setMetadata(mapper.writeValueAsString(metric.getMetadata()));
+    persistentMetric.setProgress(serializer.serialize(mapper.writeValueAsString(metric.getProgress())));
+    persistentMetric.setMetadata(serializer.serialize(mapper.writeValueAsString(metric.getMetadata())));

Review comment:
   What is the size improvement we get with serializing?






Issue Time Tracking
---

Worklog Id: (was: 667448)
Remaining Estimate: 0h
Time Spent: 10m

> Compress Hive Replication Metrics while storing
> ---
>
> Key: HIVE-25596
> URL: https://issues.apache.org/jira/browse/HIVE-25596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Compress the json fields of sys.replication_metrics table to optimise RDBMS 
> space usage.


