[jira] [Commented] (HIVE-24129) Deleting the previous successful dump directory should be based on config

2020-09-08 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192633#comment-17192633
 ] 

Aasha Medhi commented on HIVE-24129:


+1

> Deleting the previous successful dump directory should be based on config
> --------------------------------------------------------------------------
>
> Key: HIVE-24129
> URL: https://issues.apache.org/jira/browse/HIVE-24129
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24129.01.patch, HIVE-24129.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Description: Provide a policy-level config, defaulted to true.
> This can help debug any issue in production.
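
A minimal sketch of how such a policy-level option could be passed on the dump command. The config key strings are assumptions inferred from the ConfVars names (REPL_RETAIN_PREV_DUMP_DIR, REPL_RETAIN_PREV_DUMP_DIR_COUNT) used in the attached test; verify them against the committed patch:

{code:sql}
-- Hypothetical policy-level override: retain previous dump directories for debugging.
REPL DUMP srcdb WITH (
  'hive.repl.retain.prev.dump.dir' = 'true',
  'hive.repl.retain.prev.dump.dir.count' = '2'
);
{code}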



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24129) Deleting the previous successful dump directory should be based on config

2020-09-08 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24129:
---
Attachment: HIVE-24129.02.patch

> Deleting the previous successful dump directory should be based on config
> --------------------------------------------------------------------------
>
> Key: HIVE-24129
> URL: https://issues.apache.org/jira/browse/HIVE-24129
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24129.01.patch, HIVE-24129.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Description: Provide a policy-level config, defaulted to true.
> This can help debug any issue in production.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=480599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480599
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 05:27
Start Date: 09/Sep/20 05:27
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #1271:
URL: https://github.com/apache/hive/pull/1271#discussion_r485346149



##
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java
##
@@ -1,114 +0,0 @@
-package org.apache.hadoop.hive.metastore;
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-import java.nio.charset.StandardCharsets;
-import java.util.ArrayList;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Set;
-
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.FileMetadataExprType;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.utils.FileUtils;
-import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
-import org.apache.hadoop.util.StringUtils;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-// This is added as part of moving MSCK code from ql to standalone-metastore. There is a metastore API to drop
-// partitions by name but we cannot use it because msck typically will contain partition value (year=2014). We almost
-// never drop partition by name (year). So we need to construct expression filters, the current
-// PartitionExpressionProxy implementations (PartitionExpressionForMetastore and HCatClientHMSImpl.ExpressionBuilder)
-// all depend on ql code to build ExprNodeDesc for the partition expressions. It also depends on kryo for serializing
-// the expression objects to byte[]. For MSCK drop partition, we don't need complex expression generator. For now,
-// all we do is split the partition spec (year=2014/month=24) into filter expression year='2014' and month='24' and
-// rely on metastore database to deal with type conversions. Ideally, PartitionExpressionProxy default implementation
-// should use SearchArgument (storage-api) to construct the filter expression and not depend on ql, but the usecase
-// for msck is pretty simple and this specific implementation should suffice.

Review comment:
   Hmm, using SARG will complicate things a lot, I guess, and moving the
ExprNode-related classes out of ql is not trivial (we could explore this a bit
further). But looking at the big picture, most of the ExprNode classes depend
on serde classes, and since we don't want serde classes in
standalone-metastore, we should either put the ExprNode-related classes in
some other module or create a new module which both ql and
standalone-metastore can use.
   
   I could think of another approach; even though it is hacky, it solves the
purpose: since the PartitionExpressionForMetastore class is required only
during the partition pruning step, we can switch the expression proxy class
back to MsckPartitionExpressionProxy once partition pruning is done. This way
we could solve the compatibility issue.
   
   @kgyrtkirk Any thoughts on the validity of the above approach?
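
A standalone illustration of the simple split the removed comment describes (a hypothetical helper, not the actual Msck code):

{code:java}
import java.util.ArrayList;
import java.util.List;

public class PartSpecToFilter {
  // Split a partition spec such as "year=2014/month=24" into the equality
  // filters year='2014' and month='24'; type conversion is left to the
  // metastore database, as the removed comment notes.
  public static List<String> toEqualityFilters(String partSpec) {
    List<String> filters = new ArrayList<>();
    for (String kv : partSpec.split("/")) {
      int eq = kv.indexOf('=');
      filters.add(kv.substring(0, eq) + "='" + kv.substring(eq + 1) + "'");
    }
    return filters;
  }

  public static void main(String[] args) {
    // Prints [year='2014', month='24']
    System.out.println(toEqualityFilters("year=2014/month=24"));
  }
}
{code}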





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480599)
Time Spent: 3h 20m  (was: 3h 10m)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> -----------------------------------------------------------------------------
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>

[jira] [Updated] (HIVE-24129) Deleting the previous successful dump directory should be based on config

2020-09-08 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24129:
---
Attachment: HIVE-24129.01.patch

> Deleting the previous successful dump directory should be based on config
> --------------------------------------------------------------------------
>
> Key: HIVE-24129
> URL: https://issues.apache.org/jira/browse/HIVE-24129
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24129.01.patch, HIVE-24129.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Description: Provide a policy-level config, defaulted to true.
> This can help debug any issue in production.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=480598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480598
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 05:27
Start Date: 09/Sep/20 05:27
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #1271:
URL: https://github.com/apache/hive/pull/1271#discussion_r485346149



##
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java
##
@@ -1,114 +0,0 @@
-package org.apache.hadoop.hive.metastore;
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-import java.nio.charset.StandardCharsets;
-import java.util.ArrayList;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Set;
-
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.FileMetadataExprType;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.utils.FileUtils;
-import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
-import org.apache.hadoop.util.StringUtils;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-// This is added as part of moving MSCK code from ql to standalone-metastore. There is a metastore API to drop
-// partitions by name but we cannot use it because msck typically will contain partition value (year=2014). We almost
-// never drop partition by name (year). So we need to construct expression filters, the current
-// PartitionExpressionProxy implementations (PartitionExpressionForMetastore and HCatClientHMSImpl.ExpressionBuilder)
-// all depend on ql code to build ExprNodeDesc for the partition expressions. It also depends on kryo for serializing
-// the expression objects to byte[]. For MSCK drop partition, we don't need complex expression generator. For now,
-// all we do is split the partition spec (year=2014/month=24) into filter expression year='2014' and month='24' and
-// rely on metastore database to deal with type conversions. Ideally, PartitionExpressionProxy default implementation
-// should use SearchArgument (storage-api) to construct the filter expression and not depend on ql, but the usecase
-// for msck is pretty simple and this specific implementation should suffice.

Review comment:
   Hmm, using SARG will complicate things a lot, I guess, and moving the
ExprNode-related classes out of ql is not trivial (we could explore this a bit
further). But looking at the big picture, most of the ExprNode classes depend
on serde classes, and since we don't want serde classes in
standalone-metastore, we should either put the ExprNode-related classes in
some other module or create a new module which both ql and
standalone-metastore can use.
   
   I could think of another approach; even though it is hacky, it solves the
purpose: since the PartitionExpressionForMetastore class is required only
during the partition pruning step, we can switch the expression proxy class
back to MsckPartitionExpressionProxy once partition pruning is done. This way
we could solve the compatibility issue.
   
   @kgyrtkirk Any thoughts on the validity of the above approach?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480598)
Time Spent: 3h 10m  (was: 3h)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> -----------------------------------------------------------------------------
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: 

[jira] [Updated] (HIVE-24129) Deleting the previous successful dump directory should be based on config

2020-09-08 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24129:
---
Attachment: (was: HIVE-24129.01.patch)

> Deleting the previous successful dump directory should be based on config
> --------------------------------------------------------------------------
>
> Key: HIVE-24129
> URL: https://issues.apache.org/jira/browse/HIVE-24129
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Description: Provide a policy-level config, defaulted to true.
> This can help debug any issue in production.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=480597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480597
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 05:26
Start Date: 09/Sep/20 05:26
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #1271:
URL: https://github.com/apache/hive/pull/1271#discussion_r485346149



##
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java
##
@@ -1,114 +0,0 @@
-package org.apache.hadoop.hive.metastore;
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-import java.nio.charset.StandardCharsets;
-import java.util.ArrayList;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Set;
-
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.FileMetadataExprType;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.utils.FileUtils;
-import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
-import org.apache.hadoop.util.StringUtils;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-// This is added as part of moving MSCK code from ql to standalone-metastore. There is a metastore API to drop
-// partitions by name but we cannot use it because msck typically will contain partition value (year=2014). We almost
-// never drop partition by name (year). So we need to construct expression filters, the current
-// PartitionExpressionProxy implementations (PartitionExpressionForMetastore and HCatClientHMSImpl.ExpressionBuilder)
-// all depend on ql code to build ExprNodeDesc for the partition expressions. It also depends on kryo for serializing
-// the expression objects to byte[]. For MSCK drop partition, we don't need complex expression generator. For now,
-// all we do is split the partition spec (year=2014/month=24) into filter expression year='2014' and month='24' and
-// rely on metastore database to deal with type conversions. Ideally, PartitionExpressionProxy default implementation
-// should use SearchArgument (storage-api) to construct the filter expression and not depend on ql, but the usecase
-// for msck is pretty simple and this specific implementation should suffice.

Review comment:
   Hmm, using SARG will complicate things a lot, I guess, and moving the
ExprNode-related classes out of ql is not trivial (we could explore this a bit
further). But looking at the big picture, most of the ExprNode classes depend
on serde classes, and since we don't want serde classes in
standalone-metastore, we should either put the ExprNode-related classes in
some other module or create a new module which both ql and
standalone-metastore can use.
   
   I could think of another approach; even though it is hacky, it solves the
purpose: since the PartitionExpressionForMetastore class is required only
during the partition pruning step, we can switch the expression proxy class
back to MsckPartitionExpressionProxy once partition pruning is done. This way
we could solve the compatibility issue.
   
   @kgyrtkirk Any thoughts on the validity of the above approach?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480597)
Time Spent: 3h  (was: 2h 50m)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> -----------------------------------------------------------------------------
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>

[jira] [Updated] (HIVE-24127) Dump events from default catalog only

2020-09-08 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24127:
---
Attachment: HIVE-24127.02.patch
Status: Patch Available  (was: In Progress)

> Dump events from default catalog only
> -------------------------------------
>
> Key: HIVE-24127
> URL: https://issues.apache.org/jira/browse/HIVE-24127
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24127.01.patch, HIVE-24127.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Don't dump events from the spark catalog. In bootstrap we skip spark tables;
> in incremental load we should likewise skip spark events.
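
A minimal sketch of the kind of guard this implies (the accessor and the hook point are assumptions for illustration; "hive" is the metastore's default catalog name):

{code:java}
final class EventCatalogFilter {
  // Skip notification events that do not belong to the default catalog,
  // mirroring the bootstrap-phase filter that already skips spark tables.
  static boolean shouldDumpEvent(String eventCatalogName) {
    // Older events may carry no catalog; treat those as default-catalog events.
    return eventCatalogName == null || "hive".equalsIgnoreCase(eventCatalogName);
  }
}
{code}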



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24127) Dump events from default catalog only

2020-09-08 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24127:
---
Status: In Progress  (was: Patch Available)

> Dump events from default catalog only
> -------------------------------------
>
> Key: HIVE-24127
> URL: https://issues.apache.org/jira/browse/HIVE-24127
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24127.01.patch, HIVE-24127.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Don't dump events from the spark catalog. In bootstrap we skip spark tables;
> in incremental load we should likewise skip spark events.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24129) Deleting the previous successful dump directory should be based on config

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24129?focusedWorklogId=480591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480591
 ]

ASF GitHub Bot logged work on HIVE-24129:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 05:04
Start Date: 09/Sep/20 05:04
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #1479:
URL: https://github.com/apache/hive/pull/1479#discussion_r485339951



##
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##
@@ -2018,6 +2018,129 @@ public void testIncrementalLoadWithPreviousDumpDeleteFailed() throws IOException
     verifyRun("SELECT a from " + replDbName + ".ptned WHERE b=2", ptnData2, driverMirror);
   }
 
+  @Test
+  public void testConfiguredDeleteOfPrevDumpDir() throws IOException {
+    boolean verifySetupOriginal = verifySetupSteps;
+    verifySetupSteps = true;
+    String nameOfTest = "testConfigureDeleteOfPrevDumpDir";
+    String dbName = createDB(nameOfTest, driver);
+    String replDbName = dbName + "_dupe";
+    List<String> withConfigDeletePrevDump = Arrays.asList(
+        "'" + HiveConf.ConfVars.REPL_RETAIN_PREV_DUMP_DIR + "'= 'false' ");
+    List<String> withConfigRetainPrevDump = Arrays.asList(
+        "'" + HiveConf.ConfVars.REPL_RETAIN_PREV_DUMP_DIR + "'= 'true' ",
+        "'" + HiveConf.ConfVars.REPL_RETAIN_PREV_DUMP_DIR_COUNT + "'= '2' ");
+
+    run("CREATE TABLE " + dbName + ".unptned(a string) STORED AS TEXTFILE", driver);
+    run("CREATE TABLE " + dbName + ".ptned(a string) partitioned by (b int) STORED AS TEXTFILE", driver);
+
+    Tuple bootstrapDump = bootstrapLoadAndVerify(dbName, replDbName);
+
+    String[] unptnData = new String[] {"eleven", "twelve"};
+    String[] ptnData1 = new String[] {"thirteen", "fourteen", "fifteen"};
+    String[] ptnData2 = new String[] {"fifteen", "sixteen", "seventeen"};
+    String[] empty = new String[] {};
+
+    String unptnLocn = new Path(TEST_PATH, nameOfTest + "_unptn").toUri().getPath();
+    String ptnLocn1 = new Path(TEST_PATH, nameOfTest + "_ptn1").toUri().getPath();
+    String ptnLocn2 = new Path(TEST_PATH, nameOfTest + "_ptn2").toUri().getPath();
+
+    createTestDataFile(unptnLocn, unptnData);
+    createTestDataFile(ptnLocn1, ptnData1);
+    createTestDataFile(ptnLocn2, ptnData2);
+
+    run("LOAD DATA LOCAL INPATH '" + unptnLocn + "' OVERWRITE INTO TABLE " + dbName + ".unptned", driver);
+    verifySetup("SELECT * from " + dbName + ".unptned", unptnData, driver);
+    run("CREATE TABLE " + dbName + ".unptned_late LIKE " + dbName + ".unptned", driver);
+    run("INSERT INTO TABLE " + dbName + ".unptned_late SELECT * FROM " + dbName + ".unptned", driver);
+    verifySetup("SELECT * from " + dbName + ".unptned_late", unptnData, driver);
+
+    //perform first incremental with default option and check that bootstrap-dump-dir gets deleted
+    Path bootstrapDumpDir = new Path(bootstrapDump.dumpLocation, ReplUtils.REPL_HIVE_BASE_DIR);
+    FileSystem fs = FileSystem.get(bootstrapDumpDir.toUri(), hconf);
+    assertTrue(fs.exists(bootstrapDumpDir));
+    Tuple incrDump = replDumpDb(dbName);
+    assertFalse(fs.exists(bootstrapDumpDir));
+
+
+    loadAndVerify(replDbName, dbName, incrDump.lastReplId);
+    verifyRun("SELECT * from " + replDbName + ".unptned", unptnData, driverMirror);
+    verifyRun("SELECT * from " + replDbName + ".unptned_late", unptnData, driverMirror);
+
+    run("ALTER TABLE " + dbName + ".ptned ADD PARTITION (b=1)", driver);
+    run("LOAD DATA LOCAL INPATH '" + ptnLocn1 + "' OVERWRITE INTO TABLE " + dbName
+        + ".ptned PARTITION(b=1)", driver);
+    verifySetup("SELECT a from " + dbName + ".ptned WHERE b=1", ptnData1, driver);
+    run("LOAD DATA LOCAL INPATH '" + ptnLocn2 + "' OVERWRITE INTO TABLE " + dbName
+        + ".ptned PARTITION(b=2)", driver);
+    verifySetup("SELECT a from " + dbName + ".ptned WHERE b=2", ptnData2, driver);
+
+
+    //Perform 2nd incremental with retain option.
+    //Check 1st incremental dump-dir is present even after 2nd incr dump.
+    Path incrDumpDir1 = new Path(incrDump.dumpLocation, ReplUtils.REPL_HIVE_BASE_DIR);
+    incrDump = replDumpDb(dbName, withConfigRetainPrevDump);
+    assertTrue(fs.exists(incrDumpDir1));
+
+    loadAndVerify(replDbName, dbName, incrDump.lastReplId);
+    verifyRun("SELECT a from " + replDbName + ".ptned WHERE b=1", ptnData1, driverMirror);
+    verifyRun("SELECT a from " + replDbName + ".ptned WHERE b=2", ptnData2, driverMirror);
+
+    run("CREATE TABLE " + dbName
+        + ".ptned_late(a string) PARTITIONED BY (b int) STORED AS TEXTFILE", driver);
+    run("INSERT INTO TABLE " + dbName + ".ptned_late PARTITION(b=1) SELECT a FROM " + dbName
+        + ".ptned WHERE b=1", driver);
+    verifySetup("SELECT a from " + dbName + ".ptned_late

[jira] [Work logged] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24084?focusedWorklogId=480588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480588
 ]

ASF GitHub Bot logged work on HIVE-24084:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 04:48
Start Date: 09/Sep/20 04:48
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1439:
URL: https://github.com/apache/hive/pull/1439#discussion_r485325338



##
File path: ql/src/test/results/clientpositive/llap/constraints_optimization.q.out
##
@@ -2631,13 +2631,12 @@ POSTHOOK: Input: default@customer
 POSTHOOK: Input: default@store_sales
  A masked pattern was here 
 CBO PLAN:
-HiveAggregate(group=[{0}])
-  HiveJoin(condition=[=($0, $8)], joinType=[inner], algorithm=[none], cost=[not available])
-HiveProject(c_customer_sk=[$0], c_customer_id=[$1], c_first_name=[$8], c_last_name=[$9], c_preferred_cust_flag=[$10], c_birth_country=[$14], c_login=[$15], c_email_address=[$16])
-  HiveTableScan(table=[[default, customer]], table:alias=[customer])
-HiveProject(ss_customer_sk=[$3])
-  HiveFilter(condition=[IS NOT NULL($3)])
-HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales])
+HiveSemiJoin(condition=[=($0, $1)], joinType=[semi])

Review comment:
   This is quite neat.

##
File path: ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query23.q.out
##
 HiveAggregate(group=[{}], agg#0=[sum($0)])
 HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
   HiveProject($f1=[$0])
 HiveFilter(condition=[>($2, 4)])
-  HiveProject(i_item_sk=[$1], d_date=[$0], $f2=[$2])
-HiveAggregate(group=[{3, 4}], agg#0=[count()])
-  HiveJoin(condition=[=($1, $4)], joinType=[inner], algorithm=[none], cost=[not available])
-HiveJoin(condition=[=($0, $2)], joinType=[inner], algorithm=[none], cost=[not available])
-  HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
-HiveFilter(condition=[IS NOT NULL($0)])
-  HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales])
-  HiveProject(d_date_sk=[$0], d_date=[$2])
-HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
-  HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
-HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
-  HiveTableScan(table=[[default, item]], table:alias=[item])
+  HiveProject(i_item_sk=[$3], d_date=[$1], $f2=[$2])
+HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available])
+  HiveProject(ss_item_sk=[$0], d_date=[$1], $f2=[$2])
+HiveAggregate(group=[{1, 3}], agg#0=[count()])

Review comment:
   Cool!

##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java
##
@@ -303,6 +305,90 @@ public void onMatch(RelOptRuleCall call) {
     }
   }
 
+  /**
+   * Determines whether the given grouping is unique.
+   *
+   * Consider a join which might produce non-unique rows; but later the results are aggregated again.
+   * This method determines if there are sufficient columns in the grouping which have been present previously as unique column(s).
+   */
+  private boolean isGroupingUnique(RelNode input, ImmutableBitSet groups) {
+    if (groups.isEmpty()) {
+      return false;
+    }
+    RelMetadataQuery mq = input.getCluster().getMetadataQuery();
+    Set<ImmutableBitSet> uKeys = mq.getUniqueKeys(input);
+    for (ImmutableBitSet u : uKeys) {
+      if (groups.contains(u)) {
+        return true;
+      }
+    }
+    if (input instanceof Join) {
+      Join join = (Join) input;
+      RexBuilder rexBuilder = input.getCluster().getRexBuilder();
+      SimpleConditionInfo cond = new SimpleConditionInfo(join.getCondition(), rexBuilder);
+
+      if (cond.valid) {
+        ImmutableBitSet newGroup = groups.intersect(ImmutableBitSet.fromBitSet(cond.fields));
+        RelNode l = join.getLeft();
+        RelNode r = join.getRight();
+
+        int joinFieldCount = join.getRowType().getFieldCount();
+        int lFieldCount = l.getRowType().getFieldCount();
+
+        ImmutableBitSet groupL = newGroup.get(0, lFieldCount);
+        ImmutableBitSet groupR = newGroup.get(lFieldCount, joinFieldCount).shift(-lFieldCount);
+
+        if (isGroupingUnique(l, groupL)) {

Review comment:
   As you go down recursively, you may start finding `HepRelVertex` as the rel node. I think you hit the `instanceof Project` below only for the first project because you create it using the builder.
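
A sketch of the unwrap this comment implies (an assumption about the fix, not the committed change): Calcite's HepPlanner hands rules HepRelVertex wrappers, so instanceof checks on the recursion path must look through them.

{code:java}
import org.apache.calcite.plan.hep.HepRelVertex;
import org.apache.calcite.rel.RelNode;

final class RelUnwrap {
  // HepRelVertex wraps the current best rel during HepPlanner execution;
  // getCurrentRel() exposes the underlying node so checks such as
  // "input instanceof Join" keep working during recursion.
  static RelNode strip(RelNode node) {
    return node instanceof HepRelVertex ? ((HepRelVertex) node).getCurrentRel() : node;
  }
}
{code}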
 

[jira] [Work logged] (HIVE-24129) Deleting the previous successful dump directory should be based on config

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24129?focusedWorklogId=480579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480579
 ]

ASF GitHub Bot logged work on HIVE-24129:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 04:12
Start Date: 09/Sep/20 04:12
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1479:
URL: https://github.com/apache/hive/pull/1479#discussion_r485324479



##
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##
@@ -2018,6 +2018,129 @@ public void testIncrementalLoadWithPreviousDumpDeleteFailed() throws IOException
     verifyRun("SELECT a from " + replDbName + ".ptned WHERE b=2", ptnData2, driverMirror);
   }
 
+  @Test
+  public void testConfiguredDeleteOfPrevDumpDir() throws IOException {
+    boolean verifySetupOriginal = verifySetupSteps;
+    verifySetupSteps = true;
+    String nameOfTest = "testConfigureDeleteOfPrevDumpDir";

Review comment:
   String name = testName.getMethodName(); can be used
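
The reviewer is pointing at the JUnit 4 TestName rule; a minimal usage sketch (assuming the test class can declare the rule):

{code:java}
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestName;

public class ExampleTest {
  // JUnit 4 idiom: derive the name from the running test instead of
  // hard-coding a string that can drift from the method name.
  @Rule
  public TestName testName = new TestName();

  @Test
  public void testConfiguredDeleteOfPrevDumpDir() {
    String name = testName.getMethodName();  // "testConfiguredDeleteOfPrevDumpDir"
    System.out.println(name);
  }
}
{code}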

##
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##
@@ -2018,6 +2018,129 @@ public void testIncrementalLoadWithPreviousDumpDeleteFailed() throws IOException
     verifyRun("SELECT a from " + replDbName + ".ptned WHERE b=2", ptnData2, driverMirror);
   }
 
+  @Test
+  public void testConfiguredDeleteOfPrevDumpDir() throws IOException {
+    boolean verifySetupOriginal = verifySetupSteps;
+    verifySetupSteps = true;
+    String nameOfTest = "testConfigureDeleteOfPrevDumpDir";
+    String dbName = createDB(nameOfTest, driver);
+    String replDbName = dbName + "_dupe";
+    List<String> withConfigDeletePrevDump = Arrays.asList(
+        "'" + HiveConf.ConfVars.REPL_RETAIN_PREV_DUMP_DIR + "'= 'false' ");
+    List<String> withConfigRetainPrevDump = Arrays.asList(
+        "'" + HiveConf.ConfVars.REPL_RETAIN_PREV_DUMP_DIR + "'= 'true' ",
+        "'" + HiveConf.ConfVars.REPL_RETAIN_PREV_DUMP_DIR_COUNT + "'= '2' ");
+
+    run("CREATE TABLE " + dbName + ".unptned(a string) STORED AS TEXTFILE", driver);
+    run("CREATE TABLE " + dbName + ".ptned(a string) partitioned by (b int) STORED AS TEXTFILE", driver);
+
+    Tuple bootstrapDump = bootstrapLoadAndVerify(dbName, replDbName);
+
+    String[] unptnData = new String[] {"eleven", "twelve"};
+    String[] ptnData1 = new String[] {"thirteen", "fourteen", "fifteen"};
+    String[] ptnData2 = new String[] {"fifteen", "sixteen", "seventeen"};
+    String[] empty = new String[] {};
+
+    String unptnLocn = new Path(TEST_PATH, nameOfTest + "_unptn").toUri().getPath();
+    String ptnLocn1 = new Path(TEST_PATH, nameOfTest + "_ptn1").toUri().getPath();
+    String ptnLocn2 = new Path(TEST_PATH, nameOfTest + "_ptn2").toUri().getPath();
+
+    createTestDataFile(unptnLocn, unptnData);
+    createTestDataFile(ptnLocn1, ptnData1);
+    createTestDataFile(ptnLocn2, ptnData2);
+
+    run("LOAD DATA LOCAL INPATH '" + unptnLocn + "' OVERWRITE INTO TABLE " + dbName + ".unptned", driver);
+    verifySetup("SELECT * from " + dbName + ".unptned", unptnData, driver);
+    run("CREATE TABLE " + dbName + ".unptned_late LIKE " + dbName + ".unptned", driver);
+    run("INSERT INTO TABLE " + dbName + ".unptned_late SELECT * FROM " + dbName + ".unptned", driver);
+    verifySetup("SELECT * from " + dbName + ".unptned_late", unptnData, driver);
+
+    //perform first incremental with default option and check that bootstrap-dump-dir gets deleted
+    Path bootstrapDumpDir = new Path(bootstrapDump.dumpLocation, ReplUtils.REPL_HIVE_BASE_DIR);
+    FileSystem fs = FileSystem.get(bootstrapDumpDir.toUri(), hconf);
+    assertTrue(fs.exists(bootstrapDumpDir));
+    Tuple incrDump = replDumpDb(dbName);
+    assertFalse(fs.exists(bootstrapDumpDir));
+
+
+    loadAndVerify(replDbName, dbName, incrDump.lastReplId);
+    verifyRun("SELECT * from " + replDbName + ".unptned", unptnData, driverMirror);
+    verifyRun("SELECT * from " + replDbName + ".unptned_late", unptnData, driverMirror);
+
+    run("ALTER TABLE " + dbName + ".ptned ADD PARTITION (b=1)", driver);
+    run("LOAD DATA LOCAL INPATH '" + ptnLocn1 + "' OVERWRITE INTO TABLE " + dbName
+        + ".ptned PARTITION(b=1)", driver);
+    verifySetup("SELECT a from " + dbName + ".ptned WHERE b=1", ptnData1, driver);
+    run("LOAD DATA LOCAL INPATH '" + ptnLocn2 + "' OVERWRITE INTO TABLE " + dbName
+        + ".ptned PARTITION(b=2)", driver);
+    verifySetup("SELECT a from " + dbName + ".ptned WHERE b=2", ptnData2, driver);
+
+
+    //Perform 2nd incremental with retain option.
+    //Check 1st incremental dump-dir is present even after 2nd incr dump.
+    Path incrDumpDir1 = new Path(incrDump.dumpLocation, ReplUtils.REPL_HIVE_BASE_DIR);
+    incrDump = replDumpDb(dbName,

[jira] [Work logged] (HIVE-24072) HiveAggregateJoinTransposeRule may try to create an invalid transformation

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24072?focusedWorklogId=480574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480574
 ]

ASF GitHub Bot logged work on HIVE-24072:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 03:53
Start Date: 09/Sep/20 03:53
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1432:
URL: https://github.com/apache/hive/pull/1432#discussion_r485321252



##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java
##
@@ -145,8 +145,7 @@ public void onMatch(RelOptRuleCall call) {
     int fieldCount = joinInput.getRowType().getFieldCount();
     final ImmutableBitSet fieldSet =
         ImmutableBitSet.range(offset, offset + fieldCount);
-    final ImmutableBitSet belowAggregateKeyNotShifted =
-        belowAggregateColumns.intersect(fieldSet);

Review comment:
   Yes, this makes sense now! 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480574)
Time Spent: 1h 10m  (was: 1h)

> HiveAggregateJoinTransposeRule may try to create an invalid transformation
> --------------------------------------------------------------------------
>
> Key: HIVE-24072
> URL: https://issues.apache.org/jira/browse/HIVE-24072
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code}
> java.lang.AssertionError: 
> Cannot add expression of different type to set:
> set type is RecordType(INTEGER NOT NULL o_orderkey, DECIMAL(10, 0) 
> o_totalprice, DATE o_orderdate, INTEGER NOT NULL c_custkey, VARCHAR(25) 
> CHARACTER SET "UTF-16LE" c_name, DOUBLE $f5) NOT NULL
> expression type is RecordType(INTEGER NOT NULL o_orderkey, INTEGER NOT NULL 
> o_custkey, DECIMAL(10, 0) o_totalprice, DATE o_orderdate, INTEGER NOT NULL 
> c_custkey, DOUBLE $f1) NOT NULL
> set is rel#567:HiveAggregate.HIVE.[].any(input=HepRelVertex#490,group={2, 4, 
> 5, 6, 7},agg#0=sum($1))
> expression is HiveProject(o_orderkey=[$2], o_custkey=[$3], o_totalprice=[$4], 
> o_orderdate=[$5], c_custkey=[$6], $f1=[$1])
>   HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], 
> cost=[{2284.5 rows, 0.0 cpu, 0.0 io}])
> HiveAggregate(group=[{0}], agg#0=[sum($1)])
>   HiveProject(l_orderkey=[$0], l_quantity=[$4])
> HiveTableScan(table=[[tpch_0_001, lineitem]], table:alias=[l])
> HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], 
> cost=[{1.9115E15 rows, 0.0 cpu, 0.0 io}])
>   HiveJoin(condition=[=($4, $1)], joinType=[inner], algorithm=[none], 
> cost=[{1650.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(o_orderkey=[$0], o_custkey=[$1], o_totalprice=[$3], 
> o_orderdate=[$4])
>   HiveTableScan(table=[[tpch_0_001, orders]], table:alias=[orders])
> HiveProject(c_custkey=[$0], c_name=[$1])
>   HiveTableScan(table=[[tpch_0_001, customer]], 
> table:alias=[customer])
>   HiveProject($f0=[$0])
> HiveFilter(condition=[>($1, 3E2)])
>   HiveAggregate(group=[{0}], agg#0=[sum($4)])
> HiveTableScan(table=[[tpch_0_001, lineitem]], 
> table:alias=[lineitem])
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:383)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveAggregateJoinTransposeRule.onMatch(HiveAggregateJoinTransposeRule.java:300)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24095) Load partitions in parallel for external tables in the bootstrap phase

2020-09-08 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-24095:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to master. Thanks for the patch, [~aasha], and for the review, [~pkumarsinha].

> Load partitions in parallel for external tables in the bootstrap phase
> ----------------------------------------------------------------------
>
> Key: HIVE-24095
> URL: https://issues.apache.org/jira/browse/HIVE-24095
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24095.01.patch, HIVE-24095.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is part 1 of the change. It loads partitions in parallel for
> external tables. Managed tables are tracked as part of
> https://issues.apache.org/jira/browse/HIVE-24109
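
A minimal sketch of the parallel-load idea (the task shape is an assumption for illustration; the actual repl-load code works with its own task objects):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelPartitionLoad {
  // Submit one load task per partition to a bounded pool and wait for all,
  // propagating the first failure instead of silently dropping it.
  static void loadAll(List<Runnable> partitionLoadTasks, int poolSize) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (Runnable task : partitionLoadTasks) {
        futures.add(pool.submit(task));
      }
      for (Future<?> f : futures) {
        f.get();  // blocks; rethrows any load failure as ExecutionException
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}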



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24132) Metastore client doesn't close connection properly

2020-09-08 Thread xiepengjie (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiepengjie updated HIVE-24132:
--
Description: 
While closing the metastore client connection, a warning is sometimes logged
with the following trace.
{code:java}
2020-09-09 10:56:14,408 WARN org.apache.thrift.transport.TIOStreamTransport: Error closing output stream.
java.net.SocketException: Socket closed
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
    at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
    at org.apache.thrift.transport.TSocket.close(TSocket.java:235)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.close(HiveMetaStoreClient.java:506)
    at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
    at com.sun.proxy.$Proxy6.close(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1992)
    at com.sun.proxy.$Proxy6.close(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.close(Hive.java:320)
    at org.apache.hadoop.hive.ql.metadata.Hive.access$000(Hive.java:143)
    at org.apache.hadoop.hive.ql.metadata.Hive$1.remove(Hive.java:167)
    at org.apache.hadoop.hive.ql.metadata.Hive.closeCurrent(Hive.java:288)
    at org.apache.hive.service.cli.session.HiveSessionImpl.close(HiveSessionImpl.java:616)
    at org.apache.hive.service.cli.session.HiveSessionImplwithUGI.close(HiveSessionImplwithUGI.java:93)
    at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1923)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
    at com.sun.proxy.$Proxy19.close(Unknown Source)
    at org.apache.hive.service.cli.session.SessionManager.closeSession(SessionManager.java:300)
    at org.apache.hive.service.cli.CLIService.closeSession(CLIService.java:237)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.CloseSession(ThriftCLIService.java:464)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1273)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1258)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
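
A hedged sketch of the usual remedy (an assumption, not the committed fix): avoid closing a transport that is already closed, which is what raises the SocketException above.

{code:java}
import org.apache.thrift.transport.TTransport;

final class QuietClose {
  // Close only if the transport is still open; a second close on an
  // already-closed socket is what triggers the "Socket closed" warning.
  static void closeQuietly(TTransport transport) {
    if (transport != null && transport.isOpen()) {
      transport.close();
    }
  }
}
{code}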

> Metastore client doesn't close connection properly
> --------------------------------------------------
>
> Key: HIVE-24132
> URL: https://issues.apache.org/jira/browse/HIVE-24132
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.1
>Reporter: xiepengjie
>Priority: Major
>
> While closing the metastore client connection, a warning is sometimes logged
> with the following trace.
> {code:java}
> 2020-09-09 

[jira] [Updated] (HIVE-24095) Load partitions in parallel for external tables in the bootstrap phase

2020-09-08 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24095:
---
Attachment: HIVE-24095.02.patch
Status: Patch Available  (was: In Progress)

> Load partitions in parallel for external tables in the bootstrap phase
> ----------------------------------------------------------------------
>
> Key: HIVE-24095
> URL: https://issues.apache.org/jira/browse/HIVE-24095
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24095.01.patch, HIVE-24095.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is part 1 of the change. It loads partitions in parallel for
> external tables. Managed tables are tracked as part of
> https://issues.apache.org/jira/browse/HIVE-24109



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24095) Load partitions in parallel for external tables in the bootstrap phase

2020-09-08 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24095:
---
Status: In Progress  (was: Patch Available)

> Load partitions in parallel for external tables in the bootstrap phase
> ----------------------------------------------------------------------
>
> Key: HIVE-24095
> URL: https://issues.apache.org/jira/browse/HIVE-24095
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24095.01.patch, HIVE-24095.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is part 1 of the change. It loads partitions in parallel for
> external tables. Managed tables are tracked as part of
> https://issues.apache.org/jira/browse/HIVE-24109



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24095) Load partitions in parallel for external tables in the bootstrap phase

2020-09-08 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192562#comment-17192562
 ] 

Pravin Sinha commented on HIVE-24095:
-

+1

> Load partitions in parallel for external tables in the bootstrap phase
> ----------------------------------------------------------------------
>
> Key: HIVE-24095
> URL: https://issues.apache.org/jira/browse/HIVE-24095
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24095.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is part 1 of the change. It loads partitions in parallel for
> external tables. Managed tables are tracked as part of
> https://issues.apache.org/jira/browse/HIVE-24109



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24131) Use original src location always when data copy runs on target

2020-09-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24131:

Attachment: HIVE-24131.01.patch

> Use original src location always when data copy runs on target 
> ---------------------------------------------------------------
>
> Key: HIVE-24131
> URL: https://issues.apache.org/jira/browse/HIVE-24131
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24131.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23698) Compiler support for row-level filtering on filterPredicates

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23698?focusedWorklogId=480537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480537
 ]

ASF GitHub Bot logged work on HIVE-23698:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:48
Start Date: 09/Sep/20 00:48
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #:
URL: https://github.com/apache/hive/pull/


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480537)
Time Spent: 40m  (was: 0.5h)

> Compiler support for row-level filtering on filterPredicates
> ------------------------------------------------------------
>
> Key: HIVE-23698
> URL: https://issues.apache.org/jira/browse/HIVE-23698
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Similar to what we currently do for StorageHandlers, we should push down the
> static expression for row-level filtering when the file format supports the
> feature (ORC).
> I propose to split the filterExpr into a residual and a pushed predicate. If
> the predicate is completely pushed, then we remove the operator.
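
A self-contained sketch of the split described above (generic shape only; the real compiler works on ExprNodeDesc trees, and `canPush` is a hypothetical stand-in for the file format's capability check):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Split a conjunction of predicates into the part pushed down to the reader
// and the residual the FilterOperator must still evaluate. If the residual is
// empty, the FilterOperator can be removed entirely.
final class PredicateSplit {
  final List<String> pushed = new ArrayList<>();
  final List<String> residual = new ArrayList<>();

  static PredicateSplit split(List<String> conjuncts, Predicate<String> canPush) {
    PredicateSplit s = new PredicateSplit();
    for (String c : conjuncts) {
      (canPush.test(c) ? s.pushed : s.residual).add(c);
    }
    return s;
  }
}
{code}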



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23586) load data overwrite into bucket table failed

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23586?focusedWorklogId=480528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480528
 ]

ASF GitHub Bot logged work on HIVE-23586:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1048:
URL: https://github.com/apache/hive/pull/1048


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480528)
Time Spent: 40m  (was: 0.5h)

> load data overwrite into bucket table failed
> --------------------------------------------
>
> Key: HIVE-23586
> URL: https://issues.apache.org/jira/browse/HIVE-23586
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 4.0.0, 3.1.2
>Reporter: zhaolong
>Assignee: zhaolong
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-23586.01.patch, image-2020-06-01-21-40-21-726.png, 
> image-2020-06-01-21-41-28-732.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LOAD DATA ... OVERWRITE into a bucketed table fails to overwrite if the file
> name is not like 00_0; instead it inserts new data into the table.
>  
> for example:
> CREATE EXTERNAL TABLE IF NOT EXISTS test_hive2 (name string,account string) 
> PARTITIONED BY (logdate string) CLUSTERED BY (account) INTO 4 BUCKETS row 
> format delimited fields terminated by '|' STORED AS textfile;
>  load data inpath 'hdfs://hacluster/tmp/zltest' overwrite into table 
> default.test_hive2 partition (logdate='20200508');
>  !image-2020-06-01-21-40-21-726.png!
>  load data inpath 'hdfs://hacluster/tmp/zltest' overwrite into table
> default.test_hive2 partition (logdate='20200508'); // should overwrite, but
> inserts new data
>  !image-2020-06-01-21-41-28-732.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23582?focusedWorklogId=480526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480526
 ]

ASF GitHub Bot logged work on HIVE-23582:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1041:
URL: https://github.com/apache/hive/pull/1041


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480526)
Time Spent: 0.5h  (was: 20m)

> LLAP: Make SplitLocationProvider impl pluggable
> -----------------------------------------------
>
> Key: HIVE-23582
> URL: https://issues.apache.org/jira/browse/HIVE-23582
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23582.1.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> LLAP uses the HostAffinitySplitLocationProvider implementation by default. For
> non-ZooKeeper-based environments, a different split location provider may be
> used. To facilitate that, make the SplitLocationProvider implementation class
> pluggable.
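
A sketch of what "pluggable" typically means here (the config key and factory are assumptions for illustration; check the committed patch for the actual name):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

final class SplitLocationProviderFactory {
  // Hypothetical sketch: resolve the implementation class from configuration,
  // defaulting to the ZooKeeper-backed HostAffinity provider, and instantiate
  // it reflectively with the conf injected.
  static <T> T create(Configuration conf, Class<T> iface,
      Class<? extends T> defaultImpl, String confKey) {
    Class<? extends T> clazz = conf.getClass(confKey, defaultImpl, iface);
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}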



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23597?focusedWorklogId=480532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480532
 ]

ASF GitHub Bot logged work on HIVE-23597:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1081:
URL: https://github.com/apache/hive/pull/1081


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480532)
Time Spent: 1h 40m  (was: 1.5h)

> VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete 
> delta directories multiple times
> ----------------------------------------------------------------------------------------------------------
>
> Key: HIVE-23597
> URL: https://issues.apache.org/jira/browse/HIVE-23597
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java#L1562]
> {code:java}
> try {
> final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
> if (deleteDeltaDirs.length > 0) {
>   int totalDeleteEventCount = 0;
>   for (Path deleteDeltaDir : deleteDeltaDirs) {
> {code}
>  
> Consider a directory layout like the following. This was created by having 
> simple set of "insert --> update --> select" queries.
>  
> {noformat}
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/base_001
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/base_002
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_003_003_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_004_004_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_005_005_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_006_006_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_007_007_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_008_008_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_009_009_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_010_010_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_011_011_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_012_012_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_013_013_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_003_003_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_004_004_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_005_005_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_006_006_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_007_007_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_008_008_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_009_009_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_010_010_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_011_011_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_012_012_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_013_013_
>  {noformat}
>  
> OrcSplit contains all the delete delta folder information. For a directory 
> layout like this, it would create {{~12 splits}}. For every split, it 
> constructs "ColumnizedDeleteEventRegistry" in VRBAcidReader and ends up 
> reading the same delete delta directories once per split.
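
A minimal sketch of how the repeated reads could be avoided, assuming a
hypothetical DeleteEventCache helper (none of these names are Hive API): parse
each delete_delta directory at most once per process and share the parsed
events across all splits of the same table.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Hive: memoize each delete_delta directory
// so the ~12 splits above trigger one read per directory instead of twelve.
final class DeleteEventCache {
  private final Map<String, List<Long>> cache = new ConcurrentHashMap<>();

  List<Long> getOrLoad(String deleteDeltaDir) {
    // computeIfAbsent runs the loader only for the first split that asks
    return cache.computeIfAbsent(deleteDeltaDir, DeleteEventCache::parse);
  }

  private static List<Long> parse(String dir) {
    // stand-in for the actual ORC read of (originalTransaction, bucket, rowId)
    return List.of();
  }
}
{code}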

[jira] [Work logged] (HIVE-21141) Fix some spell errors in Hive

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21141?focusedWorklogId=480530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480530
 ]

ASF GitHub Bot logged work on HIVE-21141:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #519:
URL: https://github.com/apache/hive/pull/519


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480530)
Time Spent: 1h  (was: 50m)

> Fix some spell errors in Hive
> -
>
> Key: HIVE-21141
> URL: https://issues.apache.org/jira/browse/HIVE-21141
> Project: Hive
>  Issue Type: Bug
>Reporter: Bo Xu
>Assignee: Bo Xu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21141.1.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Fix some spell errors in Hive



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23561) FIX Arrow Decimal serialization for native VectorRowBatches

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23561?focusedWorklogId=480536=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480536
 ]

ASF GitHub Bot logged work on HIVE-23561:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1038:
URL: https://github.com/apache/hive/pull/1038


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480536)
Time Spent: 20m  (was: 10m)

> FIX Arrow Decimal serialization for native VectorRowBatches
> ---
>
> Key: HIVE-23561
> URL: https://issues.apache.org/jira/browse/HIVE-23561
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23561.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Arrow Serializer does not properly handle Decimal primitive values when 
> selected array is used.
> In more detail, decimalValueSetter should be setting the value at 
> *arrowIndex[i]* to the value at *hiveIndex[j]*; however, it is currently using 
> the _same_ index!
> https://github.com/apache/hive/blob/eac25e711ea750bc52f41da7ed3c32bfe36d4f67/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/Serializer.java#L926
> This works fine for cases where i == j (selected is not used) but returns 
> wrong decimal row values when i != j.
> This ticket fixes this inconsistency and adds tests with selected indexes for 
> all supported types.
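
A self-contained sketch of the indexing rule the fix restores, with plain
arrays standing in for the real Hive and Arrow vector types: the write
position is always i, while the read position is selected[i] whenever the
selected array is in use.

{code:java}
import java.math.BigDecimal;

// Illustrative only: hiveRows stands in for the Hive column vector being read
// and arrowSlots for the Arrow decimal vector being populated.
final class SelectedCopySketch {
  static void copyDecimals(BigDecimal[] hiveRows, int[] selected,
                           boolean selectedInUse, int batchSize,
                           BigDecimal[] arrowSlots) {
    for (int i = 0; i < batchSize; i++) {
      int j = selectedInUse ? selected[i] : i;  // read position (Hive side)
      arrowSlots[i] = hiveRows[j];              // write position is always i
    }
  }
}
{code}

When selected is not in use, i == j and both code paths agree, which is why
the bug only shows up with a selected array, exactly as described above.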



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23554) [LLAP] support ColumnVectorBatch with FilterContext as part of ReadPipeline

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23554?focusedWorklogId=480521=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480521
 ]

ASF GitHub Bot logged work on HIVE-23554:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1036:
URL: https://github.com/apache/hive/pull/1036


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480521)
Time Spent: 0.5h  (was: 20m)

> [LLAP] support ColumnVectorBatch with FilterContext as part of ReadPipeline
> ---
>
> Key: HIVE-23554
> URL: https://issues.apache.org/jira/browse/HIVE-23554
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23554.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the readPipeline in LLAP supports consuming ColumnVectorBatches.
> As each batch can now be tied to a Filter (HIVE-22959, HIVE-23215), we 
> should update the pipeline to consume BatchWrappers of ColumnVectorBatch and 
> a Filter instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23475) Track MJ HashTable mem usage

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23475?focusedWorklogId=480533=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480533
 ]

ASF GitHub Bot logged work on HIVE-23475:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1023:
URL: https://github.com/apache/hive/pull/1023


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480533)
Time Spent: 0.5h  (was: 20m)

> Track MJ HashTable mem usage
> 
>
> Key: HIVE-23475
> URL: https://issues.apache.org/jira/browse/HIVE-23475
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-23475.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21624) LLAP: Cpu metrics at thread level is broken

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21624?focusedWorklogId=480524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480524
 ]

ASF GitHub Bot logged work on HIVE-21624:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1030:
URL: https://github.com/apache/hive/pull/1030


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480524)
Time Spent: 0.5h  (was: 20m)

> LLAP: Cpu metrics at thread level is broken
> ---
>
> Key: HIVE-21624
> URL: https://issues.apache.org/jira/browse/HIVE-21624
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Nita Dembla
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21624.1.patch, HIVE-21624.2.patch, 
> HIVE-21624.3.patch, HIVE-21624.4.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ExecutorThreadCPUTime and ExecutorThreadUserTime rely on thread MX bean cpu 
> metrics when available. At some point, the thread name which the metrics 
> publisher looks for changed, causing no metrics to be published for these 
> counters.
> The above counters look for threads whose names start with 
> "ContainerExecutor", but the llap task executor thread name got changed to 
> "Task-Executor"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23309) Lazy Initialization of Hadoop Shims

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23309?focusedWorklogId=480520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480520
 ]

ASF GitHub Bot logged work on HIVE-23309:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #999:
URL: https://github.com/apache/hive/pull/999


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480520)
Time Spent: 0.5h  (was: 20m)

> Lazy Initialization of Hadoop Shims
> ---
>
> Key: HIVE-23309
> URL: https://issues.apache.org/jira/browse/HIVE-23309
> Project: Hive
>  Issue Type: Bug
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23309.01.patch, HIVE-23309.02.patch, 
> HIVE-23309.03.patch, HIVE-23309.04.patch, HIVE-23309.05.patch, 
> HIVE-23309.06.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Initialize hadoop-shims only if CM is enabled



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23666) checkHashModeEfficiency is skipped when a groupby operator doesn't have a grouping set

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23666?focusedWorklogId=480527=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480527
 ]

ASF GitHub Bot logged work on HIVE-23666:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1103:
URL: https://github.com/apache/hive/pull/1103


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480527)
Time Spent: 0.5h  (was: 20m)

> checkHashModeEfficiency is skipped when a groupby operator doesn't have a 
> grouping set
> --
>
> Key: HIVE-23666
> URL: https://issues.apache.org/jira/browse/HIVE-23666
> Project: Hive
>  Issue Type: Bug
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23666.1.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> checkHashModeEfficiency is skipped when a groupby operator doesn't have a 
> grouping set



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23742) Remove unintentional execution of TPC-DS query39 in qtests

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23742?focusedWorklogId=480523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480523
 ]

ASF GitHub Bot logged work on HIVE-23742:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1160:
URL: https://github.com/apache/hive/pull/1160


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480523)
Time Spent: 0.5h  (was: 20m)

> Remove unintentional execution of TPC-DS query39 in qtests
> --
>
> Key: HIVE-23742
> URL: https://issues.apache.org/jira/browse/HIVE-23742
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> TPC-DS queries under clientpositive/perf are meant only to check plan 
> regressions, so they should never actually be executed; thus the execution 
> part should be removed from query39.q and cbo_query39.q.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21313) Use faster function to point to instead of copy immutable byte arrays

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21313?focusedWorklogId=480531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480531
 ]

ASF GitHub Bot logged work on HIVE-21313:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #548:
URL: https://github.com/apache/hive/pull/548


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480531)
Time Spent: 1h  (was: 50m)

> Use faster function to point to instead of copy immutable byte arrays
> -
>
> Key: HIVE-21313
> URL: https://issues.apache.org/jira/browse/HIVE-21313
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: All Versions
>Reporter: ZhangXin
>Assignee: ZhangXin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: All Versions
>
> Attachments: HIVE-21313.patch, HIVE-21313.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In file ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java
> we may find code like this:
> ```
> Text text = (Text) convertTargetWritable;
> if (text == null) {
>   text = new Text();
> }
> text.set(string);
> ((BytesColumnVector) columnVector).setVal(
>     batchIndex, text.getBytes(), 0, text.getLength());
> ```
>  
> Using the `setVal` method copies the byte array generated by 
> `text.getBytes()`. This copy is unnecessary: since the byte array is 
> immutable, we can just use the `setRef` method to point to the specific byte 
> array, which will also lower the memory usage.
>  
> Pull request on Github:  https://github.com/apache/hive/pull/548
>  
>  
>  
>  
>  
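
A self-contained sketch of the proposed change (the wrapper class and method
are ours for illustration; setVal and setRef are real BytesColumnVector
methods, and the body follows the snippet quoted above):

{code:java}
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.io.Text;

final class AssignSketch {
  static void assignString(Object convertTargetWritable, String string,
                           BytesColumnVector columnVector, int batchIndex) {
    Text text = (Text) convertTargetWritable;
    if (text == null) {
      text = new Text();
    }
    text.set(string);
    // setRef points at text's backing array; setVal would allocate and copy.
    // This is safe only while the array is not mutated before the batch is
    // consumed, which is the immutability claim made in this issue.
    columnVector.setRef(batchIndex, text.getBytes(), 0, text.getLength());
  }
}
{code}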



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22979) Support total file size in statistics annotation

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22979?focusedWorklogId=480535=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480535
 ]

ASF GitHub Bot logged work on HIVE-22979:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #941:
URL: https://github.com/apache/hive/pull/941


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480535)
Time Spent: 1h 20m  (was: 1h 10m)

> Support total file size in statistics annotation
> 
>
> Key: HIVE-22979
> URL: https://issues.apache.org/jira/browse/HIVE-22979
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22979.1.patch, HIVE-22979.2.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Hive statistics annotation provides estimated Statistics for each operator. 
> The data size provided in TableScanOperator is raw data size (after 
> decompression and decoding), but there are some optimizations that can be 
> performed based on total file size on disk (scan cost estimation).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23611?focusedWorklogId=480534=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480534
 ]

ASF GitHub Bot logged work on HIVE-23611:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1120:
URL: https://github.com/apache/hive/pull/1120


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480534)
Time Spent: 1h 10m  (was: 1h)

> Mandate fully qualified absolute path for external table base dir during REPL 
> operation
> ---
>
> Key: HIVE-23611
> URL: https://issues.apache.org/jira/browse/HIVE-23611
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23611.01.patch, HIVE-23611.02.patch, 
> HIVE-23611.03.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23443?focusedWorklogId=480525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480525
 ]

ASF GitHub Bot logged work on HIVE-23443:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1012:
URL: https://github.com/apache/hive/pull/1012


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480525)
Time Spent: 1.5h  (was: 1h 20m)

> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23443.1.patch, HIVE-23443.2.patch, 
> HIVE-23443.3.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> The scheduler only peeks at the pre-emption queue and checks whether the 
> head task is non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative but state change is not 
> triggering pre-emption queue re-ordering so peek() always returns canFinish 
> task even though non-finishable tasks are in the queue. 
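
An illustrative sketch of the re-ordering gap (PreemptionQueue and Task are
simplified stand-ins, not the TaskExecutorService code): a heap fixes an
element's position at insert time, so a finishability change has to remove
and re-insert the task before peek() can surface the right head.

{code:java}
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

final class PreemptionQueue {
  static final class Task {
    final String id;
    volatile boolean canFinish;
    Task(String id, boolean canFinish) { this.id = id; this.canFinish = canFinish; }
  }

  // false (cannot finish) sorts first, so peek() prefers preemptable tasks
  private final PriorityBlockingQueue<Task> queue =
      new PriorityBlockingQueue<>(11, Comparator.comparing((Task t) -> t.canFinish));

  void add(Task t) { queue.offer(t); }

  // without remove/re-offer on a state change the heap keeps its stale order,
  // and peek() can keep returning a canFinish task while non-finishable ones wait
  void onFinishableStateChange(Task t, boolean canFinishNow) {
    queue.remove(t);
    t.canFinish = canFinishNow;
    queue.offer(t);
  }

  Task peek() { return queue.peek(); }
}
{code}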



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23710) Add table meta cache limit when starting Hive server2

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23710?focusedWorklogId=480529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480529
 ]

ASF GitHub Bot logged work on HIVE-23710:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1135:
URL: https://github.com/apache/hive/pull/1135


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480529)
Time Spent: 0.5h  (was: 20m)

> Add table meta cache limit when starting Hive server2
> -
>
> Key: HIVE-23710
> URL: https://issues.apache.org/jira/browse/HIVE-23710
> Project: Hive
>  Issue Type: Improvement
> Environment: Hive 2.3.6
>Reporter: Deegue
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23710.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When we start up Hive server2, it connects to the metastore to get table meta 
> info per database and caches it. If there are many tables in a database, 
> however, the call will exceed `hive.metastore.client.socket.timeout`.
> Then exception thrown like:
> {noformat}
> 2020-06-17T11:38:27,595  WARN [main] metastore.RetryingMetaStoreClient: 
> MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. 
> getTableObjectsByName
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>  ~[hive-exec-2.3.6.jar:2.3.6]
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) 
> ~[hive-exec-2.3.6.jar:2.3.6]
>   at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) 
> ~[hive-exec-2.3.6.jar:2.3.6]
>   at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) 
> ~[hive-exec-2.3.6.jar:2.3.6]
>   at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
>  ~[hive-exec-2.3.6.jar:2.3.6]
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) 
> ~[hive-exec-2.3.6.jar:2.3.6]
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_objects_by_name_req(ThriftHiveMetastore.java:1596)
>  ~[hive-exec-2.3.6.jar:2.3.6]
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_objects_by_name_req(ThriftHiveMetastore.java:1583)
>  ~[hive-exec-2.3.6.jar:2.3.6]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTableObjectsByName(HiveMetaStoreClient.java:1370)
>  ~[hive-exec-2.3.6.jar:2.3.6]
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTableObjectsByName(SessionHiveMetaStoreClient.java:238)
>  ~[hive-exec-2.3.6.jar:2.3.6]
>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) ~[?:?]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:206)
>  ~[hive-exec-2.3.6.jar:2.3.6]
>   at com.sun.proxy.$Proxy38.getTableObjectsByName(Unknown Source) ~[?:?]
>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) ~[?:?]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2336)
>  ~[hive-exec-2.3.6.jar:2.3.6]
>   at com.sun.proxy.$Proxy38.getTableObjectsByName(Unknown Source) ~[?:?]
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.getAllTableObjects(Hive.java:1343) 
> ~[hive-exec-2.3.6.jar:2.3.6]
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMaterializedViewsRegistry.init(HiveMaterializedViewsRegistry.java:127)
>  ~[hive-exec-2.3.6.jar:2.3.6]
>   at 
> org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:167) 
> ~[hive-service-2.3.6.jar:2.3.6]
>   at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:607)
>  ~[hive-service-2.3.6.jar:2.3.6]
>   at 
> 

[jira] [Work logged] (HIVE-23551) Acid: Update queries should treat dirCache as read-only in AcidUtils

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23551?focusedWorklogId=480522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480522
 ]

ASF GitHub Bot logged work on HIVE-23551:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1047:
URL: https://github.com/apache/hive/pull/1047


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480522)
Time Spent: 0.5h  (was: 20m)

> Acid: Update queries should treat dirCache as read-only in AcidUtils
> 
>
> Key: HIVE-23551
> URL: https://issues.apache.org/jira/browse/HIVE-23551
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23551.1.patch, HIVE-23551.2.patch, 
> HIVE-23551.3.patch, HIVE-23551.4.patch, HIVE-23551.5.patch, HIVE-23551.6.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Update statements create delta folders at the end of the execution. When 
> {{insert overwrite}} followed by {{update}} is executed, it does not get any 
> open txns and ends up caching the {{base}} folder. However, the delta folder 
> which gets created at the end of the statement never makes it to the cache. 
> This creates wrong results.
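
A hedged sketch of the read-only policy in the title (AcidDirCache, DirInfo
and the readOnly flag are illustrative, not the AcidUtils code): update paths
may consult the cache but must not populate it with their transient
mid-transaction view.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class AcidDirCache {
  static final class DirInfo { /* base/delta listing placeholder */ }

  private final Map<String, DirInfo> cache = new ConcurrentHashMap<>();

  // readOnly = true for update queries: look up, but never write back, so the
  // base-only snapshot taken before the new delta exists is not served later
  DirInfo getDir(String path, boolean readOnly) {
    DirInfo cached = cache.get(path);
    if (cached != null) {
      return cached;
    }
    DirInfo fresh = scan(path);
    if (!readOnly) {
      cache.put(path, fresh);
    }
    return fresh;
  }

  private static DirInfo scan(String path) {
    return new DirInfo();  // stand-in for the actual filesystem listing
  }
}
{code}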



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22360) MultiDelimitSerDe returns wrong results in last column when the loaded file has more columns than those in table schema

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22360?focusedWorklogId=480519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480519
 ]

ASF GitHub Bot logged work on HIVE-22360:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #823:
URL: https://github.com/apache/hive/pull/823


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480519)
Time Spent: 40m  (was: 0.5h)

> MultiDelimitSerDe returns wrong results in last column when the loaded file 
> has more columns than those in table schema
> ---
>
> Key: HIVE-22360
> URL: https://issues.apache.org/jira/browse/HIVE-22360
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22360.1.patch, HIVE-22360.2.patch, 
> HIVE-22360.3.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Repro steps:
> Input file:
> {code}
> 1^,1^,^,0^,0^,0 
> 2^,1^,^,0^,1^,0 
> 3^,1^,^,0^,0^,0 
> 4^,1^,^,0^,1^,0
> {code}
> Queries:
> {code}
> CREATE TABLE  n2(colA int, colB tinyint, colC timestamp, colD smallint, colE 
> smallint) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MultiDelimitSerDe' 
> WITH SERDEPROPERTIES ("field.delim"="^,") STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/schaurasia/Documents/input_6_cols.csv' 
> OVERWRITE INTO TABLE n2;
>  select * from n2;
> // wrong last column results here.
> +--+--+--+--+--+
> | n2.cola  | n2.colb  | n2.colc  | n2.cold  | n2.cole  |
> +--+--+--+--+--+
> | 1| 1| NULL | 0| NULL |
> | 2| 1| NULL | 0| NULL |
> | 3| 1| NULL | 0| NULL |
> | 4| 1| NULL | 0| NULL |
> +--+--+--+--+--+
> {code}
> Cause:
> In multi-serde parsing, the total length calculation here: 
> https://github.com/apache/hive/blob/rel/release-3.1.2/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java#L308
>  does not take extra fields into account.
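
A self-contained sketch of the boundary rule the fix needs (splitToSchema is
illustrative, not the LazyStruct code): with more delimited fields in the file
than schema columns, the last schema column has to end at its own delimiter
instead of absorbing the rest of the row.

{code:java}
import java.util.regex.Pattern;

final class MultiDelimSketch {
  static String[] splitToSchema(String row, String delim, int schemaCols) {
    String[] all = row.split(Pattern.quote(delim), -1);  // keep trailing empties
    String[] out = new String[schemaCols];
    for (int i = 0; i < schemaCols; i++) {
      out[i] = i < all.length ? all[i] : null;  // extra trailing fields dropped
    }
    return out;
  }
}
{code}

For the repro above, splitToSchema("1^,1^,^,0^,0^,0", "^,", 5) yields
{"1", "1", "", "0", "0"}, so colE sees its own field instead of NULL.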



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23735) Reducer misestimate for export command

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23735?focusedWorklogId=480511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480511
 ]

ASF GitHub Bot logged work on HIVE-23735:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:46
Start Date: 09/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1165:
URL: https://github.com/apache/hive/pull/1165


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480511)
Time Spent: 0.5h  (was: 20m)

> Reducer misestimate for export command
> --
>
> Key: HIVE-23735
> URL: https://issues.apache.org/jira/browse/HIVE-23735
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23735.1.wip.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6869
> {code}
> if (dest_tab.getNumBuckets() > 0) {
> ...
> }
> {code}
> For "export" command, HS2 creates a dummy table and for this table and gets 
> "1" as the number of buckets.
> {noformat}
> set hive.stats.autogather=false;
> export table sample_table to '/tmp/export/sampe_db/t1';
> {noformat}
> This causes issues in reducer estimates and always ends up with '1' as the 
> number of reducer tasks. 
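
An illustrative guard, not the actual SemanticAnalyzer patch (the
syntheticExportTable flag is hypothetical): only let the destination's bucket
count pin the reducer parallelism when the destination is a genuine bucketed
table, not the synthetic one created for EXPORT.

{code:java}
final class ReducerEstimateSketch {
  static int reducerCount(int destNumBuckets, boolean syntheticExportTable,
                          int sizeBasedEstimate) {
    if (destNumBuckets > 0 && !syntheticExportTable) {
      return destNumBuckets;     // bucketed insert: one reducer per bucket
    }
    return sizeBasedEstimate;    // EXPORT's dummy table falls through here
  }
}
{code}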



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=480516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480516
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:46
Start Date: 09/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1122:
URL: https://github.com/apache/hive/pull/1122


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480516)
Time Spent: 0.5h  (was: 20m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>) 
> stored as parquet; 
> insert overwrite table parquet_map_type 
> SELECT 1, MAP('k1', null, 'k2', 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Work logged] (HIVE-23639) Fix FindBug issues in hive-contrib

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23639?focusedWorklogId=480515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480515
 ]

ASF GitHub Bot logged work on HIVE-23639:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:46
Start Date: 09/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1163:
URL: https://github.com/apache/hive/pull/1163#issuecomment-689229739


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480515)
Time Spent: 20m  (was: 10m)

> Fix FindBug issues in hive-contrib
> --
>
> Key: HIVE-23639
> URL: https://issues.apache.org/jira/browse/HIVE-23639
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23755) Fix Ranger Url extra slash

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23755?focusedWorklogId=480508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480508
 ]

ASF GitHub Bot logged work on HIVE-23755:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:46
Start Date: 09/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1173:
URL: https://github.com/apache/hive/pull/1173


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480508)
Time Spent: 1h  (was: 50m)

> Fix Ranger Url extra slash
> --
>
> Key: HIVE-23755
> URL: https://issues.apache.org/jira/browse/HIVE-23755
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23755.01.patch, HIVE-23755.02.patch, 
> HIVE-23755.03.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23770) Druid filter translation unable to handle inverted between

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23770?focusedWorklogId=480513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480513
 ]

ASF GitHub Bot logged work on HIVE-23770:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:46
Start Date: 09/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1190:
URL: https://github.com/apache/hive/pull/1190


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480513)
Time Spent: 40m  (was: 0.5h)

> Druid filter translation unable to handle inverted between
> --
>
> Key: HIVE-23770
> URL: https://issues.apache.org/jira/browse/HIVE-23770
> Project: Hive
>  Issue Type: Bug
>Reporter: Nishant Bangarwa
>Assignee: Nishant Bangarwa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23770.1.patch, HIVE-23770.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Druid filter translation happens in Calcite and does not use the HiveBetween 
> inverted flag for translation; this misses a negation in the planned query.
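
A minimal sketch of the dropped negation, under the assumption that the
inverted flag on HiveBetween means NOT BETWEEN (evalBetween is illustrative,
not the Calcite translation code):

{code:java}
final class BetweenSketch {
  // BETWEEN(inverted, c, lo, hi): inverted == true means NOT (lo <= c <= hi)
  static boolean evalBetween(boolean inverted, long c, long lo, long hi) {
    boolean between = lo <= c && c <= hi;
    return inverted ? !between : between;  // dropping this flip loses the NOT
  }
}
{code}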



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23606) LLAP: Delay In DirectByteBuffer Clean Up For EncodedReaderImpl

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23606?focusedWorklogId=480518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480518
 ]

ASF GitHub Bot logged work on HIVE-23606:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1057:
URL: https://github.com/apache/hive/pull/1057


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480518)
Time Spent: 0.5h  (was: 20m)

> LLAP: Delay In DirectByteBuffer Clean Up For EncodedReaderImpl
> --
>
> Key: HIVE-23606
> URL: https://issues.apache.org/jira/browse/HIVE-23606
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23606.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> DirectByteBuffers are only cleaned up when there is a full GC or when the 
> cleaner method of DirectByteBuffer is invoked manually. Since a full GC may 
> take some time to kick in, the native memory usage of the LLAP daemon process 
> might meanwhile shoot up, and this will force the YARN pmem monitor to kill 
> the container running the daemon.
> HIVE-16180 tried to solve this problem, but the code structure got messed up 
> after HIVE-15665
> The IdentityHashMap (toRelease) is initialized in 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java#L409
>  , but it is getting re-initialized inside the method 
> getDataFromCacheAndDisk() 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java#L633
> , which makes it local to that method; hence the original toRelease 
> IdentityHashMap remains empty.
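
A simplified sketch of the shadowing pattern described above (ReaderSketch and
its contents are illustrative): once a local re-declaration shadows the field,
buffers registered inside the method never reach the cleanup loop that
iterates the field.

{code:java}
import java.nio.ByteBuffer;
import java.util.IdentityHashMap;

final class ReaderSketch {
  private final IdentityHashMap<ByteBuffer, Boolean> toRelease = new IdentityHashMap<>();

  void read() {
    // BUG pattern: writing "IdentityHashMap<ByteBuffer, Boolean> toRelease =
    // new IdentityHashMap<>();" here would shadow the field, so buffers
    // tracked below would never reach releaseAll(). The fix keeps the field.
    toRelease.put(ByteBuffer.allocateDirect(16), Boolean.TRUE);
  }

  void releaseAll() {
    for (ByteBuffer b : toRelease.keySet()) {
      clean(b);  // only frees what the *field* actually tracked
    }
    toRelease.clear();
  }

  private static void clean(ByteBuffer b) {
    // explicit direct-buffer cleaning is JDK-internal; this stub only marks
    // where the cleaner invocation would go
  }
}
{code}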



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23585) Retrieve replication instance metrics details

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23585?focusedWorklogId=480517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480517
 ]

ASF GitHub Bot logged work on HIVE-23585:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:46
Start Date: 09/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1100:
URL: https://github.com/apache/hive/pull/1100


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480517)
Time Spent: 40m  (was: 0.5h)

> Retrieve replication instance metrics details
> -
>
> Key: HIVE-23585
> URL: https://issues.apache.org/jira/browse/HIVE-23585
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23585.01.patch, HIVE-23585.02.patch, 
> HIVE-23585.03.patch, Replication Metrics.pdf
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23640) Fix FindBug issues in hive-druid-handler

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23640?focusedWorklogId=480509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480509
 ]

ASF GitHub Bot logged work on HIVE-23640:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:46
Start Date: 09/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1164:
URL: https://github.com/apache/hive/pull/1164#issuecomment-689229727


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480509)
Time Spent: 20m  (was: 10m)

> Fix FindBug issues in hive-druid-handler
> 
>
> Key: HIVE-23640
> URL: https://issues.apache.org/jira/browse/HIVE-23640
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23637) Fix FindBug issues in hive-cli

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23637?focusedWorklogId=480510=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480510
 ]

ASF GitHub Bot logged work on HIVE-23637:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:46
Start Date: 09/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1162:
URL: https://github.com/apache/hive/pull/1162#issuecomment-689229748


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480510)
Time Spent: 20m  (was: 10m)

> Fix FindBug issues in hive-cli
> --
>
> Key: HIVE-23637
> URL: https://issues.apache.org/jira/browse/HIVE-23637
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23721?focusedWorklogId=480512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480512
 ]

ASF GitHub Bot logged work on HIVE-23721:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:46
Start Date: 09/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1202:
URL: https://github.com/apache/hive/pull/1202


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480512)
Time Spent: 40m  (was: 0.5h)

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 3.1.2
> Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 6+ YARN Applications every day
>Reporter: YulongZ
>Assignee: zhangbutao
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> From Hive 3.0, catalog support was added to the Hive metastore: many 
> metastore schemas added the column "catName", and the table indexes added 
> the column "catName" as well.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
> "
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> "
> should use "catName == ''" instead of "dbName == ''", because "catName" is 
> the first index column.
> When the metastore data becomes large (for example, when 
> MPartitionColumnStatistics has millions of rows), the 
> newQuery(MPartitionColumnStatistics.class, "dbName == ''") call against the 
> metastore executes very slowly, and the "show tables" query on HiveServer2 
> executes very slowly too.
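
The proposed change then amounts to the two lines below, identical to the
snippet quoted above except for the filter literal, so the JDO filter leads
with the first indexed column:

{code:java}
initQueries.add(pm.newQuery(MTableColumnStatistics.class, "catName == ''"));
initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "catName == ''"));
{code}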



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23784) Fix Replication Metrics Sink to DB

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23784?focusedWorklogId=480514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480514
 ]

ASF GitHub Bot logged work on HIVE-23784:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:46
Start Date: 09/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1193:
URL: https://github.com/apache/hive/pull/1193


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480514)
Time Spent: 0.5h  (was: 20m)

> Fix Replication Metrics Sink to DB
> --
>
> Key: HIVE-23784
> URL: https://issues.apache.org/jira/browse/HIVE-23784
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23784.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23454) Querying hive table which has Materialized view fails with HiveAccessControlException

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23454?focusedWorklogId=480454=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480454
 ]

ASF GitHub Bot logged work on HIVE-23454:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 21:32
Start Date: 08/Sep/20 21:32
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #1471:
URL: https://github.com/apache/hive/pull/1471#issuecomment-689147366


   We should probably create a follow-up JIRA to check authorization before 
triggering the rewriting algorithm. If the compilation overhead to check every 
MV that is applicable to the given query is unacceptable, permissions could 
possibly be kept in the HS2 registry and refreshed periodically in the 
background, then verified after rewriting, which would at least decrease the 
number of authorization failures. @vineetgarg02 , can you create a JIRA for 
this?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480454)
Time Spent: 1h  (was: 50m)

> Querying hive table which has Materialized view fails with 
> HiveAccessControlException
> -
>
> Key: HIVE-23454
> URL: https://issues.apache.org/jira/browse/HIVE-23454
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, HiveServer2
>Affects Versions: 3.0.0, 3.2.0
>Reporter: Chiran Ravani
>Assignee: Vineet Garg
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> A query against a table fails with HiveAccessControlException when there is a 
> materialized view pointing to that table which the end user does not have 
> access to, even though the user has all privileges on the actual table.
> From the HiveServer2 logs, it looks like Hive, as part of optimization, uses 
> the materialized view to query the data instead of the table, and since the 
> end user does not have access on the MV we receive HiveAccessControlException.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveVolcanoPlanner.java#L99
> The simplest reproducer for this issue is as below.
> 1. Create a table using hive user and insert some data
> {code:java}
> create table db1.testmvtable(id int, name string) partitioned by(year int);
> insert into db1.testmvtable partition(year=2020) values(1,'Name1');
> insert into db1.testmvtable partition(year=2020) values(1,'Name2');
> insert into db1.testmvtable partition(year=2016) values(1,'Name1');
> insert into db1.testmvtable partition(year=2016) values(1,'Name2');
> {code}
> 2. Create Materialized view on top of above table with partitioned and where 
> clause as hive user.
> {code:java}
> CREATE MATERIALIZED VIEW db2.testmv PARTITIONED ON(year) as select * from 
> db1.testmvtable tmv where year >= 2018;
> {code}
> 3. Grant all (Select to be minimum) access to user 'chiran' via Ranger on 
> database db1.
> 4. Run select on base table db1.testmvtable as 'chiran' with where clause 
> having partition value >=2018, it runs into HiveAccessControlException on 
> db2.testmv
> {code:java}
> eg:- (select * from db1.testmvtable where year=2020;)
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2020;
> Error: Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: user [chiran] does not have [SELECT] privilege on 
> [db2/testmv/*] (state=42000,code=4)
> {code}
> 5. This works when partition column is not in MV
> {code:java}
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2016;
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> DEBUG : Encoding valid txns info 897:9223372036854775807::893,895,896 
> txnid:897
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:testmvtable.id, type:int, 
> comment:null), FieldSchema(name:testmvtable.name, type:string, comment:null), 
> FieldSchema(name:testmvtable.year, type:int, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); 
> Time taken: 0.222 seconds
> DEBUG : Encoding valid txn write ids info 
> 897$db1.testmvtable:4:9223372036854775807:: txnid:897
> INFO  : Executing 
> 

[jira] [Work logged] (HIVE-23454) Querying hive table which has Materialized view fails with HiveAccessControlException

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23454?focusedWorklogId=480450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480450
 ]

ASF GitHub Bot logged work on HIVE-23454:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 21:26
Start Date: 08/Sep/20 21:26
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1471:
URL: https://github.com/apache/hive/pull/1471#discussion_r485205008



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -2280,6 +2280,15 @@ private RelNode 
applyMaterializedViewRewriting(RelOptPlanner planner, RelNode ba
 return calcitePreMVRewritingPlan;
   }
 
+  try {
+if 
(!HiveMaterializedViewUtils.checkPrivilegeForMV(materializedViewsUsedAfterRewrite))
 {
+  // if materialized views do not have appropriate privilges, we 
shouldn't be using them

Review comment:
   nit. typo





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480450)
Time Spent: 40m  (was: 0.5h)

> Querying hive table which has Materialized view fails with 
> HiveAccessControlException
> -
>
> Key: HIVE-23454
> URL: https://issues.apache.org/jira/browse/HIVE-23454
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, HiveServer2
>Affects Versions: 3.0.0, 3.2.0
>Reporter: Chiran Ravani
>Assignee: Vineet Garg
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Query fails with HiveAccessControlException against a table when there is a 
> Materialized view pointing to that table which the end user does not have access 
> to, even though the user has all the privileges on the actual table.
> From the HiveServer2 logs - it looks like, as part of optimization, Hive uses 
> the materialized view to query the data instead of the table, and since the end 
> user does not have access on the MV we receive HiveAccessControlException.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveVolcanoPlanner.java#L99
> The simplest reproducer for this issue is as below.
> 1. Create a table using hive user and insert some data
> {code:java}
> create table db1.testmvtable(id int, name string) partitioned by(year int);
> insert into db1.testmvtable partition(year=2020) values(1,'Name1');
> insert into db1.testmvtable partition(year=2020) values(1,'Name2');
> insert into db1.testmvtable partition(year=2016) values(1,'Name1');
> insert into db1.testmvtable partition(year=2016) values(1,'Name2');
> {code}
> 2. Create Materialized view on top of above table with partitioned and where 
> clause as hive user.
> {code:java}
> CREATE MATERIALIZED VIEW db2.testmv PARTITIONED ON(year) as select * from 
> db1.testmvtable tmv where year >= 2018;
> {code}
> 3. Grant all (Select to be minimum) access to user 'chiran' via Ranger on 
> database db1.
> 4. Run select on base table db1.testmvtable as 'chiran' with where clause 
> having partition value >=2018, it runs into HiveAccessControlException on 
> db2.testmv
> {code:java}
> eg:- (select * from db1.testmvtable where year=2020;)
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2020;
> Error: Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: user [chiran] does not have [SELECT] privilege on 
> [db2/testmv/*] (state=42000,code=4)
> {code}
> 5. This works when partition column is not in MV
> {code:java}
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2016;
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> DEBUG : Encoding valid txns info 897:9223372036854775807::893,895,896 
> txnid:897
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:testmvtable.id, type:int, 
> comment:null), FieldSchema(name:testmvtable.name, type:string, comment:null), 
> FieldSchema(name:testmvtable.year, type:int, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); 
> Time taken: 0.222 seconds
> DEBUG : Encoding valid txn write ids info 
> 897$db1.testmvtable:4:9223372036854775807:: txnid:897
> INFO  : 

[jira] [Work logged] (HIVE-23454) Querying hive table which has Materialized view fails with HiveAccessControlException

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23454?focusedWorklogId=480451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480451
 ]

ASF GitHub Bot logged work on HIVE-23454:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 21:26
Start Date: 08/Sep/20 21:26
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1471:
URL: https://github.com/apache/hive/pull/1471#discussion_r485205385



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveMaterializedViewUtils.java
##
@@ -347,6 +355,38 @@ public static RelNode copyNodeNewCluster(RelOptCluster 
optCluster, RelNode node)
 }
   }
 
+  /**
+   * Validate if given materialized view has SELECT privileges for current user
+   * @param cachedMVTable
+   * @return false if user does not have privilege otherwise true
+   * @throws HiveException
+   */
+  public static boolean checkPrivilegeForMV(List<Table> cachedMVTableList) 
throws HiveException {

Review comment:
   Can we use full name: `checkPrivilegeForMaterializedViews` ?
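
   For intuition, a minimal self-contained sketch of the shape such a check could take. Everything below is an illustrative assumption, not the patch's code: the BiPredicate stands in for the configured authorizer.

{code:java}
import java.util.List;
import java.util.function.BiPredicate;

// Sketch only: fail fast when any materialized view (db, name) pair lacks
// SELECT for the current user; hasSelect stands in for the real authorizer.
public class MaterializedViewPrivilegeSketch {
  static boolean checkPrivilegeForMaterializedViews(
      List<String[]> mvDbAndNames, BiPredicate<String, String> hasSelect) {
    for (String[] mv : mvDbAndNames) {
      if (!hasSelect.test(mv[0], mv[1])) {
        return false; // rewriting must not use an MV the user cannot read
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Toy authorizer: only db2.testmv is denied, mirroring the reported setup.
    BiPredicate<String, String> auth =
        (db, tbl) -> !(db.equals("db2") && tbl.equals("testmv"));
    System.out.println(checkPrivilegeForMaterializedViews(
        List.of(new String[] {"db2", "testmv"}), auth)); // prints false
  }
}
{code}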





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480451)
Time Spent: 50m  (was: 40m)

> Querying hive table which has Materialized view fails with 
> HiveAccessControlException
> -
>
> Key: HIVE-23454
> URL: https://issues.apache.org/jira/browse/HIVE-23454
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, HiveServer2
>Affects Versions: 3.0.0, 3.2.0
>Reporter: Chiran Ravani
>Assignee: Vineet Garg
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Query fails with HiveAccessControlException against a table when there is a 
> Materialized view pointing to that table which the end user does not have access 
> to, even though the user has all the privileges on the actual table.
> From the HiveServer2 logs - it looks like, as part of optimization, Hive uses 
> the materialized view to query the data instead of the table, and since the end 
> user does not have access on the MV we receive HiveAccessControlException.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveVolcanoPlanner.java#L99
> The simplest reproducer for this issue is as below.
> 1. Create a table using hive user and insert some data
> {code:java}
> create table db1.testmvtable(id int, name string) partitioned by(year int);
> insert into db1.testmvtable partition(year=2020) values(1,'Name1');
> insert into db1.testmvtable partition(year=2020) values(1,'Name2');
> insert into db1.testmvtable partition(year=2016) values(1,'Name1');
> insert into db1.testmvtable partition(year=2016) values(1,'Name2');
> {code}
> 2. Create Materialized view on top of above table with partitioned and where 
> clause as hive user.
> {code:java}
> CREATE MATERIALIZED VIEW db2.testmv PARTITIONED ON(year) as select * from 
> db1.testmvtable tmv where year >= 2018;
> {code}
> 3. Grant all (Select to be minimum) access to user 'chiran' via Ranger on 
> database db1.
> 4. Run select on base table db1.testmvtable as 'chiran' with where clause 
> having partition value >=2018, it runs into HiveAccessControlException on 
> db2.testmv
> {code:java}
> eg:- (select * from db1.testmvtable where year=2020;)
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2020;
> Error: Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: user [chiran] does not have [SELECT] privilege on 
> [db2/testmv/*] (state=42000,code=4)
> {code}
> 5. This works when partition column is not in MV
> {code:java}
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2016;
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> DEBUG : Encoding valid txns info 897:9223372036854775807::893,895,896 
> txnid:897
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:testmvtable.id, type:int, 
> comment:null), FieldSchema(name:testmvtable.name, type:string, comment:null), 
> FieldSchema(name:testmvtable.year, type:int, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); 
> 

[jira] [Work logged] (HIVE-23454) Querying hive table which has Materialized view fails with HiveAccessControlException

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23454?focusedWorklogId=480436=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480436
 ]

ASF GitHub Bot logged work on HIVE-23454:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 20:26
Start Date: 08/Sep/20 20:26
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on pull request #1471:
URL: https://github.com/apache/hive/pull/1471#issuecomment-689117082


   @jcamachor  I have addressed your review comments in the latest update.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480436)
Time Spent: 0.5h  (was: 20m)

> Querying hive table which has Materialized view fails with 
> HiveAccessControlException
> -
>
> Key: HIVE-23454
> URL: https://issues.apache.org/jira/browse/HIVE-23454
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, HiveServer2
>Affects Versions: 3.0.0, 3.2.0
>Reporter: Chiran Ravani
>Assignee: Vineet Garg
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Query fails with HiveAccessControlException against a table when there is a 
> Materialized view pointing to that table which the end user does not have access 
> to, even though the user has all the privileges on the actual table.
> From the HiveServer2 logs - it looks like, as part of optimization, Hive uses 
> the materialized view to query the data instead of the table, and since the end 
> user does not have access on the MV we receive HiveAccessControlException.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveVolcanoPlanner.java#L99
> The simplest reproducer for this issue is as below.
> 1. Create a table using hive user and insert some data
> {code:java}
> create table db1.testmvtable(id int, name string) partitioned by(year int);
> insert into db1.testmvtable partition(year=2020) values(1,'Name1');
> insert into db1.testmvtable partition(year=2020) values(1,'Name2');
> insert into db1.testmvtable partition(year=2016) values(1,'Name1');
> insert into db1.testmvtable partition(year=2016) values(1,'Name2');
> {code}
> 2. Create Materialized view on top of above table with partitioned and where 
> clause as hive user.
> {code:java}
> CREATE MATERIALIZED VIEW db2.testmv PARTITIONED ON(year) as select * from 
> db1.testmvtable tmv where year >= 2018;
> {code}
> 3. Grant all (Select to be minimum) access to user 'chiran' via Ranger on 
> database db1.
> 4. Run select on base table db1.testmvtable as 'chiran' with where clause 
> having partition value >=2018, it runs into HiveAccessControlException on 
> db2.testmv
> {code:java}
> eg:- (select * from db1.testmvtable where year=2020;)
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2020;
> Error: Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: user [chiran] does not have [SELECT] privilege on 
> [db2/testmv/*] (state=42000,code=4)
> {code}
> 5. This works when partition column is not in MV
> {code:java}
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2016;
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> DEBUG : Encoding valid txns info 897:9223372036854775807::893,895,896 
> txnid:897
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:testmvtable.id, type:int, 
> comment:null), FieldSchema(name:testmvtable.name, type:string, comment:null), 
> FieldSchema(name:testmvtable.year, type:int, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); 
> Time taken: 0.222 seconds
> DEBUG : Encoding valid txn write ids info 
> 897$db1.testmvtable:4:9223372036854775807:: txnid:897
> INFO  : Executing 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> INFO  : Completed executing 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); 
> Time taken: 0.008 seconds
> INFO  : OK
> DEBUG : Shutting down query select * from db1.testmvtable where year=2016
> +-+---+---+
> | testmvtable.id  | 

[jira] [Updated] (HIVE-24129) Deleting the previous successful dump directory should be based on config

2020-09-08 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24129:
---
Attachment: HIVE-24129.01.patch
Status: Patch Available  (was: Open)

> Deleting the previous successful dump directory should be based on config
> -
>
> Key: HIVE-24129
> URL: https://issues.apache.org/jira/browse/HIVE-24129
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24129.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#22}Description: Provide a policy level config defaulted to 
> true.{color}
> {color:#22}This can help debug any issue in the production.{color}
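
Presumably this surfaces as a per-policy switch on the dump command; a sketch follows, where the property name is an assumption rather than the one the patch actually introduces:

{code:sql}
-- Sketch only: assumed property name. With deletion defaulted to true,
-- disabling it retains the previous successful dump directory, which helps
-- debug issues in production.
REPL DUMP srcdb WITH ('hive.repl.delete.prev.dump.dir'='false');
{code}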



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24129) Deleting the previous successful dump directory should be based on config

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24129:
--
Labels: pull-request-available  (was: )

> Deleting the previous successful dump directory should be based on config
> -
>
> Key: HIVE-24129
> URL: https://issues.apache.org/jira/browse/HIVE-24129
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#22}Description: Provide a policy level config defaulted to 
> true.{color}
> {color:#22}This can help debug any issue in the production.{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24129) Deleting the previous successful dump directory should be based on config

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24129?focusedWorklogId=480420=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480420
 ]

ASF GitHub Bot logged work on HIVE-24129:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 20:04
Start Date: 08/Sep/20 20:04
Worklog Time Spent: 10m 
  Work Description: ArkoSharma opened a new pull request #1479:
URL: https://github.com/apache/hive/pull/1479


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480420)
Remaining Estimate: 0h
Time Spent: 10m

> Deleting the previous successful dump directory should be based on config
> -
>
> Key: HIVE-24129
> URL: https://issues.apache.org/jira/browse/HIVE-24129
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Arko Sharma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#22}Description: Provide a policy level config defaulted to 
> true.{color}
> {color:#22}This can help debug any issue in the production.{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24127) Dump events from default catalog only

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24127?focusedWorklogId=480385=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480385
 ]

ASF GitHub Bot logged work on HIVE-24127:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 19:19
Start Date: 08/Sep/20 19:19
Worklog Time Spent: 10m 
  Work Description: aasha opened a new pull request #1478:
URL: https://github.com/apache/hive/pull/1478


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480385)
Remaining Estimate: 0h
Time Spent: 10m

> Dump events from default catalog only
> -
>
> Key: HIVE-24127
> URL: https://issues.apache.org/jira/browse/HIVE-24127
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-24127.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Don't dump events from spark catalog. In bootstrap we skip spark tables. In 
> incremental load also we should skip spark events.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24127) Dump events from default catalog only

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24127:
--
Labels: pull-request-available  (was: )

> Dump events from default catalog only
> -
>
> Key: HIVE-24127
> URL: https://issues.apache.org/jira/browse/HIVE-24127
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24127.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Don't dump events from spark catalog. In bootstrap we skip spark tables. In 
> incremental load also we should skip spark events.
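
A sketch of what the incremental-side filter could look like; the surrounding loop and the dumpEvent handler are assumptions, not the patch itself:

{code:java}
// Sketch (assumed shape): while iterating notification events for an
// incremental dump, skip anything outside the default catalog.
String defaultCatalog = MetaStoreUtils.getDefaultCatalog(conf);
for (NotificationEvent event : eventsBatch) {
  String catName = event.getCatName();
  if (catName != null && !defaultCatalog.equalsIgnoreCase(catName)) {
    continue; // e.g. events from the spark catalog are not replicated
  }
  dumpEvent(event); // hypothetical handler for events that pass the filter
}
{code}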



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24127) Dump events from default catalog only

2020-09-08 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24127:
---
Attachment: HIVE-24127.01.patch
Status: Patch Available  (was: Open)

> Dump events from default catalog only
> -
>
> Key: HIVE-24127
> URL: https://issues.apache.org/jira/browse/HIVE-24127
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-24127.01.patch
>
>
> Don't dump events from spark catalog. In bootstrap we skip spark tables. In 
> incremental load also we should skip spark events.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24031) Infinite planning time on syntactically big queries

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24031?focusedWorklogId=480276=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480276
 ]

ASF GitHub Bot logged work on HIVE-24031:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 16:45
Start Date: 08/Sep/20 16:45
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1424:
URL: https://github.com/apache/hive/pull/1424


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480276)
Time Spent: 50m  (was: 40m)

> Infinite planning time on syntactically big queries
> ---
>
> Key: HIVE-24031
> URL: https://issues.apache.org/jira/browse/HIVE-24031
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: ASTNode_getChildren_cost.png, 
> query_big_array_constructor.nps
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Syntactically big queries (~1 million tokens), such as the query shown below, 
> lead to very big (seemingly infinite) planning times.
> {code:sql}
> select posexplode(array('item1', 'item2', ..., 'item1M'));
> {code}
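
To reproduce locally, the statement can be generated programmatically; a minimal sketch (the item count is the only knob):

{code:java}
import java.util.StringJoiner;

// Generates the pathological query from the report: ~1M string literals
// inside a single array() constructor, which stresses parsing and ASTNode
// traversal during planning.
public class BigArrayQueryGen {
  public static void main(String[] args) {
    int n = 1_000_000;
    StringJoiner items = new StringJoiner(", ");
    for (int i = 1; i <= n; i++) {
      items.add("'item" + i + "'");
    }
    System.out.println("select posexplode(array(" + items + "));");
  }
}
{code}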



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24131) Use original src location always when data copy runs on target

2020-09-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24131:

Status: Patch Available  (was: Open)

> Use original src location always when data copy runs on target 
> ---
>
> Key: HIVE-24131
> URL: https://issues.apache.org/jira/browse/HIVE-24131
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24131) Use original src location always when data copy runs on target

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24131:
--
Labels: pull-request-available  (was: )

> Use original src location always when data copy runs on target 
> ---
>
> Key: HIVE-24131
> URL: https://issues.apache.org/jira/browse/HIVE-24131
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24131) Use original src location always when data copy runs on target

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24131?focusedWorklogId=480237=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480237
 ]

ASF GitHub Bot logged work on HIVE-24131:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 15:56
Start Date: 08/Sep/20 15:56
Worklog Time Spent: 10m 
  Work Description: pkumarsinha opened a new pull request #1477:
URL: https://github.com/apache/hive/pull/1477


   …rget
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480237)
Remaining Estimate: 0h
Time Spent: 10m

> Use original src location always when data copy runs on target 
> ---
>
> Key: HIVE-24131
> URL: https://issues.apache.org/jira/browse/HIVE-24131
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24084?focusedWorklogId=480207=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480207
 ]

ASF GitHub Bot logged work on HIVE-24084:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 15:17
Start Date: 08/Sep/20 15:17
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1439:
URL: https://github.com/apache/hive/pull/1439#discussion_r485001912



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java
##
@@ -303,6 +305,90 @@ public void onMatch(RelOptRuleCall call) {
 }
   }
 
+  /**
+   * Determines whether the given grouping is unique.
+   *
+   * Consider a join which might produce non-unique rows; but later the 
results are aggregated again.
+   * This method determines if there are sufficient columns in the grouping 
which have been present previously as unique column(s).
+   */
+  private boolean isGroupingUnique(RelNode input, ImmutableBitSet groups) {
+if (groups.isEmpty()) {
+  return false;
+}
+RelMetadataQuery mq = input.getCluster().getMetadataQuery();
+Set<ImmutableBitSet> uKeys = mq.getUniqueKeys(input);

Review comment:
   yes; I've explored using `areColumnsUnique` because it matches the 
use case here - however for some tests it emitted some NPEs, so I've gone back to 
the `getUniqueKeys` approach.
   I'll file a jira for `areColumnsUnique` when I know what's wrong with it.
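
For readers following along, the `getUniqueKeys` result would then be consumed roughly as below; this is a sketch under the assumption that the grouping counts as unique when it covers some reported unique key, not necessarily the merged code:

{code:java}
// Sketch (assumption): the grouping preserves uniqueness if it contains at
// least one unique key of the input reported by the metadata query.
if (uKeys != null) {
  for (ImmutableBitSet uKey : uKeys) {
    if (groups.contains(uKey)) {
      return true;
    }
  }
}
return false;
{code}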





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480207)
Time Spent: 2h 10m  (was: 2h)

> Push Aggregates thru joins in case it re-groups previously unique columns
> -
>
> Key: HIVE-24084
> URL: https://issues.apache.org/jira/browse/HIVE-24084
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24131) Use original src location always when data copy runs on target

2020-09-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha reassigned HIVE-24131:
---


> Use original src location always when data copy runs on target 
> ---
>
> Key: HIVE-24131
> URL: https://issues.apache.org/jira/browse/HIVE-24131
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24130) Support datasets for non-default database

2020-09-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24130:
---


> Support datasets for non-default database
> -
>
> Key: HIVE-24130
> URL: https://issues.apache.org/jira/browse/HIVE-24130
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> tpch datasets were added in a different database - but QTestDatasetHandler 
> only considers tables by "name" and ignores the "db" - so the protection 
> mechanism doesn't fully work: the tpch tables are being wiped out after 
> the first test using them and never loaded back.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=480157=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480157
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 14:08
Start Date: 08/Sep/20 14:08
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1271:
URL: https://github.com/apache/hive/pull/1271#discussion_r484949144



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java
##
@@ -1,114 +0,0 @@
-package org.apache.hadoop.hive.metastore;
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-import java.nio.charset.StandardCharsets;
-import java.util.ArrayList;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Set;
-
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.FileMetadataExprType;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.utils.FileUtils;
-import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
-import org.apache.hadoop.util.StringUtils;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-// This is added as part of moving MSCK code from ql to standalone-metastore. 
There is a metastore API to drop
-// partitions by name but we cannot use it because msck typically will contain 
partition value (year=2014). We almost
-// never drop partition by name (year). So we need to construct expression 
filters, the current
-// PartitionExpressionProxy implementations (PartitionExpressionForMetastore 
and HCatClientHMSImpl.ExpressionBuilder)
-// all depend on ql code to build ExprNodeDesc for the partition expressions. 
It also depends on kryo for serializing
-// the expression objects to byte[]. For MSCK drop partition, we don't need 
complex expression generator. For now,
-// all we do is split the partition spec (year=2014/month=24) into filter 
expression year='2014' and month='24' and
-// rely on metastore database to deal with type conversions. Ideally, 
PartitionExpressionProxy default implementation
-// should use SearchArgument (storage-api) to construct the filter expression 
and not depend on ql, but the usecase
-// for msck is pretty simple and this specific implementation should suffice.

Review comment:
   I don't know - have you explored using the SA approach? Seeing the ql 
class through reflection looks a bit weird.
   
   The other way around could be to move the Expr-related classes out of ql (not sure 
if that's possible).
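
The simple construction that removed comment describes can be shown standalone; a sketch (class and method names are mine, not the removed file's):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch: split a partition spec like "year=2014/month=24" into the filter
// year='2014' and month='24', leaving type conversion to the metastore DB.
public class MsckFilterSketch {
  public static String partSpecToFilter(String partSpec) {
    List<String> clauses = new ArrayList<>();
    for (String kv : partSpec.split("/")) {
      int eq = kv.indexOf('=');
      clauses.add(kv.substring(0, eq) + "='" + kv.substring(eq + 1) + "'");
    }
    return String.join(" and ", clauses);
  }

  public static void main(String[] args) {
    // prints: year='2014' and month='24'
    System.out.println(partSpecToFilter("year=2014/month=24"));
  }
}
{code}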





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480157)
Time Spent: 2h 50m  (was: 2h 40m)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at 

[jira] [Commented] (HIVE-21574) return wrong result when execute left join sql

2020-09-08 Thread zhishui (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192182#comment-17192182
 ] 

zhishui commented on HIVE-21574:


[~Panda Song] could you give some schema to reproduce your problem?

> return wrong result when execute left join sql
> --
>
> Key: HIVE-21574
> URL: https://issues.apache.org/jira/browse/HIVE-21574
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
> Environment: hive 3.1.0 hdfs 3.1.1
>Reporter: Panda Song
>Priority: Blocker
>
> Can somebody delete this issue please?
> When I use a table instead of the sub-select, I get the right result: many more 
> rows are joined together (the old_uv metric is bigger!!!)
> Is there a bug here?
> Please help me, thanks a lot!!
> {code:java}
> select 
> a.event_date,
> count(distinct a.device_id) as uv,
> count(distinct case when b.device_id is not null then b.device_id end) as 
> old_uv,
> count(distinct a.device_id) - count(distinct case when b.device_id is not 
> null then b.device_id end) as new_uv
> from
> (
> select
> event_date,
> device_id,
> qingting_id
> from datacenter.bl_page_chain_day
> where event_date = '2019-03-31'
> and (current_content like '/membership5%'
> or current_content like '/vips/members%'
> or current_content like '/members/v2/%')
> )a
> left join
> (select
>   b.device_id
> from
> lzq_test.first_buy_vip a
> inner join datacenter.device_qingting b on a.qingting_id = b.qingting_id
> where a.first_buy < '2019-03-31'
> group by b.device_id
> )b
> on a.device_id = b.device_id
> group by a.event_date;
> {code}
> plan:
> {code:java}
> Plan optimized by CBO. 
> 
>  Vertex dependency in root stage
>  Map 1 <- Map 3 (BROADCAST_EDGE)
>  Reducer 2 <- Map 1 (SIMPLE_EDGE)   
>  Reducer 5 <- Map 4 (CUSTOM_SIMPLE_EDGE), Reducer 2 (ONE_TO_ONE_EDGE) 
>  Reducer 6 <- Reducer 5 (SIMPLE_EDGE)   
> 
>  Stage-0
>Fetch Operator   
>  limit:-1   
>  Stage-1
>Reducer 6
>File Output Operator [FS_26] 
>  Select Operator [SEL_25] (rows=35527639 width=349) 
>Output:["_col0","_col1","_col2","_col3"] 
>Group By Operator [GBY_24] (rows=35527639 width=349) 
>  Output:["_col0","_col1","_col2"],aggregations:["count(DISTINCT 
> KEY._col1:0._col0)","count(DISTINCT KEY._col1:1._col0)"],keys:KEY._col0 
><-Reducer 5 [SIMPLE_EDGE]
>  SHUFFLE [RS_23]
>PartitionCols:_col0  
>Group By Operator [GBY_22] (rows=71055278 width=349) 
>  
> Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["count(DISTINCT
>  _col1)","count(DISTINCT _col2)"],keys:true, _col1, _col2 
>  Select Operator [SEL_20] (rows=71055278 width=349) 
>Output:["_col1","_col2"] 
>Map Join Operator [MAPJOIN_45] (rows=71055278 width=349) 
>  
> Conds:RS_17.KEY.reducesinkkey0=RS_18.KEY.reducesinkkey0(Right 
> Outer),Output:["_col0","_col1"] 
><-Reducer 2 [ONE_TO_ONE_EDGE]
>  FORWARD [RS_17]
>PartitionCols:_col0  
>Group By Operator [GBY_12] (rows=21738609 width=235) 
>  Output:["_col0"],keys:KEY._col0 
><-Map 1 [SIMPLE_EDGE]
>  SHUFFLE [RS_11]
>PartitionCols:_col0  
>Group By Operator [GBY_10] (rows=43477219 
> width=235) 
>  Output:["_col0"],keys:_col0 
>  Map Join Operator [MAPJOIN_44] (rows=43477219 
> width=235) 
>
> Conds:SEL_2._col1=RS_7._col0(Inner),Output:["_col0"] 
>  <-Map 3 [BROADCAST_EDGE] 
>BROADCAST [RS_7] 
>  PartitionCols:_col0 
>  Select Operator [SEL_5] (rows=301013 
> width=228) 
>Output:["_col0"] 
>Filter Operator [FIL_32] (rows=301013 
> width=228) 
>  predicate:((first_buy < 
> DATE'2019-03-31') and qingting_id is not null) 
>  TableScan 

[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-08 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192169#comment-17192169
 ] 

Syed Shameerur Rahman commented on HIVE-23851:
--

[~jcamachorodriguez] [~kgyrtkirk] Could you please review the PR?
Thanks.

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering we expect expression proxy 
> class to be set as PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ), While dropping partition we serialize the drop partition filter 
> expression as ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ) which is incompatible during deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ) hence the query fails with Failed to deserialize the expression.
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization of the msck drop-partition 
> filter expression compatible with 
> PartitionExpressionForMetastore. We can do this via reflection, since the drop 
> partition serialization happens in the Msck class (standalone-metastore); this 
> way we can completely 

[jira] [Work logged] (HIVE-24073) Execution exception in sort-merge semijoin

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24073?focusedWorklogId=480063=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480063
 ]

ASF GitHub Bot logged work on HIVE-24073:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 11:35
Start Date: 08/Sep/20 11:35
Worklog Time Spent: 10m 
  Work Description: maheshk114 opened a new pull request #1476:
URL: https://github.com/apache/hive/pull/1476


   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480063)
Remaining Estimate: 0h
Time Spent: 10m

> Execution exception in sort-merge semijoin
> --
>
> Key: HIVE-24073
> URL: https://issues.apache.org/jira/browse/HIVE-24073
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Reporter: Jesus Camacho Rodriguez
>Assignee: mahesh kumar behera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Working on HIVE-24041, we trigger an additional SJ conversion that leads to 
> this exception at execution time:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1063)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:685)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:462)
>   ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1037)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1060)
>   ... 22 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to 
> overwrite nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:564)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:243)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
>   at 
> org.apache.hadoop.hive.ql.exec.TezDummyStoreOperator.process(TezDummyStoreOperator.java:49)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1003)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1020)
>   ... 23 more
> {code}
> To reproduce, just set {{hive.auto.convert.sortmerge.join}} to {{true}} in 
> the last query in {{auto_sortmerge_join_10.q}} after HIVE-24041 has been 
> merged.
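
In qfile terms the repro is just flipping one setting before the final statement; an illustrative sketch (the table names are assumptions about the qfile's setup):

{code:sql}
-- Illustrative only; auto_sortmerge_join_10.q provides the real tables/query.
set hive.auto.convert.sortmerge.join=true;
select count(*) from tbl1 a left semi join tbl2 b on a.key = b.key;
{code}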



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24073) Execution exception in sort-merge semijoin

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24073:
--
Labels: pull-request-available  (was: )

> Execution exception in sort-merge semijoin
> --
>
> Key: HIVE-24073
> URL: https://issues.apache.org/jira/browse/HIVE-24073
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Reporter: Jesus Camacho Rodriguez
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Working on HIVE-24041, we trigger an additional SJ conversion that leads to 
> this exception at execution time:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1063)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:685)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:462)
>   ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1037)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1060)
>   ... 22 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to 
> overwrite nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:564)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:243)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
>   at 
> org.apache.hadoop.hive.ql.exec.TezDummyStoreOperator.process(TezDummyStoreOperator.java:49)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1003)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1020)
>   ... 23 more
> {code}
> To reproduce, just set {{hive.auto.convert.sortmerge.join}} to {{true}} in 
> the last query in {{auto_sortmerge_join_10.q}} after HIVE-24041 has been 
> merged.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479955=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479955
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 09:09
Start Date: 08/Sep/20 09:09
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484768146



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -1012,8 +1012,9 @@ struct CommitTxnRequest {
 3: optional list<WriteEventInfo> writeEventInfos,
 // Information to update the last repl id of table/partition along with 
commit txn (replication from 2.6 to 3.0)
 4: optional ReplLastIdInfo replLastIdInfo,
+5: optional bool exclWriteEnabled = true,

Review comment:
   we need to keep it consistent with downstream, see `HIVE-23759: refactor 
field order of CommitTxnRequest`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479955)
Time Spent: 7h 20m  (was: 7h 10m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should be to not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partition.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1, the snapshot will be regenerated, and it will read 
> partial data from transaction 2 breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. compile transaction 2 that insert data to the target table
> 2. compile transaction 3 that merge inserts data from the source table to the 
> target table
> 3. run and commit transaction 1
> 4. run and commit transaction 2
> 5. run transaction 3; since it contains 1 and 2 in its snapshot, the 
> isValidTxnListState check will be triggered and we do a partial read of 
> transaction 1 for the same reasons.
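
To make the first scenario concrete, an illustrative timeline (table names and values mirror the style of the repro, and are assumptions):

{code:sql}
-- txn1 (session 1): compiled but not yet run
--   merge into target t using source s on t.a = s.a
--     when not matched then insert values (s.a, s.b, s.c);
-- txn2 (session 2): commits rows into an old AND a new source partition
insert into source values (5,5,2), (6,6,3);
-- txn3 (session 3): commits into target, invalidating txn1's snapshot
insert into target values (3,3,2);
-- txn1 now runs: the regenerated snapshot sees only part of txn2's writes,
-- so the merge can miss the new partition unless the query is recompiled.
{code}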



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479954=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479954
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 09:08
Start Date: 08/Sep/20 09:08
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484767687



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -2315,6 +2386,139 @@ private void 
testConcurrentMergeInsertNoDuplicates(String query, boolean sharedW
 List<String> res = new ArrayList<>();
 driver.getFetchTask().fetch(res);
 Assert.assertEquals("Duplicate records found", 4, res.size());
+dropTable(new String[]{"target", "source"});
+  }
+
+  /**
+   * ValidTxnManager.isValidTxnListState can invalidate a snapshot if a 
relevant write transaction was committed
+   * between a query compilation and lock acquisition. When this happens we 
have to recompile the given query,
+   * otherwise we can miss reading partitions created between. The following 
three cases test these scenarios.
+   * @throws Exception ex
+   */
+  @Test
+  public void testMergeInsertDynamicPartitioningSequential() throws Exception {
+dropTable(new String[]{"target", "source"});
+conf.setBoolVar(HiveConf.ConfVars.TXN_WRITE_X_LOCK, false);
+
+// Create partition c=1
+driver.run("create table target (a int, b int) partitioned by (c int) 
stored as orc TBLPROPERTIES ('transactional'='true')");
+driver.run("insert into target values (1,1,1), (2,2,1)");
+//Create partition c=2
+driver.run("create table source (a int, b int) partitioned by (c int) 
stored as orc TBLPROPERTIES ('transactional'='true')");
+driver.run("insert into source values (3,3,2), (4,4,2)");
+
+// txn 1 inserts data to an old and a new partition
+driver.run("insert into source values (5,5,2), (6,6,3)");
+
+// txn 2 inserts into the target table into a new partition ( and a 
duplicate considering the source table)
+driver.run("insert into target values (3, 3, 2)");
+
+// txn3 merge
+driver.run("merge into target t using source s on t.a = s.a " +
+  "when not matched then insert values (s.a, s.b, s.c)");
+driver.run("select * from target");
+List<String> res = new ArrayList<>();
+driver.getFetchTask().fetch(res);
+// The merge should see all three partition and not create duplicates
+Assert.assertEquals("Duplicate records found", 6, res.size());
+Assert.assertTrue("Partition 3 was skipped", res.contains("6\t6\t3"));
+dropTable(new String[]{"target", "source"});
+  }
+
+  @Test
+  public void 
testMergeInsertDynamicPartitioningSnapshotInvalidatedWithOldCommit() throws 
Exception {
+// By creating the driver with the factory, we should have a ReExecDriver
+IDriver driver3 = DriverFactory.newDriver(conf);
+Assert.assertTrue("ReExecDriver was expected", driver3 instanceof 
ReExecDriver);

Review comment:
   changed

##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -488,30 +489,40 @@ private void runInternal(String command, boolean 
alreadyCompiled) throws Command
 
   lockAndRespond();
 
+  int retryShapshotCnt = 0;
+  int maxRetrySnapshotCnt = HiveConf.getIntVar(driverContext.getConf(),
+HiveConf.ConfVars.HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT);
+
   try {
-if (!driverTxnHandler.isValidTxnListState()) {
-  LOG.info("Compiling after acquiring locks");
+while (!driverTxnHandler.isValidTxnListState() && ++retryShapshotCnt 
<= maxRetrySnapshotCnt) {
+  LOG.info("Compiling after acquiring locks, attempt #" + 
retryShapshotCnt);
   // Snapshot was outdated when locks were acquired, hence regenerate 
context,
   // txn list and retry
   // TODO: Lock acquisition should be moved before analyze, this is a 
bit hackish.
   // Currently, we acquire a snapshot, we compile the query wrt that 
snapshot,
   // and then, we acquire locks. If snapshot is still valid, we 
continue as usual.
   // But if snapshot is not valid, we recompile the query.
   if (driverContext.isOutdatedTxn()) {
+LOG.info("Snapshot is outdated, re-initiating transaction ...");
 driverContext.getTxnManager().rollbackTxn();
 
 String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
 driverContext.getTxnManager().openTxn(context, userFromUGI, 
driverContext.getTxnType());
 lockAndRespond();
   }
+
   driverContext.setRetrial(true);
   driverContext.getBackupContext().addSubContext(context);
   
driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
   context = 

[jira] [Assigned] (HIVE-24129) Deleting the previous successful dump directory should be based on config

2020-09-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha reassigned HIVE-24129:
---


> Deleting the previous successful dump directory should be based on config
> -
>
> Key: HIVE-24129
> URL: https://issues.apache.org/jira/browse/HIVE-24129
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Arko Sharma
>Priority: Major
>
> {color:#22}Description: Provide a policy level config defaulted to 
> true.{color}
> {color:#22}This can help debug any issue in the production.{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23413) Create a new config to skip all locks

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23413?focusedWorklogId=479939=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479939
 ]

ASF GitHub Bot logged work on HIVE-23413:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 08:51
Start Date: 08/Sep/20 08:51
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1220:
URL: https://github.com/apache/hive/pull/1220#discussion_r484757467



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -727,35 +751,52 @@ private void testCheckExpectedLocks(boolean sharedWrite) 
throws Exception {
*/
   @Test
   public void testCheckExpectedLocks2() throws Exception {
+testCheckExpectedLocks2(true);
+  }
+  @Test
+  public void testCheckExpectedLocks2NoReadLock() throws Exception {
+testCheckExpectedLocks2(false);
+  }
+  public void testCheckExpectedLocks2(boolean readLocks) throws Exception {

Review comment:
   nit: new row

##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -727,35 +751,52 @@ private void testCheckExpectedLocks(boolean sharedWrite) throws Exception {
*/
   @Test
   public void testCheckExpectedLocks2() throws Exception {
+testCheckExpectedLocks2(true);
+  }
+  @Test

Review comment:
   nit: new row





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479939)
Time Spent: 50m  (was: 40m)

> Create a new config to skip all locks
> -
>
> Key: HIVE-23413
> URL: https://issues.apache.org/jira/browse/HIVE-23413
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23413.1.patch, HIVE-23413.2.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> From time to time, a query gets blocked on locks that it should not be.
> To have a quick workaround for this, we should have a config which the user 
> can set in the session to disable acquiring/checking locks, so we can provide 
> the workaround immediately and then investigate and fix the root cause later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
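
A minimal sketch of the escape hatch the description asks for: a
session-level flag that bypasses lock acquisition entirely. The flag name
and the LockManager interface here are hypothetical stand-ins, not the real
Hive types.

    import java.util.List;

    public class SkipLocksSketch {

      interface LockManager {
        void acquireLocks(List<String> lockEntities);
      }

      static void maybeAcquireLocks(boolean skipLocks, LockManager lockManager,
          List<String> lockEntities) {
        if (skipLocks) {
          // Session opted out: skip acquiring/checking locks entirely as a
          // temporary workaround while the root cause is investigated.
          return;
        }
        lockManager.acquireLocks(lockEntities);
      }
    }

Since the flag is session-scoped, only the user who explicitly sets it
accepts the consistency risk; all other sessions keep normal locking.
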


[jira] [Work logged] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16352?focusedWorklogId=479938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479938
 ]

ASF GitHub Bot logged work on HIVE-16352:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 08:51
Start Date: 08/Sep/20 08:51
Worklog Time Spent: 10m 
  Work Description: gabrywu commented on pull request #1436:
URL: https://github.com/apache/hive/pull/1436#issuecomment-688722827


   @kgyrtkirk Hi, committer, could you review this PR and give some suggestions?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479938)
Time Spent: 50m  (was: 40m)

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -
>
> Key: HIVE-16352
> URL: https://issues.apache.org/jira/browse/HIVE-16352
> Project: Hive
>  Issue Type: New Feature
>  Components: Avro, File Formats, Reader
>Affects Versions: 3.1.2
>Reporter: Navdeep Poonia
>Assignee: gabrywu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When a file is corrupted, Hive raises the error java.io.IOException: Invalid 
> sync!
>  Can we have some functionality to skip or repair such blocks at runtime, to 
> make Avro more resilient in case of data corruption?
>  Error: java.io.IOException: java.io.IOException: java.io.IOException: While 
> processing file 
> s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42.
>  java.io.IOException: Invalid sync!
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
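
A standalone Java sketch of the skip-past-corrupt-block recovery the issue
asks for, built on the public Avro reader API (DataFileReader's tell() and
sync()); this illustrates the recovery idea under those assumptions and is
not the proposed Hive patch.

    import java.io.File;
    import java.io.IOException;

    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class AvroSkipCorruptBlocks {
      public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        long records = 0;
        long lastSyncPos = -1;
        try (DataFileReader<GenericRecord> reader =
            new DataFileReader<>(file, new GenericDatumReader<>())) {
          while (true) {
            try {
              if (!reader.hasNext()) {
                break; // clean end of file
              }
              reader.next();
              records++;
            } catch (Exception e) { // e.g. "Invalid sync!" on a corrupt block
              long pos = reader.tell();
              if (pos == lastSyncPos) {
                break; // no forward progress; give up instead of spinning
              }
              lastSyncPos = pos;
              System.err.println("Skipping corrupt block near byte " + pos
                  + ": " + e.getMessage());
              reader.sync(pos); // jump to the next sync marker after pos
            }
          }
        }
        System.out.println("Recovered " + records + " records");
      }
    }

The trade-off is silent data loss for the skipped blocks, so production use
would want the skip count surfaced in counters or logs rather than stderr.
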


[jira] [Comment Edited] (HIVE-24094) cast type mismatch and use is not null, the results are error if cbo is true

2020-09-08 Thread zhaolong (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192041#comment-17192041
 ] 

zhaolong edited comment on HIVE-24094 at 9/8/20, 8:43 AM:
--

Upgrading calcite-core to 1.19.0 resolves this problem.

!image-2020-09-08-16-42-54-728.png!

!image-2020-09-08-16-43-00-848.png!

I think this is the related Calcite bug: 
https://issues.apache.org/jira/browse/CALCITE-3609


was (Author: fsilent):
Upgrading calcite-core to 1.19.0 resolves this problem.

!image-2020-09-08-16-42-00-966.png!

I think this is the related Calcite bug: 
https://issues.apache.org/jira/browse/CALCITE-3609

> cast type mismatch and use is not null, the results are error if cbo is true
> 
>
> Key: HIVE-24094
> URL: https://issues.apache.org/jira/browse/HIVE-24094
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.0
>Reporter: zhaolong
>Priority: Major
> Attachments: image-2020-08-31-10-01-26-250.png, 
> image-2020-08-31-10-02-39-154.png, image-2020-09-04-10-54-43-141.png, 
> image-2020-09-04-10-56-00-764.png, image-2020-09-04-10-56-07-286.png, 
> image-2020-09-04-10-59-36-780.png, image-2020-09-04-11-02-07-917.png, 
> image-2020-09-04-11-02-18-008.png, image-2020-09-07-15-20-44-201.png, 
> image-2020-09-07-15-21-35-566.png, image-2020-09-07-15-24-59-015.png, 
> image-2020-09-07-15-25-18-785.png, image-2020-09-08-16-42-54-728.png, 
> image-2020-09-08-16-43-00-848.png
>
>
> 1.CREATE TABLE IF NOT EXISTS testa
> ( 
>  SEARCHWORD STRING, 
>  COUNT_NUM BIGINT, 
>  WORDS STRING 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\27' 
> STORED AS TEXTFILE; 
> 2.insert into testa values('searchword', 1, 'a');
> 3.set hive.cbo.enable=false;
> 4.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-01-26-250.png!
> 5.set hive.cbo.enable=true;
> 6.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-02-39-154.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
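
For context, the semantics the reporter expects: in Hive, CAST('searchword'
AS BIGINT) yields NULL, so the IS NOT NULL branch must not fire and the CASE
should return the original string. A plain-Java stand-in for that evaluation
(Long.parseLong substitutes for Hive's cast, which returns NULL instead of
throwing):

    public class CastCaseSemantics {

      static Long castToBigint(String s) {
        try {
          return Long.parseLong(s); // CAST(... AS BIGINT) -> NULL on failure
        } catch (NumberFormatException e) {
          return null;
        }
      }

      static String words(String searchword) {
        Long asBigint = castToBigint(searchword);
        // Mirrors: CASE WHEN CAST(searchword AS BIGINT) IS NOT NULL
        //          THEN CAST(CAST(searchword AS BIGINT) AS STRING)
        //          ELSE searchword END
        return asBigint != null ? String.valueOf(asBigint) : searchword;
      }

      public static void main(String[] args) {
        System.out.println(words("searchword")); // expected: "searchword"
        System.out.println(words("42"));         // expected: "42"
      }
    }
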


[jira] [Commented] (HIVE-24094) cast type mismatch and use is not null, the results are error if cbo is true

2020-09-08 Thread zhaolong (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192041#comment-17192041
 ] 

zhaolong commented on HIVE-24094:
-

Upgrading calcite-core to 1.19.0 resolves this problem.

!image-2020-09-08-16-42-00-966.png!

I think this is the related Calcite bug: 
https://issues.apache.org/jira/browse/CALCITE-3609

> cast type mismatch and use is not null, the results are error if cbo is true
> 
>
> Key: HIVE-24094
> URL: https://issues.apache.org/jira/browse/HIVE-24094
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.0
>Reporter: zhaolong
>Priority: Major
> Attachments: image-2020-08-31-10-01-26-250.png, 
> image-2020-08-31-10-02-39-154.png, image-2020-09-04-10-54-43-141.png, 
> image-2020-09-04-10-56-00-764.png, image-2020-09-04-10-56-07-286.png, 
> image-2020-09-04-10-59-36-780.png, image-2020-09-04-11-02-07-917.png, 
> image-2020-09-04-11-02-18-008.png, image-2020-09-07-15-20-44-201.png, 
> image-2020-09-07-15-21-35-566.png, image-2020-09-07-15-24-59-015.png, 
> image-2020-09-07-15-25-18-785.png
>
>
> 1.CREATE TABLE IF NOT EXISTS testa
> ( 
>  SEARCHWORD STRING, 
>  COUNT_NUM BIGINT, 
>  WORDS STRING 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\27' 
> STORED AS TEXTFILE; 
> 2.insert into testa values('searchword', 1, 'a');
> 3.set hive.cbo.enable=false;
> 4.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-01-26-250.png!
> 5.set hive.cbo.enable=true;
> 6.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-02-39-154.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23413) Create a new config to skip all locks

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23413?focusedWorklogId=479921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479921
 ]

ASF GitHub Bot logged work on HIVE-23413:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 08:27
Start Date: 08/Sep/20 08:27
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1220:
URL: https://github.com/apache/hive/pull/1220#issuecomment-688708402


   @deniskuzZ @pvary Do you still think we need this? If so, please review, and 
I will check whether it needs a rebase.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479921)
Time Spent: 40m  (was: 0.5h)

> Create a new config to skip all locks
> -
>
> Key: HIVE-23413
> URL: https://issues.apache.org/jira/browse/HIVE-23413
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23413.1.patch, HIVE-23413.2.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> From time to time, a query gets blocked on locks that it should not be.
> To have a quick workaround for this, we should have a config which the user 
> can set in the session to disable acquiring/checking locks, so we can provide 
> the workaround immediately and then investigate and fix the root cause later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)