[jira] [Work logged] (HIVE-24654) Table level replication support for Atlas metadata

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24654?focusedWorklogId=542674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542674
 ]

ASF GitHub Bot logged work on HIVE-24654:
-

Author: ASF GitHub Bot
Created on: 27/Jan/21 06:31
Start Date: 27/Jan/21 06:31
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1883:
URL: https://github.com/apache/hive/pull/1883#discussion_r565058597



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/exec/repl/TestAtlasDumpTask.java
##
@@ -242,6 +251,25 @@ public void testAtlasClientTimeouts() throws Exception {
 AtlasRestClientBuilder.ATLAS_PROPERTY_READ_TIMEOUT_IN_MS));
   }
 
+  @Test
+  public void testCreateExportRequest() throws Exception {

Review comment:
   Hive replication as such will not allow falling back from table-level to 
db-level replication. For an expression modification case, the Atlas server 
should handle it, and if it does not, we will have to get that fixed by the 
Atlas team. For now, there is no known such case on the Atlas server side.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542674)
Time Spent: 50m  (was: 40m)

> Table level replication support for Atlas metadata
> --
>
> Key: HIVE-24654
> URL: https://issues.apache.org/jira/browse/HIVE-24654
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24654.01.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Covers mainly Atlas export API payload change required to support table level 
> replication



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24654) Table level replication support for Atlas metadata

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24654?focusedWorklogId=542673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542673
 ]

ASF GitHub Bot logged work on HIVE-24654:
-

Author: ASF GitHub Bot
Created on: 27/Jan/21 06:28
Start Date: 27/Jan/21 06:28
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1883:
URL: https://github.com/apache/hive/pull/1883#discussion_r565057506



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/atlas/AtlasRequestBuilder.java
##
@@ -105,6 +126,50 @@ private String getQualifiedName(String clusterName, String 
srcDb) {
 return qualifiedName;
   }
 
+  private String getQualifiedName(String clusterName, String srcDB, String 
tableName) {
+String qualifiedTableName = 
String.format(QUALIFIED_NAME_HIVE_TABLE_FORMAT, srcDB, tableName);
+return getQualifiedName(clusterName,  qualifiedTableName);
+  }
+
+  private List<String> getQualifiedNames(String clusterName, String srcDb, 
Path listOfTablesFile, HiveConf conf)
+  throws SemanticException {
+List<String> qualifiedNames = new ArrayList<>();
+List<String> tableNames = getFileAsList(listOfTablesFile, conf);
+if (CollectionUtils.isEmpty(tableNames)) {
+  LOG.info("Empty file encountered: {}", listOfTablesFile);
+  return qualifiedNames;
+}
+for (String tableName : tableNames) {
+  qualifiedNames.add(getQualifiedName(clusterName, srcDb, tableName));
+}
+return qualifiedNames;
+  }
+
+  private static List<String> getFileAsList(Path listOfTablesFile, HiveConf 
conf) throws SemanticException {
+List<String> list = new ArrayList<>();
+InputStream is = null;
+try {
+  FileSystem fs = FileSystem.get(listOfTablesFile.toUri(), conf);

Review comment:
   No, it doesn't look like it. The FS creation looks proper, so it should work 
in those cases.
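The FS-creation point above can be illustrated with a stdlib sketch: FileSystem.get(uri, conf) selects (and caches) a filesystem by the scheme and authority of the path's own URI, so a staging path on the source cluster, the target cluster, or an HA nameservice each resolves to its own filesystem. The helper below only mirrors that scheme+authority keying with java.net.URI; it is not Hadoop code.

```java
import java.net.URI;

public class FsResolution {
    /** Returns the "scheme://authority" key that selects a filesystem,
     *  mirroring how FileSystem.get(uri, conf) caches one instance per
     *  scheme + authority pair. */
    static String fsKey(String pathUri) {
        URI uri = URI.create(pathUri);
        return uri.getScheme() + "://" + uri.getAuthority();
    }
}
```

Because the key is derived from the path itself, a table-list file under a source-side or target-side staging dir is read from the right cluster without extra configuration.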







Issue Time Tracking
---

Worklog Id: (was: 542673)
Time Spent: 40m  (was: 0.5h)

> Table level replication support for Atlas metadata
> --
>
> Key: HIVE-24654
> URL: https://issues.apache.org/jira/browse/HIVE-24654
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24654.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Covers mainly Atlas export API payload change required to support table level 
> replication





[jira] [Work logged] (HIVE-24654) Table level replication support for Atlas metadata

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24654?focusedWorklogId=542672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542672
 ]

ASF GitHub Bot logged work on HIVE-24654:
-

Author: ASF GitHub Bot
Created on: 27/Jan/21 06:27
Start Date: 27/Jan/21 06:27
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1883:
URL: https://github.com/apache/hive/pull/1883#discussion_r565057156



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasLoadTask.java
##
@@ -121,7 +121,7 @@ AtlasReplInfo createAtlasReplInfo() throws 
SemanticException, MalformedURLExcept
 String srcCluster = 
ReplUtils.getNonEmpty(HiveConf.ConfVars.REPL_SOURCE_CLUSTER_NAME.varname, conf, 
errorFormat);
 String tgtCluster = 
ReplUtils.getNonEmpty(HiveConf.ConfVars.REPL_TARGET_CLUSTER_NAME.varname, conf, 
errorFormat);
 AtlasReplInfo atlasReplInfo = new AtlasReplInfo(endpoint, work.getSrcDB(), 
work.getTgtDB(),
-srcCluster, tgtCluster, work.getStagingDir(), conf);
+srcCluster, tgtCluster, work.getStagingDir(), null, conf);

Review comment:
   Done







Issue Time Tracking
---

Worklog Id: (was: 542672)
Time Spent: 0.5h  (was: 20m)

> Table level replication support for Atlas metadata
> --
>
> Key: HIVE-24654
> URL: https://issues.apache.org/jira/browse/HIVE-24654
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24654.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Covers mainly Atlas export API payload change required to support table level 
> replication





[jira] [Updated] (HIVE-24675) Handle external table replication for HA with same NS and lazy copy.

2021-01-26 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24675:
---
Attachment: HIVE-24675.04.patch

> Handle external table replication for HA with same NS and lazy copy.
> 
>
> Key: HIVE-24675
> URL: https://issues.apache.org/jira/browse/HIVE-24675
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24675.01.patch, HIVE-24675.02.patch, 
> HIVE-24675.03.patch, HIVE-24675.04.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24674) Set repl.source.for property in the db if db is under replication

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24674?focusedWorklogId=542615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542615
 ]

ASF GitHub Bot logged work on HIVE-24674:
-

Author: ASF GitHub Bot
Created on: 27/Jan/21 03:51
Start Date: 27/Jan/21 03:51
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1897:
URL: https://github.com/apache/hive/pull/1897#discussion_r565006196



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##
@@ -4086,8 +4086,6 @@ public void testDumpWithPartitionDirMissing() throws 
IOException {
   @Test
   public void testDumpNonReplDatabase() throws IOException {

Review comment:
   can this test itself be removed?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java
##
@@ -177,11 +177,6 @@ private void initReplDump(ASTNode ast) throws 
HiveException {
 for (String dbName : Utils.matchesDb(db, dbNameOrPattern)) {
   Database database = db.getDatabase(dbName);
   if (database != null) {
-if (!isMetaDataOnly && 
!ReplChangeManager.isSourceOfReplication(database)) {

Review comment:
   Is a check needed on the load side for upgrade scenarios? Say the source 
cluster is not upgraded but the target is.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -873,6 +875,28 @@ Long bootStrapDump(Path dumpRoot, DumpMetaData dmd, Path 
cmRoot, Hive hiveDb)
   throw new HiveException("Replication dump not allowed for replicated 
database" +
   " with first incremental dump pending : " + dbName);
 }
+
+if (db != null && !HiveConf.getBoolVar(conf, REPL_DUMP_METADATA_ONLY)) 
{
+  if (!ReplChangeManager.isSourceOfReplication(db)) {
+// Check if the schedule name is available else set the query value
+// as default.
+String value = conf.get(SCHEDULED_QUERY_SCHEDULENAME,
+"default_" + getQueryState().getQueryString());
+Map<String, String> params = db.getParameters();
+if (params != null) {
+  params.put("repl.source.for", value);

Review comment:
   If repl.source.for is already set for a particular db with a policy p1 
and a new policy, say p2, is created, p2 should be appended to p1.
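The append semantics requested above can be sketched as follows. The method name, the comma separator, and the de-duplication are illustrative assumptions; only the requirement that p2 be appended to an existing p1 comes from the comment.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class ReplSourcePolicy {
    /** Returns the new repl.source.for value with policyName appended once,
     *  preserving any policies already registered on the database. */
    public static String appendPolicy(String existing, String policyName) {
        if (existing == null || existing.trim().isEmpty()) {
            return policyName;
        }
        // Preserve order and avoid duplicating an already-registered policy.
        Set<String> policies = new LinkedHashSet<>(Arrays.asList(existing.split(",")));
        policies.add(policyName);
        return String.join(",", policies);
    }
}
```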







Issue Time Tracking
---

Worklog Id: (was: 542615)
Time Spent: 0.5h  (was: 20m)

> Set repl.source.for property in the db if db is under replication
> -
>
> Key: HIVE-24674
> URL: https://issues.apache.org/jira/browse/HIVE-24674
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add the repl.source.for property to the database, if not already set, when 
> the database is under replication.





[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=542604&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542604
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 27/Jan/21 03:28
Start Date: 27/Jan/21 03:28
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r564992785



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileFormatProxy.java
##
@@ -47,12 +47,14 @@ public SplitInfos applySargToMetadata(
 OrcTail orcTail = ReaderImpl.extractFileTail(fileMetadata);
 OrcProto.Footer footer = orcTail.getFooter();
 int stripeCount = footer.getStripesCount();
-boolean writerUsedProlepticGregorian = footer.hasCalendar()
-? footer.getCalendar() == OrcProto.CalendarKind.PROLEPTIC_GREGORIAN
-: OrcConf.PROLEPTIC_GREGORIAN_DEFAULT.getBoolean(conf);
+// Always convert To PROLEPTIC_GREGORIAN

Review comment:
   Why is it OK to always use the proleptic calendar here? Could we leave a 
short explanation in the comment for when we need to revisit this code?
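For context on why the calendar choice matters: java.util.GregorianCalendar (and older ORC writers) use a hybrid Julian/Gregorian calendar, while java.time is proleptic Gregorian, so dates before the 1582 cutover shift by several days when the same instant is reinterpreted. A hedged stdlib illustration, not Hive/ORC code:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class CalendarShift {
    /** Returns how many days a hybrid-calendar date shifts when the same
     *  instant is read back in the proleptic Gregorian (ISO) calendar. */
    static long hybridToProlepticShiftDays(int year, int month, int day) {
        GregorianCalendar hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        hybrid.clear();
        hybrid.set(year, month - 1, day); // interpreted as Julian before Oct 1582
        LocalDate proleptic = hybrid.toZonedDateTime().toLocalDate(); // ISO view
        return ChronoUnit.DAYS.between(LocalDate.of(year, month, day), proleptic);
    }
}
```

Modern dates are unaffected; only pre-cutover dates shift, which is why an always-convert policy needs a written justification in the code.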

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java
##
@@ -282,6 +280,56 @@ public String toString() {
 }
   }
 
+  public static boolean[] findPresentStreamsByColumn(

Review comment:
   Can we add javadoc for these public static utility methods? If they are 
used only in this class, should we change their visibility?

##
File path: 
ql/src/test/results/clientpositive/llap/schema_evol_orc_nonvec_part_all_primitive.q.out
##
@@ -687,11 +687,11 @@ POSTHOOK: Input: 
default@part_change_various_various_timestamp_n6
 POSTHOOK: Input: default@part_change_various_various_timestamp_n6@part=1
  A masked pattern was here 
 insert_num partc1  c2  c3  c4  c5  c6  c7  
c8  c9  c10 c11 c12 b
-1011   1970-01-01 00:00:00.001 1969-12-31 23:59:59.872 NULL
1969-12-07 03:28:36.352 NULLNULLNULLNULL6229-06-28 
02:54:28.970117179   6229-06-28 02:54:28.97011   6229-06-28 02:54:28.97011  
 1950-12-18 00:00:00 original
-1021   1970-01-01 00:00:00 1970-01-01 00:00:00.127 1970-01-01 
00:00:32.767 1970-01-25 20:31:23.647 NULLNULLNULLNULL5966-07-09 
03:30:50.597 5966-07-09 03:30:50.597 5966-07-09 03:30:50.597 2049-12-18 
00:00:00 original
+1011   1970-01-01 00:00:01 1969-12-31 23:57:52 NULL
1901-12-13 20:45:52 NULLNULLNULLNULL6229-06-28 
02:54:28.970117179   6229-06-28 02:54:28.97011   6229-06-28 02:54:28.97011  
 1950-12-18 00:00:00 original

Review comment:
   This shifting of timestamp values does not seem right (or at least I cannot 
make sense of it). Could you explain what is going on here? Some of the 
shifting is significant: for those, I remember there were some backwards-
incompatible schema evolution changes in 1.6.x; it may be related to that. 
However, other shifts seem more suspicious, e.g., 1 second or ~2 minutes.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
##
@@ -2585,6 +2590,7 @@ private static TreeReader getPrimitiveTreeReader(final 
int columnIndex,
 .setColumnEncoding(columnEncoding)
 .setVectors(vectors)
 .setContext(context)
+.setIsInstant(columnType.getCategory()  == 
TypeDescription.Category.TIMESTAMP_INSTANT)

Review comment:
   As @mustafaiman mentioned, I think this should indeed always be false: 
TIMESTAMP_INSTANT is equivalent to the TIMESTAMP WITH LOCAL TIME ZONE type in 
Hive. AFAIK, support for reading/writing timestamp with local time zone in ORC 
is not implemented yet.

##
File path: ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
##
@@ -325,7 +326,7 @@ public void testReadFormat_0_11() throws Exception {
 + "binary,string1:string,middle:struct>>,list:array>,"
 + "map:map>,ts:timestamp,"
-+ "decimal1:decimal(38,18)>", readerInspector.getTypeName());
++ "decimal1:decimal(38,10)>", readerInspector.getTypeName());

Review comment:
   Change in decimal scale. Expected?

##
File path: ql/src/test/results/clientpositive/llap/orc_file_dump.q.out
##
@@ -249,15 +249,15 @@ Stripes:
   Entry 1: numHashFunctions: 4 bitCount: 6272 popCount: 182 loadFactor: 
0.029 expectedFpp: 7.090246E-7
   Stripe level merge: numHashFunctions: 4 bitCount: 6272 popCount: 1772 
loadFactor: 0.2825 expectedFpp: 0.0063713384
 Row group indices for column 9:
-  Entry 0: count: 1000 hasNull: false min: 2013-03-01 09:11:58.703 max: 
2013-03-01 09:11:58.703 positions: 0,0,0,0,0,0
-

[jira] [Work logged] (HIVE-24654) Table level replication support for Atlas metadata

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24654?focusedWorklogId=542601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542601
 ]

ASF GitHub Bot logged work on HIVE-24654:
-

Author: ASF GitHub Bot
Created on: 27/Jan/21 03:10
Start Date: 27/Jan/21 03:10
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1883:
URL: https://github.com/apache/hive/pull/1883#discussion_r564994663



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/exec/repl/TestAtlasDumpTask.java
##
@@ -242,6 +251,25 @@ public void testAtlasClientTimeouts() throws Exception {
 AtlasRestClientBuilder.ATLAS_PROPERTY_READ_TIMEOUT_IN_MS));
   }
 
+  @Test
+  public void testCreateExportRequest() throws Exception {

Review comment:
   What happens if the policy is modified to remove or change the table 
expression? Does it fall back to db-level replication?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasLoadTask.java
##
@@ -121,7 +121,7 @@ AtlasReplInfo createAtlasReplInfo() throws 
SemanticException, MalformedURLExcept
 String srcCluster = 
ReplUtils.getNonEmpty(HiveConf.ConfVars.REPL_SOURCE_CLUSTER_NAME.varname, conf, 
errorFormat);
 String tgtCluster = 
ReplUtils.getNonEmpty(HiveConf.ConfVars.REPL_TARGET_CLUSTER_NAME.varname, conf, 
errorFormat);
 AtlasReplInfo atlasReplInfo = new AtlasReplInfo(endpoint, work.getSrcDB(), 
work.getTgtDB(),
-srcCluster, tgtCluster, work.getStagingDir(), conf);
+srcCluster, tgtCluster, work.getStagingDir(), null, conf);

Review comment:
   We can have two constructors, with and without the table list.
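The two-constructor suggestion can be sketched like this; the field names are simplified stand-ins for the real AtlasReplInfo members, not its actual signature:

```java
public class AtlasReplInfoSketch {
    private final String srcDb;
    private final String tgtDb;
    private final String tableListPath; // null means db-level replication

    // Db-level replication: no table list, delegates to the full constructor.
    public AtlasReplInfoSketch(String srcDb, String tgtDb) {
        this(srcDb, tgtDb, null);
    }

    // Table-level replication: explicit table-list file path.
    public AtlasReplInfoSketch(String srcDb, String tgtDb, String tableListPath) {
        this.srcDb = srcDb;
        this.tgtDb = tgtDb;
        this.tableListPath = tableListPath;
    }

    public boolean isTableLevel() {
        return tableListPath != null;
    }
}
```

Callers on the db-level path then never mention the table list, avoiding the explicit null argument flagged in the diff.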

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/atlas/AtlasRequestBuilder.java
##
@@ -105,6 +126,50 @@ private String getQualifiedName(String clusterName, String 
srcDb) {
 return qualifiedName;
   }
 
+  private String getQualifiedName(String clusterName, String srcDB, String 
tableName) {
+String qualifiedTableName = 
String.format(QUALIFIED_NAME_HIVE_TABLE_FORMAT, srcDB, tableName);
+return getQualifiedName(clusterName,  qualifiedTableName);
+  }
+
+  private List<String> getQualifiedNames(String clusterName, String srcDb, 
Path listOfTablesFile, HiveConf conf)
+  throws SemanticException {
+List<String> qualifiedNames = new ArrayList<>();
+List<String> tableNames = getFileAsList(listOfTablesFile, conf);
+if (CollectionUtils.isEmpty(tableNames)) {
+  LOG.info("Empty file encountered: {}", listOfTablesFile);
+  return qualifiedNames;
+}
+for (String tableName : tableNames) {
+  qualifiedNames.add(getQualifiedName(clusterName, srcDb, tableName));
+}
+return qualifiedNames;
+  }
+
+  private static List<String> getFileAsList(Path listOfTablesFile, HiveConf 
conf) throws SemanticException {
+List<String> list = new ArrayList<>();
+InputStream is = null;
+try {
+  FileSystem fs = FileSystem.get(listOfTablesFile.toUri(), conf);

Review comment:
   Is anything needed for cases where the staging dir is on the source/target 
cluster, or for HA cases?
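For reference, the getQualifiedNames logic quoted in this diff can be sketched with plain strings. The "db.table@cluster" shape is an assumption mirroring what QUALIFIED_NAME_HIVE_TABLE_FORMAT appears to produce, not the verbatim constant:

```java
import java.util.ArrayList;
import java.util.List;

public class AtlasQualifiedNames {
    // Assumed Atlas formats: hive_db qualifiedName "db@cluster",
    // hive_table qualifiedName "db.table@cluster". The real constants
    // live in AtlasRequestBuilder; these are illustrative stand-ins.
    static String qualifiedName(String cluster, String entity) {
        return entity + "@" + cluster;
    }

    static String qualifiedTableName(String cluster, String db, String table) {
        return qualifiedName(cluster, db + "." + table);
    }

    /** One qualified name per table read from the table-list file. */
    static List<String> qualifiedNames(String cluster, String db, List<String> tables) {
        List<String> names = new ArrayList<>();
        for (String t : tables) {
            names.add(qualifiedTableName(cluster, db, t));
        }
        return names;
    }
}
```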







Issue Time Tracking
---

Worklog Id: (was: 542601)
Time Spent: 20m  (was: 10m)

> Table level replication support for Atlas metadata
> --
>
> Key: HIVE-24654
> URL: https://issues.apache.org/jira/browse/HIVE-24654
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24654.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Covers mainly Atlas export API payload change required to support table level 
> replication





[jira] [Commented] (HIVE-24675) Handle external table replication for HA with same NS and lazy copy.

2021-01-26 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272530#comment-17272530
 ] 

Aasha Medhi commented on HIVE-24675:


+1

> Handle external table replication for HA with same NS and lazy copy.
> 
>
> Key: HIVE-24675
> URL: https://issues.apache.org/jira/browse/HIVE-24675
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24675.01.patch, HIVE-24675.02.patch, 
> HIVE-24675.03.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-24353) performance: Refactor TimestampTZ parsing

2021-01-26 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272527#comment-17272527
 ] 

David Mollitor commented on HIVE-24353:
---

I just saw this issue in the wild.  I'll take a look at this.

The issue I am seeing involves a scenario where Hive is parsing/formatting many 
timestamp strings, more than it probably should.  However, it is compounded by 
the fact that each string it processes is in the format {{2021-01-26 
10:32:32.0}}.  This is an issue because the current parsing code expects the 
time zone to be specified, or else it fails like you said.

I believe there are some areas in the Hive code base that generate timestamp 
strings without timezone information, and those end up getting parsed here 
(slowly).

> performance: Refactor TimestampTZ parsing
> -
>
> Key: HIVE-24353
> URL: https://issues.apache.org/jira/browse/HIVE-24353
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vincenz Priesnitz
>Assignee: Vincenz Priesnitz
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I found that for datasets that contain a lot of timestamps (without 
> timezones), Hive spends the majority of its time in TimestampTZUtil.parse, in 
> particular constructing stack traces for the try-catch blocks. 
> When parsing TimestampTZ we are currently using a fallback chain with several 
> try-catch blocks. For a common timestamp string without a timezone, we 
> currently throw and catch 2 exceptions, and actually parse the string twice. 
> I propose a refactor, that parses the string once and then expresses the 
> fallback chain with queries to the parsed TemporalAccessor. 
>  
> Update: I added a PR that resolves this issue: 
> [https://github.com/apache/hive/pull/1650] 
>  
>  
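The proposed parse-once refactor can be sketched with java.time: build one formatter with optional time and zone sections, parse a single time, and query the resulting TemporalAccessor instead of throwing and catching through a fallback chain. The pattern and defaulting rules below are illustrative, not Hive's actual TimestampTZUtil code:

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;
import java.time.temporal.TemporalQueries;

public class TimestampTZParse {
    // Optional time and optional zone sections: a single pass accepts
    // "2021-01-26", "2021-01-26 10:32:32.0", and "2021-01-26 10:32:32.0 UTC".
    private static final DateTimeFormatter FMT = new DateTimeFormatterBuilder()
            .append(DateTimeFormatter.ISO_LOCAL_DATE)
            .optionalStart().appendLiteral(' ')
            .append(DateTimeFormatter.ISO_LOCAL_TIME).optionalEnd()
            .optionalStart().appendLiteral(' ').appendZoneOrOffsetId().optionalEnd()
            .toFormatter();

    static ZonedDateTime parse(String s, ZoneId defaultZone) {
        // Parse exactly once, then query the result; no exceptions are
        // thrown for the common no-zone case.
        TemporalAccessor parsed = FMT.parse(s);
        LocalDate date = LocalDate.from(parsed);
        LocalTime time = parsed.isSupported(ChronoField.HOUR_OF_DAY)
                ? LocalTime.from(parsed) : LocalTime.MIDNIGHT;
        ZoneId zone = parsed.query(TemporalQueries.zone());
        return date.atTime(time).atZone(zone != null ? zone : defaultZone);
    }
}
```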





[jira] [Work logged] (HIVE-24404) Hive getUserName close db makes client operations lost metaStoreClient connection

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24404?focusedWorklogId=542535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542535
 ]

ASF GitHub Bot logged work on HIVE-24404:
-

Author: ASF GitHub Bot
Created on: 27/Jan/21 00:55
Start Date: 27/Jan/21 00:55
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1685:
URL: https://github.com/apache/hive/pull/1685


   





Issue Time Tracking
---

Worklog Id: (was: 542535)
Time Spent: 40m  (was: 0.5h)

> Hive getUserName close db makes client operations lost metaStoreClient 
> connection
> -
>
> Key: HIVE-24404
> URL: https://issues.apache.org/jira/browse/HIVE-24404
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 2.3.7
> Environment: os: centos 7
> spark: 3.0.1
> hive: 2.3.7
>Reporter: Lichuanliang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Using Spark to execute a drop-partition SQL statement, I always encounter a 
> lost metastore connection warning.
>  Spark SQL:
> {code:java}
> alter table mydb.some_table drop if exists partition(dt = '2020-11-12',hh = 
> '17');
> {code}
> Execution log:
> {code:java}
> 20/11/12 19:37:57 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.20/11/12 19:37:57 WARN SessionState: 
> METASTORE_FILTER_HOOK will be ignored, since 
> hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.20/11/12 19:37:57 WARN RetryingMetaStoreClient: 
> MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. 
> listPartitionsWithAuthInfoorg.apache.thrift.transport.TTransportException: 
> Cannot write to null outputStream at 
> org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:142)
>  at 
> org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:185) 
> at 
> org.apache.thrift.protocol.TBinaryProtocol.writeMessageBegin(TBinaryProtocol.java:116)
>  at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:70) at 
> org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62) at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.send_get_partitions_ps_with_auth(ThriftHiveMetastore.java:2562)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_ps_with_auth(ThriftHiveMetastore.java:2549)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsWithAuthInfo(HiveMetaStoreClient.java:1209)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
>  at com.sun.proxy.$Proxy32.listPartitionsWithAuthInfo(Unknown Source) at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2336)
>  at com.sun.proxy.$Proxy32.listPartitionsWithAuthInfo(Unknown Source) at 
> org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:2555) at 
> org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:2581) at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$dropPartitions$2(HiveClientImpl.scala:628)
>  at 
> scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> scala.collection.TraversableLike.flatMap(TraversableLike.scala:245) at 
> scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)

[jira] [Work logged] (HIVE-24169) HiveServer2 UDF cache

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24169?focusedWorklogId=542537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542537
 ]

ASF GitHub Bot logged work on HIVE-24169:
-

Author: ASF GitHub Bot
Created on: 27/Jan/21 00:55
Start Date: 27/Jan/21 00:55
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1503:
URL: https://github.com/apache/hive/pull/1503


   





Issue Time Tracking
---

Worklog Id: (was: 542537)
Time Spent: 1h 50m  (was: 1h 40m)

> HiveServer2 UDF cache
> -
>
> Key: HIVE-24169
> URL: https://issues.apache.org/jira/browse/HIVE-24169
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sam An
>Assignee: Sam An
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> UDFs are cached per session. This optional feature can help speed up UDF 
> access in S3 scenarios.





[jira] [Work logged] (HIVE-24370) Make the GetPartitionsProjectionSpec generic and add builder methods for tables and partitions in HiveMetaStoreClient

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24370?focusedWorklogId=542536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542536
 ]

ASF GitHub Bot logged work on HIVE-24370:
-

Author: ASF GitHub Bot
Created on: 27/Jan/21 00:55
Start Date: 27/Jan/21 00:55
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1664:
URL: https://github.com/apache/hive/pull/1664


   





Issue Time Tracking
---

Worklog Id: (was: 542536)
Time Spent: 2h 20m  (was: 2h 10m)

> Make the GetPartitionsProjectionSpec generic and add builder methods for 
> tables and partitions in HiveMetaStoreClient
> -
>
> Key: HIVE-24370
> URL: https://issues.apache.org/jira/browse/HIVE-24370
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> HIVE-20306 defines a projection struct called GetPartitionsProjectionSpec. 
> While its name contains Partition, it is a fairly generic struct with nothing 
> specific to partitions. It should be renamed to something more generic 
> (GetProjectionSpec?), and builder methods of this class for tables and 
> partitions must be added to HiveMetaStoreClient.





[jira] [Resolved] (HIVE-24687) Consider listStatusIterator in MoveTask

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman resolved HIVE-24687.
-
Resolution: Invalid

> Consider listStatusIterator in MoveTask
> ---
>
> Key: HIVE-24687
> URL: https://issues.apache.org/jira/browse/HIVE-24687
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>






[jira] [Commented] (HIVE-24685) Remove HiveSubQRemoveRelBuilder

2021-01-26 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272455#comment-17272455
 ] 

Jesus Camacho Rodriguez commented on HIVE-24685:


https://github.com/apache/hive/pull/1878

> Remove HiveSubQRemoveRelBuilder
> ---
>
> Key: HIVE-24685
> URL: https://issues.apache.org/jira/browse/HIVE-24685
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> The class seems to be a close clone of {{RelBuilder}} created due to some 
> bugs in the original implementation. Those issues seem to be fixed now, 
> and we should be able to get rid of the copy. In the worst-case scenario, if 
> we need to keep it for the time being, we could try to make it extend 
> {{RelBuilder}} and override only necessary methods.





[jira] [Work started] (HIVE-24668) Improve FileSystem usage in dynamic partition handling

2021-01-26 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24668 started by Peter Varga.
--
> Improve FileSystem usage in dynamic partition handling
> --
>
> Key: HIVE-24668
> URL: https://issues.apache.org/jira/browse/HIVE-24668
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>
> Possible improvements:
>  * In the MoveTask process, both getFullDPSpecs and later 
> Hive::getValidPartitionsInPath do a listing for dynamic partitions in the 
> table; the result of the first call can be reused.
>  * Hive::listFilesCreatedByQuery does the recursive listing on the Hive side; 
> the FileSystem's native recursive listing should be used instead.
>  * If we add a new partition we populate the quickstats, which does another 
> listing for the new partition; the files already collected for the 
> writeNotificationlogs can be reused.
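
The second bullet, replacing Hive-side recursion with the FileSystem's native recursive listing, can be illustrated with a stdlib-only sketch; here java.nio.file.Files.walk stands in for the HDFS FileSystem.listFiles(path, true) call, and the class and helper names are invented:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PartitionListingSketch {
    // One recursive pass over the table directory, analogous to
    // FileSystem.listFiles(path, true); the returned list can then be reused
    // by later steps (quickstats, write notification logs) instead of listing
    // the partition again.
    static List<Path> listFilesRecursively(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a tiny fake table layout: one dynamic partition with two files.
    static Path sampleTable() {
        try {
            Path root = Files.createTempDirectory("table");
            Path part = Files.createDirectories(root.resolve("ds=2021-01-26"));
            Files.createFile(part.resolve("000000_0"));
            Files.createFile(part.resolve("000001_0"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Both partition files are found in a single recursive pass.
        System.out.println(listFilesRecursively(sampleTable()).size()); // 2
    }
}
```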





[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=542388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542388
 ]

ASF GitHub Bot logged work on HIVE-24445:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 20:10
Start Date: 26/Jan/21 20:10
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on a change in pull request 
#1914:
URL: https://github.com/apache/hive/pull/1914#discussion_r564800166



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/drop/DropTableDesc.java
##
@@ -29,31 +32,40 @@
  * DDL task description for DROP TABLE commands.
  */
 @Explain(displayName = "Drop Table", explainLevels = { Level.USER, 
Level.DEFAULT, Level.EXTENDED })
-public class DropTableDesc implements DDLDesc, Serializable {
+public class DropTableDesc implements DDLDescWithWriteId, Serializable {
   private static final long serialVersionUID = 1L;
 
-  private final String tableName;
+  private final TableName tableName;
   private final boolean ifExists;
   private final boolean purge;
   private final ReplicationSpec replicationSpec;
   private final boolean validationRequired;
+  private final boolean isTransactional;
 
-  public DropTableDesc(String tableName, boolean ifExists, boolean ifPurge, 
ReplicationSpec replicationSpec) {
-this(tableName, ifExists, ifPurge, replicationSpec, true);
+  private long writeId = 0;
+
+  public DropTableDesc(TableName tableName, boolean ifExists, boolean ifPurge, 
ReplicationSpec replicationSpec) {
+this(tableName, ifExists, ifPurge, replicationSpec, true, null);
   }
 
-  public DropTableDesc(String tableName, boolean ifExists, boolean purge, 
ReplicationSpec replicationSpec,
+  public DropTableDesc(TableName tableName, boolean ifExists, boolean ifPurge, 
ReplicationSpec replicationSpec,

Review comment:
   Nit: the variable name should be purge - I know, it was ifPurge before, 
but it's a great opportunity to fix it :)

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/drop/DropTableOperation.java
##
@@ -104,6 +104,7 @@ public int execute() throws HiveException {
 
 // TODO: API w/catalog name
 context.getDb().dropTable(desc.getTableName(), desc.isPurge());
+//context.getDb().dropTable(desc.getTableName(), desc.isPurge(), 
desc.getWriteId());

Review comment:
   I assume that this was not intentionally left commented out.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542388)
Time Spent: 0.5h  (was: 20m)

> Non blocking DROP table implementation
> --
>
> Key: HIVE-24445
> URL: https://issues.apache.org/jira/browse/HIVE-24445
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Implement a way to execute drop table operations in a way that doesn't have 
> to wait for currently running read operations to be finished.





[jira] [Updated] (HIVE-24687) Consider listStatusIterator in MoveTask

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman updated HIVE-24687:

Summary: Consider listStatusIterator in MoveTask  (was: moveFile should use 
listStatusIterator)

> Consider listStatusIterator in MoveTask
> ---
>
> Key: HIVE-24687
> URL: https://issues.apache.org/jira/browse/HIVE-24687
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>
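
A minimal sketch of the iterator-based listing pattern the summary refers to, using the JDK's DirectoryStream as a stand-in for Hadoop's FileSystem.listStatusIterator (the class and helper names below are invented):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class MoveTaskListingSketch {
    // Iterate over directory entries one at a time, analogous to
    // FileSystem.listStatusIterator: no full FileStatus[] is materialized up
    // front (as listStatus would do), which matters for very large directories.
    static int countEntries(Path dir) {
        int n = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                n++; // each entry is processed as it arrives
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    // A tiny sample directory with two files.
    static Path sampleDir() {
        try {
            Path dir = Files.createTempDirectory("move-task");
            Files.createFile(dir.resolve("part-0"));
            Files.createFile(dir.resolve("part-1"));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countEntries(sampleDir())); // 2
    }
}
```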






[jira] [Assigned] (HIVE-24687) moveFile should use listStatusIterator

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman reassigned HIVE-24687:
---


> moveFile should use listStatusIterator
> --
>
> Key: HIVE-24687
> URL: https://issues.apache.org/jira/browse/HIVE-24687
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>






[jira] [Work logged] (HIVE-24601) Control CBO fallback behavior via property

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24601?focusedWorklogId=542344&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542344
 ]

ASF GitHub Bot logged work on HIVE-24601:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 19:06
Start Date: 26/Jan/21 19:06
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request #1875:
URL: https://github.com/apache/hive/pull/1875


   ### What changes were proposed in this pull request?
   1. Add `hive.cbo.fallback.strategy` and plug it to the planner
   2. Replace usage of `hive.in.test` in planner with 
`hive.cbo.fallback.strategy`
   
   ### Why are the changes needed?
   1. Provide the means to fail-fast when CBO failures appear (even in 
production)
   2. Allow finer control on tests requiring to fail on CBO, legacy, or both
   
   ### Does this PR introduce _any_ user-facing change?
   Not in its current state. 
   
   I was even thinking of dropping the `CONSERVATIVE` option. The CBO error is 
always logged (not hidden), so I don't see why we should complicate the decision 
further.
   
   ### How was this patch tested?
   `mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex="cbo_fallback.*" `
   `mvn test -Dtest=TestNegativeCliDriver -Dqfile_regex="cbo_fallback.*" `
   





Issue Time Tracking
---

Worklog Id: (was: 542344)
Time Spent: 2h  (was: 1h 50m)

> Control CBO fallback behavior via property
> --
>
> Key: HIVE-24601
> URL: https://issues.apache.org/jira/browse/HIVE-24601
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When the CBO optimizer fails there is a fallback mechanism (HIVE-7413) that 
> retries processing the query using the legacy Hive optimizer. 
> There are use-cases where this behavior is not desirable, notably for the 
> tests (HIVE-16058), but also for end users who would like to disable the 
> fallback mechanism to avoid running problematic queries without realizing it.
> The goal of this issue is to introduce a dedicated Hive property controlling 
> this behavior, {{hive.cbo.fallback.enable}}, for both tests and production. 
> The default value should be true and tests should run with this property set 
> to false. 
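
A rough sketch of the control flow the proposed property would govern; the compile* methods below are invented stand-ins for the two Hive planners, not actual Hive code:

```java
public class CboFallbackSketch {
    // Hypothetical stand-in for the CBO planner: fails on some queries.
    static String compileWithCbo(String query) {
        if (query.contains("unsupported")) {
            throw new IllegalStateException("CBO failed");
        }
        return "cbo-plan:" + query;
    }

    // Hypothetical stand-in for the legacy Hive optimizer: always succeeds.
    static String compileLegacy(String query) {
        return "legacy-plan:" + query;
    }

    // When fallbackEnabled is false, a CBO failure is rethrown (fail fast);
    // otherwise the legacy optimizer is retried. This mirrors the behavior
    // the property is meant to control.
    static String compile(String query, boolean fallbackEnabled) {
        try {
            return compileWithCbo(query);
        } catch (IllegalStateException e) {
            if (!fallbackEnabled) {
                throw e; // surface the CBO failure instead of hiding it
            }
            return compileLegacy(query);
        }
    }

    public static void main(String[] args) {
        System.out.println(compile("select 1", true));
        System.out.println(compile("unsupported thing", true));
    }
}
```

In this sketch the choice is a plain boolean; a real property could instead carry named strategies (e.g. fail-fast vs. conservative) while keeping the same try/catch shape.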





[jira] [Work logged] (HIVE-24601) Control CBO fallback behavior via property

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24601?focusedWorklogId=542342&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542342
 ]

ASF GitHub Bot logged work on HIVE-24601:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 19:05
Start Date: 26/Jan/21 19:05
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #1875:
URL: https://github.com/apache/hive/pull/1875


   





Issue Time Tracking
---

Worklog Id: (was: 542342)
Time Spent: 1h 40m  (was: 1.5h)

> Control CBO fallback behavior via property
> --
>
> Key: HIVE-24601
> URL: https://issues.apache.org/jira/browse/HIVE-24601
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When the CBO optimizer fails there is a fallback mechanism (HIVE-7413) that 
> retries processing the query using the legacy Hive optimizer. 
> There are use-cases where this behavior is not desirable, notably for the 
> tests (HIVE-16058), but also for end users who would like to disable the 
> fallback mechanism to avoid running problematic queries without realizing it.
> The goal of this issue is to introduce a dedicated Hive property controlling 
> this behavior, {{hive.cbo.fallback.enable}}, for both tests and production. 
> The default value should be true and tests should run with this property set 
> to false. 





[jira] [Work logged] (HIVE-24601) Control CBO fallback behavior via property

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24601?focusedWorklogId=542343&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542343
 ]

ASF GitHub Bot logged work on HIVE-24601:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 19:05
Start Date: 26/Jan/21 19:05
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #1875:
URL: https://github.com/apache/hive/pull/1875#issuecomment-767761319


   Close/Reopen to retrigger tests.





Issue Time Tracking
---

Worklog Id: (was: 542343)
Time Spent: 1h 50m  (was: 1h 40m)

> Control CBO fallback behavior via property
> --
>
> Key: HIVE-24601
> URL: https://issues.apache.org/jira/browse/HIVE-24601
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When the CBO optimizer fails there is a fallback mechanism (HIVE-7413) that 
> retries processing the query using the legacy Hive optimizer. 
> There are use-cases where this behavior is not desirable, notably for the 
> tests (HIVE-16058), but also for end users who would like to disable the 
> fallback mechanism to avoid running problematic queries without realizing it.
> The goal of this issue is to introduce a dedicated Hive property controlling 
> this behavior, {{hive.cbo.fallback.enable}}, for both tests and production. 
> The default value should be true and tests should run with this property set 
> to false. 





[jira] [Work logged] (HIVE-24392) Send table id in get_parttions_by_names_req api

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24392?focusedWorklogId=542325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542325
 ]

ASF GitHub Bot logged work on HIVE-24392:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 18:49
Start Date: 26/Jan/21 18:49
Worklog Time Spent: 10m 
  Work Description: kishendas edited a comment on pull request #1909:
URL: https://github.com/apache/hive/pull/1909#issuecomment-767737511


   > …equest
   > 
   > ### What changes were proposed in this pull request?
   HMS thrift interface change.
   > ### Why are the changes needed?
   > Add two optional entries for thrift structure GetPartitionsByNamesRequest:
   > getFileMetadata : get file metadata or not.
   > id for table ID
   > It will be used by the metadata cache feature.
   > 
   > ### Does this PR introduce _any_ user-facing change?
   Yes, new optional members. Backward compatible for HMS server. 
   > ### How was this patch tested?
   Hive build works fine with the new interface.
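
The backward-compatibility claim above rests on Thrift's optional-field semantics: a server must check whether the field was actually set rather than reading a default. A simplified stand-in for the generated request class (the class and its isSet pattern below are illustrative, not Hive's actual generated Thrift code):

```java
public class GetPartitionsByNamesRequestSketch {
    private long id;                 // the new optional table-id field
    private boolean idSet;           // Thrift tracks presence of optional fields
    private boolean getFileMetadata; // the other new optional field

    void setId(long id) { this.id = id; this.idSet = true; }
    boolean isSetId() { return idSet; }
    long getId() { return id; }

    // An old client never calls setId; a new server must treat the field as
    // absent instead of reading a default value, which is what keeps the
    // thrift change backward compatible.
    static String describe(GetPartitionsByNamesRequestSketch req) {
        return req.isSetId() ? "validate against table id " + req.getId()
                             : "no table id sent; skip id validation";
    }

    public static void main(String[] args) {
        GetPartitionsByNamesRequestSketch oldClient = new GetPartitionsByNamesRequestSketch();
        System.out.println(describe(oldClient));

        GetPartitionsByNamesRequestSketch newClient = new GetPartitionsByNamesRequestSketch();
        newClient.setId(42L);
        System.out.println(describe(newClient));
    }
}
```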
   





Issue Time Tracking
---

Worklog Id: (was: 542325)
Time Spent: 2.5h  (was: 2h 20m)

> Send table id in get_parttions_by_names_req api
> ---
>
> Key: HIVE-24392
> URL: https://issues.apache.org/jira/browse/HIVE-24392
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Table id is not part of the get_partitions_by_names_req API thrift 
> definition; this Jira adds it.





[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=542326&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542326
 ]

ASF GitHub Bot logged work on HIVE-24445:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 18:49
Start Date: 26/Jan/21 18:49
Worklog Time Spent: 10m 
  Work Description: zchovan commented on pull request #1914:
URL: https://github.com/apache/hive/pull/1914#issuecomment-767751483


   @deniskuzZ 





Issue Time Tracking
---

Worklog Id: (was: 542326)
Time Spent: 20m  (was: 10m)

> Non blocking DROP table implementation
> --
>
> Key: HIVE-24445
> URL: https://issues.apache.org/jira/browse/HIVE-24445
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Implement a way to execute drop table operations in a way that doesn't have 
> to wait for currently running read operations to be finished.





[jira] [Updated] (HIVE-24445) Non blocking DROP table implementation

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24445:
--
Labels: pull-request-available  (was: )

> Non blocking DROP table implementation
> --
>
> Key: HIVE-24445
> URL: https://issues.apache.org/jira/browse/HIVE-24445
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement a way to execute drop table operations in a way that doesn't have 
> to wait for currently running read operations to be finished.





[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=542324&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542324
 ]

ASF GitHub Bot logged work on HIVE-24445:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 18:48
Start Date: 26/Jan/21 18:48
Worklog Time Spent: 10m 
  Work Description: zchovan opened a new pull request #1914:
URL: https://github.com/apache/hive/pull/1914


   Change-Id: I594a724ccc6c836e6a735e9ef71851150b0f0c9b
   
   
   
   ### What changes were proposed in this pull request?
   
   * added new column to the TBLS backend db table
   * updated the related JDO entities
   * added hive.txn.lockless.reads.enabled feature flag
   * added writeId to DropTableOperation/Desc
   
   ### Why are the changes needed?
   
   This is the first part of a bigger feature, "lockless reads". The main goal 
is to avoid requesting locks for read operations, thus improving ACID table 
performance.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   
   ### How was this patch tested?
   
   TBD
   





Issue Time Tracking
---

Worklog Id: (was: 542324)
Remaining Estimate: 0h
Time Spent: 10m

> Non blocking DROP table implementation
> --
>
> Key: HIVE-24445
> URL: https://issues.apache.org/jira/browse/HIVE-24445
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement a way to execute drop table operations in a way that doesn't have 
> to wait for currently running read operations to be finished.





[jira] [Work logged] (HIVE-24456) Column masking/hashing function in hive should use SH512 if FIPS mode is enabled

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24456?focusedWorklogId=542307&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542307
 ]

ASF GitHub Bot logged work on HIVE-24456:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 18:32
Start Date: 26/Jan/21 18:32
Worklog Time Spent: 10m 
  Work Description: yongzhi commented on a change in pull request #1721:
URL: https://github.com/apache/hive/pull/1721#discussion_r564739195



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4075,6 +4075,9 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "If hive (in tez mode only) cannot find a usable hive jar in 
\"hive.jar.directory\", \n" +
 "it will upload the hive jar to 
\"hive.user.install.directory/user.name\"\n" +
 "and use it to run queries."),
+HIVE_MASKING_ALGO("hive.masking.algo","sha256", "This property is used to 
indicate whether " +

Review comment:
   Did you run tests? Could you add a test for it?







Issue Time Tracking
---

Worklog Id: (was: 542307)
Time Spent: 1h  (was: 50m)

> Column masking/hashing function in hive should use SH512 if FIPS mode is 
> enabled
> 
>
> Key: HIVE-24456
> URL: https://issues.apache.org/jira/browse/HIVE-24456
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> hive-site.xml should have the following property to indicate that FIPS mode 
> is enabled.
> {code:xml}
> <property>
>   <name>hive.masking.algo</name>
>   <value>sha512</value>
> </property>
> {code}
> If this property is present, then GenericUDFMaskHash should use SHA512 
> instead of SHA256 encoding for column masking.
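
A sketch of the algorithm selection this describes, using the JDK's MessageDigest; this is not Hive's GenericUDFMaskHash implementation, just the gist of switching the digest on a config value such as hive.masking.algo:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MaskHashSketch {
    // "sha512" in the config selects SHA-512 (for FIPS deployments); anything
    // else falls back to SHA-256, the historical default of the masking UDF.
    static byte[] maskHash(String value, String algoProperty) {
        String algo = "sha512".equalsIgnoreCase(algoProperty) ? "SHA-512" : "SHA-256";
        try {
            return MessageDigest.getInstance(algo).digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // both algorithms ship with the JDK
        }
    }

    public static void main(String[] args) {
        // SHA-256 produces a 32-byte digest, SHA-512 a 64-byte one.
        System.out.println(maskHash("secret", "sha256").length); // 32
        System.out.println(maskHash("secret", "sha512").length); // 64
    }
}
```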





[jira] [Work logged] (HIVE-24601) Control CBO fallback behavior via property

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24601?focusedWorklogId=542302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542302
 ]

ASF GitHub Bot logged work on HIVE-24601:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 18:29
Start Date: 26/Jan/21 18:29
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #1875:
URL: https://github.com/apache/hive/pull/1875#issuecomment-767739852


   Can you trigger tests again? I think there was some failure?





Issue Time Tracking
---

Worklog Id: (was: 542302)
Time Spent: 1.5h  (was: 1h 20m)

> Control CBO fallback behavior via property
> --
>
> Key: HIVE-24601
> URL: https://issues.apache.org/jira/browse/HIVE-24601
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When the CBO optimizer fails there is a fallback mechanism (HIVE-7413) that 
> retries processing the query using the legacy Hive optimizer. 
> There are use-cases where this behavior is not desirable, notably for the 
> tests (HIVE-16058), but also for end users who would like to disable the 
> fallback mechanism to avoid running problematic queries without realizing it.
> The goal of this issue is to introduce a dedicated Hive property controlling 
> this behavior, {{hive.cbo.fallback.enable}}, for both tests and production. 
> The default value should be true and tests should run with this property set 
> to false. 





[jira] [Work logged] (HIVE-24392) Send table id in get_parttions_by_names_req api

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24392?focusedWorklogId=542295&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542295
 ]

ASF GitHub Bot logged work on HIVE-24392:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 18:25
Start Date: 26/Jan/21 18:25
Worklog Time Spent: 10m 
  Work Description: kishendas commented on pull request #1909:
URL: https://github.com/apache/hive/pull/1909#issuecomment-767737511


   > …equest
   > 
   > ### What changes were proposed in this pull request?
   > ### Why are the changes needed?
   > Add two optional entries for thrift structure GetPartitionsByNamesRequest:
   > getFileMetadata : get file metadata or not.
   > id for table ID
   > It will be used by the metadata cache feature.
   > 
   > ### Does this PR introduce _any_ user-facing change?
   > ### How was this patch tested?
   
   Please answer all the questions in this template. 





Issue Time Tracking
---

Worklog Id: (was: 542295)
Time Spent: 2h 20m  (was: 2h 10m)

> Send table id in get_parttions_by_names_req api
> ---
>
> Key: HIVE-24392
> URL: https://issues.apache.org/jira/browse/HIVE-24392
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Table id is not part of the get_partitions_by_names_req API thrift 
> definition; this Jira adds it.





[jira] [Commented] (HIVE-24637) Make Tez progress log interval configurable

2021-01-26 Thread Johan Gustavsson (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272311#comment-17272311
 ] 

Johan Gustavsson commented on HIVE-24637:
-

Thank you [~kgyrtkirk]!

> Make Tez progress log interval configurable
> ---
>
> Key: HIVE-24637
> URL: https://issues.apache.org/jira/browse/HIVE-24637
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 2.4.0, 3.1.2, 4.0.0
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In the case of Hive on MR we can configure how often the progress log is 
> updated on the client side with the parameter 
> [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2041-L2050]
>  while in the case of Hive on Tez this value is hard-coded here 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/RenderStrategy.java#L42].
>  The default value in Tez is also significantly shorter than that of MR (6 VS 
> 3 S), meaning that for longer queries the client log can get very long.
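
The throttling check that a configurable interval would parameterize can be sketched as follows; ProgressLogThrottle is an invented name, not the RenderStrategy class itself:

```java
public class ProgressLogThrottle {
    private final long intervalMs; // configurable, unlike the hard-coded Tez value
    private long lastLogMs;

    ProgressLogThrottle(long intervalMs) {
        this.intervalMs = intervalMs;
        this.lastLogMs = -intervalMs; // so the very first update is logged
    }

    // Emit a progress line only if the interval has elapsed since the last
    // one; this is the check a config property would parameterize instead of
    // a hard-coded constant.
    boolean shouldLog(long nowMs) {
        if (nowMs - lastLogMs >= intervalMs) {
            lastLogMs = nowMs;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ProgressLogThrottle t = new ProgressLogThrottle(3000);
        System.out.println(t.shouldLog(0));    // true: first update
        System.out.println(t.shouldLog(1000)); // false: inside the interval
        System.out.println(t.shouldLog(3001)); // true: interval elapsed
    }
}
```

Raising the interval via configuration directly shrinks the client-side log for long-running queries, which is the point of the issue.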





[jira] [Work logged] (HIVE-24601) Control CBO fallback behavior via property

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24601?focusedWorklogId=542284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542284
 ]

ASF GitHub Bot logged work on HIVE-24601:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 18:08
Start Date: 26/Jan/21 18:08
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1875:
URL: https://github.com/apache/hive/pull/1875#discussion_r564722814



##
File path: ql/src/test/queries/clientpositive/analyze_npe.q
##
@@ -1,3 +1,4 @@
+--! qt:disabled:HIVE-24656

Review comment:
   Yeap, I reenabled the test and rebased the PR.







Issue Time Tracking
---

Worklog Id: (was: 542284)
Time Spent: 1h 20m  (was: 1h 10m)

> Control CBO fallback behavior via property
> --
>
> Key: HIVE-24601
> URL: https://issues.apache.org/jira/browse/HIVE-24601
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When the CBO optimizer fails there is a fallback mechanism (HIVE-7413) that 
> retries processing the query using the legacy Hive optimizer. 
> There are use-cases where this behavior is not desirable, notably for the 
> tests (HIVE-16058), but also for end users who would like to disable the 
> fallback mechanism to avoid running problematic queries without realizing it.
> The goal of this issue is to introduce a dedicated Hive property controlling 
> this behavior, {{hive.cbo.fallback.enable}}, for both tests and production. 
> The default value should be true and tests should run with this property set 
> to false. 





[jira] [Work logged] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24584?focusedWorklogId=542276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542276
 ]

ASF GitHub Bot logged work on HIVE-24584:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 17:39
Start Date: 26/Jan/21 17:39
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #1828:
URL: https://github.com/apache/hive/pull/1828#issuecomment-767708643


   @zeroflag Just seeing this. I have very little context to the prior fix. Has 
@shameersss1 been able to review this? Thanks





Issue Time Tracking
---

Worklog Id: (was: 542276)
Time Spent: 0.5h  (was: 20m)

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following exception is coming when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}





[jira] [Resolved] (HIVE-24569) LLAP daemon leaks file descriptors/log4j appenders

2021-01-26 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24569.
-
Resolution: Fixed

merged into master. Thank you Stamatis for fixing this (and also extending the 
test infra along the way) and Prashanth for reviewing the changes!

> LLAP daemon leaks file descriptors/log4j appenders
> --
>
> Key: HIVE-24569
> URL: https://issues.apache.org/jira/browse/HIVE-24569
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: llap-appender-gc-roots.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> With HIVE-9756 query logs in LLAP are directed to different files (file per 
> query) using a Log4j2 routing appender. Without a purge policy in place, 
> appenders are created dynamically by the routing appender, one for each 
> query, and remain in memory forever. The dynamic appenders write to files so 
> each appender holds on to a file descriptor. 
> Further work HIVE-14224 has mitigated the issue by introducing a custom 
> purging policy (LlapRoutingAppenderPurgePolicy) which deletes the dynamic 
> appenders (and closes the respective files) when the query is completed 
> (org.apache.hadoop.hive.llap.daemon.impl.QueryTracker#handleLogOnQueryCompletion).
>  
> However, in the presence of multiple threads appending to the logs there are 
> race conditions. In an internal Hive cluster the number of file descriptors 
> kept going up, with approximately one descriptor leaking per query. After some 
> debugging it turns out that one thread (running the 
> QueryTracker#handleLogOnQueryCompletion) signals that the query has finished 
> and thus the purge policy should get rid of the respective appender (and 
> close the file) while another (Task-Executor-0) attempts to append another 
> log message for the same query. The initial appender is closed after the 
> request from the query tracker but a new one is created to accommodate the 
> message from the task executor and the latter is never removed thus creating 
> a leak. 
> Similar leaks have been identified and fixed for HS2 with the most similar 
> one being that described 
> [here|https://issues.apache.org/jira/browse/HIVE-22753?focusedCommentId=17021041&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17021041].
>  
> The problem depends on the timing of threads, so it may not manifest in all 
> versions between 2.2.0 and 4.0.0. Usually the leak can be seen either via 
> lsof (or other similar command) with the following output:
> {noformat}
> # 1494391 is the PID of the LLAP daemon process
> ls -ltr /proc/1494391/fd
> ...
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 978 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121724_66ce273d-54a9-4dcd-a9fb-20cb5691cef7-dag_1608659125567_0008_194.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 977 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121804_ce53eeb5-c73f-4999-b7a4-b4dd04d4e4de-dag_1608659125567_0008_197.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 974 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224122002_1693bd7d-2f0e-4673-a8d1-b7cb14a02204-dag_1608659125567_0008_204.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 989 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121909_6a56218f-06c7-4906-9907-4b6dd824b100-dag_1608659125567_0008_201.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 984 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121754_78ef49a0-bc23-478f-9a16-87fa25e7a287-dag_1608659125567_0008_196.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 983 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121855_e65b9ebf-b2ec-4159-9570-1904442b7048-dag_1608659125567_0008_200.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 981 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121818_e9051ae3-1316-46af-aabb-22c53ed2fda7-dag_1608659125567_0008_198.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 980 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121744_fcf37921-4351-4368-95ee-b5be2592d89a-dag_1608659125567_0008_195.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 979 -> 
> /hadoop/yarn/log/application_1608659125567_0006/conta

[jira] [Work logged] (HIVE-24569) LLAP daemon leaks file descriptors/log4j appenders

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24569?focusedWorklogId=542253&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542253
 ]

ASF GitHub Bot logged work on HIVE-24569:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 16:48
Start Date: 26/Jan/21 16:48
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1858:
URL: https://github.com/apache/hive/pull/1858


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542253)
Time Spent: 1h 40m  (was: 1.5h)


[jira] [Work logged] (HIVE-24673) Migrate NegativeCliDriver and NegativeMinimrCliDriver to llap

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24673?focusedWorklogId=542252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542252
 ]

ASF GitHub Bot logged work on HIVE-24673:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 16:43
Start Date: 26/Jan/21 16:43
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on pull request #1902:
URL: https://github.com/apache/hive/pull/1902#issuecomment-767671948


   @kgyrtkirk can you review this?





Issue Time Tracking
---

Worklog Id: (was: 542252)
Time Spent: 1h 20m  (was: 1h 10m)

> Migrate NegativeCliDriver and NegativeMinimrCliDriver to llap
> -
>
> Key: HIVE-24673
> URL: https://issues.apache.org/jira/browse/HIVE-24673
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> These test drivers should run on llap. Otherwise we can run into situations 
> where certain queries correctly fail on MapReduce but not on Tez.
> Also, it is better if negative cli drivers do not mask "Caused by" lines in 
> test output. Otherwise, a query may start to fail for other reasons than the 
> expected one and we do not realize it.





[jira] [Updated] (HIVE-22534) ACID: Improve Compactor thread logging

2021-01-26 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-22534:
-
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> ACID: Improve Compactor thread logging
> --
>
> Key: HIVE-22534
> URL: https://issues.apache.org/jira/browse/HIVE-22534
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Minor
> Attachments: HIVE-22534.01.patch, HIVE-22534.02.patch, 
> HIVE-22534.03.patch, HIVE-22534.04.patch, HIVE-22534.05.patch, 
> HIVE-22534.06.patch, HIVE-22534.07.patch, HIVE-22534.08.patch, 
> HIVE-22534.09.patch, HIVE-22534.10.patch, HIVE-22534.11.patch, 
> HIVE-22534.12.patch, HIVE-22534.13.patch, HIVE-22534.14.patch
>
>
> Make sure that it is easy to find issues when one of the compactor threads 
> fails.
> Maybe:
>  * MDC - with iteration / threadname - so we can easily grep the logs for a 
> given run
>  * MDC with table/partition data on which the worker is working





[jira] [Work logged] (HIVE-24664) Support column aliases in Values clause

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24664?focusedWorklogId=542237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542237
 ]

ASF GitHub Bot logged work on HIVE-24664:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 15:33
Start Date: 26/Jan/21 15:33
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1892:
URL: https://github.com/apache/hive/pull/1892#discussion_r564604533



##
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
##
@@ -151,11 +182,46 @@ expressionsNotInParenthesis[boolean isStruct, boolean 
forceStruct]
 -> {$more.tree}
 ;
 
-expressionPart[CommonTree t, boolean isStruct]
+expressionPart[CommonTree firstExprTree, boolean isStruct]
 :
 (COMMA expression)+
--> {isStruct}? ^(TOK_FUNCTION Identifier["struct"] {$t} expression+)
--> {$t} expression+
+-> {isStruct}? ^(TOK_FUNCTION Identifier["struct"] {$firstExprTree} 
expression+)
+-> {$firstExprTree} expression+
+;
+
+// Parses comma separated list of expressions with optionally specified 
aliases and store the aliases for further usage.
+// <expression> [<alias>] [, <expression> [<alias>]]
+firstExpressionsWithAlias
+@init { initAliases(); }
+:
+first=expression colAlias=identifier? (COMMA expressionWithAlias)*
+-> {colAlias != null}? ^(TOK_FUNCTION Identifier["named_struct"] { 
adaptor.create(Identifier, addAlias($colAlias.tree.getText())) } {$first.tree} 
expressionWithAlias*)

Review comment:
   I changed the rule `valuesTableConstructor` by adding an alternative: if 
there are no aliases defined for any of the values, parse the values 
clause in the legacy way, which transforms to an array of structs.
   This seems to revert lots of the q.out files.







Issue Time Tracking
---

Worklog Id: (was: 542237)
Time Spent: 1h 10m  (was: 1h)

> Support column aliases in Values clause
> ---
>
> Key: HIVE-24664
> URL: https://issues.apache.org/jira/browse/HIVE-24664
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Enable explicitly specifying column aliases in the first row of the Values 
> clause. If not all of the columns have an alias specified, generate one.
> {code:java}
> values(1, 2 b, 3 c),(4, 5, 6);
> {code}
> {code:java}
> _col1   b   c
>   1 2   3
>   4 5   6
> {code}
>  This is not a standard SQL feature, but some database engines, like 
> Impala, support it.





[jira] [Work logged] (HIVE-24564) Extend PPD filter transitivity to be able to discover new opportunities

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24564?focusedWorklogId=542234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542234
 ]

ASF GitHub Bot logged work on HIVE-24564:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 15:26
Start Date: 26/Jan/21 15:26
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1811:
URL: https://github.com/apache/hive/pull/1811#discussion_r564598717



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##
@@ -728,6 +763,166 @@ private void applyFilterTransitivity(JoinOperator join, 
int targetPos, OpWalkerI
 }
   }
 }
+
+private Set 
collectColumnsInPredicates(List predicates) {

Review comment:
   added javadoc







Issue Time Tracking
---

Worklog Id: (was: 542234)
Time Spent: 1.5h  (was: 1h 20m)

> Extend PPD filter transitivity to be able to discover new opportunities
> ---
>
> Key: HIVE-24564
> URL: https://issues.apache.org/jira/browse/HIVE-24564
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> If a predicate references a value column of one of the parent ReduceSink 
> operators of a Join, the predicate cannot be copied and pushed down to the 
> other side of the join. However, if a parent equijoin exists in the branch 
> of the RS where 
>  1. the referenced value column is a key column of that join
>  2. and the other side of that join expression is the key column of the RS
>  then the column in the predicate can be replaced and the new predicate can 
> be pushed down.
> {code:java}
>Join(... = wr_on)
>   / \
> ...  RS(key: wr_on)
>   |
>   Join(ws1.ws_on = ws2.ws_on)
>   (ws1.ws_on, ws2.ws_on, wr_on)
>   / \
>   RS(key:ws_on)  
> RS(key:ws_on)
> (value: wr_on)
>|  
>  |
>Join(ws1.ws_on = wr.wr_on)   
> TS(ws2)
>/\
>  RS(key:ws_on)  RS(key:wr_on)
>||
> TS(ws1)   TS(wr)
> {code}
> A predicate like
> {code}
> (wr_on in (...))
> {code}
> cannot be pushed to TS(ws2) because wr_on is not a key column in 
> Join(ws1.ws_on = ws2.ws_on). But we know that wr_on is equal to ws_on 
> because of the join from the left branch. 





[jira] [Work logged] (HIVE-24564) Extend PPD filter transitivity to be able to discover new opportunities

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24564?focusedWorklogId=542231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542231
 ]

ASF GitHub Bot logged work on HIVE-24564:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 15:25
Start Date: 26/Jan/21 15:25
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1811:
URL: https://github.com/apache/hive/pull/1811#discussion_r564598010



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##
@@ -712,14 +714,47 @@ private void applyFilterTransitivity(JoinOperator join, 
int targetPos, OpWalkerI
   if (!sourceAliases.contains(entry.getKey())) {
 continue;
   }
+
+  Set columnsInPredicates = null;
+  if (HiveConf.getBoolVar(owi.getParseContext().getConf(),

Review comment:
   Done

##
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##
@@ -712,14 +714,47 @@ private void applyFilterTransitivity(JoinOperator join, 
int targetPos, OpWalkerI
   if (!sourceAliases.contains(entry.getKey())) {
 continue;
   }
+
+  Set columnsInPredicates = null;
+  if (HiveConf.getBoolVar(owi.getParseContext().getConf(),
+  HiveConf.ConfVars.HIVEPPD_RECOGNIZE_COLUMN_EQUALITIES)) {
+columnsInPredicates = owi.getColumnsInPredicates().get(source);
+if (columnsInPredicates == null) {
+  columnsInPredicates = 
collectColumnsInPredicates(entry.getValue());
+  owi.getColumnsInPredicates().put(source, columnsInPredicates);
+}
+  }
+
   for (ExprNodeDesc predicate : entry.getValue()) {
 ExprNodeDesc backtrack = ExprNodeDescUtils.backtrack(predicate, 
join, source);
 if (backtrack == null) {
   continue;
 }
 ExprNodeDesc replaced = ExprNodeDescUtils.replace(backtrack, 
sourceKeys, targetKeys);
 if (replaced == null) {
-  continue;
+  if (!HiveConf.getBoolVar(owi.getParseContext().getConf(),

Review comment:
   Done







Issue Time Tracking
---

Worklog Id: (was: 542231)
Time Spent: 1h 10m  (was: 1h)


[jira] [Work logged] (HIVE-24564) Extend PPD filter transitivity to be able to discover new opportunities

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24564?focusedWorklogId=542232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542232
 ]

ASF GitHub Bot logged work on HIVE-24564:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 15:25
Start Date: 26/Jan/21 15:25
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1811:
URL: https://github.com/apache/hive/pull/1811#discussion_r564598480



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpWalkerInfo.java
##
@@ -39,11 +43,15 @@
 opToPushdownPredMap;
   private final ParseContext pGraphContext;
   private final List candidateFilterOps;
+  private final Map, Set> columnsInPredicates;

Review comment:
   Added comments







Issue Time Tracking
---

Worklog Id: (was: 542232)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (HIVE-24564) Extend PPD filter transitivity to be able to discover new opportunities

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24564?focusedWorklogId=542230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542230
 ]

ASF GitHub Bot logged work on HIVE-24564:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 15:25
Start Date: 26/Jan/21 15:25
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1811:
URL: https://github.com/apache/hive/pull/1811#discussion_r564597786



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -2461,6 +2461,10 @@ private static void 
populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 "Whether to enable predicate pushdown through windowing"),
 HIVEPPDRECOGNIZETRANSITIVITY("hive.ppd.recognizetransivity", true,
 "Whether to transitively replicate predicate filters over equijoin 
conditions."),
+
HIVEPPD_RECOGNIZE_COLUMN_EQUALITIES("hive.ppd.recognize.column.equalities", 
true,
+"When hive.ppd.recognizetransivity is true Whether traverse join 
branches to discover equal columns based" +
+" on equijoin keys and try to substitute equal columns to 
predicates " +
+"and push down to the other branch."),
 HIVEPPDREMOVEDUPLICATEFILTERS("hive.ppd.remove.duplicatefilters", true,

Review comment:
   Replaced







Issue Time Tracking
---

Worklog Id: (was: 542230)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-24564) Extend PPD filter transitivity to be able to discover new opportunities

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24564?focusedWorklogId=542229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542229
 ]

ASF GitHub Bot logged work on HIVE-24564:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 15:24
Start Date: 26/Jan/21 15:24
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1811:
URL: https://github.com/apache/hive/pull/1811#discussion_r564597536



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##
@@ -728,6 +769,134 @@ private void applyFilterTransitivity(JoinOperator join, 
int targetPos, OpWalkerI
 }
   }
 }
+
+private void extractColumnExprNodes(ExprNodeDesc exprNodeDesc, 
List<ExprNodeColumnDesc> result) {
+  if (exprNodeDesc instanceof ExprNodeColumnDesc) {
+result.add((ExprNodeColumnDesc) exprNodeDesc);
+return;
+  }
+  if (exprNodeDesc instanceof ExprNodeGenericFuncDesc) {
+for (ExprNodeDesc child : exprNodeDesc.getChildren()) {
+  extractColumnExprNodes(child, result);
+}
+  }
+}
+
+private ExprNodeDesc replaceColumnExprNodes(ExprNodeDesc exprNodeDesc, 
Map<ExprNodeDesc, ExprNodeDesc> replaceMap) {
+  if (exprNodeDesc instanceof ExprNodeColumnDesc) {
+return replaceMap.getOrDefault(exprNodeDesc, exprNodeDesc);
+  }
+  if (exprNodeDesc instanceof ExprNodeGenericFuncDesc) {
+ExprNodeGenericFuncDesc exprNodeGenericFuncDesc = 
(ExprNodeGenericFuncDesc) exprNodeDesc.clone();
+List<ExprNodeDesc> replacedChildren = new 
ArrayList<>(exprNodeDesc.getChildren().size());
+for (ExprNodeDesc child : exprNodeDesc.getChildren()) {
+  replacedChildren.add(replaceColumnExprNodes(child, replaceMap));
+}
+exprNodeGenericFuncDesc.setChildren(replacedChildren);
+return exprNodeGenericFuncDesc;
+  }
+
+  return exprNodeDesc;
+}
+
+private Map walk(Operator operator, 
List exprNodeDescList) {

Review comment:
   changed to `searchForEqualities`







Issue Time Tracking
---

Worklog Id: (was: 542229)
Time Spent: 50m  (was: 40m)



[jira] [Resolved] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-26 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga resolved HIVE-24669.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=542224&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542224
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 15:05
Start Date: 26/Jan/21 15:05
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1893:
URL: https://github.com/apache/hive/pull/1893


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542224)
Time Spent: 3h 20m  (was: 3h 10m)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24411) Make ThreadPoolExecutorWithOomHook more awareness of OutOfMemoryError

2021-01-26 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272125#comment-17272125
 ] 

Zhihua Deng commented on HIVE-24411:


Thanks a lot for the review and merge, [~kgyrtkirk]!

> Make ThreadPoolExecutorWithOomHook more awareness of OutOfMemoryError
> -
>
> Key: HIVE-24411
> URL: https://issues.apache.org/jira/browse/HIVE-24411
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Now the ThreadPoolExecutorWithOomHook invokes some OOM hooks and stops the 
> HiveServer2 in case of an OutOfMemoryError when executing tasks. The 
> exception is obtained by calling _future.get()_; however, it can never be an 
> instance of OutOfMemoryError, because the underlying throwable is wrapped 
> in an ExecutionException (refer to the method _report_ in FutureTask).
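The wrapping behavior described above is plain java.util.concurrent semantics and can be demonstrated without any Hive code:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Demonstrates why "e instanceof OutOfMemoryError" can never match on the
// exception thrown by future.get(): FutureTask wraps the task's Throwable in
// an ExecutionException (see FutureTask#report), so the OutOfMemoryError is
// only reachable through getCause(). The OOM here is constructed by hand,
// not a real heap exhaustion.
public class OomWrappingDemo {
    public static Throwable thrownByGet() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> f = pool.submit((Callable<Object>) () -> {
            throw new OutOfMemoryError("simulated");
        });
        try {
            f.get();
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        } catch (ExecutionException e) {
            return e; // the wrapper, not the OutOfMemoryError itself
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Throwable t = thrownByGet();
        System.out.println(t instanceof OutOfMemoryError);            // false
        System.out.println(t.getCause() instanceof OutOfMemoryError); // true
    }
}
```

So an OOM hook keyed on the raw exception type must inspect getCause() instead.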



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24686) Remove unnecessary HiveChar instantiation in HiveCharWritable.getStrippedValue

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24686?focusedWorklogId=542208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542208
 ]

ASF GitHub Bot logged work on HIVE-24686:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 14:34
Start Date: 26/Jan/21 14:34
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1913:
URL: https://github.com/apache/hive/pull/1913#discussion_r564556300



##
File path: serde/src/java/org/apache/hadoop/hive/serde2/io/HiveCharWritable.java
##
@@ -83,8 +84,8 @@ public Text getStrippedValue() {
   return value;
 }
 // A lot of these methods could be done more efficiently by operating on 
the Text value
-// directly, rather than converting to HiveChar.
-return new Text(getHiveChar().getStrippedValue());
+// directly, rather than converting to String
+return new Text(StringUtils.stripEnd(value.toString(), " "));

Review comment:
   value.toString() in getHiveChar() would be the expensive part, due to 
decode. Ref: HIVE-24416.
   
   Doesn't "value.toString()" in the proposed patch have the same issue? 
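One way to sidestep the decode entirely (a sketch, not the patch under review) is to trim trailing ASCII spaces on the raw UTF-8 bytes, which is safe because a 0x20 byte can never be part of a multi-byte UTF-8 sequence:

```java
import java.nio.charset.StandardCharsets;

// Sketch: strip trailing spaces directly on UTF-8 bytes, avoiding the
// expensive Text#toString() decode flagged in the review (cf. HIVE-24416).
// UTF-8 continuation and lead bytes all have the high bit set, so a byte
// equal to 0x20 is always a genuine space character.
public class StripEndBytes {
    public static byte[] stripTrailingSpaces(byte[] utf8, int len) {
        int end = len;
        while (end > 0 && utf8[end - 1] == ' ') {
            end--;
        }
        byte[] out = new byte[end];
        System.arraycopy(utf8, 0, out, 0, end);
        return out;
    }

    // Convenience wrapper for demonstration via String round-trip.
    public static String strippedString(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        return new String(stripTrailingSpaces(b, b.length), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("[" + strippedString("abc  ") + "]"); // [abc]
    }
}
```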





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542208)
Time Spent: 20m  (was: 10m)

> Remove unnecessary HiveChar instantiation in HiveCharWritable.getStrippedValue
> --
>
> Key: HIVE-24686
> URL: https://issues.apache.org/jira/browse/HIVE-24686
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24637) Make Tez progress log interval configurable

2021-01-26 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24637:

Fix Version/s: 4.0.0
 Assignee: Johan Gustavsson  (was: Zoltan Haindrich)
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

merged into master. Thank you [~johang]!

> Make Tez progress log interval configurable
> ---
>
> Key: HIVE-24637
> URL: https://issues.apache.org/jira/browse/HIVE-24637
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 2.4.0, 3.1.2, 4.0.0
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In the case of Hive on MR we can configure how often the progress log is 
> updated on the client side with the parameter 
> [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2041-L2050]
>  while in the case of Hive on Tez this value is hard-coded here 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/RenderStrategy.java#L42].
>  The default value in Tez is also significantly shorter than that of MR (6 VS 
> 3 S), meaning for longer queries the client log can get very long.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24637) Make Tez progress log interval configurable

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24637?focusedWorklogId=542196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542196
 ]

ASF GitHub Bot logged work on HIVE-24637:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 14:17
Start Date: 26/Jan/21 14:17
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1870:
URL: https://github.com/apache/hive/pull/1870


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542196)
Time Spent: 1h 40m  (was: 1.5h)

> Make Tez progress log interval configurable
> ---
>
> Key: HIVE-24637
> URL: https://issues.apache.org/jira/browse/HIVE-24637
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 2.4.0, 3.1.2, 4.0.0
>Reporter: Johan Gustavsson
>Assignee: Zoltan Haindrich
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In the case of Hive on MR we can configure how often the progress log is 
> updated on the client side with the parameter 
> [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2041-L2050]
>  while in the case of Hive on Tez this value is hard-coded here 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/RenderStrategy.java#L42].
>  The default value in Tez is also significantly shorter than that of MR (6 VS 
> 3 S), meaning for longer queries the client log can get very long.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24686) Remove unnecessary HiveChar instantiation in HiveCharWritable.getStrippedValue

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24686:
--
Labels: pull-request-available  (was: )

> Remove unnecessary HiveChar instantiation in HiveCharWritable.getStrippedValue
> --
>
> Key: HIVE-24686
> URL: https://issues.apache.org/jira/browse/HIVE-24686
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24686) Remove unnecessary HiveChar instantiation in HiveCharWritable.getStrippedValue

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24686?focusedWorklogId=542192&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542192
 ]

ASF GitHub Bot logged work on HIVE-24686:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 14:09
Start Date: 26/Jan/21 14:09
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request #1913:
URL: https://github.com/apache/hive/pull/1913


   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542192)
Remaining Estimate: 0h
Time Spent: 10m

> Remove unnecessary HiveChar instantiation in HiveCharWritable.getStrippedValue
> --
>
> Key: HIVE-24686
> URL: https://issues.apache.org/jira/browse/HIVE-24686
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24686) Remove unnecessary HiveChar instantiation in HiveCharWritable.getStrippedValue

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-24686:
---

Assignee: László Bodor

> Remove unnecessary HiveChar instantiation in HiveCharWritable.getStrippedValue
> --
>
> Key: HIVE-24686
> URL: https://issues.apache.org/jira/browse/HIVE-24686
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24411) Make ThreadPoolExecutorWithOomHook more awareness of OutOfMemoryError

2021-01-26 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24411.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~dengzh]!

> Make ThreadPoolExecutorWithOomHook more awareness of OutOfMemoryError
> -
>
> Key: HIVE-24411
> URL: https://issues.apache.org/jira/browse/HIVE-24411
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Now the ThreadPoolExecutorWithOomHook invokes some OOM hooks and stops the 
> HiveServer2 in case of an OutOfMemoryError when executing tasks. The 
> exception is obtained by calling _future.get()_; however, it can never be an 
> instance of OutOfMemoryError, because the underlying throwable is wrapped 
> in an ExecutionException (refer to the method _report_ in FutureTask).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24678) Add feature toggle to control SWO parallel edge support

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24678?focusedWorklogId=542182&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542182
 ]

ASF GitHub Bot logged work on HIVE-24678:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 13:45
Start Date: 26/Jan/21 13:45
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #1912:
URL: https://github.com/apache/hive/pull/1912


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542182)
Remaining Estimate: 0h
Time Spent: 10m

> Add feature toggle to control SWO parallel edge support
> ---
>
> Key: HIVE-24678
> URL: https://issues.apache.org/jira/browse/HIVE-24678
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I can't foresee the future - but it might give better diagnosability 
> opportunities to have a direct knob on this feature (I wanted to add it in 
> the base patch ; but eventually forgot to do so)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24678) Add feature toggle to control SWO parallel edge support

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24678:
--
Labels: pull-request-available  (was: )

> Add feature toggle to control SWO parallel edge support
> ---
>
> Key: HIVE-24678
> URL: https://issues.apache.org/jira/browse/HIVE-24678
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I can't foresee the future - but it might give better diagnosability 
> opportunities to have a direct knob on this feature (I wanted to add it in 
> the base patch ; but eventually forgot to do so)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24683) Hadoop23Shims getFileId prone to NPE for non-existing paths

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-24683:
--
Status: Patch Available  (was: In Progress)

> Hadoop23Shims getFileId prone to NPE for non-existing paths
> ---
>
> Key: HIVE-24683
> URL: https://issues.apache.org/jira/browse/HIVE-24683
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-23840 introduced the feature of reading delete deltas from LLAP cache if 
> it's available. This refactor opens an opportunity for NPE to happen:
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
> at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.(VectorizedOrcAcidRowBatchReader.java:1581){code}
> ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket 
> by looking at the bucket number (from the corresponding split), but this file 
> may not exist if no deletions happened in that particular bucket.
> Earlier this was handled by always trying to open an ORC reader on the path 
> and catching FileNotFoundException. However, after the refactor we first look 
> into the cache, and for that we try to retrieve a file ID first. This 
> entails a getFileStatus call on HDFS which returns null for non-existing 
> paths, eventually causing the NPE.
> This was later fixed by HIVE-23956; nevertheless, Hadoop23Shims.getFileId 
> should be refactored in a way that it is no longer error-prone.
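The actual shim fix is not shown in this thread; a generic sketch of the hazard and the null-to-FileNotFoundException translation (names here are hypothetical, not Hadoop23Shims' real API) looks like this:

```java
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.function.Function;

// Generic sketch of the hazard: a status lookup that returns null for missing
// paths must be checked before dereferencing, otherwise a "file id" helper
// NPEs. Translating null into FileNotFoundException restores the contract that
// callers already handle (they used to catch FNFE from the ORC reader).
public class NullSafeFileId {
    public static long getFileId(Function<String, Long> statusLookup, String path)
            throws FileNotFoundException {
        Long id = statusLookup.apply(path);        // may be null, like getFileStatus
        if (id == null) {
            throw new FileNotFoundException(path); // instead of an NPE on unboxing
        }
        return id;
    }

    public static void main(String[] args) throws FileNotFoundException {
        Function<String, Long> fakeFs = Map.of("/delta/bucket_0", 42L)::get;
        System.out.println(getFileId(fakeFs, "/delta/bucket_0")); // 42
    }
}
```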



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24683) Hadoop23Shims getFileId prone to NPE for non-existing paths

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24683?focusedWorklogId=542177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542177
 ]

ASF GitHub Bot logged work on HIVE-24683:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 13:38
Start Date: 26/Jan/21 13:38
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #1911:
URL: https://github.com/apache/hive/pull/1911


   HIVE-23840 introduced the feature of reading delete deltas from LLAP cache 
if it's available. This refactor opens an opportunity for NPE to happen:
   
   Caused by: java.lang.NullPointerException
   at 
org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
   at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
   at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
   at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
   at 
org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
   at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
   at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
   at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.(VectorizedOrcAcidRowBatchReader.java:1581)
   ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket 
by looking at the bucket number (from the corresponding split), but this file 
may not exist if no deletions happened in that particular bucket.
   
   Earlier this was handled by always trying to open an ORC reader on the path 
and catching FileNotFoundException. However, after the refactor we first look 
into the cache, and for that we try to retrieve a file ID first. This entails 
a getFileStatus call on HDFS which returns null for non-existing paths, 
eventually causing the NPE.
   
   This was later fixed by HIVE-23956; nevertheless, Hadoop23Shims.getFileId 
should be refactored in a way that it is no longer error-prone.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542177)
Remaining Estimate: 0h
Time Spent: 10m

> Hadoop23Shims getFileId prone to NPE for non-existing paths
> ---
>
> Key: HIVE-24683
> URL: https://issues.apache.org/jira/browse/HIVE-24683
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-23840 introduced the feature of reading delete deltas from LLAP cache if 
> it's available. This refactor opens an opportunity for NPE to happen:
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
> at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.(VectorizedOrcAcidRowBatchReader.java:1581){code}
> ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket 
> by looking at the bucket number (from the corresponding split), but this file 
> may not exist if no deletions happened in that particular bucket.
> Earlier this was handled by always trying to open an ORC reader on the path 
> and catching FileNotFoundException. However, after the refactor we first look 
> into the cache, and for that we try to retrieve a file ID first. This 
> entails a getFileStatus call on HDFS which returns null for non-existing 
> paths, eventually causing the NPE.
> This was later fixed by HIVE-23956; nevertheless, Hadoop23Shims.getFileId 
> should be refactored in a way that it is no longer error-prone.

[jira] [Updated] (HIVE-24683) Hadoop23Shims getFileId prone to NPE for non-existing paths

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24683:
--
Labels: pull-request-available  (was: )

> Hadoop23Shims getFileId prone to NPE for non-existing paths
> ---
>
> Key: HIVE-24683
> URL: https://issues.apache.org/jira/browse/HIVE-24683
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-23840 introduced the feature of reading delete deltas from LLAP cache if 
> it's available. This refactor opens an opportunity for NPE to happen:
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
> at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.(VectorizedOrcAcidRowBatchReader.java:1581){code}
> ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket 
> by looking at the bucket number (from the corresponding split), but this file 
> may not exist if no deletions happened in that particular bucket.
> Earlier this was handled by always trying to open an ORC reader on the path 
> and catching FileNotFoundException. However, after the refactor we first look 
> into the cache, and for that we try to retrieve a file ID first. This 
> entails a getFileStatus call on HDFS which returns null for non-existing 
> paths, eventually causing the NPE.
> This was later fixed by HIVE-23956; nevertheless, Hadoop23Shims.getFileId 
> should be refactored in a way that it is no longer error-prone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23456) Upgrade Calcite version to 1.25.0

2021-01-26 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das reassigned HIVE-23456:
--

Assignee: Soumyakanti Das  (was: Stamatis Zampetakis)

> Upgrade Calcite version to 1.25.0
> -
>
> Key: HIVE-23456
> URL: https://issues.apache.org/jira/browse/HIVE-23456
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Soumyakanti Das
>Priority: Major
> Attachments: HIVE-23456.01.patch, HIVE-23456.02.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24670) DeleteReaderValue should not allocate empty vectors for delete delta files

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita resolved HIVE-24670.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master. Thanks for the review [~pvary].

> DeleteReaderValue should not allocate empty vectors for delete delta files
> --
>
> Key: HIVE-24670
> URL: https://issues.apache.org/jira/browse/HIVE-24670
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If delete delta caching is turned off, the plain record reader inside 
> DeleteReaderValue allocates a batch with a schema that is equivalent to that 
> of an insert delta.
> This is unnecessary, as the struct part in a delete delta file is always 
> empty. In cases where we have many delete delta files (e.g. due to compaction 
> failures) and a wide table definition (e.g. 200+ cols) this puts a 
> significant amount of memory pressure on the executor, while these empty 
> structures will never be filled or otherwise utilized.
> I propose we pass an ACID schema with an empty struct part to this record 
> reader to counter this.
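A toy model of the proposal's effect (the column counts and field names below are illustrative; the real layout lives in the ORC ACID reader schema):

```java
// Toy model of the memory issue: batch allocation is proportional to the
// number of row-struct columns in the reader schema. Delete deltas carry an
// empty row struct, so an ACID schema with an empty struct part (as proposed)
// allocates only the ACID metadata vectors. Counts are illustrative only.
public class DeleteDeltaSchemaSketch {
    // e.g. operation, originalTransaction, bucket, rowId, currentTransaction
    static final int ACID_META_COLS = 5;

    public static int vectorsAllocated(int rowStructCols) {
        return ACID_META_COLS + rowStructCols;
    }

    public static void main(String[] args) {
        System.out.println(vectorsAllocated(200)); // wide insert-style schema: 205
        System.out.println(vectorsAllocated(0));   // empty struct part: 5
    }
}
```

With many delete deltas open at once, the difference between the two allocation sizes multiplies per reader, which is where the executor memory pressure comes from.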



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24670) DeleteReaderValue should not allocate empty vectors for delete delta files

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24670?focusedWorklogId=542159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542159
 ]

ASF GitHub Bot logged work on HIVE-24670:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 12:53
Start Date: 26/Jan/21 12:53
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #1894:
URL: https://github.com/apache/hive/pull/1894


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542159)
Time Spent: 20m  (was: 10m)

> DeleteReaderValue should not allocate empty vectors for delete delta files
> --
>
> Key: HIVE-24670
> URL: https://issues.apache.org/jira/browse/HIVE-24670
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If delete delta caching is turned off, the plain record reader inside 
> DeleteReaderValue allocates a batch with a schema that is equivalent to that 
> of an insert delta.
> This is unnecessary, as the struct part in a delete delta file is always 
> empty. In cases where we have many delete delta files (e.g. due to compaction 
> failures) and a wide table definition (e.g. 200+ cols) this puts a 
> significant amount of memory pressure on the executor, while these empty 
> structures will never be filled or otherwise utilized.
> I propose we pass an ACID schema with an empty struct part to this record 
> reader to counter this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-26 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272061#comment-17272061
 ] 

Zhihua Deng edited comment on HIVE-24666 at 1/26/21, 12:07 PM:
---

Added a unit test(TestVectorFilterExpressions#testCastFilter) to show that a 
SelectColumnIsTrue is needed for the cast filter, and I will recheck on this.

 


was (Author: dengzh):
 

Added a unit test(TestVectorFilterExpressions#testCastFilter) to show that a 
SelectColumnIsTrue is needed for the cast filter, and I will recheck on this.

 

> Vectorized UDFToBoolean may unable to filter rows if input is string
> 
>
> Key: HIVE-24666
> URL: https://issues.apache.org/jira/browse/HIVE-24666
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-24666.2.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we use a cast to boolean in WHERE conditions to filter rows, in vectorized 
> execution the filter is unable to filter rows. Steps to reproduce:
> {code:java}
> create table vtb (key string, value string);
> insert into table vtb values('0', 'val0'), ('false', 'valfalse'),('off', 
> 'valoff'),('no','valno'),('vk', 'valvk');
> select distinct value from vtb where cast(key as boolean); {code}
> It seems we don't generate a SelectColumnIsTrue to filter the rows if the 
> cast source type is string:
>  
> https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2995-L2996
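To make the expected filtering concrete, here is a plain-Java sketch of the SelectColumnIsTrue step the plan was missing; the false-string set mirrors the repro's values and is an assumption about Hive's exact cast semantics, not a copy of them:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the row filtering the vectorized plan was missing: after casting
// key to boolean, a SelectColumnIsTrue-style step must drop the false rows.
// The false-string set mirrors the repro's values; Hive's full rule set for
// string-to-boolean casts may differ.
public class CastBooleanFilter {
    static final Set<String> FALSE_STRINGS = Set.of("0", "false", "off", "no", "");

    public static boolean castToBoolean(String key) {
        return !FALSE_STRINGS.contains(key.toLowerCase());
    }

    public static List<String> filterTrue(List<String> keys) {
        return keys.stream()
                   .filter(CastBooleanFilter::castToBoolean)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // matches the repro: only 'vk' should survive the WHERE clause
        System.out.println(filterTrue(List.of("0", "false", "off", "no", "vk")));
    }
}
```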



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-26 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272061#comment-17272061
 ] 

Zhihua Deng edited comment on HIVE-24666 at 1/26/21, 12:07 PM:
---

Added a unit test(TestVectorFilterExpressions#testCastFilter) to show that a 
SelectColumnIsTrue is needed for the cast filter, and I will recheck on this.


was (Author: dengzh):
Added a unit test(TestVectorFilterExpressions#testCastFilter) to show that a 
SelectColumnIsTrue is needed for the cast filter, and I will recheck on this.

 

> Vectorized UDFToBoolean may unable to filter rows if input is string
> 
>
> Key: HIVE-24666
> URL: https://issues.apache.org/jira/browse/HIVE-24666
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-24666.2.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we use a boolean cast in WHERE conditions to filter rows, vectorized 
> execution fails to filter them. Steps to reproduce:
> {code:java}
> create table vtb (key string, value string);
> insert into table vtb values('0', 'val0'), ('false', 'valfalse'),('off', 
> 'valoff'),('no','valno'),('vk', 'valvk');
> select distinct value from vtb where cast(key as boolean); {code}
> It seems we don't generate a SelectColumnIsTrue to filter the rows if the 
> cast type is string:
>  
> https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2995-L2996
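The role of the missing SelectColumnIsTrue step can be sketched in plain Java. This is an illustration only: the names and the string-to-boolean semantics below are hypothetical (modeled on the values in the repro), not Hive's actual VectorExpression API. The cast expression fills a boolean output column; a separate "select where true" pass must then shrink the selected-row set, and skipping that pass leaves every row selected.

```java
import java.util.Arrays;

public class CastFilterSketch {
    // Hypothetical cast semantics for the sketch: treat "0", "false",
    // "off", "no" as false and everything else as true, matching the
    // intent of the repro's inserted values.
    static boolean[] castStringToBoolean(String[] col) {
        boolean[] out = new boolean[col.length];
        for (int i = 0; i < col.length; i++) {
            String s = col[i].toLowerCase();
            out[i] = !(s.equals("0") || s.equals("false")
                    || s.equals("off") || s.equals("no"));
        }
        return out;
    }

    // The step the report says is missing: compact the selected-row array
    // down to the rows whose boolean value is true. Without this pass the
    // boolean column is computed but every row stays selected.
    static int selectColumnIsTrue(boolean[] col, int[] selected, int size) {
        int newSize = 0;
        for (int i = 0; i < size; i++) {
            if (col[selected[i]]) {
                selected[newSize++] = selected[i];
            }
        }
        return newSize;
    }

    public static void main(String[] args) {
        String[] key = {"0", "false", "off", "no", "vk"};
        int[] selected = {0, 1, 2, 3, 4};
        boolean[] bools = castStringToBoolean(key);
        int size = selectColumnIsTrue(bools, selected, key.length);
        // Only the "vk" row should survive the filter.
        System.out.println("rows surviving the filter: " + size);
        System.out.println(Arrays.toString(Arrays.copyOf(selected, size)));
    }
}
```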





[jira] [Comment Edited] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-26 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272061#comment-17272061
 ] 

Zhihua Deng edited comment on HIVE-24666 at 1/26/21, 12:06 PM:
---

{quote} You're not wrong, this patch works for the specific bug you are 
reporting, but introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue{quote}

Added a unit test(TestVectorFilterExpressions#testCastFilter) to show that a 
SelectColumnIsTrue is needed for the cast filter, and I will recheck on this.

 


was (Author: dengzh):
{quote} 

You're not wrong, this patch works for the specific bug you are reporting, but 
introduces deeper plan changes that are not necessary.

 
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue\{quote}

Added a unit test(TestVectorFilterExpressions#testCastFilter) to show that a 
SelectColumnIsTrue is needed for the cast filter, and I will recheck on this.

 



[jira] [Comment Edited] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-26 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272061#comment-17272061
 ] 

Zhihua Deng edited comment on HIVE-24666 at 1/26/21, 12:06 PM:
---

 

Added a unit test(TestVectorFilterExpressions#testCastFilter) to show that a 
SelectColumnIsTrue is needed for the cast filter, and I will recheck on this.

 


was (Author: dengzh):
{quote} You're not wrong, this patch works for the specific bug you are 
reporting, but introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue \{quote}

Added a unit test(TestVectorFilterExpressions#testCastFilter) to show that a 
SelectColumnIsTrue is needed for the cast filter, and I will recheck on this.

 



[jira] [Comment Edited] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-26 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272061#comment-17272061
 ] 

Zhihua Deng edited comment on HIVE-24666 at 1/26/21, 12:06 PM:
---

{quote} 

You're not wrong, this patch works for the specific bug you are reporting, but 
introduces deeper plan changes that are not necessary.

 
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue{quote}

Added a unit test(TestVectorFilterExpressions#testCastFilter) to show that a 
SelectColumnIsTrue is needed for the cast filter, and I will recheck on this.

 


was (Author: dengzh):
{quote}You're not wrong, this patch works for the specific bug you are 
reporting, but introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue\{quote}

Added a unit test(TestVectorFilterExpressions#testCastFilter) to show that a 
SelectColumnIsTrue is needed for the cast filter, and I will recheck on this.

 



[jira] [Comment Edited] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-26 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272061#comment-17272061
 ] 

Zhihua Deng edited comment on HIVE-24666 at 1/26/21, 12:05 PM:
---

 
{quote}You're not wrong, this patch works for the specific bug you are 
reporting, but introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue
{quote}
{{Added a unit 
test([TestVectorFilterExpressions|https://github.com/apache/hive/pull/1890/files#diff-be207977ebc589f19de15ab322b294334b524846ae6c49aade6b01b82f35b3fc]#testCastFilter)
 to show that a SelectColumnIsTrue is needed for the cast filter, and I will 
recheck on this.}}


was (Author: dengzh):
 

 
{quote}You're not wrong, this patch works for the specific bug you are 
reporting, but introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue
{quote}
 

 

{{Added a unit 
test([TestVectorFilterExpressions|https://github.com/apache/hive/pull/1890/files#diff-be207977ebc589f19de15ab322b294334b524846ae6c49aade6b01b82f35b3fc]#testCastFilter)
 to show that a SelectColumnIsTrue is needed for the cast filter, and I will 
recheck on this.}}



[jira] [Comment Edited] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-26 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272061#comment-17272061
 ] 

Zhihua Deng edited comment on HIVE-24666 at 1/26/21, 12:05 PM:
---

{quote}You're not wrong, this patch works for the specific bug you are 
reporting, but introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue{quote}

Added a unit test(TestVectorFilterExpressions#testCastFilter) to show that a 
SelectColumnIsTrue is needed for the cast filter, and I will recheck on this.

 


was (Author: dengzh):
 
{quote}You're not wrong, this patch works for the specific bug you are 
reporting, but introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue
{quote}
{{Added a unit 
test([TestVectorFilterExpressions|https://github.com/apache/hive/pull/1890/files#diff-be207977ebc589f19de15ab322b294334b524846ae6c49aade6b01b82f35b3fc]#testCastFilter)
 to show that a SelectColumnIsTrue is needed for the cast filter, and I will 
recheck on this.}}



[jira] [Comment Edited] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-26 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272061#comment-17272061
 ] 

Zhihua Deng edited comment on HIVE-24666 at 1/26/21, 12:04 PM:
---

 

 
{quote}You're not wrong, this patch works for the specific bug you are 
reporting, but introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue
{quote}
 

 

{{Added a unit 
test([TestVectorFilterExpressions|https://github.com/apache/hive/pull/1890/files#diff-be207977ebc589f19de15ab322b294334b524846ae6c49aade6b01b82f35b3fc]#testCastFilter)
 to show that a SelectColumnIsTrue is needed for the cast filter, and I will 
recheck on this.}}


was (Author: dengzh):
 

{{{quote}}}

You're not wrong, this patch works for the specific bug you are reporting, but 
introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue

 

{{{quote}}} 

{{Added a unit 
test([TestVectorFilterExpressions|https://github.com/apache/hive/pull/1890/files#diff-be207977ebc589f19de15ab322b294334b524846ae6c49aade6b01b82f35b3fc]#testCastFilter)
 to show that a SelectColumnIsTrue is needed for the cast filter, and I will 
recheck on this.}}



[jira] [Comment Edited] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-26 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272061#comment-17272061
 ] 

Zhihua Deng edited comment on HIVE-24666 at 1/26/21, 12:04 PM:
---

 

{{{quote}}}

You're not wrong, this patch works for the specific bug you are reporting, but 
introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue

 

{{{quote}}} 

{{Added a unit 
test([TestVectorFilterExpressions|https://github.com/apache/hive/pull/1890/files#diff-be207977ebc589f19de15ab322b294334b524846ae6c49aade6b01b82f35b3fc]#testCastFilter)
 to show that a SelectColumnIsTrue is needed for the cast filter, and I will 
recheck on this.}}


was (Author: dengzh):
 {

You're not wrong, this patch works for the specific bug you are reporting, but 
introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue

 

} 

{{Added a unit 
test([TestVectorFilterExpressions|https://github.com/apache/hive/pull/1890/files#diff-be207977ebc589f19de15ab322b294334b524846ae6c49aade6b01b82f35b3fc]#testCastFilter)
 to show that a SelectColumnIsTrue is needed for the cast filter, and I will 
recheck on this.}}



[jira] [Commented] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-26 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272061#comment-17272061
 ] 

Zhihua Deng commented on HIVE-24666:


{{{quote}}}{{ }}

You're not wrong, this patch works for the specific bug you are reporting, but 
introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue

{{}}

{{{quote}}}

{{Added a unit 
test([TestVectorFilterExpressions|https://github.com/apache/hive/pull/1890/files#diff-be207977ebc589f19de15ab322b294334b524846ae6c49aade6b01b82f35b3fc]#testCastFilter)
 to show that a SelectColumnIsTrue is needed for the cast filter, and I will 
recheck on this.}}



[jira] [Comment Edited] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-26 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272061#comment-17272061
 ] 

Zhihua Deng edited comment on HIVE-24666 at 1/26/21, 12:03 PM:
---

 {

You're not wrong, this patch works for the specific bug you are reporting, but 
introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue

 

} 

{{Added a unit 
test([TestVectorFilterExpressions|https://github.com/apache/hive/pull/1890/files#diff-be207977ebc589f19de15ab322b294334b524846ae6c49aade6b01b82f35b3fc]#testCastFilter)
 to show that a SelectColumnIsTrue is needed for the cast filter, and I will 
recheck on this.}}


was (Author: dengzh):
{{{quote}}}{{ }}

You're not wrong, this patch works for the specific bug you are reporting, but 
introduces deeper plan changes that are not necessary.
{code:java}
FilterExprOrExpr(children: SelectColumnIsTrue(col 4:boolean)(children: 
CastLongToBooleanViaLongToLong(col 2:int) -> 4:boolean), SelectColumnIsTrue(col 
5:boolean)(children: CastStringToBoolean(col 0) -> 5:boolean))
{code}
Would have been correct even without the SelectColumnIsTrue

{{}}

{{{quote}}}

{{Added a unit 
test([TestVectorFilterExpressions|https://github.com/apache/hive/pull/1890/files#diff-be207977ebc589f19de15ab322b294334b524846ae6c49aade6b01b82f35b3fc]#testCastFilter)
 to show that a SelectColumnIsTrue is needed for the cast filter, and I will 
recheck on this.}}



[jira] [Work logged] (HIVE-24411) Make ThreadPoolExecutorWithOomHook more awareness of OutOfMemoryError

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24411?focusedWorklogId=542132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542132
 ]

ASF GitHub Bot logged work on HIVE-24411:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 11:36
Start Date: 26/Jan/21 11:36
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1695:
URL: https://github.com/apache/hive/pull/1695


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542132)
Time Spent: 1h 10m  (was: 1h)

> Make ThreadPoolExecutorWithOomHook more awareness of OutOfMemoryError
> -
>
> Key: HIVE-24411
> URL: https://issues.apache.org/jira/browse/HIVE-24411
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently ThreadPoolExecutorWithOomHook invokes the OOM hooks and stops 
> HiveServer2 when an OutOfMemoryError occurs while executing a task. The 
> exception is obtained by calling _future.get()_, but that exception is never 
> an instance of OutOfMemoryError: the error is wrapped in an 
> ExecutionException, as the _report_ method in FutureTask shows.
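A minimal standalone sketch of that behavior (plain JDK, no Hive classes): the OutOfMemoryError thrown inside the task surfaces from future.get() only as the cause of an ExecutionException, so an instanceof check on the caught exception itself never fires; the hook must inspect getCause().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OomWrappingDemo {
    // True only when the failure surfaced by future.get() carries an
    // OutOfMemoryError as its cause -- the check the OOM hook needs.
    static boolean causedByOom(Throwable t) {
        return t instanceof ExecutionException
                && t.getCause() instanceof OutOfMemoryError;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<Void> failing = () -> {
            throw new OutOfMemoryError("simulated");  // never escapes as-is
        };
        Throwable seen = null;
        try {
            pool.submit(failing).get();
        } catch (Throwable t) {
            seen = t;  // what a naive afterExecute-style check would inspect
        } finally {
            pool.shutdown();
        }
        if (seen == null) {
            throw new AssertionError("task was expected to fail");
        }
        // FutureTask.report wraps the Throwable, so get() never throws
        // the Error directly.
        if (seen instanceof OutOfMemoryError) {
            throw new AssertionError("unreachable: the Error never surfaces directly");
        }
        // The correct check unwraps the cause first.
        if (!causedByOom(seen)) {
            throw new AssertionError("expected ExecutionException caused by OutOfMemoryError");
        }
        System.out.println("OOM detected via getCause()");
    }
}
```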





[jira] [Work logged] (HIVE-24636) Memory leak due to stacking UDFClassLoader in Apache Commons LogFactory

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24636?focusedWorklogId=542129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542129
 ]

ASF GitHub Bot logged work on HIVE-24636:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 11:28
Start Date: 26/Jan/21 11:28
Worklog Time Spent: 10m 
  Work Description: zmatyus opened a new pull request #1910:
URL: https://github.com/apache/hive/pull/1910


   ### What changes were proposed in this pull request?
   
   When a class loader is closed, it should also be released from 
`org.apache.commons.logging.LogFactory#factories`, where it is used as a key.
   
   ### Why are the changes needed?
   
   The current implementation has a slow but steady memory leak.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542129)
Remaining Estimate: 0h
Time Spent: 10m

> Memory leak due to stacking UDFClassLoader in Apache Commons LogFactory
> ---
>
> Key: HIVE-24636
> URL: https://issues.apache.org/jira/browse/HIVE-24636
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0
>Reporter: dohongdayi
>Priority: Major
> Attachments: HIVE-24636.1.patch.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Much the same as [HIVE-7563|https://issues.apache.org/jira/browse/HIVE-7563]: 
> after a ClassLoader is closed in JavaUtils, it should also be released from 
> Apache Commons LogFactory; otherwise the ClassLoader can never be garbage 
> collected, which leads to the memory leak we hit in production.
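The leak pattern can be sketched without commons-logging on the classpath. The static map below is a simplified stand-in for LogFactory#factories (the real class has more machinery); the fix mirrors calling LogFactory.release(classLoader) next to the ClassLoader close.

```java
import java.util.HashMap;
import java.util.Map;

public class LoaderCacheLeakDemo {
    // Stand-in for org.apache.commons.logging.LogFactory#factories: a static
    // map keyed by ClassLoader keeps every loader strongly reachable.
    static final Map<ClassLoader, Object> factories = new HashMap<>();

    static ClassLoader newUdfLoader() {
        // Anonymous subclass standing in for Hive's UDFClassLoader.
        return new ClassLoader(LoaderCacheLeakDemo.class.getClassLoader()) {};
    }

    public static void main(String[] args) {
        ClassLoader udfLoader = newUdfLoader();
        factories.put(udfLoader, new Object());  // cached on first log lookup

        // Merely closing/discarding the loader is not enough: this entry
        // still pins it, so the loader (and every class it defined) can
        // never be garbage collected.
        System.out.println("pinned loaders before release: " + factories.size());

        // The fix: drop the cache entry when the loader is closed.
        factories.remove(udfLoader);
        System.out.println("pinned loaders after release: " + factories.size());
    }
}
```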





[jira] [Updated] (HIVE-24636) Memory leak due to stacking UDFClassLoader in Apache Commons LogFactory

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24636:
--
Labels: pull-request-available  (was: )



[jira] [Work started] (HIVE-24670) DeleteReaderValue should not allocate empty vectors for delete delta files

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24670 started by Ádám Szita.
-
> DeleteReaderValue should not allocate empty vectors for delete delta files
> --
>
> Key: HIVE-24670
> URL: https://issues.apache.org/jira/browse/HIVE-24670
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If delete delta caching is turned off, the plain record reader inside 
> DeleteReaderValue allocates a batch with a schema that is equivalent to that 
> of an insert delta.
> This is unnecessary as the struct part in a delete delta file is always 
> empty. In cases where we have many delete delta files (e.g. due to compaction 
> failures) and a wide table definition (e.g. 200+ cols) this puts a 
> significant amount of memory pressure on the executor, while these empty 
> structures will never be filled or otherwise utilized.
> I propose we specify an ACID schema with an empty struct part to this record 
> reader to counter this.
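A back-of-the-envelope sketch of the memory argument, using illustrative numbers and a plain long[] per column rather than Hive's VectorizedRowBatch: with an insert-delta schema, every delete-delta reader pre-allocates one batch-sized array per user column, even though the delete delta's row struct is always empty.

```java
public class DeleteDeltaBatchSketch {
    static final int BATCH_SIZE = 1024;

    // Each "column vector" pre-allocates a fixed-size array, roughly the
    // way a vectorized batch column does.
    static long[][] allocateBatch(int numCols) {
        long[][] cols = new long[numCols][];
        for (int i = 0; i < numCols; i++) {
            cols[i] = new long[BATCH_SIZE];
        }
        return cols;
    }

    static long cells(long[][] batch) {
        long n = 0;
        for (long[] col : batch) {
            n += col.length;
        }
        return n;
    }

    public static void main(String[] args) {
        int acidMetaCols = 5;  // operation, orig txn, bucket, rowId, current txn
        int userCols = 200;    // a wide table definition

        // Schema with an empty row struct: only the ACID metadata columns.
        long[][] deleteDeltaBatch = allocateBatch(acidMetaCols);
        // Insert-delta schema: drags in every user column for nothing.
        long[][] insertStyleBatch = allocateBatch(acidMetaCols + userCols);

        System.out.println("cells per delete-delta batch: " + cells(deleteDeltaBatch));
        System.out.println("cells per insert-style batch: " + cells(insertStyleBatch));
    }
}
```

Multiplied across many delete delta files and many readers, the difference is what puts pressure on the executor.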





[jira] [Updated] (HIVE-24683) Hadoop23Shims getFileId prone to NPE for non-existing paths

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-24683:
--
Description: 
HIVE-23840 introduced the feature of reading delete deltas from LLAP cache if 
it's available. This refactor opens an opportunity for NPE to happen:
{code:java}
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
at 
org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581){code}
ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket 
from the bucket number (taken from the corresponding split), but this file may 
not exist if no deletions happened in that particular bucket.

Earlier this was handled by always trying to open an ORC reader on the path and 
catching the resulting FileNotFoundException. After the refactoring, however, 
we first look into the cache, and to do so we must retrieve a file ID. This 
entails a getFileStatus call on HDFS, which returns null for non-existing paths 
and eventually causes the NPE.

This was later fixed by HIVE-23956; nevertheless, Hadoop23Shims.getFileId should 
be refactored so that it is no longer error-prone.

  was:
HIVE-23840 introduced the feature of reading delete deltas from LLAP cache if 
it's available. This refactor opens an opportunity for NPE to happen:
{code:java}
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
at 
org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581){code}
ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket by 
looking at the bucket number (from the corresponding split) but this file may 
not exist if no deletion happen from that particular bucket.

Earlier this was handled by always trying to open an ORC reader on the path and 
catching FileNotFoundException. However in the refactor we first try to look 
into the cache, and for that try to retrieve a file ID first. This entails a 
getFileStatus call on HDFS which returns null for non-existing paths, causing 
the NPE eventually.

This needs to be wrapped around by a null check in Hadoop23Shims..


> Hadoop23Shims getFileId prone to NPE for non-existing paths
> ---
>
> Key: HIVE-24683
> URL: https://issues.apache.org/jira/browse/HIVE-24683
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> HIVE-23840 introduced the feature of reading delete deltas from LLAP cache if 
> it's available. This refactor opens an opportunity for NPE to happen:
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
> at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchR

[jira] [Updated] (HIVE-24683) Hadoop23Shims getFileId prone to NPE for non-existing paths

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-24683:
--
Summary: Hadoop23Shims getFileId prone to NPE for non-existing paths  (was: 
NPE in Hadoop23Shims due to non-existing delete delta paths)

> Hadoop23Shims getFileId prone to NPE for non-existing paths
> ---
>
> Key: HIVE-24683
> URL: https://issues.apache.org/jira/browse/HIVE-24683
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> HIVE-23840 introduced the feature of reading delete deltas from LLAP cache if 
> it's available. This refactor opens an opportunity for NPE to happen:
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
> at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581){code}
> ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket 
> from the bucket number (taken from the corresponding split), but this file 
> may not exist if no deletions happened in that particular bucket.
> Earlier this was handled by always trying to open an ORC reader on the path 
> and catching the resulting FileNotFoundException. After the refactoring, 
> however, we first look into the cache, and to do so we must retrieve a file 
> ID. This entails a getFileStatus call on HDFS, which returns null for 
> non-existing paths and eventually causes the NPE.
> This needs to be wrapped in a null check in Hadoop23Shims.
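A minimal sketch of the null-safe shape such a refactor could take. The FileSystemLike and FileStatusLike stand-ins are hypothetical (the real code uses org.apache.hadoop.fs.FileSystem and FileStatus), but the pattern is the point: translate a null status into the FileNotFoundException that callers already handle, instead of letting it surface later as an NPE:

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class GetFileIdSketch {
    // Hypothetical stand-ins for org.apache.hadoop.fs.FileSystem / FileStatus.
    interface FileSystemLike { FileStatusLike getFileStatus(String path) throws IOException; }
    interface FileStatusLike { long getFileId(); }

    // Null-safe variant: a null status becomes a FileNotFoundException,
    // matching the behavior of the pre-refactor ORC-reader code path.
    static long getFileId(FileSystemLike fs, String path) throws IOException {
        FileStatusLike status = fs.getFileStatus(path);
        if (status == null) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        return status.getFileId();
    }

    public static void main(String[] args) throws IOException {
        // Fake filesystem: only "/exists" has a status (with file ID 42).
        FileSystemLike fs = p -> p.equals("/exists") ? () -> 42L : null;
        System.out.println(getFileId(fs, "/exists")); // prints 42
        try {
            getFileId(fs, "/missing");
        } catch (FileNotFoundException e) {
            System.out.println("caught FileNotFoundException, not an NPE");
        }
    }
}
```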





[jira] [Work logged] (HIVE-24411) Make ThreadPoolExecutorWithOomHook more awareness of OutOfMemoryError

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24411?focusedWorklogId=542096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542096
 ]

ASF GitHub Bot logged work on HIVE-24411:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 09:38
Start Date: 26/Jan/21 09:38
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1695:
URL: https://github.com/apache/hive/pull/1695#issuecomment-767422612


   > sorry @dengzhhu653 ; I've missed your initial ping
   
   Thank you very much for the review 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 542096)
Time Spent: 1h  (was: 50m)

> Make ThreadPoolExecutorWithOomHook more awareness of OutOfMemoryError
> -
>
> Key: HIVE-24411
> URL: https://issues.apache.org/jira/browse/HIVE-24411
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, ThreadPoolExecutorWithOomHook invokes the OOM hooks and stops 
> HiveServer2 when an OutOfMemoryError occurs while executing tasks. The 
> exception is obtained by calling _future.get()_; however, it will never be an 
> instance of OutOfMemoryError, because FutureTask wraps the task's Throwable 
> in an ExecutionException (see the _report_ method in FutureTask), so the hook 
> must inspect the cause instead.
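The wrapping behavior is easy to reproduce with a plain ExecutorService (a self-contained sketch of the failure mode, not Hive's actual hook code): the Throwable that reaches the caller of future.get() is an ExecutionException, so an instanceof OutOfMemoryError check must be applied to its cause:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OomWrapDemo {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Simulate a task dying with an OutOfMemoryError.
        Future<?> f = pool.submit((Runnable) () -> { throw new OutOfMemoryError("simulated"); });
        Throwable caught = null;
        try {
            f.get();
        } catch (Throwable t) {
            caught = t;
        } finally {
            pool.shutdown();
        }
        // FutureTask#report wraps the task's Throwable in an ExecutionException,
        // so the caught exception itself is never an OutOfMemoryError.
        System.out.println(caught instanceof OutOfMemoryError);            // false
        System.out.println(caught instanceof ExecutionException);          // true
        System.out.println(caught.getCause() instanceof OutOfMemoryError); // true
    }
}
```

This is why an OOM hook that tests the caught exception directly never fires; unwrapping via getCause() is required.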





[jira] [Work logged] (HIVE-24411) Make ThreadPoolExecutorWithOomHook more awareness of OutOfMemoryError

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24411?focusedWorklogId=542070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542070
 ]

ASF GitHub Bot logged work on HIVE-24411:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 08:19
Start Date: 26/Jan/21 08:19
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1695:
URL: https://github.com/apache/hive/pull/1695#issuecomment-767379994


   sorry @dengzhhu653 ; I've missed your initial ping





Issue Time Tracking
---

Worklog Id: (was: 542070)
Time Spent: 50m  (was: 40m)

> Make ThreadPoolExecutorWithOomHook more awareness of OutOfMemoryError
> -
>
> Key: HIVE-24411
> URL: https://issues.apache.org/jira/browse/HIVE-24411
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, ThreadPoolExecutorWithOomHook invokes the OOM hooks and stops 
> HiveServer2 when an OutOfMemoryError occurs while executing tasks. The 
> exception is obtained by calling _future.get()_; however, it will never be an 
> instance of OutOfMemoryError, because FutureTask wraps the task's Throwable 
> in an ExecutionException (see the _report_ method in FutureTask), so the hook 
> must inspect the cause instead.


