[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808969
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 15/Sep/22 05:06
Start Date: 15/Sep/22 05:06
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3559:
URL: https://github.com/apache/hive/pull/3559#issuecomment-1247586713

   Kudos, SonarCloud Quality Gate passed!
   (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3559)

   1 Bug (rating C)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   9 Code Smells (rating A)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 808969)
Time Spent: 6h 40m  (was: 6.5h)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and 
> deletes. Note the "delta" and "delete_delta" folders that were created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> 

[jira] [Updated] (HIVE-21508) ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer

2022-09-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated HIVE-21508:
--
Fix Version/s: 3.1.3

> ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer
> --
>
> Key: HIVE-21508
> URL: https://issues.apache.org/jira/browse/HIVE-21508
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 2.3.4, 3.2.0
>Reporter: Adar Dembo
>Assignee: Ana Jalba
>Priority: Major
> Fix For: 2.3.7, 2.4.0, 3.1.3, 3.2.0, 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-21508.1.patch, HIVE-21508.2.branch-2.3.patch, 
> HIVE-21508.3.branch-2.patch, HIVE-21508.4.branch-3.1.patch, 
> HIVE-21508.5.branch-3.1.patch, HIVE-21508.6.branch-3.patch, HIVE-21508.patch
>
>
> There's this block of code in {{HiveMetaStoreClient:resolveUris}} (called 
> from the constructor) on master:
> {noformat}
>   private URI metastoreUris[];
>   ...
>   if (MetastoreConf.getVar(conf, 
> ConfVars.THRIFT_URI_SELECTION).equalsIgnoreCase("RANDOM")) {
> List<URI> uriList = Arrays.asList(metastoreUris);
> Collections.shuffle(uriList);
> metastoreUris = (URI[]) uriList.toArray();
>   }
> {noformat}
> The cast to {{URI[]}} throws a {{ClassCastException}} beginning with JDK 10, 
> possibly with JDK 9 as well. Note that {{THRIFT_URI_SELECTION}} defaults to 
> {{RANDOM}} so this should affect anyone who creates a 
> {{HiveMetaStoreClient}}. On master this can be overridden with {{SEQUENTIAL}} 
> to avoid the broken case; I'm working against 2.3.4 where there's no such 
> workaround.
> [Here's|https://stackoverflow.com/questions/51372788/array-cast-java-8-vs-java-9]
>  a StackOverflow post that explains the issue in more detail. Interestingly, 
> the author described the issue in the context of the HMS; not sure why there 
> was no follow up with a Hive bug report.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-22013) "Show table extended" query fails with Wrong FS error for partition in customized location

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22013?focusedWorklogId=808913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808913
 ]

ASF GitHub Bot logged work on HIVE-22013:
-

Author: ASF GitHub Bot
Created on: 15/Sep/22 00:25
Start Date: 15/Sep/22 00:25
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #3231:
URL: https://github.com/apache/hive/pull/3231#issuecomment-1247430251

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 808913)
Time Spent: 1h  (was: 50m)

> "Show table extended" query fails with Wrong FS error for partition in 
> customized location
> --
>
> Key: HIVE-22013
> URL: https://issues.apache.org/jira/browse/HIVE-22013
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Ganesha Shreedhara
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In some of the `show table extended` statements, the following codepath is invoked:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L421]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L449]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L468]
> 1. Not sure why this invokes stats computation. Should this be removed?
>  2. Even if #1 is needed, it would be broken when {{tblPath}} and 
> {{partitionPaths}} are different (i.e. when they are on different 
> filesystems or configured via a router etc.).
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://xyz/blah/tables/location/, expected: hdfs://zzz..
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:698)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:763)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.hive.ql.metadata.formatting.TextMetaDataFormatter.writeFileSystemStats(TextMetaDataFormatter.java
> {noformat}
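The "Wrong FS" check that fails in the stack trace above boils down to a scheme/authority comparison between a path and the FileSystem it is listed through. A minimal stand-alone sketch (the names `belongsTo` and `fsUri` are illustrative, not Hadoop's real API):

```java
import java.net.URI;

public class WrongFsCheck {
    // Sketch of the comparison behind FileSystem.checkPath: a path is only
    // valid for a FileSystem whose URI scheme and authority both match.
    static boolean belongsTo(URI fsUri, URI path) {
        return fsUri.getScheme().equalsIgnoreCase(path.getScheme())
            && fsUri.getAuthority().equalsIgnoreCase(path.getAuthority());
    }

    public static void main(String[] args) {
        URI tableFs = URI.create("hdfs://zzz");  // FS resolved from tblPath
        URI partition = URI.create("hdfs://xyz/blah/tables/location/");
        // false: listing this partition through the table's FS raises "Wrong FS"
        System.out.println(belongsTo(tableFs, partition)); // prints false
    }
}
```

Consistent with this, the fix direction for #2 would be to resolve a FileSystem per partition path (`partPath.getFileSystem(conf)`) instead of reusing the one obtained from {{tblPath}}.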



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26535) Iceberg: Support adding parquet compression type via Table properties

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26535?focusedWorklogId=808906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808906
 ]

ASF GitHub Bot logged work on HIVE-26535:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 23:48
Start Date: 14/Sep/22 23:48
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3597:
URL: https://github.com/apache/hive/pull/3597#issuecomment-1247409805

   Kudos, SonarCloud Quality Gate passed!
   (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3597)

   1 Bug (rating C)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   8 Code Smells (rating A)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 808906)
Time Spent: 20m  (was: 10m)

> Iceberg: Support adding parquet compression type via Table properties
> -
>
> Key: HIVE-26535
> URL: https://issues.apache.org/jira/browse/HIVE-26535
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As of now, the parquet compression format gets ignored for Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808904&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808904
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 23:47
Start Date: 14/Sep/22 23:47
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971391049


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,28 +104,43 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    int bucketId = AcidUtils.parseBucketId(path);
+    AcidUtils.ParsedDeltaLight parentDelta = AcidUtils.ParsedDeltaLight.parse(getPath().getParent());

Review Comment:
   reverted to previous version without ParsedDeltaLight.parse(path.getParent())





Issue Time Tracking
---

Worklog Id: (was: 808904)
Time Spent: 6.5h  (was: 6h 20m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and 
> deletes. Note the "delta" and "delete_delta" folders that were created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_007_007_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808903&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808903
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 23:23
Start Date: 14/Sep/22 23:23
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971375969


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds(
+        filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf));
     this.projColsUncompressedSize = projectedDataSize <= 0 ? length : projectedDataSize;
     // setting file length to Long.MAX_VALUE will let orc reader read file length from file system
     this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen;
     this.syntheticAcidProps = syntheticAcidProps;
   }

+  /**
+   * For every split we want to filter out the delete deltas that contain events that happened only
+   * in the past relative to the split.
+   * @param deltas
+   * @param conf
+   * @return
+   */
+  protected List<AcidInputFormat.DeltaMetaData> filterDeleteDeltasByWriteIds(
+      List<AcidInputFormat.DeltaMetaData> deltas, Configuration conf) throws IOException {
+
+    AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+        AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment:
   root path could be either delta or base and we need to check if deleteDeltas 
are applicable to them. the prev version was actually correct 
   
   long minWriteId = !deltas.isEmpty() ?
       AcidUtils.parseBaseOrDeltaBucketFilename(path, null).getMinimumWriteId() : -1;
   this.deltas.addAll(
       deltas.stream()
           .filter(delta -> isQualifiedDeleteDeltasByWriteIds(delta, minWriteId))
           .flatMap(delta -> filterDeltasByBucketId(delta, bucketId))
           .collect(Collectors.toList()));
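The intent of the write-id filter under discussion can be sketched without any Hive types. `DeltaMeta` and `filterByWriteIds` below are made-up stand-ins for `AcidInputFormat.DeltaMetaData` and the qualification check, not the PR's actual code:

```java
import java.util.List;
import java.util.stream.Collectors;

public class DeleteDeltaFilter {
    // Hypothetical stand-in for AcidInputFormat.DeltaMetaData: just the
    // write-id range a delete_delta directory covers.
    record DeltaMeta(long minWriteId, long maxWriteId) {}

    // Keep only delete deltas that could affect a split whose data was written
    // at splitMinWriteId or later; deletes that all happened strictly before
    // that write id cannot touch the split's rows, so scanning them is wasted work.
    static List<DeltaMeta> filterByWriteIds(List<DeltaMeta> deltas, long splitMinWriteId) {
        return deltas.stream()
                .filter(d -> d.maxWriteId() >= splitMinWriteId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DeltaMeta> deleteDeltas = List.of(
                new DeltaMeta(2, 2), new DeltaMeta(5, 5), new DeltaMeta(9, 9));
        // For a split read from delta_005_005 (min write id 5), the delete
        // delta at write id 2 is irrelevant; two of the three deltas survive.
        System.out.println(filterByWriteIds(deleteDeltas, 5).size()); // prints 2
    }
}
```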
   





Issue Time Tracking
---

Worklog Id: (was: 808903)
Time Spent: 6h 20m  (was: 6h 10m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and 
> deletes. Note the "delta" and "delete_delta" folders that were created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808902&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808902
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 23:21
Start Date: 14/Sep/22 23:21
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971375969


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds(
+        filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf));
     this.projColsUncompressedSize = projectedDataSize <= 0 ? length : projectedDataSize;
     // setting file length to Long.MAX_VALUE will let orc reader read file length from file system
     this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen;
     this.syntheticAcidProps = syntheticAcidProps;
   }

+  /**
+   * For every split we want to filter out the delete deltas that contain events that happened only
+   * in the past relative to the split.
+   * @param deltas
+   * @param conf
+   * @return
+   */
+  protected List<AcidInputFormat.DeltaMetaData> filterDeleteDeltasByWriteIds(
+      List<AcidInputFormat.DeltaMetaData> deltas, Configuration conf) throws IOException {
+
+    AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+        AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment:
   root path could be anything: delta/base. the prev version was actually 
correct 
   
   long minWriteId = !deltas.isEmpty() ?
       AcidUtils.parseBaseOrDeltaBucketFilename(path, null).getMinimumWriteId() : -1;
   this.deltas.addAll(
       deltas.stream()
           .filter(delta -> isQualifiedDeleteDeltasByWriteIds(delta, minWriteId))
           .flatMap(delta -> filterDeltasByBucketId(delta, bucketId))
           .collect(Collectors.toList()));
   





Issue Time Tracking
---

Worklog Id: (was: 808902)
Time Spent: 6h 10m  (was: 6h)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and 
> deletes. Note the "delta" and "delete_delta" folders that were created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808901&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808901
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 23:19
Start Date: 14/Sep/22 23:19
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971375969


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds(
+        filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf));
     this.projColsUncompressedSize = projectedDataSize <= 0 ? length : projectedDataSize;
     // setting file length to Long.MAX_VALUE will let orc reader read file length from file system
     this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen;
     this.syntheticAcidProps = syntheticAcidProps;
   }

+  /**
+   * For every split we want to filter out the delete deltas that contain events that happened only
+   * in the past relative to the split.
+   * @param deltas
+   * @param conf
+   * @return
+   */
+  protected List<AcidInputFormat.DeltaMetaData> filterDeleteDeltasByWriteIds(
+      List<AcidInputFormat.DeltaMetaData> deltas, Configuration conf) throws IOException {
+
+    AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+        AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment:
   root path could be anything: delta/delete-delta/base. the prev version was 
actually correct 
   
   long minWriteId = !deltas.isEmpty() ?
       AcidUtils.parseBaseOrDeltaBucketFilename(path, null).getMinimumWriteId() : -1;
   this.deltas.addAll(
       deltas.stream()
           .filter(delta -> isQualifiedDeleteDeltasByWriteIds(delta, minWriteId))
           .flatMap(delta -> filterDeltasByBucketId(delta, bucketId))
           .collect(Collectors.toList()));
   





Issue Time Tracking
---

Worklog Id: (was: 808901)
Time Spent: 6h  (was: 5h 50m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and 
> deletes. Note the "delta" and "delete_delta" folders that were created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> 

[jira] [Work logged] (HIVE-26535) Iceberg: Support adding parquet compression type via Table properties

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26535?focusedWorklogId=808896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808896
 ]

ASF GitHub Bot logged work on HIVE-26535:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 22:11
Start Date: 14/Sep/22 22:11
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request, #3597:
URL: https://github.com/apache/hive/pull/3597

   ### What changes were proposed in this pull request?
   
   Add support for specifying parquet compression properties for Iceberg tables via 
TBLPROPERTIES.
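A minimal sketch of the behavior being added, assuming the Iceberg-style property key `write.parquet.compression-codec` (the key actually honored by this PR is an assumption here): the codec is read from the table properties and falls back to a default instead of being ignored.

```java
import java.util.HashMap;
import java.util.Map;

public class ParquetCompressionFromProps {

    // Assumed property key, following Iceberg's naming convention;
    // the real key used by the PR may differ.
    static final String COMPRESSION_KEY = "write.parquet.compression-codec";

    // Returns the codec configured in TBLPROPERTIES, or the default when unset.
    static String resolveCodec(Map<String, String> tblProps, String defaultCodec) {
        return tblProps.getOrDefault(COMPRESSION_KEY, defaultCodec);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(COMPRESSION_KEY, "zstd");
        System.out.println(resolveCodec(props, "gzip"));           // zstd
        System.out.println(resolveCodec(new HashMap<>(), "gzip")); // gzip
    }
}
```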




Issue Time Tracking
---

Worklog Id: (was: 808896)
Remaining Estimate: 0h
Time Spent: 10m

> Iceberg: Support adding parquet compression type via Table properties
> -
>
> Key: HIVE-26535
> URL: https://issues.apache.org/jira/browse/HIVE-26535
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As of now, the parquet compression format is ignored for Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26535) Iceberg: Support adding parquet compression type via Table properties

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26535:
--
Labels: pull-request-available  (was: )

> Iceberg: Support adding parquet compression type via Table properties
> -
>
> Key: HIVE-26535
> URL: https://issues.apache.org/jira/browse/HIVE-26535
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As of now, the parquet compression format is ignored for Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26535) Iceberg: Support adding parquet compression type via Table properties

2022-09-14 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-26535:
---


> Iceberg: Support adding parquet compression type via Table properties
> -
>
> Key: HIVE-26535
> URL: https://issues.apache.org/jira/browse/HIVE-26535
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> As of now, the parquet compression format is ignored for Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-25848) Empty result for structs in point lookup optimization with vectorization on

2022-09-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-25848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604984#comment-17604984
 ] 

Hankó Gergely commented on HIVE-25848:
--

My finding is that some (probably all) of the vectorized Filter*InList 
expressions are unprepared to handle constant values and only support column 
expressions. This causes the issue in the description, as well as others:
{code:java}
set hive.fetch.task.conversion=none;
set hive.optimize.point.lookup=false;
set hive.cbo.enable=false;

create table test (a string) partitioned by (y string);
insert into test values ('aa', 2022);

select * from test where (struct(2022) IN (struct(2022)));
--gives empty result
--works fine if vectorization is off{code}
{code:java}
set hive.fetch.task.conversion=none;
set hive.optimize.point.lookup=false;
set hive.cbo.enable=false;
set hive.optimize.constant.propagation=false;
set hive.optimize.ppd=false;
create table test (a string) partitioned by (y string);
insert into test values ('aa', 2022);
select * from test where (2022 IN (2022));
--throws error
--works fine if vectorization is off
{code}
It's probably VectorizationContext.getInExpression() that should be tweaked 
not to use the multi-purpose createVectorExpression method but an 
InExpression-specific one that handles constants properly. Maybe it could 
evaluate constants right away and generate a 
FilterConstantBooleanVectorExpression for the result; that would greatly speed 
up such operations.

 

The pull request solves a different bug, where embedded expressions in structs 
are not initialized properly after deserialization. It is probably unrelated to 
this one, so I'm going to create a new bug ticket for it.
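As a plain-Java illustration of the constant-folding idea above (not Hive's actual VectorizationContext code; the class and method names here are hypothetical), a constant IN over a list of constants can be evaluated once up front, so the per-batch filter degenerates to a constant boolean:

```java
import java.util.Arrays;
import java.util.List;

public class ConstantInFolding {

    // Hypothetical stand-in for what a FilterConstantBooleanVectorExpression
    // would encode: once the IN test is folded, every row batch passes or
    // fails wholesale, with no per-row evaluation.
    static boolean foldConstantIn(Object probe, List<?> inList) {
        // Evaluate membership once at "plan time" instead of per batch.
        return inList.contains(probe);
    }

    public static void main(String[] args) {
        // Mirrors the repro query: 2022 IN (2022) should fold to constant true.
        System.out.println(foldConstantIn(2022, Arrays.asList(2022)));       // true
        System.out.println(foldConstantIn(2021, Arrays.asList(2022, 2023))); // false
    }
}
```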

> Empty result for structs in point lookup optimization with vectorization on
> ---
>
> Key: HIVE-25848
> URL: https://issues.apache.org/jira/browse/HIVE-25848
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Repro steps:
> {code:java}
> set hive.fetch.task.conversion=none;
> create table test (a string) partitioned by (y string, m string);
> insert into test values ('aa', 2022, 1);
> select * from test where (y=year(date_sub(current_date,4)) and 
> m=month(date_sub(current_date,4))) or (y=year(date_sub(current_date,10)) and 
> m=month(date_sub(current_date,10)) );
> --gives empty result{code}
> Turning any one of the features below off yields the good result (1 row 
> expected):
> {code:java}
> set hive.optimize.point.lookup=false;
> set hive.cbo.enable=false;
> set hive.vectorized.execution.enabled=false;
> {code}
> Expected good result is:
> {code}
> +-+-+-+
> | test.a  | test.y  | test.m  |
> +-+-+-+
> | aa      | 2022    | 1       |
> +-+-+-+ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808881
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 21:23
Start Date: 14/Sep/22 21:23
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971221139


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds

Review Comment:
   The deltas class member is marked as final and initialized before, which is 
why I couldn't reassign it and used addAll() instead.
   
   
https://github.com/apache/hive/blob/e352684d5c87df1483444afc4c3ee897270bd413/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java#L67





Issue Time Tracking
---

Worklog Id: (was: 808881)
Time Spent: 5h 50m  (was: 5h 40m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table which had a set of updates and deletes. 
> A set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_006_006_
> 
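Each directory name in the listing above encodes its write-id range directly: `delta_<min>_<max>` and `delete_delta_<min>_<max>` (a statement-id component may follow). A minimal, hedged sketch of the naming convention — a standalone illustration, not Hive's AcidUtils parser:

```java
public class DeltaDirName {
    final boolean isDeleteDelta;
    final long minWriteId;
    final long maxWriteId;

    DeltaDirName(boolean isDeleteDelta, long minWriteId, long maxWriteId) {
        this.isDeleteDelta = isDeleteDelta;
        this.minWriteId = minWriteId;
        this.maxWriteId = maxWriteId;
    }

    // Parses names like "delta_0000002_0000002" or
    // "delete_delta_0000002_0000002"; any trailing statement-id is ignored.
    static DeltaDirName parse(String name) {
        boolean del = name.startsWith("delete_delta_");
        String rest = name.substring(del ? "delete_delta_".length() : "delta_".length());
        String[] parts = rest.split("_");
        return new DeltaDirName(del, Long.parseLong(parts[0]), Long.parseLong(parts[1]));
    }

    public static void main(String[] args) {
        DeltaDirName d = parse("delete_delta_0000002_0000002");
        System.out.println(d.isDeleteDelta + " " + d.minWriteId + " " + d.maxWriteId);
        // prints: true 2 2
    }
}
```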

[jira] [Updated] (HIVE-25848) Empty result for structs in point lookup optimization with vectorization on

2022-09-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hankó Gergely updated HIVE-25848:
-
Labels:   (was: pull-request-available)

> Empty result for structs in point lookup optimization with vectorization on
> ---
>
> Key: HIVE-25848
> URL: https://issues.apache.org/jira/browse/HIVE-25848
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Repro steps:
> {code:java}
> set hive.fetch.task.conversion=none;
> create table test (a string) partitioned by (y string, m string);
> insert into test values ('aa', 2022, 1);
> select * from test where (y=year(date_sub(current_date,4)) and 
> m=month(date_sub(current_date,4))) or (y=year(date_sub(current_date,10)) and 
> m=month(date_sub(current_date,10)) );
> --gives empty result{code}
> Turning any one of the features below off yields the good result (1 row 
> expected):
> {code:java}
> set hive.optimize.point.lookup=false;
> set hive.cbo.enable=false;
> set hive.vectorized.execution.enabled=false;
> {code}
> Expected good result is:
> {code}
> +-+-+-+
> | test.a  | test.y  | test.m  |
> +-+-+-+
> | aa      | 2022    | 1       |
> +-+-+-+ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808875
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 20:53
Start Date: 14/Sep/22 20:53
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971221139


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds

Review Comment:
   The deltas class member is marked as final, which is why I couldn't 
reassign it and used addAll() instead.
   
   
https://github.com/apache/hive/blob/e352684d5c87df1483444afc4c3ee897270bd413/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java#L67





Issue Time Tracking
---

Worklog Id: (was: 808875)
Time Spent: 5h 40m  (was: 5.5h)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table which had a set of updates and deletes. 
> A set of "delta" and "delete_delta" folders is created.

[jira] [Work logged] (HIVE-26522) Test for HIVE-22033 and backport to 3.1 and 2.3

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26522?focusedWorklogId=808869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808869
 ]

ASF GitHub Bot logged work on HIVE-26522:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 20:41
Start Date: 14/Sep/22 20:41
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3585:
URL: https://github.com/apache/hive/pull/3585#issuecomment-1247280592

   Kudos, SonarCloud Quality Gate passed!
   [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3585)

   [1 Bug](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3585&resolved=false&types=BUG)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3585&resolved=false&types=VULNERABILITY)
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3585&resolved=false&types=SECURITY_HOTSPOT)
   [6 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3585&resolved=false&types=CODE_SMELL)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 808869)
Time Spent: 1h 10m  (was: 1h)

> Test for HIVE-22033 and backport to 3.1 and 2.3
> ---
>
> Key: HIVE-26522
> URL: https://issues.apache.org/jira/browse/HIVE-26522
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 2.3.8, 3.1.3
>Reporter: Pavan Lanka
>Assignee: Pavan Lanka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HIVE-22033 fixes the issue with Hive Delegation tokens so that the renewal 
> time takes effect.
> This ticket adds a test for HIVE-22033 and backports the fix to the 3.1 
> and 2.3 branches in Hive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation

2022-09-14 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604941#comment-17604941
 ] 

Ayush Saxena commented on HIVE-26488:
-

Committed to master.
Thanx [~zabetak] for the review!!!

> Fix NPE in DDLSemanticAnalyzerFactory during compilation
> 
>
> Key: HIVE-26488
> URL: https://issues.apache.org/jira/browse/HIVE-26488
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> *Exception Trace:*
> {noformat}
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.getInternal(SemanticAnalyzerFactory.java:62)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:41)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:209)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:418)
> {noformat}
> *Cause:*
> {noformat}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.(DDLSemanticAnalyzerFactory.java:84)
>   ... 40 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation

2022-09-14 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-26488.
-
Fix Version/s: 4.0.0-alpha-2
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Fix NPE in DDLSemanticAnalyzerFactory during compilation
> 
>
> Key: HIVE-26488
> URL: https://issues.apache.org/jira/browse/HIVE-26488
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> *Exception Trace:*
> {noformat}
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.getInternal(SemanticAnalyzerFactory.java:62)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:41)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:209)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:418)
> {noformat}
> *Cause:*
> {noformat}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.(DDLSemanticAnalyzerFactory.java:84)
>   ... 40 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26488?focusedWorklogId=808858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808858
 ]

ASF GitHub Bot logged work on HIVE-26488:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 19:41
Start Date: 14/Sep/22 19:41
Worklog Time Spent: 10m 
  Work Description: ayushtkn merged PR #3538:
URL: https://github.com/apache/hive/pull/3538




Issue Time Tracking
---

Worklog Id: (was: 808858)
Time Spent: 2h 20m  (was: 2h 10m)

> Fix NPE in DDLSemanticAnalyzerFactory during compilation
> 
>
> Key: HIVE-26488
> URL: https://issues.apache.org/jira/browse/HIVE-26488
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> *Exception Trace:*
> {noformat}
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.getInternal(SemanticAnalyzerFactory.java:62)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:41)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:209)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:418)
> {noformat}
> *Cause:*
> {noformat}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.(DDLSemanticAnalyzerFactory.java:84)
>   ... 40 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808856
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 19:34
Start Date: 14/Sep/22 19:34
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971208288


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds
+        (filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf));
     this.projColsUncompressedSize = projectedDataSize <= 0 ? length : projectedDataSize;
     // setting file length to Long.MAX_VALUE will let orc reader read file length from file system
     this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen;
     this.syntheticAcidProps = syntheticAcidProps;
   }
 
+  /**
+   * For every split we want to filter out the delete deltas that contain
+   * events that happened only in the past relative to the split.
+   * @param deltas
+   * @param conf
+   * @return
+   */
+  protected List<AcidInputFormat.DeltaMetaData> filterDeleteDeltasByWriteIds(
+      List<AcidInputFormat.DeltaMetaData> deltas, Configuration conf) throws IOException {
+
+    AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+        AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment:
   Hi @deniskuzZ,
   Many tests failed with the change of using AcidUtils.ParsedDeltaLight.parse() 
instead of AcidUtils.parseBaseOrDeltaBucketFilename(), with an exception on this 
line: 
   
https://github.com/apache/hive/blob/e352684d5c87df1483444afc4c3ee897270bd413/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java#L108
   
   As I understand it, the split is not always a delta folder; it can be some older 
format not supported by ParsedDeltaLight. I saw that ParsedDeltaLight.parse() 
is used in some cases internally in AcidUtils.parseBaseOrDeltaBucketFilename(), 
but not always. Can you please advise whether I should revert to using 
AcidUtils.parseBaseOrDeltaBucketFilename(), which worked in all cases, or 
whether there is some better way?
   
   
https://github.com/apache/hive/blob/f6bd0eb80767adfa9ce9f47a6d02a4940903effb/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L538-L552
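The write-id filtering being discussed in this thread reduces to a small interval check: a delete delta whose entire write-id range lies before the split's own range can be skipped, since its delete events cannot affect any row the split carries. A hedged standalone sketch (the parameter names stand in for values Hive parses from the file and directory names; this is not the PR's actual code):

```java
public class DeleteDeltaRelevance {

    // A delete delta is irrelevant to a split when its newest event
    // (deltaMaxWriteId) predates the oldest row the split can contain
    // (splitMinWriteId); otherwise it must still be scanned.
    static boolean isRelevant(long deltaMaxWriteId, long splitMinWriteId) {
        return deltaMaxWriteId >= splitMinWriteId;
    }

    public static void main(String[] args) {
        // Split covers write ids starting at 10: a 2..2 delete delta can be
        // dropped, while a 10..12 delete delta must be kept.
        System.out.println(isRelevant(2, 10));  // false
        System.out.println(isRelevant(12, 10)); // true
    }
}
```

Skipping irrelevant delete deltas this way is what removes the repeated scans of old `delete_delta` folders that the issue describes.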





Issue Time Tracking
---

Worklog Id: (was: 808856)
Time Spent: 5.5h  (was: 5h 20m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table which had a set of updates and deletes. 
> A set of "delta" and "delete_delta" folders is created.

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808855
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 19:33
Start Date: 14/Sep/22 19:33
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971221139


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds

Review Comment:
   The deltas class member is marked as final, which is why I couldn't 
reassign it and used addAll() instead.





Issue Time Tracking
---

Worklog Id: (was: 808855)
Time Spent: 5h 20m  (was: 5h 10m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table which had a set of updates and deletes. 
> A set of "delta" and "delete_delta" folders is created.
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_008_008_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808854
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 19:30
Start Date: 14/Sep/22 19:30
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971208288


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+this.deltas.addAll(filterDeleteDeltasByWriteIds
+(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf));
 this.projColsUncompressedSize = projectedDataSize <= 0 ? length : projectedDataSize;
 // setting file length to Long.MAX_VALUE will let orc reader read file length from file system
 this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen;
 this.syntheticAcidProps = syntheticAcidProps;
   }
 
+  /**
+   * For every split we want to filter out the delete deltas that contain
+   * events that happened only in the past relative to the split.
+   * @param deltas the delta metadata to filter
+   * @param conf the job configuration
+   * @return the filtered delete deltas
+   */
+  protected List filterDeleteDeltasByWriteIds(
+      List deltas, Configuration conf) throws IOException {
+
+AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment:
   Hi @deniskuzZ,
   Many tests failed when using AcidUtils.ParsedDeltaLight.parse() instead of AcidUtils.parseBaseOrDeltaBucketFilename(); they threw an exception on this line:
   
https://github.com/apache/hive/blob/e352684d5c87df1483444afc4c3ee897270bd413/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java#L108
   
   As I understand it, the split is not always under a delta folder; it can be in an older format that ParsedDeltaLight does not support. I saw that ParsedDeltaLight.parse() is used internally by AcidUtils.parseBaseOrDeltaBucketFilename() in some cases, but not always. Could you please advise whether I should revert to AcidUtils.parseBaseOrDeltaBucketFilename(), which worked in all cases, or whether there is a better way?
   
https://github.com/apache/hive/blob/f6bd0eb80767adfa9ce9f47a6d02a4940903effb/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L538-L552
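The write-id filtering discussed in this review thread can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Hive's implementation: the class and method names are made up, and real ACID delta directory names may carry extra suffixes such as statement ids.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the filtering idea. ACID delta directory names encode a
// min/max write-id range, e.g. "delete_delta_0000002_0000002". A delete
// delta whose entire range lies before the write ids of the split's own
// rows can only describe deletions of rows the split never sees, so it can
// be skipped instead of being scanned again for every split.
public class DeleteDeltaFilterSketch {

    // Parse the max write id from a simplified delta directory name.
    static long maxWriteId(String dirName) {
        String[] parts = dirName.split("_");
        return Long.parseLong(parts[parts.length - 1]);
    }

    // Keep only the delete deltas that could still affect a split whose
    // rows were written at or after splitMinWriteId.
    static List<String> filterDeleteDeltas(List<String> deleteDeltas, long splitMinWriteId) {
        List<String> kept = new ArrayList<>();
        for (String d : deleteDeltas) {
            if (maxWriteId(d) >= splitMinWriteId) {
                kept.add(d);
            }
        }
        return kept;
    }
}
```

The point is simply that a delete delta whose newest event predates every row in the split cannot delete anything the split will read, so listing or opening it is wasted I/O.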





Issue Time Tracking
---

Worklog Id: (was: 808854)
Time Spent: 5h 10m  (was: 5h)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and
> deletes; a set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808851=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808851
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 19:20
Start Date: 14/Sep/22 19:20
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971208288


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+this.deltas.addAll(filterDeleteDeltasByWriteIds
+(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf));
 this.projColsUncompressedSize = projectedDataSize <= 0 ? length : projectedDataSize;
 // setting file length to Long.MAX_VALUE will let orc reader read file length from file system
 this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen;
 this.syntheticAcidProps = syntheticAcidProps;
   }
 
+  /**
+   * For every split we want to filter out the delete deltas that contain
+   * events that happened only in the past relative to the split.
+   * @param deltas the delta metadata to filter
+   * @param conf the job configuration
+   * @return the filtered delete deltas
+   */
+  protected List filterDeleteDeltasByWriteIds(
+      List deltas, Configuration conf) throws IOException {
+
+AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment:
   Hi @deniskuzZ,
   Many tests failed when using AcidUtils.ParsedDeltaLight.parse() instead of AcidUtils.parseBaseOrDeltaBucketFilename(). As I understand it, the split is not always under a delta folder; it can be in an older format that ParsedDeltaLight does not support. I saw that ParsedDeltaLight.parse() is used internally by AcidUtils.parseBaseOrDeltaBucketFilename() in some cases, but not always. Could you please advise whether I should revert to AcidUtils.parseBaseOrDeltaBucketFilename(), which worked in all cases, or whether there is a better way?
   
https://github.com/apache/hive/blob/f6bd0eb80767adfa9ce9f47a6d02a4940903effb/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L538-L552





Issue Time Tracking
---

Worklog Id: (was: 808851)
Time Spent: 4h 50m  (was: 4h 40m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and
> deletes; a set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808852
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 19:20
Start Date: 14/Sep/22 19:20
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971190432


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,28 +104,43 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+int bucketId = AcidUtils.parseBucketId(path);
+AcidUtils.ParsedDeltaLight parentDelta = AcidUtils.ParsedDeltaLight.parse(getPath().getParent());

Review Comment:
   Hi @deniskuzZ,
   Many tests failed when using AcidUtils.ParsedDeltaLight.parse() instead of AcidUtils.parseBaseOrDeltaBucketFilename(). As I understand it, the split is not always under a delta folder; it can be in an older format that ParsedDeltaLight does not support. I saw that ParsedDeltaLight.parse() is used internally by AcidUtils.parseBaseOrDeltaBucketFilename() in some cases, but not always. Could you please advise whether I should revert to AcidUtils.parseBaseOrDeltaBucketFilename(), which worked in all cases, or whether there is a better way?
   
https://github.com/apache/hive/blob/f6bd0eb80767adfa9ce9f47a6d02a4940903effb/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L538-L552





Issue Time Tracking
---

Worklog Id: (was: 808852)
Time Spent: 5h  (was: 4h 50m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and
> deletes; a set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> 

[jira] [Work logged] (HIVE-26277) NPEs and rounding issues in ColumnStatsAggregator classes

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26277?focusedWorklogId=808847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808847
 ]

ASF GitHub Bot logged work on HIVE-26277:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 19:13
Start Date: 14/Sep/22 19:13
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3339:
URL: https://github.com/apache/hive/pull/3339#issuecomment-1247195030

   Kudos, SonarCloud Quality Gate passed!
   
   Bugs: 2 (rating C)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 0 (rating A)
   Code Smells: 44 (rating A)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 808847)
Time Spent: 7h 50m  (was: 7h 40m)

> NPEs and rounding issues in ColumnStatsAggregator classes
> -
>
> Key: HIVE-26277
> URL: https://issues.apache.org/jira/browse/HIVE-26277
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore, Statistics, Tests
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Fix NPEs and rounding errors in _ColumnStatsAggregator_ classes, and add
> unit tests for all the involved classes.
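A hedged sketch of the kind of fixes such a ticket typically involves (the helper names below are hypothetical, not the actual ColumnStatsAggregator API): null-safe merging of per-partition low values, and rounding rather than truncating when averaging distinct-value counts.

```java
// Hypothetical sketch, not Hive's ColumnStatsAggregator code. Per-partition
// stats may carry null low/high values (e.g. for all-null partitions), and
// plain integer division truncates toward zero when averaging NDV counts.
public class StatsMergeSketch {

    // Null-safe minimum: a null side simply yields the other side,
    // avoiding the NPE a bare Math.min(a, b) on boxed values would cause.
    static Long mergeLow(Long a, Long b) {
        if (a == null) return b;
        if (b == null) return a;
        return Math.min(a, b);
    }

    // Round to the nearest long instead of truncating toward zero.
    static long averageNdv(long totalNdv, int partitions) {
        return Math.round((double) totalNdv / partitions);
    }
}
```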



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808846
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 19:09
Start Date: 14/Sep/22 19:09
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971190432


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,28 +104,43 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+int bucketId = AcidUtils.parseBucketId(path);
+AcidUtils.ParsedDeltaLight parentDelta = AcidUtils.ParsedDeltaLight.parse(getPath().getParent());

Review Comment:
   Hi @deniskuzZ,
   Many tests failed when using AcidUtils.ParsedDeltaLight.parse() instead of AcidUtils.parseBaseOrDeltaBucketFilename(). As I understand it, the split is not always under a delta folder; it can be in an older format that ParsedDeltaLight does not support. I saw that ParsedDeltaLight.parse() is used internally by AcidUtils.parseBaseOrDeltaBucketFilename() in some cases, but not always. Could you please advise whether I should revert to AcidUtils.parseBaseOrDeltaBucketFilename(), which worked in all cases, or whether there is a better way?
   
https://github.com/apache/hive/blob/f6bd0eb80767adfa9ce9f47a6d02a4940903effb/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L538-L552





Issue Time Tracking
---

Worklog Id: (was: 808846)
Time Spent: 4h 40m  (was: 4.5h)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and
> deletes; a set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808844=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808844
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 19:03
Start Date: 14/Sep/22 19:03
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971190432


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,28 +104,43 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+int bucketId = AcidUtils.parseBucketId(path);
+AcidUtils.ParsedDeltaLight parentDelta = AcidUtils.ParsedDeltaLight.parse(getPath().getParent());

Review Comment:
   Hi @deniskuzZ,
   Many tests failed when using AcidUtils.ParsedDeltaLight.parse() instead of AcidUtils.parseBaseOrDeltaBucketFilename(). As I understand it, the split is not always under a delta folder; it can be in an older format that ParsedDeltaLight does not support. I saw that ParsedDeltaLight.parse() is used internally by AcidUtils.parseBaseOrDeltaBucketFilename() in some cases, but not always. Could you please advise whether I should revert to AcidUtils.parseBaseOrDeltaBucketFilename(), which worked in all cases, or whether there is a better way?
   
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#539





Issue Time Tracking
---

Worklog Id: (was: 808844)
Time Spent: 4.5h  (was: 4h 20m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and
> deletes; a set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808843=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808843
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 19:03
Start Date: 14/Sep/22 19:03
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971190432


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,28 +104,43 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+int bucketId = AcidUtils.parseBucketId(path);
+AcidUtils.ParsedDeltaLight parentDelta = AcidUtils.ParsedDeltaLight.parse(getPath().getParent());

Review Comment:
   Hi @deniskuzZ,
   Many tests failed when using AcidUtils.ParsedDeltaLight.parse() instead of AcidUtils.parseBaseOrDeltaBucketFilename(). As I understand it, the split is not always under a delta folder; it can be in an older format that ParsedDeltaLight does not support. I saw that ParsedDeltaLight.parse() is used internally by AcidUtils.parseBaseOrDeltaBucketFilename() in some cases, but not always. Could you please advise whether I should revert to AcidUtils.parseBaseOrDeltaBucketFilename(), which worked in all cases, or whether there is a better way?
   
(https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#539)





Issue Time Tracking
---

Worklog Id: (was: 808843)
Time Spent: 4h 20m  (was: 4h 10m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and
> deletes; a set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808842
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 19:02
Start Date: 14/Sep/22 19:02
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971190432


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,28 +104,43 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+int bucketId = AcidUtils.parseBucketId(path);
+AcidUtils.ParsedDeltaLight parentDelta = AcidUtils.ParsedDeltaLight.parse(getPath().getParent());

Review Comment:
   Hi @deniskuzZ,
   Many tests failed with the change of using AcidUtils.ParsedDeltaLight.parse()
   instead of AcidUtils.parseBaseOrDeltaBucketFilename(). As I understand it, the
   split is not always in a delta folder; it can be in some older format not
   supported by ParsedDeltaLight. I saw that ParsedDeltaLight.parse() is used in
   some cases internally in AcidUtils.parseBaseOrDeltaBucketFilename(), but not
   always. Can you please advise whether I should revert to
   AcidUtils.parseBaseOrDeltaBucketFilename(), which worked in all cases, or
   whether there is some better way?
   
   
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L539





Issue Time Tracking
---

Worklog Id: (was: 808842)
Time Spent: 4h 10m  (was: 4h)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and
> deletes. A set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808840&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808840
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 18:58
Start Date: 14/Sep/22 18:58
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971190432


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,28 +104,43 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+int bucketId = AcidUtils.parseBucketId(path);
+AcidUtils.ParsedDeltaLight parentDelta = AcidUtils.ParsedDeltaLight.parse(getPath().getParent());

Review Comment:
   Hi @deniskuzZ,
   Many tests failed with the change of using AcidUtils.ParsedDeltaLight.parse()
   instead of AcidUtils.parseBaseOrDeltaBucketFilename(). As I understand it, the
   split is not always in a delta folder; it can be in some older format not
   supported by ParsedDeltaLight. I saw that ParsedDeltaLight.parse() is used in
   some cases internally in AcidUtils.parseBaseOrDeltaBucketFilename(), but not
   always. Can you please advise whether I should revert to
   AcidUtils.parseBaseOrDeltaBucketFilename(), which worked in all cases, or
   whether there is some better way?





Issue Time Tracking
---

Worklog Id: (was: 808840)
Time Spent: 4h  (was: 3h 50m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and
> deletes. A set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_002_002_
> 

[jira] [Work logged] (HIVE-26045) Detect timed out connections for providers and auto-reconnect

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26045?focusedWorklogId=808837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808837
 ]

ASF GitHub Bot logged work on HIVE-26045:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 18:44
Start Date: 14/Sep/22 18:44
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3595:
URL: https://github.com/apache/hive/pull/3595#issuecomment-1247166412

   Kudos, SonarCloud Quality Gate passed!
   (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3595)

   - 1 Bug (rating C): https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3595&resolved=false&types=BUG
   - 0 Vulnerabilities (rating A): https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3595&resolved=false&types=VULNERABILITY
   - 0 Security Hotspots (rating A): https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3595&resolved=false&types=SECURITY_HOTSPOT
   - 45 Code Smells (rating A): https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3595&resolved=false&types=CODE_SMELL
   - No coverage information: https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3595&metric=coverage&view=list
   - No duplication information: https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3595&metric=duplicated_lines_density&view=list
   
   




Issue Time Tracking
---

Worklog Id: (was: 808837)
Time Spent: 3h  (was: 2h 50m)

> Detect timed out connections for providers and auto-reconnect
> -
>
> Key: HIVE-26045
> URL: https://issues.apache.org/jira/browse/HIVE-26045
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> For the connectors, we use a single connection, no pooling. But when the
> connection is idle for an extended period, the JDBC connection times out. We
> need to check for closed connections (Connection.isClosed()?) and
> re-establish the connection. Otherwise it renders the connector fairly
> useless.
> {noformat}
> 2022-03-17T13:02:16,635  WARN [HiveServer2-Handler-Pool: Thread-116] 
> thrift.ThriftCLIService: Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: SemanticException Unable to fetch table temp_dbs. Error 
> retrieving remote 
> table:com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: 
> No operations allowed after connection closed.
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:373)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   
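The check-and-reconnect idea described above can be sketched as follows. This is a hypothetical illustration, not Hive's actual connector code: `ReconnectingProvider`, `needsReconnect`, and the wrapper shape are all assumed names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ReconnectingProvider {

    private final String jdbcUrl;
    private Connection conn;

    public ReconnectingProvider(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    // True when the cached connection cannot be reused: never opened,
    // closed by a server-side idle timeout, or failing its liveness check.
    static boolean needsReconnect(Connection c) {
        try {
            return c == null || c.isClosed();
        } catch (SQLException e) {
            // A failing liveness check also means "reconnect".
            return true;
        }
    }

    // Hand out a live connection, re-establishing it when needed instead of
    // surfacing "No operations allowed after connection closed".
    public synchronized Connection getConnection() throws SQLException {
        if (needsReconnect(conn)) {
            conn = DriverManager.getConnection(jdbcUrl);
        }
        return conn;
    }
}
```

Note that `Connection.isClosed()` is only guaranteed to report connections that were explicitly closed, so `Connection.isValid(timeout)` may be the more reliable probe for server-side timeouts.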

[jira] [Comment Edited] (HIVE-26534) GROUPING() function errors out due to case-sensitivity of function name

2022-09-14 Thread Aman Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604875#comment-17604875
 ] 

Aman Sinha edited comment on HIVE-26534 at 9/14/22 5:49 PM:


Marking this fixed since [~soumyakanti.das]'s PR has been merged. Thanks to the 
reviewers.


was (Author: amansinha):
Marking this fixed since [~soumyakanti.das] PR has been merged.

> GROUPING() function errors out due to case-sensitivity of function name
> ---
>
> Key: HIVE-26534
> URL: https://issues.apache.org/jira/browse/HIVE-26534
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Logical Optimizer
>Reporter: Aman Sinha
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following errors out:
> {noformat}
> explain cbo select GROUPING(l_suppkey) from lineitem group by l_suppkey with 
> rollup;
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10015]: Line 1:19 Arguments length mismatch 'l_suppkey': grouping() requires 
> at least 2 argument, got 1 (state=21000,code=10015)
> {noformat}
> Lowercase grouping() succeeds:
> {noformat}
> explain cbo select grouping(l_suppkey) from lineitem group by l_suppkey with 
> rollup;
> ++
> |  Explain   |
> ++
> | CBO PLAN:  |
> | HiveProject(_o__c0=[grouping($1, 0:BIGINT)])   |
> |   HiveAggregate(group=[{0}], groups=[[{0}, {}]], GROUPING__ID=[GROUPING__ID()]) |
> | HiveProject(l_suppkey=[$2])|
> |   HiveTableScan(table=[[tpch, lineitem]], table:alias=[lineitem]) |
> ||
> ++
> {noformat}
> This is likely due to the SemanticAnalyzer doing a case-sensitive compare 
> here:
> {noformat}
>  @Override
>   public Object post(Object t) {
>  
>   if (func.getText().equals("grouping") && func.getChildCount() == 0) {
> {noformat}
> We should fix this to use a case-insensitive comparison. There might be
> other places to examine too for the grouping function.
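The proposed fix boils down to a case-insensitive name comparison. A minimal sketch, with illustrative class and method names rather than the actual SemanticAnalyzer code:

```java
public class GroupingNameCheck {

    // Case-insensitive variant of the check quoted above.
    static boolean isGroupingFunction(String funcText) {
        // equalsIgnoreCase matches GROUPING, grouping, Grouping, ...
        return "grouping".equalsIgnoreCase(funcText);
    }

    public static void main(String[] args) {
        System.out.println(isGroupingFunction("GROUPING")); // true
        System.out.println(isGroupingFunction("grouping")); // true
        System.out.println(isGroupingFunction("sum"));      // false
    }
}
```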



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26534) GROUPING() function errors out due to case-sensitivity of function name

2022-09-14 Thread Aman Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved HIVE-26534.
---
Resolution: Fixed

Marking this fixed since [~soumyakanti.das]'s PR has been merged.

> GROUPING() function errors out due to case-sensitivity of function name
> ---
>
> Key: HIVE-26534
> URL: https://issues.apache.org/jira/browse/HIVE-26534
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Logical Optimizer
>Reporter: Aman Sinha
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following errors out:
> {noformat}
> explain cbo select GROUPING(l_suppkey) from lineitem group by l_suppkey with 
> rollup;
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10015]: Line 1:19 Arguments length mismatch 'l_suppkey': grouping() requires 
> at least 2 argument, got 1 (state=21000,code=10015)
> {noformat}
> Lowercase grouping() succeeds:
> {noformat}
> explain cbo select grouping(l_suppkey) from lineitem group by l_suppkey with 
> rollup;
> ++
> |  Explain   |
> ++
> | CBO PLAN:  |
> | HiveProject(_o__c0=[grouping($1, 0:BIGINT)])   |
> |   HiveAggregate(group=[{0}], groups=[[{0}, {}]], GROUPING__ID=[GROUPING__ID()]) |
> | HiveProject(l_suppkey=[$2])|
> |   HiveTableScan(table=[[tpch, lineitem]], table:alias=[lineitem]) |
> ||
> ++
> {noformat}
> This is likely due to the SemanticAnalyzer doing a case-sensitive compare 
> here:
> {noformat}
>  @Override
>   public Object post(Object t) {
>  
>   if (func.getText().equals("grouping") && func.getChildCount() == 0) {
> {noformat}
> We should fix this to use a case-insensitive comparison. There might be
> other places to examine too for the grouping function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26522) Test for HIVE-22033 and backport to 3.1 and 2.3

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26522?focusedWorklogId=808811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808811
 ]

ASF GitHub Bot logged work on HIVE-26522:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 17:13
Start Date: 14/Sep/22 17:13
Worklog Time Spent: 10m 
  Work Description: prasanthj commented on PR #3586:
URL: https://github.com/apache/hive/pull/3586#issuecomment-1247069691

   +1




Issue Time Tracking
---

Worklog Id: (was: 808811)
Time Spent: 1h  (was: 50m)

> Test for HIVE-22033 and backport to 3.1 and 2.3
> ---
>
> Key: HIVE-26522
> URL: https://issues.apache.org/jira/browse/HIVE-26522
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 2.3.8, 3.1.3
>Reporter: Pavan Lanka
>Assignee: Pavan Lanka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> HIVE-22033 fixes the issue with Hive Delegation tokens so that the renewal 
> time is effective.
> This looks at adding a test for HIVE-22033 and backporting this fix to 3.1 
> and 2.3 branches in Hive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26522) Test for HIVE-22033 and backport to 3.1 and 2.3

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26522?focusedWorklogId=808810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808810
 ]

ASF GitHub Bot logged work on HIVE-26522:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 17:12
Start Date: 14/Sep/22 17:12
Worklog Time Spent: 10m 
  Work Description: prasanthj commented on PR #3587:
URL: https://github.com/apache/hive/pull/3587#issuecomment-1247068988

   +1




Issue Time Tracking
---

Worklog Id: (was: 808810)
Time Spent: 50m  (was: 40m)

> Test for HIVE-22033 and backport to 3.1 and 2.3
> ---
>
> Key: HIVE-26522
> URL: https://issues.apache.org/jira/browse/HIVE-26522
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 2.3.8, 3.1.3
>Reporter: Pavan Lanka
>Assignee: Pavan Lanka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HIVE-22033 fixes the issue with Hive Delegation tokens so that the renewal 
> time is effective.
> This looks at adding a test for HIVE-22033 and backporting this fix to 3.1 
> and 2.3 branches in Hive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26522) Test for HIVE-22033 and backport to 3.1 and 2.3

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26522?focusedWorklogId=808809&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808809
 ]

ASF GitHub Bot logged work on HIVE-26522:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 17:12
Start Date: 14/Sep/22 17:12
Worklog Time Spent: 10m 
  Work Description: prasanthj commented on PR #3585:
URL: https://github.com/apache/hive/pull/3585#issuecomment-1247068460

   +1




Issue Time Tracking
---

Worklog Id: (was: 808809)
Time Spent: 40m  (was: 0.5h)

> Test for HIVE-22033 and backport to 3.1 and 2.3
> ---
>
> Key: HIVE-26522
> URL: https://issues.apache.org/jira/browse/HIVE-26522
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 2.3.8, 3.1.3
>Reporter: Pavan Lanka
>Assignee: Pavan Lanka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVE-22033 fixes the issue with Hive Delegation tokens so that the renewal 
> time is effective.
> This looks at adding a test for HIVE-22033 and backporting this fix to 3.1 
> and 2.3 branches in Hive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808804
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 17:01
Start Date: 14/Sep/22 17:01
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3559:
URL: https://github.com/apache/hive/pull/3559#issuecomment-1247057159

   Kudos, SonarCloud Quality Gate passed!
   (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3559)

   - 1 Bug (rating C): https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3559&resolved=false&types=BUG
   - 0 Vulnerabilities (rating A): https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3559&resolved=false&types=VULNERABILITY
   - 0 Security Hotspots (rating A): https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3559&resolved=false&types=SECURITY_HOTSPOT
   - 45 Code Smells (rating A): https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3559&resolved=false&types=CODE_SMELL
   - No coverage information: https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3559&metric=coverage&view=list
   - No duplication information: https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3559&metric=duplicated_lines_density&view=list
   
   




Issue Time Tracking
---

Worklog Id: (was: 808804)
Time Spent: 3h 50m  (was: 3h 40m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and
> deletes. A set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808795&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808795
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 16:30
Start Date: 14/Sep/22 16:30
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971049370


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,28 +104,43 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+int bucketId = AcidUtils.parseBucketId(path);
+AcidUtils.ParsedDeltaLight parentDelta = AcidUtils.ParsedDeltaLight.parse(getPath().getParent());

Review Comment:
   could be refactored using static import:
   
   ParsedDeltaLight pd = ParsedDeltaLight.parse(path.getParent())
   





Issue Time Tracking
---

Worklog Id: (was: 808795)
Time Spent: 3h 40m  (was: 3.5h)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and
> deletes. A set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_007_007_
> 

[jira] [Work logged] (HIVE-26521) Iceberg: Raise exception when running delete/update statements on V1 tables

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26521?focusedWorklogId=808793=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808793
 ]

ASF GitHub Bot logged work on HIVE-26521:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 16:24
Start Date: 14/Sep/22 16:24
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3579:
URL: https://github.com/apache/hive/pull/3579#issuecomment-1247013071

   Kudos, SonarCloud Quality Gate passed!
   (dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3579)
   
   - Bugs: 1 (rating C)
   - Vulnerabilities: 0 (rating A)
   - Security Hotspots: 0 (rating A)
   - Code Smells: 46 (rating A)
   - No Coverage information
   - No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 808793)
Time Spent: 1.5h  (was: 1h 20m)

> Iceberg: Raise exception when running delete/update statements on V1 tables
> ---
>
> Key: HIVE-26521
> URL: https://issues.apache.org/jira/browse/HIVE-26521
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Right now an exception is raised on the executor side when trying to commit 
> the delete file. We should throw an exception earlier, during the compilation 
> phase.  
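The fail-fast behavior described above (rejecting DELETE/UPDATE on V1 tables during compilation rather than at commit time) could be sketched as a simple version guard. Everything below is hypothetical: the class name, constant, and message are assumptions for illustration, not Hive's actual Iceberg handler code.

```java
// Hypothetical sketch of a compile-time guard; Iceberg format v2 is the
// version that introduces row-level delete files.
public class IcebergFormatCheckSketch {
    static final int MIN_FORMAT_VERSION_FOR_ROW_LEVEL_OPS = 2;

    static void validateForRowLevelOps(int tableFormatVersion) {
        if (tableFormatVersion < MIN_FORMAT_VERSION_FOR_ROW_LEVEL_OPS) {
            throw new IllegalStateException(
                    "DELETE/UPDATE requires an Iceberg v2 table, found v" + tableFormatVersion);
        }
    }

    public static void main(String[] args) {
        try {
            // A v1 table should fail here, during analysis, not later on the executor.
            validateForRowLevelOps(1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running such a check from the semantic analyzer surfaces the error to the user immediately instead of after executors have already started work.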



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26488?focusedWorklogId=808792=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808792
 ]

ASF GitHub Bot logged work on HIVE-26488:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 16:18
Start Date: 14/Sep/22 16:18
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3538:
URL: https://github.com/apache/hive/pull/3538#issuecomment-1247005602

   Kudos, SonarCloud Quality Gate passed!
   (dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3538)
   
   - Bugs: 1 (rating C)
   - Vulnerabilities: 0 (rating A)
   - Security Hotspots: 0 (rating A)
   - Code Smells: 48 (rating A)
   - No Coverage information
   - No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 808792)
Time Spent: 2h 10m  (was: 2h)

> Fix NPE in DDLSemanticAnalyzerFactory during compilation
> 
>
> Key: HIVE-26488
> URL: https://issues.apache.org/jira/browse/HIVE-26488
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> *Exception Trace:*
> {noformat}
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.getInternal(SemanticAnalyzerFactory.java:62)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:41)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:209)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:418)
> {noformat}
> *Cause:*
> {noformat}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.(DDLSemanticAnalyzerFactory.java:84)
>   ... 40 more
> {noformat}



--
This message was sent by 

[jira] [Resolved] (HIVE-26363) Time logged during repldump and replload per table is not in readable format

2022-09-14 Thread Rakshith C (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakshith C resolved HIVE-26363.
---
Resolution: Fixed

> Time logged during repldump and replload per table is not in readable format
> 
>
> Key: HIVE-26363
> URL: https://issues.apache.org/jira/browse/HIVE-26363
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, repl
>Affects Versions: 4.0.0
>Reporter: Imran
>Assignee: Rakshith C
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> During replDump and replLoad we capture the time taken for each activity in 
> the hive.log file. This is captured in milliseconds, which is difficult to 
> read when debugging; this ticket changes the time logged in hive.log to a 
> readable UTC format.
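The conversion in question, turning a raw epoch-millisecond value into a readable UTC timestamp, can be sketched with `java.time`; the sample value below is hypothetical, not taken from an actual hive.log.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class ReadableReplTimestamp {
    public static void main(String[] args) {
        // Hypothetical raw value as currently logged, in epoch milliseconds.
        long epochMillis = 1663167600000L;

        // Render the same instant as an ISO-style UTC timestamp.
        String utc = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'")
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochMilli(epochMillis));

        System.out.println(utc); // prints 2022-09-14T15:00:00Z
    }
}
```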



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808788
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 15:40
Start Date: 14/Sep/22 15:40
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r970992776


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, 
long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, 
AcidUtils.parseBucketId(path)));
+this.deltas.addAll(filterDeleteDeltasByWriteIds

Review Comment:
   Looks good! 
   One question: should we call addAll, or can we simply assign the result of the 
collect?  
   
   this.deltas = 
collect(Collectors.toList());
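The trade-off the reviewer raises can be shown in isolation. This is a generic sketch with placeholder strings, not the PR's actual delta types: when the target list starts empty, both variants end up with the same contents, but `addAll` keeps the (possibly final, pre-initialized) field instance, while direct assignment replaces it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AddAllVsAssignSketch {
    public static void main(String[] args) {
        // Variant 1: keep the existing list instance and append the filtered result.
        List<String> viaAddAll = new ArrayList<>();
        viaAddAll.addAll(Stream.of("delete_delta_2", "delta_3")
                .filter(d -> d.startsWith("delete_"))
                .collect(Collectors.toList()));

        // Variant 2: assign the collected result directly; only possible if the
        // field is not final and is not populated anywhere else.
        List<String> viaAssign = Stream.of("delete_delta_2", "delta_3")
                .filter(d -> d.startsWith("delete_"))
                .collect(Collectors.toList());

        System.out.println(viaAddAll.equals(viaAssign)); // prints true
    }
}
```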
   





Issue Time Tracking
---

Worklog Id: (was: 808788)
Time Spent: 3.5h  (was: 3h 20m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that has had a set of updates and 
> deletes; a set of "delta" and "delete_delta" folders has been created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_007_007_
> 

[jira] [Work logged] (HIVE-26504) User is not able to drop table

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26504?focusedWorklogId=808778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808778
 ]

ASF GitHub Bot logged work on HIVE-26504:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 15:03
Start Date: 14/Sep/22 15:03
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3557:
URL: https://github.com/apache/hive/pull/3557#discussion_r970940220


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java:
##
@@ -235,6 +235,10 @@ public void alterTable(RawStore msdb, Warehouse wh, String 
catName, String dbnam
 
   boolean renamedTranslatedToExternalTable = rename && 
MetaStoreUtils.isTranslatedToExternalTable(oldt)
   && MetaStoreUtils.isTranslatedToExternalTable(newt);
+
+  List columnStatistics = getColumnStats(msdb, oldt);
+  columnStatistics = deleteTableColumnStats(msdb, oldt, newt, 
columnStatistics);

Review Comment:
   Since we have deleted the table column stats, do we need to call 
deleteAllPartitionColumnStatistics here?
   
[https://github.com/apache/hive/blob/f6bd0eb80767adfa9ce9f47a6d02a4940903effb/stand[…]ain/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java](https://github.com/apache/hive/blob/f6bd0eb80767adfa9ce9f47a6d02a4940903effb/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L417)





Issue Time Tracking
---

Worklog Id: (was: 808778)
Time Spent: 1h  (was: 50m)

> User is not able to drop table
> --
>
> Key: HIVE-26504
> URL: https://issues.apache.org/jira/browse/HIVE-26504
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive won't store anything in *TAB_COL_STATS* for a partitioned table, whereas 
> Impala stores complete column stats in TAB_COL_STATS for partitioned tables. 
> Deleting entries in TAB_COL_STATS is based on (DB_NAME, TABLE_NAME), not on 
> TBL_ID, so renamed tables kept their old names in TAB_COL_STATS.
> To Repro:
> {code:java}
> beeline:
> set hive.create.as.insert.only=false;
> set hive.create.as.acid=false;
> create table testes.table_name_with_partition (id tinyint, name string) 
> partitioned by (col_to_partition bigint) stored as parquet;
> insert into testes.table_name_with_partition (id, name, col_to_partition) 
> values (1, "a", 2020), (2, "b", 2021), (3, "c", 2022);
> impala:
> compute stats testes.table_name_with_partition; -- backend shows new entries 
> in TAB_COL_STATS
> beeline:
> alter table testes.table_name_with_partition rename to 
> testes2.table_that_cant_be_droped;
> drop table testes2.table_that_cant_be_droped; -- This fails with 
> TAB_COL_STATS_fkey constraint violation.
> {code}
> Exception trace for drop table failure
> {code:java}
> Caused by: org.postgresql.util.PSQLException: ERROR: update or delete on 
> table "TBLS" violates foreign key constraint "TAB_COL_STATS_fkey" on table 
> "TAB_COL_STATS"
>   Detail: Key (TBL_ID)=(19816) is still referenced from table "TAB_COL_STATS".
> at 
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2532)
> at 
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2267)
> ... 50 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26332) Upgrade maven-surefire-plugin to 3.0.0-M7

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26332?focusedWorklogId=808777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808777
 ]

ASF GitHub Bot logged work on HIVE-26332:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 15:02
Start Date: 14/Sep/22 15:02
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3375:
URL: https://github.com/apache/hive/pull/3375#issuecomment-1246903804

   Kudos, SonarCloud Quality Gate passed!
   (dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3375)
   
   - Bugs: 1 (rating C)
   - Vulnerabilities: 0 (rating A)
   - Security Hotspots: 0 (rating A)
   - Code Smells: 44 (rating A)
   - No Coverage information
   - No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 808777)
Time Spent: 1h  (was: 50m)

> Upgrade maven-surefire-plugin to 3.0.0-M7
> -
>
> Key: HIVE-26332
> URL: https://issues.apache.org/jira/browse/HIVE-26332
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently we use 3.0.0-M4, which was released in 2019. Since then there have 
> been multiple bug fixes and improvements:
> [https://issues.apache.org/jira/issues/?jql=project%20%3D%20SUREFIRE%20AND%20(fixVersion%20%3D%203.0.0-M5%20OR%20fixVersion%20%3D%203.0.0-M6%20OR%20fixVersion%20%3D%203.0.0-M7)%20ORDER%20BY%20resolutiondate%20%20DESC%2C%20key]
> It is worth mentioning that the interaction with JUnit 5 is much more mature 
> as well, and this is one of the main reasons driving this upgrade.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808772=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808772
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 14:56
Start Date: 14/Sep/22 14:56
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r970930245


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, 
long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, 
AcidUtils.parseBucketId(path)));
+this.deltas.addAll(filterDeleteDeltasByWriteIds

Review Comment:
   Done; replaced the nested function calls with stream processing.



##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, 
long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, 
AcidUtils.parseBucketId(path)));
+this.deltas.addAll(filterDeleteDeltasByWriteIds
+(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), 
conf));
 this.projColsUncompressedSize = projectedDataSize <= 0 ? length : 
projectedDataSize;
 // setting file length to Long.MAX_VALUE will let orc reader read file 
length from file system
 this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen;
 this.syntheticAcidProps = syntheticAcidProps;
   }
 
+  /**
+   * For every split we want to filter out the delete deltas that contain 
events that happened only
+   * in the past relative to the split
+   * @param deltas
+   * @param conf
+   * @return
+   */
+  protected List filterDeleteDeltasByWriteIds(
+  List deltas, Configuration conf) 
throws IOException {
+
+AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment:
   done
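The write-id filtering under discussion can be sketched generically: a delete delta whose events all predate the split's own write ids cannot affect the split's rows, so it can be dropped. Everything below (the `DeltaMeta` class and its fields) is a hypothetical stand-in for illustration, not Hive's actual `AcidInputFormat.DeltaMetaData` API.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DeleteDeltaFilterSketch {
    // Hypothetical stand-in for a delta descriptor; only the write-id range matters here.
    static class DeltaMeta {
        final String dir;
        final long minWriteId;
        final long maxWriteId;
        DeltaMeta(String dir, long minWriteId, long maxWriteId) {
            this.dir = dir;
            this.minWriteId = minWriteId;
            this.maxWriteId = maxWriteId;
        }
    }

    // Keep only delete deltas whose events could affect this split's rows:
    // deletes committed entirely before the split's own min write id are dropped.
    static List<DeltaMeta> filterByWriteIds(List<DeltaMeta> deltas, long splitMinWriteId) {
        return deltas.stream()
                .filter(d -> d.maxWriteId >= splitMinWriteId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DeltaMeta> deltas = List.of(
                new DeltaMeta("delete_delta_0000002_0000002", 2, 2),
                new DeltaMeta("delete_delta_0000009_0000009", 9, 9));
        // A split whose rows were written at write id 5 cannot contain rows
        // deleted back at write id 2, so only the later delete delta survives.
        System.out.println(filterByWriteIds(deltas, 5).get(0).dir);
    }
}
```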





Issue Time Tracking
---

Worklog Id: (was: 808772)
Time Spent: 3h 20m  (was: 3h 10m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that has had a set of updates and 
> deletes; a set of "delta" and "delete_delta" folders has been created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808771=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808771
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 14:54
Start Date: 14/Sep/22 14:54
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r970928460


##
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSplit.java:
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.io.orc;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.ValidReadTxnList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
+import org.apache.hadoop.hive.ql.io.*;
+import org.apache.hadoop.hive.ql.io.AcidUtils.Directory;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.orc.OrcConf;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.*;
+
+import static org.junit.Assert.*;
+
+/**
+ * Tests for OrcSplit class
+ */
+public class TestOrcSplit {
+
+  private JobConf conf;
+  private FileSystem fs;
+  private Path root;
+  private ObjectInspector inspector;
+  public static class DummyRow {
+LongWritable field;
+RecordIdentifier ROW__ID;
+
+DummyRow(long val, long rowId, long origTxn, int bucket) {
+  field = new LongWritable(val);
+  bucket = BucketCodec.V1.encode(new 
AcidOutputFormat.Options(null).bucket(bucket));
+  ROW__ID = new RecordIdentifier(origTxn, bucket, rowId);
+}
+
+static String getColumnNamesProperty() {
+  return "field";
+}
+static String getColumnTypesProperty() {
+  return "bigint";
+}
+
+  }
+
+  @Before
+  public void setup() throws Exception {
+conf = new JobConf();
+conf.set(hive_metastoreConstants.TABLE_IS_TRANSACTIONAL, "true");
+conf.setBoolean(HiveConf.ConfVars.HIVE_TRANSACTIONAL_TABLE_SCAN.varname, 
true);
+conf.set(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES, 
"default");
+conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname,
+AcidUtils.AcidOperationalProperties.getDefault().toInt());
+conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS, 
DummyRow.getColumnNamesProperty());
+conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES, 
DummyRow.getColumnTypesProperty());
+conf.setBoolean(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED.varname, 
true);
+conf.set(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY.varname, "BI");
+OrcConf.ROWS_BETWEEN_CHECKS.setLong(conf, 1);
+
+Path workDir = new Path(System.getProperty("test.tmp.dir",
+"target" + File.separator + "test" + File.separator + "tmp"));
+root = new Path(workDir, "TestOrcSplit.testDump");
+fs = root.getFileSystem(conf);
+root = fs.makeQualified(root);
+fs.delete(root, true);
+synchronized (TestOrcFile.class) {
+  inspector = ObjectInspectorFactory.getReflectionObjectInspector
+  (DummyRow.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
+}
+  }
+
+  private List> getSplitStrategies() throws 
Exception {
+conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname,
+AcidUtils.AcidOperationalProperties.getDefault().toInt());
+OrcInputFormat.Context context = new OrcInputFormat.Context(conf);
+OrcInputFormat.FileGenerator gen = new OrcInputFormat.FileGenerator(
+context, () -> fs, root, false, null);
+Directory adi = gen.call();
+return OrcInputFormat.determineSplitStrategies(
+null, context, adi.getFs(), 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808770
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 14:54
Start Date: 14/Sep/22 14:54
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r970928073


##
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSplit.java:
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.io.orc;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.ValidReadTxnList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
+import org.apache.hadoop.hive.ql.io.*;
+import org.apache.hadoop.hive.ql.io.AcidUtils.Directory;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.orc.OrcConf;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.*;
+
+import static org.junit.Assert.*;
+
+/**
+ * Tests for OrcSplit class
+ */
+public class TestOrcSplit {
+
+  private JobConf conf;
+  private FileSystem fs;
+  private Path root;
+  private ObjectInspector inspector;
+  public static class DummyRow {
+LongWritable field;
+RecordIdentifier ROW__ID;
+
+DummyRow(long val, long rowId, long origTxn, int bucket) {
+  field = new LongWritable(val);
+  bucket = BucketCodec.V1.encode(new 
AcidOutputFormat.Options(null).bucket(bucket));
+  ROW__ID = new RecordIdentifier(origTxn, bucket, rowId);
+}
+
+static String getColumnNamesProperty() {
+  return "field";
+}
+static String getColumnTypesProperty() {
+  return "bigint";
+}
+
+  }
+
+  @Before
+  public void setup() throws Exception {
+conf = new JobConf();
+conf.set(hive_metastoreConstants.TABLE_IS_TRANSACTIONAL, "true");
+conf.setBoolean(HiveConf.ConfVars.HIVE_TRANSACTIONAL_TABLE_SCAN.varname, 
true);
+conf.set(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES, 
"default");
+conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname,
+AcidUtils.AcidOperationalProperties.getDefault().toInt());
+conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS, 
DummyRow.getColumnNamesProperty());
+conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES, 
DummyRow.getColumnTypesProperty());
+conf.setBoolean(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED.varname, 
true);
+conf.set(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY.varname, "BI");
+OrcConf.ROWS_BETWEEN_CHECKS.setLong(conf, 1);
+
+Path workDir = new Path(System.getProperty("test.tmp.dir",
+"target" + File.separator + "test" + File.separator + "tmp"));
+root = new Path(workDir, "TestOrcSplit.testDump");
+fs = root.getFileSystem(conf);
+root = fs.makeQualified(root);
+fs.delete(root, true);
+synchronized (TestOrcFile.class) {
+  inspector = ObjectInspectorFactory.getReflectionObjectInspector
+  (DummyRow.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
+}
+  }
+
+  private List<OrcInputFormat.SplitStrategy<?>> getSplitStrategies() throws Exception {
+conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname,
+AcidUtils.AcidOperationalProperties.getDefault().toInt());
+OrcInputFormat.Context context = new OrcInputFormat.Context(conf);
+OrcInputFormat.FileGenerator gen = new OrcInputFormat.FileGenerator(
+context, () -> fs, root, false, null);
+Directory adi = gen.call();
+return OrcInputFormat.determineSplitStrategies(
+null, context, adi.getFs(), 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808768
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 14:54
Start Date: 14/Sep/22 14:54
Worklog Time Spent: 10m 
  Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r970927575


##
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSplit.java:
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.io.orc;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.ValidReadTxnList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
+import org.apache.hadoop.hive.ql.io.*;
+import org.apache.hadoop.hive.ql.io.AcidUtils.Directory;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.orc.OrcConf;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.*;

Review Comment:
   replaced wildcard imports with concrete classes.





Issue Time Tracking
---

Worklog Id: (was: 808768)
Time Spent: 2h 50m  (was: 2h 40m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table which had a set of updates and 
> deletes. Sets of "delta" and "delete_delta" folders are created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> 

[jira] [Work logged] (HIVE-26420) Configurable timeout for HiveSplitGenerator to wait for LLAP instances

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26420?focusedWorklogId=808674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808674
 ]

ASF GitHub Bot logged work on HIVE-26420:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 11:50
Start Date: 14/Sep/22 11:50
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3468:
URL: https://github.com/apache/hive/pull/3468#issuecomment-1246649477

   Kudos, SonarCloud Quality Gate passed!
   (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3468)
   
   Bugs: 1 (rating C)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 0 (rating A)
   Code Smells: 49 (rating A)
   
   No Coverage information
   No Duplication information
   




Issue Time Tracking
---

Worklog Id: (was: 808674)
Time Spent: 40m  (was: 0.5h)

> Configurable timeout for HiveSplitGenerator to wait for LLAP instances
> --
>
> Key: HIVE-26420
> URL: https://issues.apache.org/jira/browse/HIVE-26420
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In some circumstances we cannot guarantee that LLAP daemons are ready as soon 
> as the Tez AMs are, but we don't want the query to fail immediately with:
> {code}
> Caused by: java.lang.IllegalArgumentException: No running LLAP daemons! 
> Please check LLAP service status and zookeeper configuration
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:142)
> 
> org.apache.hadoop.hive.ql.exec.tez.Utils.getCustomSplitLocationProvider(Utils.java:105)
> 
> org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:77)
> 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:147)
> 19 more
> {code}
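The improvement described above amounts to polling the LLAP registry with a configurable deadline instead of failing on the first empty instance list. A minimal sketch under assumed names (`waitForInstances`, `activeInstanceCount`, and the timeout values are illustrative, not the actual HiveSplitGenerator API):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Sketch of a configurable wait-for-daemons loop. HIVE-26420 adds an
// equivalent, configurable wait instead of throwing
// "No running LLAP daemons!" on the first check.
public class LlapWaitSketch {

    static boolean waitForInstances(IntSupplier activeInstanceCount,
                                    long timeoutMs, long retryIntervalMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (activeInstanceCount.getAsInt() == 0) {
            if (System.nanoTime() >= deadline) {
                return false; // caller can now raise the "No running LLAP daemons!" error
            }
            Thread.sleep(retryIntervalMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated registry: no daemons for the first two polls, then one appears.
        int[] polls = {0, 0, 1};
        int[] i = {0};
        boolean ready = waitForInstances(
                () -> polls[Math.min(i[0]++, polls.length - 1)], 1000, 10);
        System.out.println(ready); // prints true
    }
}
```

The point of the deadline is exactly the trade-off in the issue: slow-starting daemons get a grace period, while a genuinely empty cluster still fails with the existing error once the timeout elapses.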



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808672
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 11:48
Start Date: 14/Sep/22 11:48
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r970701167


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, 
long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, 
AcidUtils.parseBucketId(path)));
+this.deltas.addAll(filterDeleteDeltasByWriteIds
+(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), 
conf));
 this.projColsUncompressedSize = projectedDataSize <= 0 ? length : 
projectedDataSize;
 // setting file length to Long.MAX_VALUE will let orc reader read file 
length from file system
 this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen;
 this.syntheticAcidProps = syntheticAcidProps;
   }
 
+  /**
+   * For every split, we want to filter out the delete deltas that contain only
+   * events that happened in the past relative to the split.
+   * @param deltas the delta metadata for this split
+   * @param conf the job configuration
+   * @return the delete deltas that can still affect this split
+   */
+  protected List<AcidInputFormat.DeltaMetaData> filterDeleteDeltasByWriteIds(
+      List<AcidInputFormat.DeltaMetaData> deltas, Configuration conf) throws IOException {
+
+AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment:
   why not simply `ParsedDeltaLight.parse(bucketFile.getParent())`?
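The filter under review, dropping delete deltas whose entire write-id range precedes the split's own write id, can be sketched in isolation. This is a minimal illustration under assumed names, not Hive's implementation: `DeltaMeta` stands in for `AcidInputFormat.DeltaMetaData`, and `splitMinWriteId` for the min write id parsed from the split's file name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DeleteDeltaFilterSketch {

    // Hypothetical stand-in for AcidInputFormat.DeltaMetaData: just the
    // write-id range encoded in a directory name like
    // "delete_delta_0000005_0000005_0000".
    static class DeltaMeta {
        final long minWriteId;
        final long maxWriteId;

        DeltaMeta(long minWriteId, long maxWriteId) {
            this.minWriteId = minWriteId;
            this.maxWriteId = maxWriteId;
        }
    }

    // A delete delta whose highest write id is below the split's min write id
    // can only contain deletes of rows older than anything in the split, so it
    // can be skipped entirely when reading that split.
    static List<DeltaMeta> filterDeleteDeltas(List<DeltaMeta> deleteDeltas,
                                              long splitMinWriteId) {
        return deleteDeltas.stream()
                .filter(d -> d.maxWriteId >= splitMinWriteId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DeltaMeta> deltas = Arrays.asList(
                new DeltaMeta(2, 2),   // strictly older than the split: dropped
                new DeltaMeta(5, 5),   // same write id as the split: kept
                new DeltaMeta(7, 9));  // newer: kept
        System.out.println(filterDeleteDeltas(deltas, 5).size()); // prints 2
    }
}
```

With many small delete_delta directories, as in the listing quoted later in this thread, per-split pruning of this kind is what removes the repeated scans the issue describes.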





Issue Time Tracking
---

Worklog Id: (was: 808672)
Time Spent: 2h 40m  (was: 2.5h)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table which had a set of updates and 
> deletes. Sets of "delta" and "delete_delta" folders are created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808665
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 11:44
Start Date: 14/Sep/22 11:44
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r970696749


##
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, 
long length, String[] hos
 this.isOriginal = isOriginal;
 this.hasBase = hasBase;
 this.rootDir = rootDir;
-this.deltas.addAll(filterDeltasByBucketId(deltas, 
AcidUtils.parseBucketId(path)));
+this.deltas.addAll(filterDeleteDeltasByWriteIds

Review Comment:
   could we transform this construct into a stream pipeline:
   
   this.deltas = deltas.stream()
 .filter(delta -> filterDeltasByBucketId(delta, 
AcidUtils.parseBucketId(path)))
 .filter(delta -> filterDeleteDeltasByWriteIds(delta, conf))
 .collect(Collectors.toList());
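Note that the suggested pipeline compiles only if each step is a per-delta `Predicate`, not a helper that returns an already-filtered list. A minimal sketch of that shape, with hypothetical names (`Delta`, `matchesBucket`, and `visibleTo` are illustrations, not Hive APIs):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DeltaStreamSketch {

    // Hypothetical delta record: bucket id plus max write id.
    static class Delta {
        final int bucketId;
        final long maxWriteId;

        Delta(int bucketId, long maxWriteId) {
            this.bucketId = bucketId;
            this.maxWriteId = maxWriteId;
        }
    }

    // Each filter step is expressed as a Predicate over a single delta, so the
    // two conditions can be composed with Predicate.and() inside one stream.
    static Predicate<Delta> matchesBucket(int splitBucketId) {
        return d -> d.bucketId == splitBucketId;
    }

    static Predicate<Delta> visibleTo(long splitMinWriteId) {
        return d -> d.maxWriteId >= splitMinWriteId;
    }

    public static void main(String[] args) {
        List<Delta> deltas = Arrays.asList(
                new Delta(0, 2), new Delta(1, 5), new Delta(0, 7));
        List<Delta> filtered = deltas.stream()
                .filter(matchesBucket(0).and(visibleTo(5)))
                .collect(Collectors.toList());
        System.out.println(filtered.size()); // prints 1
    }
}
```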
   





Issue Time Tracking
---

Worklog Id: (was: 808665)
Time Spent: 2.5h  (was: 2h 20m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table which had a set of updates and 
> deletes. Sets of "delta" and "delete_delta" folders are created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_006_006_
> 

[jira] [Work logged] (HIVE-26045) Detect timed out connections for providers and auto-reconnect

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26045?focusedWorklogId=808661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808661
 ]

ASF GitHub Bot logged work on HIVE-26045:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 11:41
Start Date: 14/Sep/22 11:41
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3595:
URL: https://github.com/apache/hive/pull/3595#issuecomment-1246640211

   Kudos, SonarCloud Quality Gate passed!
   (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3595)
   
   Bugs: 1 (rating C)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 0 (rating A)
   Code Smells: 45 (rating A)
   
   No Coverage information
   No Duplication information
   




Issue Time Tracking
---

Worklog Id: (was: 808661)
Time Spent: 2h 50m  (was: 2h 40m)

> Detect timed out connections for providers and auto-reconnect
> -
>
> Key: HIVE-26045
> URL: https://issues.apache.org/jira/browse/HIVE-26045
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> For the connectors, we use a single connection, no pooling. But when the 
> connection is idle for an extended period, the JDBC connection times out. We 
> need to check for closed connections (Connection.isClosed()?) and 
> re-establish the connection. Otherwise it renders the connector fairly 
> useless.
> {noformat}
> 2022-03-17T13:02:16,635  WARN [HiveServer2-Handler-Pool: Thread-116] 
> thrift.ThriftCLIService: Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: SemanticException Unable to fetch table temp_dbs. Error 
> retrieving remote 
> table:com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: 
> No operations allowed after connection closed.
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:373)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
> 
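The check-and-reconnect the issue asks for can be sketched as a thin wrapper around the cached connection. This is an illustrative sketch, not the actual connector-provider code: `ReconnectingProvider` and its `Supplier<Connection>` factory are hypothetical, while `Connection.isClosed()` and `Connection.isValid(int)` are real JDBC APIs.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.function.Supplier;

// Validates the cached connection before handing it out, and rebuilds it if
// the server has dropped it, instead of failing with
// "No operations allowed after connection closed."
public class ReconnectingProvider {

    private final Supplier<Connection> factory;
    private Connection connection;

    ReconnectingProvider(Supplier<Connection> factory) {
        this.factory = factory;
    }

    synchronized Connection getConnection() throws SQLException {
        // isValid() also catches server-side timeouts that isClosed() misses:
        // isClosed() only reports whether close() was called locally.
        if (connection == null || connection.isClosed() || !connection.isValid(5)) {
            connection = factory.get();
        }
        return connection;
    }
}
```

A connection pool with built-in validation (e.g. a validation query or keep-alive) is the heavier-weight alternative; the sketch above keeps the issue's single-connection model but revalidates lazily.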

[jira] [Work logged] (HIVE-26525) Update llap-server python scripts to be compatible with python 3

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26525?focusedWorklogId=808646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808646
 ]

ASF GitHub Bot logged work on HIVE-26525:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 11:10
Start Date: 14/Sep/22 11:10
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged PR #3584:
URL: https://github.com/apache/hive/pull/3584




Issue Time Tracking
---

Worklog Id: (was: 808646)
Time Spent: 50m  (was: 40m)

> Update llap-server python scripts to be compatible with python 3
> 
>
> Key: HIVE-26525
> URL: https://issues.apache.org/jira/browse/HIVE-26525
> Project: Hive
>  Issue Type: Task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> llap-server/src/main/resources/package.py and 
> llap-server/src/main/resources/argparse.py are not compatible with Python 3. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26445) Use tez.local.mode.without.network for qtests

2022-09-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-26445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604016#comment-17604016
 ] 

László Bodor commented on HIVE-26445:
-

this one can be merged upstream only after Tez 0.10.3 is released with TEZ-4447

> Use tez.local.mode.without.network for qtests
> -
>
> Key: HIVE-26445
> URL: https://issues.apache.org/jira/browse/HIVE-26445
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> looks like in the case of Iceberg, the local DAGClient behaves oddly:
> {code}
> 2022-08-02T06:54:36,669 ERROR [2f953972-7675-4594-8d6b-d1c295c056a5 
> Time-limited test] tez.TezTask: Failed to execute tez graph.
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.collectCommitInformation(TezTask.java:367)
>  ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:279) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:355) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
> {code}
> it's thrown from 
> https://github.com/apache/hive/blob/e0f2d287c562423dc2632910aae4f1cd8bcd4b4d/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java#L367



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26247) Filter out results 'show connectors' on HMS server-side

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26247?focusedWorklogId=808620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808620
 ]

ASF GitHub Bot logged work on HIVE-26247:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 10:20
Start Date: 14/Sep/22 10:20
Worklog Time Spent: 10m 
  Work Description: zhangbutao commented on PR #3545:
URL: https://github.com/apache/hive/pull/3545#issuecomment-1246553930

   Gentle ping @nrg4878 @saihemanth-cloudera 




Issue Time Tracking
---

Worklog Id: (was: 808620)
Time Spent: 40m  (was: 0.5h)

> Filter out results 'show connectors' on HMS server-side
> ---
>
> Key: HIVE-26247
> URL: https://issues.apache.org/jira/browse/HIVE-26247
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26488?focusedWorklogId=808618=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808618
 ]

ASF GitHub Bot logged work on HIVE-26488:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 10:13
Start Date: 14/Sep/22 10:13
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3538:
URL: https://github.com/apache/hive/pull/3538#issuecomment-1246543513

   Kudos, SonarCloud Quality Gate passed!  [![Quality Gate 
passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png
 'Quality Gate 
passed')](https://sonarcloud.io/dashboard?id=apache_hive=3538)
   
   
[![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png
 
'Bug')](https://sonarcloud.io/project/issues?id=apache_hive=3538=false=BUG)
 
[![C](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/C-16px.png
 
'C')](https://sonarcloud.io/project/issues?id=apache_hive=3538=false=BUG)
 [1 
Bug](https://sonarcloud.io/project/issues?id=apache_hive=3538=false=BUG)
  
   
[![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png
 
'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive=3538=false=VULNERABILITY)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=3538=false=VULNERABILITY)
 [0 
Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive=3538=false=VULNERABILITY)
  
   [![Security 
Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png
 'Security 
Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=3538=false=SECURITY_HOTSPOT)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=3538=false=SECURITY_HOTSPOT)
 [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=3538=false=SECURITY_HOTSPOT)
  
   [![Code 
Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png
 'Code 
Smell')](https://sonarcloud.io/project/issues?id=apache_hive=3538=false=CODE_SMELL)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=3538=false=CODE_SMELL)
 [48 Code 
Smells](https://sonarcloud.io/project/issues?id=apache_hive=3538=false=CODE_SMELL)
   
   [![No Coverage 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png
 'No Coverage 
information')](https://sonarcloud.io/component_measures?id=apache_hive=3538=coverage=list)
 No Coverage information  
   [![No Duplication 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png
 'No Duplication 
information')](https://sonarcloud.io/component_measures?id=apache_hive=3538=duplicated_lines_density=list)
 No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 808618)
Time Spent: 2h  (was: 1h 50m)

> Fix NPE in DDLSemanticAnalyzerFactory during compilation
> 
>
> Key: HIVE-26488
> URL: https://issues.apache.org/jira/browse/HIVE-26488
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> *Exception Trace:*
> {noformat}
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.getInternal(SemanticAnalyzerFactory.java:62)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:41)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:209)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:418)
> {noformat}
> *Cause:*
> {noformat}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.(DDLSemanticAnalyzerFactory.java:84)
>   ... 40 more
> {noformat}
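A NullPointerException thrown inside a class's static initializer reaches callers wrapped in an ExceptionInInitializerError, which matches the two traces above. A minimal, self-contained Java sketch of that failure mode (the `Faulty` class and `initFailureCause` helper are made up for illustration; this is not Hive's DDLSemanticAnalyzerFactory code):

```java
public class InitializerNpeDemo {
    // Hypothetical class whose static initializer dereferences null,
    // mirroring the reported failure mode.
    static class Faulty {
        static final int LENGTH;
        static {
            String s = null;
            LENGTH = s.length(); // NPE here is wrapped by the JVM
        }
    }

    // First touch of the class triggers initialization; the JVM wraps the
    // NPE in an ExceptionInInitializerError and preserves it as the cause.
    static String initFailureCause() {
        try {
            return String.valueOf(Faulty.LENGTH);
        } catch (ExceptionInInitializerError e) {
            return e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(initFailureCause()); // prints NullPointerException
    }
}
```

This is why the "Cause" section shows the NPE one frame deep: the error reported at the call site is the wrapper, and the real bug lives in the static initializer.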




[jira] [Work logged] (HIVE-26045) Detect timed out connections for providers and auto-reconnect

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26045?focusedWorklogId=808617=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808617
 ]

ASF GitHub Bot logged work on HIVE-26045:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 10:07
Start Date: 14/Sep/22 10:07
Worklog Time Spent: 10m 
  Work Description: zhangbutao commented on PR #3388:
URL: https://github.com/apache/hive/pull/3388#issuecomment-1246535811

   Superseded by https://github.com/apache/hive/pull/3595
   cc @nrg4878 




Issue Time Tracking
---

Worklog Id: (was: 808617)
Time Spent: 2h 40m  (was: 2.5h)

> Detect timed out connections for providers and auto-reconnect
> -
>
> Key: HIVE-26045
> URL: https://issues.apache.org/jira/browse/HIVE-26045
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> For the connectors, we use a single connection with no pooling. But when the 
> connection is idle for an extended period, the JDBC connection times out. We 
> need to check for closed connections (Connection.isClosed()?) and 
> re-establish the connection. Otherwise the connector is rendered fairly 
> useless.
> {noformat}
> 2022-03-17T13:02:16,635  WARN [HiveServer2-Handler-Pool: Thread-116] 
> thrift.ThriftCLIService: Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: SemanticException Unable to fetch table temp_dbs. Error 
> retrieving remote 
> table:com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: 
> No operations allowed after connection closed.
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:373)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:211)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:285) 
> ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:576)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:562)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) ~[?:?]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_231]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_231]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_231]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  ~[hadoop-common-3.1.0.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at com.sun.proxy.$Proxy44.executeStatementAsync(Unknown Source) ~[?:?]
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1550)
>  
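The check described above can be sketched as a small wrapper that re-creates the connection whenever `Connection.isClosed()` reports it has gone away (e.g. after a server-side idle timeout). This is a hedged illustration, not Hive's actual connector-provider API: `ReconnectingProvider`, `demo`, and the dynamic-proxy stub are invented for the example.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.function.Supplier;

public class ReconnectingProvider {
    private final Supplier<Connection> factory; // assumption: caller supplies a factory
    private Connection connection;

    public ReconnectingProvider(Supplier<Connection> factory) {
        this.factory = factory;
    }

    // Reuse the cached connection; re-establish it if it was closed,
    // e.g. by a server-side idle timeout.
    public synchronized Connection getConnection() throws SQLException {
        if (connection == null || connection.isClosed()) {
            connection = factory.get();
        }
        return connection;
    }

    // Stub Connection (via dynamic proxy) whose isClosed() reads a shared
    // flag, so the reconnect path can be exercised without a real database.
    static Connection stub(boolean[] closed) {
        return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class<?>[]{Connection.class},
                (proxy, method, args) ->
                        method.getName().equals("isClosed") ? closed[0] : null);
    }

    // Creates a provider, simulates an idle timeout, and returns how many
    // times a connection had to be created.
    static int demo() throws SQLException {
        boolean[] closed = {false};
        int[] creations = {0};
        ReconnectingProvider p = new ReconnectingProvider(() -> {
            creations[0]++;
            return stub(closed);
        });
        p.getConnection();   // first call creates the connection
        p.getConnection();   // still open: reused
        closed[0] = true;    // simulate the server closing an idle connection
        p.getConnection();   // closed: re-created
        return creations[0];
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(demo()); // prints 2
    }
}
```

A production version would also guard against connections that are broken but not yet reported closed (e.g. via `Connection.isValid(timeout)`), since `isClosed()` only reflects an explicit close.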

[jira] [Work logged] (HIVE-26045) Detect timed out connections for providers and auto-reconnect

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26045?focusedWorklogId=808613=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808613
 ]

ASF GitHub Bot logged work on HIVE-26045:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 10:05
Start Date: 14/Sep/22 10:05
Worklog Time Spent: 10m 
  Work Description: zhangbutao opened a new pull request, #3595:
URL: https://github.com/apache/hive/pull/3595

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 808613)
Time Spent: 2.5h  (was: 2h 20m)

> Detect timed out connections for providers and auto-reconnect
> -
>
> Key: HIVE-26045
> URL: https://issues.apache.org/jira/browse/HIVE-26045
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> For the connectors, we use a single connection with no pooling. But when the 
> connection is idle for an extended period, the JDBC connection times out. We 
> need to check for closed connections (Connection.isClosed()?) and 
> re-establish the connection. Otherwise the connector is rendered fairly 
> useless.
> {noformat}
> 2022-03-17T13:02:16,635  WARN [HiveServer2-Handler-Pool: Thread-116] 
> thrift.ThriftCLIService: Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: SemanticException Unable to fetch table temp_dbs. Error 
> retrieving remote 
> table:com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: 
> No operations allowed after connection closed.
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:373)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:211)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:285) 
> ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:576)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:562)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) ~[?:?]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_231]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_231]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_231]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  ~[hadoop-common-3.1.0.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at com.sun.proxy.$Proxy44.executeStatementAsync(Unknown Source) ~[?:?]
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567)
>  ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
>   at 
> 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808609
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 10:03
Start Date: 14/Sep/22 10:03
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on PR #3559:
URL: https://github.com/apache/hive/pull/3559#issuecomment-1246530870

   left minor comments, but basically looks good to me; I'm tempted to approve 
this immediately.
   Just wondering if anyone with an ACID background can see any obvious problems: 
@deniskuzZ , @lcspinter 




Issue Time Tracking
---

Worklog Id: (was: 808609)
Time Spent: 2h 20m  (was: 2h 10m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and 
> deletes. A set of "delta" and "delete_delta" folders has been created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_011_011_
> 
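The gist of the fix can be pictured as classifying the directory listing once and reusing the result, instead of re-listing the delete_delta folders for every split. A rough, hypothetical sketch under that assumption (the `AcidDirLayout` class and `classify` method are invented for illustration; this is not Hive's AcidUtils code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AcidDirLayout {
    final List<String> base = new ArrayList<>();
    final List<String> deltas = new ArrayList<>();
    final List<String> deleteDeltas = new ArrayList<>();

    // Classify every directory name in a single pass; callers keep the
    // resulting layout rather than listing delete_delta folders again
    // for each split. Purely illustrative, not Hive's actual code.
    static AcidDirLayout classify(List<String> dirNames) {
        AcidDirLayout layout = new AcidDirLayout();
        for (String d : dirNames) {
            if (d.startsWith("delete_delta_")) {
                layout.deleteDeltas.add(d);
            } else if (d.startsWith("delta_")) {
                layout.deltas.add(d);
            } else if (d.startsWith("base_")) {
                layout.base.add(d);
            }
        }
        return layout;
    }

    public static void main(String[] args) {
        AcidDirLayout layout = classify(Arrays.asList(
                "base_001", "delta_002_002", "delete_delta_002_002"));
        System.out.println(layout.deleteDeltas.size()); // prints 1
    }
}
```

On object stores such as S3 each directory listing is a network round trip, so collapsing repeated listings into one pass is where the slowness reported here comes from.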

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808606
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 10:01
Start Date: 14/Sep/22 10:01
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r970592700


##
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSplit.java:
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.io.orc;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.ValidReadTxnList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
+import org.apache.hadoop.hive.ql.io.*;
+import org.apache.hadoop.hive.ql.io.AcidUtils.Directory;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.orc.OrcConf;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.*;
+
+import static org.junit.Assert.*;
+
+/**
+ * Tests for OrcSplit class
+ */
+public class TestOrcSplit {
+
+  private JobConf conf;
+  private FileSystem fs;
+  private Path root;
+  private ObjectInspector inspector;
+  public static class DummyRow {
+LongWritable field;
+RecordIdentifier ROW__ID;
+
+DummyRow(long val, long rowId, long origTxn, int bucket) {
+  field = new LongWritable(val);
+  bucket = BucketCodec.V1.encode(new 
AcidOutputFormat.Options(null).bucket(bucket));
+  ROW__ID = new RecordIdentifier(origTxn, bucket, rowId);
+}
+
+static String getColumnNamesProperty() {
+  return "field";
+}
+static String getColumnTypesProperty() {
+  return "bigint";
+}
+
+  }
+
+  @Before
+  public void setup() throws Exception {
+conf = new JobConf();
+conf.set(hive_metastoreConstants.TABLE_IS_TRANSACTIONAL, "true");
+conf.setBoolean(HiveConf.ConfVars.HIVE_TRANSACTIONAL_TABLE_SCAN.varname, 
true);
+conf.set(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES, 
"default");
+conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname,
+AcidUtils.AcidOperationalProperties.getDefault().toInt());
+conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS, 
DummyRow.getColumnNamesProperty());
+conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES, 
DummyRow.getColumnTypesProperty());
+conf.setBoolean(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED.varname, 
true);
+conf.set(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY.varname, "BI");
+OrcConf.ROWS_BETWEEN_CHECKS.setLong(conf, 1);
+
+Path workDir = new Path(System.getProperty("test.tmp.dir",
+"target" + File.separator + "test" + File.separator + "tmp"));
+root = new Path(workDir, "TestOrcSplit.testDump");
+fs = root.getFileSystem(conf);
+root = fs.makeQualified(root);
+fs.delete(root, true);
+synchronized (TestOrcFile.class) {
+  inspector = ObjectInspectorFactory.getReflectionObjectInspector
+  (DummyRow.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
+}
+  }
+
+  private List<OrcInputFormat.SplitStrategy<?>> getSplitStrategies() throws 
Exception {
+conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname,
+AcidUtils.AcidOperationalProperties.getDefault().toInt());
+OrcInputFormat.Context context = new OrcInputFormat.Context(conf);
+OrcInputFormat.FileGenerator gen = new OrcInputFormat.FileGenerator(
+context, () -> fs, root, false, null);
+Directory adi = gen.call();
+return OrcInputFormat.determineSplitStrategies(
+null, context, adi.getFs(), 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808605=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808605
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 10:01
Start Date: 14/Sep/22 10:01
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r970592299


##
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSplit.java:
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.io.orc;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.ValidReadTxnList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
+import org.apache.hadoop.hive.ql.io.*;
+import org.apache.hadoop.hive.ql.io.AcidUtils.Directory;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.orc.OrcConf;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.*;
+
+import static org.junit.Assert.*;
+
+/**
+ * Tests for OrcSplit class
+ */
+public class TestOrcSplit {
+
+  private JobConf conf;
+  private FileSystem fs;
+  private Path root;
+  private ObjectInspector inspector;
+  public static class DummyRow {
+LongWritable field;
+RecordIdentifier ROW__ID;
+
+DummyRow(long val, long rowId, long origTxn, int bucket) {
+  field = new LongWritable(val);
+  bucket = BucketCodec.V1.encode(new 
AcidOutputFormat.Options(null).bucket(bucket));
+  ROW__ID = new RecordIdentifier(origTxn, bucket, rowId);
+}
+
+static String getColumnNamesProperty() {
+  return "field";
+}
+static String getColumnTypesProperty() {
+  return "bigint";
+}
+
+  }
+
+  @Before
+  public void setup() throws Exception {
+conf = new JobConf();
+conf.set(hive_metastoreConstants.TABLE_IS_TRANSACTIONAL, "true");
+conf.setBoolean(HiveConf.ConfVars.HIVE_TRANSACTIONAL_TABLE_SCAN.varname, 
true);
+conf.set(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES, 
"default");
+conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname,
+AcidUtils.AcidOperationalProperties.getDefault().toInt());
+conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS, 
DummyRow.getColumnNamesProperty());
+conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES, 
DummyRow.getColumnTypesProperty());
+conf.setBoolean(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED.varname, 
true);
+conf.set(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY.varname, "BI");
+OrcConf.ROWS_BETWEEN_CHECKS.setLong(conf, 1);
+
+Path workDir = new Path(System.getProperty("test.tmp.dir",
+"target" + File.separator + "test" + File.separator + "tmp"));
+root = new Path(workDir, "TestOrcSplit.testDump");
+fs = root.getFileSystem(conf);
+root = fs.makeQualified(root);
+fs.delete(root, true);
+synchronized (TestOrcFile.class) {
+  inspector = ObjectInspectorFactory.getReflectionObjectInspector
+  (DummyRow.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
+}
+  }
+
+  private List<OrcInputFormat.SplitStrategy<?>> getSplitStrategies() throws 
Exception {
+conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname,
+AcidUtils.AcidOperationalProperties.getDefault().toInt());
+OrcInputFormat.Context context = new OrcInputFormat.Context(conf);
+OrcInputFormat.FileGenerator gen = new OrcInputFormat.FileGenerator(
+context, () -> fs, root, false, null);
+Directory adi = gen.call();
+return OrcInputFormat.determineSplitStrategies(
+null, context, adi.getFs(), 

[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808603
 ]

ASF GitHub Bot logged work on HIVE-26496:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 09:58
Start Date: 14/Sep/22 09:58
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r970589035


##
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSplit.java:
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.io.orc;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.ValidReadTxnList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
+import org.apache.hadoop.hive.ql.io.*;
+import org.apache.hadoop.hive.ql.io.AcidUtils.Directory;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.orc.OrcConf;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.*;

Review Comment:
   we don't import with wildcards in general
   honestly, I don't have a strong opinion about this :) but it tends to be 
avoided in code reviews





Issue Time Tracking
---

Worklog Id: (was: 808603)
Time Spent: 1h 50m  (was: 1h 40m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> 
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that received a series of updates and deletes. 
> A set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> 
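The slowdown described above comes from re-enumerating each delete_delta directory repeatedly instead of once. A minimal sketch of the single-pass alternative (illustrative only; this is not Hive's actual AcidUtils/FetchOperator code, and the directory names are simplified stand-ins):

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch: classify an ACID table directory listing into
// base / delta / delete_delta buckets in ONE pass, so delete_delta
// folders are enumerated once rather than once per split.
public class AcidDirGrouping {
    public static Map<String, List<String>> group(List<String> dirNames) {
        return dirNames.stream().collect(Collectors.groupingBy(d -> {
            if (d.startsWith("base_")) return "base";
            if (d.startsWith("delete_delta_")) return "delete_delta";
            if (d.startsWith("delta_")) return "delta";
            return "other";
        }));
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList(
            "base_001", "delete_delta_002_002", "delta_003_003");
        Map<String, List<String>> g = group(dirs);
        System.out.println(g.get("delete_delta").size()); // prints 1
    }
}
```

With this shape, the listing cost is O(number of directories) regardless of how many splits later consume the result.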

[jira] [Work logged] (HIVE-26277) NPEs and rounding issues in ColumnStatsAggregator classes

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26277?focusedWorklogId=808598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808598
 ]

ASF GitHub Bot logged work on HIVE-26277:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 09:52
Start Date: 14/Sep/22 09:52
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3339:
URL: https://github.com/apache/hive/pull/3339#discussion_r970582527


##
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DateColumnStatsAggregatorTest.java:
##
@@ -0,0 +1,279 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.hadoop.hive.metastore.columnstats.aggr;
+
+import org.apache.hadoop.hive.metastore.TableType;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreUnitTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Date;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.columnstats.ColStatsBuilder;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreServerUtils.ColStatsObjWithSourceInfo;
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+
+import static org.apache.hadoop.hive.metastore.StatisticsTestUtils.createStatsWithInfo;
+
+@Category(MetastoreUnitTest.class)
+public class DateColumnStatsAggregatorTest {
+
+  private static final Table TABLE = new Table("dummy", "db", "hive", 0, 0,
+  0, null, null, Collections.emptyMap(), null, null,
+  TableType.MANAGED_TABLE.toString());
+  private static final FieldSchema COL = new FieldSchema("col", "int", "");
+
+  private static final Date DATE_1 = new Date(1);
+  private static final Date DATE_2 = new Date(2);
+  private static final Date DATE_3 = new Date(3);
+  private static final Date DATE_4 = new Date(4);
+  private static final Date DATE_5 = new Date(5);
+  private static final Date DATE_6 = new Date(6);
+  private static final Date DATE_7 = new Date(7);
+  private static final Date DATE_8 = new Date(8);
+  private static final Date DATE_9 = new Date(9);
+
+  @Test
+  public void testAggregateSingleStat() throws MetaException {
+    List<String> partitions = Collections.singletonList("part1");
+
+    ColumnStatisticsData data1 = new ColStatsBuilder<>(Date.class).numNulls(1).numDVs(2).low(DATE_1).high(DATE_4)
+        .hll(DATE_1.getDaysSinceEpoch(), DATE_4.getDaysSinceEpoch()).build();
+    List<ColStatsObjWithSourceInfo> statsList =
+        Collections.singletonList(createStatsWithInfo(data1, TABLE, COL, partitions.get(0)));
+
+    DateColumnStatsAggregator aggregator = new DateColumnStatsAggregator();
+    ColumnStatisticsObj computedStatsObj = aggregator.aggregate(statsList, partitions, true);
+
+    Assert.assertEquals(data1, computedStatsObj.getStatsData());
+  }
+
+  @Test
+  public void testAggregateSingleStatWhenNullValues() throws MetaException {
+    List<String> partitions = Collections.singletonList("part1");
+
+    ColumnStatisticsData data1 = new ColStatsBuilder<>(Date.class).numNulls(1).numDVs(2).build();
+    List<ColStatsObjWithSourceInfo> statsList =
+        Collections.singletonList(createStatsWithInfo(data1, TABLE, COL, partitions.get(0)));
+
+    DateColumnStatsAggregator aggregator = new DateColumnStatsAggregator();
+    ColumnStatisticsObj computedStatsObj = aggregator.aggregate(statsList, partitions, true);
+    Assert.assertEquals(data1, computedStatsObj.getStatsData());
+
+    aggregator.useDensityFunctionForNDVEstimation = true;
+    computedStatsObj = aggregator.aggregate(statsList, partitions, true);
+    Assert.assertEquals(data1, computedStatsObj.getStatsData());
+
+    aggregator.useDensityFunctionForNDVEstimation = false;
+    aggregator.ndvTuner = 1;
+// ndv tuner does not have any effect because min numDVs and max 

[jira] [Work logged] (HIVE-26445) Use tez.local.mode.without.network for qtests

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26445?focusedWorklogId=808593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808593
 ]

ASF GitHub Bot logged work on HIVE-26445:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 09:41
Start Date: 14/Sep/22 09:41
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3583:
URL: https://github.com/apache/hive/pull/3583#issuecomment-1246505407

   Kudos, SonarCloud Quality Gate passed!
   (dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3583)
   
   Bugs: 1 (rating C)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 0 (rating A)
   Code Smells: 44 (rating A)
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 808593)
Time Spent: 0.5h  (was: 20m)

> Use tez.local.mode.without.network for qtests
> -
>
> Key: HIVE-26445
> URL: https://issues.apache.org/jira/browse/HIVE-26445
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It looks like, in the case of Iceberg, the local DAG client behaves oddly:
> {code}
> 2022-08-02T06:54:36,669 ERROR [2f953972-7675-4594-8d6b-d1c295c056a5 
> Time-limited test] tez.TezTask: Failed to execute tez graph.
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.collectCommitInformation(TezTask.java:367)
>  ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:279) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) 
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   at 

[jira] [Work logged] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26488?focusedWorklogId=808573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808573
 ]

ASF GitHub Bot logged work on HIVE-26488:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 09:13
Start Date: 14/Sep/22 09:13
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3538:
URL: https://github.com/apache/hive/pull/3538#discussion_r970539353


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLSemanticAnalyzerFactory.java:
##
@@ -65,10 +68,12 @@ public interface DDLSemanticAnalyzerCategory {
   new HashMap<>();
 
   static {
-    Set<Class<? extends BaseSemanticAnalyzer>> analyzerClasses1 =
-        new Reflections(DDL_ROOT).getSubTypesOf(BaseSemanticAnalyzer.class);
-    Set<Class<? extends CalcitePlanner>> analyzerClasses2 =
-        new Reflections(DDL_ROOT).getSubTypesOf(CalcitePlanner.class);
+    Set<Class<? extends BaseSemanticAnalyzer>> analyzerClasses1 = new Reflections(
+        new ConfigurationBuilder()
+            .setUrls(ClasspathHelper.forPackage(DDL_ROOT)).filterInputsBy(new FilterBuilder().includePackage(DDL_ROOT)).setExpandSuperTypes(false)).getSubTypesOf(BaseSemanticAnalyzer.class);
+    Set<Class<? extends CalcitePlanner>> analyzerClasses2 = new Reflections(
+        new ConfigurationBuilder().filterInputsBy(new FilterBuilder().includePackage(DDL_ROOT))
+            .setUrls(ClasspathHelper.forPackage(DDL_ROOT)).setExpandSuperTypes(false)).getSubTypesOf(CalcitePlanner.class);
     Set<Class<? extends BaseSemanticAnalyzer>> analyzerClasses = Sets.union(analyzerClasses1, analyzerClasses2);
     for (Class<? extends BaseSemanticAnalyzer> analyzerClass : analyzerClasses) {

Review Comment:
   Thanks! Please delete the following lines as well; they are redundant.
   ```java
   Set<Class<? extends CalcitePlanner>> analyzerClasses2 =
       new Reflections(DDL_ROOT).getSubTypesOf(CalcitePlanner.class);
   Set<Class<? extends BaseSemanticAnalyzer>> analyzerClasses = Sets.union(analyzerClasses1, analyzerClasses2);
   ```
   Consider adding a null check after the following line to avoid similar 
problems in the future:
   ```java
   DDLType ddlType = analyzerCategoryClass.getAnnotation(DDLType.class);
   ```
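A hedged sketch of the suggested guard (illustrative only; the annotation and class names below are simplified stand-ins, not the actual members of DDLSemanticAnalyzerFactory):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Illustrative sketch: Class#getAnnotation returns null when the annotation
// is absent, which is the NPE source described above. A null check turns the
// late NullPointerException into an immediate, descriptive failure.
public class AnnotationGuard {
    @Retention(RetentionPolicy.RUNTIME)
    @interface DDLType { String type(); }

    @DDLType(type = "ALTER") static class Annotated {}
    static class NotAnnotated {}

    static String typeOf(Class<?> analyzerClass) {
        DDLType ddlType = analyzerClass.getAnnotation(DDLType.class);
        if (ddlType == null) {
            // Fail fast with a clear message instead of a later NPE.
            throw new IllegalStateException(analyzerClass + " lacks @DDLType");
        }
        return ddlType.type();
    }

    public static void main(String[] args) {
        System.out.println(typeOf(Annotated.class)); // prints ALTER
        try {
            typeOf(NotAnnotated.class);
        } catch (IllegalStateException e) {
            System.out.println("caught missing annotation");
        }
    }
}
```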





Issue Time Tracking
---

Worklog Id: (was: 808573)
Time Spent: 1h 50m  (was: 1h 40m)

> Fix NPE in DDLSemanticAnalyzerFactory during compilation
> 
>
> Key: HIVE-26488
> URL: https://issues.apache.org/jira/browse/HIVE-26488
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> *Exception Trace:*
> {noformat}
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.getInternal(SemanticAnalyzerFactory.java:62)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:41)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:209)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:418)
> {noformat}
> *Cause:*
> {noformat}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.<clinit>(DDLSemanticAnalyzerFactory.java:84)
>   ... 40 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26488?focusedWorklogId=808559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808559
 ]

ASF GitHub Bot logged work on HIVE-26488:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 08:36
Start Date: 14/Sep/22 08:36
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on code in PR #3538:
URL: https://github.com/apache/hive/pull/3538#discussion_r970498592


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLSemanticAnalyzerFactory.java:
##
@@ -65,10 +68,12 @@ public interface DDLSemanticAnalyzerCategory {
   new HashMap<>();
 
   static {
-    Set<Class<? extends BaseSemanticAnalyzer>> analyzerClasses1 =
-        new Reflections(DDL_ROOT).getSubTypesOf(BaseSemanticAnalyzer.class);
-    Set<Class<? extends CalcitePlanner>> analyzerClasses2 =
-        new Reflections(DDL_ROOT).getSubTypesOf(CalcitePlanner.class);
+    Set<Class<? extends BaseSemanticAnalyzer>> analyzerClasses1 = new Reflections(
+        new ConfigurationBuilder()
+            .setUrls(ClasspathHelper.forPackage(DDL_ROOT)).filterInputsBy(new FilterBuilder().includePackage(DDL_ROOT)).setExpandSuperTypes(false)).getSubTypesOf(BaseSemanticAnalyzer.class);
+    Set<Class<? extends CalcitePlanner>> analyzerClasses2 = new Reflections(
+        new ConfigurationBuilder().filterInputsBy(new FilterBuilder().includePackage(DDL_ROOT))
+            .setUrls(ClasspathHelper.forPackage(DDL_ROOT)).setExpandSuperTypes(false)).getSubTypesOf(CalcitePlanner.class);
     Set<Class<? extends BaseSemanticAnalyzer>> analyzerClasses = Sets.union(analyzerClasses1, analyzerClasses2);
     for (Class<? extends BaseSemanticAnalyzer> analyzerClass : analyzerClasses) {

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 808559)
Time Spent: 1h 40m  (was: 1.5h)

> Fix NPE in DDLSemanticAnalyzerFactory during compilation
> 
>
> Key: HIVE-26488
> URL: https://issues.apache.org/jira/browse/HIVE-26488
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> *Exception Trace:*
> {noformat}
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.getInternal(SemanticAnalyzerFactory.java:62)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:41)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:209)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:418)
> {noformat}
> *Cause:*
> {noformat}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.<clinit>(DDLSemanticAnalyzerFactory.java:84)
>   ... 40 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26521) Iceberg: Raise exception when running delete/update statements on V1 tables

2022-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26521?focusedWorklogId=808554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808554
 ]

ASF GitHub Bot logged work on HIVE-26521:
-

Author: ASF GitHub Bot
Created on: 14/Sep/22 08:23
Start Date: 14/Sep/22 08:23
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3579:
URL: https://github.com/apache/hive/pull/3579#issuecomment-1246415166

   Kudos, SonarCloud Quality Gate passed!
   (dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3579)
   
   Bugs: 1 (rating C)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 0 (rating A)
   Code Smells: 44 (rating A)
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 808554)
Time Spent: 1h 20m  (was: 1h 10m)

> Iceberg: Raise exception when running delete/update statements on V1 tables
> ---
>
> Key: HIVE-26521
> URL: https://issues.apache.org/jira/browse/HIVE-26521
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Right now an exception is raised on the executor side when trying to commit 
> the delete file. We should throw an exception earlier, during the compilation 
> phase.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2022-09-14 Thread LiaoShuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LiaoShuang updated HIVE-1271:
-
Description: 
Type information specified while using a custom reduce script is converted to 
lower case, causing a type mismatch during query semantic analysis. The 
following REDUCE query, where the field name is "userId", failed.

hive> CREATE TABLE SS (
> a INT,
> b INT,
> vals ARRAY<STRUCT<userId:INT>>
> );
OK

hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
> INSERT OVERWRITE TABLE SS
> REDUCE *
> USING 'myreduce.py'
> AS
> (a INT,
> b INT,
> vals ARRAY<STRUCT<userId:INT>>
> )
> ;
FAILED: Error in semantic analysis: line 2:27 Cannot insert into
target table because column number/types are different SS: Cannot
convert column 2 from array<struct<userid:int>> to
array<struct<userId:int>>.

The same query worked fine after changing "userId" to "userid".

  was:
Type information specified while using a custom reduce script is converted to 
lower case, and causes type mismatch during query semantic analysis . The 
following REDUCE query where field name = "userId" failed.

hive> CREATE TABLE SS (
> a INT,
> b INT,
> vals ARRAY<STRUCT<userId:INT>>
> );
OK

hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
> INSERT OVERWRITE TABLE SS
> REDUCE *
> USING 'myreduce.py'
> AS
> (a INT,
> b INT,
> vals ARRAY<STRUCT<userId:INT>>
> )
> ;
FAILED: Error in semantic analysis: line 2:27 Cannot insert into
target table because column number/types are different SS: Cannot
convert column 2 from array<struct<userid:int>> to
array<struct<userId:int>>.

The same query worked fine after changing "userId" to "userid".

*TEST


> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted to 
> lower case, and causes type mismatch during query semantic analysis . The 
> following REDUCE query where field name = "userId" failed.
> hive> CREATE TABLE SS (
> > a INT,
> > b INT,
> > vals ARRAY<STRUCT<userId:INT>>
> > );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
> > INSERT OVERWRITE TABLE SS
> > REDUCE *
> > USING 'myreduce.py'
> > AS
> > (a INT,
> > b INT,
> > vals ARRAY<STRUCT<userId:INT>>
> > )
> > ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array<struct<userid:int>> to
> array<struct<userId:int>>.
> The same query worked fine after changing "userId" to "userid".
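The failure above reduces to a case-sensitive comparison of type names after only one side has been lowercased. A minimal sketch (illustrative only; not Hive's actual TypeInfo comparison code) of why comparing case-insensitively makes the two struct types match:

```java
// Illustrative sketch: strict equality fails when one type string was
// lowercased, while a case-insensitive comparison treats "userId" and
// "userid" struct fields as the same.
public class TypeNameCompare {
    static boolean typesMatch(String declared, String produced) {
        return declared.equalsIgnoreCase(produced);
    }

    public static void main(String[] args) {
        String declared = "array<struct<userId:int>>";
        String produced = "array<struct<userid:int>>";
        System.out.println(declared.equals(produced));      // prints false
        System.out.println(typesMatch(declared, produced)); // prints true
    }
}
```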



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2022-09-14 Thread LiaoShuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LiaoShuang updated HIVE-1271:
-
Description: 
Type information specified while using a custom reduce script is converted to 
lower case, and causes type mismatch during query semantic analysis . The 
following REDUCE query where field name = "userId" failed.

hive> CREATE TABLE SS (
> a INT,
> b INT,
> vals ARRAY<STRUCT<userId:INT>>
> );
OK

hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
> INSERT OVERWRITE TABLE SS
> REDUCE *
> USING 'myreduce.py'
> AS
> (a INT,
> b INT,
> vals ARRAY<STRUCT<userId:INT>>
> )
> ;
FAILED: Error in semantic analysis: line 2:27 Cannot insert into
target table because column number/types are different SS: Cannot
convert column 2 from array<struct<userid:int>> to
array<struct<userId:int>>.

The same query worked fine after changing "userId" to "userid".

*TEST

  was:
Type information specified  while using a custom reduce script is converted to 
lower case, and causes type mismatch during query semantic analysis .  The 
following REDUCE query where field name =  "userId" failed.

hive> CREATE TABLE SS (
   > a INT,
   > b INT,
   > vals ARRAY<STRUCT<userId:INT>>
   > );
OK

hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
   > INSERT OVERWRITE TABLE SS
   > REDUCE *
   > USING 'myreduce.py'
   > AS
   > (a INT,
   > b INT,
   > vals ARRAY<STRUCT<userId:INT>>
   > )
   > ;
FAILED: Error in semantic analysis: line 2:27 Cannot insert into
target table because column number/types are different SS: Cannot
convert column 2 from array<struct<userid:int>> to
array<struct<userId:int>>.

The same query worked fine after changing "userId" to "userid".


> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted to 
> lower case, and causes type mismatch during query semantic analysis . The 
> following REDUCE query where field name = "userId" failed.
> hive> CREATE TABLE SS (
> > a INT,
> > b INT,
> > vals ARRAY<STRUCT<userId:INT>>
> > );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
> > INSERT OVERWRITE TABLE SS
> > REDUCE *
> > USING 'myreduce.py'
> > AS
> > (a INT,
> > b INT,
> > vals ARRAY<STRUCT<userId:INT>>
> > )
> > ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array<struct<userid:int>> to
> array<struct<userId:int>>.
> The same query worked fine after changing "userId" to "userid".
> *TEST



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26476) Iceberg: map "ORCFILE" to "ORC" while creating an iceberg table

2022-09-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér resolved HIVE-26476.
--
Resolution: Fixed

> Iceberg: map "ORCFILE" to "ORC" while creating an iceberg table
> ---
>
> Key: HIVE-26476
> URL: https://issues.apache.org/jira/browse/HIVE-26476
> Project: Hive
>  Issue Type: Bug
>Reporter: Manthan B Y
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> *Issue:* Insert query failing with VERTEX_FAILURE
> *Steps to Reproduce:*
>  # Open Beeline session
>  # Execute the following queries
> {code:java}
> DROP TABLE IF EXISTS t2;
> CREATE TABLE IF NOT EXISTS t2(c0 DOUBLE , c1 DOUBLE , c2 DECIMAL) STORED BY 
> ICEBERG STORED AS ORCFILE;
> INSERT INTO t2(c1, c0) VALUES(0.1803113419993464, 0.9381388537256228);{code}
> *Result:*
> {code:java}
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:294)
>  at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:279)
>  ... 36 more ]], Vertex did not succeed due to OWN_TASK_FAILURE, 
> failedTasks:1 killedTasks:0, Vertex vertex_1660631059889_0001_8_00 [Map 1] 
> killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, 
> vertexId=vertex_1660631059889_0001_8_01, diagnostics=[Vertex received Kill 
> while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, 
> failedTasks:0 killedTasks:1, Vertex vertex_1660631059889_0001_8_01 [Reducer 
> 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to 
> VERTEX_FAILURE. failedVertices:1 killedVertices:1{code}
> *Note:* Same query with table in non-iceberg format works without error



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26507) Do not allow hive to iceberg migration if source table contains CHAR or VARCHAR columns

2022-09-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-26507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603931#comment-17603931
 ] 

László Pintér commented on HIVE-26507:
--

The addendum was merged into master. Thanks, [~szita] for the review!

> Do not allow hive to iceberg migration if source table contains CHAR or 
> VARCHAR columns
> ---
>
> Key: HIVE-26507
> URL: https://issues.apache.org/jira/browse/HIVE-26507
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: László Pintér
>Priority: Major
>  Labels: iceberg, pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> "alter table" statements can be used to generate iceberg metadata 
> information (i.e., to convert external tables -> iceberg tables).
> As part of this process, certain datatypes are also converted to 
> iceberg-compatible types (e.g. char -> string). "iceberg.mr.schema.auto.conversion" 
> enables this conversion.
> This could cause certain issues at runtime. Here is an example
> {noformat}
> Before conversion:
> ==
> -- external table
> select count(*) from customer_demographics where cd_gender = 'F' and 
> cd_marital_status = 'U' and cd_education_status = '2 yr Degree';
> 27440
> after conversion:
> =
> -- iceberg table
> select count(*) from customer_demographics where cd_gender = 'F' and 
> cd_marital_status = 'U' and cd_education_status = '2 yr Degree';
> 0
> select count(*) from customer_demographics where cd_gender = 'F' and 
> cd_marital_status = 'U' and trim(cd_education_status) = '2 yr Degree';
> 27440
>  {noformat}
>  
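The count difference above is consistent with CHAR(n) blank-padding semantics: a CHAR comparison ignores trailing pad spaces, while after auto-conversion to STRING the comparison is exact. A minimal sketch (illustrative only; not Hive's or Iceberg's actual comparison code):

```java
// Illustrative sketch: CHAR(n)-style equality strips trailing pad spaces
// before comparing, so a padded on-disk value matches an unpadded literal.
// After CHAR -> STRING conversion, the exact comparison no longer matches.
public class CharVsString {
    static boolean charEquals(String stored, String literal) {
        return stripTrailing(stored).equals(stripTrailing(literal));
    }

    static String stripTrailing(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') end--;
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String stored = "2 yr Degree         "; // CHAR(20)-padded on disk
        System.out.println(charEquals(stored, "2 yr Degree")); // prints true
        System.out.println(stored.equals("2 yr Degree"));      // prints false
    }
}
```

This is why the query only matches again after wrapping the column in trim().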



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26499) After enabling vectorization (hive.vectorized.execution.enabled=true), CASE WHEN produces values outside the defined set

2022-09-14 Thread Zhizhen Hou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603908#comment-17603908
 ] 

Zhizhen Hou commented on HIVE-26499:


The master branch has already fixed this problem: 
https://issues.apache.org/jira/browse/HIVE-26408

> After enabling vectorization (hive.vectorized.execution.enabled=true), CASE WHEN produces values outside the defined set
> 
>
> Key: HIVE-26499
> URL: https://issues.apache.org/jira/browse/HIVE-26499
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
> Environment: hdfs(3.1.1)
> yarn(3.1.1)
> zookeeper(3.4.6)
> hive(3.1.0)
> tez(0.9.1)
>Reporter: Ricco-Chan
>Priority: Major
> Attachments: image-2022-08-29-11-04-21-921.png
>
>
> -- The CASE WHEN only defines the values 1, 2, and 3, yet 5 and 6 appear in the results. When this bug was found, the table used the Parquet format with Snappy compression.
>  
> select distinct(traveller_type) from
> (
>     select pri_acct_no,
>         case
>             when (t1.consume_flag = '1' and substr(t1.areacode, 1, 2) <> 
> '65') then '2'
>             when (substr(t1.areacode, 1, 2) = substr(t1.country_id_new, 1, 2) 
> and t1.consume_flag = '1') then '1'
>             else '3'
>         end as traveller_type
>     from my_table t1 where consume_flag = '1'
> ) t2;
> -
> !image-2022-08-29-11-04-21-921.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)