[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808969 ]

ASF GitHub Bot logged work on HIVE-26496:
- Author: ASF GitHub Bot
Created on: 15/Sep/22 05:06
Start Date: 15/Sep/22 05:06
Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #3559:
URL: https://github.com/apache/hive/pull/3559#issuecomment-1247586713

Kudos, SonarCloud Quality Gate passed! 1 Bug, 0 Vulnerabilities, 0 Security Hotspots, 9 Code Smells; no coverage or duplication information.

Issue Time Tracking
---
Worklog Id: (was: 808969)
Time Spent: 6h 40m (was: 6.5h)

> FetchOperator scans delete_delta folders multiple times causing slowness
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 40m
> Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed. For e.g., here is the layout of a table which had a set of updates and deletes. A set of "delta" and "delete_delta" folders is created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
>
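The delta and delete_delta directory names listed above encode a minimum and maximum ACID write id (e.g. `delete_delta_002_002_`). As a rough illustration only — Hive's real parsing lives in `AcidUtils`, and this regex is a hypothetical simplification — the naming scheme can be decoded like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeltaName {
    // Hypothetical simplified parser for directory names like
    // "delta_002_002_" or "delete_delta_003_003_". Not Hive's API.
    private static final Pattern P =
        Pattern.compile("(delete_)?delta_0*(\\d+)_0*(\\d+).*");

    static long[] minMaxWriteIds(String dirName) {
        Matcher m = P.matcher(dirName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a delta dir: " + dirName);
        }
        // group(2)/group(3) are the zero-padded min/max write ids
        return new long[] { Long.parseLong(m.group(2)), Long.parseLong(m.group(3)) };
    }

    public static void main(String[] args) {
        long[] ids = minMaxWriteIds("delete_delta_002_002_");
        System.out.println(ids[0] + ".." + ids[1]); // 2..2
    }
}
```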
[jira] [Updated] (HIVE-21508) ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer
[ https://issues.apache.org/jira/browse/HIVE-21508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated HIVE-21508:
--
Fix Version/s: 3.1.3

> ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer
> --
>
> Key: HIVE-21508
> URL: https://issues.apache.org/jira/browse/HIVE-21508
> Project: Hive
> Issue Type: Bug
> Components: Clients
> Affects Versions: 2.3.4, 3.2.0
> Reporter: Adar Dembo
> Assignee: Ana Jalba
> Priority: Major
> Fix For: 2.3.7, 2.4.0, 3.1.3, 3.2.0, 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-21508.1.patch, HIVE-21508.2.branch-2.3.patch, HIVE-21508.3.branch-2.patch, HIVE-21508.4.branch-3.1.patch, HIVE-21508.5.branch-3.1.patch, HIVE-21508.6.branch-3.patch, HIVE-21508.patch
>
> There's this block of code in {{HiveMetaStoreClient:resolveUris}} (called from the constructor) on master:
> {noformat}
> private URI metastoreUris[];
> ...
> if (MetastoreConf.getVar(conf, ConfVars.THRIFT_URI_SELECTION).equalsIgnoreCase("RANDOM")) {
>   List<URI> uriList = Arrays.asList(metastoreUris);
>   Collections.shuffle(uriList);
>   metastoreUris = (URI[]) uriList.toArray();
> }
> {noformat}
> The cast to {{URI[]}} throws a {{ClassCastException}} beginning with JDK 10, possibly with JDK 9 as well. Note that {{THRIFT_URI_SELECTION}} defaults to {{RANDOM}}, so this should affect anyone who creates a {{HiveMetaStoreClient}}. On master this can be overridden with {{SEQUENTIAL}} to avoid the broken case; I'm working against 2.3.4, where there's no such workaround.
> [Here's|https://stackoverflow.com/questions/51372788/array-cast-java-8-vs-java-9] a StackOverflow post that explains the issue in more detail. Interestingly, the author described the issue in the context of the HMS; not sure why there was no follow-up with a Hive bug report.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
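The failure mode is plain JDK behavior, independent of Hive: since JDK 9, `toArray()` on the list returned by `Arrays.asList` returns an `Object[]` rather than a clone of the backing array, so the cast to `URI[]` fails; the typed overload `toArray(new URI[0])` is safe on all JDKs. A minimal sketch (the `thrift://` URIs are placeholders):

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ShuffleUris {
    // Shuffle a URI[] without the (URI[]) cast that breaks on JDK 9+:
    // list.toArray() yields Object[], but toArray(new URI[0]) yields URI[].
    static URI[] shuffled(URI[] metastoreUris) {
        List<URI> uriList = new ArrayList<>(Arrays.asList(metastoreUris));
        Collections.shuffle(uriList);
        return uriList.toArray(new URI[0]);
    }

    public static void main(String[] args) {
        URI[] uris = { URI.create("thrift://hms1:9083"), URI.create("thrift://hms2:9083") };
        URI[] result = shuffled(uris);
        System.out.println(result.length); // 2
    }
}
```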
[jira] [Work logged] (HIVE-22013) "Show table extended" query fails with Wrong FS error for partition in customized location
[ https://issues.apache.org/jira/browse/HIVE-22013?focusedWorklogId=808913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808913 ]

ASF GitHub Bot logged work on HIVE-22013:
- Author: ASF GitHub Bot
Created on: 15/Sep/22 00:25
Start Date: 15/Sep/22 00:25
Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on PR #3231:
URL: https://github.com/apache/hive/pull/3231#issuecomment-1247430251

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
---
Worklog Id: (was: 808913)
Time Spent: 1h (was: 50m)

> "Show table extended" query fails with Wrong FS error for partition in customized location
> --
>
> Key: HIVE-22013
> URL: https://issues.apache.org/jira/browse/HIVE-22013
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Rajesh Balamohan
> Assignee: Ganesha Shreedhara
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> In some of the `show table extended` statements, the following code path is invoked:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L421]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L449]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L468]
> 1. Not sure why this invokes stats computation. Should this be removed?
> 2. Even if #1 is needed, it would be broken when {{tblPath}} and {{partitionPaths}} are different (i.e. when they are on different filesystems or configured via a router, etc.).
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://xyz/blah/tables/location/, expected: hdfs://zzz..
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:698)
> at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:106)
> at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:763)
> at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:759)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:759)
> at org.apache.hadoop.hive.ql.metadata.formatting.TextMetaDataFormatter.writeFileSystemStats(TextMetaDataFormatter.java
> {noformat}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
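The "Wrong FS" at the top of that trace comes from Hadoop's `FileSystem.checkPath`, which rejects any path whose scheme/authority doesn't match the filesystem it is called on; the usual remedy is to resolve each partition path against its own filesystem (`path.getFileSystem(conf)`) rather than the table's. A dependency-free sketch of the check itself (`belongsTo` is a hypothetical simplification, not the Hadoop API):

```java
import java.net.URI;
import java.util.Objects;

public class FsCheck {
    // Hypothetical simplification of FileSystem.checkPath: a path belongs to a
    // filesystem only if scheme and authority match; scheme-less paths pass.
    static boolean belongsTo(URI fsUri, URI path) {
        if (path.getScheme() == null) {
            return true; // relative paths are resolved against this filesystem
        }
        return fsUri.getScheme().equalsIgnoreCase(path.getScheme())
            && Objects.equals(fsUri.getAuthority(), path.getAuthority());
    }

    public static void main(String[] args) {
        URI fs = URI.create("hdfs://zzz");
        // Mirrors the exception above: hdfs://xyz/... vs expected hdfs://zzz
        System.out.println(belongsTo(fs, URI.create("hdfs://xyz/blah/tables/location/"))); // false
        System.out.println(belongsTo(fs, URI.create("hdfs://zzz/warehouse/t1"))); // true
    }
}
```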
[jira] [Work logged] (HIVE-26535) Iceberg: Support adding parquet compression type via Table properties
[ https://issues.apache.org/jira/browse/HIVE-26535?focusedWorklogId=808906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808906 ]

ASF GitHub Bot logged work on HIVE-26535:
- Author: ASF GitHub Bot
Created on: 14/Sep/22 23:48
Start Date: 14/Sep/22 23:48
Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #3597:
URL: https://github.com/apache/hive/pull/3597#issuecomment-1247409805

Kudos, SonarCloud Quality Gate passed! 1 Bug, 0 Vulnerabilities, 0 Security Hotspots, 8 Code Smells; no coverage or duplication information.

Issue Time Tracking
---
Worklog Id: (was: 808906)
Time Spent: 20m (was: 10m)

> Iceberg: Support adding parquet compression type via Table properties
> -
>
> Key: HIVE-26535
> URL: https://issues.apache.org/jira/browse/HIVE-26535
> Project: Hive
> Issue Type: Improvement
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> As of now, for Iceberg tables the parquet compression format gets ignored.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808904&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808904 ]

ASF GitHub Bot logged work on HIVE-26496:
- Author: ASF GitHub Bot
Created on: 14/Sep/22 23:47
Start Date: 14/Sep/22 23:47
Worklog Time Spent: 10m

Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971391049

## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
@@ -104,28 +104,43 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    int bucketId = AcidUtils.parseBucketId(path);
+    AcidUtils.ParsedDeltaLight parentDelta = AcidUtils.ParsedDeltaLight.parse(getPath().getParent());

Review Comment: reverted to previous version without ParsedDeltaLight.parse(path.getParent())

Issue Time Tracking
---
Worklog Id: (was: 808904)
Time Spent: 6.5h (was: 6h 20m)

> FetchOperator scans delete_delta folders multiple times causing slowness
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808903&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808903 ]

ASF GitHub Bot logged work on HIVE-26496:
- Author: ASF GitHub Bot
Created on: 14/Sep/22 23:23
Start Date: 14/Sep/22 23:23
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971375969

## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds(
+        filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf));
     this.projColsUncompressedSize = projectedDataSize <= 0 ? length : projectedDataSize;
     // setting file length to Long.MAX_VALUE will let orc reader read file length from file system
     this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen;
     this.syntheticAcidProps = syntheticAcidProps;
 }

 /**
  * For every split we want to filter out the delete deltas that contain events that happened only
  * in the past relative to the split
  * @param deltas
  * @param conf
  * @return
  */
 protected List filterDeleteDeltasByWriteIds(
     List deltas, Configuration conf) throws IOException {
     AcidOutputFormat.Options orcSplitMinMaxWriteIds =
         AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment: root path could be either delta or base and we need to check if deleteDeltas are applicable to them. the prev version was actually correct:

long minWriteId = !deltas.isEmpty()
    ? AcidUtils.parseBaseOrDeltaBucketFilename(path, null).getMinimumWriteId() : -1;
this.deltas.addAll(deltas.stream()
    .filter(delta -> isQualifiedDeleteDeltasByWriteIds(delta, minWriteId))
    .flatMap(delta -> filterDeltasByBucketId(delta, bucketId))
    .collect(Collectors.toList()));

Issue Time Tracking
---
Worklog Id: (was: 808903)
Time Spent: 6h 20m (was: 6h 10m)

> FetchOperator scans delete_delta folders multiple times causing slowness
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808902&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808902 ]

ASF GitHub Bot logged work on HIVE-26496:
- Author: ASF GitHub Bot
Created on: 14/Sep/22 23:21
Start Date: 14/Sep/22 23:21
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971375969

## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds(
+        filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf));
     ...
 protected List filterDeleteDeltasByWriteIds(
     List deltas, Configuration conf) throws IOException {
     AcidOutputFormat.Options orcSplitMinMaxWriteIds =
         AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment: root path could be anything: delta/base. the prev version was actually correct:

long minWriteId = !deltas.isEmpty()
    ? AcidUtils.parseBaseOrDeltaBucketFilename(path, null).getMinimumWriteId() : -1;
this.deltas.addAll(deltas.stream()
    .filter(delta -> isQualifiedDeleteDeltasByWriteIds(delta, minWriteId))
    .flatMap(delta -> filterDeltasByBucketId(delta, bucketId))
    .collect(Collectors.toList()));

Issue Time Tracking
---
Worklog Id: (was: 808902)
Time Spent: 6h 10m (was: 6h)

> FetchOperator scans delete_delta folders multiple times causing slowness
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808901&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808901 ]

ASF GitHub Bot logged work on HIVE-26496:
- Author: ASF GitHub Bot
Created on: 14/Sep/22 23:19
Start Date: 14/Sep/22 23:19
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971375969

## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds(
+        filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf));
     ...
 protected List filterDeleteDeltasByWriteIds(
     List deltas, Configuration conf) throws IOException {
     AcidOutputFormat.Options orcSplitMinMaxWriteIds =
         AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment: root path could be anything: delta/delete-delta/base. the prev version was actually correct:

long minWriteId = !deltas.isEmpty()
    ? AcidUtils.parseBaseOrDeltaBucketFilename(path, null).getMinimumWriteId() : -1;
this.deltas.addAll(deltas.stream()
    .filter(delta -> isQualifiedDeleteDeltasByWriteIds(delta, minWriteId))
    .flatMap(delta -> filterDeltasByBucketId(delta, bucketId))
    .collect(Collectors.toList()));

Issue Time Tracking
---
Worklog Id: (was: 808901)
Time Spent: 6h (was: 5h 50m)

> FetchOperator scans delete_delta folders multiple times causing slowness
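The filtering rule debated in the review comments above — skip any delete delta whose events all precede the split's first write id, since those deletes cannot apply to rows the split contains — can be sketched without Hive types. The `Delta` record and method names below are hypothetical stand-ins for Hive's delta metadata, not its API:

```java
import java.util.List;
import java.util.stream.Collectors;

public class DeleteDeltaFilter {
    // Hypothetical stand-in for Hive's delta metadata (min/max ACID write ids).
    record Delta(long minWriteId, long maxWriteId) {}

    // Keep only delete deltas that could affect rows written at or after
    // splitMinWriteId; older delete deltas hold events strictly in the past.
    static List<Delta> qualifying(List<Delta> deleteDeltas, long splitMinWriteId) {
        return deleteDeltas.stream()
            .filter(d -> d.maxWriteId() >= splitMinWriteId)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Delta> all = List.of(new Delta(2, 2), new Delta(3, 3), new Delta(9, 9));
        // For a split whose base/delta starts at write id 5, only 9..9 survives.
        System.out.println(qualifying(all, 5).size()); // 1
    }
}
```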
[jira] [Work logged] (HIVE-26535) Iceberg: Support adding parquet compression type via Table properties
[ https://issues.apache.org/jira/browse/HIVE-26535?focusedWorklogId=808896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808896 ] ASF GitHub Bot logged work on HIVE-26535: - Author: ASF GitHub Bot Created on: 14/Sep/22 22:11 Start Date: 14/Sep/22 22:11 Worklog Time Spent: 10m Work Description: ayushtkn opened a new pull request, #3597: URL: https://github.com/apache/hive/pull/3597 ### What changes were proposed in this pull request? Add support to specify parquet compression properties in iceberg table via TBLPROPERTIES. Issue Time Tracking --- Worklog Id: (was: 808896) Remaining Estimate: 0h Time Spent: 10m > Iceberg: Support adding parquet compression type via Table properties > - > > Key: HIVE-26535 > URL: https://issues.apache.org/jira/browse/HIVE-26535 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > As of now for Iceberg table the parquet compression format gets ignored. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26535) Iceberg: Support adding parquet compression type via Table properties
[ https://issues.apache.org/jira/browse/HIVE-26535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26535:
--
Labels: pull-request-available (was: )

> Iceberg: Support adding parquet compression type via Table properties
> ---------------------------------------------------------------------
>
> Key: HIVE-26535
> URL: https://issues.apache.org/jira/browse/HIVE-26535
> Project: Hive
> Issue Type: Improvement
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> As of now, the parquet compression format gets ignored for Iceberg tables.
[jira] [Assigned] (HIVE-26535) Iceberg: Support adding parquet compression type via Table properties
[ https://issues.apache.org/jira/browse/HIVE-26535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena reassigned HIVE-26535:
--

> Iceberg: Support adding parquet compression type via Table properties
> ---------------------------------------------------------------------
>
> Key: HIVE-26535
> URL: https://issues.apache.org/jira/browse/HIVE-26535
> Project: Hive
> Issue Type: Improvement
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
>
> As of now, the parquet compression format gets ignored for Iceberg tables.
[jira] [Commented] (HIVE-25848) Empty result for structs in point lookup optimization with vectorization on
[ https://issues.apache.org/jira/browse/HIVE-25848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604984#comment-17604984 ] Hankó Gergely commented on HIVE-25848:
--
My finding is that some (probably all) of the vectorized Filter*InList expressions are unprepared to handle constant values; they expect column expressions only. This causes the issue in the description and others as well:
{code:java}
set hive.fetch.task.conversion=none;
set hive.optimize.point.lookup=false;
set hive.cbo.enable=false;

create table test (a string) partitioned by (y string);
insert into test values ('aa', 2022);
select * from test where (struct(2022) IN (struct(2022))); -- gives empty result
-- works fine if vectorization is off
{code}
{code:java}
set hive.fetch.task.conversion=none;
set hive.optimize.point.lookup=false;
set hive.cbo.enable=false;
set hive.optimize.constant.propagation=false;
set hive.optimize.ppd=false;

create table test (a string) partitioned by (y string);
insert into test values ('aa', 2022);
select * from test where (2022 IN (2022)); -- throws an error
-- works fine if vectorization is off
{code}
It is probably VectorizationContext.getInExpression() that should be tweaked: instead of the multi-purpose createVectorExpression method, it could use an InExpression-specific one that handles constants properly. It could even do the evaluation for constants right away and generate a FilterConstantBooleanVectorExpression for the result; that would greatly speed up such operations.
The pull request solves a different bug, where embedded expressions in structs are not initialized properly after deserialization. It is probably unrelated to this one, so I'm going to create a new bug ticket for it.
> Empty result for structs in point lookup optimization with vectorization on
> ---------------------------------------------------------------------------
>
> Key: HIVE-25848
> URL: https://issues.apache.org/jira/browse/HIVE-25848
> Project: Hive
> Issue Type: Bug
> Reporter: Ádám Szita
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Repro steps:
> {code:java}
> set hive.fetch.task.conversion=none;
> create table test (a string) partitioned by (y string, m string);
> insert into test values ('aa', 2022, 1);
> select * from test where (y=year(date_sub(current_date,4)) and m=month(date_sub(current_date,4))) or (y=year(date_sub(current_date,10)) and m=month(date_sub(current_date,10)));
> -- gives empty result
> {code}
> Turning any of the features below off yields the good result (1 row expected):
> {code:java}
> set hive.optimize.point.lookup=false;
> set hive.cbo.enable=false;
> set hive.vectorized.execution.enabled=false;
> {code}
> The expected good result is:
> {code}
> +---------+---------+---------+
> | test.a  | test.y  | test.m  |
> +---------+---------+---------+
> | aa      | 2022    | 1       |
> +---------+---------+---------+
> {code}
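The comment above suggests evaluating constant-only IN predicates once at plan time and keeping just the boolean result, rather than running a column-oriented vectorized filter over constants it cannot handle. A hedged sketch of that folding step; the method below is illustrative and is not VectorizationContext's actual API:

```java
// Illustrative constant-folding helper, not Hive's real code.
class ConstantInFold {

    // Eagerly evaluate `needle IN (candidates...)` when every operand is a
    // constant; the result could then back something like a
    // FilterConstantBooleanVectorExpression instead of a per-row filter.
    static boolean foldConstantIn(Object needle, Object... candidates) {
        for (Object candidate : candidates) {
            if (needle.equals(candidate)) {
                return true;
            }
        }
        return false;
    }
}
```

For the repro above, `2022 IN (2022)` would fold to `true` once at compile time, so the vectorized runtime never sees a constant operand it is unprepared for.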
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808881 ] ASF GitHub Bot logged work on HIVE-26496:
--
Author: ASF GitHub Bot
Created on: 14/Sep/22 21:23
Start Date: 14/Sep/22 21:23
Worklog Time Spent: 10m
Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r971221139

## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
## @@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds

Review Comment: The deltas class member is marked as final and initialized before; that's why I couldn't reassign it and used addAll(). https://github.com/apache/hive/blob/e352684d5c87df1483444afc4c3ee897270bd413/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java#L67

Issue Time Tracking
---
Worklog Id: (was: 808881) Time Spent: 5h 50m (was: 5h 40m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> ------------------------------------------------------------------------
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and deletes.
> A set of "delta" and "delete_delta" folders has been created.
[jira] [Updated] (HIVE-25848) Empty result for structs in point lookup optimization with vectorization on
[ https://issues.apache.org/jira/browse/HIVE-25848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hankó Gergely updated HIVE-25848:
--
Labels: (was: pull-request-available)

> Empty result for structs in point lookup optimization with vectorization on
> ---------------------------------------------------------------------------
>
> Key: HIVE-25848
> URL: https://issues.apache.org/jira/browse/HIVE-25848
> Project: Hive
> Issue Type: Bug
> Reporter: Ádám Szita
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808875 ] ASF GitHub Bot logged work on HIVE-26496:
--
Author: ASF GitHub Bot
Created on: 14/Sep/22 20:53
Start Date: 14/Sep/22 20:53
Worklog Time Spent: 10m
Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r971221139

## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
## @@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds

Review Comment: The deltas class member is marked as final; that's why I couldn't reassign it and used addAll(). https://github.com/apache/hive/blob/e352684d5c87df1483444afc4c3ee897270bd413/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java#L67

Issue Time Tracking
---
Worklog Id: (was: 808875) Time Spent: 5h 40m (was: 5.5h)

> FetchOperator scans delete_delta folders multiple times causing slowness
> ------------------------------------------------------------------------
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 40m
> Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and deletes.
> A set of "delta" and "delete_delta" folders has been created.
[jira] [Work logged] (HIVE-26522) Test for HIVE-22033 and backport to 3.1 and 2.3
[ https://issues.apache.org/jira/browse/HIVE-26522?focusedWorklogId=808869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808869 ] ASF GitHub Bot logged work on HIVE-26522:
--
Author: ASF GitHub Bot
Created on: 14/Sep/22 20:41
Start Date: 14/Sep/22 20:41
Worklog Time Spent: 10m
Work Description: sonarcloud[bot] commented on PR #3585: URL: https://github.com/apache/hive/pull/3585#issuecomment-1247280592
Kudos, SonarCloud Quality Gate passed! 1 Bug, 0 Vulnerabilities, 0 Security Hotspots, 6 Code Smells; no coverage or duplication information.

Issue Time Tracking
---
Worklog Id: (was: 808869) Time Spent: 1h 10m (was: 1h)

> Test for HIVE-22033 and backport to 3.1 and 2.3
> -----------------------------------------------
>
> Key: HIVE-26522
> URL: https://issues.apache.org/jira/browse/HIVE-26522
> Project: Hive
> Issue Type: Bug
> Components: Standalone Metastore
> Affects Versions: 2.3.8, 3.1.3
> Reporter: Pavan Lanka
> Assignee: Pavan Lanka
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> HIVE-22033 fixes the issue with Hive delegation tokens so that the renewal time is effective.
> This adds a test for HIVE-22033 and backports the fix to the 3.1 and 2.3 branches of Hive.
[jira] [Commented] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation
[ https://issues.apache.org/jira/browse/HIVE-26488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604941#comment-17604941 ] Ayush Saxena commented on HIVE-26488:
--
Committed to master. Thanx [~zabetak] for the review!!!

> Fix NPE in DDLSemanticAnalyzerFactory during compilation
> --------------------------------------------------------
>
> Key: HIVE-26488
> URL: https://issues.apache.org/jira/browse/HIVE-26488
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> *Exception Trace:*
> {noformat}
> java.lang.ExceptionInInitializerError
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.getInternal(SemanticAnalyzerFactory.java:62)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:41)
>     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:209)
>     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:459)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:424)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:418)
> {noformat}
> *Cause:*
> {noformat}
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.<clinit>(DDLSemanticAnalyzerFactory.java:84)
>     ... 40 more
> {noformat}
[jira] [Resolved] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation
[ https://issues.apache.org/jira/browse/HIVE-26488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HIVE-26488.
--
Fix Version/s: 4.0.0-alpha-2
Hadoop Flags: Reviewed
Resolution: Fixed

> Fix NPE in DDLSemanticAnalyzerFactory during compilation
> --------------------------------------------------------
>
> Key: HIVE-26488
> URL: https://issues.apache.org/jira/browse/HIVE-26488
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
> Time Spent: 2h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation
[ https://issues.apache.org/jira/browse/HIVE-26488?focusedWorklogId=808858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808858 ] ASF GitHub Bot logged work on HIVE-26488:
--
Author: ASF GitHub Bot
Created on: 14/Sep/22 19:41
Start Date: 14/Sep/22 19:41
Worklog Time Spent: 10m
Work Description: ayushtkn merged PR #3538: URL: https://github.com/apache/hive/pull/3538

Issue Time Tracking
---
Worklog Id: (was: 808858) Time Spent: 2h 20m (was: 2h 10m)

> Fix NPE in DDLSemanticAnalyzerFactory during compilation
> --------------------------------------------------------
>
> Key: HIVE-26488
> URL: https://issues.apache.org/jira/browse/HIVE-26488
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808856 ] ASF GitHub Bot logged work on HIVE-26496:
--
Author: ASF GitHub Bot
Created on: 14/Sep/22 19:34
Start Date: 14/Sep/22 19:34
Worklog Time Spent: 10m
Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r971208288

## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
## @@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds
+        (filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf));
     this.projColsUncompressedSize = projectedDataSize <= 0 ? length : projectedDataSize;
     // setting file length to Long.MAX_VALUE will let orc reader read file length from file system
     this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen;
     this.syntheticAcidProps = syntheticAcidProps;
   }

   /**
    * For every split we want to filter out the delete deltas that contain events that happened only
    * in the past relative to the split.
    * @param deltas
    * @param conf
    * @return
    */
   protected List filterDeleteDeltasByWriteIds(
       List deltas, Configuration conf) throws IOException {

     AcidOutputFormat.Options orcSplitMinMaxWriteIds =
         AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment: Hi @deniskuzZ, many tests failed with the change of using AcidUtils.ParsedDeltaLight.parse() instead of AcidUtils.parseBaseOrDeltaBucketFilename(), with an exception on this line: https://github.com/apache/hive/blob/e352684d5c87df1483444afc4c3ee897270bd413/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java#L108
As I understand it, the split is not always a delta folder; it can be some older format not supported by ParsedDeltaLight. I saw that ParsedDeltaLight.parse() is used in some cases internally in AcidUtils.parseBaseOrDeltaBucketFilename(), but not always. Can you please advise whether I should revert to using AcidUtils.parseBaseOrDeltaBucketFilename(), which worked in all cases, or is there some better way? https://github.com/apache/hive/blob/f6bd0eb80767adfa9ce9f47a6d02a4940903effb/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L538-L552

Issue Time Tracking
---
Worklog Id: (was: 808856) Time Spent: 5.5h (was: 5h 20m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> ------------------------------------------------------------------------
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5.5h
> Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and deletes.
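The review thread above turns on parsing a delta directory name into its write-id range. As a hedged illustration of what that parsing does (Hive's real AcidUtils.parseBaseOrDeltaBucketFilename / ParsedDeltaLight.parse also cope with base directories, statement ids, and pre-ACID layouts, none of which is handled here):

```java
// Illustrative parser only, not Hive's actual implementation.
class DeltaNameParser {

    // Parse "delta_<min>_<max>" or "delete_delta_<min>_<max>" into the
    // (min, max) write-id pair encoded in the directory name.
    static long[] parseWriteIds(String dirName) {
        // Normalize so delta_* and delete_delta_* parse the same way.
        String name = dirName.startsWith("delete_")
                ? dirName.substring("delete_".length()) : dirName;
        if (!name.startsWith("delta_")) {
            throw new IllegalArgumentException("not a delta directory: " + dirName);
        }
        String[] parts = name.substring("delta_".length()).split("_");
        return new long[] { Long.parseLong(parts[0]), Long.parseLong(parts[1]) };
    }
}
```

The failures difin describes come precisely from inputs that do not match this simple shape, which is why the more general parseBaseOrDeltaBucketFilename worked in all cases.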
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808855 ] ASF GitHub Bot logged work on HIVE-26496:
--
Author: ASF GitHub Bot
Created on: 14/Sep/22 19:33
Start Date: 14/Sep/22 19:33
Worklog Time Spent: 10m
Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r971221139

## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
## @@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds

Review Comment: The deltas class member is marked as final; that's why I couldn't reassign it and used addAll().

Issue Time Tracking
---
Worklog Id: (was: 808855) Time Spent: 5h 20m (was: 5h 10m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> ------------------------------------------------------------------------
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and deletes.
> A set of "delta" and "delete_delta" folders has been created.
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808854 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 19:30 Start Date: 14/Sep/22 19:30 Worklog Time Spent: 10m Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r971208288
## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java: ## @@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos this.isOriginal = isOriginal; this.hasBase = hasBase; this.rootDir = rootDir; -this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path))); +this.deltas.addAll(filterDeleteDeltasByWriteIds +(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf)); this.projColsUncompressedSize = projectedDataSize <= 0 ? length : projectedDataSize; // setting file length to Long.MAX_VALUE will let orc reader read file length from file system this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen; this.syntheticAcidProps = syntheticAcidProps; } + /** + * For every split we want to filter out the delete deltas that contain events that happened only + * in the past relative to the split + * @param deltas + * @param conf + * @return + */ + protected List filterDeleteDeltasByWriteIds( + List deltas, Configuration conf) throws IOException { + +AcidOutputFormat.Options orcSplitMinMaxWriteIds = +AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);
Review Comment: Hi @deniskuzZ, many tests failed after switching from AcidUtils.parseBaseOrDeltaBucketFilename() to AcidUtils.ParsedDeltaLight.parse(), with an exception on this line: https://github.com/apache/hive/blob/e352684d5c87df1483444afc4c3ee897270bd413/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java#L108. As I understand it, the split is not always a delta folder; it can be an older format that ParsedDeltaLight does not support. I saw that ParsedDeltaLight.parse() is used internally in AcidUtils.parseBaseOrDeltaBucketFilename() in some cases, but not always. Could you please advise whether I should revert to AcidUtils.parseBaseOrDeltaBucketFilename(), which worked in all cases, or whether there is a better way? https://github.com/apache/hive/blob/f6bd0eb80767adfa9ce9f47a6d02a4940903effb/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L538-L552
Issue Time Tracking --- Worklog Id: (was: 808854) Time Spent: 5h 10m (was: 5h)
> FetchOperator scans delete_delta folders multiple times causing slowness > > > Key: HIVE-26496 > URL: https://issues.apache.org/jira/browse/HIVE-26496 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Reporter: Rajesh Balamohan > Assignee: Dmitriy Fingerman > Priority: Major > Labels: pull-request-available > Time Spent: 5h 10m > Remaining Estimate: 0h > > FetchOperator scans far more files/directories than needed. > For example, here is the layout of a table that had a series of updates and deletes. > A set of "delta" and "delete_delta" folders was created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
>
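The fix discussed in this thread prunes, per split, the delete deltas that cannot affect the split's rows: per the PR's javadoc, a delete delta whose events all happened "in the past relative to the split" can only describe deletes of rows written before this split's data existed. A minimal sketch of one plausible reading of that rule, using simplified stand-in types (the real implementation is OrcSplit.filterDeleteDeltasByWriteIds in PR #3559):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Hive's delta metadata: just a write-id range.
// Sketch of the pruning idea from the PR: a delete delta whose entire
// write-id range precedes the split's minimum write id cannot delete any
// row contained in this split, so it can be dropped from the split's list.
public class DeleteDeltaPruning {
    static final class WriteIdRange {
        final long min, max;
        WriteIdRange(long min, long max) { this.min = min; this.max = max; }
    }

    static List<WriteIdRange> filterDeleteDeltasByWriteIds(
            List<WriteIdRange> deleteDeltas, WriteIdRange split) {
        List<WriteIdRange> kept = new ArrayList<>();
        for (WriteIdRange d : deleteDeltas) {
            // Keep only delete deltas that could contain delete events for
            // rows written at or after the split's minimum write id.
            if (d.max >= split.min) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<WriteIdRange> deltas = new ArrayList<>();
        deltas.add(new WriteIdRange(2, 2)); // entirely before the split: pruned
        deltas.add(new WriteIdRange(5, 5)); // overlaps the split: kept
        deltas.add(new WriteIdRange(7, 7)); // after the split: kept
        WriteIdRange split = new WriteIdRange(5, 5);
        System.out.println(filterDeleteDeltasByWriteIds(deltas, split).size()); // prints "2"
    }
}
```

With the twenty-plus single-transaction delete deltas in the listing above, this kind of per-split pruning is what cuts the number of directories each FetchOperator task has to open.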
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808851=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808851 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 19:20 Start Date: 14/Sep/22 19:20 Worklog Time Spent: 10m Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r971208288
Issue Time Tracking --- Worklog Id: (was: 808851) Time Spent: 4h 50m (was: 4h 40m)
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808852 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 19:20 Start Date: 14/Sep/22 19:20 Worklog Time Spent: 10m Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r971190432
## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java: ## @@ -104,28 +104,43 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos this.isOriginal = isOriginal; this.hasBase = hasBase; this.rootDir = rootDir; -this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path))); +int bucketId = AcidUtils.parseBucketId(path); +AcidUtils.ParsedDeltaLight parentDelta = AcidUtils.ParsedDeltaLight.parse(getPath().getParent());
Review Comment: Hi @deniskuzZ, many tests failed after switching from AcidUtils.parseBaseOrDeltaBucketFilename() to AcidUtils.ParsedDeltaLight.parse(). As I understand it, the split is not always a delta folder; it can be an older format that ParsedDeltaLight does not support. I saw that ParsedDeltaLight.parse() is used internally in AcidUtils.parseBaseOrDeltaBucketFilename() in some cases, but not always. Could you please advise whether I should revert to AcidUtils.parseBaseOrDeltaBucketFilename(), which worked in all cases, or whether there is a better way? https://github.com/apache/hive/blob/f6bd0eb80767adfa9ce9f47a6d02a4940903effb/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L538-L552
Issue Time Tracking --- Worklog Id: (was: 808852) Time Spent: 5h (was: 4h 50m)
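The failure mode difin describes, ParsedDeltaLight.parse() throwing on splits that are not delta directories (e.g. base directories or pre-ACID original files), suggests guarding the cheap parser behind a directory-name check and falling back to the more general AcidUtils.parseBaseOrDeltaBucketFilename() otherwise. The predicate below is a hypothetical sketch of that dispatch, not Hive's actual code:

```java
// Illustrative guard: only treat the split's parent directory as a delta
// when its name matches the ACID delta naming convention; anything else
// (base_*, original files, older layouts) would take the general,
// more expensive parsing path. This predicate is an assumption for
// illustration, not a Hive API.
public class DeltaDirGuard {
    static boolean looksLikeDeltaDir(String dirName) {
        return dirName.startsWith("delta_") || dirName.startsWith("delete_delta_");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDeltaDir("delete_delta_003_003")); // prints "true"
        System.out.println(looksLikeDeltaDir("base_001"));             // prints "false"
    }
}
```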
[jira] [Work logged] (HIVE-26277) NPEs and rounding issues in ColumnStatsAggregator classes
[ https://issues.apache.org/jira/browse/HIVE-26277?focusedWorklogId=808847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808847 ] ASF GitHub Bot logged work on HIVE-26277: - Author: ASF GitHub Bot Created on: 14/Sep/22 19:13 Start Date: 14/Sep/22 19:13 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3339: URL: https://github.com/apache/hive/pull/3339#issuecomment-1247195030 Kudos, SonarCloud Quality Gate passed! [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive=3339) [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive=3339=false=BUG) [![C](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/C-16px.png 'C')](https://sonarcloud.io/project/issues?id=apache_hive=3339=false=BUG) [2 Bugs](https://sonarcloud.io/project/issues?id=apache_hive=3339=false=BUG) [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive=3339=false=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=3339=false=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive=3339=false=VULNERABILITY) [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=3339=false=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 
'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=3339=false=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=3339=false=SECURITY_HOTSPOT) [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive=3339=false=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=3339=false=CODE_SMELL) [44 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive=3339=false=CODE_SMELL) [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive=3339=coverage=list) No Coverage information [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive=3339=duplicated_lines_density=list) No Duplication information Issue Time Tracking --- Worklog Id: (was: 808847) Time Spent: 7h 50m (was: 7h 40m) > NPEs and rounding issues in ColumnStatsAggregator classes > - > > Key: HIVE-26277 > URL: https://issues.apache.org/jira/browse/HIVE-26277 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore, Statistics, Tests >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 7h 50m > Remaining Estimate: 0h > > Fix NPEs and rounding errors in _ColumnStatsAggregator_ classes, add > unit-tests for all the involved classes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808846 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 19:09 Start Date: 14/Sep/22 19:09 Worklog Time Spent: 10m Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r971190432
Issue Time Tracking --- Worklog Id: (was: 808846) Time Spent: 4h 40m (was: 4.5h)
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808844=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808844 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 19:03 Start Date: 14/Sep/22 19:03 Worklog Time Spent: 10m Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r971190432
Issue Time Tracking --- Worklog Id: (was: 808844) Time Spent: 4.5h (was: 4h 20m)
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808843=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808843 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 19:03 Start Date: 14/Sep/22 19:03 Worklog Time Spent: 10m Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r971190432
Issue Time Tracking --- Worklog Id: (was: 808843) Time Spent: 4h 20m (was: 4h 10m)
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808842 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 19:02 Start Date: 14/Sep/22 19:02 Worklog Time Spent: 10m Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r971190432 ## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java: ## @@ -104,28 +104,43 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos this.isOriginal = isOriginal; this.hasBase = hasBase; this.rootDir = rootDir; -this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path))); +int bucketId = AcidUtils.parseBucketId(path); +AcidUtils.ParsedDeltaLight parentDelta = AcidUtils.ParsedDeltaLight.parse(getPath().getParent()); Review Comment: Hi @deniskuzZ, Many tests failed with the change of using AcidUtils.ParsedDeltaLight.parse() instead of AcidUtils.parseBaseOrDeltaBucketFilename(). As I understand the split is not always a delta folder, it can be some older format not supported by ParsedDeltaLight. I saw that ParsedDeltaLight.parse() is used in some cases internally in AcidUtils.parseBaseOrDeltaBucketFilename(), but not always. Can you please advise if I should revert to using AcidUtils.parseBaseOrDeltaBucketFilename() that worked in all cases or there is some better way? 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#539

Issue Time Tracking
---
Worklog Id: (was: 808842)
Time Spent: 4h 10m (was: 4h)

> FetchOperator scans delete_delta folders multiple times causing slowness
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is the layout of a table that had a set of updates and deletes;
> a set of "delta" and "delete_delta" folders is created:
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
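The review thread above turns on parsing the bucket id out of an ACID file path and keeping only the deltas a split actually needs. A minimal sketch of that idea follows; the class and method names are illustrative only and are not the actual Hive AcidUtils API, and the "bucket_NNNNN" naming is assumed from the standard ACID layout.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only; not the real AcidUtils implementation.
public class BucketFilterSketch {

    // Parse the numeric bucket id from a file name such as "bucket_00012".
    public static int parseBucketId(String fileName) {
        if (!fileName.startsWith("bucket_")) {
            return -1; // not a bucket file (e.g. an original-format file)
        }
        return Integer.parseInt(fileName.substring("bucket_".length()));
    }

    // Keep only the delta paths that carry a file for this bucket id, so a
    // split does not drag along (and re-scan) deltas for other buckets.
    public static List<String> filterDeltasByBucketId(List<String> deltaFiles, int bucketId) {
        String suffix = String.format("bucket_%05d", bucketId);
        return deltaFiles.stream()
                .filter(d -> d.endsWith(suffix))
                .collect(Collectors.toList());
    }
}
```

Filtering once per split, instead of re-listing every delete_delta directory, is the kind of reduction the patch discussion is after.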
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808840&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808840 ]

ASF GitHub Bot logged work on HIVE-26496:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 18:58
- Start Date: 14/Sep/22 18:58
- Worklog Time Spent: 10m

Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971190432

Issue Time Tracking
---
Worklog Id: (was: 808840)
Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (HIVE-26045) Detect timed out connections for providers and auto-reconnect
[ https://issues.apache.org/jira/browse/HIVE-26045?focusedWorklogId=808837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808837 ]

ASF GitHub Bot logged work on HIVE-26045:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 18:44
- Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #3595:
URL: https://github.com/apache/hive/pull/3595#issuecomment-1247166412
Kudos, SonarCloud Quality Gate passed! 1 Bug, 0 Vulnerabilities, 0 Security Hotspots, 45 Code Smells; no coverage or duplication information.

Issue Time Tracking
---
Worklog Id: (was: 808837)
Time Spent: 3h (was: 2h 50m)

> Detect timed out connections for providers and auto-reconnect
>
> Key: HIVE-26045
> URL: https://issues.apache.org/jira/browse/HIVE-26045
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Affects Versions: 4.0.0
> Reporter: Naveen Gangam
> Assignee: zhangbutao
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h
> Remaining Estimate: 0h
>
> For the connectors, we use a single connection, no pooling. But when the
> connection is idle for an extended period, the JDBC connection times out. We
> need to check for closed connections (Connection.isClosed()?) and
> re-establish the connection.
> Otherwise it renders the connector fairly useless.
> {noformat}
> 2022-03-17T13:02:16,635 WARN [HiveServer2-Handler-Pool: Thread-116]
> thrift.ThriftCLIService: Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling
> statement: FAILED: SemanticException Unable to fetch table temp_dbs. Error
> retrieving remote
> table:com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException:
> No operations allowed after connection closed.
> at
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:373)
> ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT]
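The check-and-reconnect idea described in the issue can be sketched in a few lines of plain JDBC. This is a minimal illustration, not Hive's connector code: the class and field names are invented, and the use of Connection.isValid() alongside isClosed() is an assumption (isClosed() only reports an explicit close(), so it can miss connections the server dropped after an idle timeout).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Illustrative sketch of reconnect-on-idle-timeout; not the real provider code.
public class ReconnectingProvider {
    private final String jdbcUrl;
    private Connection conn;

    public ReconnectingProvider(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    // Decide whether the cached connection must be replaced.
    public static boolean needsReconnect(Connection c) {
        try {
            // isValid(5) pings the server with a 5-second timeout and also
            // catches half-dead connections that isClosed() would miss.
            return c == null || c.isClosed() || !c.isValid(5);
        } catch (SQLException e) {
            return true; // treat any probe failure as a dead connection
        }
    }

    // Return a live connection, re-establishing it if the old one timed out.
    public synchronized Connection getConnection() throws SQLException {
        if (needsReconnect(conn)) {
            conn = DriverManager.getConnection(jdbcUrl);
        }
        return conn;
    }
}
```

With a guard like this in front of every remote-metadata call, an idle timeout degrades to one extra reconnect instead of the "No operations allowed after connection closed" failure shown in the stack trace above.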
[jira] [Comment Edited] (HIVE-26534) GROUPING() function errors out due to case-sensitivity of function name
[ https://issues.apache.org/jira/browse/HIVE-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604875#comment-17604875 ]

Aman Sinha edited comment on HIVE-26534 at 9/14/22 5:49 PM:
Marking this fixed since [~soumyakanti.das]'s PR has been merged. Thanks to the reviewers.

was (Author: amansinha): Marking this fixed since [~soumyakanti.das] PR has been merged.

> GROUPING() function errors out due to case-sensitivity of function name
>
> Key: HIVE-26534
> URL: https://issues.apache.org/jira/browse/HIVE-26534
> Project: Hive
> Issue Type: Bug
> Components: Hive, Logical Optimizer
> Reporter: Aman Sinha
> Assignee: Soumyakanti Das
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The following errors out:
> {noformat}
> explain cbo select GROUPING(l_suppkey) from lineitem group by l_suppkey with rollup;
> Error: Error while compiling statement: FAILED: SemanticException [Error 10015]: Line 1:19 Arguments length mismatch 'l_suppkey': grouping() requires at least 2 argument, got 1 (state=21000,code=10015)
> {noformat}
> Lowercase grouping() succeeds:
> {noformat}
> explain cbo select grouping(l_suppkey) from lineitem group by l_suppkey with rollup;
> +----------------------------------------------------+
> |                      Explain                       |
> +----------------------------------------------------+
> | CBO PLAN:                                          |
> | HiveProject(_o__c0=[grouping($1, 0:BIGINT)])       |
> | HiveAggregate(group=[{0}], groups=[[{0}, {}]], GROUPING__ID=[GROUPING__ID()]) |
> |   HiveProject(l_suppkey=[$2])                      |
> |     HiveTableScan(table=[[tpch, lineitem]], table:alias=[lineitem]) |
> |                                                    |
> +----------------------------------------------------+
> {noformat}
> This is likely due to the SemanticAnalyzer doing a case-sensitive compare here:
> {noformat}
> @Override
> public Object post(Object t) {
>   if (func.getText().equals("grouping") && func.getChildCount() == 0) {
> {noformat}
> We should fix this to use a case-insensitive comparison. There might be other places to examine too for the grouping function.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
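The fix the reporter suggests is a one-token change: compare the function name with String.equalsIgnoreCase instead of String.equals. A small sketch isolating that check (the class and method names here are invented for illustration; the real change lives inside Hive's SemanticAnalyzer):

```java
// Illustrative sketch of the case-sensitivity bug and its fix.
public class GroupingNameCheck {

    // Buggy form from the quoted snippet: matches only lowercase "grouping",
    // so GROUPING() falls through to the wrong argument-count validation.
    public static boolean matchesCaseSensitive(String funcName) {
        return funcName.equals("grouping");
    }

    // Fixed form: grouping(), Grouping(), and GROUPING() all match, since SQL
    // function names are case-insensitive.
    public static boolean matchesCaseInsensitive(String funcName) {
        return funcName.equalsIgnoreCase("grouping");
    }
}
```

Under the buggy comparison, GROUPING(l_suppkey) is never recognized as the grouping function, which explains the misleading "requires at least 2 argument" error rather than a correct rewrite.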
[jira] [Resolved] (HIVE-26534) GROUPING() function errors out due to case-sensitivity of function name
[ https://issues.apache.org/jira/browse/HIVE-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha resolved HIVE-26534.
---
Resolution: Fixed
Marking this fixed since [~soumyakanti.das]'s PR has been merged.
[jira] [Work logged] (HIVE-26522) Test for HIVE-22033 and backport to 3.1 and 2.3
[ https://issues.apache.org/jira/browse/HIVE-26522?focusedWorklogId=808811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808811 ]

ASF GitHub Bot logged work on HIVE-26522:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 17:13
- Worklog Time Spent: 10m

Work Description: prasanthj commented on PR #3586:
URL: https://github.com/apache/hive/pull/3586#issuecomment-1247069691
+1

Issue Time Tracking
---
Worklog Id: (was: 808811)
Time Spent: 1h (was: 50m)

> Test for HIVE-22033 and backport to 3.1 and 2.3
>
> Key: HIVE-26522
> URL: https://issues.apache.org/jira/browse/HIVE-26522
> Project: Hive
> Issue Type: Bug
> Components: Standalone Metastore
> Affects Versions: 2.3.8, 3.1.3
> Reporter: Pavan Lanka
> Assignee: Pavan Lanka
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> HIVE-22033 fixes the issue with Hive Delegation tokens so that the renewal time is effective.
> This looks at adding a test for HIVE-22033 and backporting the fix to the 3.1 and 2.3 branches in Hive.
[jira] [Work logged] (HIVE-26522) Test for HIVE-22033 and backport to 3.1 and 2.3
[ https://issues.apache.org/jira/browse/HIVE-26522?focusedWorklogId=808810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808810 ]

ASF GitHub Bot logged work on HIVE-26522:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 17:12
- Worklog Time Spent: 10m

Work Description: prasanthj commented on PR #3587:
URL: https://github.com/apache/hive/pull/3587#issuecomment-1247068988
+1

Issue Time Tracking
---
Worklog Id: (was: 808810)
Time Spent: 50m (was: 40m)
[jira] [Work logged] (HIVE-26522) Test for HIVE-22033 and backport to 3.1 and 2.3
[ https://issues.apache.org/jira/browse/HIVE-26522?focusedWorklogId=808809&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808809 ]

ASF GitHub Bot logged work on HIVE-26522:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 17:12
- Worklog Time Spent: 10m

Work Description: prasanthj commented on PR #3585:
URL: https://github.com/apache/hive/pull/3585#issuecomment-1247068460
+1

Issue Time Tracking
---
Worklog Id: (was: 808809)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808804 ]

ASF GitHub Bot logged work on HIVE-26496:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 17:01
- Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #3559:
URL: https://github.com/apache/hive/pull/3559#issuecomment-1247057159
Kudos, SonarCloud Quality Gate passed! 1 Bug, 0 Vulnerabilities, 0 Security Hotspots, 45 Code Smells; no coverage or duplication information.

Issue Time Tracking
---
Worklog Id: (was: 808804)
Time Spent: 3h 50m (was: 3h 40m)

> FetchOperator scans delete_delta folders multiple times causing slowness
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is a layout of a table which had a set of updates and deletes.
> There is a set of "delta" and "delete_delta" folders which are created.
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808795&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808795 ]

ASF GitHub Bot logged work on HIVE-26496:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 16:30
- Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971049370

## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
@@ -104,28 +104,43 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    int bucketId = AcidUtils.parseBucketId(path);
+    AcidUtils.ParsedDeltaLight parentDelta = AcidUtils.ParsedDeltaLight.parse(getPath().getParent());

Review Comment: could be refactored using a static import:
ParsedDeltaLight pd = ParsedDeltaLight.parse(path.getParent())

Issue Time Tracking
---
Worklog Id: (was: 808795)
Time Spent: 3h 40m (was: 3.5h)

> FetchOperator scans delete_delta folders multiple times causing slowness
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed.
> For example, here is a layout of a table which had a set of updates and deletes.
> There is a set of "delta" and "delete_delta" folders which are created.
[jira] [Work logged] (HIVE-26521) Iceberg: Raise exception when running delete/update statements on V1 tables
[ https://issues.apache.org/jira/browse/HIVE-26521?focusedWorklogId=808793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808793 ]

ASF GitHub Bot logged work on HIVE-26521:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 16:24
- Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #3579:
URL: https://github.com/apache/hive/pull/3579#issuecomment-1247013071
Kudos, SonarCloud Quality Gate passed! 1 Bug, 0 Vulnerabilities, 0 Security Hotspots, 46 Code Smells; no coverage or duplication information.

Issue Time Tracking
---
Worklog Id: (was: 808793)
Time Spent: 1.5h (was: 1h 20m)

> Iceberg: Raise exception when running delete/update statements on V1 tables
>
> Key: HIVE-26521
> URL: https://issues.apache.org/jira/browse/HIVE-26521
> Project: Hive
> Issue Type: Improvement
> Reporter: László Pintér
> Assignee: László Pintér
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Right now an exception is raised on the executor side when trying to commit
> the delete file. We should throw an exception earlier, during the compilation
> phase.
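Moving the failure from commit time to compile time amounts to validating the table's format version before planning the write. A minimal sketch of such a guard follows; the class and method names are invented for illustration, and only the "format-version" table property (which Iceberg defines, defaulting to 1) is taken from real Iceberg conventions.

```java
import java.util.Map;

// Illustrative compile-time guard; not the actual Hive/Iceberg patch.
public class IcebergWriteGuard {

    // Reject row-level DELETE/UPDATE before execution: Iceberg v2 introduced
    // positional/equality delete files, so v1 tables cannot commit them.
    public static void checkRowLevelOpSupported(Map<String, String> tblProps) {
        int formatVersion = Integer.parseInt(tblProps.getOrDefault("format-version", "1"));
        if (formatVersion < 2) {
            throw new UnsupportedOperationException(
                "DELETE/UPDATE requires an Iceberg v2 table (found format-version=" + formatVersion + ")");
        }
    }
}
```

Failing here, during semantic analysis, gives the user a clear message instead of a late executor-side commit error.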
[jira] [Work logged] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation
[ https://issues.apache.org/jira/browse/HIVE-26488?focusedWorklogId=808792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808792 ]

ASF GitHub Bot logged work on HIVE-26488:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 16:18
- Start Date: 14/Sep/22 16:18
- Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #3538:
URL: https://github.com/apache/hive/pull/3538#issuecomment-1247005602

Kudos, SonarCloud Quality Gate passed!
- Bugs: 1 (rating C)
- Vulnerabilities: 0 (rating A)
- Security Hotspots: 0 (rating A)
- Code Smells: 48 (rating A)
- No Coverage information
- No Duplication information

Issue Time Tracking
---
Worklog Id: (was: 808792)
Time Spent: 2h 10m (was: 2h)

> Fix NPE in DDLSemanticAnalyzerFactory during compilation
> ---
>
> Key: HIVE-26488
> URL: https://issues.apache.org/jira/browse/HIVE-26488
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> *Exception Trace:*
> {noformat}
> java.lang.ExceptionInInitializerError
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.getInternal(SemanticAnalyzerFactory.java:62)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:41)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:209)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:418)
> {noformat}
> *Cause:*
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.<clinit>(DDLSemanticAnalyzerFactory.java:84)
>   ... 40 more
> {noformat}
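The trace above is the classic signature of an exception thrown inside a static initializer: the JVM wraps whatever escapes `<clinit>` in `java.lang.ExceptionInInitializerError` at the first use of the class, preserving the original NPE as the cause. A minimal standalone illustration (not Hive code; `FragileHolder` and its failing lookup are made up):

```java
// Demonstrates how an NPE inside a class's static initializer surfaces
// as ExceptionInInitializerError with the NPE attached as the cause.
public class InitErrorDemo {

    static class FragileHolder {
        // Simulates a lookup that unexpectedly returns null during class
        // initialization; .trim() on the null result throws NPE in <clinit>.
        static final String VALUE =
            new java.util.HashMap<String, String>().get("missing").trim();
    }

    static String causeOfInitError() {
        try {
            return FragileHolder.VALUE; // first touch triggers static init
        } catch (ExceptionInInitializerError e) {
            // The original NullPointerException is preserved as the cause.
            return e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(causeOfInitError()); // prints NullPointerException
    }
}
```

This is why the fix targets the static-initialization path of `DDLSemanticAnalyzerFactory` rather than its call sites.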
[jira] [Resolved] (HIVE-26363) Time logged during repldump and replload per table is not in readable format
[ https://issues.apache.org/jira/browse/HIVE-26363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakshith C resolved HIVE-26363.
---
Resolution: Fixed

> Time logged during repldump and replload per table is not in readable format
> ---
>
> Key: HIVE-26363
> URL: https://issues.apache.org/jira/browse/HIVE-26363
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2, repl
> Affects Versions: 4.0.0
> Reporter: Imran
> Assignee: Rakshith C
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> During replDump and replLoad we capture the time taken for each activity in
> the hive.log file. This is captured in milliseconds, which is difficult to
> read when debugging; this ticket changes the time logged in hive.log to a
> readable UTC-style format.
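The change the ticket describes can be sketched as formatting an elapsed-milliseconds value as an `HH:mm:ss.SSS` string instead of a raw number. This is an illustrative sketch, not the actual patch; `ReplTimeFormat` and `readable` are hypothetical names, and the pattern only covers spans under 24 hours:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class ReplTimeFormat {
    // UTC clock arithmetic: treating the elapsed millis as an offset from
    // the epoch yields a zero-based HH:mm:ss.SSS rendering.
    private static final DateTimeFormatter HMS =
        DateTimeFormatter.ofPattern("HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    // Renders an elapsed-millis value, e.g. 3723450 -> "01:02:03.450".
    static String readable(long elapsedMillis) {
        return HMS.format(Instant.ofEpochMilli(elapsedMillis));
    }

    public static void main(String[] args) {
        System.out.println(readable(3723450)); // 01:02:03.450
    }
}
```

A log line like `REPL DUMP table t1 took 3723450` then becomes `REPL DUMP table t1 took 01:02:03.450`, which is what the ticket means by readable.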
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808788 ]

ASF GitHub Bot logged work on HIVE-26496:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 15:40
- Start Date: 14/Sep/22 15:40
- Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r970992776

## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds

Review Comment: looks good! 1 question: should we call addAll, or can we simply assign the result of the collect? this.deltas = collect(Collectors.toList());

Issue Time Tracking
---
Worklog Id: (was: 808788)
Time Spent: 3.5h (was: 3h 20m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> ---
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> FetchOperator scans far more files/directories than needed. For example, here
> is the layout of a table that had a set of updates and deletes; a set of
> "delta" and "delete_delta" folders was created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_002_002_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_003_003_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_004_004_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_005_005_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_006_006_
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_007_007_
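The optimization under review can be summarized as: given the split's own minimum write ID, drop every delete delta whose write-ID range ends before it, since a delete event that happened before the split's rows were written cannot affect them. A self-contained sketch of that filtering idea; `DeltaMeta` and `filterByWriteIds` are simplified stand-ins, not Hive's `AcidInputFormat.DeltaMetaData` or the PR's actual method:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DeleteDeltaFilter {
    // Simplified stand-in for a delete delta's write-ID metadata.
    static class DeltaMeta {
        final long minWriteId, maxWriteId;
        DeltaMeta(long min, long max) { minWriteId = min; maxWriteId = max; }
    }

    // Keep only delete deltas that could affect rows written at
    // splitMinWriteId: a delta whose entire write-ID range precedes the
    // split's write cannot delete anything the split contains.
    static List<DeltaMeta> filterByWriteIds(List<DeltaMeta> deleteDeltas,
                                            long splitMinWriteId) {
        return deleteDeltas.stream()
            .filter(d -> d.maxWriteId >= splitMinWriteId)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DeltaMeta> deltas = Arrays.asList(
            new DeltaMeta(2, 2), new DeltaMeta(5, 5), new DeltaMeta(9, 9));
        // A split written at writeId 5 only needs the 5..5 and 9..9 deltas.
        System.out.println(filterByWriteIds(deltas, 5).size()); // 2
    }
}
```

With the directory layout above, this is why a base/delta written late in the sequence no longer forces a scan of every earlier delete_delta folder.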
[jira] [Work logged] (HIVE-26504) User is not able to drop table
[ https://issues.apache.org/jira/browse/HIVE-26504?focusedWorklogId=808778&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808778 ]

ASF GitHub Bot logged work on HIVE-26504:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 15:03
- Start Date: 14/Sep/22 15:03
- Worklog Time Spent: 10m

Work Description: SourabhBadhya commented on code in PR #3557:
URL: https://github.com/apache/hive/pull/3557#discussion_r970940220

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java:
@@ -235,6 +235,10 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
     boolean renamedTranslatedToExternalTable = rename && MetaStoreUtils.isTranslatedToExternalTable(oldt)
         && MetaStoreUtils.isTranslatedToExternalTable(newt);
+
+    List columnStatistics = getColumnStats(msdb, oldt);
+    columnStatistics = deleteTableColumnStats(msdb, oldt, newt, columnStatistics);

Review Comment: Since we have deleted the table column stats, do we need to call deleteAllPartitionColumnStatistics here?
[https://github.com/apache/hive/blob/f6bd0eb80767adfa9ce9f47a6d02a4940903effb/stand[…]ain/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java](https://github.com/apache/hive/blob/f6bd0eb80767adfa9ce9f47a6d02a4940903effb/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L417)

Issue Time Tracking
---
Worklog Id: (was: 808778)
Time Spent: 1h (was: 50m)

> User is not able to drop table
> ---
>
> Key: HIVE-26504
> URL: https://issues.apache.org/jira/browse/HIVE-26504
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: László Végh
> Assignee: László Végh
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Hive won't store anything in *TAB_COL_STATS* for a partitioned table, whereas
> Impala stores complete column stats in TAB_COL_STATS for partitioned tables.
> Deleting entries in TAB_COL_STATS is based on (DB_NAME, TABLE_NAME), not on
> TBL_ID, so renamed tables kept their old names in TAB_COL_STATS.
> To repro:
> {code:java}
> beeline:
> set hive.create.as.insert.only=false;
> set hive.create.as.acid=false;
> create table testes.table_name_with_partition (id tinyint, name string)
> partitioned by (col_to_partition bigint) stored as parquet;
> insert into testes.table_name_with_partition (id, name, col_to_partition)
> values (1, "a", 2020), (2, "b", 2021), (3, "c", 2022);
> impala:
> compute stats testes.table_name_with_partition; -- backend shows new entries in TAB_COL_STATS
> beeline:
> alter table testes.table_name_with_partition rename to testes2.table_that_cant_be_droped;
> drop table testes2.table_that_cant_be_droped; -- This fails with TAB_COL_STATS_fkey constraint violation.
> {code}
> Exception trace for drop table failure:
> {code:java}
> Caused by: org.postgresql.util.PSQLException: ERROR: update or delete on table "TBLS"
> violates foreign key constraint "TAB_COL_STATS_fkey" on table "TAB_COL_STATS"
> Detail: Key (TBL_ID)=(19816) is still referenced from table "TAB_COL_STATS".
> at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2532)
> at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2267)
> ... 50 more
> {code}
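The root cause described above — stats rows keyed by table *name* going stale after a rename, while the foreign key references the immutable TBL_ID — can be modeled with two toy maps. Everything here is illustrative; these are not the metastore's real schema objects:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the bug: column stats keyed by (db.table) name become
// orphans after a RENAME (the old-name entries are never matched again),
// while stats keyed by the immutable table id survive the rename cleanly.
public class StatsKeyDemo {
    static Map<String, String> statsByName = new HashMap<>();
    static Map<Long, String> statsById = new HashMap<>();

    static void computeStats(long tblId, String dbDotTable, String stats) {
        statsByName.put(dbDotTable, stats);
        statsById.put(tblId, stats);
    }

    // A drop that deletes stats by the *current* name misses rows stored
    // under the pre-rename name; deleting by id always finds them.
    static boolean dropByNameLeavesOrphan(long tblId, String currentName) {
        statsByName.remove(currentName);
        statsById.remove(tblId);
        return !statsByName.isEmpty(); // true -> orphan row, the FK-violation analogue
    }

    static boolean renameThenDropLeavesOrphan() {
        statsByName.clear();
        statsById.clear();
        computeStats(19816L, "testes.table_name_with_partition", "ndv=3");
        // rename happens: testes.table_name_with_partition
        //              -> testes2.table_that_cant_be_droped
        return dropByNameLeavesOrphan(19816L, "testes2.table_that_cant_be_droped");
    }

    public static void main(String[] args) {
        System.out.println(renameThenDropLeavesOrphan()); // true
    }
}
```

The orphaned name-keyed row is what keeps referencing TBL_ID 19816 and triggers the `TAB_COL_STATS_fkey` violation at drop time.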
[jira] [Work logged] (HIVE-26332) Upgrade maven-surefire-plugin to 3.0.0-M7
[ https://issues.apache.org/jira/browse/HIVE-26332?focusedWorklogId=808777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808777 ]

ASF GitHub Bot logged work on HIVE-26332:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 15:02
- Start Date: 14/Sep/22 15:02
- Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #3375:
URL: https://github.com/apache/hive/pull/3375#issuecomment-1246903804

Kudos, SonarCloud Quality Gate passed!
- Bugs: 1 (rating C)
- Vulnerabilities: 0 (rating A)
- Security Hotspots: 0 (rating A)
- Code Smells: 44 (rating A)
- No Coverage information
- No Duplication information

Issue Time Tracking
---
Worklog Id: (was: 808777)
Time Spent: 1h (was: 50m)

> Upgrade maven-surefire-plugin to 3.0.0-M7
> ---
>
> Key: HIVE-26332
> URL: https://issues.apache.org/jira/browse/HIVE-26332
> Project: Hive
> Issue Type: Task
> Components: Testing Infrastructure
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently we use 3.0.0-M4, which was released in 2019. Since then there have
> been multiple bug fixes and improvements:
> [https://issues.apache.org/jira/issues/?jql=project%20%3D%20SUREFIRE%20AND%20(fixVersion%20%3D%203.0.0-M5%20OR%20fixVersion%20%3D%203.0.0-M6%20OR%20fixVersion%20%3D%203.0.0-M7)%20ORDER%20BY%20resolutiondate%20%20DESC%2C%20key]
> Worth mentioning that the interaction with JUnit 5 is much more mature as
> well, and this is one of the main reasons driving this upgrade.
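The upgrade itself is a version bump in the build configuration. A hedged sketch of what the plugin declaration would look like; the exact location in Hive's pom, and whether the version is managed through a property, may differ:

```xml
<!-- Sketch of the surefire version bump in a parent pom's pluginManagement;
     the project may instead manage this through a version property. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>3.0.0-M7</version>
  <!-- surefire 3.x auto-detects the JUnit 5 (Jupiter) provider from the
       test classpath, one of the drivers behind this upgrade -->
</plugin>
```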
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808772=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808772 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 14:56 Start Date: 14/Sep/22 14:56 Worklog Time Spent: 10m Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r970930245 ## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java: ## @@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos this.isOriginal = isOriginal; this.hasBase = hasBase; this.rootDir = rootDir; -this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path))); +this.deltas.addAll(filterDeleteDeltasByWriteIds Review Comment: done, replaced nested function calls with stream processing. ## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java: ## @@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos this.isOriginal = isOriginal; this.hasBase = hasBase; this.rootDir = rootDir; -this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path))); +this.deltas.addAll(filterDeleteDeltasByWriteIds +(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf)); this.projColsUncompressedSize = projectedDataSize <= 0 ? length : projectedDataSize; // setting file length to Long.MAX_VALUE will let orc reader read file length from file system this.fileLen = fileLen <= 0 ? 
Long.MAX_VALUE : fileLen;
     this.syntheticAcidProps = syntheticAcidProps;
   }

+  /**
+   * For every split we want to filter out the delete deltas that contain events that happened only
+   * in the past relative to the split.
+   * @param deltas
+   * @param conf
+   * @return
+   */
+  protected List<AcidInputFormat.DeltaMetaData> filterDeleteDeltasByWriteIds(
+      List<AcidInputFormat.DeltaMetaData> deltas, Configuration conf) throws IOException {
+
+    AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+        AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);

Review Comment: done

Issue Time Tracking
---
Worklog Id: (was: 808772)
Time Spent: 3h 20m (was: 3h 10m)

> FetchOperator scans delete_delta folders multiple times causing slowness
> ---
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808771=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808771 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 14:54 Start Date: 14/Sep/22 14:54 Worklog Time Spent: 10m Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r970928460 ## ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSplit.java: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.io.orc; + +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.common.ValidReadTxnList; +import org.apache.hadoop.hive.common.ValidTxnList; +import org.apache.hadoop.hive.common.ValidWriteIdList; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants; +import org.apache.hadoop.hive.ql.io.*; +import org.apache.hadoop.hive.ql.io.AcidUtils.Directory; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory; +import org.apache.hadoop.io.LongWritable; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.Reporter; +import org.apache.orc.OrcConf; +import org.junit.Before; +import org.junit.Test; + +import java.io.File; +import java.util.*; + +import static org.junit.Assert.*; + +/** + * Tests for OrcSplit class + */ +public class TestOrcSplit { + + private JobConf conf; + private FileSystem fs; + private Path root; + private ObjectInspector inspector; + public static class DummyRow { +LongWritable field; +RecordIdentifier ROW__ID; + +DummyRow(long val, long rowId, long origTxn, int bucket) { + field = new LongWritable(val); + bucket = BucketCodec.V1.encode(new AcidOutputFormat.Options(null).bucket(bucket)); + ROW__ID = new RecordIdentifier(origTxn, bucket, rowId); +} + +static String getColumnNamesProperty() { + return "field"; +} +static String getColumnTypesProperty() { + return "bigint"; +} + + } + + @Before + public void setup() throws Exception { +conf = new JobConf(); +conf.set(hive_metastoreConstants.TABLE_IS_TRANSACTIONAL, "true"); +conf.setBoolean(HiveConf.ConfVars.HIVE_TRANSACTIONAL_TABLE_SCAN.varname, true); +conf.set(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES, "default"); +conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname, 
+AcidUtils.AcidOperationalProperties.getDefault().toInt()); +conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS, DummyRow.getColumnNamesProperty()); +conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES, DummyRow.getColumnTypesProperty()); +conf.setBoolean(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED.varname, true); +conf.set(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY.varname, "BI"); +OrcConf.ROWS_BETWEEN_CHECKS.setLong(conf, 1); + +Path workDir = new Path(System.getProperty("test.tmp.dir", +"target" + File.separator + "test" + File.separator + "tmp")); +root = new Path(workDir, "TestOrcSplit.testDump"); +fs = root.getFileSystem(conf); +root = fs.makeQualified(root); +fs.delete(root, true); +synchronized (TestOrcFile.class) { + inspector = ObjectInspectorFactory.getReflectionObjectInspector + (DummyRow.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); +} + } + + private List> getSplitStrategies() throws Exception { +conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname, +AcidUtils.AcidOperationalProperties.getDefault().toInt()); +OrcInputFormat.Context context = new OrcInputFormat.Context(conf); +OrcInputFormat.FileGenerator gen = new OrcInputFormat.FileGenerator( +context, () -> fs, root, false, null); +Directory adi = gen.call(); +return OrcInputFormat.determineSplitStrategies( +null, context, adi.getFs(),
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808770 ]

ASF GitHub Bot logged work on HIVE-26496:
- Author: ASF GitHub Bot
- Created on: 14/Sep/22 14:54
- Start Date: 14/Sep/22 14:54
- Worklog Time Spent: 10m

Work Description: difin commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r970928073

## ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSplit.java:
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808768=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808768 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 14:54 Start Date: 14/Sep/22 14:54 Worklog Time Spent: 10m Work Description: difin commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r970927575 ## ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSplit.java: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.hadoop.hive.ql.io.orc;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.ValidReadTxnList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
+import org.apache.hadoop.hive.ql.io.*;
+import org.apache.hadoop.hive.ql.io.AcidUtils.Directory;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.orc.OrcConf;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.*;
Review Comment: replaced wildcard imports with concrete classes. Issue Time Tracking --- Worklog Id: (was: 808768) Time Spent: 2h 50m (was: 2h 40m) > FetchOperator scans delete_delta folders multiple times causing slowness > > > Key: HIVE-26496 > URL: https://issues.apache.org/jira/browse/HIVE-26496 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Reporter: Rajesh Balamohan > Assignee: Dmitriy Fingerman > Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > FetchOperator scans far more files/directories than needed. > For example, here is the layout of a table that had a set of updates and deletes; a set of "delta" and "delete_delta" folders was created.
> {noformat} > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001 > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_ >
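The slowness comes from every split re-scanning all of these delete_delta directories. The fix discussed in this thread prunes, per split, the delete deltas whose write-id range lies entirely in the past relative to the split's data. A minimal standalone sketch of that interval check (all class and method names here are invented for illustration, not Hive's actual API):

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Illustrative stand-in for a delete delta's metadata: a [minWriteId, maxWriteId] interval.
    static final class Delta {
        final long minWriteId;
        final long maxWriteId;
        Delta(long min, long max) { this.minWriteId = min; this.maxWriteId = max; }
    }

    // Keep only the delete deltas that could still affect rows written at splitWriteId;
    // any delta whose interval ended strictly before that write id is pruned.
    static List<Delta> pruneDeleteDeltas(List<Delta> deltas, long splitWriteId) {
        List<Delta> kept = new ArrayList<>();
        for (Delta d : deltas) {
            if (d.maxWriteId >= splitWriteId) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Mirrors delete_delta_002_002 ... delete_delta_022_022 from the listing above.
        List<Delta> deltas = new ArrayList<>();
        for (long id = 2; id <= 22; id++) {
            deltas.add(new Delta(id, id));
        }
        System.out.println(pruneDeleteDeltas(deltas, 15).size()); // prints 8
    }
}
```

With the layout above, a split whose data was written at write id 15 would keep only eight of the twenty-one delete_delta directories instead of scanning all of them.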
[jira] [Work logged] (HIVE-26420) Configurable timeout for HiveSplitGenerator to wait for LLAP instances
[ https://issues.apache.org/jira/browse/HIVE-26420?focusedWorklogId=808674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808674 ] ASF GitHub Bot logged work on HIVE-26420: - Author: ASF GitHub Bot Created on: 14/Sep/22 11:50 Start Date: 14/Sep/22 11:50 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3468: URL: https://github.com/apache/hive/pull/3468#issuecomment-1246649477 Kudos, SonarCloud Quality Gate passed! 1 Bug, 0 Vulnerabilities, 0 Security Hotspots, 49 Code Smells; no coverage or duplication information. Issue Time Tracking --- Worklog Id: (was: 808674) Time Spent: 40m (was: 0.5h) > Configurable timeout for HiveSplitGenerator to wait for LLAP instances > -- > > Key: HIVE-26420 > URL: https://issues.apache.org/jira/browse/HIVE-26420 > Project: Hive > Issue Type: Improvement > Reporter: László Bodor > Assignee: László Bodor > Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > In some circumstances we cannot guarantee that LLAP daemons are ready as soon > as the Tez AMs are, but we don't want the query to fail immediately with: > {code} > Caused by: java.lang.IllegalArgumentException: No running LLAP daemons!
> Please check LLAP service status and zookeeper configuration > com.google.common.base.Preconditions.checkArgument(Preconditions.java:142) > > org.apache.hadoop.hive.ql.exec.tez.Utils.getCustomSplitLocationProvider(Utils.java:105) > > org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:77) > > org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.(HiveSplitGenerator.java:147) > 19 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
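The configurable timeout proposed here boils down to polling the LLAP instance count until it is non-zero, instead of failing immediately at the `Preconditions.checkArgument` shown in the trace. A self-contained sketch of that retry-with-deadline idea (method and parameter names are invented for illustration, not Hive's API):

```java
import java.util.function.LongSupplier;

public class Main {
    // Polls activeDaemons every pollMs until it reports a non-zero count,
    // giving up after timeoutMs. Returns true if daemons showed up in time;
    // on false the caller can still fail with "No running LLAP daemons!".
    static boolean waitForDaemons(LongSupplier activeDaemons, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (activeDaemons.getAsLong() == 0) {
            if (System.nanoTime() >= deadline) {
                return false; // timed out waiting for daemons to register
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false; // treat interruption as "gave up waiting"
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulate daemons registering in ZooKeeper on the third poll.
        long[] calls = {0};
        boolean ok = waitForDaemons(() -> ++calls[0] >= 3 ? 1 : 0, 1000, 5);
        System.out.println(ok); // prints true
    }
}
```

In the real change, the timeout and poll interval would come from the new configuration property the ticket proposes.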
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808672 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 11:48 Start Date: 14/Sep/22 11:48 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r970701167 ## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java: ## @@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds
+        (filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf));
     this.projColsUncompressedSize = projectedDataSize <= 0 ? length : projectedDataSize;
     // setting file length to Long.MAX_VALUE will let orc reader read file length from file system
     this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen;
     this.syntheticAcidProps = syntheticAcidProps;
   }

+  /**
+   * For every split we want to filter out the delete deltas that contain events that happened only
+   * in the past relative to the split.
+   * @param deltas
+   * @param conf
+   * @return
+   */
+  protected List filterDeleteDeltasByWriteIds(
+      List deltas, Configuration conf) throws IOException {
+
+    AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+        AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);
Review Comment: why not simply `ParsedDeltaLight.parse(bucketFile.getParent())`?
Issue Time Tracking --- Worklog Id: (was: 808672) Time Spent: 2h 40m (was: 2.5h) > FetchOperator scans delete_delta folders multiple times causing slowness > > > Key: HIVE-26496 > URL: https://issues.apache.org/jira/browse/HIVE-26496 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Reporter: Rajesh Balamohan > Assignee: Dmitriy Fingerman > Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > FetchOperator scans far more files/directories than needed. > For example, here is the layout of a table that had a set of updates and deletes; a set of "delta" and "delete_delta" folders was created. > {noformat} > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001 > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_ >
s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_ >
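The `ParsedDeltaLight.parse(...)` suggestion in the review works because ACID delta directory names already encode the minimum and maximum write ids, so there is no need to parse the bucket file itself. A standalone sketch of extracting them from a directory name (illustrative only; Hive's real parser also handles statement-id and visibility suffixes):

```java
public class Main {
    // Parses the min/max write ids encoded in an ACID delta directory name such as
    // "delta_0000002_0000002" or "delete_delta_0000015_0000015". Illustrative
    // re-implementation of the idea, not Hive's ParsedDeltaLight.
    static long[] parseWriteIds(String dirName) {
        String rest = dirName.startsWith("delete_delta_")
                ? dirName.substring("delete_delta_".length())
                : dirName.substring("delta_".length());
        String[] parts = rest.split("_");
        return new long[] { Long.parseLong(parts[0]), Long.parseLong(parts[1]) };
    }

    public static void main(String[] args) {
        long[] ids = parseWriteIds("delete_delta_0000015_0000015");
        System.out.println(ids[0] + ".." + ids[1]); // prints 15..15
    }
}
```

Parsing the parent directory name avoids opening or listing the bucket files at all, which is exactly what matters on a slow object store like S3.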
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808665 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 11:44 Start Date: 14/Sep/22 11:44 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r970696749 ## ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java: ## @@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
     this.isOriginal = isOriginal;
     this.hasBase = hasBase;
     this.rootDir = rootDir;
-    this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+    this.deltas.addAll(filterDeleteDeltasByWriteIds
Review Comment: could we transform this construct into a stream pipeline:
    this.deltas = deltas.stream()
        .filter(delta -> filterDeltasByBucketId(delta, AcidUtils.parseBucketId(path)))
        .filter(delta -> filterDeleteDeltasByWriteIds(delta, conf))
        .collect(Collectors.toList());
Issue Time Tracking --- Worklog Id: (was: 808665) Time Spent: 2.5h (was: 2h 20m) > FetchOperator scans delete_delta folders multiple times causing slowness > > > Key: HIVE-26496 > URL: https://issues.apache.org/jira/browse/HIVE-26496 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Reporter: Rajesh Balamohan > Assignee: Dmitriy Fingerman > Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > FetchOperator scans far more files/directories than needed. > For example, here is the layout of a table that had a set of updates and deletes; a set of "delta" and "delete_delta" folders was created.
> {noformat} > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001 > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_ > 
s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_002_002_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_003_003_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_004_004_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_005_005_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_006_006_ >
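The shape of the refactor suggested in the review, independent predicates chained on a stream instead of nested helper calls, can be shown with a self-contained stand-in (the integer "deltas" here just encode a write id and a bucket id for the demo; this is not Hive code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Main {
    // Each int encodes writeId * 10 + bucketId, purely for illustration.
    static List<Integer> filterDeltas(List<Integer> deltas, int bucketId, int minWriteId) {
        return deltas.stream()
                .filter(d -> d % 10 == bucketId)     // stand-in for the bucket-id filter
                .filter(d -> d / 10 >= minWriteId)   // stand-in for the write-id filter
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 12 and 31 are dropped: wrong write id and wrong bucket, respectively.
        System.out.println(filterDeltas(Arrays.asList(12, 22, 31, 42), 2, 2)); // prints [22, 42]
    }
}
```

The advantage of the pipeline form is that each filter stays a small, independently testable predicate, and adding a further pruning rule is one more `.filter(...)` stage.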
[jira] [Work logged] (HIVE-26045) Detect timed out connections for providers and auto-reconnect
[ https://issues.apache.org/jira/browse/HIVE-26045?focusedWorklogId=808661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808661 ] ASF GitHub Bot logged work on HIVE-26045: - Author: ASF GitHub Bot Created on: 14/Sep/22 11:41 Start Date: 14/Sep/22 11:41 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3595: URL: https://github.com/apache/hive/pull/3595#issuecomment-1246640211 Kudos, SonarCloud Quality Gate passed! 1 Bug, 0 Vulnerabilities, 0 Security Hotspots, 45 Code Smells; no coverage or duplication information. Issue Time Tracking --- Worklog Id: (was: 808661) Time Spent: 2h 50m (was: 2h 40m) > Detect timed out connections for providers and auto-reconnect > - > > Key: HIVE-26045 > URL: https://issues.apache.org/jira/browse/HIVE-26045 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 > Affects Versions: 4.0.0 > Reporter: Naveen Gangam > Assignee: zhangbutao > Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > For the connectors, we use a single connection, no pooling. But when the > connection is idle for an extended period, the JDBC connection times out. We > need to check for closed connections (Connection.isClosed()?) and > re-establish the connection.
Otherwise it renders the connector fairly > useless. > {noformat} > 2022-03-17T13:02:16,635 WARN [HiveServer2-Handler-Pool: Thread-116] > thrift.ThriftCLIService: Error executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: SemanticException Unable to fetch table temp_dbs. Error > retrieving remote > table:com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: > No operations allowed after connection closed. > at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:373) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] >
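The check-and-reconnect behavior the ticket asks for can be sketched independently of JDBC with a tiny stand-in connection type (all names here are invented for the example; a real implementation would wrap java.sql.Connection and its isClosed()/isValid() checks):

```java
import java.util.function.Supplier;

public class Main {
    // Minimal stand-in for a JDBC connection: just enough surface to show the pattern.
    interface Conn {
        boolean isClosed();
    }

    // A fake connection whose lifetime the demo below can control.
    static final class FakeConn implements Conn {
        private boolean closed;
        public boolean isClosed() { return closed; }
        void close() { closed = true; }
    }

    // Hands out a live connection, transparently opening a new one when the
    // cached connection has been closed (e.g. by an idle timeout on the server).
    static final class ReconnectingHolder {
        private final Supplier<Conn> factory;
        private Conn current;

        ReconnectingHolder(Supplier<Conn> factory) { this.factory = factory; }

        Conn get() {
            if (current == null || current.isClosed()) {
                current = factory.get();
            }
            return current;
        }
    }

    public static void main(String[] args) {
        int[] opened = {0};
        ReconnectingHolder holder = new ReconnectingHolder(() -> {
            opened[0]++;
            return new FakeConn();
        });
        FakeConn first = (FakeConn) holder.get(); // opens connection #1
        holder.get();                             // still open: reused
        first.close();                            // simulate an idle timeout
        holder.get();                             // detected closed, reconnected
        System.out.println(opened[0]); // prints 2
    }
}
```

Checking liveness at hand-out time keeps the single-connection design of the connectors while removing the "No operations allowed after connection closed" failure mode described above.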
[jira] [Work logged] (HIVE-26525) Update llap-server python scripts to be compatible with python 3
[ https://issues.apache.org/jira/browse/HIVE-26525?focusedWorklogId=808646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808646 ] ASF GitHub Bot logged work on HIVE-26525: - Author: ASF GitHub Bot Created on: 14/Sep/22 11:10 Start Date: 14/Sep/22 11:10 Worklog Time Spent: 10m Work Description: deniskuzZ merged PR #3584: URL: https://github.com/apache/hive/pull/3584 Issue Time Tracking --- Worklog Id: (was: 808646) Time Spent: 50m (was: 40m) > Update llap-server python scripts to be compatible with python 3 > > > Key: HIVE-26525 > URL: https://issues.apache.org/jira/browse/HIVE-26525 > Project: Hive > Issue Type: Task > Reporter: Simhadri Govindappa > Assignee: Simhadri Govindappa > Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > llap-server/src/main/resources/package.py and > /llap-server/src/main/resources/argparse.py are not compatible with Python 3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26445) Use tez.local.mode.without.network for qtests
[ https://issues.apache.org/jira/browse/HIVE-26445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604016#comment-17604016 ] László Bodor commented on HIVE-26445: - this one can be merged upstream only after Tez 0.10.3 is released with TEZ-4447 > Use tez.local.mode.without.network for qtests > - > > Key: HIVE-26445 > URL: https://issues.apache.org/jira/browse/HIVE-26445 > Project: Hive > Issue Type: Improvement > Reporter: László Bodor > Assignee: László Bodor > Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > it looks like, in the case of Iceberg, the local DAG client behaves oddly: > {code} > 2022-08-02T06:54:36,669 ERROR [2f953972-7675-4594-8d6b-d1c295c056a5 > Time-limited test] tez.TezTask: Failed to execute tez graph. > java.lang.NullPointerException: null > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.collectCommitInformation(TezTask.java:367) > ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:279) > [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) > [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) > [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) > [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) > [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) > [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:355)
> [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) > [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) > [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) > [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] > {code} > it's thrown from > https://github.com/apache/hive/blob/e0f2d287c562423dc2632910aae4f1cd8bcd4b4d/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java#L367 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26247) Filter out results 'show connectors' on HMS server-side
[ https://issues.apache.org/jira/browse/HIVE-26247?focusedWorklogId=808620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808620 ] ASF GitHub Bot logged work on HIVE-26247: - Author: ASF GitHub Bot Created on: 14/Sep/22 10:20 Start Date: 14/Sep/22 10:20 Worklog Time Spent: 10m Work Description: zhangbutao commented on PR #3545: URL: https://github.com/apache/hive/pull/3545#issuecomment-1246553930 Gentle ping @nrg4878 @saihemanth-cloudera Issue Time Tracking --- Worklog Id: (was: 808620) Time Spent: 40m (was: 0.5h) > Filter out results 'show connectors' on HMS server-side > --- > > Key: HIVE-26247 > URL: https://issues.apache.org/jira/browse/HIVE-26247 > Project: Hive > Issue Type: Sub-task > Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2 > Reporter: zhangbutao > Assignee: zhangbutao > Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation
[ https://issues.apache.org/jira/browse/HIVE-26488?focusedWorklogId=808618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808618 ] ASF GitHub Bot logged work on HIVE-26488: - Author: ASF GitHub Bot Created on: 14/Sep/22 10:13 Start Date: 14/Sep/22 10:13 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3538: URL: https://github.com/apache/hive/pull/3538#issuecomment-1246543513 Kudos, SonarCloud Quality Gate passed! 1 Bug, 0 Vulnerabilities, 0 Security Hotspots, 48 Code Smells; no coverage or duplication information. Issue Time Tracking --- Worklog Id: (was: 808618) Time Spent: 2h (was: 1h 50m) > Fix NPE in DDLSemanticAnalyzerFactory during compilation > > > Key: HIVE-26488 > URL: https://issues.apache.org/jira/browse/HIVE-26488 > Project: Hive > Issue Type: Sub-task > Reporter: Ayush Saxena > Assignee: Ayush Saxena > Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > *Exception Trace:* > {noformat} > java.lang.ExceptionInInitializerError > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.getInternal(SemanticAnalyzerFactory.java:62) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:41) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:209)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:459) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:424) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:418) > {noformat} > *Cause:* > {noformat} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.(DDLSemanticAnalyzerFactory.java:84) > ... 40 more > {noformat} -- This message was sent by
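The pairing of ExceptionInInitializerError with an underlying NullPointerException in this trace is standard JVM behavior: a throwable escaping a static initializer is wrapped in ExceptionInInitializerError the first time the class is initialized, so the NPE only appears as the cause at the call site that triggered initialization. A minimal reproduction:

```java
public class Main {
    // A class whose static initializer throws: the JVM wraps the NullPointerException
    // in ExceptionInInitializerError on first use, which is exactly the shape of the
    // trace quoted above (NPE in the factory's <clinit>, surfaced at the
    // SemanticAnalyzerFactory call site).
    static class Factory {
        static final int LEN;
        static {
            String s = null;
            LEN = s.length(); // NullPointerException thrown during class initialization
        }
    }

    public static void main(String[] args) {
        try {
            int unused = Factory.LEN; // first touch triggers class initialization
            System.out.println(unused);
        } catch (ExceptionInInitializerError e) {
            System.out.println(e.getCause() instanceof NullPointerException); // prints true
        }
    }
}
```

This is why the fix has to look inside DDLSemanticAnalyzerFactory's static initialization code rather than at the Compiler frames where the error surfaces.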
[jira] [Work logged] (HIVE-26045) Detect timed out connections for providers and auto-reconnect
[ https://issues.apache.org/jira/browse/HIVE-26045?focusedWorklogId=808617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808617 ] ASF GitHub Bot logged work on HIVE-26045: - Author: ASF GitHub Bot Created on: 14/Sep/22 10:07 Start Date: 14/Sep/22 10:07 Worklog Time Spent: 10m Work Description: zhangbutao commented on PR #3388: URL: https://github.com/apache/hive/pull/3388#issuecomment-1246535811 Superseded by https://github.com/apache/hive/pull/3595 cc @nrg4878 Issue Time Tracking --- Worklog Id: (was: 808617) Time Spent: 2h 40m (was: 2.5h) > Detect timed out connections for providers and auto-reconnect > - > > Key: HIVE-26045 > URL: https://issues.apache.org/jira/browse/HIVE-26045 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 > Affects Versions: 4.0.0 > Reporter: Naveen Gangam > Assignee: zhangbutao > Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > For the connectors, we use a single connection, no pooling. But when the > connection is idle for an extended period, the JDBC connection times out. We > need to check for closed connections (Connection.isClosed()?) and > re-establish the connection. Otherwise it renders the connector fairly > useless. > {noformat} > 2022-03-17T13:02:16,635 WARN [HiveServer2-Handler-Pool: Thread-116] > thrift.ThriftCLIService: Error executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: SemanticException Unable to fetch table temp_dbs. Error > retrieving remote > table:com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: > No operations allowed after connection closed.
> at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:373) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:211) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:285) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:576) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:562) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) ~[?:?] 
> at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_231] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231] > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at java.security.AccessController.doPrivileged(Native Method) > ~[?:1.8.0_231] > at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_231] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > ~[hadoop-common-3.1.0.jar:?] > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at com.sun.proxy.$Proxy44.executeStatementAsync(Unknown Source) ~[?:?] > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1550) >
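[Editor's note] The issue description above proposes checking for closed connections and re-establishing them. The sketch below is hypothetical and is not the code from PR #3595; the class and field names are invented for illustration. It shows the general pattern for a single-connection provider: validate the cached connection before handing it out. Note that `Connection.isClosed()` only reflects a local `close()`, while `Connection.isValid(timeout)` actively probes the server, so it also catches server-side idle timeouts such as the MySQL one in the trace above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical sketch, not the actual HIVE-26045 / PR #3595 change: a provider
// that re-establishes its single cached JDBC connection when it has gone stale.
public class ReconnectingProvider {

    private final String jdbcUrl;  // hypothetical: whatever the connector was configured with
    private Connection connection; // single connection, no pooling (as described above)

    public ReconnectingProvider(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    /** True when the cached connection is missing or can no longer be used. */
    static boolean needsReconnect(Connection c) {
        if (c == null) {
            return true;
        }
        try {
            // isClosed() only reports a local close(); isValid() pings the
            // server, so it also detects server-side idle timeouts.
            return c.isClosed() || !c.isValid(5 /* seconds */);
        } catch (SQLException e) {
            return true; // if we cannot even ask, rebuild the connection
        }
    }

    public synchronized Connection getConnection() throws SQLException {
        if (needsReconnect(connection)) {
            connection = DriverManager.getConnection(jdbcUrl);
        }
        return connection;
    }
}
```

A caller would simply invoke `getConnection()` before each metastore-side fetch; the validity probe costs one round trip at most, which is cheap compared with failing the whole compilation as in the trace above.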
[jira] [Work logged] (HIVE-26045) Detect timed out connections for providers and auto-reconnect
[ https://issues.apache.org/jira/browse/HIVE-26045?focusedWorklogId=808613=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808613 ] ASF GitHub Bot logged work on HIVE-26045: - Author: ASF GitHub Bot Created on: 14/Sep/22 10:05 Start Date: 14/Sep/22 10:05 Worklog Time Spent: 10m Work Description: zhangbutao opened a new pull request, #3595: URL: https://github.com/apache/hive/pull/3595 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 808613) Time Spent: 2.5h (was: 2h 20m) > Detect timed out connections for providers and auto-reconnect > - > > Key: HIVE-26045 > URL: https://issues.apache.org/jira/browse/HIVE-26045 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Naveen Gangam >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > For the connectors, we use a single connection with no pooling. When the > connection is idle for an extended period, the JDBC connection times out. We > need to check for closed connections (Connection.isClosed()?) and > re-establish the connection; otherwise the connector is rendered fairly > useless. > {noformat} > 2022-03-17T13:02:16,635 WARN [HiveServer2-Handler-Pool: Thread-116] > thrift.ThriftCLIService: Error executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: SemanticException Unable to fetch table temp_dbs. Error > retrieving remote > table:com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: > No operations allowed after connection closed.
> at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:373) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:211) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:285) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:576) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:562) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) ~[?:?] 
> at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_231] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231] > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at java.security.AccessController.doPrivileged(Native Method) > ~[?:1.8.0_231] > at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_231] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > ~[hadoop-common-3.1.0.jar:?] > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at com.sun.proxy.$Proxy44.executeStatementAsync(Unknown Source) ~[?:?] > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567) > ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] > at >
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808609 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 10:03 Start Date: 14/Sep/22 10:03 Worklog Time Spent: 10m Work Description: abstractdog commented on PR #3559: URL: https://github.com/apache/hive/pull/3559#issuecomment-1246530870 left minor comments, but it basically looks good to me; I'm tempted to approve this immediately. Just wondering if anyone with an ACID background can see any obvious problems: @deniskuzZ , @lcspinter Issue Time Tracking --- Worklog Id: (was: 808609) Time Spent: 2h 20m (was: 2h 10m) > FetchOperator scans delete_delta folders multiple times causing slowness > > > Key: HIVE-26496 > URL: https://issues.apache.org/jira/browse/HIVE-26496 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Rajesh Balamohan >Assignee: Dmitriy Fingerman >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > FetchOperator scans far more files/directories than needed. > For example, here is the layout of a table that had a set of updates and deletes; > note the set of "delta" and "delete_delta" folders that were created.
> {noformat} > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001 > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_015_015_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_016_016_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_017_017_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_018_018_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_019_019_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_020_020_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_021_021_ > 
s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_022_022_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_002_002_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_003_003_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_004_004_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_005_005_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_006_006_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_007_007_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_008_008_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_009_009_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_010_010_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_011_011_ >
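[Editor's note] The listing above shows why repeated scans hurt: every query against this table sees twenty-plus `delete_delta` directories on S3, where each `listStatus` call is a remote round trip. The actual fix is in PR #3559 and may work differently; the sketch below only illustrates the generic remedy of memoizing directory listings so each path is listed at most once per operator. All names here are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Generic memoization sketch; the real HIVE-26496 change lives in PR #3559
// and may differ. All class and method names here are hypothetical.
public class DeltaListingCache {

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> lister; // e.g. a wrapper around FileSystem.listStatus
    private int remoteCalls = 0; // counts how often the (expensive) lister actually ran

    public DeltaListingCache(Function<String, List<String>> lister) {
        this.lister = lister;
    }

    /** Lists a directory, hitting the remote store at most once per path. */
    public List<String> list(String dir) {
        return cache.computeIfAbsent(dir, d -> {
            remoteCalls++;
            return lister.apply(d);
        });
    }

    public int remoteCalls() {
        return remoteCalls;
    }
}
```

With such a cache, asking for the same `delete_delta_*` directory a second time is a map lookup instead of another S3 `LIST` request, which is the class of slowness the JIRA title describes.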
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808606 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 10:01 Start Date: 14/Sep/22 10:01 Worklog Time Spent: 10m Work Description: abstractdog commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r970592700 ## ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSplit.java: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.io.orc; + +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.common.ValidReadTxnList; +import org.apache.hadoop.hive.common.ValidTxnList; +import org.apache.hadoop.hive.common.ValidWriteIdList; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants; +import org.apache.hadoop.hive.ql.io.*; +import org.apache.hadoop.hive.ql.io.AcidUtils.Directory; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory; +import org.apache.hadoop.io.LongWritable; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.Reporter; +import org.apache.orc.OrcConf; +import org.junit.Before; +import org.junit.Test; + +import java.io.File; +import java.util.*; + +import static org.junit.Assert.*; + +/** + * Tests for OrcSplit class + */ +public class TestOrcSplit { + + private JobConf conf; + private FileSystem fs; + private Path root; + private ObjectInspector inspector; + public static class DummyRow { +LongWritable field; +RecordIdentifier ROW__ID; + +DummyRow(long val, long rowId, long origTxn, int bucket) { + field = new LongWritable(val); + bucket = BucketCodec.V1.encode(new AcidOutputFormat.Options(null).bucket(bucket)); + ROW__ID = new RecordIdentifier(origTxn, bucket, rowId); +} + +static String getColumnNamesProperty() { + return "field"; +} +static String getColumnTypesProperty() { + return "bigint"; +} + + } + + @Before + public void setup() throws Exception { +conf = new JobConf(); +conf.set(hive_metastoreConstants.TABLE_IS_TRANSACTIONAL, "true"); +conf.setBoolean(HiveConf.ConfVars.HIVE_TRANSACTIONAL_TABLE_SCAN.varname, true); +conf.set(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES, "default"); +conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname, 
+AcidUtils.AcidOperationalProperties.getDefault().toInt()); +conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS, DummyRow.getColumnNamesProperty()); +conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES, DummyRow.getColumnTypesProperty()); +conf.setBoolean(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED.varname, true); +conf.set(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY.varname, "BI"); +OrcConf.ROWS_BETWEEN_CHECKS.setLong(conf, 1); + +Path workDir = new Path(System.getProperty("test.tmp.dir", +"target" + File.separator + "test" + File.separator + "tmp")); +root = new Path(workDir, "TestOrcSplit.testDump"); +fs = root.getFileSystem(conf); +root = fs.makeQualified(root); +fs.delete(root, true); +synchronized (TestOrcFile.class) { + inspector = ObjectInspectorFactory.getReflectionObjectInspector + (DummyRow.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); +} + } + + private List> getSplitStrategies() throws Exception { +conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname, +AcidUtils.AcidOperationalProperties.getDefault().toInt()); +OrcInputFormat.Context context = new OrcInputFormat.Context(conf); +OrcInputFormat.FileGenerator gen = new OrcInputFormat.FileGenerator( +context, () -> fs, root, false, null); +Directory adi = gen.call(); +return OrcInputFormat.determineSplitStrategies( +null, context, adi.getFs(),
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808605=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808605 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 10:01 Start Date: 14/Sep/22 10:01 Worklog Time Spent: 10m Work Description: abstractdog commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r970592299 ## ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSplit.java: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.io.orc; + +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.common.ValidReadTxnList; +import org.apache.hadoop.hive.common.ValidTxnList; +import org.apache.hadoop.hive.common.ValidWriteIdList; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants; +import org.apache.hadoop.hive.ql.io.*; +import org.apache.hadoop.hive.ql.io.AcidUtils.Directory; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory; +import org.apache.hadoop.io.LongWritable; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.Reporter; +import org.apache.orc.OrcConf; +import org.junit.Before; +import org.junit.Test; + +import java.io.File; +import java.util.*; + +import static org.junit.Assert.*; + +/** + * Tests for OrcSplit class + */ +public class TestOrcSplit { + + private JobConf conf; + private FileSystem fs; + private Path root; + private ObjectInspector inspector; + public static class DummyRow { +LongWritable field; +RecordIdentifier ROW__ID; + +DummyRow(long val, long rowId, long origTxn, int bucket) { + field = new LongWritable(val); + bucket = BucketCodec.V1.encode(new AcidOutputFormat.Options(null).bucket(bucket)); + ROW__ID = new RecordIdentifier(origTxn, bucket, rowId); +} + +static String getColumnNamesProperty() { + return "field"; +} +static String getColumnTypesProperty() { + return "bigint"; +} + + } + + @Before + public void setup() throws Exception { +conf = new JobConf(); +conf.set(hive_metastoreConstants.TABLE_IS_TRANSACTIONAL, "true"); +conf.setBoolean(HiveConf.ConfVars.HIVE_TRANSACTIONAL_TABLE_SCAN.varname, true); +conf.set(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES, "default"); +conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname, 
+AcidUtils.AcidOperationalProperties.getDefault().toInt()); +conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS, DummyRow.getColumnNamesProperty()); +conf.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES, DummyRow.getColumnTypesProperty()); +conf.setBoolean(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED.varname, true); +conf.set(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY.varname, "BI"); +OrcConf.ROWS_BETWEEN_CHECKS.setLong(conf, 1); + +Path workDir = new Path(System.getProperty("test.tmp.dir", +"target" + File.separator + "test" + File.separator + "tmp")); +root = new Path(workDir, "TestOrcSplit.testDump"); +fs = root.getFileSystem(conf); +root = fs.makeQualified(root); +fs.delete(root, true); +synchronized (TestOrcFile.class) { + inspector = ObjectInspectorFactory.getReflectionObjectInspector + (DummyRow.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); +} + } + + private List> getSplitStrategies() throws Exception { +conf.setInt(HiveConf.ConfVars.HIVE_TXN_OPERATIONAL_PROPERTIES.varname, +AcidUtils.AcidOperationalProperties.getDefault().toInt()); +OrcInputFormat.Context context = new OrcInputFormat.Context(conf); +OrcInputFormat.FileGenerator gen = new OrcInputFormat.FileGenerator( +context, () -> fs, root, false, null); +Directory adi = gen.call(); +return OrcInputFormat.determineSplitStrategies( +null, context, adi.getFs(),
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=808603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808603 ] ASF GitHub Bot logged work on HIVE-26496: - Author: ASF GitHub Bot Created on: 14/Sep/22 09:58 Start Date: 14/Sep/22 09:58 Worklog Time Spent: 10m Work Description: abstractdog commented on code in PR #3559: URL: https://github.com/apache/hive/pull/3559#discussion_r970589035 ## ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSplit.java: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.io.orc; + +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.common.ValidReadTxnList; +import org.apache.hadoop.hive.common.ValidTxnList; +import org.apache.hadoop.hive.common.ValidWriteIdList; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants; +import org.apache.hadoop.hive.ql.io.*; +import org.apache.hadoop.hive.ql.io.AcidUtils.Directory; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory; +import org.apache.hadoop.io.LongWritable; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.Reporter; +import org.apache.orc.OrcConf; +import org.junit.Before; +import org.junit.Test; + +import java.io.File; +import java.util.*; Review Comment: we generally don't import with wildcards; honestly, I don't have a strong opinion about this :) but it tends to be avoided in code reviews Issue Time Tracking --- Worklog Id: (was: 808603) Time Spent: 1h 50m (was: 1h 40m) > FetchOperator scans delete_delta folders multiple times causing slowness > > > Key: HIVE-26496 > URL: https://issues.apache.org/jira/browse/HIVE-26496 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Rajesh Balamohan >Assignee: Dmitriy Fingerman >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > FetchOperator scans far more files/directories than needed. > For example, here is the layout of a table that had a set of updates and deletes; > note the set of "delta" and "delete_delta" folders that were created.
> {noformat} > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_001 > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_002_002_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_003_003_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_004_004_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_005_005_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_006_006_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_007_007_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_008_008_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_009_009_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_010_010_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_011_011_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_012_012_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_013_013_ > s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_014_014_ >
[jira] [Work logged] (HIVE-26277) NPEs and rounding issues in ColumnStatsAggregator classes
[ https://issues.apache.org/jira/browse/HIVE-26277?focusedWorklogId=808598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808598 ] ASF GitHub Bot logged work on HIVE-26277: - Author: ASF GitHub Bot Created on: 14/Sep/22 09:52 Start Date: 14/Sep/22 09:52 Worklog Time Spent: 10m Work Description: zabetak commented on code in PR #3339: URL: https://github.com/apache/hive/pull/3339#discussion_r970582527 ## standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DateColumnStatsAggregatorTest.java: ## @@ -0,0 +1,279 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.hadoop.hive.metastore.columnstats.aggr; + +import org.apache.hadoop.hive.metastore.TableType; +import org.apache.hadoop.hive.metastore.annotation.MetastoreUnitTest; +import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData; +import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj; +import org.apache.hadoop.hive.metastore.api.Date; +import org.apache.hadoop.hive.metastore.api.FieldSchema; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.hive.metastore.columnstats.ColStatsBuilder; +import org.apache.hadoop.hive.metastore.utils.MetaStoreServerUtils.ColStatsObjWithSourceInfo; +import org.junit.Assert; +import org.junit.Test; +import org.junit.experimental.categories.Category; + +import java.util.Arrays; +import java.util.Collections; +import java.util.List; + +import static org.apache.hadoop.hive.metastore.StatisticsTestUtils.createStatsWithInfo; + +@Category(MetastoreUnitTest.class) +public class DateColumnStatsAggregatorTest { + + private static final Table TABLE = new Table("dummy", "db", "hive", 0, 0, + 0, null, null, Collections.emptyMap(), null, null, + TableType.MANAGED_TABLE.toString()); + private static final FieldSchema COL = new FieldSchema("col", "int", ""); + + private static final Date DATE_1 = new Date(1); + private static final Date DATE_2 = new Date(2); + private static final Date DATE_3 = new Date(3); + private static final Date DATE_4 = new Date(4); + private static final Date DATE_5 = new Date(5); + private static final Date DATE_6 = new Date(6); + private static final Date DATE_7 = new Date(7); + private static final Date DATE_8 = new Date(8); + private static final Date DATE_9 = new Date(9); + + @Test + public void testAggregateSingleStat() throws MetaException { +List partitions = Collections.singletonList("part1"); + +ColumnStatisticsData data1 = new 
ColStatsBuilder<>(Date.class).numNulls(1).numDVs(2).low(DATE_1).high(DATE_4) +.hll(DATE_1.getDaysSinceEpoch(), DATE_4.getDaysSinceEpoch()).build(); +List statsList = +Collections.singletonList(createStatsWithInfo(data1, TABLE, COL, partitions.get(0))); + +DateColumnStatsAggregator aggregator = new DateColumnStatsAggregator(); +ColumnStatisticsObj computedStatsObj = aggregator.aggregate(statsList, partitions, true); + +Assert.assertEquals(data1, computedStatsObj.getStatsData()); + } + + @Test + public void testAggregateSingleStatWhenNullValues() throws MetaException { +List partitions = Collections.singletonList("part1"); + +ColumnStatisticsData data1 = new ColStatsBuilder<>(Date.class).numNulls(1).numDVs(2).build(); +List statsList = +Collections.singletonList(createStatsWithInfo(data1, TABLE, COL, partitions.get(0))); + +DateColumnStatsAggregator aggregator = new DateColumnStatsAggregator(); +ColumnStatisticsObj computedStatsObj = aggregator.aggregate(statsList, partitions, true); +Assert.assertEquals(data1, computedStatsObj.getStatsData()); + +aggregator.useDensityFunctionForNDVEstimation = true; +computedStatsObj = aggregator.aggregate(statsList, partitions, true); +Assert.assertEquals(data1, computedStatsObj.getStatsData()); + +aggregator.useDensityFunctionForNDVEstimation = false; +aggregator.ndvTuner = 1; +// ndv tuner does not have any effect because min numDVs and max
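[Editor's note] The truncated comment in the quoted test refers to the `ndvTuner` knob of the column-stats aggregators. Based on the behavior the test describes (and this is an assumption about the semantics, not a copy of the metastore code), the tuner interpolates between a lower bound on the aggregated number of distinct values (the maximum per-partition NDV) and an upper bound (the sum of per-partition NDVs):

```java
// Hypothetical sketch of the ndvTuner semantics described in the quoted test;
// the real logic is in the metastore's ColumnStatsAggregator classes and may differ.
public final class NdvAggregation {

    /**
     * Combines per-partition NDV estimates. ndvTuner in [0, 1] interpolates
     * between the lower bound (max, fully overlapping partitions) and the
     * upper bound (sum, fully disjoint partitions).
     */
    public static long aggregateNdv(long[] partitionNdvs, double ndvTuner) {
        long max = 0;
        long sum = 0;
        for (long ndv : partitionNdvs) {
            max = Math.max(max, ndv);
            sum += ndv;
        }
        // tuner = 0 -> conservative (max); tuner = 1 -> optimistic (sum)
        return max + (long) ((sum - max) * ndvTuner);
    }

    private NdvAggregation() {
    }
}
```

This also explains the test comment: with a single partition the lower and upper bounds coincide, so the tuner has no effect on the result.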
[jira] [Work logged] (HIVE-26445) Use tez.local.mode.without.network for qtests
[ https://issues.apache.org/jira/browse/HIVE-26445?focusedWorklogId=808593=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808593 ] ASF GitHub Bot logged work on HIVE-26445: - Author: ASF GitHub Bot Created on: 14/Sep/22 09:41 Start Date: 14/Sep/22 09:41 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3583: URL: https://github.com/apache/hive/pull/3583#issuecomment-1246505407 Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive=3583) 1 Bug (rating C), 0 Vulnerabilities (rating A), 0 Security Hotspots (rating A), 44 Code Smells (rating A); no coverage information, no duplication information. Issue Time Tracking --- Worklog Id: (was: 808593) Time Spent: 0.5h (was: 20m) > Use tez.local.mode.without.network for qtests > - > > Key: HIVE-26445 > URL: https://issues.apache.org/jira/browse/HIVE-26445 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > It looks like, in the case of Iceberg, the local DAGClient behaves oddly: > {code} > 2022-08-02T06:54:36,669 ERROR [2f953972-7675-4594-8d6b-d1c295c056a5 > Time-limited test] tez.TezTask: Failed to execute tez graph.
> java.lang.NullPointerException: null
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.collectCommitInformation(TezTask.java:367) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:279) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
> at
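The ticket's remedy is configuration rather than code: run the qtest Tez driver in network-free local mode, so the DAG-client path that hits the NPE above is bypassed. A hedged sketch of the relevant properties (both property names exist in Tez; whether the qtest harness sets them in tez-site.xml or programmatically is an assumption):

```xml
<!-- Sketch: enable Tez network-free local mode for qtests -->
<property>
  <name>tez.local.mode</name>
  <value>true</value>
</property>
<property>
  <name>tez.local.mode.without.network</name>
  <value>true</value>
</property>
```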
[jira] [Work logged] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation
[ https://issues.apache.org/jira/browse/HIVE-26488?focusedWorklogId=808573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808573 ] ASF GitHub Bot logged work on HIVE-26488: - Author: ASF GitHub Bot Created on: 14/Sep/22 09:13 Start Date: 14/Sep/22 09:13 Worklog Time Spent: 10m Work Description: zabetak commented on code in PR #3538: URL: https://github.com/apache/hive/pull/3538#discussion_r970539353
## ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLSemanticAnalyzerFactory.java:
## @@ -65,10 +68,12 @@ public interface DDLSemanticAnalyzerCategory {
      new HashMap<>();
  static {
-    Set<Class<? extends BaseSemanticAnalyzer>> analyzerClasses1 =
-        new Reflections(DDL_ROOT).getSubTypesOf(BaseSemanticAnalyzer.class);
-    Set<Class<? extends CalcitePlanner>> analyzerClasses2 =
-        new Reflections(DDL_ROOT).getSubTypesOf(CalcitePlanner.class);
+    Set<Class<? extends BaseSemanticAnalyzer>> analyzerClasses1 = new Reflections(
+        new ConfigurationBuilder()
+            .setUrls(ClasspathHelper.forPackage(DDL_ROOT)).filterInputsBy(new FilterBuilder().includePackage(DDL_ROOT)).setExpandSuperTypes(false)).getSubTypesOf(BaseSemanticAnalyzer.class);
+    Set<Class<? extends CalcitePlanner>> analyzerClasses2 = new Reflections(
+        new ConfigurationBuilder().filterInputsBy(new FilterBuilder().includePackage(DDL_ROOT))
+            .setUrls(ClasspathHelper.forPackage(DDL_ROOT)).setExpandSuperTypes(false)).getSubTypesOf(CalcitePlanner.class);
     Set<Class<?>> analyzerClasses = Sets.union(analyzerClasses1, analyzerClasses2);
     for (Class<?> analyzerClass : analyzerClasses) {
Review Comment: Thanks! Please delete the following lines as well; they are redundant.
```java
Set<Class<? extends CalcitePlanner>> analyzerClasses2 =
    new Reflections(DDL_ROOT).getSubTypesOf(CalcitePlanner.class);
Set<Class<?>> analyzerClasses = Sets.union(analyzerClasses1, analyzerClasses2);
```
Consider adding a null check after the following line to avoid similar problems in the future:
```java
DDLType ddlType = analyzerCategoryClass.getAnnotation(DDLType.class);
```
Issue Time Tracking --- Worklog Id: (was: 808573) Time Spent: 1h 50m (was: 1h 40m) > Fix NPE in DDLSemanticAnalyzerFactory during compilation > -------------------------------------------------------- > > Key: HIVE-26488 > URL: https://issues.apache.org/jira/browse/HIVE-26488 > Project: Hive > Issue Type: Sub-task > Reporter: Ayush Saxena > Assignee: Ayush Saxena > Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > *Exception Trace:* > {noformat} > java.lang.ExceptionInInitializerError > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.getInternal(SemanticAnalyzerFactory.java:62) > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:41) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:209) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:459) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:424) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:418) > {noformat} > *Cause:* > {noformat} > Caused by: java.lang.NullPointerException > at org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.<clinit>(DDLSemanticAnalyzerFactory.java:84) > ... 40 more > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
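The reviewer's suggested null check can be illustrated with a small, self-contained example. The `DDLType` annotation here is a simplified stand-in for Hive's real one (which carries parser token types), and both analyzer classes are hypothetical:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Sketch of the review suggestion: guard against getAnnotation(...)
// returning null instead of dereferencing the result blindly, which is
// exactly the kind of NPE this ticket fixes.
public class AnnotationGuard {
    @Retention(RetentionPolicy.RUNTIME)
    @interface DDLType { String[] types() default {}; }

    @DDLType(types = "ALTERTABLE_RENAME")
    static class AnnotatedAnalyzer {}

    static class PlainAnalyzer {}  // no @DDLType -> getAnnotation returns null

    static String describe(Class<?> analyzerClass) {
        DDLType ddlType = analyzerClass.getAnnotation(DDLType.class);
        if (ddlType == null) {  // the missing null check that caused the NPE
            return "skipped: " + analyzerClass.getSimpleName();
        }
        return String.join(",", ddlType.types());
    }

    public static void main(String[] args) {
        System.out.println(describe(AnnotatedAnalyzer.class));  // ALTERTABLE_RENAME
        System.out.println(describe(PlainAnalyzer.class));      // skipped: PlainAnalyzer
    }
}
```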
[jira] [Work logged] (HIVE-26488) Fix NPE in DDLSemanticAnalyzerFactory during compilation
[ https://issues.apache.org/jira/browse/HIVE-26488?focusedWorklogId=808559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808559 ] ASF GitHub Bot logged work on HIVE-26488: - Author: ASF GitHub Bot Created on: 14/Sep/22 08:36 Start Date: 14/Sep/22 08:36 Worklog Time Spent: 10m Work Description: ayushtkn commented on code in PR #3538: URL: https://github.com/apache/hive/pull/3538#discussion_r970498592
## ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLSemanticAnalyzerFactory.java:
## @@ -65,10 +68,12 @@ public interface DDLSemanticAnalyzerCategory {
      new HashMap<>();
  static {
-    Set<Class<? extends BaseSemanticAnalyzer>> analyzerClasses1 =
-        new Reflections(DDL_ROOT).getSubTypesOf(BaseSemanticAnalyzer.class);
-    Set<Class<? extends CalcitePlanner>> analyzerClasses2 =
-        new Reflections(DDL_ROOT).getSubTypesOf(CalcitePlanner.class);
+    Set<Class<? extends BaseSemanticAnalyzer>> analyzerClasses1 = new Reflections(
+        new ConfigurationBuilder()
+            .setUrls(ClasspathHelper.forPackage(DDL_ROOT)).filterInputsBy(new FilterBuilder().includePackage(DDL_ROOT)).setExpandSuperTypes(false)).getSubTypesOf(BaseSemanticAnalyzer.class);
+    Set<Class<? extends CalcitePlanner>> analyzerClasses2 = new Reflections(
+        new ConfigurationBuilder().filterInputsBy(new FilterBuilder().includePackage(DDL_ROOT))
+            .setUrls(ClasspathHelper.forPackage(DDL_ROOT)).setExpandSuperTypes(false)).getSubTypesOf(CalcitePlanner.class);
     Set<Class<?>> analyzerClasses = Sets.union(analyzerClasses1, analyzerClasses2);
     for (Class<?> analyzerClass : analyzerClasses) {
Review Comment: Done Issue Time Tracking --- Worklog Id: (was: 808559) Time Spent: 1h 40m (was: 1.5h) > Fix NPE in DDLSemanticAnalyzerFactory during compilation > -------------------------------------------------------- > > Key: HIVE-26488 > URL: https://issues.apache.org/jira/browse/HIVE-26488 > Project: Hive > Issue Type: Sub-task > Reporter: Ayush Saxena > Assignee: Ayush Saxena > Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > *Exception Trace:* > {noformat} > java.lang.ExceptionInInitializerError > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.getInternal(SemanticAnalyzerFactory.java:62) > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:41) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:209) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:459) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:424) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:418) > {noformat} > *Cause:* > {noformat} > Caused by: java.lang.NullPointerException > at org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.<clinit>(DDLSemanticAnalyzerFactory.java:84) > ... 40 more > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26521) Iceberg: Raise exception when running delete/update statements on V1 tables
[ https://issues.apache.org/jira/browse/HIVE-26521?focusedWorklogId=808554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-808554 ] ASF GitHub Bot logged work on HIVE-26521: - Author: ASF GitHub Bot Created on: 14/Sep/22 08:23 Start Date: 14/Sep/22 08:23 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3579: URL: https://github.com/apache/hive/pull/3579#issuecomment-1246415166 Kudos, SonarCloud Quality Gate passed! 1 Bug (rating C), 0 Vulnerabilities (rating A), 0 Security Hotspots (rating A), 44 Code Smells (rating A); no coverage information, no duplication information. Issue Time Tracking --- Worklog Id: (was: 808554) Time Spent: 1h 20m (was: 1h 10m) > Iceberg: Raise exception when running delete/update statements on V1 tables > --------------------------------------------------------------------------- > > Key: HIVE-26521 > URL: https://issues.apache.org/jira/browse/HIVE-26521 > Project: Hive > Issue Type: Improvement > Reporter: László Pintér > Assignee: László Pintér > Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Right now an exception is raised on the executor side when trying to commit the delete file. We should throw an exception earlier, during the compilation phase. -- This message was sent by Atlassian Jira (v8.20.10#820010)
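The ticket's point is to move the failure from run time to compile time. A minimal, hypothetical sketch of such a guard (class, method, and message are illustrative; Hive's actual check lives in the semantic analyzer and Iceberg row-level deletes require table format version 2):

```java
// Sketch: reject DELETE/UPDATE against Iceberg format-version-1 tables
// during compilation instead of letting the executor fail on delete-file
// commit. Names are illustrative, not Hive's real API.
public class IcebergWriteGuard {
    static final int MIN_VERSION_FOR_ROW_LEVEL_OPS = 2;  // V2 adds delete files

    static void validateRowLevelOp(String operation, int formatVersion) {
        if (formatVersion < MIN_VERSION_FOR_ROW_LEVEL_OPS) {
            throw new IllegalArgumentException(
                operation + " is not supported on Iceberg V" + formatVersion
                + " tables; upgrade the table to format version 2");
        }
    }

    public static void main(String[] args) {
        validateRowLevelOp("DELETE", 2);  // fine on a V2 table
        try {
            validateRowLevelOp("UPDATE", 1);  // rejected before any DAG runs
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```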
[jira] [Updated] (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LiaoShuang updated HIVE-1271: - Description: Type information specified while using a custom reduce script is converted to lower case, and causes a type mismatch during query semantic analysis. The following REDUCE query, where field name = "userId", failed. hive> CREATE TABLE SS ( > a INT, > b INT, > vals ARRAY<STRUCT<userId:INT>> > ); OK hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s > INSERT OVERWRITE TABLE SS > REDUCE * > USING 'myreduce.py' > AS > (a INT, > b INT, > vals ARRAY<STRUCT<userId:INT>> > ) > ; FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userid:int>> to array<struct<userId:int>>. The same query worked fine after changing "userId" to "userid". was: Type information specified while using a custom reduce script is converted to lower case, and causes a type mismatch during query semantic analysis. The following REDUCE query, where field name = "userId", failed. hive> CREATE TABLE SS ( > a INT, > b INT, > vals ARRAY<STRUCT<userId:INT>> > ); OK hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s > INSERT OVERWRITE TABLE SS > REDUCE * > USING 'myreduce.py' > AS > (a INT, > b INT, > vals ARRAY<STRUCT<userId:INT>> > ) > ; FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userid:int>> to array<struct<userId:int>>. The same query worked fine after changing "userId" to "userid".
*TEST > Case sensitiveness of type information specified when using custom reducer > causes type mismatch > ------------------------------------------------------------------------------------------------ > > Key: HIVE-1271 > URL: https://issues.apache.org/jira/browse/HIVE-1271 > Project: Hive > Issue Type: Bug > Components: Query Processor > Affects Versions: 0.5.0 > Reporter: Dilip Joseph > Assignee: Arvind Prabhakar > Priority: Major > Fix For: 0.6.0 > > Attachments: HIVE-1271-1.patch, HIVE-1271.patch > > > Type information specified while using a custom reduce script is converted to > lower case, and causes a type mismatch during query semantic analysis. The > following REDUCE query where field name = "userId" failed. > hive> CREATE TABLE SS ( > > a INT, > > b INT, > > vals ARRAY<STRUCT<userId:INT>> > > ); > OK > hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s > > INSERT OVERWRITE TABLE SS > > REDUCE * > > USING 'myreduce.py' > > AS > > (a INT, > > b INT, > > vals ARRAY<STRUCT<userId:INT>> > > ) > > ; > FAILED: Error in semantic analysis: line 2:27 Cannot insert into > target table because column number/types are different SS: Cannot > convert column 2 from array<struct<userid:int>> to > array<struct<userId:int>>. > The same query worked fine after changing "userId" to "userid". -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-26476) Iceberg: map "ORCFILE" to "ORC" while creating an iceberg table
[ https://issues.apache.org/jira/browse/HIVE-26476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér resolved HIVE-26476. -- Resolution: Fixed > Iceberg: map "ORCFILE" to "ORC" while creating an iceberg table > --- > > Key: HIVE-26476 > URL: https://issues.apache.org/jira/browse/HIVE-26476 > Project: Hive > Issue Type: Bug >Reporter: Manthan B Y >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > *Issue:* Insert query failing with VERTEX_FAILURE > *Steps to Reproduce:* > # Open Beeline session > # Execute the following queries > {code:java} > DROP TABLE IF EXISTS t2; > CREATE TABLE IF NOT EXISTS t2(c0 DOUBLE , c1 DOUBLE , c2 DECIMAL) STORED BY > ICEBERG STORED AS ORCFILE; > INSERT INTO t2(c1, c0) VALUES(0.1803113419993464, 0.9381388537256228);{code} > *Result:* > {code:java} > org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:294) > at > org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:279) > ... 36 more ]], Vertex did not succeed due to OWN_TASK_FAILURE, > failedTasks:1 killedTasks:0, Vertex vertex_1660631059889_0001_8_00 [Map 1] > killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, > vertexId=vertex_1660631059889_0001_8_01, diagnostics=[Vertex received Kill > while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, > failedTasks:0 killedTasks:1, Vertex vertex_1660631059889_0001_8_01 [Reducer > 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to > VERTEX_FAILURE. failedVertices:1 killedVertices:1{code} > *Note:* Same query with table in non-iceberg format works without error -- This message was sent by Atlassian Jira (v8.20.10#820010)
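The fix's idea can be sketched as a one-line normalization of the legacy `STORED AS ORCFILE` alias before the format name reaches the Iceberg writer. Class and method names below are illustrative, not Hive's actual code:

```java
import java.util.Locale;

// Sketch: map the legacy Hive alias "ORCFILE" to the Iceberg-recognized
// file-format name "ORC" while creating an Iceberg table, so the record
// writer lookup does not fail at DAG execution time.
public class FileFormatAlias {
    static String normalize(String storedAs) {
        String fmt = storedAs.trim().toUpperCase(Locale.ROOT);
        // "STORED AS ORCFILE" is a legacy alias for the ORC format
        return fmt.equals("ORCFILE") ? "ORC" : fmt;
    }

    public static void main(String[] args) {
        System.out.println(normalize("ORCFILE")); // ORC
        System.out.println(normalize("parquet")); // PARQUET
    }
}
```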
[jira] [Commented] (HIVE-26507) Do not allow hive to iceberg migration if source table contains CHAR or VARCHAR columns
[ https://issues.apache.org/jira/browse/HIVE-26507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17603931#comment-17603931 ] László Pintér commented on HIVE-26507: -- The addendum was merged into master. Thanks, [~szita] for the review! > Do not allow hive to iceberg migration if source table contains CHAR or > VARCHAR columns > --- > > Key: HIVE-26507 > URL: https://issues.apache.org/jira/browse/HIVE-26507 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: László Pintér >Priority: Major > Labels: iceberg, pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > "alter table" statements can be used for generating iceberg metadata > information (i.e for converting external tables -> iceberg tables). > As part of this process, it also converts certain datatypes to iceberg > compatible types (e.g char -> string). "iceberg.mr.schema.auto.conversion" > enables this conversion. > This could cause certain issues at runtime. Here is an example > {noformat} > Before conversion: > == > -- external table > select count(*) from customer_demographics where cd_gender = 'F' and > cd_marital_status = 'U' and cd_education_status = '2 yr Degree'; > 27440 > after conversion: > = > -- iceberg table > select count(*) from customer_demographics where cd_gender = 'F' and > cd_marital_status = 'U' and cd_education_status = '2 yr Degree'; > 0 > select count(*) from customer_demographics where cd_gender = 'F' and > cd_marital_status = 'U' and trim(cd_education_status) = '2 yr Degree'; > 27440 > {noformat} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
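The surprising zero-count above comes from CHAR(n) blank-padding semantics: CHAR comparisons ignore trailing blanks, but after auto-conversion to Iceberg's string type the padding becomes significant. A small plain-Java illustration (the padding width 20 is an assumption about the column definition):

```java
// Sketch of why the migrated table returns 0 rows: Hive CHAR(n) values
// are stored blank-padded, and CHAR comparison ignores the padding.
// With string semantics the padded value no longer equals the literal
// unless the column is trim()-ed, which matches the workaround shown
// in the ticket.
public class CharPaddingDemo {
    static String asChar(String v, int n) {            // blank-pad like CHAR(n)
        return String.format("%-" + n + "s", v);
    }

    public static void main(String[] args) {
        String stored = asChar("2 yr Degree", 20);     // "2 yr Degree" + 9 blanks
        String literal = "2 yr Degree";
        System.out.println(stored.equals(literal));        // false: string semantics
        System.out.println(stored.trim().equals(literal)); // true: the workaround
    }
}
```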
[jira] [Commented] (HIVE-26499) After enabling vectorization (hive.vectorized.execution.enabled=true), CASE WHEN produces values outside the expected set
[ https://issues.apache.org/jira/browse/HIVE-26499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603908#comment-17603908 ] Zhizhen Hou commented on HIVE-26499: This problem has been fixed on the master branch: https://issues.apache.org/jira/browse/HIVE-26408 > After enabling vectorization (hive.vectorized.execution.enabled=true), CASE WHEN produces values outside the expected set > ------------------------------------------------------------------------------------------------------------------------- > > Key: HIVE-26499 > URL: https://issues.apache.org/jira/browse/HIVE-26499 > Project: Hive > Issue Type: Bug > Components: Hive > Affects Versions: 3.1.0 > Environment: hdfs(3.1.1) > yarn(3.1.1) > zookeeper(3.4.6) > hive(3.1.0) > tez(0.9.1) > Reporter: Ricco-Chan > Priority: Major > Attachments: image-2022-08-29-11-04-21-921.png > > > -- The CASE WHEN below can only produce the preset values '1', '2', and '3', yet '5' and '6' appear in the results. When this bug was found, the table used the Parquet format with Snappy compression. > > select distinct(traveller_type) from > ( > select pri_acct_no, > case > when (t1.consume_flag = '1' and substr(t1.areacode, 1, 2) <> '65') then '2' > when (substr(t1.areacode, 1, 2) = substr(t1.country_id_new, 1, 2) and t1.consume_flag = '1') then '1' > else '3' > end as traveller_type > from my_table t1 where consume_flag = '1' > ) t2; > - > !image-2022-08-29-11-04-21-921.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
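As a row-at-a-time reference for the query above, the CASE WHEN can be transcribed into plain Java: only '1', '2', or '3' can ever come out, which is what makes the '5'/'6' values from the vectorized path a correctness bug rather than a data issue. NULL handling is ignored in this sketch:

```java
// Row-at-a-time reference for the CASE WHEN in HIVE-26499; a first
// matching branch wins, exactly as in SQL CASE WHEN.
public class TravellerTypeCase {
    static String travellerType(String consumeFlag, String areacode, String countryIdNew) {
        if ("1".equals(consumeFlag) && !areacode.substring(0, 2).equals("65")) {
            return "2";
        } else if (areacode.substring(0, 2).equals(countryIdNew.substring(0, 2))
                && "1".equals(consumeFlag)) {
            return "1";
        } else {
            return "3";
        }
    }

    public static void main(String[] args) {
        System.out.println(travellerType("1", "6501", "6502")); // 1
        System.out.println(travellerType("1", "4401", "6502")); // 2
        System.out.println(travellerType("0", "6501", "6502")); // 3
    }
}
```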