[jira] [Work logged] (HIVE-26201) QueryResultsCache may have wrong permission if umask is too strict
[ https://issues.apache.org/jira/browse/HIVE-26201?focusedWorklogId=768322&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768322 ]

ASF GitHub Bot logged work on HIVE-26201:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 10/May/22 05:46
Start Date: 10/May/22 05:46
Worklog Time Spent: 10m

Work Description: skysiders commented on PR #3267:
URL: https://github.com/apache/hive/pull/3267#issuecomment-1121956685

Hi @abstractdog, could you please have a look at this? It is the same as [TEZ-4412](https://github.com/apache/tez/pull/209).

Issue Time Tracking
-------------------
Worklog Id: (was: 768322)
Time Spent: 20m (was: 10m)

> QueryResultsCache may have wrong permission if umask is too strict
> ------------------------------------------------------------------
>
> Key: HIVE-26201
> URL: https://issues.apache.org/jira/browse/HIVE-26201
> Project: Hive
> Issue Type: Bug
> Components: Query Processor, Tez
> Affects Versions: 3.1.3
> Reporter: Zhang Dongsheng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> TezSessionState, QueryResultsCache and Context use mkdirs(path, permission)
> to create directories with specific permissions. But if the umask is too
> restrictive, the resulting permissions may not be what was requested, so we
> need to check whether the permission was actually set as expected.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
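The check the issue calls for (create the directory, verify the requested mode survived the process umask, and chmod explicitly if it did not) can be sketched with JDK NIO in place of Hadoop's FileSystem API. This is an analogy only: the actual patch targets FileSystem.mkdirs, and the class and helper names below are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class StrictUmaskMkdir {

    /** Create a directory with the wanted mode; mkdir(2) masks the mode with the
     *  process umask, so verify afterwards and fall back to an explicit chmod. */
    static Set<PosixFilePermission> mkdirWithPerms(Path dir, String wanted) {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(wanted);
        try {
            Files.createDirectory(dir, PosixFilePermissions.asFileAttribute(perms));
            if (!Files.getPosixFilePermissions(dir).equals(perms)) {
                // umask stripped some bits; chmod (setPosixFilePermissions) ignores umask
                Files.setPosixFilePermissions(dir, perms);
            }
            return Files.getPosixFilePermissions(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical scratch location, used only for this demo. */
    static Path tempChild() {
        try {
            return Files.createTempDirectory("query-results-cache").resolve("results");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // With a strict umask such as 077, the bare create would yield rwx------.
        System.out.println(mkdirWithPerms(tempChild(), "rwxr-x---"));
    }
}
```

Hadoop's FileSystem exposes the analogous pair of calls, mkdirs(Path, FsPermission) and setPermission, so the structure of the check carries over directly.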
[jira] [Work logged] (HIVE-26174) disable rename table across dbs when on different filesystem
[ https://issues.apache.org/jira/browse/HIVE-26174?focusedWorklogId=768291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768291 ]

ASF GitHub Bot logged work on HIVE-26174:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 10/May/22 02:48
Start Date: 10/May/22 02:48
Worklog Time Spent: 10m

Work Description: adrian-wang commented on PR #3240:
URL: https://github.com/apache/hive/pull/3240#issuecomment-1121829345

Hi @ayushtkn, in our customers' scenario they keep table data on HDFS, while some stale tables are archived to cloud storage. They were using this command and assumed the data would be placed in cloud storage, but after a while they found it was still on HDFS. Besides, if we have two dbs located on two different HDFS services, the command currently fails. Hence I think we should not allow a rename when the two dbs are on different storage systems.

Issue Time Tracking
-------------------
Worklog Id: (was: 768291)
Time Spent: 20m (was: 10m)

> disable rename table across dbs when on different filesystem
> -------------------------------------------------------------
>
> Key: HIVE-26174
> URL: https://issues.apache.org/jira/browse/HIVE-26174
> Project: Hive
> Issue Type: Improvement
> Reporter: Adrian Wang
> Assignee: Adrian Wang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Currently, if we run
> ALTER TABLE db1.table1 RENAME TO db2.table2;
> with `db1` and `db2` on different filesystems, for example `db1` at
> `"hdfs:/user/hive/warehouse/db1.db"` and `db2` at
> `"s3://bucket/s3warehouse/db2.db"`, the new `db2.table2` ends up at
> location `hdfs:/s3warehouse/db2.db/table2`, which looks quite strange.
> The idea is to ban this kind of operation. We already seemed to intend to
> ban it, but the check ran after the filesystem scheme had been changed, so
> it always passed.
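A minimal sketch of the guard being proposed: reject the rename when the two database locations disagree on filesystem scheme or authority. The class and method names are hypothetical; the real check sits in Hive's alter-table path in the metastore.

```java
import java.net.URI;

public class CrossFsRenameCheck {

    /** True only when both locations live on the same filesystem,
     *  i.e. the URI scheme and authority both match. */
    static boolean sameFileSystem(String srcLocation, String destLocation) {
        URI a = URI.create(srcLocation);
        URI b = URI.create(destLocation);
        return equalsIgnoreNull(a.getScheme(), b.getScheme())
            && equalsIgnoreNull(a.getAuthority(), b.getAuthority());
    }

    private static boolean equalsIgnoreNull(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }

    public static void main(String[] args) {
        System.out.println(sameFileSystem(
            "hdfs://nameservice1/user/hive/warehouse/db1.db",
            "s3a://bucket/s3warehouse/db2.db")); // false: the rename should be rejected
    }
}
```

Note that comparing the authority as well as the scheme also catches the second case in the comment: two databases on two different HDFS services.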
[jira] [Work logged] (HIVE-25963) Temporary table creation with not null constraint gets converted to external table
[ https://issues.apache.org/jira/browse/HIVE-25963?focusedWorklogId=768247&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768247 ]

ASF GitHub Bot logged work on HIVE-25963:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 10/May/22 00:20
Start Date: 10/May/22 00:20
Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on PR #3040:
URL: https://github.com/apache/hive/pull/3040#issuecomment-1121708249

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------
Worklog Id: (was: 768247)
Time Spent: 2h 50m (was: 2h 40m)

> Temporary table creation with not null constraint gets converted to external table
> ----------------------------------------------------------------------------------
>
> Key: HIVE-25963
> URL: https://issues.apache.org/jira/browse/HIVE-25963
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2, Standalone Metastore
> Reporter: Sourabh Goyal
> Assignee: Sourabh Goyal
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> When creating a temporary table with a not null constraint, it gets converted
> to an external table. For example:
> create temporary table t2 (a int not null);
> table t2's metadata looks like:
> {code:java}
> | col_name                     | data_type                    | comment |
> | a                            | int                          |         |
> | # Detailed Table Information | NULL                         | NULL    |
> | Database:                    | default                      | NULL    |
> | OwnerType:                   | USER                         | NULL    |
> | Owner:                       | sourabh                      | NULL    |
> | CreateTime:                  | Tue Feb 15 15:20:13 PST 2022 | NULL    |
> | LastAccessTime:              | UNKNOWN                      | NULL    |
> | Retention:                   | 0                            | NULL    |
> | Location:                    | hdfs://localhost:9000/tmp/hive/sourabh/80d374a8-cd7a-4fcf-ae72-51b04ff9c3d8/_tmp_space.db/4574446d-c144-48f9-b4b6-2e9ee0ce5be4 | NULL |
> | Table Type:                  | EXTERNAL_TABLE               | NULL    |
> | Table Parameters:            | NULL                         | NULL    |
> |                              | COLUMN_STATS_ACCURATE        | {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\"}} |
> |                              | EXTERNAL                     | TRUE    |
> |                              | TRANSLATED_TO_EXTERNAL       | TRUE    |
> |                              | bucketing_version            | 2       |
> |                              | external.table.purge         | TRUE    |
> |                              | numFiles                     | 0       |
> |                              | numRows                      | 0       |
[jira] [Work logged] (HIVE-25998) Build iceberg modules without a flag
[ https://issues.apache.org/jira/browse/HIVE-25998?focusedWorklogId=768245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768245 ]

ASF GitHub Bot logged work on HIVE-25998:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 10/May/22 00:20
Start Date: 10/May/22 00:20
Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #3068: HIVE-25998: Build iceberg modules without a flag
URL: https://github.com/apache/hive/pull/3068

Issue Time Tracking
-------------------
Worklog Id: (was: 768245)
Time Spent: 0.5h (was: 20m)

> Build iceberg modules without a flag
> ------------------------------------
>
> Key: HIVE-25998
> URL: https://issues.apache.org/jira/browse/HIVE-25998
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> We originally introduced a -Piceberg flag for building the iceberg modules.
> Since then the iceberg modules have stabilised, and as we would like to have
> a release, we should remove the flag now.
[jira] [Work logged] (HIVE-25495) Upgrade to JLine3
[ https://issues.apache.org/jira/browse/HIVE-25495?focusedWorklogId=768244&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768244 ]

ASF GitHub Bot logged work on HIVE-25495:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 10/May/22 00:20
Start Date: 10/May/22 00:20
Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #3069: HIVE-25495: Upgrade JLine to version 3
URL: https://github.com/apache/hive/pull/3069

Issue Time Tracking
-------------------
Worklog Id: (was: 768244)
Time Spent: 2h 50m (was: 2h 40m)

> Upgrade to JLine3
> -----------------
>
> Key: HIVE-25495
> URL: https://issues.apache.org/jira/browse/HIVE-25495
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> JLine 2 was discontinued a long time ago. Hadoop uses JLine 3, so Hive
> should match.
[jira] [Work logged] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled
[ https://issues.apache.org/jira/browse/HIVE-13384?focusedWorklogId=768246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768246 ]

ASF GitHub Bot logged work on HIVE-13384:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 10/May/22 00:20
Start Date: 10/May/22 00:20
Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #3064: [HIVE-13384] HiveMetaStoreClient supports proxy
URL: https://github.com/apache/hive/pull/3064

Issue Time Tracking
-------------------
Worklog Id: (was: 768246)
Time Spent: 1h (was: 50m)

> Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled
> ---------------------------------------------------------------------------------
>
> Key: HIVE-13384
> URL: https://issues.apache.org/jira/browse/HIVE-13384
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 1.2.0, 1.2.1
> Reporter: Bing Li
> Assignee: Bing Li
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> I wrote a Java client to talk to the HiveMetaStore (Hive 1.2.0), but found
> that it cannot construct a HiveMetaStoreClient object via a proxy user in a
> Kerberos environment:
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>     at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ===
> While debugging Hive, I found that the error comes from the open() method in
> the HiveMetaStoreClient class, around line 406:
> transport = UserGroupInformation.getCurrentUser().doAs(new PrivilegedExceptionAction() { // FAILS, because the current user doesn't have the credential
> But it works if I change the line above to:
> transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new PrivilegedExceptionAction() { // PASSES
> I found that DRILL-3413 fixes this error on the Drill side as a workaround,
> but if I submit a MapReduce job via Pig/HCatalog, it runs into the same issue
> again when the object is initialized via HCatalog. It would be better to fix
> this issue on the Hive side.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
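The distinction the report hinges on can be illustrated with the JDK's own Subject.doAs, which Hadoop's UserGroupInformation.doAs ultimately builds on. This is a conceptual sketch only: UserGroupInformation and the Thrift transport are not reproduced here, and the empty Subject below is a placeholder with no real Kerberos credentials.

```java
import java.security.PrivilegedAction;
import javax.security.auth.Subject;

public class ProxyUserDoAs {

    /**
     * The SASL/GSSAPI handshake must run under the Subject that actually holds
     * the Kerberos TGT. For a proxy user that is the *real* user's subject,
     * which is why getCurrentUser().doAs(...) fails in the report while
     * getCurrentUser().getRealUser().doAs(...) succeeds.
     */
    static String openTransportAs(Subject subjectWithCredentials) {
        return Subject.doAs(subjectWithCredentials,
            (PrivilegedAction<String>) () ->
                "transport opened as " + subjectWithCredentials.getPrincipals());
    }

    public static void main(String[] args) {
        // Empty placeholder subject; in Hive this comes from the Kerberos login.
        System.out.println(openTransportAs(new Subject()));
    }
}
```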
[jira] [Work logged] (HIVE-26071) JWT authentication for Thrift over HTTP in HiveMetaStore
[ https://issues.apache.org/jira/browse/HIVE-26071?focusedWorklogId=768226&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768226 ]

ASF GitHub Bot logged work on HIVE-26071:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/May/22 23:40
Start Date: 09/May/22 23:40
Worklog Time Spent: 10m

Work Description: hsnusonic commented on code in PR #3233:
URL: https://github.com/apache/hive/pull/3233#discussion_r868525627

## standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java ##
@@ -79,8 +79,10 @@
 import org.apache.hadoop.util.ReflectionUtils;
 import org.apache.hadoop.util.StringUtils;
 import org.apache.http.HttpException;
+import org.apache.http.HttpHeaders;
 import org.apache.http.HttpRequest;
 import org.apache.http.HttpRequestInterceptor;
+import org.apache.http.client.utils.HttpClientUtils;

Review Comment: nit: unused import?

## standalone-metastore/metastore-server/pom.xml ##
@@ -311,6 +311,22 @@
(adds, after the existing curator-test test dependency: com.nimbusds:nimbus-jose-jwt 9.20, org.pac4j:pac4j-core 4.5.5, com.github.tomakehurst:wiremock-jre8-standalone 2.32.0)

Review Comment: Never mind. This is only used in a test, so I am OK with it.

Issue Time Tracking
-------------------
Worklog Id: (was: 768226)
Time Spent: 6h 40m (was: 6.5h)

> JWT authentication for Thrift over HTTP in HiveMetaStore
> --------------------------------------------------------
>
> Key: HIVE-26071
> URL: https://issues.apache.org/jira/browse/HIVE-26071
> Project: Hive
> Issue Type: New Feature
> Components: Standalone Metastore
> Reporter: Sourabh Goyal
> Assignee: Sourabh Goyal
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 40m
> Remaining Estimate: 0h
>
> HIVE-25575 recently added support for JWT authentication in HS2. This Jira
> aims to add the same feature in HMS.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
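For context on the feature under review: attaching a JWT to a Thrift-over-HTTP call is just a matter of sending a Bearer Authorization header. A sketch with the JDK HTTP client rather than the Apache HttpClient interceptor the patch itself wires up; the endpoint, port, and token below are made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class JwtHeaderSketch {

    /** Build a Thrift-over-HTTP request carrying the JWT as a Bearer token.
     *  (Hive's client does the equivalent via an HttpRequestInterceptor.) */
    static HttpRequest withJwt(String endpoint, String jwt) {
        return HttpRequest.newBuilder(URI.create(endpoint))
            .header("Authorization", "Bearer " + jwt)
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    public static void main(String[] args) {
        // Hypothetical metastore HTTP endpoint; the request is built, not sent.
        HttpRequest r = withJwt("http://metastore.example.invalid:9083/metastore", "eyJ...");
        System.out.println(r.headers().firstValue("Authorization").orElse(""));
    }
}
```

On the server side, the metastore then validates the token's signature against its configured JWKS before accepting the call, which is what the nimbus-jose-jwt test dependency in the diff is exercising.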
[jira] [Work logged] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table
[ https://issues.apache.org/jira/browse/HIVE-26158?focusedWorklogId=768222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768222 ]

ASF GitHub Bot logged work on HIVE-26158:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/May/22 23:32
Start Date: 09/May/22 23:32
Worklog Time Spent: 10m

Work Description: saihemanth-cloudera commented on code in PR #3255:
URL: https://github.com/apache/hive/pull/3255#discussion_r867627190

## ql/src/test/results/clientpositive/llap/translated_external_rename3.q.out ##
@@ -95,15 +64,17 @@
 Retention: 0
 A masked pattern was here
 Table Type: EXTERNAL_TABLE
 Table Parameters:
-  COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\"}}
+  COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}

Review Comment: I'm wondering why the column stats are missing here when we do a describe on the table.

## ql/src/test/queries/clientpositive/translated_external_rename3.q ##
@@ -1,26 +1,25 @@
 set metastore.metadata.transformer.class=org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer;
-set metastore.metadata.transformer.location.mode=force;

Review Comment: This also works in force mode. Can you give some info about why you had to change it to seqprefix?

## standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java ##
@@ -244,6 +244,13 @@
 public static boolean isExternalTable(Table table) {
   return isExternal(params);
 }
+
+public static boolean isTranslatedToExternalTable(Table table) {
+  Map p = table.getParameters();

Review Comment: Can you change the name of the variable 'p' to something more meaningful, like tblProperties?

Issue Time Tracking
-------------------
Worklog Id: (was: 768222)
Time Spent: 20m (was: 10m)

> TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table
> --------------------------------------------------------------------------------------
>
> Key: HIVE-26158
> URL: https://issues.apache.org/jira/browse/HIVE-26158
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2
> Reporter: tanghui
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: metastore_translator, pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> After the patch is applied, the partition table location and the HDFS data
> directory are displayed correctly, but the partition locations of the table
> in SDS in the Hive metastore database still point at the old table location,
> so querying a partition returns no data.
>
> In beeline:
>
> set hive.create.as.external.legacy=true;
> CREATE TABLE part_test(
>   c1 string,
>   c2 string
> ) PARTITIONED BY (dat string);
> insert into part_test values ("11","th","20220101");
> insert into part_test values ("22","th","20220102");
> alter table part_test rename to part_test11;
> -- this result is empty:
> select * from part_test11 where dat="20220101";
> ||part_test.c1||part_test.c2||part_test.dat||
> |  |  |  |
>
> SDS in the Hive metastore database:
> select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND TBLS.TBL_ID=SDS.CD_ID;
> |*LOCATION*|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102|
>
> We need to fix the partition locations of the table in SDS to ensure that
> query results are correct.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
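The fix the reporter asks for amounts to rewriting each partition's SDS location that still carries the old table path as a prefix. A plain-Java sketch of that prefix rewrite (the helper is hypothetical; the real change belongs in the metastore's alter-table handling):

```java
public class PartitionLocationFix {

    /** Rewrite a partition location whose prefix still points at the old table
     *  directory; locations outside the table directory are left untouched. */
    static String rewrite(String partLocation, String oldTableLocation, String newTableLocation) {
        if (partLocation.startsWith(oldTableLocation + "/")) {
            return newTableLocation + partLocation.substring(oldTableLocation.length());
        }
        return partLocation; // custom partition locations stay as-is
    }

    public static void main(String[] args) {
        System.out.println(rewrite(
            "hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101",
            "hdfs://nameservice1/warehouse/tablespace/external/hive/part_test",
            "hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11"));
    }
}
```

Guarding on the trailing slash matters: without it, a sibling table such as part_test2 would match the part_test prefix and be rewritten by mistake.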
[jira] [Updated] (HIVE-26215) Expose the MIN_HISTORY_LEVEL table through Hive sys database
[ https://issues.apache.org/jira/browse/HIVE-26215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-26215:
----------------------------------
Labels: pull-request-available (was: )

> Expose the MIN_HISTORY_LEVEL table through Hive sys database
> ------------------------------------------------------------
>
> Key: HIVE-26215
> URL: https://issues.apache.org/jira/browse/HIVE-26215
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri G
> Assignee: Simhadri G
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While we still (partially) use MIN_HISTORY_LEVEL for the cleaner, we should
> expose it as a sys table so we can see what might be blocking the Cleaner
> thread.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Work logged] (HIVE-26215) Expose the MIN_HISTORY_LEVEL table through Hive sys database
[ https://issues.apache.org/jira/browse/HIVE-26215?focusedWorklogId=768215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768215 ]

ASF GitHub Bot logged work on HIVE-26215:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/May/22 23:07
Start Date: 09/May/22 23:07
Worklog Time Spent: 10m

Work Description: simhadri-g opened a new pull request, #3275:
URL: https://github.com/apache/hive/pull/3275

### What changes were proposed in this pull request?
Expose the MIN_HISTORY_LEVEL table through the Hive sys database.

### Why are the changes needed?
While we still (partially) use MIN_HISTORY_LEVEL for the cleaner, we should expose it as a sys table so we can see what might be blocking the Cleaner thread.

### Does this PR introduce _any_ user-facing change?
Yes, users will be able to check MIN_HISTORY_LEVEL through the sys db.

### How was this patch tested?
q test and schema tool.

Issue Time Tracking
-------------------
Worklog Id: (was: 768215)
Remaining Estimate: 0h
Time Spent: 10m

> Expose the MIN_HISTORY_LEVEL table through Hive sys database
> ------------------------------------------------------------
>
> Key: HIVE-26215
> URL: https://issues.apache.org/jira/browse/HIVE-26215
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri G
> Assignee: Simhadri G
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While we still (partially) use MIN_HISTORY_LEVEL for the cleaner, we should
> expose it as a sys table so we can see what might be blocking the Cleaner
> thread.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Assigned] (HIVE-26215) Expose the MIN_HISTORY_LEVEL table through Hive sys database
[ https://issues.apache.org/jira/browse/HIVE-26215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simhadri G reassigned HIVE-26215:
---------------------------------

> Expose the MIN_HISTORY_LEVEL table through Hive sys database
> ------------------------------------------------------------
>
> Key: HIVE-26215
> URL: https://issues.apache.org/jira/browse/HIVE-26215
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri G
> Assignee: Simhadri G
> Priority: Major
>
> While we still (partially) use MIN_HISTORY_LEVEL for the cleaner, we should
> expose it as a sys table so we can see what might be blocking the Cleaner
> thread.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Work logged] (HIVE-26009) Determine number of buckets for implicitly bucketed ACIDv2 tables
[ https://issues.apache.org/jira/browse/HIVE-26009?focusedWorklogId=768207&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768207 ]

ASF GitHub Bot logged work on HIVE-26009:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/May/22 22:10
Start Date: 09/May/22 22:10
Worklog Time Spent: 10m

Work Description: simhadri-g closed pull request #3224: HIVE-26009: Determine number of buckets for implicitly bucketed ACIDv…
URL: https://github.com/apache/hive/pull/3224

Issue Time Tracking
-------------------
Worklog Id: (was: 768207)
Time Spent: 20m (was: 10m)

> Determine number of buckets for implicitly bucketed ACIDv2 tables
> -----------------------------------------------------------------
>
> Key: HIVE-26009
> URL: https://issues.apache.org/jira/browse/HIVE-26009
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri G
> Assignee: Simhadri G
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Hive tries to set the number of reducers equal to the number of buckets here:
> https://github.com/apache/hive/blob/9857c4e584384f7b0a49c34bc2bdf876c2ea1503/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6958
>
> The numberOfBuckets for implicitly bucketed tables is set to -1 by default.
> When this is the case, it is left to Hive to estimate the number of reducers
> required for the job, based on the job input and configuration parameters:
> https://github.com/apache/hive/blob/9857c4e584384f7b0a49c34bc2bdf876c2ea1503/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3369
>
> This estimate is not optimal in all cases. In the worst case it can result
> in a single reducer being launched, which can be a significant performance
> bottleneck.
>
> Ideally, the number of reducers launched should equal the number of buckets,
> as is already the case for explicitly bucketed tables.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
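The gist of the issue can be sketched as a tiny helper: use the declared bucket count when the table is explicitly bucketed, and only fall back to Hive's input-size estimate for the implicit (-1) case. The names are hypothetical; the real logic lives in SemanticAnalyzer and Utilities at the links above.

```java
public class ReducerCount {

    /** numBuckets == -1 marks an implicitly bucketed table; only then should
     *  Hive fall back to its input-size-based reducer estimate. */
    static int reducers(int numBuckets, int estimatedReducers) {
        return numBuckets > 0 ? numBuckets : estimatedReducers;
    }

    public static void main(String[] args) {
        System.out.println(reducers(8, 200));  // explicitly bucketed: the bucket count wins
        System.out.println(reducers(-1, 200)); // implicit: estimate, which may be as low as 1
    }
}
```

The complaint in the issue is that for implicitly bucketed ACIDv2 tables the second branch is taken, and the estimate can collapse to a single reducer.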
[jira] [Updated] (HIVE-26154) Upgrade cron-utils to 9.1.6 for branch-3
[ https://issues.apache.org/jira/browse/HIVE-26154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Asif Saleh updated HIVE-26154:
------------------------------
Fix Version/s: 3.1.3

> Upgrade cron-utils to 9.1.6 for branch-3
> ----------------------------------------
>
> Key: HIVE-26154
> URL: https://issues.apache.org/jira/browse/HIVE-26154
> Project: Hive
> Issue Type: Task
> Components: Hive
> Affects Versions: 3.1.3
> Reporter: Asif Saleh
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.1.3
>
> To fix the [CVE-2021-41269|https://nvd.nist.gov/vuln/detail/CVE-2021-41269] issue.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Commented] (HIVE-26154) Upgrade cron-utils to 9.1.6 for branch-3
[ https://issues.apache.org/jira/browse/HIVE-26154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534023#comment-17534023 ]

Asif Saleh commented on HIVE-26154:
-----------------------------------

[~ngangam] The PR you merged was on the master branch. Can you please make the change on the 3.1 branch?

> Upgrade cron-utils to 9.1.6 for branch-3
> ----------------------------------------
>
> Key: HIVE-26154
> URL: https://issues.apache.org/jira/browse/HIVE-26154
> Project: Hive
> Issue Type: Task
> Components: Hive
> Affects Versions: 3.1.3
> Reporter: Asif Saleh
> Priority: Major
> Labels: pull-request-available
>
> To fix the [CVE-2021-41269|https://nvd.nist.gov/vuln/detail/CVE-2021-41269] issue.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Updated] (HIVE-26213) "hive.limit.pushdown.memory.usage" better not be equal to 1.0, otherwise it will raise an error
[ https://issues.apache.org/jira/browse/HIVE-26213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingxuan Fu updated HIVE-26213:
-------------------------------
Description:

In hive-default.xml.template:

{code:java}
<property>
  <name>hive.limit.pushdown.memory.usage</name>
  <value>0.1</value>
  <description>
    Expects value between 0.0f and 1.0f.
    The fraction of available memory to be used for buffering rows in
    Reducesink operator for limit pushdown optimization.
  </description>
</property>
{code}

Based on this description, hive.limit.pushdown.memory.usage expects a value between 0.0 and 1.0; setting it to 1.0 should mean all available memory may be used for buffering rows for the limit pushdown optimization, and hiveserver2 starts successfully with that value. Then use the Java API to write a program that establishes a JDBC connection as a client to access Hive, using JDBCDemo as an example:

{code:java}
import demo.utils.JDBCUtils;

public class JDBCDemo {
    public static void main(String[] args) throws Exception {
        JDBCUtils.init();
        JDBCUtils.createDatabase();
        JDBCUtils.showDatabases();
        JDBCUtils.createTable();
        JDBCUtils.showTables();
        JDBCUtils.descTable();
        JDBCUtils.loadData();
        JDBCUtils.selectData();
        JDBCUtils.countData();
        JDBCUtils.dropDatabase();
        JDBCUtils.dropTable();
        JDBCUtils.destory();
    }
}
{code}

After running the client program, both the client and the hiveserver throw exceptions.

Server side:
{code:java}
2022-05-09 19:05:36: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 67a6db8d-f957-4d5d-ac18-28403adab7f3
Hive Session ID = f9f8772c-5765-4c3e-bcff-ca605c667be7
OK
OK
OK
OK
OK
OK
OK
Loading data to table default.emp
OK
FAILED: SemanticException Invalid memory usage value 1.0 for hive.limit.pushdown.memory.usage
{code}

Client side:
{code:java}
liky@ljq1:~/hive_jdbc_test$ ./startJDBC_0.sh
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/liky/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.17.1/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/liky/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Running: drop database if exists hive_jdbc_test
Running: create database hive_jdbc_test
Running: show databases
default
hive_jdbc_test
Running: drop table if exists emp
Running: create table emp(
  empno int, ename string, job string, mgr int,
  hiredate string, sal double, comm double, deptno int
) row format delimited fields terminated by '\t'
Running: show tables
emp
Running: desc emp
empno int
ename string
job string
mgr int
hiredate string
sal double
comm double
deptno int
Running: load data local inpath '/home/liky/hiveJDBCTestData/data.txt' overwrite into table emp
Running: select * from emp
Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Invalid memory usage value 1.0 for hive.limit.pushdown.memory.usage
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:380)
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:366)
    at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:354)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:293)
    at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:509)
    at demo.utils.JDBCUtils.selectData(JDBCUtils.java:98)
    at demo.test.JDBCDemo.main(JDBCDemo.java:19)
{code}

Setting hive.limit.pushdown.memory.usage to 0.0 raises no exception. So setting hive.limit.pushdown.memory.usage to 1.0 is not actually usable; *hive-default.xml.template is not clear enough about the boundaries of the value, and it would be better to state the accepted interval explicitly as [0.0, 1.0).*
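The failure reported above suggests the compile-time check accepts 0.0 but rejects 1.0, i.e. the effective range is [0.0, 1.0). A hypothetical sketch of that validation, with the class name and exact bounds inferred from the report rather than taken from Hive's source:

```java
public class LimitPushdownConfigCheck {

    /** Mirrors the apparent compile-time check: the report shows 0.0 passing
     *  and 1.0 failing, so the accepted interval is assumed to be [0.0, 1.0). */
    static float validate(float usage) {
        if (usage < 0f || usage >= 1f) {
            throw new IllegalArgumentException(
                "Invalid memory usage value " + usage
                + " for hive.limit.pushdown.memory.usage");
        }
        return usage;
    }

    public static void main(String[] args) {
        System.out.println(validate(0.1f)); // the shipped default passes
        try {
            validate(1.0f);                 // rejected, as in the reported stack trace
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Stating the interval as [0.0, 1.0) in the template's description, as the reporter suggests, would make the rejection of 1.0 unsurprising.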
[jira] [Updated] (HIVE-26213) "hive.limit.pushdown.memory.usage" better not be equal to 1.0, otherwise it will raise an error
[ https://issues.apache.org/jira/browse/HIVE-26213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingxuan Fu updated HIVE-26213: --- Description: In hive-default.xml.template:

{code:java}
hive.limit.pushdown.memory.usage
0.1
Expects value between 0.0f and 1.0f. The fraction of available memory to be used for buffering rows in Reducesink operator for limit pushdown optimization.
{code}

Based on this description, hive.limit.pushdown.memory.usage expects a value between 0.0 and 1.0; setting it to 1.0 means all of the available memory may be used for buffering rows for the limit pushdown optimization, and hiveserver2 still starts successfully with that value. Then, using the Java API, write a program that establishes a JDBC connection to Hive as a client, with JDBCDemo as an example:

{code:java}
import demo.utils.JDBCUtils;

public class JDBCDemo {
  public static void main(String[] args) throws Exception {
    JDBCUtils.init();
    JDBCUtils.createDatabase();
    JDBCUtils.showDatabases();
    JDBCUtils.createTable();
    JDBCUtils.showTables();
    JDBCUtils.descTable();
    JDBCUtils.loadData();
    JDBCUtils.selectData();
    JDBCUtils.countData();
    JDBCUtils.dropDatabase();
    JDBCUtils.dropTable();
    JDBCUtils.destory();
  }
}
{code}

After running the client program, both the client and the hiveserver throw exceptions. 2022-05-09 19:05:36: Starting HiveServer2 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Hive Session ID = 67a6db8d-f957-4d5d-ac18-28403adab7f3 Hive Session ID = f9f8772c-5765-4c3e-bcff-ca605c667be7 OK OK OK OK OK OK OK Loading data to table default.emp OK FAILED: SemanticException Invalid memory usage value 1.0 for hive.limit.pushdown.memory.usage liky@ljq1:~/hive_jdbc_test$ ./startJDBC_0.sh SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/home/liky/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.17.1/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/home/liky/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Running: drop database if exists hive_jdbc_test Running: create database hive_jdbc_test Running: show databases default hive_jdbc_test Running: drop table if exists emp Running: create table emp( empno int, ename string, job string, mgr int, hiredate string, sal double, comm double, deptno int ) row format delimited fields terminated by '\t' Running: show tables emp Running: desc emp empno int ename string job string mgr int hiredate string sal double comm double deptno int Running: load data local inpath '/home/liky/hiveJDBCTestData/data.txt' overwrite into table emp Running: select * from emp Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Invalid memory usage value 1.0 for hive.limit.pushdown.memory.usage at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:380) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:366) at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:354) at 
org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:293) at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:509) at demo.utils.JDBCUtils.selectData(JDBCUtils.java:98) at demo.test.JDBCDemo.main(JDBCDemo.java:19) Setting hive.limit.pushdown.memory.usage to 0.0 raises no exception. So setting hive.limit.pushdown.memory.usage to 1.0 is not desirable. *hive-default.xml.template is not clear enough about the boundaries of the value; it would be better to state the valid range explicitly as the half-open interval [0.0, 1.0).* was: In hive-default.xml.template {code:java} hive.limit.pushdown.memory.usage 0.1 Expects value between 0.0f and 1.0f. The fraction of available memory to be used for buffering rows in Reducesink operator for limit pushdown optimization. {code} Based on the description of hive-default.xml.template, hive.limit.pushdown.memory.usage expects a value between 0.0 and 1.0, setting hive.limit.pushdown.memory.usage to 1.0 means that it expects the available memory of all buffered
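The rejected value comes down to a half-open range check: 1.0 lies outside the interval the report proposes, [0.0, 1.0). As a minimal sketch of that check (RangeCheck and isValidMemoryUsage are hypothetical names for illustration, not Hive's actual HiveConf validator):

```java
// Sketch of a half-open range check matching the proposed [0.0, 1.0) interval.
// Hypothetical helper, not Hive's actual validation code.
public class RangeCheck {
    static boolean isValidMemoryUsage(float v) {
        // Values in [0.0, 1.0) pass; 1.0f itself is rejected,
        // which is what triggers the SemanticException in the report.
        return v >= 0.0f && v < 1.0f;
    }

    public static void main(String[] args) {
        System.out.println(isValidMemoryUsage(0.1f)); // the template's default
        System.out.println(isValidMemoryUsage(1.0f)); // the rejected setting
    }
}
```

Documenting the range as [0.0, 1.0) instead of "between 0.0f and 1.0f" removes the ambiguity about whether the upper endpoint is included.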
[jira] [Assigned] (HIVE-26213) "hive.limit.pushdown.memory.usage" better not be equal to 1.0, otherwise it will raise an error
[ https://issues.apache.org/jira/browse/HIVE-26213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingxuan Fu reassigned HIVE-26213: -- > "hive.limit.pushdown.memory.usage" better not be equal to 1.0, otherwise it > will raise an error > --- > > Key: HIVE-26213 > URL: https://issues.apache.org/jira/browse/HIVE-26213 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2 > Environment: Hive 3.1.2 > os.name=Linux > os.arch=amd64 > os.version=5.4.0-72-generic > java.version=1.8.0_162 > java.vendor=Oracle Corporation >Reporter: Jingxuan Fu >Assignee: Jingxuan Fu >Priority: Major > > In hive-default.xml.template > > > > > {code:java} > hive.limit.pushdown.memory.usage 0.1 > Expects value between 0.0f and 1.0f. The fraction of available > memory to be used for buffering rows in Reducesink operator for limit > pushdown optimization. > {code} > > > Based on the description of hive-default.xml.template, > hive.limit.pushdown.memory.usage expects a value between 0.0 and 1.0, setting > hive.limit.pushdown.memory.usage to 1.0 means that it expects the available > memory of all buffered lines for the limit pushdown optimization, and > successfully start hiveserver2. > Then, call the java api to write a program to establish a jdbc connection as > a client to access hive, using JDBCDemo as an example. > > {code:java} > import demo.utils.JDBCUtils; public class JDBCDemo{ public static void > main(String[] args) throws Exception { JDBCUtils.init(); > JDBCUtils.createDatabase(); JDBCUtils.showDatabases(); > JDBCUtils.createTable(); JDBCUtils.showTables(); JDBCUtils.descTable(); > JDBCUtils.loadData(); JDBCUtils.selectData(); JDBCUtils.countData(); > JDBCUtils.dropDatabase(); JDBCUtils.dropTable(); JDBCUtils.destory(); } } > {code} > After running the client program, both the client and the hiveserver throw > exceptions. > > {code:java} > 2022-05-09 19:05:36: Starting HiveServer2 SLF4J: Class path contains multiple > SLF4J bindings. 
SLF4J: Found binding in > [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. SLF4J: Actual binding is of type > [org.apache.logging.slf4j.Log4jLoggerFactory] Hive Session ID = > 67a6db8d-f957-4d5d-ac18-28403adab7f3 Hive Session ID = > f9f8772c-5765-4c3e-bcff-ca605c667be7 OK OK OK OK OK OK OK Loading data to > table default.emp OK FAILED: SemanticException Invalid memory usage value 1.0 > for hive.limit.pushdown.memory.usage{code} > > > > {code:java} > liky@ljq1:~/hive_jdbc_test$ ./startJDBC_0.sh SLF4J: Class path contains > multiple SLF4J bindings. SLF4J: Found binding in > [jar:file:/home/liky/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.17.1/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/home/liky/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. 
SLF4J: Actual binding is of type > [org.apache.logging.slf4j.Log4jLoggerFactory] Running: drop database if > exists hive_jdbc_test Running: create database hive_jdbc_test Running: show > databases default hive_jdbc_test Running: drop table if exists emp Running: > create table emp( empno int, ename string, job string, mgr int, hiredate > string, sal double, comm double, deptno int ) row format delimited fields > terminated by '\t' Running: show tables emp Running: desc emp empno int ename > string job string mgr int hiredate string sal double comm double deptno int > Running: load data local inpath '/home/liky/hiveJDBCTestData/data.txt' > overwrite into table emp Running: select * from emp Exception in thread > "main" org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: SemanticException Invalid memory usage value 1.0 for > hive.limit.pushdown.memory.usage at > org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:380) at > org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:366) at > org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:354) > at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:293) at > org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:509) at > demo.utils.JDBCUtils.selectData(JDBCUtils.java:98) at > demo.test.JDBCDemo.main(JDBCDemo.java:19){code} > > > Setting hive.limit.pushdown.memory.usage
[jira] [Updated] (HIVE-26211) "hive.server2.webui.max.historic.queries" should be avoided to be set too large, otherwise it will cause blocking
[ https://issues.apache.org/jira/browse/HIVE-26211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingxuan Fu updated HIVE-26211: --- Description: In hive-default.xml.template {code:java} hive.server2.webui.max.historic.queries 25 The maximum number of past queries to show in HiverSever2 WebUI. {code} Set hive.server2.webui.max.historic.queries to a relatively large value (take 2000 as an example) and start hiveserver2; the server starts normally and logs no exception. {code:java} liky@ljq1:/usr/local/hive/conf$ hiveserver2 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] 2022-05-09 20:03:41: Starting HiveServer2 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Hive Session ID = 0b419706-4026-4a8b-80fe-b79fecbccd4f Hive Session ID = 0f9e28d7-0081-4b2f-a743-4093c38c152d{code} Next, use beeline as a client to connect to hive and send a request for database-related operations, for example querying all the databases: after "show databases" executes successfully, beeline blocks and no other operations can be performed. 
{code:java} liky@ljq1:/opt/hive$ beeline SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Beeline version 3.1.2 by Apache Hive beeline> !connect jdbc:hive2://192.168.1.194:1/default Connecting to jdbc:hive2://192.168.1.194:1/default Enter username for jdbc:hive2://192.168.1.194:1/default: hive Enter password for jdbc:hive2://192.168.1.194:1/default: * Connected to: Apache Hive (version 3.1.2) Driver: Hive JDBC (version 3.1.2) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://192.168.1.194:1/default> show databases . . . . . . . . . . . . . . . . . . . . . 
.> ; INFO : Compiling command(queryId=liky_20220509202542_15382019-f07b-40ff-840d-1f720df77d8b): show databases INFO : Concurrency mode is disabled, not creating a lock manager INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null) INFO : Completed compiling command(queryId=liky_20220509202542_15382019-f07b-40ff-840d-1f720df77d8b); Time taken: 0.393 seconds INFO : Concurrency mode is disabled, not creating a lock manager INFO : Executing command(queryId=liky_20220509202542_15382019-f07b-40ff-840d-1f720df77d8b): show databases INFO : Starting task [Stage-0:DDL] in serial mode INFO : Completed executing command(queryId=liky_20220509202542_15382019-f07b-40ff-840d-1f720df77d8b); Time taken: 0.109 seconds INFO : OK INFO : Concurrency mode is disabled, not creating a lock manager database_name default 1 row selected (1.374 seconds) {code} Also, on the hiveserver side a runtime null pointer exception is thrown, while the log records no warnings or errors. {code:java} liky@ljq1:/usr/local/hive/conf$ hiveserver2 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in
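Whatever cap is configured, the WebUI history is essentially a bounded buffer of completed queries: the larger the cap, the more state the server retains and the longer any synchronized access to it can take. A minimal sketch of such a buffer (QueryHistory is a hypothetical class for illustration, not HiveServer2's actual operation-history code):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bounded buffer of completed query IDs, illustrating why
// hive.server2.webui.max.historic.queries acts as a memory/contention cap.
// Not HiveServer2's actual implementation.
public class QueryHistory {
    private final int maxHistoric;
    private final Deque<String> queries = new ArrayDeque<>();

    QueryHistory(int maxHistoric) {
        this.maxHistoric = maxHistoric;
    }

    synchronized void add(String queryId) {
        queries.addLast(queryId);
        // Evict the oldest entries so the buffer never exceeds the cap.
        while (queries.size() > maxHistoric) {
            queries.removeFirst();
        }
    }

    synchronized int size() {
        return queries.size();
    }
}
```

With the default cap of 25, retained state stays small no matter how many queries complete; raising the cap to 2000 multiplies both the retained state and the work done under the lock each time the history is rendered.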
[jira] [Updated] (HIVE-26212) hive fetch data timeout
[ https://issues.apache.org/jira/browse/HIVE-26212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] royal updated HIVE-26212: - Description: When I fetch data from Hive, the following error message appears; I think it is related to the size of the data: 2022-05-09 19:28:17,156 INFO org.apache.hadoop.mapred.FileInputFormat: [HiveServer2-Handler-Pool: Thread-773525]: Total input paths to process : 47751 2022-05-09 19:30:19,729 WARN org.apache.hadoop.hive.conf.HiveConf: [HiveServer2-Handler-Pool: Thread-773521]: HiveConf of name hive.server2.idle.session.timeout_check_operation does not exist 2022-05-09 19:30:19,729 WARN org.apache.hadoop.hive.conf.HiveConf: [HiveServer2-Handler-Pool: Thread-773521]: HiveConf of name hive.sentry.conf.url does not exist 2022-05-09 19:30:19,729 WARN org.apache.hadoop.hive.conf.HiveConf: [HiveServer2-Handler-Pool: Thread-773521]: HiveConf of name hive.entity.capture.input.URI does not exist 2022-05-09 19:30:19,733 INFO org.apache.hadoop.hive.ql.exec.ListSinkOperator: [HiveServer2-Handler-Pool: Thread-773521]: 749375 finished. closing... 
2022-05-09 19:30:19,733 INFO org.apache.hadoop.hive.ql.exec.ListSinkOperator: [HiveServer2-Handler-Pool: Thread-773521]: 749375 Close done 2022-05-09 19:30:19,733 INFO org.apache.hadoop.hive.ql.exec.ListSinkOperator: [HiveServer2-Handler-Pool: Thread-773521]: Initializing Self OP[749375] 2022-05-09 19:30:19,733 INFO org.apache.hadoop.hive.ql.exec.ListSinkOperator: [HiveServer2-Handler-Pool: Thread-773521]: Operator 749375 OP initialized 2022-05-09 19:30:19,733 INFO org.apache.hadoop.hive.ql.exec.ListSinkOperator: [HiveServer2-Handler-Pool: Thread-773521]: Initialization Done 749375 OP 2022-05-09 19:30:19,741 WARN org.apache.hive.service.cli.thrift.ThriftCLIService: [HiveServer2-Handler-Pool: Thread-773525]: Error fetching results: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.NullPointerException at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:463) at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:294) at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:769) at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:462) at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:694) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:706) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:415) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2071) at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:458) ... 13 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextPath(FetchOperator.java:255) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:350) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:295) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:446) ... 17 more > hive fetch data timeout > --- > > Key: HIVE-26212 > URL: https://issues.apache.org/jira/browse/HIVE-26212 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.1.0 >Reporter: royal >Priority: Major > > When I fetch data from Hive, the following error message appears; I think > it is related to the size of the data > > 2022-05-09 19:28:17,156 INFO org.apache.hadoop.mapred.FileInputFormat: > [HiveServer2-Handler-Pool: Thread-773525]: Total input paths to process : > 47751 > 2022-05-09 19:30:19,729 WARN org.apache.hadoop.hive.conf.HiveConf: > [HiveServer2-Handler-Pool: Thread-773521]: HiveConf of name > hive.server2.idle.session.timeout_check_operation does not exist > 2022-05-09 19:30:19,729 WARN
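The innermost frame, FetchOperator.getNextPath, shows something being dereferenced after the path source is exhausted or returns null. A sketch of the defensive pattern relevant here (SafeFetch and nextPath are hypothetical names for illustration, not Hive's actual FetchOperator code) treats a null or exhausted path source as end-of-data instead of letting a NullPointerException propagate to the client:

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical null-safe path iteration; not Hive's FetchOperator.
public class SafeFetch {
    // Return the next non-null path, or null to signal end-of-data.
    static String nextPath(Iterator<String> paths) {
        while (paths.hasNext()) {
            String p = paths.next();
            if (p != null) {
                return p; // a null entry is skipped rather than dereferenced later
            }
        }
        return null; // the caller interprets null as "no more splits"
    }

    public static void main(String[] args) {
        Iterator<String> it = Arrays.asList("part-0", null, "part-1").iterator();
        System.out.println(SafeFetch.nextPath(it)); // first path
        System.out.println(SafeFetch.nextPath(it)); // skips the null entry
        System.out.println(SafeFetch.nextPath(it)); // null: end of data
    }
}
```

The contract matters more than the guard itself: with a well-defined "no more data" signal, the fetch loop can terminate cleanly instead of surfacing an IOException wrapping a NullPointerException after processing 47751 input paths.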
[jira] [Updated] (HIVE-26211) "hive.server2.webui.max.historic.queries" should be avoided to be set too large, otherwise it will cause blocking
[ https://issues.apache.org/jira/browse/HIVE-26211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingxuan Fu updated HIVE-26211: --- Description: In hive-default.xml.template {code:java} hive.server2.webui.max.historic.queries 25 The maximum number of past queries to show in HiverSever2 WebUI. {code} Set hive.server2.webui.max.historic.queries to a relatively large value, take 2000 as an example, start hiveserver2, it can start hiveserver normally, and logging without exception. {code:java} liky@ljq1:/usr/local/hive/conf$ hiveserver2 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] 2022-05-09 20:03:41: Starting HiveServer2 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Hive Session ID = 0b419706-4026-4a8b-80fe-b79fecbccd4f Hive Session ID = 0f9e28d7-0081-4b2f-a743-4093c38c152d{code} Next, if you use beeline as a client to connect to hive and send a request for database related operations, for example, if you query all the databases, after successfully executing "show databases", beeline blocks and no other operations can be performed. 
{code:java} liky@ljq1:/opt/hive$ beeline SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Beeline version 3.1.2 by Apache Hive beeline> !connect jdbc:hive2://192.168.1.194:1/default Connecting to jdbc:hive2://192.168.1.194:1/default Enter username for jdbc:hive2://192.168.1.194:1/default: hive Enter password for jdbc:hive2://192.168.1.194:1/default: * Connected to: Apache Hive (version 3.1.2) Driver: Hive JDBC (version 3.1.2) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://192.168.1.194:1/default> show databases . . . . . . . . . . . . . . . . . . . . . 
.> ; INFO : Compiling command(queryId=liky_20220509202542_15382019-f07b-40ff-840d-1f720df77d8b): show databases INFO : Concurrency mode is disabled, not creating a lock manager INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null) INFO : Completed compiling command(queryId=liky_20220509202542_15382019-f07b-40ff-840d-1f720df77d8b); Time taken: 0.393 seconds INFO : Concurrency mode is disabled, not creating a lock manager INFO : Executing command(queryId=liky_20220509202542_15382019-f07b-40ff-840d-1f720df77d8b): show databases INFO : Starting task [Stage-0:DDL] in serial mode INFO : Completed executing command(queryId=liky_20220509202542_15382019-f07b-40ff-840d-1f720df77d8b); Time taken: 0.109 seconds INFO : OK INFO : Concurrency mode is disabled, not creating a lock manager database_name default 1 row selected (1.374 seconds) {code} Also, on the hiveserver side a runtime null pointer exception is thrown, while inspecting the log shows no warnings or errors. {code:java} liky@ljq1:/usr/local/hive/conf$ hiveserver2 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in
[jira] [Assigned] (HIVE-26211) "hive.server2.webui.max.historic.queries" should be avoided to be set too large, otherwise it will cause blocking
[ https://issues.apache.org/jira/browse/HIVE-26211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingxuan Fu reassigned HIVE-26211: -- > "hive.server2.webui.max.historic.queries" should be avoided to be set too > large, otherwise it will cause blocking > - > > Key: HIVE-26211 > URL: https://issues.apache.org/jira/browse/HIVE-26211 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2 > Environment: Hive 3.1.2 > os.name=Linux > os.arch=amd64 > os.version=5.4.0-72-generic > java.version=1.8.0_162 > java.vendor=Oracle Corporation >Reporter: Jingxuan Fu >Assignee: Jingxuan Fu >Priority: Major > > In hive-default.xml.template > > hive.server2.webui.max.historic.queries > 25 > The maximum number of past queries to show in HiverSever2 > WebUI. > > Set hive.server2.webui.max.historic.queries to a relatively large value, take > 2000 as an example, start hiveserver2, it can start hiveserver normally, > and logging without exception. > liky@ljq1:/usr/local/hive/conf$ hiveserver2 > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > 2022-05-09 20:03:41: Starting HiveServer2 > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. 
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Hive Session ID = 0b419706-4026-4a8b-80fe-b79fecbccd4f > Hive Session ID = 0f9e28d7-0081-4b2f-a743-4093c38c152d > Next, if you use beeline as a client to connect to hive and send a request > for database related operations, for example, if you query all the databases, > after successfully executing "show databases", beeline blocks and no other > operations can be performed. > liky@ljq1:/opt/hive$ beeline > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Beeline version 3.1.2 by Apache Hive > beeline> !connect jdbc:hive2://192.168.1.194:1/default > Connecting to jdbc:hive2://192.168.1.194:1/default > Enter username for jdbc:hive2://192.168.1.194:1/default: hive > Enter password for jdbc:hive2://192.168.1.194:1/default: * > Connected to: Apache Hive (version 3.1.2) > Driver: Hive JDBC (version 3.1.2) > Transaction isolation: TRANSACTION_REPEATABLE_READ > 0: jdbc:hive2://192.168.1.194:1/default> show databases > . . . . . . . . . . . . . . . . . . . . . 
.> ; > INFO : Compiling > command(queryId=liky_20220509202542_15382019-f07b-40ff-840d-1f720df77d8b): > show databases > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: > Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, > comment:from deserializer)], properties:null) > INFO : Completed compiling > command(queryId=liky_20220509202542_15382019-f07b-40ff-840d-1f720df77d8b); > Time taken: 0.393 seconds > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : Executing > command(queryId=liky_20220509202542_15382019-f07b-40ff-840d-1f720df77d8b): > show databases > INFO : Starting task [Stage-0:DDL] in serial mode > INFO : Completed executing > command(queryId=liky_20220509202542_15382019-f07b-40ff-840d-1f720df77d8b); > Time taken: 0.109 seconds > INFO : OK > INFO : Concurrency mode is disabled, not creating a lock manager > ++ > | database_name | > ++ > | default | > ++ > 1 row selected (1.374 seconds) > Also, on the hiveserver side, a runtime null pointer exception is thrown, and > the
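The report above does not pinpoint the blocking mechanism, but one way an oversized history setting can stall clients is when the query-history buffer and the request path share a single lock. The sketch below is hypothetical (names and structure are not Hive's actual implementation); the capacity field mirrors the role of hive.server2.webui.max.historic.queries:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class QueryHistory {
    private final Deque<String> history = new ArrayDeque<>();
    // Analogous role to hive.server2.webui.max.historic.queries.
    private final int maxHistoricQueries;

    QueryHistory(int maxHistoricQueries) {
        this.maxHistoricQueries = maxHistoricQueries;
    }

    // Called on the query path: every finished query is recorded under the lock.
    synchronized void record(String queryId) {
        if (history.size() >= maxHistoricQueries) {
            history.removeFirst(); // evict the oldest entry once the cap is reached
        }
        history.addLast(queryId);
    }

    // Called by the web UI: walks the whole buffer under the same lock, so a
    // very large cap means a long critical section that stalls record() callers.
    synchronized int renderedEntries() {
        int n = 0;
        for (String ignored : history) {
            n++; // stand-in for rendering each history entry
        }
        return n;
    }
}
```

With such a shared lock, a UI render over a huge history holds the monitor long enough for query-path threads to queue up behind it, which is consistent with the "blocks after a large value is set" symptom.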
[jira] [Work logged] (HIVE-26205) Remove the incorrect org.slf4j dependency in kafka-handler
[ https://issues.apache.org/jira/browse/HIVE-26205?focusedWorklogId=767877=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-767877 ] ASF GitHub Bot logged work on HIVE-26205: - Author: ASF GitHub Bot Created on: 09/May/22 11:55 Start Date: 09/May/22 11:55 Worklog Time Spent: 10m Work Description: wecharyu commented on PR #3272: URL: https://github.com/apache/hive/pull/3272#issuecomment-1121002974 @pvary @deniskuzZ : Could you please review this PR? Project compilation failed on my new machine with `maven 3.8.5`, and I think this dependency declaration is redundant, since it can be inherited from the parent pom. Issue Time Tracking --- Worklog Id: (was: 767877) Time Spent: 20m (was: 10m) > Remove the incorrect org.slf4j dependency in kafka-handler > -- > > Key: HIVE-26205 > URL: https://issues.apache.org/jira/browse/HIVE-26205 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0-alpha-2 >Reporter: Wechar >Assignee: Wechar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 20m > Remaining Estimate: 0h > > Get a compile error while executing: > {code:bash} > mvn clean install -DskipTests > {code} > The error message is: > {code:bash} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile > (default-compile) on project kafka-handler: Compilation failure: Compilation > failure: > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaStorageHandler.java:[53,17] > package org.slf4j does not exist > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaStorageHandler.java:[54,17] > package org.slf4j does not exist > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaStorageHandler.java:[73,24] > cannot find symbol > [ERROR] symbol: class Logger > [ERROR] location: class org.apache.hadoop.hive.kafka.KafkaStorageHandler > 
[ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/VectorizedKafkaRecordReader.java:[37,17] > package org.slf4j does not exist > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/VectorizedKafkaRecordReader.java:[47,24] > cannot find symbol > [ERROR] symbol: class Logger > [ERROR] location: class > org.apache.hadoop.hive.kafka.VectorizedKafkaRecordReader > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaJsonSerDe.java:[63,17] > package org.slf4j does not exist > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/SimpleKafkaWriter.java:[35,17] > package org.slf4j does not exist > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/SimpleKafkaWriter.java:[50,24] > cannot find symbol > [ERROR] symbol: class Logger > [ERROR] location: class org.apache.hadoop.hive.kafka.SimpleKafkaWriter > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaOutputFormat.java:[34,17] > package org.slf4j does not exist > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaOutputFormat.java:[43,24] > cannot find symbol > [ERROR] symbol: class Logger > [ERROR] location: class org.apache.hadoop.hive.kafka.KafkaOutputFormat > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/RetryUtils.java:[24,17] > package org.slf4j does not exist > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/RetryUtils.java:[34,24] > cannot find symbol > [ERROR] symbol: class Logger > [ERROR] location: class org.apache.hadoop.hive.kafka.RetryUtils > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaScanTrimmer.java:[51,17] > package org.slf4j 
does not exist > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaScanTrimmer.java:[65,24] > cannot find symbol > [ERROR] symbol: class Logger > [ERROR] location: class org.apache.hadoop.hive.kafka.KafkaScanTrimmer > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/TransactionalKafkaWriter.java:[45,17] > package org.slf4j does not exist > [ERROR] > /ldap_home/weiqiang.yu/forked-hive/kafka-handler/src/java/org/apache/hadoop/hive/kafka/TransactionalKafkaWriter.java:[65,24] >
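The proposed fix (dropping the module-level org.slf4j declaration and inheriting it) relies on standard Maven behavior: a dependency declared in the parent POM's `<dependencies>` section is inherited by every child module. The fragment below is a generic illustration of that pattern, not the exact Hive parent pom; `${slf4j.version}` is a placeholder property:

```xml
<!-- Parent pom (illustrative): declared once here, inherited by all modules. -->
<dependencies>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>${slf4j.version}</version>
  </dependency>
</dependencies>
```

With this in place, a child module such as kafka-handler needs no org.slf4j entry of its own, and a broken or mis-scoped local declaration (which can shadow the inherited one and produce the "package org.slf4j does not exist" errors above) can simply be deleted.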
[jira] [Work logged] (HIVE-26203) Implement alter iceberg table metadata location
[ https://issues.apache.org/jira/browse/HIVE-26203?focusedWorklogId=767868=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-767868 ] ASF GitHub Bot logged work on HIVE-26203: - Author: ASF GitHub Bot Created on: 09/May/22 11:18 Start Date: 09/May/22 11:18 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3270: URL: https://github.com/apache/hive/pull/3270#discussion_r867902283 ## iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java: ## @@ -1456,6 +1456,63 @@ public void testCreateTableWithMetadataLocation() throws IOException { HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS.stream()).collect(Collectors.toList()), records, 0); } + @Test + public void testAlterTableWithMetadataLocation() throws IOException { +Assume.assumeTrue("Alter table with metadata location is only supported for Hive Catalog tables", Review Comment: What do the users see when they try to do this for a non-HiveCatalog table? Issue Time Tracking --- Worklog Id: (was: 767868) Time Spent: 50m (was: 40m) > Implement alter iceberg table metadata location > --- > > Key: HIVE-26203 > URL: https://issues.apache.org/jira/browse/HIVE-26203 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: iceberg, pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26203) Implement alter iceberg table metadata location
[ https://issues.apache.org/jira/browse/HIVE-26203?focusedWorklogId=767867=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-767867 ] ASF GitHub Bot logged work on HIVE-26203: - Author: ASF GitHub Bot Created on: 09/May/22 11:17 Start Date: 09/May/22 11:17 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3270: URL: https://github.com/apache/hive/pull/3270#discussion_r867902283 ## iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java: ## @@ -1456,6 +1456,63 @@ public void testCreateTableWithMetadataLocation() throws IOException { HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS.stream()).collect(Collectors.toList()), records, 0); } + @Test + public void testAlterTableWithMetadataLocation() throws IOException { +Assume.assumeTrue("Alter table with metadata location is only supported for Hive Catalog tables", Review Comment: What happens in the code where we try to do this for a non-HiveCatalog table? Issue Time Tracking --- Worklog Id: (was: 767867) Time Spent: 40m (was: 0.5h) > Implement alter iceberg table metadata location > --- > > Key: HIVE-26203 > URL: https://issues.apache.org/jira/browse/HIVE-26203 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: iceberg, pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26203) Implement alter iceberg table metadata location
[ https://issues.apache.org/jira/browse/HIVE-26203?focusedWorklogId=767866=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-767866 ] ASF GitHub Bot logged work on HIVE-26203: - Author: ASF GitHub Bot Created on: 09/May/22 11:16 Start Date: 09/May/22 11:16 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3270: URL: https://github.com/apache/hive/pull/3270#discussion_r867901386 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java: ## @@ -336,6 +336,30 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E // that users can change data types or reorder columns too with this alter op type, so its name is misleading..) assertNotMigratedTable(hmsTable.getParameters(), "CHANGE COLUMN"); handleChangeColumn(hmsTable); +} else if (AlterTableType.ADDPROPS.equals(currentAlterTableOp)) { + assertNotThirdPartyMetadataLocationChange(hmsTable.getParameters()); +} + } + + /** + * Perform a check on the current iceberg table whether a metadata change can be performed. A table is eligible if + * the current metadata uuid and the new metadata uuid matches. + * @param tblParams hms table properties, must be non-null + */ + private void assertNotThirdPartyMetadataLocationChange(Map tblParams) { +if (tblParams.containsKey(BaseMetastoreTableOperations.METADATA_LOCATION_PROP)) { + Preconditions.checkArgument(icebergTable != null, Review Comment: How could this happen? Issue Time Tracking --- Worklog Id: (was: 767866) Time Spent: 0.5h (was: 20m) > Implement alter iceberg table metadata location > --- > > Key: HIVE-26203 > URL: https://issues.apache.org/jira/browse/HIVE-26203 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: iceberg, pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26203) Implement alter iceberg table metadata location
[ https://issues.apache.org/jira/browse/HIVE-26203?focusedWorklogId=767865=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-767865 ] ASF GitHub Bot logged work on HIVE-26203: - Author: ASF GitHub Bot Created on: 09/May/22 11:14 Start Date: 09/May/22 11:14 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3270: URL: https://github.com/apache/hive/pull/3270#discussion_r867899800 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java: ## @@ -336,6 +336,30 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E // that users can change data types or reorder columns too with this alter op type, so its name is misleading..) assertNotMigratedTable(hmsTable.getParameters(), "CHANGE COLUMN"); handleChangeColumn(hmsTable); +} else if (AlterTableType.ADDPROPS.equals(currentAlterTableOp)) { + assertNotThirdPartyMetadataLocationChange(hmsTable.getParameters()); Review Comment: nit: `crossTableMetadataLocationChange`? Issue Time Tracking --- Worklog Id: (was: 767865) Time Spent: 20m (was: 10m) > Implement alter iceberg table metadata location > --- > > Key: HIVE-26203 > URL: https://issues.apache.org/jira/browse/HIVE-26203 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: iceberg, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
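The review comments above center on the UUID guard. The eligibility rule quoted in the javadoc (the current metadata UUID and the new metadata UUID must match) can be sketched as below; the class name, the property key literal, and the UuidReader indirection are hypothetical stand-ins for Iceberg's real metadata parsing (e.g. reading the table uuid out of the metadata JSON), not the actual HiveIcebergMetaHook code:

```java
import java.util.Map;
import java.util.Objects;

class MetadataLocationGuard {
    // Stand-in for reading the table uuid from the metadata file at the given
    // location; in Iceberg this would come from parsing the metadata JSON.
    interface UuidReader {
        String uuidOf(String metadataLocation);
    }

    // Rejects an ALTER TABLE ... SET TBLPROPERTIES('metadata_location'=...)
    // that points at another table's metadata: the UUID stored in the new
    // metadata file must match the UUID of the table being altered.
    static void assertNotCrossTableMetadataLocationChange(
            Map<String, String> tblParams, String currentTableUuid, UuidReader reader) {
        String newLocation = tblParams.get("metadata_location");
        if (newLocation == null) {
            return; // the metadata location is not being changed
        }
        String newUuid = reader.uuidOf(newLocation);
        if (!Objects.equals(currentTableUuid, newUuid)) {
            throw new IllegalArgumentException(
                "Cannot point metadata_location at metadata belonging to a "
                + "different table (uuid mismatch)");
        }
    }
}
```

This also suggests an answer to the reviewer's first question: for a table whose current metadata cannot be loaded (or for a non-HiveCatalog table), the check cannot establish the UUID match, so the operation has to be refused with an explicit error rather than applied blindly.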
[jira] [Assigned] (HIVE-26177) Create a new connection pool for compaction (DataNucleus)
[ https://issues.apache.org/jira/browse/HIVE-26177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits reassigned HIVE-26177: -- Assignee: Antal Sinkovits > Create a new connection pool for compaction (DataNucleus) > - > > Key: HIVE-26177 > URL: https://issues.apache.org/jira/browse/HIVE-26177 > Project: Hive > Issue Type: Sub-task >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HIVE-26177) Create a new connection pool for compaction (DataNucleus)
[ https://issues.apache.org/jira/browse/HIVE-26177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits resolved HIVE-26177. Resolution: Fixed Pushed to master. Thanks for the review [~dkuzmenko] > Create a new connection pool for compaction (DataNucleus) > - > > Key: HIVE-26177 > URL: https://issues.apache.org/jira/browse/HIVE-26177 > Project: Hive > Issue Type: Sub-task >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26177) Create a new connection pool for compaction (DataNucleus)
[ https://issues.apache.org/jira/browse/HIVE-26177?focusedWorklogId=767819=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-767819 ] ASF GitHub Bot logged work on HIVE-26177: - Author: ASF GitHub Bot Created on: 09/May/22 07:49 Start Date: 09/May/22 07:49 Worklog Time Spent: 10m Work Description: asinkovits merged PR #3265: URL: https://github.com/apache/hive/pull/3265 Issue Time Tracking --- Worklog Id: (was: 767819) Time Spent: 20m (was: 10m) > Create a new connection pool for compaction (DataNucleus) > - > > Key: HIVE-26177 > URL: https://issues.apache.org/jira/browse/HIVE-26177 > Project: Hive > Issue Type: Sub-task >Reporter: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HIVE-26210) Fix tests for Cleaner failed attempt threshold
[ https://issues.apache.org/jira/browse/HIVE-26210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26210: -- Labels: pull-request-available (was: ) > Fix tests for Cleaner failed attempt threshold > -- > > Key: HIVE-26210 > URL: https://issues.apache.org/jira/browse/HIVE-26210 > Project: Hive > Issue Type: Bug >Reporter: László Végh >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26210) Fix tests for Cleaner failed attempt threshold
[ https://issues.apache.org/jira/browse/HIVE-26210?focusedWorklogId=767813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-767813 ] ASF GitHub Bot logged work on HIVE-26210: - Author: ASF GitHub Bot Created on: 09/May/22 07:25 Start Date: 09/May/22 07:25 Worklog Time Spent: 10m Work Description: veghlaci05 opened a new pull request, #3274: URL: https://github.com/apache/hive/pull/3274 ### What changes were proposed in this pull request? This PR fixes the flaky tests created for HIVE-25943. ### Why are the changes needed? The test introduced in HIVE-25943 were flaky. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested manually. Issue Time Tracking --- Worklog Id: (was: 767813) Remaining Estimate: 0h Time Spent: 10m > Fix tests for Cleaner failed attempt threshold > -- > > Key: HIVE-26210 > URL: https://issues.apache.org/jira/browse/HIVE-26210 > Project: Hive > Issue Type: Bug >Reporter: László Végh >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HIVE-12336) Sort Merge Partition Map Join
[ https://issues.apache.org/jira/browse/HIVE-12336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Gawande updated HIVE-12336: -- Description: Logically and functionally bucketing and partitioning are quite similar - both provide a mechanism to segregate and separate the table's data based on its content. Thanks to that, significant further optimisations like [partition] PRUNING or [bucket] MAP JOIN are possible. The difference seems to be imposed by design, where PARTITIONing is open/explicit while BUCKETing is discrete/implicit. Partitioning seems to be very common, if not a standard feature, in all current RDBMS, while BUCKETING seems to be HIVE specific only. In a way, BUCKETING could also be called "hashing" or simply "IMPLICIT PARTITIONING". Regardless of the fact that these two are recognised as two separate features in Hive, there should be nothing to prevent leveraging the same existing query/join optimisations across the two. PARTITION SORT MERGE MAPJOIN Use the same type of optimization as in SORT MERGE BUCKETED MAP JOIN for partitioned tables. The sort-merge join optimization could be performed when the PARTITIONED tables being joined are sorted and partitioned on the join columns. The corresponding partitions are joined with each other at the mapper. If both A and B have partitions set on their columns KEY, the following join SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM A a JOIN B b ON a.key = b.key can be done on the mapper only. The mapper for the partition key='201512' for A will traverse the corresponding partition for B. Traversing is possible if the corresponding partitions are sorted on the same columns. This is dependent on (taken care by HIVE-11525) was: Logically and functionally bucketing and partitioning are quite similar - both provide mechanism to segregate and separate the table's data based on its content. 
Thanks to that significant further optimisations like [partition] PRUNING or [bucket] MAP JOIN are possible. The difference seems to be imposed by design where the PARTITIONing is open/explicit while BUCKETing is discrete/implicit. Partitioning seems to be very common if not a standard feature in all current RDBMS while BUCKETING seems to be HIVE specific only. In a way BUCKETING could be also called by "hashing" or simply "IMPLICIT PARTITIONING". Regardless of the fact that these two are recognised as two separate features available in Hive there should be nothing to prevent leveraging same existing query/join optimisations across the two. PARTITION SORT MERGE MAPJOIN Use the same type of optimization as in SORT MERGE BUCKETED MAP JOIN for partitioned tables. The sort-merge join optimization could be performed when PARTITIONED tables being joined are sorted and partitioned on the join columns. The corresponding partitions are joined with each other at the mapper. If both A and B have partitions set on their columns KEY, the following join SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM A a JOIN B b ON a.key = b.key can be done on the mapper only. The mapper for the partition key='201512' for A will traverse the corresponding partition for B. Traversing is possible if the corresponding partitions are sorted on the same columns. This is dependent on (taken care by [HIVE-11525|https://issues.apache.org/jira/browse/HIVE-12337]) > Sort Merge Partition Map Join > - > > Key: HIVE-12336 > URL: https://issues.apache.org/jira/browse/HIVE-12336 > Project: Hive > Issue Type: Improvement > Components: Logical Optimizer, Physical Optimizer, SQL >Affects Versions: 0.13.0, 0.13.1, 0.14.0, 1.0.0, 1.1.0 >Reporter: Maciek Kocon >Priority: Major > Labels: gsoc2015 > > Logically and functionally bucketing and partitioning are quite similar - > both provide mechanism to segregate and separate the table's data based on > its content. 
Thanks to that significant further optimisations like > [partition] PRUNING or [bucket] MAP JOIN are possible. > The difference seems to be imposed by design where the PARTITIONing is > open/explicit while BUCKETing is discrete/implicit. > Partitioning seems to be very common if not a standard feature in all current > RDBMS while BUCKETING seems to be HIVE specific only. > In a way BUCKETING could be also called by "hashing" or simply "IMPLICIT > PARTITIONING". > Regardless of the fact that these two are recognised as two separate features > available in Hive there should be nothing to prevent leveraging same existing > query/join optimisations across the two. > PARTITION SORT MERGE MAPJOIN > Use the same type of optimization as in SORT MERGE BUCKETED MAP JOIN for > partitioned tables. > The sort-merge join optimization could be performed when PARTITIONED tables > being joined are sorted and partitioned on the join
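The traversal described above (the mapper for partition key='201512' of A walking the corresponding, equally-sorted partition of B) is a plain sort-merge join: both inputs are consumed in a single forward pass, with duplicate key runs joined as a cross product. A self-contained sketch over integer join keys (illustrative only, not Hive's implementation):

```java
import java.util.ArrayList;
import java.util.List;

class SortMergePartitionJoin {
    // Joins two inputs that are both sorted on the join key, in one forward
    // pass -- the traversal a mapper can do when the corresponding partitions
    // of the two tables are sorted on the same columns. Returns key pairs.
    static List<int[]> join(int[] a, int[] b) {
        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;                // advance the side holding the smaller key
            } else if (a[i] > b[j]) {
                j++;
            } else {
                int key = a[i];
                int iEnd = i, jEnd = j;
                while (iEnd < a.length && a[iEnd] == key) iEnd++; // run of duplicates in a
                while (jEnd < b.length && b[jEnd] == key) jEnd++; // run of duplicates in b
                for (int x = i; x < iEnd; x++) {
                    for (int y = j; y < jEnd; y++) {
                        out.add(new int[] {a[x], b[y]}); // cross product of the two runs
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }
}
```

Because neither side is ever re-read from the start, the join costs a single scan of each partition, which is what makes doing it entirely in the mapper attractive.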