[jira] [Assigned] (HIVE-23254) Upgrade guava version in hive from 19.0 to 27.0-jre
    [ https://issues.apache.org/jira/browse/HIVE-23254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wenjun ma reassigned HIVE-23254:
--------------------------------

    Assignee: wenjun ma

> Upgrade guava version in hive from 19.0 to 27.0-jre
> ---------------------------------------------------
>
>                 Key: HIVE-23254
>                 URL: https://issues.apache.org/jira/browse/HIVE-23254
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: Ankur Raj
>            Assignee: wenjun ma
>            Priority: Critical
>
> Upgrade guava version in hive from 19.0 to 27.0-jre.
> Hadoop has already upgraded it as part of
> [https://jira.apache.org/jira/browse/HADOOP-16213]
> Concern: https://nvd.nist.gov/vuln/detail/CVE-2018-10237
> Unbounded memory allocation in Google Guava 11.0 through 24.x before 24.1.1
> allows remote attackers to conduct denial of service attacks against servers
> that depend on this library and deserialize attacker-provided data, because
> the AtomicDoubleArray class (when serialized with Java serialization) and the
> CompoundOrdering class (when serialized with GWT serialization) perform eager
> allocation without appropriate checks on what a client has sent and whether
> the data size is reasonable.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
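The CVE quoted above boils down to sizing an allocation from an untrusted length field during deserialization. The following self-contained Java sketch illustrates that pattern and the kind of bound check that mitigates it; it is an illustration only, not Guava's actual AtomicDoubleArray code, and the class and method names (EagerAllocationDemo, readChecked, payload) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;

// Simplified sketch of the CVE-2018-10237 pattern: an array sized directly
// from a length field in the serialized stream, so a tiny malicious payload
// can trigger a huge eager allocation. Illustrative stand-in only.
public class EagerAllocationDemo {

    // Vulnerable shape: trust the claimed length and allocate immediately.
    static double[] readUnchecked(DataInputStream in) throws IOException {
        int length = in.readInt();
        double[] values = new double[length]; // attacker controls 'length'
        for (int i = 0; i < length; i++) {
            values[i] = in.readDouble();
        }
        return values;
    }

    // Hardened shape: bound the claimed length before allocating anything.
    static double[] readChecked(DataInputStream in, int maxElements) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > maxElements) {
            throw new InvalidObjectException("implausible array length: " + length);
        }
        double[] values = new double[length];
        for (int i = 0; i < length; i++) {
            values[i] = in.readDouble();
        }
        return values;
    }

    // Builds a stream whose claimed length may disagree with its contents.
    static byte[] payload(int claimedLength, double... elements) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(claimedLength);
        for (double e : elements) {
            out.writeDouble(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A well-formed payload round-trips.
        double[] ok = readChecked(
                new DataInputStream(new ByteArrayInputStream(payload(2, 1.5, 2.5))), 1024);
        System.out.println(ok.length); // prints 2

        // A 4-byte payload claiming Integer.MAX_VALUE elements is rejected
        // before any allocation happens.
        try {
            readChecked(new DataInputStream(
                    new ByteArrayInputStream(payload(Integer.MAX_VALUE))), 1024);
        } catch (InvalidObjectException e) {
            System.out.println("rejected"); // prints rejected
        }
    }
}
```

Guava 24.1.1 addressed the issue along these lines, by validating the deserialized element count against what the stream could plausibly contain.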
[jira] [Commented] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160324#comment-17160324 ] Chiran Ravani commented on HIVE-23873: -- [~srahman] Yes, problem exists with Master branch too. Going by Code, it does not seem we are handling column name case conversion in Master branch. {code} 2020-07-18T03:44:39,678 INFO [83565622-bc0d-4dbd-b463-88188d46b64e main]: dao.GenericJdbcDatabaseAccessor (:()) - Query to execute is [select * from TESTHIVEJDBCSTORAGE] 2020-07-18T03:44:39,898 ERROR [83565622-bc0d-4dbd-b463-88188d46b64e main]: CliDriver (:()) - Failed with exception java.io.IOException:java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:638) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:603) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:243) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:277) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:862) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:798) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:717) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:318) at 
org.apache.hadoop.util.RunJar.main(RunJar.java:232) Caused by: java.lang.NullPointerException at org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:235) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:619) ... 17 more {code} > Querying Hive JDBCStorageHandler table fails with NPE > - > > Key: HIVE-23873 > URL: https://issues.apache.org/jira/browse/HIVE-23873 > Project: Hive > Issue Type: Bug > Components: HiveServer2, JDBC >Affects Versions: 3.1.0, 3.1.1, 3.1.2 >Reporter: Chiran Ravani >Assignee: Chiran Ravani >Priority: Critical > Attachments: HIVE-23873.01.patch > > > Scenario is Hive table having same schema as table in Oracle, however when we > query the table with data it fails with NPE, below is the trace. > {code} > Caused by: java.io.IOException: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473) > ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > ... 
34 more > Caused by: java.lang.NullPointerException > at > org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) > ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) >
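The comment on HIVE-23873 above attributes the NPE in JdbcSerDe.deserialize to missing column name case conversion: Hive column names are lower-case, while the remote database (Oracle in this report) typically reports upper-case names, so a row-map lookup keyed by the raw JDBC name returns null. A minimal Java sketch of the normalization idea, with hypothetical names (ColumnCaseExample, lookup, rowFromJdbc) since this is not the actual JdbcSerDe code:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: normalize column-name case on both sides of the
// lookup so a Hive column "id" still finds an Oracle column "ID".
public class ColumnCaseExample {

    static Object lookup(Map<String, Object> rowFromJdbc, String hiveColumn) {
        // Re-key the JDBC row with lower-cased column names.
        Map<String, Object> normalized = new HashMap<>();
        for (Map.Entry<String, Object> e : rowFromJdbc.entrySet()) {
            normalized.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        // Look up with the lower-cased Hive column name.
        return normalized.get(hiveColumn.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("ID", 1); // Oracle-style upper-case column name
        System.out.println(lookup(row, "id")); // prints 1 instead of null
    }
}
```

Without the normalization, `rowFromJdbc.get("id")` returns null for an upper-case key, which is exactly the shape of failure the stack trace shows.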
[jira] [Assigned] (HIVE-21335) NPE in ObjectStore.getObjectCount
    [ https://issues.apache.org/jira/browse/HIVE-21335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wenjun ma reassigned HIVE-21335:
--------------------------------

    Assignee: wenjun ma

> NPE in ObjectStore.getObjectCount
> ---------------------------------
>
>                 Key: HIVE-21335
>                 URL: https://issues.apache.org/jira/browse/HIVE-21335
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Adam Holley
>            Assignee: wenjun ma
>            Priority: Major
>
> In ObjectStore.getObjectCount() there is no null check on the result before
> calling intValue().
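The fix the report suggests is a null check before unboxing. A minimal stand-alone Java sketch, where runCountQuery is a hypothetical placeholder for the metastore query that can return null:

```java
// Minimal sketch of the null guard HIVE-21335 asks for. In the real
// ObjectStore.getObjectCount(), a JDOQL count query can yield null and the
// unconditional result.intValue() then throws a NullPointerException.
public class ObjectCountExample {

    // Stand-in for the metastore query; returns null to model the failure.
    static Long runCountQuery() {
        return null;
    }

    static int getObjectCount() {
        Long result = runCountQuery();
        // Guard before unboxing instead of calling result.intValue() directly.
        return result == null ? 0 : result.intValue();
    }

    public static void main(String[] args) {
        System.out.println(getObjectCount()); // prints 0
    }
}
```

Whether a null count should map to 0 or to an exception is a design choice for the actual patch; the essential point is that the unboxing must not happen unconditionally.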
[jira] [Assigned] (HIVE-22233) Wrong result with vectorized execution when column value is casted to TINYINT
    [ https://issues.apache.org/jira/browse/HIVE-22233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wenjun ma reassigned HIVE-22233:
--------------------------------

    Assignee: wenjun ma

> Wrong result with vectorized execution when column value is casted to TINYINT
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-22233
>                 URL: https://issues.apache.org/jira/browse/HIVE-22233
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 3.1.1, 2.3.4, 2.3.6
>            Reporter: Ganesha Shreedhara
>            Assignee: wenjun ma
>            Priority: Major
>
> Casting a column value to TINYINT gives an incorrect result when the
> vectorized mode of reduce-side GROUP BY query execution is enabled via the
> *hive.vectorized.execution.reduce.groupby.enabled* parameter (enabled by
> default). The issue occurs only when the sub query has SUM/COUNT aggregation
> operations inside an IF condition.
>
> *Steps to reproduce:*
> {code:java}
> create table test(id int);
> insert into test values (1);
> SELECT CAST(col AS TINYINT) col_cast FROM ( SELECT IF(SUM(1) > 0, 1, 0) col FROM test) x;
> {code}
>
> *Result:*
> {code:java}
> 0
> {code}
> *Expected result:*
> {code:java}
> 1
> {code}
>
> We get the expected result when the
> *hive.vectorized.execution.reduce.groupby.enabled* parameter is disabled.
> We also get the expected result when we don't CAST or don't have a SUM/COUNT
> aggregation in the IF condition.
> The following queries give the correct result even when
> hive.vectorized.execution.reduce.groupby.enabled is set:
> {code:java}
> SELECT CAST(col AS INT) col_cast FROM ( SELECT IF(SUM(1) > 0, 1, 0) col FROM test) x;
> SELECT col FROM ( SELECT IF(SUM(1) > 0, 1, 0) col FROM test) x;
> SELECT CAST(col AS TINYINT) col_cast FROM ( SELECT IF(2 > 1, 1, 0) col FROM test) x;
> SELECT CAST(col AS TINYINT) col_cast FROM ( SELECT IF(true, 1, 0) col FROM test) x;
> {code}
>
> The issue occurs only when *CAST(col AS TINYINT)* is used together with
> *IF(SUM(1) > 0, 1, 0)* or *IF(COUNT(1) > 0, 1, 0)* in the sub query.
[jira] [Assigned] (HIVE-22383) `alterPartitions` is invoked twice during dynamic partition load causing runtime delay
    [ https://issues.apache.org/jira/browse/HIVE-22383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wenjun ma reassigned HIVE-22383:
--------------------------------

    Assignee: wenjun ma

> `alterPartitions` is invoked twice during dynamic partition load causing
> runtime delay
> ------------------------------------------------------------------------
>
>                 Key: HIVE-22383
>                 URL: https://issues.apache.org/jira/browse/HIVE-22383
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: wenjun ma
>            Priority: Major
>              Labels: performance
>
> First invocation in {{Hive::loadDynamicPartitions}}:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2978
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2638
> Second invocation in {{BasicStatsTask::aggregateStats}}:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java#L335
> This leads to a significant delay in dynamic partition loading.
[jira] [Commented] (HIVE-23740) [Hive]delete from <table name>; without where clause not giving correct error msg
    [ https://issues.apache.org/jira/browse/HIVE-23740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160299#comment-17160299 ]

wenjun ma commented on HIVE-23740:
----------------------------------

Hi [~abhishek.akg], this should be by design. For an insert-only table, you can only insert into it and drop it.

> [Hive]delete from <table name>; without where clause not giving correct error
> msg
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-23740
>                 URL: https://issues.apache.org/jira/browse/HIVE-23740
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.1.0
>            Reporter: ABHISHEK KUMAR GUPTA
>            Assignee: wenjun ma
>            Priority: Minor
>
> Created a Hive table from Hive, inserted data, and fired delete from <table name>;
> CREATE TABLE insert_only (key int, value string) STORED AS ORC
> TBLPROPERTIES ("transactional"="true",
> "transactional_properties"="insert_only");
> INSERT INTO insert_only VALUES (13,'BAD'), (14,'SUCCESS');
> delete from insert_only;
> Error thrown:
> Error: Error while compiling statement: FAILED: SemanticException [Error
> 10414]: Attempt to do update or delete on table hive.insert_only that is
> insert-only transactional (state=42000,code=10414)
> Expectation:
> It should complain that the where clause is missing, because to delete all
> the content of a table Hive provides truncate table <table name>;
[jira] [Assigned] (HIVE-23740) [Hive]delete from <table name>; without where clause not giving correct error msg
    [ https://issues.apache.org/jira/browse/HIVE-23740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wenjun ma reassigned HIVE-23740:
--------------------------------

    Assignee: wenjun ma

> [Hive]delete from <table name>; without where clause not giving correct error
> msg
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-23740
>                 URL: https://issues.apache.org/jira/browse/HIVE-23740
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.1.0
>            Reporter: ABHISHEK KUMAR GUPTA
>            Assignee: wenjun ma
>            Priority: Minor
>
> Created a Hive table from Hive, inserted data, and fired delete from <table name>;
> CREATE TABLE insert_only (key int, value string) STORED AS ORC
> TBLPROPERTIES ("transactional"="true",
> "transactional_properties"="insert_only");
> INSERT INTO insert_only VALUES (13,'BAD'), (14,'SUCCESS');
> delete from insert_only;
> Error thrown:
> Error: Error while compiling statement: FAILED: SemanticException [Error
> 10414]: Attempt to do update or delete on table hive.insert_only that is
> insert-only transactional (state=42000,code=10414)
> Expectation:
> It should complain that the where clause is missing, because to delete all
> the content of a table Hive provides truncate table <table name>;
[jira] [Work logged] (HIVE-23351) Ranger Replication Scheduling
    [ https://issues.apache.org/jira/browse/HIVE-23351?focusedWorklogId=460626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460626 ]

ASF GitHub Bot logged work on HIVE-23351:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jul/20 00:32
            Start Date: 18/Jul/20 00:32
    Worklog Time Spent: 10m
      Work Description: github-actions[bot] closed pull request #1004:
URL: https://github.com/apache/hive/pull/1004

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 460626)
    Time Spent: 3h 10m  (was: 3h)

> Ranger Replication Scheduling
> -----------------------------
>
>                 Key: HIVE-23351
>                 URL: https://issues.apache.org/jira/browse/HIVE-23351
>             Project: Hive
>          Issue Type: Task
>            Reporter: Aasha Medhi
>            Assignee: Aasha Medhi
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-23351.01.patch, HIVE-23351.02.patch,
> HIVE-23351.03.patch, HIVE-23351.04.patch, HIVE-23351.05.patch,
> HIVE-23351.06.patch, HIVE-23351.07.patch, HIVE-23351.08.patch,
> HIVE-23351.09.patch, HIVE-23351.10.patch, HIVE-23351.10.patch,
> HIVE-23351.11.patch, HIVE-23351.12.patch
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
[jira] [Assigned] (HIVE-23876) Miss init NOTIFICATION_SEQUENCE on hive-schema-3.1.0.mssql.sql
    [ https://issues.apache.org/jira/browse/HIVE-23876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wenjun ma reassigned HIVE-23876:
--------------------------------

    Assignee: wenjun ma

> Miss init NOTIFICATION_SEQUENCE on hive-schema-3.1.0.mssql.sql
> --------------------------------------------------------------
>
>                 Key: HIVE-23876
>                 URL: https://issues.apache.org/jira/browse/HIVE-23876
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: wenjun ma
>            Assignee: wenjun ma
>            Priority: Major
>
> Missing:
> INSERT INTO NOTIFICATION_SEQUENCE (NNI_ID, NEXT_EVENT_ID) SELECT 1,1 FROM
> DUAL WHERE NOT EXISTS ( SELECT NEXT_EVENT_ID FROM NOTIFICATION_SEQUENCE);
> in hive-schema-3.1.0.mssql.sql.
> The other db schemas are OK.
[jira] [Updated] (HIVE-23876) Miss init NOTIFICATION_SEQUENCE on hive-schema-3.1.0.mssql.sql
    [ https://issues.apache.org/jira/browse/HIVE-23876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wenjun ma updated HIVE-23876:
-----------------------------
    Description: 
Miss
INSERT INTO NOTIFICATION_SEQUENCE (NNI_ID, NEXT_EVENT_ID) SELECT 1,1 FROM
DUAL WHERE NOT EXISTS ( SELECT NEXT_EVENT_ID FROM NOTIFICATION_SEQUENCE);
in hive-schema-3.1.0.mssql.sql
Others db schemas are OK.

  was:
Miss
INSERT INTO NOTIFICATION_SEQUENCE (NNI_ID, NEXT_EVENT_ID) SELECT 1,1 FROM
DUAL WHERE NOT EXISTS ( SELECT NEXT_EVENT_ID FROM NOTIFICATION_SEQUENCE);
in hive-schema-3.1.0.mssql.sql

> Miss init NOTIFICATION_SEQUENCE on hive-schema-3.1.0.mssql.sql
> --------------------------------------------------------------
>
>                 Key: HIVE-23876
>                 URL: https://issues.apache.org/jira/browse/HIVE-23876
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: wenjun ma
>            Priority: Major
>
> Miss
> INSERT INTO NOTIFICATION_SEQUENCE (NNI_ID, NEXT_EVENT_ID) SELECT 1,1 FROM
> DUAL WHERE NOT EXISTS ( SELECT NEXT_EVENT_ID FROM NOTIFICATION_SEQUENCE);
> in hive-schema-3.1.0.mssql.sql
> Others db schemas are OK.
[jira] [Work logged] (HIVE-23836) Make "cols" dependent so that it cascade deletes
[ https://issues.apache.org/jira/browse/HIVE-23836?focusedWorklogId=460624=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460624 ] ASF GitHub Bot logged work on HIVE-23836: - Author: ASF GitHub Bot Created on: 18/Jul/20 00:27 Start Date: 18/Jul/20 00:27 Worklog Time Spent: 10m Work Description: ashutoshc commented on pull request #1239: URL: https://github.com/apache/hive/pull/1239#issuecomment-660391990 +1 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460624) Time Spent: 0.5h (was: 20m) > Make "cols" dependent so that it cascade deletes > > > Key: HIVE-23836 > URL: https://issues.apache.org/jira/browse/HIVE-23836 > Project: Hive > Issue Type: Bug >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > {quote} > If you want the deletion of a persistent object to cause the deletion of > related objects then you need to mark the related fields in the mapping to be > "dependent". > {quote} > http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields > http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object > The database won't do it: > {code:sql|title=Derby Schema} > ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY > ("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO > ACTION; > {code} > https://github.com/apache/hive/blob/65cf6957cf9432277a096f91b40985237274579f/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L452 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-23707) Unable to create materialized views with transactions enabled with MySQL metastore
[ https://issues.apache.org/jira/browse/HIVE-23707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160283#comment-17160283 ] wenjun ma edited comment on HIVE-23707 at 7/18/20, 12:26 AM: - I can not reproduce this issue with the same version with the MS SQL database. can you try to create a new cluster to reproduce? was (Author: wenjunma003): I can not reproduce this issue with the same version with the MS SQL database. what're kinds of DB do you use it? can you try to create a new cluster to reproduce? > Unable to create materialized views with transactions enabled with MySQL > metastore > -- > > Key: HIVE-23707 > URL: https://issues.apache.org/jira/browse/HIVE-23707 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.1.2 >Reporter: Dustin Koupal >Assignee: wenjun ma >Priority: Blocker > > When attempting to create a materialized view with transactions enabled, we > get the following exception: > > {code:java} > ERROR : FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed to > generate new Mapping of type > org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type > CLOB declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this datastore.ERROR : FAILED: > Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:Failed to generate new Mapping of type > org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type > CLOB declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this datastore.JDBC type CLOB > declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this > datastore.org.datanucleus.exceptions.NucleusException: JDBC type CLOB > declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this datastore. at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1386) > at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1616) > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.prepareDatastoreMapping(SingleFieldMapping.java:59) > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.initialize(SingleFieldMapping.java:48) > at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getMapping(RDBMSMappingManager.java:482) > at > org.datanucleus.store.rdbms.table.ClassTable.manageMembers(ClassTable.java:536) > at > org.datanucleus.store.rdbms.table.ClassTable.manageClass(ClassTable.java:442) > at > org.datanucleus.store.rdbms.table.ClassTable.initializeForClass(ClassTable.java:1270) > at > org.datanucleus.store.rdbms.table.ClassTable.initialize(ClassTable.java:276) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3279) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2889) > at > org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119) > at > 
org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2088) > at > org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1271) > at > org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3760) > at > org.datanucleus.state.StateManagerImpl.setIdentity(StateManagerImpl.java:2267) > at > org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:484) > at > org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:120) > at > org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:218) > at > org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2079) > at > org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1923) > at > org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1778) > at >
[jira] [Commented] (HIVE-23707) Unable to create materialized views with transactions enabled with MySQL metastore
[ https://issues.apache.org/jira/browse/HIVE-23707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160283#comment-17160283 ] wenjun ma commented on HIVE-23707: -- I can not reproduce this issue with the same version with the MS SQL database. what're kinds of DB do you use it? can you try to create a new cluster to reproduce? > Unable to create materialized views with transactions enabled with MySQL > metastore > -- > > Key: HIVE-23707 > URL: https://issues.apache.org/jira/browse/HIVE-23707 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.1.2 >Reporter: Dustin Koupal >Assignee: wenjun ma >Priority: Blocker > > When attempting to create a materialized view with transactions enabled, we > get the following exception: > > {code:java} > ERROR : FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed to > generate new Mapping of type > org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type > CLOB declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this datastore.ERROR : FAILED: > Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. > MetaException(message:Failed to generate new Mapping of type > org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type > CLOB declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this datastore.JDBC type CLOB > declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this > datastore.org.datanucleus.exceptions.NucleusException: JDBC type CLOB > declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this datastore. 
at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1386) > at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1616) > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.prepareDatastoreMapping(SingleFieldMapping.java:59) > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.initialize(SingleFieldMapping.java:48) > at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getMapping(RDBMSMappingManager.java:482) > at > org.datanucleus.store.rdbms.table.ClassTable.manageMembers(ClassTable.java:536) > at > org.datanucleus.store.rdbms.table.ClassTable.manageClass(ClassTable.java:442) > at > org.datanucleus.store.rdbms.table.ClassTable.initializeForClass(ClassTable.java:1270) > at > org.datanucleus.store.rdbms.table.ClassTable.initialize(ClassTable.java:276) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3279) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2889) > at > org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2088) > at > org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1271) > at > org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3760) > at > org.datanucleus.state.StateManagerImpl.setIdentity(StateManagerImpl.java:2267) > at > org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:484) > at > 
org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:120) > at > org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:218) > at > org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2079) > at > org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1923) > at > org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1778) > at > org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217) > at > org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:724) > at > org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:749) >
[jira] [Work logged] (HIVE-23855) TestQueryShutdownHooks is flaky
    [ https://issues.apache.org/jira/browse/HIVE-23855?focusedWorklogId=460620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460620 ]

ASF GitHub Bot logged work on HIVE-23855:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jul/20 00:17
            Start Date: 18/Jul/20 00:17
    Worklog Time Spent: 10m
      Work Description: mustafaiman opened a new pull request #1277:
URL: https://github.com/apache/hive/pull/1277

Increased the timeout for the async query. The tests were not isolated very
well: the async query test did not clean up properly, and the leaked state
caused the sync test to fail. Cleanup is moved to @After so it always runs.

Change-Id: I669ba35c22020910f5e348003b1f05d8a7cde75d

Issue Time Tracking
-------------------

    Worklog Id:     (was: 460620)
    Time Spent: 0.5h  (was: 20m)

> TestQueryShutdownHooks is flaky
> -------------------------------
>
>                 Key: HIVE-23855
>                 URL: https://issues.apache.org/jira/browse/HIVE-23855
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Assignee: Mustafa Iman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/100/
[jira] [Work logged] (HIVE-23855) TestQueryShutdownHooks is flaky
[ https://issues.apache.org/jira/browse/HIVE-23855?focusedWorklogId=460619=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460619 ] ASF GitHub Bot logged work on HIVE-23855: - Author: ASF GitHub Bot Created on: 18/Jul/20 00:16 Start Date: 18/Jul/20 00:16 Worklog Time Spent: 10m Work Description: mustafaiman closed pull request #1277: URL: https://github.com/apache/hive/pull/1277 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460619) Time Spent: 20m (was: 10m) > TestQueryShutdownHooks is flaky > --- > > Key: HIVE-23855 > URL: https://issues.apache.org/jira/browse/HIVE-23855 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Mustafa Iman >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-precommit/job/master/100/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23797) Throw exception when no metastore found in zookeeper
[ https://issues.apache.org/jira/browse/HIVE-23797?focusedWorklogId=460615=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460615 ] ASF GitHub Bot logged work on HIVE-23797: - Author: ASF GitHub Bot Created on: 17/Jul/20 23:58 Start Date: 17/Jul/20 23:58 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on pull request #1201: URL: https://github.com/apache/hive/pull/1201#issuecomment-660385890 @belugabehr Is there anything else to do to make the pr get through? thank you very much! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460615) Time Spent: 1h (was: 50m) > Throw exception when no metastore found in zookeeper > - > > Key: HIVE-23797 > URL: https://issues.apache.org/jira/browse/HIVE-23797 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > When enable service discovery for metastore, there is a chance that the > client may find no metastore uris available in zookeeper, such as during > metastores startup or the client wrongly configured the path. This results to > redundant retries and finally MetaException with "Unknown exception" message. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true
[ https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=460611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460611 ] ASF GitHub Bot logged work on HIVE-20441: - Author: ASF GitHub Bot Created on: 17/Jul/20 23:53 Start Date: 17/Jul/20 23:53 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on pull request #1242: URL: https://github.com/apache/hive/pull/1242#issuecomment-660384821 @kgyrtkirk @pvary could you please take a look? Thanks! Issue Time Tracking --- Worklog Id: (was: 460611) Time Spent: 1h 10m (was: 1h) > NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true > > > Key: HIVE-20441 > URL: https://issues.apache.org/jira/browse/HIVE-20441 > Project: Hive > Issue Type: Bug > Components: CLI, HiveServer2 >Affects Versions: 1.2.1, 2.3.3 >Reporter: Hui Huang >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, > HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When hive.allow.udf.load.on.demand is set to true and HiveServer2 has been > started, a function newly created from other clients or HiveServer2 instances is > loaded from the metastore on first use.
> When the udf is used in a where clause, we get an NPE like: > {code:java} > Error executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: NullPointerException null > at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_77] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_77] > at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77] > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:236) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1104)
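The `Caused by: java.lang.NullPointerException` at `ExprNodeGenericFuncDesc.newInstance` points to a function object that was never materialized when loaded on demand. The defensive pattern, sketched here in Python with a hypothetical registry class (the real fix lives in Hive's Java function-registry code), is to resolve the function explicitly and raise a descriptive error instead of letting a null flow into expression construction:

```python
class FunctionRegistry:
    """Toy stand-in for a session-level function registry with
    load-on-demand semantics (models hive.allow.udf.load.on.demand)."""

    def __init__(self, metastore_lookup=None):
        self._loaded = {}                          # functions known to this session
        self._metastore_lookup = metastore_lookup  # backing store, may be None

    def register(self, name, udf):
        self._loaded[name.lower()] = udf

    def get_udf(self, name):
        key = name.lower()
        if key not in self._loaded and self._metastore_lookup is not None:
            udf = self._metastore_lookup(key)      # first use: load on demand
            if udf is not None:
                self._loaded[key] = udf
        if key not in self._loaded:
            # Raise a clear semantic error rather than returning None, which
            # would surface later as the NPE in the stack trace above.
            raise LookupError("Invalid function " + name)
        return self._loaded[key]
```

For example, `FunctionRegistry(metastore_lookup={"myudf": abs}.get)` resolves `get_udf("MYUDF")` on first use, while an unknown name fails with a readable error instead of a NullPointerException deep inside compilation.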
[jira] [Work logged] (HIVE-23855) TestQueryShutdownHooks is flaky
[ https://issues.apache.org/jira/browse/HIVE-23855?focusedWorklogId=460573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460573 ] ASF GitHub Bot logged work on HIVE-23855: - Author: ASF GitHub Bot Created on: 17/Jul/20 22:12 Start Date: 17/Jul/20 22:12 Worklog Time Spent: 10m Work Description: mustafaiman opened a new pull request #1277: URL: https://github.com/apache/hive/pull/1277 Increased the timeout for the async query. The tests were not well isolated: the async query test did not clean up properly, and its state leaked into the sync test, causing it to fail. Cleanup is moved to @After so it always runs. Change-Id: I669ba35c22020910f5e348003b1f05d8a7cde75d ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute Issue Time Tracking --- Worklog Id: (was: 460573) Remaining Estimate: 0h Time Spent: 10m > TestQueryShutdownHooks is flaky > --- > > Key: HIVE-23855 > URL: https://issues.apache.org/jira/browse/HIVE-23855 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Mustafa Iman >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-precommit/job/master/100/
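The cleanup-in-`@After` pattern from the pull request generalizes: teardown hooks run even when a test body fails, so shared state cannot leak from one test into the next. A sketch of the same idea in Python's `unittest` (the Hive test itself is JUnit; the class-level list below is a made-up stand-in for the leaked shutdown-hook state):

```python
import unittest

class ShutdownHookTest(unittest.TestCase):
    hooks = []  # class-level: shared across tests, like global shutdown hooks

    def tearDown(self):
        # Always runs, even when the test body fails -- the equivalent of
        # moving cleanup into a JUnit @After method so state cannot leak.
        type(self).hooks.clear()

    def test_async_query(self):
        self.hooks.append("async-hook")      # registers shared state
        self.assertEqual(len(self.hooks), 1)

    def test_sync_query(self):
        # Sees no leftover hook only because tearDown cleaned up above.
        self.assertEqual(len(self.hooks), 0)
```

Without the `tearDown`, `test_sync_query` would fail whenever it runs after `test_async_query`, which is exactly the flaky ordering dependence described in the issue.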
[jira] [Updated] (HIVE-23855) TestQueryShutdownHooks is flaky
[ https://issues.apache.org/jira/browse/HIVE-23855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23855: -- Labels: pull-request-available (was: ) > TestQueryShutdownHooks is flaky > --- > > Key: HIVE-23855 > URL: https://issues.apache.org/jira/browse/HIVE-23855 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Mustafa Iman >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-precommit/job/master/100/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23874) Add Debug Logging to HiveQueryResultSet
[ https://issues.apache.org/jira/browse/HIVE-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hunter Logan reassigned HIVE-23874: --- Assignee: Hunter Logan > Add Debug Logging to HiveQueryResultSet > --- > > Key: HIVE-23874 > URL: https://issues.apache.org/jira/browse/HIVE-23874 > Project: Hive > Issue Type: Improvement > Components: JDBC >Reporter: Hunter Logan >Assignee: Hunter Logan >Priority: Minor > > Adding a debug message on this topic with handle, orientation, and fetch size > would be useful. > [https://github.com/apache/hive/blob/bc00454c194413753ac1d7067044ca78c77e1a34/jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java#L342] -- This message was sent by Atlassian Jira (v8.3.4#803005)
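The suggested message is a single debug line carrying the fetch handle, orientation, and fetch size. A sketch of the shape of such a statement in Python's `logging` (the actual change would be an SLF4J `LOG.debug` call in the Java `HiveQueryResultSet`; `fetch_next_batch` here is a made-up illustration, not a JDBC API):

```python
import logging

LOG = logging.getLogger("HiveQueryResultSet")

def fetch_next_batch(handle, orientation, fetch_size):
    # Parameterized logging: the message is only formatted when DEBUG is
    # enabled, so the hot fetch path pays almost nothing otherwise.
    LOG.debug("Fetching results: handle=%s, orientation=%s, fetchSize=%d",
              handle, orientation, fetch_size)
    return []  # placeholder for the actual fetch of rows
```

SLF4J's `{}` placeholders give the same lazy-formatting property in the Java version.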
[jira] [Work logged] (HIVE-23875) Add VSCode files to gitignore
[ https://issues.apache.org/jira/browse/HIVE-23875?focusedWorklogId=460519&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460519 ] ASF GitHub Bot logged work on HIVE-23875: - Author: ASF GitHub Bot Created on: 17/Jul/20 20:47 Start Date: 17/Jul/20 20:47 Worklog Time Spent: 10m Work Description: HunterL opened a new pull request #1276: URL: https://github.com/apache/hive/pull/1276 Added VSCode files to gitignore Issue Time Tracking --- Worklog Id: (was: 460519) Remaining Estimate: 0h Time Spent: 10m > Add VSCode files to gitignore > - > > Key: HIVE-23875 > URL: https://issues.apache.org/jira/browse/HIVE-23875 > Project: Hive > Issue Type: Improvement >Reporter: Hunter Logan >Assignee: Hunter Logan >Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h > > gitignore currently includes Eclipse and Intellij specific files, should > include VSCode as well.
[jira] [Updated] (HIVE-23875) Add VSCode files to gitignore
[ https://issues.apache.org/jira/browse/HIVE-23875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23875: -- Labels: pull-request-available (was: ) > Add VSCode files to gitignore > - > > Key: HIVE-23875 > URL: https://issues.apache.org/jira/browse/HIVE-23875 > Project: Hive > Issue Type: Improvement >Reporter: Hunter Logan >Assignee: Hunter Logan >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > gitignore currently includes Eclipse and Intellij specific files, should > include VSCode as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23875) Add VSCode files to gitignore
[ https://issues.apache.org/jira/browse/HIVE-23875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hunter Logan reassigned HIVE-23875: --- Assignee: Hunter Logan > Add VSCode files to gitignore > - > > Key: HIVE-23875 > URL: https://issues.apache.org/jira/browse/HIVE-23875 > Project: Hive > Issue Type: Improvement >Reporter: Hunter Logan >Assignee: Hunter Logan >Priority: Trivial > > gitignore currently includes Eclipse and Intellij specific files, should > include VSCode as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?focusedWorklogId=460467&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460467 ] ASF GitHub Bot logged work on HIVE-23871: - Author: ASF GitHub Bot Created on: 17/Jul/20 18:29 Start Date: 17/Jul/20 18:29 Worklog Time Spent: 10m Work Description: pgaref edited a comment on pull request #1273: URL: https://github.com/apache/hive/pull/1273#issuecomment-660269331 Thanks for the review @mustafaiman ! Addressed your comments as part of the second commit -- I am still expecting some q.out differences so currently waiting for the qtests to run Issue Time Tracking --- Worklog Id: (was: 460467) Time Spent: 1h (was: 50m) > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: table1 > > Time Spent: 1h > Remaining Estimate: 0h > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does that for all Transactional Tables – not only ACID – causing > MicroManaged Tables to behave abnormally.
> MicroManaged (insert_only) tables may miss needed properties such as Storage > Desc Params – that may define how lines are delimited (like in the example > below): > To repro the issue: > {code:java} > CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; > LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans; > describe formatted delim_table_trans; > SELECT * FROM delim_table_trans; > {code} > Result: > {code:java} > Table Type: MANAGED_TABLE > Table Parameters: > bucketing_version 2 > numFiles 1 > numRows 0 > rawDataSize 0 > totalSize 72 > transactional true > transactional_properties insert_only > A masked pattern was here > > # Storage Information > SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > > InputFormat: org.apache.hadoop.mapred.TextInputFormat > OutputFormat: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > Compressed: No > Num Buckets: -1 > Bucket Columns: [] > Sort Columns: [] > PREHOOK: query: SELECT * FROM delim_table_trans > PREHOOK: type: QUERY > PREHOOK: Input: default@delim_table_trans > A masked pattern was here > POSTHOOK: query: SELECT * FROM delim_table_trans > POSTHOOK: type: QUERY > POSTHOOK: Input: default@delim_table_trans > A masked pattern was here > NULL NULL NULL > NULL NULL NULL > NULL NULL NULL > NULL NULL NULL > NULL NULL NULL > NULL NULL NULL > {code}
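The all-NULL result above is exactly what delimiter loss produces: with the Storage Desc Params dropped, LazySimpleSerDe falls back to its default Ctrl-A (`\x01`) field delimiter, so each tab-separated line parses as a single field and every typed column becomes NULL. A toy illustration of the effect (a hypothetical parser, not Hive's SerDe):

```python
def parse_row(line, delim):
    """Split a text row on delim and coerce the fields to the declared
    (INT, STRING, INT) schema; an unparsable or missing field becomes
    None, mirroring Hive's NULL."""
    fields = line.split(delim)
    out = []
    for i, cast in enumerate((int, str, int)):
        try:
            out.append(cast(fields[i]))
        except (IndexError, ValueError):
            out.append(None)
    return tuple(out)

row = "1\tAcura\t4"
# With the stored field.delim the row parses into typed columns...
assert parse_row(row, "\t") == (1, "Acura", 4)
# ...but with the SerDe's default Ctrl-A delimiter the line is one big
# field, so every column comes back NULL -- the behaviour in the bug.
assert parse_row(row, "\x01") == (None, None, None)
```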
[jira] [Work logged] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?focusedWorklogId=460456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460456 ] ASF GitHub Bot logged work on HIVE-23871: - Author: ASF GitHub Bot Created on: 17/Jul/20 18:24 Start Date: 17/Jul/20 18:24 Worklog Time Spent: 10m Work Description: pgaref commented on pull request #1273: URL: https://github.com/apache/hive/pull/1273#issuecomment-660269331 Thanks for the review @mustafaiman ! Addressed your comments as part of the second commit -- I am still expecting some q.out differences so I am currently waiting for the qtests to run Issue Time Tracking --- Worklog Id: (was: 460456) Time Spent: 50m (was: 40m) > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: table1 > > Time Spent: 50m > Remaining Estimate: 0h > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does that for all Transactional Tables – not only ACID – causing > MicroManaged Tables to behave abnormally.
[jira] [Work logged] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?focusedWorklogId=460455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460455 ] ASF GitHub Bot logged work on HIVE-23871: - Author: ASF GitHub Bot Created on: 17/Jul/20 18:23 Start Date: 17/Jul/20 18:23 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1273: URL: https://github.com/apache/hive/pull/1273#discussion_r456604112 ## File path: ql/src/test/results/clientpositive/llap/load_micromanaged_delim.q.out ## @@ -0,0 +1,186 @@ + A masked pattern was here +PREHOOK: type: CREATETABLE + A masked pattern was here +PREHOOK: Output: database:default +PREHOOK: Output: default@delim_table_ext + A masked pattern was here +POSTHOOK: type: CREATETABLE + A masked pattern was here +POSTHOOK: Output: database:default +POSTHOOK: Output: default@delim_table_ext +PREHOOK: query: describe formatted delim_table_ext +PREHOOK: type: DESCTABLE +PREHOOK: Input: default@delim_table_ext +POSTHOOK: query: describe formatted delim_table_ext +POSTHOOK: type: DESCTABLE +POSTHOOK: Input: default@delim_table_ext +# col_name data_type comment +id int +name string +safety int + +# Detailed Table Information +Database: default + A masked pattern was here +Retention: 0 + A masked pattern was here +Table Type: EXTERNAL_TABLE +Table Parameters: + EXTERNAL TRUE + bucketing_version 2 + numFiles 1 + totalSize 52 + A masked pattern was here + +# Storage Information +SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe +InputFormat: org.apache.hadoop.mapred.TextInputFormat +OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat +Compressed: No +Num Buckets: -1 +Bucket Columns: [] +Sort Columns: [] +Storage Desc Params: + field.delim \t + serialization.format \t +PREHOOK: query: SELECT * FROM delim_table_ext +PREHOOK: type: QUERY +PREHOOK: Input: default@delim_table_ext + A masked pattern was here +POSTHOOK: query: SELECT * FROM delim_table_ext +POSTHOOK: type: QUERY 
+POSTHOOK: Input: default@delim_table_ext + A masked pattern was here +1 Acura 4 +2 Toyota 3 +3 Tesla 5 +4 Honda 5 +11 Mazda 2 +PREHOOK: query: CREATE TABLE delim_table_micro(id INT, name STRING, safety INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE TBLPROPERTIES('transactional'='true', "transactional_properties"="insert_only") +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@delim_table_micro +POSTHOOK: query: CREATE TABLE delim_table_micro(id INT, name STRING, safety INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE TBLPROPERTIES('transactional'='true', "transactional_properties"="insert_only") +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@delim_table_micro + A masked pattern was here +PREHOOK: type: LOAD + A masked pattern was here +PREHOOK: Output: default@delim_table_micro + A masked pattern was here +POSTHOOK: type: LOAD + A masked pattern was here +POSTHOOK: Output: default@delim_table_micro +PREHOOK: query: describe formatted delim_table_micro +PREHOOK: type: DESCTABLE +PREHOOK: Input: default@delim_table_micro +POSTHOOK: query: describe formatted delim_table_micro +POSTHOOK: type: DESCTABLE +POSTHOOK: Input: default@delim_table_micro +# col_name data_type comment +id int +name string +safety int + +# Detailed Table Information +Database: default + A masked pattern was here +Retention: 0 + A masked pattern was here +Table Type:MANAGED_TABLE +Table Parameters: + bucketing_version 2 +
[jira] [Work logged] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?focusedWorklogId=460454&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460454 ] ASF GitHub Bot logged work on HIVE-23871: - Author: ASF GitHub Bot Created on: 17/Jul/20 18:22 Start Date: 17/Jul/20 18:22 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1273: URL: https://github.com/apache/hive/pull/1273#discussion_r456603816 ## File path: ql/src/test/queries/clientpositive/load_micromanaged_delim.q ## @@ -0,0 +1,32 @@ +set hive.support.concurrency=true; +set hive.exec.dynamic.partition.mode=nonstrict; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; + + +dfs -mkdir ${system:test.tmp.dir}/delim_table; +dfs -mkdir ${system:test.tmp.dir}/delim_table_ext; +dfs -mkdir ${system:test.tmp.dir}/delim_table_trans; +dfs -cp ${system:hive.root}/data/files/table1 ${system:test.tmp.dir}/delim_table/; +dfs -cp ${system:hive.root}/data/files/table1 ${system:test.tmp.dir}/delim_table_ext/; +dfs -cp ${system:hive.root}/data/files/table1 ${system:test.tmp.dir}/delim_table_trans/; + +-- Checking that MicroManged and External tables have the same behaviour with delimited input files +-- External table +CREATE EXTERNAL TABLE delim_table_ext(id INT, name STRING, safety INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '${system:test.tmp.dir}/delim_table_ext/'; +describe formatted delim_table_ext; +SELECT * FROM delim_table_ext; + +-- SET hive.create.as.acid=true +-- SET hive.create.as.insert.only=true Review comment: Creates the same behaviour as the Table properties below but I agree it makes sense to remove it ## File path: data/files/table1 ## @@ -0,0 +1,5 @@ +1 Acura 4 Review comment: sure, done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 460454) Time Spent: 0.5h (was: 20m) > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: table1 > > Time Spent: 0.5h > Remaining Estimate: 0h > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does that for all Transactional Tables – not only ACID – causing > MicroManaged Tables to behave abnormally.
[jira] [Work logged] (HIVE-23786) HMS Server side filter
[ https://issues.apache.org/jira/browse/HIVE-23786?focusedWorklogId=460434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460434 ] ASF GitHub Bot logged work on HIVE-23786: - Author: ASF GitHub Bot Created on: 17/Jul/20 18:07 Start Date: 17/Jul/20 18:07 Worklog Time Spent: 10m Work Description: sam-an-cloudera commented on a change in pull request #1221: URL: https://github.com/apache/hive/pull/1221#discussion_r456596817 ## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestGetPartitions.java ## @@ -416,7 +417,7 @@ public void testGetPartitionsByNamesNullDbName() throws Exception { createTable3PartCols1Part(client); client.getPartitionsByNames(null, TABLE_NAME, Lists.newArrayList("=2000/mm=01/dd=02")); fail("Should have thrown exception"); -} catch (NullPointerException | TTransportException e) { +} catch (NullPointerException | TTransportException | MetaException e) { Review comment: coalesce with above as 1 single test issue. ## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestGetPartitions.java ## @@ -427,7 +428,7 @@ public void testGetPartitionsByNamesNullTblName() throws Exception { createTable3PartCols1Part(client); client.getPartitionsByNames(DB_NAME, null, Lists.newArrayList("=2000/mm=01/dd=02")); fail("Should have thrown exception"); -} catch (NullPointerException | TTransportException e) { +} catch (NullPointerException | TTransportException | TProtocolException | MetaException e ) { Review comment: coalesce with above as 1 single test issue. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460434) Time Spent: 3h 20m (was: 3h 10m) > HMS Server side filter > -- > > Key: HIVE-23786 > URL: https://issues.apache.org/jira/browse/HIVE-23786 > Project: Hive > Issue Type: Improvement >Reporter: Sam An >Assignee: Sam An >Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > HMS server side filter of results based on authorization. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23786) HMS Server side filter
[ https://issues.apache.org/jira/browse/HIVE-23786?focusedWorklogId=460432&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460432 ] ASF GitHub Bot logged work on HIVE-23786: - Author: ASF GitHub Bot Created on: 17/Jul/20 18:06 Start Date: 17/Jul/20 18:06 Worklog Time Spent: 10m Work Description: sam-an-cloudera commented on a change in pull request #1221: URL: https://github.com/apache/hive/pull/1221#discussion_r456596445 ## File path: ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientGetPartitionsTempTable.java ## @@ -123,13 +123,13 @@ public void testGetPartitionsByNamesEmptyParts() throws Exception { getClient().getPartitionsByNames(DB_NAME, TABLE_NAME, Lists.newArrayList("", "")); } - @Test(expected = MetaException.class) + @Test @Override public void testGetPartitionsByNamesNullDbName() throws Exception { super.testGetPartitionsByNamesNullDbName(); } - @Test(expected = MetaException.class) + @Test Review comment: coalesce with above Issue Time Tracking --- Worklog Id: (was: 460432) Time Spent: 3h 10m (was: 3h) > HMS Server side filter > -- > > Key: HIVE-23786 > URL: https://issues.apache.org/jira/browse/HIVE-23786 > Project: Hive > Issue Type: Improvement >Reporter: Sam An >Assignee: Sam An >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > HMS server side filter of results based on authorization.
[jira] [Work logged] (HIVE-23786) HMS Server side filter
[ https://issues.apache.org/jira/browse/HIVE-23786?focusedWorklogId=460431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460431 ] ASF GitHub Bot logged work on HIVE-23786: - Author: ASF GitHub Bot Created on: 17/Jul/20 18:06 Start Date: 17/Jul/20 18:06 Worklog Time Spent: 10m Work Description: sam-an-cloudera commented on a change in pull request #1221: URL: https://github.com/apache/hive/pull/1221#discussion_r456596312 ## File path: ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientGetPartitionsTempTable.java ## @@ -123,13 +123,13 @@ public void testGetPartitionsByNamesEmptyParts() throws Exception { getClient().getPartitionsByNames(DB_NAME, TABLE_NAME, Lists.newArrayList("", "")); } - @Test(expected = MetaException.class) + @Test Review comment: I don't recall why we (Ramesh and I) made the change downstream here, but I will see if it can be reverted. I didn't change the HMS API per se; the MetaException could be thrown from getPartitionsByNames before my changes. Those are tests only. Anyway, let me see if I can change the tests back to the way they were, and if not, I will give justification. Issue Time Tracking --- Worklog Id: (was: 460431) Time Spent: 3h (was: 2h 50m) > HMS Server side filter > -- > > Key: HIVE-23786 > URL: https://issues.apache.org/jira/browse/HIVE-23786 > Project: Hive > Issue Type: Improvement >Reporter: Sam An >Assignee: Sam An >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > HMS server side filter of results based on authorization.
[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive
[ https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=460429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460429 ] ASF GitHub Bot logged work on HIVE-23716: - Author: ASF GitHub Bot Created on: 17/Jul/20 18:02 Start Date: 17/Jul/20 18:02 Worklog Time Spent: 10m Work Description: maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r456594608 ## File path: ql/src/test/results/clientpositive/llap/antijoin.q.out ## @@ -0,0 +1,1007 @@ +PREHOOK: query: create table t1_n55 as select cast(key as int) key, value from src where key <= 10 +PREHOOK: type: CREATETABLE_AS_SELECT +PREHOOK: Input: default@src +PREHOOK: Output: database:default +PREHOOK: Output: default@t1_n55 +POSTHOOK: query: create table t1_n55 as select cast(key as int) key, value from src where key <= 10 +POSTHOOK: type: CREATETABLE_AS_SELECT +POSTHOOK: Input: default@src +POSTHOOK: Output: database:default +POSTHOOK: Output: default@t1_n55 +POSTHOOK: Lineage: t1_n55.key EXPRESSION [(src)src.FieldSchema(name:key, type:string, comment:default), ] +POSTHOOK: Lineage: t1_n55.value SIMPLE [(src)src.FieldSchema(name:value, type:string, comment:default), ] +PREHOOK: query: select * from t1_n55 sort by key +PREHOOK: type: QUERY +PREHOOK: Input: default@t1_n55 + A masked pattern was here +POSTHOOK: query: select * from t1_n55 sort by key +POSTHOOK: type: QUERY +POSTHOOK: Input: default@t1_n55 + A masked pattern was here +0 val_0 Review comment: All these new test cases were added from the failing test cases of a dry run with anti join enabled. I have manually verified that the resulting records are the same and the plan differences are as per the expected behavior. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460429) Time Spent: 50m (was: 40m) > Support Anti Join in Hive > -- > > Key: HIVE-23716 > URL: https://issues.apache.org/jira/browse/HIVE-23716 > Project: Hive > Issue Type: Bug >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23716.01.patch > > Time Spent: 50m > Remaining Estimate: 0h > > Currently Hive does not support anti join. A query requiring an anti join is > converted to a left outer join, with a null filter on the right-side join key > added to get the desired result. This causes: > # Extra computation — the left outer join projects redundant columns from the > right side, and extra filtering is done to remove the redundant rows. An anti > join avoids this by projecting only the required columns and rows from the > left-side table. > # Extra shuffle — with an anti join, the duplicate records moved to the join > node can be eliminated at the child node. This can significantly reduce data > movement when the number of distinct rows (join keys) is small relative to > the total. > # Extra memory usage — for a map-based anti join, a hash set is sufficient, > since only the key is needed to check whether a record matches the join > condition. A left join needs the key and the non-key columns as well, so a > hash table is required. > For a query like > {code:java} > select wr_order_number FROM web_returns LEFT JOIN web_sales ON > wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code} > The number of distinct ws_order_number values in the web_sales table in a > typical 10TB TPC-DS setup is just 10% of the total records. So when we > convert this query to an anti join, only 600 million rows are moved to the > join node instead of 7 billion. > In the current patch, just one conversion is done: the pattern > project->filter->left-join is converted to project->anti-join. This takes > care of subqueries with a “not exists” clause, which are converted first to > filter + left-join and then to anti join. Queries with “not in” are not > handled in the current patch. > On the execution side, both merge join and map join with vectorized execution > are supported for anti join. -- This message was sent by Atlassian Jira (v8.3.4#803005)
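The trade-off described in the issue above can be illustrated outside Hive. The following is a minimal Java sketch with invented class and method names (not Hive code), showing why an anti join only needs a hash set of right-side join keys, while the left-outer-join-plus-IS-NULL plan probes a map that carries right-side data which is then filtered away:

```java
import java.util.*;

public class AntiJoinSketch {
    // Left-outer-join + IS NULL filter: the right side is kept as a map of
    // key -> right-side row, the join produces matched rows, and a filter
    // then discards every left row that found a match.
    static List<String> viaLeftJoin(List<String> leftKeys, Map<String, String> rightByKey) {
        List<String> out = new ArrayList<>();
        for (String k : leftKeys) {
            if (rightByKey.get(k) == null) { // right columns projected, then filtered away
                out.add(k);
            }
        }
        return out;
    }

    // Anti join: a hash set of right-side keys suffices; no right-side
    // columns are projected and no post-join filter is needed.
    static List<String> viaAntiJoin(List<String> leftKeys, Set<String> rightKeys) {
        List<String> out = new ArrayList<>();
        for (String k : leftKeys) {
            if (!rightKeys.contains(k)) {
                out.add(k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> returns = List.of("o1", "o2", "o3"); // stand-in for web_returns keys
        Map<String, String> sales = Map.of("o2", "row2"); // stand-in for web_sales
        System.out.println(viaLeftJoin(returns, sales));          // [o1, o3]
        System.out.println(viaAntiJoin(returns, sales.keySet())); // [o1, o3]
    }
}
```

Both plans return the same rows; the anti-join variant simply never materializes the right-side values, which mirrors the memory argument made in the description.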
[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive
[ https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=460427&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460427 ] ASF GitHub Bot logged work on HIVE-23716: - Author: ASF GitHub Bot Created on: 17/Jul/20 18:01 Start Date: 17/Jul/20 18:01 Worklog Time Spent: 10m Work Description: maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r456593908 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -2162,7 +2162,8 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal "Whether Hive enables the optimization about converting common join into mapjoin based on the input file size. \n" + "If this parameter is on, and the sum of size for n-1 of the tables/partitions for a n-way join is smaller than the\n" + "specified size, the join is directly converted to a mapjoin (there is no conditional task)."), - +HIVE_CONVERT_ANTI_JOIN("hive.auto.convert.anti.join", false, Review comment: Yes, I triggered a ptest run with this config enabled by default. There were some 26 failures. I analyzed them and made fixes to ensure that the results are the same in both cases and that the plan differences are as expected. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460427) Time Spent: 40m (was: 0.5h) > Support Anti Join in Hive > -- > > Key: HIVE-23716 > URL: https://issues.apache.org/jira/browse/HIVE-23716 > Project: Hive > Issue Type: Bug >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23716.01.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Currently Hive does not support anti join. A query requiring an anti join is > converted to a left outer join, with a null filter on the right-side join key > added to get the desired result. This causes: > # Extra computation — the left outer join projects redundant columns from the > right side, and extra filtering is done to remove the redundant rows. An anti > join avoids this by projecting only the required columns and rows from the > left-side table. > # Extra shuffle — with an anti join, the duplicate records moved to the join > node can be eliminated at the child node. This can significantly reduce data > movement when the number of distinct rows (join keys) is small relative to > the total. > # Extra memory usage — for a map-based anti join, a hash set is sufficient, > since only the key is needed to check whether a record matches the join > condition. A left join needs the key and the non-key columns as well, so a > hash table is required. > For a query like > {code:java} > select wr_order_number FROM web_returns LEFT JOIN web_sales ON > wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code} > The number of distinct ws_order_number values in the web_sales table in a > typical 10TB TPC-DS setup is just 10% of the total records. So when we > convert this query to an anti join, only 600 million rows are moved to the > join node instead of 7 billion. > In the current patch, just one conversion is done: the pattern > project->filter->left-join is converted to project->anti-join. This takes > care of subqueries with a “not exists” clause, which are converted first to > filter + left-join and then to anti join. Queries with “not in” are not > handled in the current patch. > On the execution side, both merge join and map join with vectorized execution > are supported for anti join. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive
[ https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=460416=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460416 ] ASF GitHub Bot logged work on HIVE-23716: - Author: ASF GitHub Bot Created on: 17/Jul/20 17:52 Start Date: 17/Jul/20 17:52 Worklog Time Spent: 10m Work Description: vineetgarg02 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r456588923 ## File path: ql/src/test/results/clientpositive/llap/antijoin.q.out ## @@ -0,0 +1,1007 @@ +PREHOOK: query: create table t1_n55 as select cast(key as int) key, value from src where key <= 10 +PREHOOK: type: CREATETABLE_AS_SELECT +PREHOOK: Input: default@src +PREHOOK: Output: database:default +PREHOOK: Output: default@t1_n55 +POSTHOOK: query: create table t1_n55 as select cast(key as int) key, value from src where key <= 10 +POSTHOOK: type: CREATETABLE_AS_SELECT +POSTHOOK: Input: default@src +POSTHOOK: Output: database:default +POSTHOOK: Output: default@t1_n55 +POSTHOOK: Lineage: t1_n55.key EXPRESSION [(src)src.FieldSchema(name:key, type:string, comment:default), ] +POSTHOOK: Lineage: t1_n55.value SIMPLE [(src)src.FieldSchema(name:value, type:string, comment:default), ] +PREHOOK: query: select * from t1_n55 sort by key +PREHOOK: type: QUERY +PREHOOK: Input: default@t1_n55 + A masked pattern was here +POSTHOOK: query: select * from t1_n55 sort by key +POSTHOOK: type: QUERY +POSTHOOK: Input: default@t1_n55 + A masked pattern was here +0 val_0 Review comment: How was the correctness of results verified? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460416) Time Spent: 0.5h (was: 20m) > Support Anti Join in Hive > -- > > Key: HIVE-23716 > URL: https://issues.apache.org/jira/browse/HIVE-23716 > Project: Hive > Issue Type: Bug >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23716.01.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently Hive does not support anti join. A query requiring an anti join is > converted to a left outer join, with a null filter on the right-side join key > added to get the desired result. This causes: > # Extra computation — the left outer join projects redundant columns from the > right side, and extra filtering is done to remove the redundant rows. An anti > join avoids this by projecting only the required columns and rows from the > left-side table. > # Extra shuffle — with an anti join, the duplicate records moved to the join > node can be eliminated at the child node. This can significantly reduce data > movement when the number of distinct rows (join keys) is small relative to > the total. > # Extra memory usage — for a map-based anti join, a hash set is sufficient, > since only the key is needed to check whether a record matches the join > condition. A left join needs the key and the non-key columns as well, so a > hash table is required. > For a query like > {code:java} > select wr_order_number FROM web_returns LEFT JOIN web_sales ON > wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code} > The number of distinct ws_order_number values in the web_sales table in a > typical 10TB TPC-DS setup is just 10% of the total records. So when we > convert this query to an anti join, only 600 million rows are moved to the > join node instead of 7 billion. > In the current patch, just one conversion is done: the pattern > project->filter->left-join is converted to project->anti-join. This takes > care of subqueries with a “not exists” clause, which are converted first to > filter + left-join and then to anti join. Queries with “not in” are not > handled in the current patch. > On the execution side, both merge join and map join with vectorized execution > are supported for anti join. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive
[ https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=460414&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460414 ] ASF GitHub Bot logged work on HIVE-23716: - Author: ASF GitHub Bot Created on: 17/Jul/20 17:51 Start Date: 17/Jul/20 17:51 Worklog Time Spent: 10m Work Description: vineetgarg02 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r456588241 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -2162,7 +2162,8 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal "Whether Hive enables the optimization about converting common join into mapjoin based on the input file size. \n" + "If this parameter is on, and the sum of size for n-1 of the tables/partitions for a n-way join is smaller than the\n" + "specified size, the join is directly converted to a mapjoin (there is no conditional task)."), - +HIVE_CONVERT_ANTI_JOIN("hive.auto.convert.anti.join", false, Review comment: @maheshk114 Have you run all the tests with this feature set to true by default? This change touches existing logic/code, and we should definitely run all the existing tests with this set to TRUE. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460414) Time Spent: 20m (was: 10m) > Support Anti Join in Hive > -- > > Key: HIVE-23716 > URL: https://issues.apache.org/jira/browse/HIVE-23716 > Project: Hive > Issue Type: Bug >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23716.01.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Currently Hive does not support anti join. A query requiring an anti join is > converted to a left outer join, with a null filter on the right-side join key > added to get the desired result. This causes: > # Extra computation — the left outer join projects redundant columns from the > right side, and extra filtering is done to remove the redundant rows. An anti > join avoids this by projecting only the required columns and rows from the > left-side table. > # Extra shuffle — with an anti join, the duplicate records moved to the join > node can be eliminated at the child node. This can significantly reduce data > movement when the number of distinct rows (join keys) is small relative to > the total. > # Extra memory usage — for a map-based anti join, a hash set is sufficient, > since only the key is needed to check whether a record matches the join > condition. A left join needs the key and the non-key columns as well, so a > hash table is required. > For a query like > {code:java} > select wr_order_number FROM web_returns LEFT JOIN web_sales ON > wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code} > The number of distinct ws_order_number values in the web_sales table in a > typical 10TB TPC-DS setup is just 10% of the total records. So when we > convert this query to an anti join, only 600 million rows are moved to the > join node instead of 7 billion. > In the current patch, just one conversion is done: the pattern > project->filter->left-join is converted to project->anti-join. This takes > care of subqueries with a “not exists” clause, which are converted first to > filter + left-join and then to anti join. Queries with “not in” are not > handled in the current patch. > On the execution side, both merge join and map join with vectorized execution > are supported for anti join. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17160110#comment-17160110 ] Syed Shameerur Rahman commented on HIVE-23873: -- [~chiran54321] Do you see the same issue with the Hive master branch? > Querying Hive JDBCStorageHandler table fails with NPE > - > > Key: HIVE-23873 > URL: https://issues.apache.org/jira/browse/HIVE-23873 > Project: Hive > Issue Type: Bug > Components: HiveServer2, JDBC >Affects Versions: 3.1.0, 3.1.1, 3.1.2 >Reporter: Chiran Ravani >Assignee: Chiran Ravani >Priority: Critical > Attachments: HIVE-23873.01.patch > > > The scenario: a Hive table has the same schema as a table in Oracle, but > querying the table with data fails with an NPE; below is the trace. > {code} > Caused by: java.io.IOException: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473) > ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > ... 
34 more > Caused by: java.lang.NullPointerException > at > org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) > ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473) > ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > ... 34 more > {code} > The problem appears when the column names in Oracle are in upper case: Hive > forces table and column names to lower case during creation, so the user > runs into an NPE while fetching data. 
> While deserializing data, the input row's keys are the lower-case column > names, so the lookup fails to get the value: > https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136 > {code} > rowVal = ((ObjectWritable)value).get(); > {code} > Log snippet: > = > {code} > 2020-07-17T16:49:09,598 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 > HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) > - Query to execute is [select * from TESTHIVEJDBCSTORAGE] > 2020-07-17T16:49:10,642 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 > HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = > ID > 2020-07-17T16:49:10,642 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 > HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value > = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class > java.lang.Integer,value=1]} > {code} > A simple reproducer for this case: > = > 1. Create the table in Oracle: > {code} > create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20)); > {code} > 2. Insert dummy data: > {code} > Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1'); > {code} > 3. Create the JDBCStorageHandler table in Hive: > {code} > CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME > VARCHAR(20)) > STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' > TBLPROPERTIES ( > "hive.sql.database.type" = "ORACLE", > "hive.sql.jdbc.driver" = "oracle.jdbc.OracleDriver", > "hive.sql.jdbc.url" = "jdbc:oracle:thin:@orachehostname/XE", > "hive.sql.dbcp.username" = "chiran", > "hive.sql.dbcp.password" = "supersecurepassword", >
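The case mismatch described above (Hive's lower-case column names versus the database's upper-case row keys) suggests normalizing the keys before lookup. The following is a hypothetical Java illustration with invented names, not the actual HIVE-23873 patch:

```java
import java.util.*;

public class ColumnKeySketch {
    // Normalize the row's keys to lower case once, so a Hive column name
    // ("id") can find a value that the database returned under an
    // upper-case key ("ID"). Without this, get("id") returns null and the
    // caller dereferences it, producing the NPE from the report.
    static Map<String, Object> lowerCaseKeys(Map<String, Object> row) {
        Map<String, Object> normalized = new HashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            normalized.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Stand-in for a row fetched from Oracle, keyed by upper-case names.
        Map<String, Object> oracleRow = new HashMap<>();
        oracleRow.put("ID", 1);
        oracleRow.put("FNAME", "Name1");

        Map<String, Object> row = lowerCaseKeys(oracleRow);
        System.out.println(row.get("id"));    // 1 (instead of null)
        System.out.println(row.get("fname")); // Name1
    }
}
```

A lookup that tries the exact key first and falls back to the lower-cased key would be an alternative design; either way the point is that the deserializer must not assume the database's key case matches Hive's.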
[jira] [Work logged] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?focusedWorklogId=460404=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460404 ] ASF GitHub Bot logged work on HIVE-23871: - Author: ASF GitHub Bot Created on: 17/Jul/20 17:32 Start Date: 17/Jul/20 17:32 Worklog Time Spent: 10m Work Description: mustafaiman commented on a change in pull request #1273: URL: https://github.com/apache/hive/pull/1273#discussion_r456570304 ## File path: data/files/table1 ## @@ -0,0 +1,5 @@ +1 Acura 4 Review comment: can we give the file a non generic name? ## File path: ql/src/test/results/clientpositive/llap/load_micromanaged_delim.q.out ## @@ -0,0 +1,186 @@ + A masked pattern was here +PREHOOK: type: CREATETABLE + A masked pattern was here +PREHOOK: Output: database:default +PREHOOK: Output: default@delim_table_ext + A masked pattern was here +POSTHOOK: type: CREATETABLE + A masked pattern was here +POSTHOOK: Output: database:default +POSTHOOK: Output: default@delim_table_ext +PREHOOK: query: describe formatted delim_table_ext +PREHOOK: type: DESCTABLE +PREHOOK: Input: default@delim_table_ext +POSTHOOK: query: describe formatted delim_table_ext +POSTHOOK: type: DESCTABLE +POSTHOOK: Input: default@delim_table_ext +# col_name data_type comment +id int +name string +safety int + +# Detailed Table Information +Database: default + A masked pattern was here +Retention: 0 + A masked pattern was here +Table Type:EXTERNAL_TABLE +Table Parameters: + EXTERNALTRUE + bucketing_version 2 + numFiles1 + totalSize 52 + A masked pattern was here + +# Storage Information +SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe +InputFormat: org.apache.hadoop.mapred.TextInputFormat +OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat +Compressed:No +Num Buckets: -1 +Bucket Columns:[] +Sort Columns: [] +Storage Desc Params: + field.delim \t + serialization.format\t +PREHOOK: query: SELECT * FROM delim_table_ext +PREHOOK: type: QUERY +PREHOOK: Input: 
default@delim_table_ext + A masked pattern was here +POSTHOOK: query: SELECT * FROM delim_table_ext +POSTHOOK: type: QUERY +POSTHOOK: Input: default@delim_table_ext + A masked pattern was here +1 Acura 4 +2 Toyota 3 +3 Tesla 5 +4 Honda 5 +11 Mazda 2 +PREHOOK: query: CREATE TABLE delim_table_micro(id INT, name STRING, safety INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE TBLPROPERTIES('transactional'='true', "transactional_properties"="insert_only") +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@delim_table_micro +POSTHOOK: query: CREATE TABLE delim_table_micro(id INT, name STRING, safety INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE TBLPROPERTIES('transactional'='true', "transactional_properties"="insert_only") +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@delim_table_micro + A masked pattern was here +PREHOOK: type: LOAD + A masked pattern was here +PREHOOK: Output: default@delim_table_micro + A masked pattern was here +POSTHOOK: type: LOAD + A masked pattern was here +POSTHOOK: Output: default@delim_table_micro +PREHOOK: query: describe formatted delim_table_micro +PREHOOK: type: DESCTABLE +PREHOOK: Input: default@delim_table_micro +POSTHOOK: query: describe formatted delim_table_micro +POSTHOOK: type: DESCTABLE +POSTHOOK: Input: default@delim_table_micro +# col_name data_type comment +id int +name string +safety int + +# Detailed Table Information +Database: default + A masked pattern was here +Retention: 0 + A masked pattern
[jira] [Work logged] (HIVE-23324) Parallelise compaction directory cleaning process
[ https://issues.apache.org/jira/browse/HIVE-23324?focusedWorklogId=460388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460388 ] ASF GitHub Bot logged work on HIVE-23324: - Author: ASF GitHub Bot Created on: 17/Jul/20 17:09 Start Date: 17/Jul/20 17:09 Worklog Time Spent: 10m Work Description: adesh-rao opened a new pull request #1275: URL: https://github.com/apache/hive/pull/1275 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460388) Remaining Estimate: 0h Time Spent: 10m > Parallelise compaction directory cleaning process > - > > Key: HIVE-23324 > URL: https://issues.apache.org/jira/browse/HIVE-23324 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Adesh Kumar Rao >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Initiator processes the various compaction candidates in parallel, so we > could follow a similar approach in Cleaner where we currently clean the > directories sequentially. -- This message was sent by Atlassian Jira (v8.3.4#803005)
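The proposal above (cleaning compaction candidates in parallel instead of sequentially) could be sketched with a standard thread pool. This is an illustrative sketch with invented names, not the actual Hive Cleaner code; a real implementation would also need per-candidate locking and failure handling:

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelCleanerSketch {
    // Sequential version: for (String dir : candidates) clean(dir);
    // Parallel variant: submit each cleanup to a pool and wait for all of
    // them, so one slow directory no longer blocks the rest of the queue.
    static List<String> cleanAll(List<String> candidates, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String dir : candidates) {
                // The lambda stands in for the actual directory removal.
                futures.add(pool.submit(() -> "cleaned:" + dir));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // propagate per-candidate failures
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(cleanAll(Arrays.asList("base_1", "delta_2_2"), 2));
    }
}
```

Collecting the futures in submission order keeps the result deterministic even though the cleanups themselves run concurrently.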
[jira] [Updated] (HIVE-23324) Parallelise compaction directory cleaning process
[ https://issues.apache.org/jira/browse/HIVE-23324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23324: -- Labels: pull-request-available (was: ) > Parallelise compaction directory cleaning process > - > > Key: HIVE-23324 > URL: https://issues.apache.org/jira/browse/HIVE-23324 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Initiator processes the various compaction candidates in parallel, so we > could follow a similar approach in Cleaner where we currently clean the > directories sequentially. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chiran Ravani updated HIVE-23873: - Description: The scenario: a Hive table has the same schema as a table in Oracle, but querying the table with data fails with an NPE; below is the trace. {code} Caused by: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473) ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] ... 
34 more Caused by: java.lang.NullPointerException at org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473) ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] ... 34 more {code} The problem appears when the column names in Oracle are in upper case: Hive forces table and column names to lower case during creation, so the user runs into an NPE while fetching data. 
While deserializing data, the input row's keys are the lower-case column names, so the lookup fails to get the value: https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136 {code} rowVal = ((ObjectWritable)value).get(); {code} Log snippet: = {code} 2020-07-17T16:49:09,598 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) - Query to execute is [select * from TESTHIVEJDBCSTORAGE] 2020-07-17T16:49:10,642 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = ID 2020-07-17T16:49:10,642 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class java.lang.Integer,value=1]} {code} A simple reproducer for this case: = 1. Create the table in Oracle: {code} create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20)); {code} 2. Insert dummy data: {code} Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1'); {code} 3. Create the JDBCStorageHandler table in Hive: {code} CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME VARCHAR(20)) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "ORACLE", "hive.sql.jdbc.driver" = "oracle.jdbc.OracleDriver", "hive.sql.jdbc.url" = "jdbc:oracle:thin:@orachehostname/XE", "hive.sql.dbcp.username" = "chiran", "hive.sql.dbcp.password" = "supersecurepassword", "hive.sql.table" = "TESTHIVEJDBCSTORAGE", "hive.sql.dbcp.maxActive" = "1" ); {code} 4. Query the Hive table; it fails with an NPE. 
{code} > select * from default.TESTHIVEJDBCSTORAGE_HIVE_TBL; INFO : Compiling command(queryId=hive_20200717164857_cd6f5020-4a69-4a2d-9e63-9db99d0121bc): select * from default.TESTHIVEJDBCSTORAGE_HIVE_TBL INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:testhivejdbcstorage_hive_tbl.id, type:int, comment:null), FieldSchema(name:testhivejdbcstorage_hive_tbl.fname, type:varchar(20), comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20200717164857_cd6f5020-4a69-4a2d-9e63-9db99d0121bc); Time taken: 9.914 seconds INFO : Executing
[jira] [Updated] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chiran Ravani updated HIVE-23873: - Attachment: HIVE-23873.01.patch Status: Patch Available (was: Open) > Querying Hive JDBCStorageHandler table fails with NPE > - > > Key: HIVE-23873 > URL: https://issues.apache.org/jira/browse/HIVE-23873 > Project: Hive > Issue Type: Bug > Components: HiveServer2, JDBC >Affects Versions: 3.1.2, 3.1.1, 3.1.0 >Reporter: Chiran Ravani >Assignee: Chiran Ravani >Priority: Critical > Attachments: HIVE-23873.01.patch > > > The scenario is a Hive table with the same schema as a table in Oracle; however, querying the table once it holds data fails with an NPE. Below is the trace. > {code} > Caused by: java.io.IOException: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473) > ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > ... 
34 more > Caused by: java.lang.NullPointerException > at > org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) > ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473) > ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > ... 34 more > {code} > The problem appears when column names in Oracle are in upper case: Hive forces table and column names to lower case during creation, so the user runs into an NPE while fetching data. 
> While deserializing data, the input consists of column names in lower case, so the lookup by the upper-case column key fails to get the value > https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136 > {code} > rowVal = ((ObjectWritable)value).get(); > {code} > Log snippet: > = > {code} > 2020-07-17T16:49:09,598 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 > HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) > - Query to execute is [select * from TESTHIVEJDBCSTORAGE] > 2020-07-17T16:49:10,642 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 > HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = > ID > 2020-07-17T16:49:10,642 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 > HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value > = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class > java.lang.Integer,value=1]} > {code} > Simple Reproducer for this case. > = > 1. Create table in Oracle > {code} > create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20)); > {code} > 2. Insert dummy data. > {code} > Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1'); > {code} > 3. Create JDBCStorageHandler table in Hive. > {code} > CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME > VARCHAR(20)) > STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' > TBLPROPERTIES ( > "hive.sql.database.type" = "ORACLE", > "hive.sql.jdbc.driver" = "oracle.jdbc.OracleDriver", > "hive.sql.jdbc.url" = "jdbc:oracle:thin:@10.96.95.99:49161/XE", > "hive.sql.dbcp.username" = "chiran", > "hive.sql.dbcp.password" = "hadoop", > "hive.sql.table" = "TESTHIVEJDBCSTORAGE", > "hive.sql.dbcp.maxActive" = "1" > ); > {code} > 4. Query Hive table, fails with NPE. > {code} > > select * from
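The logged blob above shows the mismatch directly: the deserializer looks up the upper-case column key `ID`, while the ObjectWritable map is keyed by the lower-cased Hive names (`id`, `fname`). The following is a hypothetical Java sketch of the failure mode and a case-insensitive fallback; it is illustrative only and is not the actual JdbcSerDe code or the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual JdbcSerDe code: the row map is keyed by
// Hive's lower-cased column names, while the lookup key comes from the database
// in upper case, so a plain get() returns null and the later dereference of the
// returned ObjectWritable throws a NullPointerException.
public class ColumnCaseDemo {

    // Case-insensitive fallback: try the exact key first, then scan ignoring case.
    static Object lookup(Map<String, Object> row, String columnKey) {
        Object value = row.get(columnKey);
        if (value == null) {
            for (Map.Entry<String, Object> e : row.entrySet()) {
                if (e.getKey().equalsIgnoreCase(columnKey)) {
                    return e.getValue();
                }
            }
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);               // Hive lower-cases column names at creation
        row.put("fname", "Name1");
        System.out.println(row.get("ID"));       // null: the pre-patch failure mode
        System.out.println(lookup(row, "ID"));   // 1: case-insensitive fallback
    }
}
```

A HashMap lookup is strictly case-sensitive, which is why normalizing (or case-insensitively matching) the key on one side is enough to avoid the NPE.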
[jira] [Updated] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chiran Ravani updated HIVE-23873: - Attachment: (was: HIVE-23873.01.patch) > Querying Hive JDBCStorageHandler table fails with NPE > -
[jira] [Updated] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chiran Ravani updated HIVE-23873: - Attachment: HIVE-23873.01.patch > Querying Hive JDBCStorageHandler table fails with NPE > -
[jira] [Assigned] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chiran Ravani reassigned HIVE-23873: Assignee: Chiran Ravani > Querying Hive JDBCStorageHandler table fails with NPE > -
[jira] [Commented] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159994#comment-17159994 ] Jean-Daniel Cryans commented on HIVE-23871: --- Thanks for taking care of this, [~pgaref], it's a pretty bad issue. > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: table1 > > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does that for all Transactional Tables – not only ACID – causing > MicroManaged Tables to behave abnormally. > MicroManaged (insert_only) tables may miss needed properties such as Storage > Desc Params – that may define how lines are delimited (like in the example > below): > To repro the issue: > {code:java} > CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; > LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans; > describe formatted delim_table_trans; > SELECT * FROM delim_table_trans; > {code} > Result: > {code:java} > Table Type: MANAGED_TABLE > Table Parameters: > bucketing_version 2 > numFiles 1 > numRows 0 > rawDataSize 0 > totalSize 72 > transactional true > transactional_properties insert_only > A masked pattern was here > > # Storage Information > SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > > InputFormat: org.apache.hadoop.mapred.TextInputFormat > OutputFormat: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > Compressed: No > Num Buckets: -1 > Bucket Columns: [] > Sort Columns: [] > PREHOOK: 
query: SELECT * FROM delim_table_trans > PREHOOK: type: QUERY > PREHOOK: Input: default@delim_table_trans > A masked pattern was here > POSTHOOK: query: SELECT * FROM delim_table_trans > POSTHOOK: type: QUERY > POSTHOOK: Input: default@delim_table_trans > A masked pattern was here > NULL NULL NULL > NULL NULL NULL > NULL NULL NULL > NULL NULL NULL > NULL NULL NULL > NULL NULL NULL > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
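The distinction the fix needs to draw can be sketched as a predicate over the table parameters shown in the describe output above. This is a hypothetical illustration, not the ObjectStore's actual API or the patch itself: the HIVE-23281 conversion shortcut may only drop storage-descriptor details for full ACID tables, never for insert-only (MicroManaged) ones.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not ObjectStore code: decide from the table parameters
// whether the StorageDescriptor shortcut may skip serde/skew/bucket details.
// Insert-only (MicroManaged) tables must keep them, e.g. the field delimiter
// whose loss makes the rows above come back as NULLs.
public class SdConversionDemo {
    static boolean canSkipSdDetails(Map<String, String> tableParams) {
        boolean transactional = "true".equalsIgnoreCase(tableParams.get("transactional"));
        boolean insertOnly =
            "insert_only".equalsIgnoreCase(tableParams.get("transactional_properties"));
        return transactional && !insertOnly;   // shortcut for full ACID only
    }

    public static void main(String[] args) {
        Map<String, String> micro = new HashMap<>();
        micro.put("transactional", "true");
        micro.put("transactional_properties", "insert_only");
        System.out.println(canSkipSdDetails(micro));   // false: keep the SD params
    }
}
```

The buggy behavior corresponds to checking only the `transactional` flag; adding the `insert_only` test is the minimal correction the bug report implies.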
[jira] [Commented] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present
[ https://issues.apache.org/jira/browse/HIVE-23850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159993#comment-17159993 ] Zhihua Deng commented on HIVE-23850: Thanks a lot for the help and review, [~jcamachorodriguez]! > Allow PPD when subject is not a column with grouping sets present > - > > Key: HIVE-23850 > URL: https://issues.apache.org/jira/browse/HIVE-23850 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > After [HIVE-19653|https://issues.apache.org/jira/browse/HIVE-19653], filters > with only columns and constants are pushed down, but in some cases this does > not work, for example: > SET hive.cbo.enable=false; > SELECT a, b, sum(s) > FROM T1 > GROUP BY a, b GROUPING SETS ((a), (a, b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > SELECT upper(a), b, sum(s) > FROM T1 > GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > The filters pushed down to the GBY can be f(gbyKey), i.e. a group-by key > wrapped in a UDF, not only plain group-by key columns. -- This message was sent by Atlassian Jira (v8.3.4#803005)
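The rule this issue relaxes can be sketched as a set check over whole group-by expressions rather than bare column names. The following Java sketch is hypothetical (the names and string representation of expressions are illustrative, not Hive's optimizer code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Hive's optimizer code: a HAVING conjunct is pushable
// below the group-by when every aggregation-free expression it references matches
// one of the group-by expressions -- comparing whole expressions such as
// "upper(a)", not only bare columns.
public class PpdSketch {
    static boolean canPushDown(Set<String> referencedExprs, Set<String> groupByExprs) {
        return groupByExprs.containsAll(referencedExprs);
    }

    public static void main(String[] args) {
        Set<String> gby = new HashSet<>(Arrays.asList("upper(a)", "b"));
        // upper(a) = "AAA" references only a group-by expression: pushable.
        System.out.println(canPushDown(new HashSet<>(Arrays.asList("upper(a)")), gby));
        // sum(s) > 100 references an aggregate, not a group-by key: not pushable.
        System.out.println(canPushDown(new HashSet<>(Arrays.asList("sum(s)")), gby));
    }
}
```

In the second query above, `upper(a) = "AAA"` references only `upper(a)`, which is itself a group-by key, so the conjunct can be evaluated below the aggregation, while `sum(s) > 100` must stay above it.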
[jira] [Work logged] (HIVE-23869) Move alter statements in parser to new file
[ https://issues.apache.org/jira/browse/HIVE-23869?focusedWorklogId=460351=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460351 ] ASF GitHub Bot logged work on HIVE-23869: - Author: ASF GitHub Bot Created on: 17/Jul/20 15:16 Start Date: 17/Jul/20 15:16 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #1270: URL: https://github.com/apache/hive/pull/1270 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460351) Time Spent: 20m (was: 10m) > Move alter statements in parser to new file > --- > > Key: HIVE-23869 > URL: https://issues.apache.org/jira/browse/HIVE-23869 > Project: Hive > Issue Type: Improvement > Components: Parser >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Critical > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > We are hitting the HiveParser 'code too large' problem. HIVE-23857 introduced > an ad hoc script to solve it. Instead, we can split HiveParser.g into > smaller files. For instance, we can group all alter statements into their own > .g file. > This patch also fixes an ambiguity warning related to LIKE > ALL/ANY clauses. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-23869) Move alter statements in parser to new file
[ https://issues.apache.org/jira/browse/HIVE-23869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-23869 started by Jesus Camacho Rodriguez. -- > Move alter statements in parser to new file > --- -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23869) Move alter statements in parser to new file
[ https://issues.apache.org/jira/browse/HIVE-23869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-23869. Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master, thanks for reviewing [~mgergely]. > Move alter statements in parser to new file > --- -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23852) Natively support Date type in ReduceSink operator
[ https://issues.apache.org/jira/browse/HIVE-23852?focusedWorklogId=460352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460352 ] ASF GitHub Bot logged work on HIVE-23852: - Author: ASF GitHub Bot Created on: 17/Jul/20 15:16 Start Date: 17/Jul/20 15:16 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1274: URL: https://github.com/apache/hive/pull/1274 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute Issue Time Tracking --- Worklog Id: (was: 460352) Time Spent: 50m (was: 40m) > Natively support Date type in ReduceSink operator > - > > Key: HIVE-23852 > URL: https://issues.apache.org/jira/browse/HIVE-23852 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > There is no native support currently, meaning that these types end up being > serialized as multi-key columns, which is much slower (iterating through batch > columns instead of writing a value directly). -- This message was sent by Atlassian Jira (v8.3.4#803005)
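For context, the native path the issue describes amounts to writing a DATE key as its single underlying integer instead of iterating over batch columns per row. The sketch below is hypothetical and only illustrates the representation, not Hive's actual serializer:

```java
import java.time.LocalDate;

// Hypothetical sketch, not Hive's serializer code: a DATE value reduces to one
// long (days since 1970-01-01) that sorts in date order, so it can be written
// directly as a key instead of taking the generic multi-key column path.
public class DateKeyDemo {
    static long toSortableKey(LocalDate d) {
        return d.toEpochDay();   // single primitive, preserves ordering
    }

    public static void main(String[] args) {
        System.out.println(toSortableKey(LocalDate.of(1970, 1, 2)));   // 1
        System.out.println(toSortableKey(LocalDate.of(1969, 12, 31))); // -1
    }
}
```

Because the epoch-day long preserves chronological ordering, a ReduceSink-style sort key needs no per-column iteration for DATE values.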
[jira] [Work logged] (HIVE-23852) Natively support Date type in ReduceSink operator
[ https://issues.apache.org/jira/browse/HIVE-23852?focusedWorklogId=460350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460350 ] ASF GitHub Bot logged work on HIVE-23852: - Author: ASF GitHub Bot Created on: 17/Jul/20 15:14 Start Date: 17/Jul/20 15:14 Worklog Time Spent: 10m Work Description: pgaref closed pull request #1257: URL: https://github.com/apache/hive/pull/1257 Issue Time Tracking --- Worklog Id: (was: 460350) Time Spent: 40m (was: 0.5h) > Natively support Date type in ReduceSink operator > - -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23871: -- Labels: pull-request-available (was: ) > ObjectStore should properly handle MicroManaged Table properties > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?focusedWorklogId=460349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460349 ] ASF GitHub Bot logged work on HIVE-23871: - Author: ASF GitHub Bot Created on: 17/Jul/20 15:13 Start Date: 17/Jul/20 15:13 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1273: URL: https://github.com/apache/hive/pull/1273 ObjectStore should properly handle MicroManaged Table properties Change-Id: Ia5db047419a11504f3c6047a1eb63acd2a14bdc3 Issue Time Tracking --- Worklog Id: (was: 460349) Remaining Estimate: 0h Time Spent: 10m > ObjectStore should properly handle MicroManaged Table properties > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present
[ https://issues.apache.org/jira/browse/HIVE-23850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-23850: --- Comment: was deleted (was: Commit id is https://github.com/apache/hive/commit/44aa72f096639d7b1a52ef18887016af98bd6999 . I missed the JIRA number in the commit message.) > Allow PPD when subject is not a column with grouping sets present > - > > Key: HIVE-23850 > URL: https://issues.apache.org/jira/browse/HIVE-23850 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > After [HIVE-19653|https://issues.apache.org/jira/browse/HIVE-19653], filters > with only columns and constants are pushed down, but in some cases, this may > not work as well, for example: > SET hive.cbo.enable=false; > SELECT a, b, sum(s) > FROM T1 > GROUP BY a, b GROUPING SETS ((a), (a, b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > SELECT upper(a), b, sum(s) > FROM T1 > GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > The filters pushed down to GBY can be f(gbyKey) or gbyKey with udf , not > only the column groupby keys. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present
[ https://issues.apache.org/jira/browse/HIVE-23850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159985#comment-17159985 ] Jesus Camacho Rodriguez commented on HIVE-23850: Commit id is https://github.com/apache/hive/commit/44aa72f096639d7b1a52ef18887016af98bd6999 . I missed the JIRA number in the commit message. > Allow PPD when subject is not a column with grouping sets present > - > > Key: HIVE-23850 > URL: https://issues.apache.org/jira/browse/HIVE-23850 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > After [HIVE-19653|https://issues.apache.org/jira/browse/HIVE-19653], filters > with only columns and constants are pushed down, but in some cases, this may > not work as well, for example: > SET hive.cbo.enable=false; > SELECT a, b, sum(s) > FROM T1 > GROUP BY a, b GROUPING SETS ((a), (a, b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > SELECT upper(a), b, sum(s) > FROM T1 > GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > The filters pushed down to GBY can be f(gbyKey) or gbyKey with udf , not > only the column groupby keys. -- This message was sent by Atlassian Jira (v8.3.4#803005)
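The description above reduces to a membership test: a HAVING conjunct can be pushed below the group-by (grouping sets included) when every non-aggregate expression it references is itself one of the group-by key expressions, whether that key is a bare column (`a`) or a function of one (`upper(a)`). A toy Java sketch of that test over expressions modeled as plain strings; this is a hypothetical simplification, not Hive's actual operator-tree walk:

```java
import java.util.List;
import java.util.Set;

public class GbyPushdownCheck {

    // Toy model: an expression is just its text, e.g. "a" or "upper(a)".
    // A predicate is pushable below the group-by when every expression it
    // references appears among the group-by key expressions.
    static boolean isPushable(List<String> referencedExprs, Set<String> gbyKeyExprs) {
        return gbyKeyExprs.containsAll(referencedExprs);
    }

    public static void main(String[] args) {
        // GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b))
        Set<String> keys = Set.of("upper(a)", "b");

        // HAVING upper(a) = "AAA" references only upper(a): pushable.
        System.out.println(isPushable(List.of("upper(a)"), keys)); // true

        // HAVING sum(s) > 100 references an aggregate, not a key: stays above.
        System.out.println(isPushable(List.of("sum(s)"), keys));   // false
    }
}
```

Before HIVE-23850 the check effectively accepted only bare column keys; the fix widens it so `f(gbyKey)` predicates like `upper(a) = "AAA"` qualify too.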
[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=460347=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460347 ] ASF GitHub Bot logged work on HIVE-23851: - Author: ASF GitHub Bot Created on: 17/Jul/20 15:05 Start Date: 17/Jul/20 15:05 Worklog Time Spent: 10m Work Description: shameersss1 commented on a change in pull request #1271: URL: https://github.com/apache/hive/pull/1271#discussion_r456501029 ## File path: standalone-metastore/metastore-server/pom.xml ## @@ -204,6 +204,11 @@ hive-storage-api ${storage-api.version} + + org.apache.hive + hive-serde Review comment: @kgyrtkirk Is this change okay? I mean we could have used reflection again to call serde classes but it will make code more complex and non-readable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460347) Time Spent: 0.5h (was: 20m) > MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions > > > Key: HIVE-23851 > URL: https://issues.apache.org/jira/browse/HIVE-23851 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > *Steps to reproduce:* > # Create external table > # Run msck command to sync all the partitions with metastore > # Remove one of the partition path > # Run msck repair with partition filtering > *Stack Trace:* > {code:java} > 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] > ppr.PartitionExpressionForMetastore: Failed to deserialize the expression > java.lang.IndexOutOfBoundsException: Index: 110, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) 
~[?:1.8.0_192] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_192] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_192] > {code} > *Cause:* > For MSCK REPAIR with partition filtering we expect the expression proxy > class to be set to PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78 > ), while when dropping a partition we serialize the drop-partition filter > expression as ( >
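The cause described above is a configuration-driven class mismatch: the metastore instantiates whichever expression-proxy implementation it is configured with, so an expression serialized for one proxy (here, Kryo for PartitionExpressionForMetastore) fails to deserialize when a different proxy handles it. A simplified, self-contained Java sketch of that reflection-based lookup; the interface and class names are stand-ins for Hive's PartitionExpressionProxy machinery, not its real API:

```java
public class ProxyLookup {

    // Stand-in for Hive's expression-proxy abstraction: converts a
    // serialized partition-filter expression into a filter string.
    interface ExpressionProxy {
        String describe();
    }

    // A trivial implementation to instantiate by name.
    public static class NoopProxy implements ExpressionProxy {
        public String describe() {
            return "noop";
        }
    }

    // The metastore resolves the proxy class from configuration at runtime;
    // if client and server disagree on this class, deserialization of the
    // client's expression bytes can fail as in the stack trace above.
    static ExpressionProxy createProxy(String className) {
        try {
            return (ExpressionProxy) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Cannot instantiate expression proxy " + className, e);
        }
    }

    public static void main(String[] args) {
        ExpressionProxy p = createProxy("ProxyLookup$NoopProxy");
        System.out.println(p.describe());
    }
}
```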
[jira] [Work logged] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present
[ https://issues.apache.org/jira/browse/HIVE-23850?focusedWorklogId=460345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460345 ] ASF GitHub Bot logged work on HIVE-23850: - Author: ASF GitHub Bot Created on: 17/Jul/20 15:04 Start Date: 17/Jul/20 15:04 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #1255: URL: https://github.com/apache/hive/pull/1255 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460345) Time Spent: 50m (was: 40m) > Allow PPD when subject is not a column with grouping sets present > - > > Key: HIVE-23850 > URL: https://issues.apache.org/jira/browse/HIVE-23850 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > After [HIVE-19653|https://issues.apache.org/jira/browse/HIVE-19653], filters > with only columns and constants are pushed down, but in some cases, this may > not work as well, for example: > SET hive.cbo.enable=false; > SELECT a, b, sum(s) > FROM T1 > GROUP BY a, b GROUPING SETS ((a), (a, b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > SELECT upper(a), b, sum(s) > FROM T1 > GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > The filters pushed down to GBY can be f(gbyKey) or gbyKey with udf , not > only the column groupby keys. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present
[ https://issues.apache.org/jira/browse/HIVE-23850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-23850. Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master, thanks for your contribution [~dengzh]! > Allow PPD when subject is not a column with grouping sets present > - > > Key: HIVE-23850 > URL: https://issues.apache.org/jira/browse/HIVE-23850 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > After [HIVE-19653|https://issues.apache.org/jira/browse/HIVE-19653], filters > with only columns and constants are pushed down, but in some cases, this may > not work as well, for example: > SET hive.cbo.enable=false; > SELECT a, b, sum(s) > FROM T1 > GROUP BY a, b GROUPING SETS ((a), (a, b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > SELECT upper(a), b, sum(s) > FROM T1 > GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > The filters pushed down to GBY can be f(gbyKey) or gbyKey with udf , not > only the column groupby keys. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=460344=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460344 ] ASF GitHub Bot logged work on HIVE-23851: - Author: ASF GitHub Bot Created on: 17/Jul/20 15:04 Start Date: 17/Jul/20 15:04 Worklog Time Spent: 10m Work Description: shameersss1 commented on a change in pull request #1271: URL: https://github.com/apache/hive/pull/1271#discussion_r456500229 ## File path: ql/src/test/org/apache/hadoop/hive/ql/exec/TestPartitionManagement.java ## @@ -15,7 +15,7 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.hadoop.hive.metastore; +package org.apache.hadoop.hive.ql.exec; Review comment: Moved TestPartitionManagement.java to ql module due to dependency on PartitionExpressionForMetastore and some other ql class for serializing partition expression. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460344) Time Spent: 20m (was: 10m) > MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions > > > Key: HIVE-23851 > URL: https://issues.apache.org/jira/browse/HIVE-23851 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > *Steps to reproduce:* > # Create external table > # Run msck command to sync all the partitions with metastore > # Remove one of the partition path > # Run msck repair with partition filtering > *Stack Trace:* > {code:java} > 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] > ppr.PartitionExpressionForMetastore: Failed to deserialize the expression > java.lang.IndexOutOfBoundsException: Index: 110, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_192] > {code} > *Cause:* > In case of msck repair with partition filtering we expect expression proxy > class to be set as PartitionExpressionForMetastore ( >
[jira] [Assigned] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present
[ https://issues.apache.org/jira/browse/HIVE-23850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reassigned HIVE-23850: -- Assignee: Zhihua Deng > Allow PPD when subject is not a column with grouping sets present > - > > Key: HIVE-23850 > URL: https://issues.apache.org/jira/browse/HIVE-23850 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > After [HIVE-19653|https://issues.apache.org/jira/browse/HIVE-19653], filters > with only columns and constants are pushed down, but in some cases, this may > not work as well, for example: > SET hive.cbo.enable=false; > SELECT a, b, sum(s) > FROM T1 > GROUP BY a, b GROUPING SETS ((a), (a, b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > SELECT upper(a), b, sum(s) > FROM T1 > GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > The filters pushed down to GBY can be f(gbyKey) or gbyKey with udf , not > only the column groupby keys. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23868) Windowing function spec: support 0 preceeding/following
[ https://issues.apache.org/jira/browse/HIVE-23868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-23868: --- Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to master, thanks [~jdere]! > Windowing function spec: support 0 preceeding/following > --- > > Key: HIVE-23868 > URL: https://issues.apache.org/jira/browse/HIVE-23868 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23868.1.patch > > Time Spent: 20m > Remaining Estimate: 0h > > HIVE-12574 removed support for 0 PRECEDING/FOLLOWING in window function > specifications. We can restore support for this by converting 0 > PRECEDING/FOLLOWING to CURRENT ROW in the query plan, which should be the > same. -- This message was sent by Atlassian Jira (v8.3.4#803005)
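The fix described above is a plan-level rewrite: a frame boundary of 0 PRECEDING or 0 FOLLOWING denotes the same frame edge as CURRENT ROW, so it can be normalized away instead of rejected. A hedged Java sketch of that normalization, using hypothetical types rather than Hive's actual window-frame spec classes:

```java
public class WindowBoundary {

    enum Direction { PRECEDING, FOLLOWING, CURRENT }

    final Direction direction;
    final int amount; // row offset; ignored when direction is CURRENT

    WindowBoundary(Direction direction, int amount) {
        this.direction = direction;
        this.amount = amount;
    }

    // 0 PRECEDING and 0 FOLLOWING mean the same frame edge as CURRENT ROW,
    // so rewrite them during planning (the HIVE-23868 approach) rather
    // than failing validation as HIVE-12574 did.
    static WindowBoundary normalize(WindowBoundary b) {
        if (b.direction != Direction.CURRENT && b.amount == 0) {
            return new WindowBoundary(Direction.CURRENT, 0);
        }
        return b;
    }

    public static void main(String[] args) {
        WindowBoundary b = normalize(new WindowBoundary(Direction.PRECEDING, 0));
        System.out.println(b.direction); // CURRENT
    }
}
```

With this rewrite, `ROWS BETWEEN 0 PRECEDING AND CURRENT ROW` plans identically to `ROWS BETWEEN CURRENT ROW AND CURRENT ROW`.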
[jira] [Work logged] (HIVE-23868) Windowing function spec: support 0 preceeding/following
[ https://issues.apache.org/jira/browse/HIVE-23868?focusedWorklogId=460342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460342 ] ASF GitHub Bot logged work on HIVE-23868: - Author: ASF GitHub Bot Created on: 17/Jul/20 14:55 Start Date: 17/Jul/20 14:55 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #1269: URL: https://github.com/apache/hive/pull/1269 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460342) Time Spent: 20m (was: 10m) > Windowing function spec: support 0 preceeding/following > --- > > Key: HIVE-23868 > URL: https://issues.apache.org/jira/browse/HIVE-23868 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23868.1.patch > > Time Spent: 20m > Remaining Estimate: 0h > > HIVE-12574 removed support for 0 PRECEDING/FOLLOWING in window function > specifications. We can restore support for this by converting 0 > PRECEDING/FOLLOWING to CURRENT ROW in the query plan, which should be the > same. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22869) Add locking benchmark to metastore-tools/metastore-benchmarks
[ https://issues.apache.org/jira/browse/HIVE-22869?focusedWorklogId=460300=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460300 ] ASF GitHub Bot logged work on HIVE-22869: - Author: ASF GitHub Bot Created on: 17/Jul/20 13:16 Start Date: 17/Jul/20 13:16 Worklog Time Spent: 10m Work Description: zchovan commented on a change in pull request #1073: URL: https://github.com/apache/hive/pull/1073#discussion_r456434704 ## File path: standalone-metastore/metastore-tools/tools-common/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSClient.java ## @@ -345,21 +348,44 @@ boolean openTxn(int numTxns) throws TException { return openTxns; } + List getOpenTxnsInfo() throws TException { +return client.get_open_txns_info().getOpen_txns(); + } + boolean commitTxn(long txnId) throws TException { client.commit_txn(new CommitTxnRequest(txnId)); return true; } - boolean abortTxn(long txnId) throws TException { -client.abort_txn(new AbortTxnRequest(txnId)); + boolean abortTxns(List txnIds) throws TException { +client.abort_txns(new AbortTxnsRequest(txnIds)); return true; } - boolean abortTxns(List txnIds) throws TException { -client.abort_txns(new AbortTxnsRequest(txnIds)); + boolean allocateTableWriteIds(String dbName, String tableName, List openTxns) throws TException { +AllocateTableWriteIdsRequest awiRqst = new AllocateTableWriteIdsRequest(dbName, tableName); +openTxns.forEach(t -> { + awiRqst.addToTxnIds(t); +}); + +client.allocate_table_write_ids(awiRqst); return true; } + boolean getValidWriteIds(List fullTableNames) throws TException { Review comment: ah sorry, I was mistaken, the reason why it never returned the writeIds is because they are never used, the benchmark is just executing the api call. The return value from the hms is actually a GetValidWriteIdsResponse object, not a list. As it is never used I'm not sure if we need to change this. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460300) Time Spent: 2h (was: 1h 50m) > Add locking benchmark to metastore-tools/metastore-benchmarks > - > > Key: HIVE-22869 > URL: https://issues.apache.org/jira/browse/HIVE-22869 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22869.2.patch, HIVE-22869.3.patch, > HIVE-22869.4.patch, HIVE-22869.5.patch, HIVE-22869.6.patch, > HIVE-22869.7.patch, HIVE-22869.8.patch, HIVE-22869.9.patch, HIVE-22869.patch > > Time Spent: 2h > Remaining Estimate: 0h > > Add the possibility to run benchmarks on opening lock in the HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22869) Add locking benchmark to metastore-tools/metastore-benchmarks
[ https://issues.apache.org/jira/browse/HIVE-22869?focusedWorklogId=460294=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460294 ] ASF GitHub Bot logged work on HIVE-22869: - Author: ASF GitHub Bot Created on: 17/Jul/20 13:05 Start Date: 17/Jul/20 13:05 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1073: URL: https://github.com/apache/hive/pull/1073#discussion_r456421989 ## File path: standalone-metastore/metastore-tools/tools-common/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSClient.java ## @@ -345,21 +348,44 @@ boolean openTxn(int numTxns) throws TException { return openTxns; } + List getOpenTxnsInfo() throws TException { +return client.get_open_txns_info().getOpen_txns(); + } + boolean commitTxn(long txnId) throws TException { client.commit_txn(new CommitTxnRequest(txnId)); return true; } - boolean abortTxn(long txnId) throws TException { -client.abort_txn(new AbortTxnRequest(txnId)); + boolean abortTxns(List txnIds) throws TException { +client.abort_txns(new AbortTxnsRequest(txnIds)); return true; } - boolean abortTxns(List txnIds) throws TException { -client.abort_txns(new AbortTxnsRequest(txnIds)); + boolean allocateTableWriteIds(String dbName, String tableName, List openTxns) throws TException { +AllocateTableWriteIdsRequest awiRqst = new AllocateTableWriteIdsRequest(dbName, tableName); +openTxns.forEach(t -> { + awiRqst.addToTxnIds(t); +}); + +client.allocate_table_write_ids(awiRqst); return true; } + boolean getValidWriteIds(List fullTableNames) throws TException { Review comment: I don't get what does it have to do with throwingSupplierWrapper. throwingSupplierWrapper just handles checked exceptions. Could you please elaborate here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460294) Time Spent: 1h 50m (was: 1h 40m) > Add locking benchmark to metastore-tools/metastore-benchmarks > - > > Key: HIVE-22869 > URL: https://issues.apache.org/jira/browse/HIVE-22869 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22869.2.patch, HIVE-22869.3.patch, > HIVE-22869.4.patch, HIVE-22869.5.patch, HIVE-22869.6.patch, > HIVE-22869.7.patch, HIVE-22869.8.patch, HIVE-22869.9.patch, HIVE-22869.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Add the possibility to run benchmarks on opening lock in the HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
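The review exchange above turns on `throwingSupplierWrapper`, which (per the comment) "just handles checked exceptions" so that thrift calls throwing `TException` can be used inside benchmark lambdas. A hedged, self-contained Java sketch of such a helper; Hive's actual `Util.throwingSupplierWrapper` may differ in signature and detail:

```java
public class ThrowingSupplierWrapper {

    // A Supplier-like interface whose get() may throw a checked exception,
    // e.g. the TException thrown by HMS thrift calls.
    @FunctionalInterface
    interface ThrowingSupplier<T> {
        T get() throws Exception;
    }

    // Invoke the supplier, rethrowing any checked exception as unchecked so
    // callers (benchmark bodies, lambdas) need no try/catch boilerplate.
    static <T> T throwingSupplierWrapper(ThrowingSupplier<T> supplier) {
        try {
            return supplier.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Checked-exception-throwing code becomes a plain expression.
        Integer v = throwingSupplierWrapper(() -> Integer.parseInt("42"));
        System.out.println(v); // 42
    }
}
```

The design choice is ergonomics over error typing: benchmark code stays terse, at the cost of wrapping every failure in `RuntimeException`.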
[jira] [Work logged] (HIVE-23818) Use String Switch-Case Statement in StatUtils
[ https://issues.apache.org/jira/browse/HIVE-23818?focusedWorklogId=460293=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460293 ] ASF GitHub Bot logged work on HIVE-23818: - Author: ASF GitHub Bot Created on: 17/Jul/20 13:04 Start Date: 17/Jul/20 13:04 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1229: URL: https://github.com/apache/hive/pull/1229#issuecomment-660094981 @kgyrtkirk Thanks a million for pointing that out. I addressed the issue, tests pass, and I have merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460293) Time Spent: 1.5h (was: 1h 20m) > Use String Switch-Case Statement in StatUtils > - > > Key: HIVE-23818 > URL: https://issues.apache.org/jira/browse/HIVE-23818 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > switch-case statements with Java is now available. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23818) Use String Switch-Case Statement in StatUtils
[ https://issues.apache.org/jira/browse/HIVE-23818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor resolved HIVE-23818. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master! > Use String Switch-Case Statement in StatUtils > - > > Key: HIVE-23818 > URL: https://issues.apache.org/jira/browse/HIVE-23818 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > switch-case statements with Java is now available. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23818) Use String Switch-Case Statement in StatUtils
[ https://issues.apache.org/jira/browse/HIVE-23818?focusedWorklogId=460292=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460292 ] ASF GitHub Bot logged work on HIVE-23818: - Author: ASF GitHub Bot Created on: 17/Jul/20 13:03 Start Date: 17/Jul/20 13:03 Worklog Time Spent: 10m Work Description: belugabehr merged pull request #1229: URL: https://github.com/apache/hive/pull/1229 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460292) Time Spent: 1h 20m (was: 1h 10m) > Use String Switch-Case Statement in StatUtils > - > > Key: HIVE-23818 > URL: https://issues.apache.org/jira/browse/HIVE-23818 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > switch-case statements with Java is now available. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22869) Add locking benchmark to metastore-tools/metastore-benchmarks
[ https://issues.apache.org/jira/browse/HIVE-22869?focusedWorklogId=460291=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460291 ] ASF GitHub Bot logged work on HIVE-22869: - Author: ASF GitHub Bot Created on: 17/Jul/20 13:02 Start Date: 17/Jul/20 13:02 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1073: URL: https://github.com/apache/hive/pull/1073#discussion_r456427550 ## File path: standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/ACIDBenchmarks.java ## @@ -0,0 +1,247 @@ +package org.apache.hadoop.hive.metastore.tools; + +import org.apache.hadoop.hive.metastore.api.DataOperationType; +import org.apache.hadoop.hive.metastore.api.LockComponent; +import org.apache.hadoop.hive.metastore.api.LockRequest; +import org.apache.logging.log4j.Level; +import org.apache.logging.log4j.LogManager; +import org.apache.logging.log4j.core.LoggerContext; +import org.apache.logging.log4j.core.config.Configuration; +import org.apache.thrift.TException; +import org.openjdk.jmh.annotations.Benchmark; +import org.openjdk.jmh.annotations.Param; +import org.openjdk.jmh.annotations.Scope; +import org.openjdk.jmh.annotations.Setup; +import org.openjdk.jmh.annotations.State; +import org.openjdk.jmh.annotations.TearDown; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; + +import static org.apache.hadoop.hive.metastore.tools.BenchmarkUtils.createManyTables; +import static org.apache.hadoop.hive.metastore.tools.BenchmarkUtils.dropManyTables; +import static org.apache.hadoop.hive.metastore.tools.Util.throwingSupplierWrapper; + +public class ACIDBenchmarks { + + private static final Logger LOG = LoggerFactory.getLogger(CoreContext.class); + + @State(Scope.Benchmark) + public static class CoreContext { +@Param("1") +protected int howMany; + +@State(Scope.Thread) +public static class ThreadState { + 
HMSClient client; + + @Setup + public void doSetup() throws Exception { +LOG.debug("Creating client"); +client = HMSConfig.getInstance().newClient(); + } + + @TearDown + public void doTearDown() throws Exception { +client.close(); +LOG.debug("Closed a connection to metastore."); + } +} + +@Setup +public void setup() { + LoggerContext ctx = (LoggerContext) LogManager.getContext(false); + Configuration ctxConfig = ctx.getConfiguration(); + ctxConfig.getLoggerConfig(CoreContext.class.getName()).setLevel(Level.INFO); + ctx.updateLoggers(ctxConfig); +} + } + + @State(Scope.Benchmark) + public static class TestOpenTxn extends CoreContext { + +@State(Scope.Thread) +public static class ThreadState extends CoreContext.ThreadState { + List openTxns = new ArrayList<>(); + + @TearDown + public void doTearDown() throws Exception { +client.abortTxns(openTxns); +LOG.debug("aborted all opened txns"); + } + + void addTxn(List openTxn) { +openTxns.addAll(openTxn); + } +} + +@Benchmark +public void openTxn(TestOpenTxn.ThreadState state) throws TException { + state.addTxn(state.client.openTxn(howMany)); + LOG.debug("opened txns, count=", howMany); +} + } + + @State(Scope.Benchmark) + public static class TestLocking extends CoreContext { +private int nTables; + +@Param("0") +private int nPartitions; + +private List lockComponents; + +@Setup +public void setup() { + this.nTables = (nPartitions != 0) ? 
howMany / nPartitions : howMany; + createLockComponents(); +} + +@State(Scope.Thread) +public static class ThreadState extends CoreContext.ThreadState { + List openTxns = new ArrayList<>(); + long txnId; + + @Setup(org.openjdk.jmh.annotations.Level.Invocation) + public void iterSetup() { +txnId = executeOpenTxnAndGetTxnId(client); +LOG.debug("opened txn, id={}", txnId); +openTxns.add(txnId); + } + + @TearDown + public void doTearDown() throws Exception { +client.abortTxns(openTxns); +if (BenchmarkUtils.checkTxnsCleaned(client, openTxns) == false) { + LOG.error("Something went wrong with the cleanup of txns"); +} +LOG.debug("aborted all opened txns"); + } +} + +@Benchmark +public void lock(TestLocking.ThreadState state) { + LOG.debug("sending lock request"); + executeLock(state.client, state.txnId, lockComponents); +} + +private void createLockComponents() { + lockComponents = new ArrayList<>(); + + for (int i = 0; i < nTables; i++) { +for (int j = 0; j < nPartitions - (nPartitions > 1 ? 1 : 0); j++) { + lockComponents.add( +
[jira] [Work logged] (HIVE-23862) Clean Up StatsUtils and BasicStats
[ https://issues.apache.org/jira/browse/HIVE-23862?focusedWorklogId=460290=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460290 ] ASF GitHub Bot logged work on HIVE-23862: - Author: ASF GitHub Bot Created on: 17/Jul/20 13:00 Start Date: 17/Jul/20 13:00 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1264: URL: https://github.com/apache/hive/pull/1264 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460290) Time Spent: 1h (was: 50m) > Clean Up StatsUtils and BasicStats > -- > > Key: HIVE-23862 > URL: https://issues.apache.org/jira/browse/HIVE-23862 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Miscellaneous improvements to readability and performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23862) Clean Up StatsUtils and BasicStats
[ https://issues.apache.org/jira/browse/HIVE-23862?focusedWorklogId=460289=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460289 ] ASF GitHub Bot logged work on HIVE-23862: - Author: ASF GitHub Bot Created on: 17/Jul/20 13:00 Start Date: 17/Jul/20 13:00 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1264: URL: https://github.com/apache/hive/pull/1264 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460289) Time Spent: 50m (was: 40m) > Clean Up StatsUtils and BasicStats > -- > > Key: HIVE-23862 > URL: https://issues.apache.org/jira/browse/HIVE-23862 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Miscellaneous improvements to readability and performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23835) Repl Dump should dump function binaries to staging directory
[ https://issues.apache.org/jira/browse/HIVE-23835?focusedWorklogId=460288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460288 ] ASF GitHub Bot logged work on HIVE-23835: - Author: ASF GitHub Bot Created on: 17/Jul/20 12:59 Start Date: 17/Jul/20 12:59 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1249: URL: https://github.com/apache/hive/pull/1249#discussion_r456425658 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/CreateFunctionHandler.java ## @@ -41,13 +53,36 @@ CreateFunctionMessage eventMessage(String stringRepresentation) { public void handle(Context withinContext) throws Exception { LOG.info("Processing#{} CREATE_FUNCTION message : {}", fromEventId(), eventMessageAsJSON); Path metadataPath = new Path(withinContext.eventRoot, EximUtil.METADATA_NAME); +Path dataPath = new Path(withinContext.eventRoot, EximUtil.DATA_PATH_NAME); FileSystem fileSystem = metadataPath.getFileSystem(withinContext.hiveConf); - +List functionBinaryCopyPaths = new ArrayList<>(); try (JsonWriter jsonWriter = new JsonWriter(fileSystem, metadataPath)) { - new FunctionSerializer(eventMessage.getFunctionObj(), withinContext.hiveConf) - .writeTo(jsonWriter, withinContext.replicationSpec); + FunctionSerializer serializer = new FunctionSerializer(eventMessage.getFunctionObj(), + dataPath, withinContext.hiveConf); + serializer.writeTo(jsonWriter, withinContext.replicationSpec); + functionBinaryCopyPaths.addAll(serializer.getFunctionBinaryCopyPaths()); } withinContext.createDmd(this).write(); +copyFunctionBinaries(functionBinaryCopyPaths, withinContext.hiveConf); + } + + private void copyFunctionBinaries(List functionBinaryCopyPaths, HiveConf hiveConf) Review comment: no, for function binary copy, we are not using the load flag. It is retained as it is currently. meaning: earlier during load it used to copy from src location. Now with this change, it will copy from staging location. 
That way, src cluster visibility is not required when loading the function. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460288) Time Spent: 50m (was: 40m) > Repl Dump should dump function binaries to staging directory > > > Key: HIVE-23835 > URL: https://issues.apache.org/jira/browse/HIVE-23835 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23835.01.patch, HIVE-23835.02.patch > > Time Spent: 50m > Remaining Estimate: 0h > > {color:#172b4d}When hive function's binaries are on source HDFS, repl dump > should dump them to the staging location in order to remove the cross-cluster > visibility requirement.{color} -- This message was sent by Atlassian Jira (v8.3.4#803005)
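The change under review copies function binaries into the dump's staging directory so that loading never needs to read the source cluster's filesystem. A minimal local-filesystem sketch of that staging copy (class and method names here are illustrative, not Hive's actual API, which works against HDFS paths):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;

public class FunctionBinaryCopySketch {

    // Copy every function binary into the dump's staging directory and return
    // the staged paths; the dumped metadata would then reference these instead
    // of the source-cluster locations.
    static List<Path> copyToStaging(List<Path> binaries, Path stagingDir) throws IOException {
        Files.createDirectories(stagingDir);
        return binaries.stream().map(src -> {
            try {
                Path dst = stagingDir.resolve(src.getFileName().toString());
                Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                return dst;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Simulate a function jar on the "source" side and stage it for the dump.
        Path srcJar = Files.createTempFile("my-udf", ".jar");
        Path staging = Files.createTempDirectory("repl-dump").resolve("_functions");
        List<Path> staged = copyToStaging(List.of(srcJar), staging);
        System.out.println("staged: " + staged.get(0).getFileName());
    }
}
```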
[jira] [Commented] (HIVE-23474) Deny Repl Dump if the database is a target of replication
[ https://issues.apache.org/jira/browse/HIVE-23474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159906#comment-17159906 ] Pravin Sinha commented on HIVE-23474: - +1 > Deny Repl Dump if the database is a target of replication > - > > Key: HIVE-23474 > URL: https://issues.apache.org/jira/browse/HIVE-23474 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23474.01.patch, HIVE-23474.02.patch, > HIVE-23474.03.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22869) Add locking benchmark to metastore-tools/metastore-benchmarks
[ https://issues.apache.org/jira/browse/HIVE-22869?focusedWorklogId=460285=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460285 ] ASF GitHub Bot logged work on HIVE-22869: - Author: ASF GitHub Bot Created on: 17/Jul/20 12:55 Start Date: 17/Jul/20 12:55 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1073: URL: https://github.com/apache/hive/pull/1073#discussion_r456423568 ## File path: standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/BenchmarkUtils.java ## @@ -0,0 +1,72 @@
package org.apache.hadoop.hive.metastore.tools;

import org.apache.hadoop.hive.metastore.TableType;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.TxnInfo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

import static org.apache.hadoop.hive.metastore.tools.Util.createSchema;
import static org.apache.hadoop.hive.metastore.tools.Util.throwingSupplierWrapper;

public class BenchmarkUtils {
  private static final Logger LOG = LoggerFactory.getLogger(BenchmarkUtils.class);

  static void createManyTables(HMSClient client, int howMany, String dbName, String format) {
    List<FieldSchema> columns = createSchema(new ArrayList<>(Arrays.asList("name", "string")));
    List<FieldSchema> partitions = createSchema(new ArrayList<>(Arrays.asList("date", "string")));
    IntStream.range(0, howMany)
        .forEach(i ->
            throwingSupplierWrapper(() -> client.createTable(
                new Util.TableBuilder(dbName, String.format(format, i))
                    .withType(TableType.MANAGED_TABLE)
                    .withColumns(columns)
                    .withPartitionKeys(partitions)
                    .build())));
  }

  static void dropManyTables(HMSClient client, int howMany, String dbName, String format) {
    IntStream.range(0, howMany)
        .forEach(i ->
            throwingSupplierWrapper(() -> client.dropTable(dbName, String.format(format, i))));
  }

  // Create a simple table with a single column and single partition
  static void createPartitionedTable(HMSClient client, String dbName, String tableName) {
    throwingSupplierWrapper(() -> client.createTable(
        new Util.TableBuilder(dbName, tableName)
            .withType(TableType.MANAGED_TABLE)
            .withColumns(createSchema(Collections.singletonList("name:string")))
            .withPartitionKeys(createSchema(Collections.singletonList("date")))
            .build()));
  }

  static boolean checkTxnsCleaned(HMSClient client, List<Long> txnsOpenedByBenchmark) throws InterruptedException {
    // let's wait the default cleaner run period
    Thread.sleep(10);
    List<Long> notCleanedTxns = new ArrayList<>();
    throwingSupplierWrapper(() -> {
      List<TxnInfo> txnInfos = client.getOpenTxnsInfo();
Review comment: can't see any change here This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460285) Time Spent: 1.5h (was: 1h 20m) > Add locking benchmark to metastore-tools/metastore-benchmarks > - > > Key: HIVE-22869 > URL: https://issues.apache.org/jira/browse/HIVE-22869 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22869.2.patch, HIVE-22869.3.patch, > HIVE-22869.4.patch, HIVE-22869.5.patch, HIVE-22869.6.patch, > HIVE-22869.7.patch, HIVE-22869.8.patch, HIVE-22869.9.patch, HIVE-22869.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > Add the possibility to run benchmarks on opening lock in the HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22869) Add locking benchmark to metastore-tools/metastore-benchmarks
[ https://issues.apache.org/jira/browse/HIVE-22869?focusedWorklogId=460284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460284 ] ASF GitHub Bot logged work on HIVE-22869: - Author: ASF GitHub Bot Created on: 17/Jul/20 12:54 Start Date: 17/Jul/20 12:54 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1073: URL: https://github.com/apache/hive/pull/1073#discussion_r456423093 ## File path: standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/BenchmarkTool.java ##
@@ -141,12 +175,62 @@ private static void saveDataFile(String location, String name,
    }
  }

-  @Override public void run() {
-    LOG.info("Using warmup " + warmup +
-        " spin " + spinCount + " nparams " + nParameters + " threads " + nThreads);
+    LOG.info("Using warmup " + warmup + " spin " + spinCount + " nparams " + Arrays.toString(nParameters) + " threads "
+        + nThreads);
+    HMSConfig.getInstance().init(host, port, confDir);
+
+    if (runMode == RunModes.ALL) {
Review comment: can't see change here This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460284) Time Spent: 1h 20m (was: 1h 10m) > Add locking benchmark to metastore-tools/metastore-benchmarks > - > > Key: HIVE-22869 > URL: https://issues.apache.org/jira/browse/HIVE-22869 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22869.2.patch, HIVE-22869.3.patch, > HIVE-22869.4.patch, HIVE-22869.5.patch, HIVE-22869.6.patch, > HIVE-22869.7.patch, HIVE-22869.8.patch, HIVE-22869.9.patch, HIVE-22869.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Add the possibility to run benchmarks on opening lock in the HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22869) Add locking benchmark to metastore-tools/metastore-benchmarks
[ https://issues.apache.org/jira/browse/HIVE-22869?focusedWorklogId=460283=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460283 ] ASF GitHub Bot logged work on HIVE-22869: - Author: ASF GitHub Bot Created on: 17/Jul/20 12:52 Start Date: 17/Jul/20 12:52 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1073: URL: https://github.com/apache/hive/pull/1073#discussion_r456421989 ## File path: standalone-metastore/metastore-tools/tools-common/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSClient.java ##
@@ -345,21 +348,44 @@ boolean openTxn(int numTxns) throws TException {
    return openTxns;
  }

+  List<TxnInfo> getOpenTxnsInfo() throws TException {
+    return client.get_open_txns_info().getOpen_txns();
+  }
+
  boolean commitTxn(long txnId) throws TException {
    client.commit_txn(new CommitTxnRequest(txnId));
    return true;
  }

-  boolean abortTxn(long txnId) throws TException {
-    client.abort_txn(new AbortTxnRequest(txnId));
+  boolean abortTxns(List<Long> txnIds) throws TException {
+    client.abort_txns(new AbortTxnsRequest(txnIds));
    return true;
  }

-  boolean abortTxns(List<Long> txnIds) throws TException {
-    client.abort_txns(new AbortTxnsRequest(txnIds));
+  boolean allocateTableWriteIds(String dbName, String tableName, List<Long> openTxns) throws TException {
+    AllocateTableWriteIdsRequest awiRqst = new AllocateTableWriteIdsRequest(dbName, tableName);
+    openTxns.forEach(t -> {
+      awiRqst.addToTxnIds(t);
+    });
+
+    client.allocate_table_write_ids(awiRqst);
    return true;
  }
+
+  boolean getValidWriteIds(List<String> fullTableNames) throws TException {
Review comment: I don't get what it has to do with throwingSupplierWrapper; throwingSupplierWrapper just handles checked exceptions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460283) Time Spent: 1h 10m (was: 1h) > Add locking benchmark to metastore-tools/metastore-benchmarks > - > > Key: HIVE-22869 > URL: https://issues.apache.org/jira/browse/HIVE-22869 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22869.2.patch, HIVE-22869.3.patch, > HIVE-22869.4.patch, HIVE-22869.5.patch, HIVE-22869.6.patch, > HIVE-22869.7.patch, HIVE-22869.8.patch, HIVE-22869.9.patch, HIVE-22869.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Add the possibility to run benchmarks on opening lock in the HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
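The reviewer's point is that Util.throwingSupplierWrapper only converts checked exceptions into unchecked ones, so the HMSClient calls can sit inside lambdas. A sketch of such a wrapper (the real signature in org.apache.hadoop.hive.metastore.tools.Util may differ):

```java
public class ThrowingWrapperSketch {

    // A supplier that is allowed to declare checked exceptions.
    @FunctionalInterface
    interface ThrowingSupplier<T> {
        T get() throws Exception;
    }

    // Run the body and rethrow any checked exception as unchecked; this is
    // what lets benchmark lambdas over TException-throwing calls stay terse.
    static <T> T throwingSupplierWrapper(ThrowingSupplier<T> body) {
        try {
            return body.get();
        } catch (Exception e) {
            throw new RuntimeException(e); // checked -> unchecked
        }
    }

    public static void main(String[] args) {
        // A call that declares a checked exception can now be used in a lambda:
        String s = throwingSupplierWrapper(() -> {
            if (args.length > 100) throw new java.io.IOException("boom");
            return "ok";
        });
        System.out.println(s); // prints ok
    }
}
```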
[jira] [Work logged] (HIVE-23474) Deny Repl Dump if the database is a target of replication
[ https://issues.apache.org/jira/browse/HIVE-23474?focusedWorklogId=460274=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460274 ] ASF GitHub Bot logged work on HIVE-23474: - Author: ASF GitHub Bot Created on: 17/Jul/20 12:21 Start Date: 17/Jul/20 12:21 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1247: URL: https://github.com/apache/hive/pull/1247#discussion_r456406599 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java ## @@ -944,96 +944,6 @@ public void testIncrementalDumpMultiIteration() throws Throwable { Assert.assertEquals(IncrementalLoadTasksBuilder.getNumIteration(), numEvents); } - @Test - public void testIfCkptAndSourceOfReplPropsIgnoredByReplDump() throws Throwable { Review comment: Agree, added This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460274) Time Spent: 0.5h (was: 20m) > Deny Repl Dump if the database is a target of replication > - > > Key: HIVE-23474 > URL: https://issues.apache.org/jira/browse/HIVE-23474 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23474.01.patch, HIVE-23474.02.patch, > HIVE-23474.03.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23474) Deny Repl Dump if the database is a target of replication
[ https://issues.apache.org/jira/browse/HIVE-23474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23474: --- Attachment: HIVE-23474.03.patch Status: Patch Available (was: In Progress) > Deny Repl Dump if the database is a target of replication > - > > Key: HIVE-23474 > URL: https://issues.apache.org/jira/browse/HIVE-23474 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23474.01.patch, HIVE-23474.02.patch, > HIVE-23474.03.patch > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23474) Deny Repl Dump if the database is a target of replication
[ https://issues.apache.org/jira/browse/HIVE-23474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23474: --- Status: In Progress (was: Patch Available) > Deny Repl Dump if the database is a target of replication > - > > Key: HIVE-23474 > URL: https://issues.apache.org/jira/browse/HIVE-23474 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23474.01.patch, HIVE-23474.02.patch > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23835) Repl Dump should dump function binaries to staging directory
[ https://issues.apache.org/jira/browse/HIVE-23835?focusedWorklogId=460270=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460270 ] ASF GitHub Bot logged work on HIVE-23835: - Author: ASF GitHub Bot Created on: 17/Jul/20 12:15 Start Date: 17/Jul/20 12:15 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1249: URL: https://github.com/apache/hive/pull/1249#discussion_r456403763 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/CreateFunctionHandler.java ## @@ -41,13 +53,36 @@ CreateFunctionMessage eventMessage(String stringRepresentation) { public void handle(Context withinContext) throws Exception { LOG.info("Processing#{} CREATE_FUNCTION message : {}", fromEventId(), eventMessageAsJSON); Path metadataPath = new Path(withinContext.eventRoot, EximUtil.METADATA_NAME); +Path dataPath = new Path(withinContext.eventRoot, EximUtil.DATA_PATH_NAME); FileSystem fileSystem = metadataPath.getFileSystem(withinContext.hiveConf); - +List functionBinaryCopyPaths = new ArrayList<>(); try (JsonWriter jsonWriter = new JsonWriter(fileSystem, metadataPath)) { - new FunctionSerializer(eventMessage.getFunctionObj(), withinContext.hiveConf) - .writeTo(jsonWriter, withinContext.replicationSpec); + FunctionSerializer serializer = new FunctionSerializer(eventMessage.getFunctionObj(), + dataPath, withinContext.hiveConf); + serializer.writeTo(jsonWriter, withinContext.replicationSpec); + functionBinaryCopyPaths.addAll(serializer.getFunctionBinaryCopyPaths()); } withinContext.createDmd(this).write(); +copyFunctionBinaries(functionBinaryCopyPaths, withinContext.hiveConf); + } + + private void copyFunctionBinaries(List functionBinaryCopyPaths, HiveConf hiveConf) Review comment: Does this depend on whether copy of load flag is true or false? Or always we will do it at the time of load? This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460270) Time Spent: 40m (was: 0.5h) > Repl Dump should dump function binaries to staging directory > > > Key: HIVE-23835 > URL: https://issues.apache.org/jira/browse/HIVE-23835 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23835.01.patch, HIVE-23835.02.patch > > Time Spent: 40m > Remaining Estimate: 0h > > {color:#172b4d}When hive function's binaries are on source HDFS, repl dump > should dump it to the staging location in order to break cross clusters > visibility requirement.{color} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23871: -- Component/s: Metastore > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: table1 > > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does that for all Transactional Tables – not only ACID – causing > MicroManaged Tables to behave abnormally. > MicroManaged (insert_only) tables may miss needed properties such as Storage > Desc Params – that may define how lines are delimited (like in the example > below): > To repro the issue: > {code:java} > CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; > LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans; > describe formatted delim_table_trans; > SELECT * FROM delim_table_trans; > {code} > Result: > {code:java} > Table Type: MANAGED_TABLE > Table Parameters: > bucketing_version 2 > numFiles1 > numRows 0 > rawDataSize 0 > totalSize 72 > transactional true > transactional_propertiesinsert_only > A masked pattern was here > > # Storage Information > SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > > InputFormat: org.apache.hadoop.mapred.TextInputFormat > OutputFormat: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > Compressed: No > Num Buckets: -1 > Bucket Columns: [] > Sort Columns: [] > PREHOOK: query: SELECT * FROM delim_table_trans > PREHOOK: type: QUERY > PREHOOK: Input: default@delim_table_trans > A masked pattern was here > POSTHOOK: query: 
SELECT * FROM delim_table_trans > POSTHOOK: type: QUERY > POSTHOOK: Input: default@delim_table_trans > A masked pattern was here > NULL NULLNULL > NULL NULLNULL > NULL NULLNULL > NULL NULLNULL > NULL NULLNULL > NULL NULLNULL > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
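The all-NULL rows in the repro above are what you would expect once the Storage Desc Params are dropped: the SerDe falls back to its default field delimiter (Ctrl-A, \u0001), the whole tab-delimited line lands in one field, and the typed columns fail to parse. A toy illustration with plain string splitting (not the actual LazySimpleSerDe):

```java
public class DelimiterSketch {
    // Split a row on the given field delimiter; a column whose field is
    // missing (or unparseable for its type) surfaces as NULL in query output.
    static String[] parse(String row, char delim) {
        return row.split(String.valueOf(delim), -1);
    }

    public static void main(String[] args) {
        String row = "1\tAlice\t3"; // tab-delimited, as the table's DDL wrote it

        // With the stored param ('\t') all three columns come back:
        System.out.println(parse(row, '\t').length);    // prints 3

        // If the param is lost, the default Ctrl-A delimiter matches nothing,
        // the whole line lands in the first column, and the typed columns
        // (INT id, INT safety) fail to parse -> NULL, NULL, NULL.
        System.out.println(parse(row, '\u0001').length); // prints 1
    }
}
```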
[jira] [Updated] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23871: -- Description: HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore by skipping particular Table properties like SkewInfo, bucketCols, ordering etc. However, it does that for all Transactional Tables – not only ACID – causing MicroManaged Tables to behave abnormally. MicroManaged (insert_only) tables may miss needed properties such as Storage Desc Params – that may define how lines are delimited (like in the example below): To repro the issue: {code:java} CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans; describe formatted delim_table_trans; SELECT * FROM delim_table_trans; {code} Result: {code:java} Table Type: MANAGED_TABLE Table Parameters: bucketing_version 2 numFiles1 numRows 0 rawDataSize 0 totalSize 72 transactional true transactional_propertiesinsert_only A masked pattern was here # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat:org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] PREHOOK: query: SELECT * FROM delim_table_trans PREHOOK: type: QUERY PREHOOK: Input: default@delim_table_trans A masked pattern was here POSTHOOK: query: SELECT * FROM delim_table_trans POSTHOOK: type: QUERY POSTHOOK: Input: default@delim_table_trans A masked pattern was here NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL {code} was: HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore by skipping particular Table properties like SkewInfo, bucketCols, ordering etc. 
However, it does that for all Transactional Tables – not only ACID – causing MicroManaged Tables to behave abnormally. MicroManaged (insert_only) tables may miss needed properties such as Storage Desc Params – that may define how lines are delimited (like in the example below): To repro the issue: {code:java} CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans; describe formatted delim_table_trans; SELECT * FROM delim_table_trans; {code} Result: {code:java} # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat:org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] PREHOOK: query: SELECT * FROM delim_table_trans PREHOOK: type: QUERY PREHOOK: Input: default@delim_table_trans A masked pattern was here POSTHOOK: query: SELECT * FROM delim_table_trans POSTHOOK: type: QUERY POSTHOOK: Input: default@delim_table_trans A masked pattern was here NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL {code} > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: table1 > > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does that for all Transactional Tables – not only ACID – causing > MicroManaged Tables to behave abnormally. > MicroManaged (insert_only) tables may miss needed properties such as Storage > Desc Params –
[jira] [Updated] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23871: -- Description: HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore by skipping particular Table properties like SkewInfo, bucketCols, ordering etc. However, it does that for all Transactional Tables – not only ACID – causing MicroManaged Tables to behave abnormally. MicroManaged (insert_only) tables may miss needed properties such as Storage Desc Params – that may define how lines are delimited (like in the example below): To repro the issue: {code:java} CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans; describe formatted delim_table_trans; SELECT * FROM delim_table_trans; {code} Result: {code:java} # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat:org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] PREHOOK: query: SELECT * FROM delim_table_trans PREHOOK: type: QUERY PREHOOK: Input: default@delim_table_trans A masked pattern was here POSTHOOK: query: SELECT * FROM delim_table_trans POSTHOOK: type: QUERY POSTHOOK: Input: default@delim_table_trans A masked pattern was here NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL {code} was: HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore by skipping particular Table properties like SkewInfo, bucketCols, ordering etc. However, it does that for all Transactional Tables -- not only ACID. 
This causes MicroManaged (insert_only) table to skip needed properties such as Storage Desc Params -- that may define how lines are delimited (like in the example below): To repro the issue: {code:java} CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans; describe formatted delim_table_trans; SELECT * FROM delim_table_trans; {code} Result: {code:java} # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat:org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] PREHOOK: query: SELECT * FROM delim_table_trans PREHOOK: type: QUERY PREHOOK: Input: default@delim_table_trans A masked pattern was here POSTHOOK: query: SELECT * FROM delim_table_trans POSTHOOK: type: QUERY POSTHOOK: Input: default@delim_table_trans A masked pattern was here NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL {code} > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: table1 > > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does that for all Transactional Tables – not only ACID – causing > MicroManaged Tables to behave abnormally. 
> MicroManaged (insert_only) tables may miss needed properties such as Storage > Desc Params – that may define how lines are delimited (like in the example > below): > To repro the issue: > {code:java} > CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; > LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans; > describe formatted delim_table_trans; > SELECT * FROM delim_table_trans; > {code} > Result: > {code:java} > # Storage Information > SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > >
[jira] [Updated] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23871: -- Description: HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore by skipping particular Table properties like SkewInfo, bucketCols, ordering etc. However, it does that for all Transactional Tables -- not only ACID. This causes MicroManaged (insert_only) table to skip needed properties such as Storage Desc Params -- that may define how lines are delimited (like in the example below): To repro the issue: {code:java} CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans; describe formatted delim_table_trans; SELECT * FROM delim_table_trans; {code} Result: {code:java} # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat:org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] PREHOOK: query: SELECT * FROM delim_table_trans PREHOOK: type: QUERY PREHOOK: Input: default@delim_table_trans A masked pattern was here POSTHOOK: query: SELECT * FROM delim_table_trans POSTHOOK: type: QUERY POSTHOOK: Input: default@delim_table_trans A masked pattern was here NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL NULLNULLNULL {code} was: HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore by skipping particular Table properties like SkewInfo, bucketCols, ordering etc. However, it does that for all Transactional Tables -- not only ACID. 
This causes MicroManaged (insert_only) table to skip needed properties such as Storage Desc Params -- that may define how lines are delimited (like in the example below): To repro the issue: > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: table1 > > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does that for all Transactional Tables -- not only ACID. > This causes MicroManaged (insert_only) table to skip needed properties such > as Storage Desc Params -- that may define how lines are delimited (like in > the example below): > To repro the issue: > {code:java} > CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; > LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans; > describe formatted delim_table_trans; > SELECT * FROM delim_table_trans; > {code} > Result: > {code:java} > # Storage Information > SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > > InputFormat: org.apache.hadoop.mapred.TextInputFormat > OutputFormat: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > Compressed: No > Num Buckets: -1 > Bucket Columns: [] > Sort Columns: [] > PREHOOK: query: SELECT * FROM delim_table_trans > PREHOOK: type: QUERY > PREHOOK: Input: default@delim_table_trans > A masked pattern was here > POSTHOOK: query: SELECT * FROM delim_table_trans > POSTHOOK: type: QUERY > POSTHOOK: Input: default@delim_table_trans > A masked pattern was here > NULL NULLNULL > NULL NULLNULL > NULL NULLNULL > NULL NULLNULL > NULL NULLNULL > NULL NULLNULL > {code} -- This message was sent by Atlassian Jira 
(v8.3.4#803005)
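The fix HIVE-23871 describes can be pictured as a guard in the StorageDescriptor conversion path: skip the storage-descriptor params only for full-ACID tables (which always use ORC), never for insert-only (MicroManaged) tables, which keep their declared SerDe and need params such as `field.delim`. A minimal illustrative sketch follows; the class and method names are hypothetical, not Hive's actual ObjectStore API.

```java
import java.util.HashMap;
import java.util.Map;

public class SdConversionSketch {
    // Hypothetical stand-in for the HIVE-23281 optimization: only a
    // full-ACID table may safely drop its SerDe params, because its
    // storage format is fixed; an insert-only table must keep them.
    public static Map<String, String> convertSerdeParams(
            Map<String, String> params, boolean transactional, boolean insertOnly) {
        if (transactional && !insertOnly) {
            return new HashMap<>();          // full ACID: safe to skip
        }
        return new HashMap<>(params);        // MicroManaged/external: preserve
    }

    public static void main(String[] args) {
        Map<String, String> p = new HashMap<>();
        p.put("field.delim", "\t");
        // A MicroManaged (insert_only) table keeps its tab delimiter, so the
        // repro above would read the rows instead of returning NULLs.
        System.out.println(convertSerdeParams(p, true, true));
    }
}
```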
[jira] [Updated] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23871: -- Attachment: table1 > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: table1 > > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does that for all Transactional Tables -- not only ACID. > This causes MicroManaged (insert_only) table to skip needed properties such > as Storage Desc Params -- that may define how lines are delimited (like in > the example below): > To repro the issue: -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23871: -- Description: HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore by skipping particular Table properties like SkewInfo, bucketCols, ordering etc. However, it does that for all Transactional Tables -- not only ACID. This causes MicroManaged (insert_only) table to skip needed properties such as Storage Desc Params -- that may define how lines are delimited (like in the example below): To repro the issue: was: HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore by skipping particular Table properties like SkewInfo, bucketCols, ordering etc. However, it does not properly handle MicroManaged tables > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does that for all Transactional Tables -- not only ACID. > This causes MicroManaged (insert_only) table to skip needed properties such > as Storage Desc Params -- that may define how lines are delimited (like in > the example below): > To repro the issue: -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23871: -- Issue Type: Bug (was: Improvement) > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does not properly handle MicroManaged tables -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23871: -- Description: HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore by skipping particular Table properties like SkewInfo, bucketCols, ordering etc. However, it does not properly handle MicroManaged tables > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does not properly handle MicroManaged tables -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-23871: - > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-23871 started by Panagiotis Garefalakis. - > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23727) Improve SQLOperation log handling when canceling background
[ https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng updated HIVE-23727: --- Summary: Improve SQLOperation log handling when canceling background (was: Improve SQLOperation log handling when cancel background) > Improve SQLOperation log handling when canceling background > --- > > Key: HIVE-23727 > URL: https://issues.apache.org/jira/browse/HIVE-23727 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Minor > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > The SQLOperation checks _if (shouldRunAsync() && state != > OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the > background task. If true, the state should not be OperationState.CANCELED, so > logging under the state == OperationState.CANCELED should never happen. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
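The dead-code argument in HIVE-23727 can be checked mechanically: inside the true branch of the guard, the state can never be CANCELED or TIMEDOUT, so any log statement there conditioned on `state == OperationState.CANCELED` is unreachable. A small sketch (enum and method names mirror the description, not the actual SQLOperation source):

```java
public class CancelFlowSketch {
    public enum OperationState { RUNNING, CANCELED, TIMEDOUT, FINISHED }

    // Same shape as the guard quoted above:
    // if (shouldRunAsync() && state != CANCELED && state != TIMEDOUT)
    public static boolean shouldCancelBackground(boolean async, OperationState state) {
        return async
            && state != OperationState.CANCELED
            && state != OperationState.TIMEDOUT;
    }

    public static void main(String[] args) {
        for (OperationState s : OperationState.values()) {
            if (shouldCancelBackground(true, s)) {
                // In this branch s is provably neither CANCELED nor TIMEDOUT,
                // so CANCELED-specific logging here could never fire.
                System.out.println("cancel background while state = " + s);
            }
        }
    }
}
```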
[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background
[ https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=460186=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460186 ] ASF GitHub Bot logged work on HIVE-23727: - Author: ASF GitHub Bot Created on: 17/Jul/20 09:05 Start Date: 17/Jul/20 09:05 Worklog Time Spent: 10m Work Description: dengzhhu653 opened a new pull request #1272: URL: https://github.com/apache/hive/pull/1272 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460186) Time Spent: 4h (was: 3h 50m) > Improve SQLOperation log handling when cancel background > > > Key: HIVE-23727 > URL: https://issues.apache.org/jira/browse/HIVE-23727 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Minor > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > The SQLOperation checks _if (shouldRunAsync() && state != > OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the > background task. If true, the state should not be OperationState.CANCELED, so > logging under the state == OperationState.CANCELED should never happen. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23339) SBA does not check permissions for DB location specified in Create or Alter database query
[ https://issues.apache.org/jira/browse/HIVE-23339?focusedWorklogId=460185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460185 ] ASF GitHub Bot logged work on HIVE-23339: - Author: ASF GitHub Bot Created on: 17/Jul/20 09:04 Start Date: 17/Jul/20 09:04 Worklog Time Spent: 10m Work Description: miklosgergely merged pull request #1011: URL: https://github.com/apache/hive/pull/1011 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460185) Time Spent: 0.5h (was: 20m) > SBA does not check permissions for DB location specified in Create or Alter > database query > -- > > Key: HIVE-23339 > URL: https://issues.apache.org/jira/browse/HIVE-23339 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.0 >Reporter: Riju Trivedi >Assignee: Shubham Chaurasia >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23339.01.patch, HIVE-23339.02.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > With doAs=true and StorageBasedAuthorization provider, create database with > specific location succeeds even if user doesn't have access to that path. > > {code:java} > hadoop fs -ls -d /tmp/cannot_write > drwx-- - hive hadoop 0 2020-04-01 22:53 /tmp/cannot_write > create a database under /tmp/cannot_write. We would expect it to fail, but is > actually created successfully with "hive" as the owner: > rtrivedi@bdp01:~> beeline -e "create database rtrivedi_1 location > '/tmp/cannot_write/rtrivedi_1'" > INFO : OK > No rows affected (0.116 seconds) > hive@hpchdd2e:~> hadoop fs -ls /tmp/cannot_write > Found 1 items > drwx-- - hive hadoop 0 2020-04-01 23:05 /tmp/cannot_write/rtrivedi_1 > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
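The HIVE-23339 repro boils down to a missing POSIX-style write check: `/tmp/cannot_write` is `drwx------ hive:hadoop`, so a user who is neither the owner nor in the group must not be able to create a database under it. A minimal sketch of that check with octal mode bits (illustrative only, not Hive's SBA implementation, which delegates to HDFS permissions):

```java
public class SbaCheckSketch {
    // mode uses POSIX octal bits, e.g. 0700 for drwx------
    public static boolean canWrite(int mode, boolean isOwner, boolean inGroup) {
        if (isOwner) return (mode & 0200) != 0;   // owner write bit
        if (inGroup) return (mode & 0020) != 0;   // group write bit
        return (mode & 0002) != 0;                // other write bit
    }

    public static void main(String[] args) {
        // rtrivedi is neither owner (hive) nor in the hadoop group: the
        // CREATE DATABASE ... LOCATION should be rejected, which is exactly
        // the check SBA skips for database locations.
        System.out.println(canWrite(0700, false, false));
    }
}
```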
[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background
[ https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=460182=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460182 ] ASF GitHub Bot logged work on HIVE-23727: - Author: ASF GitHub Bot Created on: 17/Jul/20 09:01 Start Date: 17/Jul/20 09:01 Worklog Time Spent: 10m Work Description: dengzhhu653 closed pull request #1149: URL: https://github.com/apache/hive/pull/1149 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460182) Time Spent: 3h 50m (was: 3h 40m) > Improve SQLOperation log handling when cancel background > > > Key: HIVE-23727 > URL: https://issues.apache.org/jira/browse/HIVE-23727 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Minor > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > The SQLOperation checks _if (shouldRunAsync() && state != > OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the > background task. If true, the state should not be OperationState.CANCELED, so > logging under the state == OperationState.CANCELED should never happen. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23867) Truncate table fail with AccessControlException if doAs enabled and tbl database has source of replication
[ https://issues.apache.org/jira/browse/HIVE-23867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159785#comment-17159785 ] Aasha Medhi commented on HIVE-23867: After this patch HIVE-22736, for external tables we skip CM. So you will not face this issue. > Truncate table fail with AccessControlException if doAs enabled and tbl > database has source of replication > -- > > Key: HIVE-23867 > URL: https://issues.apache.org/jira/browse/HIVE-23867 > Project: Hive > Issue Type: Bug > Components: Hive, repl >Affects Versions: 3.1.1 >Reporter: Rajkumar Singh >Priority: Major > > Steps to repro: > 1. enable doAs > 2. with some user (not a super user) create database > create database sampledb with dbproperties('repl.source.for'='1,2,3'); > 3. create table using create table sampledb.sampletble (id int); > 4. insert some data into it insert into sampledb.sampletble values (1), > (2),(3); > 5. Run truncate command on the table, which fails with the following error > {code:java} > org.apache.hadoop.ipc.RemoteException: User username is not a super user > (non-super user cannot change owner). 

> at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:85) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1907) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:866) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:531) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1498) > ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?] > at org.apache.hadoop.ipc.Client.call(Client.java:1444) > ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?] > at org.apache.hadoop.ipc.Client.call(Client.java:1354) > ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?] > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?] > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?] > at com.sun.proxy.$Proxy31.setOwner(Unknown Source) ~[?:?] > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setOwner(ClientNamenodeProtocolTranslatorPB.java:470) > ~[hadoop-hdfs-client-3.1.1.3.1.5.0-152.jar:?] 
> at sun.reflect.GeneratedMethodAccessor151.invoke(Unknown Source) ~[?:?] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_232] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_232] > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > [hadoop-common-3.1.1.3.1.5.0-152.jar:?] > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?] > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?] > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > [hadoop-common-3.1.1.3.1.5.0-152.jar:?] > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > [hadoop-common-3.1.1.3.1.5.0-152.jar:?] > at com.sun.proxy.$Proxy32.setOwner(Unknown Source) [?:?] > at org.apache.hadoop.hdfs.DFSClient.setOwner(DFSClient.java:1914) > [hadoop-hdfs-client-3.1.1.3.1.5.0-152.jar:?] > at > org.apache.hadoop.hdfs.DistributedFileSystem$36.doCall(DistributedFileSystem.java:1764) > [hadoop-hdfs-client-3.1.1.3.1.5.0-152.jar:?] > at > org.apache.hadoop.hdfs.DistributedFileSystem$36.doCall(DistributedFileSystem.java:1761) > [hadoop-hdfs-client-3.1.1.3.1.5.0-152.jar:?] > at >
[jira] [Commented] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159781#comment-17159781 ] Aasha Medhi commented on HIVE-23069: +1 > Memory efficient iterator should be used during replication. > > > Key: HIVE-23069 > URL: https://issues.apache.org/jira/browse/HIVE-23069 > Project: Hive > Issue Type: Improvement >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch, > HIVE-23069.03.patch > > Time Spent: 6h > Remaining Estimate: 0h > > Currently the iterator used while copying table data is memory based. In case > of a database with very large number of table/partitions, such iterator may > cause HS2 process to go OOM. > Also introduces a config option to run data copy tasks during repl load > operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
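The idea behind HIVE-23069 is to replace a fully materialized list of table/partition entries with a lazy, batched iterator, so HS2 memory stays bounded regardless of how many objects a database has. A self-contained sketch of such an iterator (the batching function and batch size are assumptions for illustration, not the actual patch):

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Streams elements batch by batch: only one batch is in memory at a time.
public class BatchedIteratorSketch implements Iterator<String> {
    private final Function<Integer, List<String>> fetchBatch; // offset -> next batch
    private Iterator<String> current = Collections.emptyIterator();
    private int offset = 0;
    private boolean exhausted = false;

    public BatchedIteratorSketch(Function<Integer, List<String>> fetchBatch) {
        this.fetchBatch = fetchBatch;
    }

    @Override public boolean hasNext() {
        while (!current.hasNext() && !exhausted) {
            List<String> batch = fetchBatch.apply(offset);
            if (batch.isEmpty()) { exhausted = true; return false; }
            offset += batch.size();
            current = batch.iterator();
        }
        return current.hasNext();
    }

    @Override public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }
}
```

The copy task would consume this iterator directly instead of holding every partition name in a list, trading one metastore round-trip per batch for bounded memory.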
[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159745#comment-17159745 ] Syed Shameerur Rahman commented on HIVE-23851: -- [~kgyrtkirk] I have raised a PR following the approach 2, Could you please review? > MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions > > > Key: HIVE-23851 > URL: https://issues.apache.org/jira/browse/HIVE-23851 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > *Steps to reproduce:* > # Create external table > # Run msck command to sync all the partitions with metastore > # Remove one of the partition path > # Run msck repair with partition filtering > *Stack Trace:* > {code:java} > 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] > ppr.PartitionExpressionForMetastore: Failed to deserialize the expression > java.lang.IndexOutOfBoundsException: Index: 110, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) > 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_192] > {code} > *Cause:* > In case of msck repair with partition filtering we expect the expression proxy > class to be set as PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78 > ), while dropping partitions we serialize the drop partition filter > expression as ( > https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589 > ), which is incompatible with the deserialization happening in > PartitionExpressionForMetastore ( > 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52 > ), hence the query fails with "Failed to deserialize the expression". > *Solutions*: > I could think of two approaches to this problem: > # Since PartitionExpressionForMetastore is required only during the partition > pruning step, we can switch the expression proxy class back to > MsckPartitionExpressionProxy once the partition pruning step is done. > # The other solution is to make the serialization of the msck drop partition > filter expression compatible with the one in > PartitionExpressionForMetastore. We can do this via reflection, since the drop > partition serialization happens in the Msck class (standalone-metastore); by this > way we can
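Approach 1 above amounts to scoping a configuration override: set the expression-proxy class for the pruning step, then restore whatever was there before so the later drop-partition serialization stays compatible. A generic sketch of that pattern follows; the property key and proxy-class names are taken from the discussion but used here purely illustratively, not as Hive's actual configuration API.

```java
import java.util.Properties;
import java.util.function.Supplier;

public class ProxySwitchSketch {
    // Illustrative key; the real Hive config key differs.
    public static final String KEY = "metastore.expression.proxy";

    // Run one step with a temporary proxy class, always restoring the
    // previous value afterwards (even if the step throws).
    public static <T> T withProxy(Properties conf, String proxy, Supplier<T> step) {
        String old = conf.getProperty(KEY);
        conf.setProperty(KEY, proxy);
        try {
            return step.get();
        } finally {
            if (old == null) conf.remove(KEY);
            else conf.setProperty(KEY, old);
        }
    }
}
```

Usage would wrap only the pruning call, e.g. `withProxy(conf, "PartitionExpressionForMetastore", () -> prunePartitions(...))`, leaving `MsckPartitionExpressionProxy` in effect for everything else.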
[jira] [Updated] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23851: -- Labels: pull-request-available (was: ) > MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions > > > Key: HIVE-23851 > URL: https://issues.apache.org/jira/browse/HIVE-23851 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > *Steps to reproduce:* > # Create external table > # Run msck command to sync all the partitions with metastore > # Remove one of the partition path > # Run msck repair with partition filtering > *Stack Trace:* > {code:java} > 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] > ppr.PartitionExpressionForMetastore: Failed to deserialize the expression > java.lang.IndexOutOfBoundsException: Index: 110, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_192] > {code} > *Cause:* > In case of msck repair with partition filtering we expect the expression proxy > class to be set as PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78 > ), while dropping partitions we serialize the drop partition filter > expression as ( > https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589 > ), which is incompatible with the deserialization happening in > PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52 > 
), hence the query fails with "Failed to deserialize the expression". > *Solutions*: > I could think of two approaches to this problem: > # Since PartitionExpressionForMetastore is required only during the partition > pruning step, we can switch the expression proxy class back to > MsckPartitionExpressionProxy once the partition pruning step is done. > # The other solution is to make the serialization of the msck drop partition > filter expression compatible with the one in > PartitionExpressionForMetastore. We can do this via reflection, since the drop > partition serialization happens in the Msck class (standalone-metastore); by this > way we can completely remove the need for class MsckPartitionExpressionProxy > and this also helps to reduce the
[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=460123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460123 ]

ASF GitHub Bot logged work on HIVE-23851:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Jul/20 07:36
Start Date: 17/Jul/20 07:36
Worklog Time Spent: 10m
Work Description: shameersss1 opened a new pull request #1271:
URL: https://github.com/apache/hive/pull/1271

   Refer: https://issues.apache.org/jira/browse/HIVE-23851 for more information

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 460123)
Remaining Estimate: 0h
Time Spent: 10m

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-23851
>                 URL: https://issues.apache.org/jira/browse/HIVE-23851
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>             Fix For: 4.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create an external table
> # Run the msck command to sync all the partitions with the metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering
>
> *Stack Trace:*
> {code:java}
> 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
> java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
> at org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_192]
> {code}
>
> *Cause:*
> In the case of msck repair with partition filtering, we expect the expression proxy class to be set to PartitionExpressionForMetastore ( https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78 ), while when dropping partitions we serialize the drop-partition filter expression as ( https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589 ), which is incompatible with the deserialization happening in PartitionExpressionForMetastore (
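The root cause described above is a writer/reader mismatch: bytes produced by one serialization mechanism are handed to a deserializer that expects a different wire format, so the reader misinterprets the stream and fails with an exception like the IndexOutOfBoundsException in the trace. A minimal standalone Java sketch of why such a mismatch fails, using plain JDK streams instead of Hive's Kryo-based SerializationUtilities (all names here are illustrative, not Hive code):

```java
import java.io.*;

public class SerializationMismatch {
    // Mechanism A: Java object serialization (stands in for the writer
    // used on the drop-partition path).
    static byte[] writeWithA(String expr) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(expr);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Mechanism B: a reader expecting a different wire format (stands in
    // for the deserializer in PartitionExpressionForMetastore).
    static boolean readWithBFails(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readUTF(); // misinterprets mechanism A's stream header as a string length
            return false;
        } catch (IOException e) {
            return true; // incompatible formats: the read blows up
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writeWithA("p=1");
        System.out.println("incompatible read failed: " + readWithBFails(bytes));
    }
}
```

The fix direction discussed in the issue is to make both sides agree on one mechanism, rather than to make the reader tolerant of foreign bytes.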
[jira] [Work logged] (HIVE-23837) HbaseStorageHandler is not configured properly when the FileSinkOperator is the child of a MergeJoinWork
[ https://issues.apache.org/jira/browse/HIVE-23837?focusedWorklogId=460124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460124 ]

ASF GitHub Bot logged work on HIVE-23837:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Jul/20 07:36
Start Date: 17/Jul/20 07:36
Worklog Time Spent: 10m
Work Description: deniskuzZ merged pull request #1244:
URL: https://github.com/apache/hive/pull/1244

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 460124)
Time Spent: 1h 20m (was: 1h 10m)

> HbaseStorageHandler is not configured properly when the FileSinkOperator is
> the child of a MergeJoinWork
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-23837
>                 URL: https://issues.apache.org/jira/browse/HIVE-23837
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Peter Varga
>            Assignee: Peter Varga
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> If the FileSinkOperator's root operator is a MergeJoinWork, the
> HbaseStorageHandler.configureJobConf will never get called, and the execution
> will miss the HBASE_AUTH_TOKEN and the hbase jars.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
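The bug described above is a tree-walking gap: storage-handler configuration applied only to sinks found at the top level of a work never reaches a FileSinkOperator nested under a MergeJoinWork. A standalone sketch of the general fix pattern, walking the entire work tree so every sink gets configured (WorkNode and the method names are hypothetical stand-ins, not Hive's actual operator classes):

```java
import java.util.*;

// Hypothetical plan node; in Hive this would be the BaseWork/operator tree.
class WorkNode {
    final String name;
    final List<WorkNode> children = new ArrayList<>();
    WorkNode(String name) { this.name = name; }
    WorkNode add(WorkNode child) { children.add(child); return this; }
}

public class ConfigureAllSinks {
    // Depth-first walk that reaches every FileSink, even one nested under
    // a merge-join node; a shallow children-only scan would miss it.
    static List<String> configuredSinks(WorkNode root) {
        List<String> configured = new ArrayList<>();
        Deque<WorkNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            WorkNode n = stack.pop();
            if (n.name.startsWith("FileSink")) {
                // Here the real code would invoke the storage handler's
                // job configuration (credentials, extra jars).
                configured.add(n.name);
            }
            n.children.forEach(stack::push);
        }
        return configured;
    }

    public static void main(String[] args) {
        WorkNode plan = new WorkNode("MergeJoin").add(new WorkNode("FileSink-1"));
        System.out.println(configuredSinks(plan)); // finds the nested sink
    }
}
```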
[jira] [Work logged] (HIVE-23474) Deny Repl Dump if the database is a target of replication
[ https://issues.apache.org/jira/browse/HIVE-23474?focusedWorklogId=460100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460100 ]

ASF GitHub Bot logged work on HIVE-23474:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Jul/20 06:23
Start Date: 17/Jul/20 06:23
Worklog Time Spent: 10m
Work Description: pkumarsinha commented on a change in pull request #1247:
URL: https://github.com/apache/hive/pull/1247#discussion_r456242103

##
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -944,96 +944,6 @@ public void testIncrementalDumpMultiIteration() throws Throwable {
     Assert.assertEquals(IncrementalLoadTasksBuilder.getNumIteration(), numEvents);
   }
-  @Test
-  public void testIfCkptAndSourceOfReplPropsIgnoredByReplDump() throws Throwable {

Review comment:
   This was testing that source-of-replication properties are ignored by replication while the custom ones are not. Can we retain that part?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java
##
@@ -219,6 +219,11 @@ private void initReplDump(ASTNode ast) throws HiveException {
           " as it is not a source of replication (repl.source.for)");
       throw new SemanticException(ErrorMsg.REPL_DATABASE_IS_NOT_SOURCE_OF_REPLICATION.getMsg());
     }
+    if (ReplUtils.isTargetOfReplication(database)) {
+      LOG.error("Cannot dump database " + dbNameOrPattern +
+          " as it is a target of replication (repl.target.for)");

Review comment:
   nit: Can accommodate in one line

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 460100)
Time Spent: 20m (was: 10m)

> Deny Repl Dump if the database is a target of replication
> ---------------------------------------------------------
>
>                 Key: HIVE-23474
>                 URL: https://issues.apache.org/jira/browse/HIVE-23474
>             Project: Hive
>          Issue Type: Task
>            Reporter: Aasha Medhi
>            Assignee: Aasha Medhi
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-23474.01.patch, HIVE-23474.02.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
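The change under review above adds a guard that rejects REPL DUMP on a database already marked as a replication target via the repl.target.for database property. A self-contained sketch of that kind of check (the class, method, and exception names here are illustrative stand-ins, not Hive's ReplUtils/SemanticException API):

```java
import java.util.Map;

public class ReplDumpGuard {
    // Database property that marks a database as a replication target
    // (the property name comes from the issue; everything else is hypothetical).
    static final String TARGET_OF_REPLICATION = "repl.target.for";

    // Reject a dump when the database parameters mark it as a replication target:
    // dumping a target and replaying that dump elsewhere would fork the replica.
    static void checkDumpAllowed(String dbName, Map<String, String> dbParams) {
        String target = dbParams.get(TARGET_OF_REPLICATION);
        if (target != null && !target.trim().isEmpty()) {
            throw new IllegalStateException("Cannot dump database " + dbName
                + " as it is a target of replication (repl.target.for)");
        }
    }

    public static void main(String[] args) {
        checkDumpAllowed("sales", Map.of()); // not a target: allowed
        try {
            checkDumpAllowed("sales_replica", Map.of(TARGET_OF_REPLICATION, "policy1"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // dump denied for the target database
        }
    }
}
```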