[jira] [Commented] (HIVE-23716) Support Anti Join in Hive
[ https://issues.apache.org/jira/browse/HIVE-23716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161706#comment-17161706 ]

Peter Vary commented on HIVE-23716:
-----------------------------------

[~maheshk114]: How efficient would this implementation be for ACID delete_deltas? Currently we use ACID-specific readers to read ACID files and remove deleted rows, but using anti joins we could automatically get the perf benefits of the LLAP IO cache for delete deltas too.

> Support Anti Join in Hive
> -------------------------
>
>                 Key: HIVE-23716
>                 URL: https://issues.apache.org/jira/browse/HIVE-23716
>             Project: Hive
>          Issue Type: Bug
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-23716.01.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query that needs an anti join is converted to a left outer join with a null filter on the right-side join key to get the desired result. This causes:
> # Extra computation - the left outer join projects redundant columns from the right side, and additional filtering is done to remove the redundant rows. An anti join avoids this, since it projects only the required columns and rows from the left-side table.
> # Extra shuffle - with an anti join, duplicate records can be dropped at the child node instead of being moved to the join node. This can save a significant amount of data movement if the number of distinct rows (join keys) is significant.
> # Extra memory usage - for a map-based anti join, a hash set is sufficient, as just the key is required to check whether a record matches the join condition. For a left join, we need the key and the non-key columns as well, so a hash table is required.
> For a query like
> {code:java}
> select wr_order_number FROM web_returns LEFT JOIN web_sales ON
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;
> {code}
> the number of distinct ws_order_number values in the web_sales table in a typical 10TB TPC-DS setup is just 10% of the total records. So when we convert this query to an anti join, only 600 million rows are moved to the join node instead of 7 billion.
> In the current patch, just one conversion is done: the pattern project->filter->left-join is converted to project->anti-join. This takes care of subqueries with a "not exists" clause. Queries with "not exists" are first converted to filter + left-join and then converted to anti join. Queries with "not in" are not handled in the current patch.
> From the execution side, both merge join and map join with vectorized execution are supported for anti join.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
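The memory point above (hash set vs. hash table) can be sketched outside Hive: an anti join only needs to know whether a right-side key exists, so retaining the distinct keys in a set is enough. A minimal toy illustration; the class and method names are invented, not Hive code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AntiJoinSketch {

    // Anti join: emit left-side rows whose join key has no match on the right.
    // A HashSet of right-side keys suffices; no right-side payload columns
    // are kept, unlike the left-outer-join + IS NULL filter rewrite.
    static List<String> antiJoin(List<String[]> left, Set<String> rightKeys) {
        List<String> out = new ArrayList<>();
        for (String[] row : left) {
            if (!rightKeys.contains(row[0])) {  // row[0] is the join key
                out.add(row[0]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for web_returns rows (keyed by wr_order_number).
        List<String[]> webReturns = Arrays.asList(
                new String[]{"o1"}, new String[]{"o2"}, new String[]{"o3"});
        // Only the distinct ws_order_number keys need to be stored/shuffled.
        Set<String> wsOrderNumbers = new HashSet<>(Arrays.asList("o2"));
        System.out.println(antiJoin(webReturns, wsOrderNumbers)); // [o1, o3]
    }
}
```

The same shape explains the shuffle point: the right side can be deduplicated to its distinct keys before any data movement.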
[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive
[ https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=461406=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461406 ]

ASF GitHub Bot logged work on HIVE-23716:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jul/20 04:32
            Start Date: 21/Jul/20 04:32
    Worklog Time Spent: 10m

Work Description: jcamachor commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r457829820

## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
## @@ -2162,7 +2162,8 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 "Whether Hive enables the optimization about converting common join into mapjoin based on the input file size. \n" +
 "If this parameter is on, and the sum of size for n-1 of the tables/partitions for a n-way join is smaller than the\n" +
 "specified size, the join is directly converted to a mapjoin (there is no conditional task)."),
-
+HIVE_CONVERT_ANTI_JOIN("hive.auto.convert.anti.join", false,

Review comment: Is there any reason why we should not enable this by default in master? It seems it is always beneficial to execute the anti join since we already have a vectorized implementation too. That would increase the test coverage for the feature.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 461406)
    Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (HIVE-23817) Pushing TopN Key operator PKFK inner joins
[ https://issues.apache.org/jira/browse/HIVE-23817?focusedWorklogId=461405=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461405 ]

ASF GitHub Bot logged work on HIVE-23817:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jul/20 04:26
            Start Date: 21/Jul/20 04:26
    Worklog Time Spent: 10m

Work Description: jcamachor commented on a change in pull request #1228:
URL: https://github.com/apache/hive/pull/1228#discussion_r457824052

## File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g
## @@ -471,6 +471,8 @@
 Number
 (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)?
 ;

+PKFK_JOIN: 'PKFK_JOIN';

Review comment: Can we prefix with `KW_` and move above with the rest of the keywords?

## File path: ql/src/test/queries/clientpositive/topnkey_inner_join.q
## @@ -0,0 +1,50 @@
+drop table if exists customer;
+drop table if exists orders;
+
+create table customer (id int, name string, email string);
+create table orders (customer_id int not null enforced, amount int);
+
+alter table customer add constraint pk_customer_id primary key (id) disable novalidate rely;
+alter table orders add constraint fk_order_customer_id foreign key (customer_id) references customer(id) disable novalidate rely;
+
+insert into customer values
+  (4, 'Heisenberg', 'heisenb...@email.com'),
+  (3, 'Smith', 'sm...@email.com'),
+  (2, 'Jones', 'jo...@email.com'),
+  (1, 'Robinson', 'robin...@email.com');
+
+insert into orders values
+  (2, 200),
+  (3, 40),
+  (1, 100),
+  (1, 50),
+  (3, 30);
+
+set hive.optimize.topnkey=true;
+set hive.optimize.limittranspose=false;
+
+select 'positive: order by columns are coming from child table';
+-- FIXME: explain select * from customer join orders on customer.id = orders.customer_id order by customer.id limit 3;

Review comment: I see this example is below. Can the FIXME be removed?
## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java
## @@ -442,6 +450,40 @@ private QueryBlockInfo convertSource(RelNode r) throws CalciteSemanticException
     return new QueryBlockInfo(s, ast);
   }

+  /**
+   * Add PK-FK join information to the AST as a query hint
+   * @param ast
+   * @param join
+   * @param swapSides whether the left and right input of the join is swapped
+   */
+  private void addPkFkInfoToAST(ASTNode ast, Join join, boolean swapSides) {
+    List joinFilters = new ArrayList<>(RelOptUtil.conjunctions(join.getCondition()));
+    RelMetadataQuery mq = join.getCluster().getMetadataQuery();
+    HiveRelOptUtil.PKFKJoinInfo rightInputResult =
+        HiveRelOptUtil.extractPKFKJoin(join, joinFilters, false, mq);
+    HiveRelOptUtil.PKFKJoinInfo leftInputResult =
+        HiveRelOptUtil.extractPKFKJoin(join, joinFilters, true, mq);
+    // Add the fkJoinIndex (0=left, 1=right, if swapSides is false) to the AST
+    // check if the nonFK side is filtered
+    if (leftInputResult.isPkFkJoin && leftInputResult.additionalPredicates.isEmpty()) {
+      RelNode nonFkInput = join.getRight();
+      ast.addChild(pkFkHint(swapSides ? 1 : 0, HiveRelOptUtil.isRowFilteringPlan(mq, nonFkInput)));
+    } else if (rightInputResult.isPkFkJoin && rightInputResult.additionalPredicates.isEmpty()) {
+      RelNode nonFkInput = join.getLeft();
+      ast.addChild(pkFkHint(swapSides ? 0 : 1, HiveRelOptUtil.isRowFilteringPlan(mq, nonFkInput)));
+    }
+  }
+
+  private ASTNode pkFkHint(int fkTableIndex, boolean nonFkSideIsFiltered) {
+    ParseDriver parseDriver = new ParseDriver();
+    try {
+      return parseDriver.parseHint(String.format("PKFK_JOIN(%d, %s)",
+          fkTableIndex, nonFkSideIsFiltered ? NON_FK_FILTERED : "notFiltered"));

Review comment: The naming (NON_FK_FILTERED vs "notFiltered") is a bit confusing. Can we simplify to NON_FK_FILTERED vs NON_FK_NOT_FILTERED? Create a String constant in the converter for both (or an enum).
## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java
## @@ -255,6 +260,88 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator topNKey) throws Semant
   }
 }

+  private void pushdownInnerJoin(TopNKeyOperator topNKey, int fkJoinInputIndex, boolean nonFkSideIsFiltered) throws SemanticException {
+    TopNKeyDesc topNKeyDesc = topNKey.getConf();
+    CommonJoinOperator join =
+        (CommonJoinOperator) topNKey.getParentOperators().get(0);
+    List> joinInputs = join.getParentOperators();
+    ReduceSinkOperator fkJoinInput = (ReduceSinkOperator) joinInputs.get(fkJoinInputIndex);
+    if (nonFkSideIsFiltered) {
+      LOG.debug("Not pushing {} through {} as non FK side of the join is filtered", topNKey.getName(), join.getName());
+      return;
+    }
+    // Check column origins:
+    // 1. If all OrderBy columns are coming from the child (FK)
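The correctness idea this pushdown relies on can be sketched outside Hive: in a PK-FK inner join where every FK row matches exactly one PK row and the non-FK side filters nothing, taking the top-n rows by the order-by key before the join yields the same rows as joining first and taking the top-n afterwards. A toy sketch under those assumptions; names and data are illustrative, not Hive code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopNKeySketch {

    // Join every FK row to the PK side, order by the FK key, then keep n rows.
    static List<String> joinThenTopN(Map<Integer, String> pk, List<int[]> fk, int n) {
        List<int[]> rows = new ArrayList<>(fk);
        rows.sort(Comparator.comparingInt(o -> o[0]));
        List<String> joined = new ArrayList<>();
        for (int[] o : rows) {
            joined.add(pk.get(o[0]) + ":" + o[1]); // inner join on the FK key
        }
        return joined.subList(0, Math.min(n, joined.size()));
    }

    // Pushdown: keep the top-n rows by key on the FK side first, join after.
    static List<String> topNThenJoin(Map<Integer, String> pk, List<int[]> fk, int n) {
        List<int[]> rows = new ArrayList<>(fk);
        rows.sort(Comparator.comparingInt(o -> o[0]));
        List<String> joined = new ArrayList<>();
        for (int[] o : rows.subList(0, Math.min(n, rows.size()))) {
            joined.add(pk.get(o[0]) + ":" + o[1]);
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<Integer, String> customer = new HashMap<>();
        customer.put(1, "Robinson");
        customer.put(2, "Jones");
        customer.put(3, "Smith");
        List<int[]> orders = Arrays.asList(new int[]{2, 200}, new int[]{3, 40},
                new int[]{1, 100}, new int[]{1, 50}, new int[]{3, 30});
        // Both plans produce the same top-3 rows, because the PK-FK join
        // neither filters nor duplicates the FK-side rows.
        System.out.println(joinThenTopN(customer, orders, 3)
                .equals(topNThenJoin(customer, orders, 3))); // true
    }
}
```

This is why the patch refuses to push when the non-FK side is filtered: a filtering PK side could remove FK rows after the top-n was taken, changing the result.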
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=461396=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461396 ]

ASF GitHub Bot logged work on HIVE-23815:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jul/20 03:37
            Start Date: 21/Jul/20 03:37
    Worklog Time Spent: 10m

Work Description: xinghuayu007 commented on pull request #1281:
URL: https://github.com/apache/hive/pull/1281#issuecomment-66166

@belugabehr Hi, could you find time to take a look at this new feature?

Issue Time Tracking
-------------------

    Worklog Id: (was: 461396)
    Time Spent: 3.5h (was: 3h 20m)

> output statistics of underlying datastore
> -----------------------------------------
>
>                 Key: HIVE-23815
>                 URL: https://issues.apache.org/jira/browse/HIVE-23815
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rossetti Wong
>            Assignee: Rossetti Wong
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This patch provides a way to get statistics about the metastore's underlying datastore, like MySQL, Oracle, and so on. You can get the number of datastore reads and writes, the average transaction execution time, the total number of active connections, and so on.
[jira] [Comment Edited] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161674#comment-17161674 ]

Chiran Ravani edited comment on HIVE-23873 at 7/21/20, 3:09 AM:
----------------------------------------------------------------

[~srahman] Thanks for shedding more light on it. I am now able to further narrow down the issue to CBO. The problem appears only when CBO is off. Below is the query run with CBO on, for Derby.
{code}
20/07/20 19:49:18 [787073ad-c44e-4e04-8cfb-ba49e14602a0 main]: INFO dao.GenericJdbcDatabaseAccessor: Query to execute is [SELECT "IKEY", "bkey", "fkey", "dkey" FROM "EXTERNAL_JDBC_SIMPLE_DERBY2_TABLE1"]
20/07/20 19:49:18 [Hive Hook Proto Log Writer 0]: INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
20/07/20 19:49:18 [Hive Hook Proto Log Writer 0]: INFO compress.CodecPool: Got brand-new compressor [.deflate]
20/07/20 19:49:18 [787073ad-c44e-4e04-8cfb-ba49e14602a0 main]: INFO jdbc.JdbcSerDe: Blob data = {dkey=OW[class=class java.lang.Double,value=20.0], bkey=OW[class=class java.lang.Long,value=20], fkey=OW[class=class java.lang.Float,value=20.0], IKEY=OW[class=class java.lang.Integer,value=20]}
{code}
When CBO is turned off, the query picked is select * and the problem appears.
{code}
20/07/21 03:08:44 [ce531fd3-ca3f-4dc4-8681-3cf1233f27df main]: INFO jdbc.JdbcInputFormat: Creating 1 input split limit:-1, offset:0
20/07/21 03:08:44 [ce531fd3-ca3f-4dc4-8681-3cf1233f27df main]: INFO dao.GenericJdbcDatabaseAccessor: Query to execute is [select * from EXTERNAL_JDBC_SIMPLE_DERBY2_TABLE1]
Failed with exception java.io.IOException:java.lang.NullPointerException
20/07/21 03:08:44 [ce531fd3-ca3f-4dc4-8681-3cf1233f27df main]: ERROR CliDriver: Failed with exception java.io.IOException:java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
.
.
Caused by: java.lang.NullPointerException
at org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:235)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:619)
... 17 more
{code}
With Oracle, CBO fails and is skipped, which causes the problem; below is my trace.
{code}
20/07/21 02:13:18 [Heartbeater-0]: INFO metastore.RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=h...@example.com (auth:KERBEROS) retries=24 delay=5 lifetime=0
20/07/21 02:14:31 [f68e0ab8-1585-4db5-a872-20790c5eb3dd main]: ERROR parse.CalcitePlanner: CBO failed, skipping CBO.
java.lang.IllegalArgumentException: Multiple entries with same key: APEX_ACTIVITY_LOG=JdbcTable {APEX_ACTIVITY_LOG} and APEX_ACTIVITY_LOG=JdbcTable {APEX_ACTIVITY_LOG}
at org.apache.hive.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hive.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:98) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hive.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:84) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hive.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:295) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.adapter.jdbc.JdbcSchema.computeTables(JdbcSchema.java:295) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.adapter.jdbc.JdbcSchema.getTableMap(JdbcSchema.java:351) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.adapter.jdbc.JdbcSchema.getTable(JdbcSchema.java:345) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genTableLogicalPlan(CalcitePlanner.java:3203) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5182) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1849) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1795) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125)
[jira] [Commented] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161674#comment-17161674 ]

Chiran Ravani commented on HIVE-23873:
--------------------------------------

[~srahman] Thanks for shedding more light on it. I am now able to further narrow down the issue to CBO. The problem appears only when CBO is off. Below is the query run with CBO on, for Derby.
{code}
20/07/20 19:49:18 [787073ad-c44e-4e04-8cfb-ba49e14602a0 main]: INFO dao.GenericJdbcDatabaseAccessor: Query to execute is [SELECT "IKEY", "bkey", "fkey", "dkey" FROM "EXTERNAL_JDBC_SIMPLE_DERBY2_TABLE1"]
20/07/20 19:49:18 [Hive Hook Proto Log Writer 0]: INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
20/07/20 19:49:18 [Hive Hook Proto Log Writer 0]: INFO compress.CodecPool: Got brand-new compressor [.deflate]
20/07/20 19:49:18 [787073ad-c44e-4e04-8cfb-ba49e14602a0 main]: INFO jdbc.JdbcSerDe: Blob data = {dkey=OW[class=class java.lang.Double,value=20.0], bkey=OW[class=class java.lang.Long,value=20], fkey=OW[class=class java.lang.Float,value=20.0], IKEY=OW[class=class java.lang.Integer,value=20]}
{code}
When CBO is turned off, the query picked is select * and the problem appears.
{code}
20/07/20 20:00:56 [bfdb833e-8366-451c-86ef-f38e2b002690 main]: INFO dao.GenericJdbcDatabaseAccessor: Query to execute is [select * from TESTHIVEJDBCSTORAGE]
20/07/20 20:00:56 [bfdb833e-8366-451c-86ef-f38e2b002690 main]: INFO jdbc.JdbcSerDe: Blob data = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class java.lang.Integer,value=1]}
{code}
With Oracle, CBO fails and is skipped, which causes the problem; below is my trace.
{code} 20/07/21 02:13:18 [Heartbeater-0]: INFO metastore.RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=h...@example.com (auth:KERBEROS) retries=24 delay=5 lifetime=0 20/07/21 02:14:31 [f68e0ab8-1585-4db5-a872-20790c5eb3dd main]: ERROR parse.CalcitePlanner: CBO failed, skipping CBO. java.lang.IllegalArgumentException: Multiple entries with same key: APEX_ACTIVITY_LOG=JdbcTable {APEX_ACTIVITY_LOG} and APEX_ACTIVITY_LOG=JdbcTable {APEX_ACTIVITY_LOG} at org.apache.hive.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:98) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:84) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:295) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.adapter.jdbc.JdbcSchema.computeTables(JdbcSchema.java:295) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.adapter.jdbc.JdbcSchema.getTableMap(JdbcSchema.java:351) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.adapter.jdbc.JdbcSchema.getTable(JdbcSchema.java:345) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genTableLogicalPlan(CalcitePlanner.java:3203) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5182) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1849) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1795) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1556) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:541) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12460) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at
[jira] [Updated] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chiran Ravani updated HIVE-23873: - Attachment: HIVE-23873.02.patch > Querying Hive JDBCStorageHandler table fails with NPE > - > > Key: HIVE-23873 > URL: https://issues.apache.org/jira/browse/HIVE-23873 > Project: Hive > Issue Type: Bug > Components: HiveServer2, JDBC >Affects Versions: 3.1.0, 3.1.1, 3.1.2 >Reporter: Chiran Ravani >Assignee: Chiran Ravani >Priority: Critical > Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch > > > Scenario is Hive table having same schema as table in Oracle, however when we > query the table with data it fails with NPE, below is the trace. > {code} > Caused by: java.io.IOException: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473) > ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > ... 
34 more
> Caused by: java.lang.NullPointerException
> at org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473) ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case: since Hive forces table and column names to lowercase at creation time, the user runs into an NPE while fetching data.
> While deserializing data, the input consists of column names in lower case, which fails to get the value:
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
> Log snippet:
> =============
> {code}
> 2020-07-17T16:49:09,598 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = ID
> 2020-07-17T16:49:10,642 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class java.lang.Integer,value=1]}
> {code}
> Simple reproducer for this case:
> =============
> 1. Create the table in Oracle:
> {code}
> create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20));
> {code}
> 2. Insert dummy data:
> {code}
> Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1');
> {code}
> 3. Create the JDBCStorageHandler table in Hive:
> {code}
> CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME VARCHAR(20))
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (
> "hive.sql.database.type" = "ORACLE",
> "hive.sql.jdbc.driver" = "oracle.jdbc.OracleDriver",
> "hive.sql.jdbc.url" = "jdbc:oracle:thin:@orachehostname/XE",
> "hive.sql.dbcp.username" = "chiran",
> "hive.sql.dbcp.password" = "supersecurepassword",
> "hive.sql.table" = "TESTHIVEJDBCSTORAGE",
> "hive.sql.dbcp.maxActive" = "1"
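The case mismatch the report describes can be reproduced in isolation: a row map keyed by the datastore's upper-case column names returns null when probed with Hive's lowercased names. A minimal sketch of the mismatch and one possible workaround (a case-insensitive map); the class and method names are invented, not the actual JdbcSerDe code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ColumnLookupSketch {

    // Wrap a row in a map whose key comparison ignores case, so a lookup
    // with Hive's lowercased column name still finds the Oracle column.
    static Map<String, Object> caseInsensitive(Map<String, Object> row) {
        Map<String, Object> m = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        m.putAll(row);
        return m;
    }

    public static void main(String[] args) {
        // Row as it arrives from Oracle: column names in upper case.
        Map<String, Object> oracleRow = new HashMap<>();
        oracleRow.put("ID", 1);
        oracleRow.put("FNAME", "Name1");

        // Direct lookup with the lowercased Hive name misses -> null,
        // which later surfaces as the NullPointerException in deserialize().
        System.out.println(oracleRow.get("id"));                  // null
        System.out.println(caseInsensitive(oracleRow).get("id")); // 1
    }
}
```

The CBO path avoids the problem by generating a quoted, correctly-cased column list instead of `select *`, which is consistent with the traces above.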
[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator
[ https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=461375=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461375 ]

ASF GitHub Bot logged work on HIVE-23843:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jul/20 02:11
            Start Date: 21/Jul/20 02:11
    Worklog Time Spent: 10m

Work Description: rbalamohan commented on pull request #1250:
URL: https://github.com/apache/hive/pull/1250#issuecomment-661564023

Simplified and revised the patch. Q67 @ 10 TB scale shows good improvement (2050+ seconds --> 1500+ seconds) in an internal cluster.

Issue Time Tracking
-------------------

    Worklog Id: (was: 461375)
    Time Spent: 1h 10m (was: 1h)

> Improve key evictions in VectorGroupByOperator
> ----------------------------------------------
>
>                 Key: HIVE-23843
>                 URL: https://issues.apache.org/jira/browse/HIVE-23843
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also run into GC issues when many keys are involved in group-bys. It would be good to provide an option for LRU-based eviction of mapKeysAggregationBuffers.
[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator
[ https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=461374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461374 ]

ASF GitHub Bot logged work on HIVE-23843:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jul/20 02:07
            Start Date: 21/Jul/20 02:07
    Worklog Time Spent: 10m

Work Description: rbalamohan commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r457792912

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
## @@ -358,6 +360,34 @@ public void close(boolean aborted) throws HiveException {
 */
 private long numRowsCompareHashAggr;

+    /**
+     * To track current memory usage.
+     */
+    private long currMemUsed;
+
+    /**
+     * Whether to make use of LRUCache for map aggr buffers or not.
+     */
+    private boolean lruCache;
+
+    class LRUCache extends LinkedHashMap {

Review comment: LinkedHashMap provides these semantics and already maintains the doubly linked list internally; it invokes removeEldestEntry as needed. However, I am planning to make it a lot simpler in the next iteration of the patch.

Issue Time Tracking
-------------------

    Worklog Id: (was: 461374)
    Time Spent: 1h (was: 50m)
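The LinkedHashMap-based eviction discussed in the review comment can be sketched in a few lines: construct the map in access order and override removeEldestEntry so the least recently used key is dropped once a capacity is exceeded. The capacity and types below are illustrative, not Hive's actual aggregation-buffer sizing:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: LinkedHashMap in access order, with removeEldestEntry
// evicting the least recently used entry once the capacity is exceeded.
public class LruSketch<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public LruSketch(int capacity) {
        super(16, 0.75f, true); // accessOrder=true -> iteration order is LRU
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // called by LinkedHashMap after each put
    }

    public static void main(String[] args) {
        LruSketch<String, Integer> cache = new LruSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used key
        cache.put("c", 3); // evicts "b", the least recently used key
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Compared with random eviction, hot grouping keys stay resident, which is the GC and cache-behavior improvement the issue is after.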
[jira] [Commented] (HIVE-23846) Avoid unnecessary serialization and deserialization of bitvectors
[ https://issues.apache.org/jira/browse/HIVE-23846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161655#comment-17161655 ]

Naveen Gangam commented on HIVE-23846:
--------------------------------------

I have +1'ed the fix on the PR.

> Avoid unnecessary serialization and deserialization of bitvectors
> ------------------------------------------------------------------
>
>                 Key: HIVE-23846
>                 URL: https://issues.apache.org/jira/browse/HIVE-23846
>             Project: Hive
>          Issue Type: Bug
>          Components: Standalone Metastore
>            Reporter: Yu-Wen Lai
>            Assignee: Yu-Wen Lai
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In the method *getNdvEstimator* of *ColumnStatsDataInspector*, isSetBitVectors() is called, which serializes the bitvectors again even when we already have the deserialized _ndvEstimator_. For example, we can see this pattern in [LongColumnStatsDataInspector|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/cache/LongColumnStatsDataInspector.java#L106]. The method could check whether _ndvEstimator_ is set first, so that it won't need to serialize and deserialize again.
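The pattern the issue proposes, consulting the already-deserialized object before touching the serialized bytes, can be sketched generically. All names below are illustrative stand-ins, not the actual metastore classes:

```java
import java.nio.charset.StandardCharsets;

// Sketch of the fix: getNdvEstimator() checks the cached, deserialized
// estimator first, instead of probing the serialized form and converting
// back and forth on every call.
public class NdvCacheSketch {

    private final byte[] bitVectors; // serialized representation
    private String ndvEstimator;     // deserialized representation, cached
    int deserializeCalls = 0;        // counts the expensive conversions

    NdvCacheSketch(byte[] bitVectors) {
        this.bitVectors = bitVectors;
    }

    String getNdvEstimator() {
        if (ndvEstimator != null) {  // cached object wins; no re-serialization
            return ndvEstimator;
        }
        if (bitVectors != null) {
            ndvEstimator = deserialize(bitVectors);
        }
        return ndvEstimator;
    }

    private String deserialize(byte[] bytes) {
        deserializeCalls++;          // stand-in for the real estimator decoding
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        NdvCacheSketch s = new NdvCacheSketch("fm-sketch".getBytes(StandardCharsets.UTF_8));
        s.getNdvEstimator();
        s.getNdvEstimator();         // second call hits the cache
        System.out.println(s.deserializeCalls); // 1
    }
}
```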
[jira] [Work logged] (HIVE-23432) Add Ranger Replication Metrics
[ https://issues.apache.org/jira/browse/HIVE-23432?focusedWorklogId=461354=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461354 ] ASF GitHub Bot logged work on HIVE-23432: - Author: ASF GitHub Bot Created on: 21/Jul/20 00:33 Start Date: 21/Jul/20 00:33 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1015: URL: https://github.com/apache/hive/pull/1015 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461354) Time Spent: 20m (was: 10m) > Add Ranger Replication Metrics > --- > > Key: HIVE-23432 > URL: https://issues.apache.org/jira/browse/HIVE-23432 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23432.01.patch, HIVE-23432.02.patch, > HIVE-23432.03.patch, HIVE-23432.04.patch, HIVE-23432.05.patch, > HIVE-23432.06.patch, HIVE-23432.07.patch > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
[ https://issues.apache.org/jira/browse/HIVE-23438?focusedWorklogId=461355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461355 ] ASF GitHub Bot logged work on HIVE-23438: - Author: ASF GitHub Bot Created on: 21/Jul/20 00:33 Start Date: 21/Jul/20 00:33 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1014: URL: https://github.com/apache/hive/pull/1014 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461355) Time Spent: 20m (was: 10m) > Missing Rows When Left Outer Join In N-way HybridGraceHashJoin > -- > > Key: HIVE-23438 > URL: https://issues.apache.org/jira/browse/HIVE-23438 > Project: Hive > Issue Type: Bug > Components: SQL, Tez >Affects Versions: 2.3.4 >Reporter: 范宜臻 >Assignee: 范宜臻 >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23438.001.branch-2.3.patch, > HIVE-23438.branch-2.3.patch > > Time Spent: 20m > Remaining Estimate: 0h > > *Run Test in Patch File* > {code:java} > mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code} > *Manual Reproduce* > *STEP 1. 
Create test data(q_test_init_tez.sql)* > {code:java} > //create table src1 > CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT > 'default') STORED AS TEXTFILE; > LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1; > //create table src2 > CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT > 'default') STORED AS TEXTFILE; > LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO > TABLE src2; > //create table srcpart > CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT > 'default') > PARTITIONED BY (ds STRING, hr STRING) > STORED AS TEXTFILE; > LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" > OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11"); > LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" > OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12"); > LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" > OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11"); > LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" > OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code} > *STEP 2. 
Run query* > {code:java} > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask=true; > set hive.auto.convert.join.noconditionaltask.size=1000; > set hive.cbo.enable=false; > set hive.mapjoin.hybridgrace.hashtable=true; > select * > from > ( > select key from src1 group by key > ) x > left join src2 z on x.key = z.key > join > ( > select key from srcpart y group by key > ) y on y.key = x.key; > {code} > *EXPECTED RESULT*** > > {code:java} > 128 NULLNULL128 > 146 146 1val_1461 146 > 150 150 1val_1501 150 > 238 NULLNULL238 > 369 NULLNULL369 > 406 406 1val_4061 406 > 273 273 1val_2731 273 > 98NULLNULL98 > 213 213 1val_2131 213 > 255 NULLNULL255 > 401 401 1val_4011 401 > 278 NULLNULL278 > 6666 11val_6611 66 > 224 NULLNULL224 > 311 NULLNULL311 > {code} > > *ACTUAL RESULT* > {code:java} > 128 NULLNULL128 > 146 146 1val_1461 146 > 150 150 1val_1501 150 > 213 213 1val_2131 213 > 238 NULLNULL238 > 273 273 1val_2731 273 > 369 NULLNULL369 > 406 406 1val_4061 406 > 98NULLNULL98 > 401 401 1val_4011 401 > 6666 11val_6611 66 > {code} > > *ROOT CAUSE* > src1 left join src2, src1 is big table and src2 is small table. Join result > between big table row and the corresponding hashtable maybe NO_MATCH state, > however, these NO_MATCH rows is needed because LEFT OUTER JOIN. > In addition, these big table rows will not spilled into matchfile related to > this hashtable on disk because only SPILL state can use `spillBigTableRow`. > Then, these big table rows will be spilled into matchfile in hashtables of > table `srcpart`(second small table) > Finally, when reProcessBigTable, big table rows in matchfile are only read > from `firstSmallTable`, some datum are missing. > > *WORKAROUND* > configure firstSmallTable in
[jira] [Work logged] (HIVE-23786) HMS Server side filter
[ https://issues.apache.org/jira/browse/HIVE-23786?focusedWorklogId=461314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461314 ] ASF GitHub Bot logged work on HIVE-23786: - Author: ASF GitHub Bot Created on: 20/Jul/20 22:54 Start Date: 20/Jul/20 22:54 Worklog Time Spent: 10m Work Description: sam-an-cloudera commented on a change in pull request #1221: URL: https://github.com/apache/hive/pull/1221#discussion_r457736948 ## File path: ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java ## @@ -312,5 +578,13 @@ private String getCurrentUser() { private String getCurrentUser(HiveMetaStoreAuthorizableEvent authorizableEvent) { return authorizableEvent.getAuthzContext().getUGI().getShortUserName(); } + + private UserGroupInformation getUGI() { +try { + return UserGroupInformation.getCurrentUser(); +} catch (IOException excp) { Review comment: Good catch. I didn't know we were just copy and pasting. Other codes using getCurrentUser were either crashing or rethrow, so I've changed it to throw, and more importantly, return false to the skipAuthorization call. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461314) Time Spent: 3.5h (was: 3h 20m) > HMS Server side filter > -- > > Key: HIVE-23786 > URL: https://issues.apache.org/jira/browse/HIVE-23786 > Project: Hive > Issue Type: Improvement >Reporter: Sam An >Assignee: Sam An >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > HMS server side filter of results based on authorization. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161559#comment-17161559 ] Stamatis Zampetakis commented on HIVE-23880: Hi [~abstractdog], I was looking at this part of the code in the past but I had the impression that bitwise OR for the sizes that you cite is in the order of a few seconds (2-3sec) not in the order of minutes. Out of curiosity how did you verify that 1-2 minutes are spend in the computation of the OR operation? Apart from that I'm curious about the benefit brought by the parallelization of this computation. If I recall well when I tried something similar for another scenario the improvement was rather subtle; context-switching, cache misses along with the extra code needed for the parallel version counterbalanced the benefit. I think I have somewhere a micro-bench that I can adapt rather easily for this case. > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 40m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. 
> For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 3 .. llap SUCCEEDED 1 100 > 0 0 > Map 1 .. llap SUCCEEDED 1263 126300 > 0 0 > Reducer 2 llap RUNNING 1 010 > 0 0 > Map 4 llap RUNNING 6154 0 207 5947 > 0 0 > Reducer 5 llapINITED 43 00 43 > 0 0 > Reducer 6 llapINITED 1 001 > 0 0 > -- > VERTICES: 02/06 [>>--] 16% ELAPSED TIME: 149.98 s > -- > {code} > For example, 70M entries in bloom filter leads to a 436 465 696 bits, so > merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR > operation, which is very hot codepath, but can be parallelized. -- This message was sent by Atlassian Jira (v8.3.4#803005)
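The word-wise bitwise OR described above is what makes the merge parallelizable: each worker can OR a disjoint slice of the target with no synchronization. A simplified sketch assuming each bloom filter is a plain long[] of equal length (Hive's actual BloomKFilter layout is more involved):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Simplified sketch: every bloom filter is a long[] of identical length, and
// merging is a word-wise OR. Parallelizing over word indices gives each core
// a disjoint slice of the target array, so no locking is required.
public class BloomMergeSketch {
    static void mergeInto(long[] target, List<long[]> sources) {
        IntStream.range(0, target.length).parallel().forEach(w -> {
            long acc = target[w];
            for (long[] src : sources) {
                acc |= src[w];   // accumulate ORs for word w across all filters
            }
            target[w] = acc;
        });
    }

    public static void main(String[] args) {
        long[] target = new long[4];
        long[] a = {1L, 0L, 4L, 0L};
        long[] b = {2L, 8L, 0L, 16L};
        mergeInto(target, List.of(a, b));
        System.out.println(Arrays.toString(target)); // [3, 8, 4, 16]
    }
}
```

As the comment thread notes, whether this pays off at real sizes depends on memory bandwidth and scheduling overhead, so it is worth microbenchmarking before committing to it.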
[jira] [Work logged] (HIVE-16490) Hive should not use private HDFS APIs for encryption
[ https://issues.apache.org/jira/browse/HIVE-16490?focusedWorklogId=461282=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461282 ] ASF GitHub Bot logged work on HIVE-16490: - Author: ASF GitHub Bot Created on: 20/Jul/20 21:21 Start Date: 20/Jul/20 21:21 Worklog Time Spent: 10m Work Description: umamaheswararao commented on pull request #1279: URL: https://github.com/apache/hive/pull/1279#issuecomment-661339873 Thanks @ashutoshc for the review!. I have updated it. Please take a look. Thank you This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461282) Time Spent: 40m (was: 0.5h) > Hive should not use private HDFS APIs for encryption > > > Key: HIVE-16490 > URL: https://issues.apache.org/jira/browse/HIVE-16490 > Project: Hive > Issue Type: Improvement > Components: Encryption >Affects Versions: 2.2.0 >Reporter: Andrew Wang >Assignee: Naveen Gangam >Priority: Critical > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > When compiling against bleeding edge versions of Hive and Hadoop, we > discovered that HIVE-16047 references a private HDFS API, DFSClient, to get > at various encryption related information. The private API was recently > changed by HADOOP-14104, which broke Hive compilation. > It'd be better to instead use publicly supported APIs. HDFS-11687 has been > filed to add whatever encryption APIs are needed by Hive. This JIRA is to > move Hive over to these new APIs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23870) Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable
[ https://issues.apache.org/jira/browse/HIVE-23870?focusedWorklogId=461259=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461259 ] ASF GitHub Bot logged work on HIVE-23870: - Author: ASF GitHub Bot Created on: 20/Jul/20 20:26 Start Date: 20/Jul/20 20:26 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1282: URL: https://github.com/apache/hive/pull/1282#issuecomment-661314887 https://github.com/apache/hadoop/pull/2157 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461259) Time Spent: 40m (was: 0.5h) > Optimise multiple text conversions in > WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable > --- > > Key: HIVE-23870 > URL: https://issues.apache.org/jira/browse/HIVE-23870 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Attachments: image-2020-07-17-11-31-38-241.png > > Time Spent: 40m > Remaining Estimate: 0h > > Observed this when creating materialized view. > [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java#L85] > Same content is converted to Text multiple times. > !image-2020-07-17-11-31-38-241.png|width=1048,height=936! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-16490) Hive should not use private HDFS APIs for encryption
[ https://issues.apache.org/jira/browse/HIVE-16490?focusedWorklogId=461246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461246 ] ASF GitHub Bot logged work on HIVE-16490: - Author: ASF GitHub Bot Created on: 20/Jul/20 20:06 Start Date: 20/Jul/20 20:06 Worklog Time Spent: 10m Work Description: umamaheswararao commented on a change in pull request #1279: URL: https://github.com/apache/hive/pull/1279#discussion_r457661391 ## File path: shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java ## @@ -1223,11 +1223,31 @@ public static boolean isHdfsEncryptionSupported() { private final Configuration conf; public HdfsEncryptionShim(URI uri, Configuration conf) throws IOException { - DistributedFileSystem dfs = (DistributedFileSystem)FileSystem.get(uri, conf); - this.conf = conf; - this.keyProvider = dfs.getClient().getKeyProvider(); this.hdfsAdmin = new HdfsAdmin(uri, conf); + this.keyProvider = getKeyProvider(); +} + +private KeyProvider getKeyProvider() throws IOException { + if (isMethodExist(HdfsAdmin.class, "getKeyProvider")) { Review comment: That is great. I will remove that then. Thanks for pointing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461246) Time Spent: 0.5h (was: 20m) > Hive should not use private HDFS APIs for encryption > > > Key: HIVE-16490 > URL: https://issues.apache.org/jira/browse/HIVE-16490 > Project: Hive > Issue Type: Improvement > Components: Encryption >Affects Versions: 2.2.0 >Reporter: Andrew Wang >Assignee: Naveen Gangam >Priority: Critical > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When compiling against bleeding edge versions of Hive and Hadoop, we > discovered that HIVE-16047 references a private HDFS API, DFSClient, to get > at various encryption related information. The private API was recently > changed by HADOOP-14104, which broke Hive compilation. > It'd be better to instead use publicly supported APIs. HDFS-11687 has been > filed to add whatever encryption APIs are needed by Hive. This JIRA is to > move Hive over to these new APIs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
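The isMethodExist guard discussed in the diff above (and removed, since Hive's minimum Hadoop version is 3.0) is presumably a reflective probe for an optional API. A generic sketch of that pattern (hypothetical helper, not the actual Hadoop23Shims code):

```java
import java.lang.reflect.Method;

// Reflective probe for an optional API: returns true if the class exposes a
// public method with the given name (inherited methods included). Guards like
// this are only needed when the minimum supported dependency version is in
// doubt, which is why the reviewer asked for its removal.
public class MethodProbe {
    static boolean isMethodExist(Class<?> clazz, String methodName) {
        for (Method m : clazz.getMethods()) {
            if (m.getName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isMethodExist(String.class, "isEmpty"));            // true
        System.out.println(isMethodExist(String.class, "definitelyNotThere")); // false
    }
}
```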
[jira] [Work logged] (HIVE-16490) Hive should not use private HDFS APIs for encryption
[ https://issues.apache.org/jira/browse/HIVE-16490?focusedWorklogId=461243=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461243 ] ASF GitHub Bot logged work on HIVE-16490: - Author: ASF GitHub Bot Created on: 20/Jul/20 20:03 Start Date: 20/Jul/20 20:03 Worklog Time Spent: 10m Work Description: ashutoshc commented on a change in pull request #1279: URL: https://github.com/apache/hive/pull/1279#discussion_r457659517 ## File path: shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java ## @@ -1223,11 +1223,31 @@ public static boolean isHdfsEncryptionSupported() { private final Configuration conf; public HdfsEncryptionShim(URI uri, Configuration conf) throws IOException { - DistributedFileSystem dfs = (DistributedFileSystem)FileSystem.get(uri, conf); - this.conf = conf; - this.keyProvider = dfs.getClient().getKeyProvider(); this.hdfsAdmin = new HdfsAdmin(uri, conf); + this.keyProvider = getKeyProvider(); +} + +private KeyProvider getKeyProvider() throws IOException { + if (isMethodExist(HdfsAdmin.class, "getKeyProvider")) { Review comment: this is not needed. Hive's minimum version required for Hadoop is 3.0 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461243) Time Spent: 20m (was: 10m) > Hive should not use private HDFS APIs for encryption > > > Key: HIVE-16490 > URL: https://issues.apache.org/jira/browse/HIVE-16490 > Project: Hive > Issue Type: Improvement > Components: Encryption >Affects Versions: 2.2.0 >Reporter: Andrew Wang >Assignee: Naveen Gangam >Priority: Critical > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > When compiling against bleeding edge versions of Hive and Hadoop, we > discovered that HIVE-16047 references a private HDFS API, DFSClient, to get > at various encryption related information. The private API was recently > changed by HADOOP-14104, which broke Hive compilation. > It'd be better to instead use publicly supported APIs. HDFS-11687 has been > filed to add whatever encryption APIs are needed by Hive. This JIRA is to > move Hive over to these new APIs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization
[ https://issues.apache.org/jira/browse/HIVE-23882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23882: -- Labels: pull-request-available (was: ) > Compiler should skip MJ keyExpr for probe optimization > -- > > Key: HIVE-23882 > URL: https://issues.apache.org/jira/browse/HIVE-23882 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In probe we cannot currently support Key expressions (on the big table Side) > as ORC CVs Probe directly the smalltable HT (there is no expr evaluation at > that level). > TezCompiler should take this into account when picking MJs to push probe > details -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization
[ https://issues.apache.org/jira/browse/HIVE-23882?focusedWorklogId=461233=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461233 ] ASF GitHub Bot logged work on HIVE-23882: - Author: ASF GitHub Bot Created on: 20/Jul/20 19:30 Start Date: 20/Jul/20 19:30 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1286: URL: https://github.com/apache/hive/pull/1286 Change-Id: I1033a65f26592ef3683b7aa0a669e0c378667278 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461233) Remaining Estimate: 0h Time Spent: 10m > Compiler should skip MJ keyExpr for probe optimization > -- > > Key: HIVE-23882 > URL: https://issues.apache.org/jira/browse/HIVE-23882 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In probe we cannot currently support Key expressions (on the big table Side) > as ORC CVs Probe directly the smalltable HT (there is no expr evaluation at > that level). > TezCompiler should take this into account when picking MJs to push probe > details -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23885) Remove Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23885: -- Labels: pull-request-available (was: ) > Remove Hive on Spark > > > Key: HIVE-23885 > URL: https://issues.apache.org/jira/browse/HIVE-23885 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23885) Remove Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-23885?focusedWorklogId=461225=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461225 ] ASF GitHub Bot logged work on HIVE-23885: - Author: ASF GitHub Bot Created on: 20/Jul/20 19:02 Start Date: 20/Jul/20 19:02 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1285: URL: https://github.com/apache/hive/pull/1285 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461225) Remaining Estimate: 0h Time Spent: 10m > Remove Hive on Spark > > > Key: HIVE-23885 > URL: https://issues.apache.org/jira/browse/HIVE-23885 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23885) Remove Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-23885: - > Remove Hive on Spark > > > Key: HIVE-23885 > URL: https://issues.apache.org/jira/browse/HIVE-23885 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23870) Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable
[ https://issues.apache.org/jira/browse/HIVE-23870?focusedWorklogId=461217=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461217 ] ASF GitHub Bot logged work on HIVE-23870: - Author: ASF GitHub Bot Created on: 20/Jul/20 18:36 Start Date: 20/Jul/20 18:36 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1282: URL: https://github.com/apache/hive/pull/1282#issuecomment-661261016 This is an interesting observation. However, maybe this should be contributed to Hadoop project directly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461217) Time Spent: 0.5h (was: 20m) > Optimise multiple text conversions in > WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable > --- > > Key: HIVE-23870 > URL: https://issues.apache.org/jira/browse/HIVE-23870 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Attachments: image-2020-07-17-11-31-38-241.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > Observed this when creating materialized view. > [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java#L85] > Same content is converted to Text multiple times. > !image-2020-07-17-11-31-38-241.png|width=1048,height=936! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23870) Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable
[ https://issues.apache.org/jira/browse/HIVE-23870?focusedWorklogId=461211=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461211 ] ASF GitHub Bot logged work on HIVE-23870: - Author: ASF GitHub Bot Created on: 20/Jul/20 18:29 Start Date: 20/Jul/20 18:29 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1282: URL: https://github.com/apache/hive/pull/1282#issuecomment-661258825 So, this caches the charLength ? That is the fix here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461211) Time Spent: 20m (was: 10m) > Optimise multiple text conversions in > WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable > --- > > Key: HIVE-23870 > URL: https://issues.apache.org/jira/browse/HIVE-23870 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Attachments: image-2020-07-17-11-31-38-241.png > > Time Spent: 20m > Remaining Estimate: 0h > > Observed this when creating materialized view. > [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java#L85] > Same content is converted to Text multiple times. > !image-2020-07-17-11-31-38-241.png|width=1048,height=936! -- This message was sent by Atlassian Jira (v8.3.4#803005)
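The optimization under review (caching rather than reconverting, as the "caches the charLength?" exchange suggests) boils down to memoizing an expensive derived form. A generic sketch with hypothetical names (not HiveCharWritable's real fields or its Text-based conversion):

```java
// Hedged sketch (hypothetical names): compute an expensive derived form once
// and reuse it until the requested length changes, instead of reconverting
// the same content on every call.
public class CachedConversion {
    private final String raw;
    private String padded;   // cached derived form
    int conversions = 0;

    CachedConversion(String raw) {
        this.raw = raw;
    }

    String getPadded(int len) {
        if (padded == null || padded.length() != len) {
            conversions++;   // the expensive conversion happens only here
            padded = String.format("%-" + len + "s", raw);
        }
        return padded;
    }

    public static void main(String[] args) {
        CachedConversion c = new CachedConversion("ab");
        c.getPadded(5);
        c.getPadded(5);      // served from the cache
        System.out.println(c.conversions); // 1
    }
}
```

In the real class the cache would have to be invalidated whenever the underlying bytes are mutated; this sketch sidesteps that by making the raw value final.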
[jira] [Resolved] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties
[ https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-23871. - Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks, Panos! > ObjectStore should properly handle MicroManaged Table properties > > > Key: HIVE-23871 > URL: https://issues.apache.org/jira/browse/HIVE-23871 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: table1 > > Time Spent: 1h > Remaining Estimate: 0h > > HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore > by skipping particular Table properties like SkewInfo, bucketCols, ordering > etc. > However, it does that for all Transactional Tables – not only ACID – causing > MicroManaged Tables to behave abnormally. > MicroManaged (insert_only) tables may miss needed properties such as Storage > Desc Params – that may define how lines are delimited (like in the example > below): > To repro the issue: > {code:java} > CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; > LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans; > describe formatted delim_table_trans; > SELECT * FROM delim_table_trans; > {code} > Result: > {code:java} > Table Type: MANAGED_TABLE > Table Parameters: > bucketing_version 2 > numFiles1 > numRows 0 > rawDataSize 0 > totalSize 72 > transactional true > transactional_propertiesinsert_only > A masked pattern was here > > # Storage Information > SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > > InputFormat: org.apache.hadoop.mapred.TextInputFormat > OutputFormat: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > Compressed: No > Num Buckets: -1 > Bucket Columns: [] > Sort Columns: [] > PREHOOK: query: SELECT * FROM 
delim_table_trans > PREHOOK: type: QUERY > PREHOOK: Input: default@delim_table_trans > A masked pattern was here > POSTHOOK: query: SELECT * FROM delim_table_trans > POSTHOOK: type: QUERY > POSTHOOK: Input: default@delim_table_trans > A masked pattern was here > NULL NULLNULL > NULL NULLNULL > NULL NULLNULL > NULL NULLNULL > NULL NULLNULL > NULL NULLNULL > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23707) Unable to create materialized views with transactions enabled with MySQL metastore
[ https://issues.apache.org/jira/browse/HIVE-23707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161435#comment-17161435 ] wenjun ma commented on HIVE-23707: -- I try MS SQL it works as your reproduce steps. > Unable to create materialized views with transactions enabled with MySQL > metastore > -- > > Key: HIVE-23707 > URL: https://issues.apache.org/jira/browse/HIVE-23707 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.1.2 >Reporter: Dustin Koupal >Assignee: wenjun ma >Priority: Blocker > > When attempting to create a materialized view with transactions enabled, we > get the following exception: > > {code:java} > ERROR : FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed to > generate new Mapping of type > org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type > CLOB declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this datastore.ERROR : FAILED: > Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. > MetaException(message:Failed to generate new Mapping of type > org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type > CLOB declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this datastore.JDBC type CLOB > declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this > datastore.org.datanucleus.exceptions.NucleusException: JDBC type CLOB > declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this datastore. 
at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1386) > at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1616) > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.prepareDatastoreMapping(SingleFieldMapping.java:59) > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.initialize(SingleFieldMapping.java:48) > at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getMapping(RDBMSMappingManager.java:482) > at > org.datanucleus.store.rdbms.table.ClassTable.manageMembers(ClassTable.java:536) > at > org.datanucleus.store.rdbms.table.ClassTable.manageClass(ClassTable.java:442) > at > org.datanucleus.store.rdbms.table.ClassTable.initializeForClass(ClassTable.java:1270) > at > org.datanucleus.store.rdbms.table.ClassTable.initialize(ClassTable.java:276) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3279) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2889) > at > org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2088) > at > org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1271) > at > org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3760) > at > org.datanucleus.state.StateManagerImpl.setIdentity(StateManagerImpl.java:2267) > at > org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:484) > at > 
org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:120) > at > org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:218) > at > org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2079) > at > org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1923) > at > org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1778) > at > org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217) > at > org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:724) > at > org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:749) > at > org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:1308) > at
[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive
[ https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=461160=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461160 ] ASF GitHub Bot logged work on HIVE-23716: - Author: ASF GitHub Bot Created on: 20/Jul/20 16:37 Start Date: 20/Jul/20 16:37 Worklog Time Spent: 10m Work Description: maheshk114 commented on a change in pull request #1147: URL: https://github.com/apache/hive/pull/1147#discussion_r457545081 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules; + +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.plan.RelOptUtil; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Filter; +import org.apache.calcite.rel.core.Join; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Project; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexInputRef; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.sql.SqlKind; +import org.apache.calcite.util.ImmutableBitSet; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; + +/** + * Planner rule that converts a join plus filter to anti join. + */ +public class HiveJoinWithFilterToAntiJoinRule extends RelOptRule { + protected static final Logger LOG = LoggerFactory.getLogger(HiveJoinWithFilterToAntiJoinRule.class); + public static final HiveJoinWithFilterToAntiJoinRule INSTANCE = new HiveJoinWithFilterToAntiJoinRule(); + + //HiveProject(fld=[$0]) + // HiveFilter(condition=[IS NULL($1)]) + //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], cost=[not available]) + // + // TO + // + //HiveProject(fld_tbl=[$0]) + // HiveAntiJoin(condition=[=($0, $1)], joinType=[anti]) + // + public HiveJoinWithFilterToAntiJoinRule() { +super(operand(Project.class, operand(Filter.class, operand(Join.class, RelOptRule.any(, +"HiveJoinWithFilterToAntiJoinRule:filter"); + } + + // is null filter over a left join. 
+ public void onMatch(final RelOptRuleCall call) { +final Project project = call.rel(0); +final Filter filter = call.rel(1); +final Join join = call.rel(2); +perform(call, project, filter, join); + } + + protected void perform(RelOptRuleCall call, Project project, Filter filter, Join join) { +LOG.debug("Matched HiveAntiJoinRule"); Review comment: sure ..will do that This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461160) Time Spent: 1h 20m (was: 1h 10m) > Support Anti Join in Hive > -- > > Key: HIVE-23716 > URL: https://issues.apache.org/jira/browse/HIVE-23716 > Project: Hive > Issue Type: Bug >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23716.01.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently hive does not support Anti join. The query for anti join is > converted to left outer join and null filter on right side join key is added > to get the desired result. This is causing > # Extra computation — The left outer join projects the redundant columns > from right side. Along with that, filtering is done to remove the redundant > rows. This is can be avoided in case of anti join as anti join will project > only the required columns and rows from
[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive
[ https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=461139=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461139 ] ASF GitHub Bot logged work on HIVE-23716: - Author: ASF GitHub Bot Created on: 20/Jul/20 15:51 Start Date: 20/Jul/20 15:51 Worklog Time Spent: 10m Work Description: ramesh0201 commented on pull request #1147: URL: https://github.com/apache/hive/pull/1147#issuecomment-661124391 Runtime changes look good to me +1. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461139) Time Spent: 1h 10m (was: 1h) > Support Anti Join in Hive > -- > > Key: HIVE-23716 > URL: https://issues.apache.org/jira/browse/HIVE-23716 > Project: Hive > Issue Type: Bug >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23716.01.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently, Hive does not support anti join. A query requiring an anti join is > converted to a left outer join, and a null filter on the right-side join key is > added to get the desired result. This causes: > # Extra computation — The left outer join projects redundant columns > from the right side, and filtering is then done to remove the redundant > rows. This can be avoided with an anti join, which projects > only the required columns and rows from the left-side table. > # Extra shuffle — With an anti join, moving duplicate records from the child > node to the join node can be avoided. This can reduce a significant amount > of data movement if the number of duplicate join keys is significant.
> # Extra Memory Usage - In case of a map-based anti join, a hash set is > sufficient, as just the key is required to check whether a record matches the > join condition. In case of a left join, we need the key and the non-key columns > as well, so a hash table is required. > For a query like > {code:java} > select wr_order_number FROM web_returns LEFT JOIN web_sales ON > wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code} > The number of distinct ws_order_number values in the web_sales table in a typical 10TB > TPCDS setup is just 10% of the total records. So when we convert this query to an > anti join, instead of 7 billion rows, only 600 million rows are moved to the join > node. > In the current patch, just one conversion is done. The pattern > project->filter->left-join is converted to project->anti-join. This takes > care of subqueries with a “not exists” clause. Queries with “not exists” > are converted first to filter + left-join and then to an anti > join. Queries with “not in” are not handled in the current patch. > On the execution side, both merge join and map join with vectorized execution > are supported for anti join. -- This message was sent by Atlassian Jira (v8.3.4#803005)
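The memory argument in the description above can be illustrated with a minimal sketch. This is hypothetical code, not Hive's actual operator implementation: it only shows why a key-only hash set suffices on the build side of an anti join, whereas a left outer join would need a key-to-row hash table.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal anti-join sketch: emit each left-side key that has no match on the
// right side. The build side is a key-only HashSet, not a key->row hash table,
// and duplicate right-side keys collapse for free.
public class AntiJoinSketch {
    public static List<Long> antiJoin(List<Long> leftKeys, List<Long> rightKeys) {
        Set<Long> build = new HashSet<>(rightKeys); // duplicates collapse here
        List<Long> result = new ArrayList<>();
        for (Long k : leftKeys) {
            if (!build.contains(k)) {
                result.add(k); // no right-side match -> survives the anti join
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same semantics as: ... LEFT JOIN ... WHERE ws_order_number IS NULL
        System.out.println(antiJoin(List.of(1L, 2L, 3L, 4L), List.of(2L, 4L, 4L)));
    }
}
```

Note how the right side contributes only its key set: this is the "extra shuffle" point as well, since shipping the duplicate 4L adds nothing.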
[jira] [Resolved] (HIVE-23673) Maven Standard Directories for accumulo-handler
[ https://issues.apache.org/jira/browse/HIVE-23673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor resolved HIVE-23673. --- Resolution: Won't Fix Actually, quite a few projects are like this. This needs to be part of a bigger discussion. > Maven Standard Directories for accumulo-handler > --- > > Key: HIVE-23673 > URL: https://issues.apache.org/jira/browse/HIVE-23673 > Project: Hive > Issue Type: Sub-task >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23673) Maven Standard Directories for accumulo-handler
[ https://issues.apache.org/jira/browse/HIVE-23673?focusedWorklogId=461133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461133 ] ASF GitHub Bot logged work on HIVE-23673: - Author: ASF GitHub Bot Created on: 20/Jul/20 15:42 Start Date: 20/Jul/20 15:42 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1088: URL: https://github.com/apache/hive/pull/1088 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461133) Time Spent: 1h 50m (was: 1h 40m) > Maven Standard Directories for accumulo-handler > --- > > Key: HIVE-23673 > URL: https://issues.apache.org/jira/browse/HIVE-23673 > Project: Hive > Issue Type: Sub-task >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=461130=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461130 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 20/Jul/20 15:31 Start Date: 20/Jul/20 15:31 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r457496282 ## File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java ## @@ -527,20 +527,19 @@ public void add(ElementWrapper wrapper) { @Override public void run() { while (!executor.isTerminated() && !queue.isEmpty()) { Review comment: A bit unrelated, but since you're touching this code. This check is completely useless: ``` while (!executor.isTerminated() && !queue.isEmpty()) { ... } ``` I cannot think of many scenarios where the thread needs to check the state of its own `ExecutorService`. If the `ExecutorService` is terminated, it will Interrupt every thread in the pool and that should cause it to cease to run. Also, checking if the `Queue` is empty is improper. You will have two threads that check the state of the Queue (size = 1), see the same non-empty queue, and both try to read, even if there is only one item left. Both should just try to `take` and one will succeed and the other will fail. ## File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java ## @@ -527,20 +527,19 @@ public void add(ElementWrapper wrapper) { @Override public void run() { while (!executor.isTerminated() && !queue.isEmpty()) { -ElementWrapper currentBf = queue.poll(); +ElementWrapper currentBf = null; +try { + currentBf = queue.take(); +} catch (InterruptedException e) { Review comment: Do not ignore. An Interrupt means that it's time to exit. 
## File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java ## @@ -506,18 +505,19 @@ public ElementWrapper(byte[] bytes, int start, int length, int modifiedStart, in } private static class BloomFilterMergeWorker implements Runnable { -Queue queue = new LinkedBlockingDeque<>(); +ArrayBlockingQueue queue; Review comment: Use the generic `BlockingQueue` here. ## File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java ## @@ -506,18 +505,19 @@ public ElementWrapper(byte[] bytes, int start, int length, int modifiedStart, in } private static class BloomFilterMergeWorker implements Runnable { -Queue queue = new LinkedBlockingDeque<>(); +ArrayBlockingQueue queue; private ExecutorService executor; private byte[] bfAggregation; private int bfAggregationStart; private int bfAggregationLength; -public BloomFilterMergeWorker(ExecutorService executor, byte[] bfAggregation, int bfAggregationStart, int bfAggregationLength) { +public BloomFilterMergeWorker(ExecutorService executor, int batchSize, byte[] bfAggregation, int bfAggregationStart, int bfAggregationLength) { this.executor = executor; Review comment: Do not capture this value. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461130) Time Spent: 40m (was: 0.5h) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 40m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. > For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then >
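The review advice above — block on `take()` rather than polling `queue.poll()` behind a racy `isEmpty()` check, and treat an interrupt as the signal to exit — can be sketched as follows. This is a hypothetical worker for illustration, not the actual `BloomKFilter` code; the `demo()` driver and `processed` counter exist only so the pattern is observable.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the consumption pattern suggested in the review: each worker
// blocks on take(), and an interrupt (e.g. from ExecutorService shutdown)
// ends the loop instead of being swallowed.
public class TakeLoopSketch implements Runnable {
    private final BlockingQueue<byte[]> queue; // generic interface, per the review
    private final AtomicInteger processed = new AtomicInteger();

    public TakeLoopSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    @Override
    public void run() {
        try {
            while (true) {
                byte[] work = queue.take(); // blocks; no racy isEmpty() check
                processed.incrementAndGet(); // merge logic would go here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore status and exit
        }
    }

    // Driver used only to demonstrate the lifecycle: feed two items,
    // then interrupt the worker to shut it down.
    public static int demo() {
        try {
            TakeLoopSketch w = new TakeLoopSketch(4);
            Thread t = new Thread(w);
            t.start();
            w.queue.put(new byte[] {1});
            w.queue.put(new byte[] {2});
            Thread.sleep(300);       // give the worker time to drain the queue
            t.interrupt();           // shutdown signal: take() throws, loop exits
            t.join(1000);
            return w.processed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With `take()` there is no window where two threads observe the same single remaining element: each blocked consumer either receives an item or is interrupted.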
[jira] [Commented] (HIVE-23707) Unable to create materialized views with transactions enabled with MySQL metastore
[ https://issues.apache.org/jira/browse/HIVE-23707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161321#comment-17161321 ] Dustin Koupal commented on HIVE-23707: -- Thanks for checking. To confirm, did you try with MySQL or MS SQL? We ran into this issue with the MySQL server included in EMR. > Unable to create materialized views with transactions enabled with MySQL > metastore > -- > > Key: HIVE-23707 > URL: https://issues.apache.org/jira/browse/HIVE-23707 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.1.2 >Reporter: Dustin Koupal >Assignee: wenjun ma >Priority: Blocker > > When attempting to create a materialized view with transactions enabled, we > get the following exception: > > {code:java} > ERROR : FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed to > generate new Mapping of type > org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type > CLOB declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this datastore.ERROR : FAILED: > Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. > MetaException(message:Failed to generate new Mapping of type > org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type > CLOB declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this datastore.JDBC type CLOB > declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this > datastore.org.datanucleus.exceptions.NucleusException: JDBC type CLOB > declared for field > "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java > type java.lang.String cant be mapped for this datastore. 
at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1386) > at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1616) > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.prepareDatastoreMapping(SingleFieldMapping.java:59) > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.initialize(SingleFieldMapping.java:48) > at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getMapping(RDBMSMappingManager.java:482) > at > org.datanucleus.store.rdbms.table.ClassTable.manageMembers(ClassTable.java:536) > at > org.datanucleus.store.rdbms.table.ClassTable.manageClass(ClassTable.java:442) > at > org.datanucleus.store.rdbms.table.ClassTable.initializeForClass(ClassTable.java:1270) > at > org.datanucleus.store.rdbms.table.ClassTable.initialize(ClassTable.java:276) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3279) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2889) > at > org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2088) > at > org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1271) > at > org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3760) > at > org.datanucleus.state.StateManagerImpl.setIdentity(StateManagerImpl.java:2267) > at > org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:484) > at > 
org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:120) > at > org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:218) > at > org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2079) > at > org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1923) > at > org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1778) > at > org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217) > at > org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:724) > at > org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:749) > at >
[jira] [Work logged] (HIVE-23865) Use More Java Collections Class
[ https://issues.apache.org/jira/browse/HIVE-23865?focusedWorklogId=461112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461112 ] ASF GitHub Bot logged work on HIVE-23865: - Author: ASF GitHub Bot Created on: 20/Jul/20 14:42 Start Date: 20/Jul/20 14:42 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1267: URL: https://github.com/apache/hive/pull/1267 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461112) Time Spent: 50m (was: 40m) > Use More Java Collections Class > --- > > Key: HIVE-23865 > URL: https://issues.apache.org/jira/browse/HIVE-23865 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23865) Use More Java Collections Class
[ https://issues.apache.org/jira/browse/HIVE-23865?focusedWorklogId=461109=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461109 ] ASF GitHub Bot logged work on HIVE-23865: - Author: ASF GitHub Bot Created on: 20/Jul/20 14:38 Start Date: 20/Jul/20 14:38 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1267: URL: https://github.com/apache/hive/pull/1267 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461109) Time Spent: 40m (was: 0.5h) > Use More Java Collections Class > --- > > Key: HIVE-23865 > URL: https://issues.apache.org/jira/browse/HIVE-23865 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23875) Add VSCode files to gitignore
[ https://issues.apache.org/jira/browse/HIVE-23875?focusedWorklogId=461086=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461086 ] ASF GitHub Bot logged work on HIVE-23875: - Author: ASF GitHub Bot Created on: 20/Jul/20 14:10 Start Date: 20/Jul/20 14:10 Worklog Time Spent: 10m Work Description: HunterL opened a new pull request #1276: URL: https://github.com/apache/hive/pull/1276 Added VSCode files to gitignore This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461086) Time Spent: 0.5h (was: 20m) > Add VSCode files to gitignore > - > > Key: HIVE-23875 > URL: https://issues.apache.org/jira/browse/HIVE-23875 > Project: Hive > Issue Type: Improvement >Reporter: Hunter Logan >Assignee: Hunter Logan >Priority: Trivial > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > gitignore currently includes Eclipse and Intellij specific files, should > include VSCode as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23875) Add VSCode files to gitignore
[ https://issues.apache.org/jira/browse/HIVE-23875?focusedWorklogId=461080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461080 ] ASF GitHub Bot logged work on HIVE-23875: - Author: ASF GitHub Bot Created on: 20/Jul/20 13:58 Start Date: 20/Jul/20 13:58 Worklog Time Spent: 10m Work Description: HunterL closed pull request #1276: URL: https://github.com/apache/hive/pull/1276 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461080) Time Spent: 20m (was: 10m) > Add VSCode files to gitignore > - > > Key: HIVE-23875 > URL: https://issues.apache.org/jira/browse/HIVE-23875 > Project: Hive > Issue Type: Improvement >Reporter: Hunter Logan >Assignee: Hunter Logan >Priority: Trivial > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > gitignore currently includes Eclipse and Intellij specific files, should > include VSCode as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=461077=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461077 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 20/Jul/20 13:51 Start Date: 20/Jul/20 13:51 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r457402888 ## File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java ## @@ -362,16 +379,178 @@ public static void mergeBloomFilterBytes( // Just bitwise-OR the bits together - size/# functions should be the same, // rest of the data is serialized long values for the bitset which are supposed to be bitwise-ORed. -for (int idx = START_OF_SERIALIZED_LONGS; idx < bf1Length; ++idx) { +for (int idx = mergeStart; idx < mergeEnd; ++idx) { bf1Bytes[bf1Start + idx] |= bf2Bytes[bf2Start + idx]; } } + public static void mergeBloomFilterBytesFromInputColumn( + byte[] bf1Bytes, int bf1Start, int bf1Length, long bf1ExpectedEntries, + BytesColumnVector inputColumn, int batchSize, boolean selectedInUse, int[] selected, int numThreads) { +if (numThreads == 0) { + numThreads = Runtime.getRuntime().availableProcessors(); +} +if (numThreads < 0) { + throw new RuntimeException("invalid number of threads: " + numThreads); +} + +ExecutorService executor = Executors.newFixedThreadPool(numThreads); + +BloomFilterMergeWorker[] workers = new BloomFilterMergeWorker[numThreads]; +for (int f = 0; f < numThreads; f++) { + workers[f] = new BloomFilterMergeWorker(executor, bf1Bytes, bf1Start, bf1Length); +} + +// split every bloom filter (represented by a part of a byte[]) across workers +for (int j = 0; j < batchSize; j++) { + if (!selectedInUse && inputColumn.noNulls) { +splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], +inputColumn.length[j]); + } else if (!selectedInUse) { +if (!inputColumn.isNull[j]) { + 
splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], + inputColumn.length[j]); +} + } else if (inputColumn.noNulls) { +int i = selected[j]; +splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], +inputColumn.length[i]); + } else { +int i = selected[j]; +if (!inputColumn.isNull[i]) { + splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], + inputColumn.length[i]); +} + } +} + +for (int f = 0; f < numThreads; f++) { + executor.submit(workers[f]); +} + +executor.shutdown(); +try { + executor.awaitTermination(3600, TimeUnit.SECONDS); +} catch (InterruptedException e) { + throw new RuntimeException(e); +} + } + + private static void splitVectorAcrossWorkers(BloomFilterMergeWorker[] workers, byte[] bytes, + int start, int length) { +if (bytes == null || length == 0) { + return; +} +/* + * This will split a byte[] across workers as below: + * let's say there are 10 workers for 7813 bytes, in this case + * length: 7813, elementPerBatch: 781 + * bytes assigned to workers: inclusive lower bound, exclusive upper bound + * 1. worker: 5 -> 786 + * 2. worker: 786 -> 1567 + * 3. worker: 1567 -> 2348 + * 4. worker: 2348 -> 3129 + * 5. worker: 3129 -> 3910 + * 6. worker: 3910 -> 4691 + * 7. worker: 4691 -> 5472 + * 8. worker: 5472 -> 6253 + * 9. worker: 6253 -> 7034 + * 10. worker: 7034 -> 7813 (last element per batch is: 779) + * + * This way, a particular worker will be given with the same part + * of all bloom filters along with the shared base bloom filter, + * so the bitwise OR function will not be a subject of threading/sync issues. + */ +int elementPerBatch = +(int) Math.ceil((double) (length - START_OF_SERIALIZED_LONGS) / workers.length); + +for (int w = 0; w < workers.length; w++) { + int modifiedStart = START_OF_SERIALIZED_LONGS + w * elementPerBatch; + int modifiedLength = (w == workers.length - 1) +? 
length - (START_OF_SERIALIZED_LONGS + w * elementPerBatch) : elementPerBatch; + + ElementWrapper wrapper = + new ElementWrapper(bytes, start, length, modifiedStart, modifiedLength); + workers[w].add(wrapper); +} + } + + public static byte[] getInitialBytes(long expectedEntries) { +ByteArrayOutputStream bytesOut = null; +try { + bytesOut = new ByteArrayOutputStream(); + BloomKFilter bf = new BloomKFilter(expectedEntries); + BloomKFilter.serialize(bytesOut, bf); + return
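The splitting arithmetic from the diff above can be checked in isolation. Below is a sketch (not the patch's code) that assumes the 5-byte header implied by the worked example in the review comment — worker 1 starting at byte 5 — and reproduces the 7813-bytes-across-10-workers ranges, together with the byte-wise OR merge each worker applies to its own slice:

```java
// Sketch of the per-worker byte-range split from the PR discussion.
// Header bytes [0, START_OF_SERIALIZED_LONGS) hold size/#functions and are
// not merged; the remaining serialized bitset bytes are divided across workers.
public class BloomSplitSketch {
    static final int START_OF_SERIALIZED_LONGS = 5; // implied by the review thread's example

    // Inclusive start / exclusive end of the slice assigned to worker w of `workers`.
    public static int[] range(int length, int workers, int w) {
        int perBatch = (int) Math.ceil((double) (length - START_OF_SERIALIZED_LONGS) / workers);
        int start = START_OF_SERIALIZED_LONGS + w * perBatch;
        int end = (w == workers - 1) ? length : start + perBatch; // last worker may get fewer bytes
        return new int[] {start, end};
    }

    // Each worker ORs only its own slice into the shared target, so no two
    // threads ever touch the same byte and no synchronization is needed.
    public static void mergeSlice(byte[] target, byte[] other, int start, int end) {
        for (int i = start; i < end; i++) {
            target[i] |= other[i];
        }
    }

    // Tiny demonstration of a single-byte slice merge: 1 | 4 == 5.
    public static byte mergeDemo() {
        byte[] t = {0, 0, 0, 0, 0, 1};
        byte[] o = {0, 0, 0, 0, 0, 4};
        mergeSlice(t, o, 5, 6);
        return t[5];
    }

    public static void main(String[] args) {
        int[] first = range(7813, 10, 0); // per the comment: 5 -> 786
        int[] last = range(7813, 10, 9);  // per the comment: 7034 -> 7813
        System.out.println(first[0] + "->" + first[1] + ", " + last[0] + "->" + last[1]);
    }
}
```

Running `main` reproduces the first and last worker ranges listed in the review comment, which is a quick sanity check on the `elementPerBatch` rounding.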
[jira] [Commented] (HIVE-23883) Streaming does not flush the side file
[ https://issues.apache.org/jira/browse/HIVE-23883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161246#comment-17161246 ] Peter Vary commented on HIVE-23883: --- CC: [~kuczoram], [~klcopp] > Streaming does not flush the side file > -- > > Key: HIVE-23883 > URL: https://issues.apache.org/jira/browse/HIVE-23883 > Project: Hive > Issue Type: Bug > Components: Streaming, Transactions >Reporter: Peter Vary >Priority: Major > > When a streaming write commits a mid-batch write with > {{connection.commitTransaction()}} then it tries to flush the sideFile with > {{OrcInputFormat.SHIMS.hflush(flushLengths)}}. This uses > FSOutputSummer.flush, which does not flush the buffer data to the disk so the > actual data is not written. > Had to remove the check from the end of the streaming tests in > {{TestCrudCompactorOnTez.java}} > {code:java} > CompactorTestUtilities.checkAcidVersion(fs.listFiles(new > Path(table.getSd().getLocation()), true), fs, > conf.getBoolVar(HiveConf.ConfVars.HIVE_WRITE_ACID_VERSION_FILE), > new String[] { AcidUtils.DELTA_PREFIX }); > {code} > These checks verifies the {{_flush_length}} files, and they would fail > otherwise. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23836) Make "cols" dependent so that it cascade deletes
[ https://issues.apache.org/jira/browse/HIVE-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor resolved HIVE-23836. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master! Thanks. > Make "cols" dependent so that it cascade deletes > > > Key: HIVE-23836 > URL: https://issues.apache.org/jira/browse/HIVE-23836 > Project: Hive > Issue Type: Bug >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > {quote} > If you want the deletion of a persistent object to cause the deletion of > related objects then you need to mark the related fields in the mapping to be > "dependent". > {quote} > http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields > http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object > The database won't do it: > {code:sql|title=Derby Schema} > ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY > ("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO > ACTION; > {code} > https://github.com/apache/hive/blob/65cf6957cf9432277a096f91b40985237274579f/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L452 -- This message was sent by Atlassian Jira (v8.3.4#803005)
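As a side note for readers of the resolution above, the DataNucleus fix amounts to marking the collection elements as dependent in the JDO mapping, so that deleting the owning object cascades to the elements. A hypothetical package.jdo fragment (the field, element type, and column names are illustrative, not Hive's actual metadata) might look like:

```xml
<field name="cols" table="COLUMNS_V2">
  <!-- dependent-element="true": deleting the owner also deletes each element -->
  <collection element-type="MFieldSchema" dependent-element="true"/>
  <join>
    <column name="CD_ID"/>
  </join>
</field>
```

This is needed precisely because, as the Derby schema shows, the foreign key is declared ON DELETE NO ACTION, so the database itself will not cascade.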
[jira] [Work logged] (HIVE-23836) Make "cols" dependent so that it cascade deletes
[ https://issues.apache.org/jira/browse/HIVE-23836?focusedWorklogId=461066=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461066 ] ASF GitHub Bot logged work on HIVE-23836: - Author: ASF GitHub Bot Created on: 20/Jul/20 13:31 Start Date: 20/Jul/20 13:31 Worklog Time Spent: 10m Work Description: belugabehr merged pull request #1239: URL: https://github.com/apache/hive/pull/1239 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461066) Time Spent: 40m (was: 0.5h) > Make "cols" dependent so that it cascade deletes > > > Key: HIVE-23836 > URL: https://issues.apache.org/jira/browse/HIVE-23836 > Project: Hive > Issue Type: Bug >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {quote} > If you want the deletion of a persistent object to cause the deletion of > related objects then you need to mark the related fields in the mapping to be > "dependent". > {quote} > http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields > http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object > The database won't do it: > {code:sql|title=Derby Schema} > ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY > ("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO > ACTION; > {code} > https://github.com/apache/hive/blob/65cf6957cf9432277a096f91b40985237274579f/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L452 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.
[ https://issues.apache.org/jira/browse/HIVE-23881?focusedWorklogId=461056=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461056 ] ASF GitHub Bot logged work on HIVE-23881: - Author: ASF GitHub Bot Created on: 20/Jul/20 13:01 Start Date: 20/Jul/20 13:01 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1284: URL: https://github.com/apache/hive/pull/1284#discussion_r457361443 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2651,6 +2651,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) // Transaction and lock management calls // Get just list of open transactions + //Deprecated use get_open_txns_req Review comment: Added This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461056) Time Spent: 50m (was: 40m) > Deprecate get_open_txns to use get_open_txns_req method. > > > Key: HIVE-23881 > URL: https://issues.apache.org/jira/browse/HIVE-23881 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23881.01.patch > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.
[ https://issues.apache.org/jira/browse/HIVE-23881?focusedWorklogId=461053=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461053 ] ASF GitHub Bot logged work on HIVE-23881: - Author: ASF GitHub Bot Created on: 20/Jul/20 12:48 Start Date: 20/Jul/20 12:48 Worklog Time Spent: 10m Work Description: aasha opened a new pull request #1284: URL: https://github.com/apache/hive/pull/1284 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461053) Time Spent: 0.5h (was: 20m) > Deprecate get_open_txns to use get_open_txns_req method. > > > Key: HIVE-23881 > URL: https://issues.apache.org/jira/browse/HIVE-23881 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.
[ https://issues.apache.org/jira/browse/HIVE-23881?focusedWorklogId=461054=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461054 ] ASF GitHub Bot logged work on HIVE-23881: - Author: ASF GitHub Bot Created on: 20/Jul/20 12:51 Start Date: 20/Jul/20 12:51 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1284: URL: https://github.com/apache/hive/pull/1284#discussion_r457354181 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2651,6 +2651,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) // Transaction and lock management calls // Get just list of open transactions + //Deprecated use get_open_txns_req Review comment: nit: Could you please add 1 space here? :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461054) Time Spent: 40m (was: 0.5h) > Deprecate get_open_txns to use get_open_txns_req method. > > > Key: HIVE-23881 > URL: https://issues.apache.org/jira/browse/HIVE-23881 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23881.01.patch > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.
[ https://issues.apache.org/jira/browse/HIVE-23881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23881: --- Attachment: HIVE-23881.01.patch Status: Patch Available (was: Open) > Deprecate get_open_txns to use get_open_txns_req method. > > > Key: HIVE-23881 > URL: https://issues.apache.org/jira/browse/HIVE-23881 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23881.01.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.
[ https://issues.apache.org/jira/browse/HIVE-23881?focusedWorklogId=461050=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461050 ] ASF GitHub Bot logged work on HIVE-23881: - Author: ASF GitHub Bot Created on: 20/Jul/20 12:42 Start Date: 20/Jul/20 12:42 Worklog Time Spent: 10m Work Description: aasha closed pull request #1283: URL: https://github.com/apache/hive/pull/1283 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461050) Time Spent: 20m (was: 10m) > Deprecate get_open_txns to use get_open_txns_req method. > > > Key: HIVE-23881 > URL: https://issues.apache.org/jira/browse/HIVE-23881 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization
[ https://issues.apache.org/jira/browse/HIVE-23882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-23882: - > Compiler should skip MJ keyExpr for probe optimization > -- > > Key: HIVE-23882 > URL: https://issues.apache.org/jira/browse/HIVE-23882 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > In probe we cannot currently support key expressions (on the big table side), > as ORC column vectors probe the small-table hashtable directly (there is no > expression evaluation at that level). > TezCompiler should take this into account when picking MJs to push probe > details. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.
[ https://issues.apache.org/jira/browse/HIVE-23881?focusedWorklogId=461046=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461046 ] ASF GitHub Bot logged work on HIVE-23881: - Author: ASF GitHub Bot Created on: 20/Jul/20 12:19 Start Date: 20/Jul/20 12:19 Worklog Time Spent: 10m Work Description: aasha opened a new pull request #1283: URL: https://github.com/apache/hive/pull/1283 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461046) Remaining Estimate: 0h Time Spent: 10m > Deprecate get_open_txns to use get_open_txns_req method. > > > Key: HIVE-23881 > URL: https://issues.apache.org/jira/browse/HIVE-23881 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.
[ https://issues.apache.org/jira/browse/HIVE-23881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23881: -- Labels: pull-request-available (was: ) > Deprecate get_open_txns to use get_open_txns_req method. > > > Key: HIVE-23881 > URL: https://issues.apache.org/jira/browse/HIVE-23881 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true
[ https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=461009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461009 ] ASF GitHub Bot logged work on HIVE-20441: - Author: ASF GitHub Bot Created on: 20/Jul/20 10:19 Start Date: 20/Jul/20 10:19 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on a change in pull request #1242: URL: https://github.com/apache/hive/pull/1242#discussion_r457248607 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java ## @@ -293,9 +293,10 @@ public FunctionInfo registerPermanentFunction(String functionName, if (registerToSession) { String qualifiedName = FunctionUtils.qualifyFunctionName( functionName, SessionState.get().getCurrentDatabase().toLowerCase()); - if (registerToSessionRegistry(qualifiedName, function) != null) { + FunctionInfo newFunction = registerToSessionRegistry(qualifiedName, function); Review comment: With hive.allow.udf.load.on.demand enabled, the call ```FunctionRegistry.getFunctionInfo(String functionName)``` makes HS2 look up the function in the MetaStore when it is not found in the session or system registry. If the function is found, a FunctionInfo created by ```new FunctionInfo(functionName, className, resources)``` is returned, but the genericUDF field of that FunctionInfo is null, https://github.com/apache/hive/blob/fa086ecce543384993a61d5154b14fa3b80df3b5/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java#L67-L74 . So when TypeCheckProcFactory.DefaultExprProcessor builds the function expression from the AstNode, https://github.com/apache/hive/blob/fa086ecce543384993a61d5154b14fa3b80df3b5/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L935-L948, the genericUDF obtained from ```GenericUDF genericUDF = fi.getGenericUDF();``` is null, and if it is later used to create the function expression desc, an NPE is thrown. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461009) Time Spent: 1h 40m (was: 1.5h) > NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true > > > Key: HIVE-20441 > URL: https://issues.apache.org/jira/browse/HIVE-20441 > Project: Hive > Issue Type: Bug > Components: CLI, HiveServer2 >Affects Versions: 1.2.1, 2.3.3 >Reporter: Hui Huang >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, > HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been > started, the new created function from other clients or hiveserver2 will be > loaded from the metastore at the first time. > When the udf is used in where clause, we got a NPE like: > {code:java} > Error executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: NullPointerException null > at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP > SHOT] > at > 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO > T] > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at >
[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true
[ https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=461012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461012 ] ASF GitHub Bot logged work on HIVE-20441: - Author: ASF GitHub Bot Created on: 20/Jul/20 10:22 Start Date: 20/Jul/20 10:22 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on a change in pull request #1242: URL: https://github.com/apache/hive/pull/1242#discussion_r457248607 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java ## @@ -293,9 +293,10 @@ public FunctionInfo registerPermanentFunction(String functionName, if (registerToSession) { String qualifiedName = FunctionUtils.qualifyFunctionName( functionName, SessionState.get().getCurrentDatabase().toLowerCase()); - if (registerToSessionRegistry(qualifiedName, function) != null) { + FunctionInfo newFunction = registerToSessionRegistry(qualifiedName, function); Review comment: With hive.allow.udf.load.on.demand enabled, the call ```FunctionRegistry.getFunctionInfo(String functionName)``` makes HS2 look up the function in the MetaStore when it is not found in the session or system registry. If the function is found, a FunctionInfo created by ```new FunctionInfo(functionName, className, resources)``` is returned, but the genericUDF field of that FunctionInfo is null, https://github.com/apache/hive/blob/fa086ecce543384993a61d5154b14fa3b80df3b5/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java#L67-L74 . So when TypeCheckProcFactory.DefaultExprProcessor builds the function expression from the AstNode, https://github.com/apache/hive/blob/fa086ecce543384993a61d5154b14fa3b80df3b5/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L935-L948, the genericUDF obtained from ```GenericUDF genericUDF = fi.getGenericUDF();``` is null, and if it is later used to create the function expression desc, an NPE is thrown. 
https://github.com/apache/hive/blob/fa086ecce543384993a61d5154b14fa3b80df3b5/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java#L117-L123 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461012) Time Spent: 2h (was: 1h 50m) > NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true > > > Key: HIVE-20441 > URL: https://issues.apache.org/jira/browse/HIVE-20441 > Project: Hive > Issue Type: Bug > Components: CLI, HiveServer2 >Affects Versions: 1.2.1, 2.3.3 >Reporter: Hui Huang >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, > HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch > > Time Spent: 2h > Remaining Estimate: 0h > > When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been > started, the new created function from other clients or hiveserver2 will be > loaded from the metastore at the first time. 
> When the udf is used in where clause, we got a NPE like: > {code:java} > Error executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: NullPointerException null > at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP > SHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO > T] > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at >
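As a side note for readers following this thread, the failure mode described in the review comments can be reduced to a few lines of plain Java. This is only an illustrative sketch: FunctionInfoStub and NpeSketch are made-up names standing in for Hive's FunctionInfo and TypeCheckProcFactory, not the real classes.

```java
// Minimal stand-in for the failure mode: an entry built from metastore
// metadata alone (load-on-demand path) carries no GenericUDF instance,
// so getGenericUDF() returns null and unguarded callers throw an NPE.
class FunctionInfoStub {
  private final Object genericUDF; // stays null on the load-on-demand path

  FunctionInfoStub(Object genericUDF) {
    this.genericUDF = genericUDF;
  }

  Object getGenericUDF() {
    return genericUDF;
  }
}

public class NpeSketch {
  public static void main(String[] args) {
    FunctionInfoStub onDemand = new FunctionInfoStub(null);
    try {
      // stand-in for building the expression desc from the UDF instance
      System.out.println(onDemand.getGenericUDF().toString());
    } catch (NullPointerException e) {
      System.out.println("NPE: genericUDF was never initialized");
    }
  }
}
```

The fix discussed in the PR is to use the fully initialized FunctionInfo returned by the session registry instead of the metadata-only one, so the null dereference never occurs.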
[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=461011=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461011 ] ASF GitHub Bot logged work on HIVE-23851: - Author: ASF GitHub Bot Created on: 20/Jul/20 10:22 Start Date: 20/Jul/20 10:22 Worklog Time Spent: 10m Work Description: shameersss1 commented on pull request #1271: URL: https://github.com/apache/hive/pull/1271#issuecomment-660941689 @kgyrtkirk Could you please take a look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461011) Time Spent: 50m (was: 40m) > MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions > > > Key: HIVE-23851 > URL: https://issues.apache.org/jira/browse/HIVE-23851 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > *Steps to reproduce:* > # Create external table > # Run msck command to sync all the partitions with metastore > # Remove one of the partition path > # Run msck repair with partition filtering > *Stack Trace:* > {code:java} > 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] > ppr.PartitionExpressionForMetastore: Failed to deserialize the expression > java.lang.IndexOutOfBoundsException: Index: 110, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_192] > {code} > *Cause:* > In case of msck repair with partition filtering we expect expression proxy > class to be set as 
PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78 > ), while dropping a partition we serialize the drop partition filter > expression as ( > https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589 > ), which is incompatible with the deserialization happening in > PartitionExpressionForMetastore ( >
[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true
[ https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=461010=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461010 ] ASF GitHub Bot logged work on HIVE-20441: - Author: ASF GitHub Bot Created on: 20/Jul/20 10:21 Start Date: 20/Jul/20 10:21 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on a change in pull request #1242: URL: https://github.com/apache/hive/pull/1242#discussion_r457255692 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java ## @@ -293,9 +293,10 @@ public FunctionInfo registerPermanentFunction(String functionName, if (registerToSession) { String qualifiedName = FunctionUtils.qualifyFunctionName( functionName, SessionState.get().getCurrentDatabase().toLowerCase()); - if (registerToSessionRegistry(qualifiedName, function) != null) { + FunctionInfo newFunction = registerToSessionRegistry(qualifiedName, function); Review comment: ```registerToSessionRegistry``` will finally initialize the genericUDF by calling ```FunctionInfo(FunctionType functionType, String displayName, GenericUDF genericUDF, FunctionResource... resources)``` https://github.com/apache/hive/blob/fa086ecce543384993a61d5154b14fa3b80df3b5/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java#L76-L83, The genericUDF field of ```FunctionInfo``` returned by this call would not be null. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461010) Time Spent: 1h 50m (was: 1h 40m) > NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true > > > Key: HIVE-20441 > URL: https://issues.apache.org/jira/browse/HIVE-20441 > Project: Hive > Issue Type: Bug > Components: CLI, HiveServer2 >Affects Versions: 1.2.1, 2.3.3 >Reporter: Hui Huang >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, > HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been > started, the new created function from other clients or hiveserver2 will be > loaded from the metastore at the first time. > When the udf is used in where clause, we got a NPE like: > {code:java} > Error executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: NullPointerException null > at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP > SHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO > T] > at > 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA > PSHOT] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA > PSHOT] > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at >
[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true
[ https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=461005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461005 ] ASF GitHub Bot logged work on HIVE-20441: - Author: ASF GitHub Bot Created on: 20/Jul/20 10:10 Start Date: 20/Jul/20 10:10 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on a change in pull request #1242: URL: https://github.com/apache/hive/pull/1242#discussion_r457248607 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java ## @@ -293,9 +293,10 @@ public FunctionInfo registerPermanentFunction(String functionName, if (registerToSession) { String qualifiedName = FunctionUtils.qualifyFunctionName( functionName, SessionState.get().getCurrentDatabase().toLowerCase()); - if (registerToSessionRegistry(qualifiedName, function) != null) { + FunctionInfo newFunction = registerToSessionRegistry(qualifiedName, function); Review comment: With hive.allow.udf.load.on.demand enabled, the call ```FunctionRegistry.getFunctionInfo(String functionName)``` makes HS2 look up the function in the MetaStore when it is not found in the session or system registry. If the function is found, a FunctionInfo created by ```new FunctionInfo(functionName, className, resources)``` is returned, but the genericUDF field of that FunctionInfo is null, https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java#L67-L74 . So when TypeCheckProcFactory.DefaultExprProcessor builds the function expression from the AstNode, https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L935-L948, the genericUDF obtained from ```GenericUDF genericUDF = fi.getGenericUDF();``` is null, and if it is later used to create the function expression desc, an NPE is thrown. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461005) Time Spent: 1.5h (was: 1h 20m) > NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true > > > Key: HIVE-20441 > URL: https://issues.apache.org/jira/browse/HIVE-20441 > Project: Hive > Issue Type: Bug > Components: CLI, HiveServer2 >Affects Versions: 1.2.1, 2.3.3 >Reporter: Hui Huang >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, > HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been > started, the new created function from other clients or hiveserver2 will be > loaded from the metastore at the first time. > When the udf is used in where clause, we got a NPE like: > {code:java} > Error executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: NullPointerException null > at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP > SHOT] > at > 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO > T] > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437) >
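To make the NPE scenario and the fix direction concrete, here is a minimal, self-contained sketch. The class below is a hypothetical stand-in for Hive's Registry/FunctionInfo internals, not the actual patch; the point is that registerPermanentFunction should return the FunctionInfo produced by the session registration (whose genericUDF is populated) rather than the metastore-derived one whose genericUDF is null.

```java
public class RegistrySketch {
    // Hypothetical stand-in for Hive's FunctionInfo.
    static class FunctionInfo {
        final String name;
        final Object genericUDF; // null when built only from metastore metadata
        FunctionInfo(String name, Object genericUDF) {
            this.name = name;
            this.genericUDF = genericUDF;
        }
    }

    // Session registration instantiates the UDF class, so genericUDF is populated.
    FunctionInfo registerToSessionRegistry(String qualifiedName, FunctionInfo fromMetastore) {
        return new FunctionInfo(qualifiedName, new Object() /* instantiated UDF */);
    }

    // Before the patch: the metastore-derived FunctionInfo (null genericUDF) was
    // effectively what callers saw, leading to an NPE later in query compilation.
    // After the patch: return the session-registered FunctionInfo instead.
    FunctionInfo registerPermanentFunction(String qualifiedName, FunctionInfo fromMetastore) {
        FunctionInfo newFunction = registerToSessionRegistry(qualifiedName, fromMetastore);
        return newFunction != null ? newFunction : fromMetastore;
    }

    public static void main(String[] args) {
        RegistrySketch r = new RegistrySketch();
        FunctionInfo fromMetastore = new FunctionInfo("db.myudf", null);
        FunctionInfo returned = r.registerPermanentFunction("db.myudf", fromMetastore);
        // The caller now gets a FunctionInfo with a usable (non-null) genericUDF.
        System.out.println(returned.genericUDF != null); // prints true
    }
}
```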
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=461000=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461000 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 20/Jul/20 10:02 Start Date: 20/Jul/20 10:02 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r457242768 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2066,6 +2066,10 @@ struct GetReplicationMetricsRequest { 3: optional i64 dumpExecutionId } +struct GetOpenTxnsRequest { + 1: required list excludeTxnTypes; Review comment: https://issues.apache.org/jira/browse/HIVE-23881 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 461000) Time Spent: 3.5h (was: 3h 20m) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, > HIVE-23560.03.patch, HIVE-23560.04.patch, Optimize bootstrap dump to avoid > aborting all transactions.pdf > > Time Spent: 3.5h > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. > This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.
[ https://issues.apache.org/jira/browse/HIVE-23881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi reassigned HIVE-23881: -- Assignee: Aasha Medhi > Deprecate get_open_txns to use get_open_txns_req method. > > > Key: HIVE-23881 > URL: https://issues.apache.org/jira/browse/HIVE-23881 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major >
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=460999=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460999 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 20/Jul/20 09:59 Start Date: 20/Jul/20 09:59 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r457240568 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2066,6 +2066,10 @@ struct GetReplicationMetricsRequest { 3: optional i64 dumpExecutionId } +struct GetOpenTxnsRequest { + 1: required list excludeTxnTypes; Review comment: Yes makes sense. This pull request is already merged. I will make the param optional in another pull request and will create a ticket to use this new method across and deprecate the original get_open_txns method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460999) Time Spent: 3h 20m (was: 3h 10m) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, > HIVE-23560.03.patch, HIVE-23560.04.patch, Optimize bootstrap dump to avoid > aborting all transactions.pdf > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. 
> This doc attached talks about it in detail
[jira] [Updated] (HIVE-23870) Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable
[ https://issues.apache.org/jira/browse/HIVE-23870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23870: -- Labels: pull-request-available (was: ) > Optimise multiple text conversions in > WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable > --- > > Key: HIVE-23870 > URL: https://issues.apache.org/jira/browse/HIVE-23870 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Attachments: image-2020-07-17-11-31-38-241.png > > Time Spent: 10m > Remaining Estimate: 0h > > Observed this when creating materialized view. > [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java#L85] > Same content is converted to Text multiple times. > !image-2020-07-17-11-31-38-241.png|width=1048,height=936!
[jira] [Work logged] (HIVE-23870) Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable
[ https://issues.apache.org/jira/browse/HIVE-23870?focusedWorklogId=460994=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460994 ] ASF GitHub Bot logged work on HIVE-23870: - Author: ASF GitHub Bot Created on: 20/Jul/20 09:52 Start Date: 20/Jul/20 09:52 Worklog Time Spent: 10m Work Description: rbalamohan opened a new pull request #1282: URL: https://github.com/apache/hive/pull/1282 Observed runtime dropping from "7600s --> 4800s" in internal cluster, when running a job which creates materialized view on warehouse/inventory/date_dim (where warehouse had char/varchar). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460994) Remaining Estimate: 0h Time Spent: 10m > Optimise multiple text conversions in > WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable > --- > > Key: HIVE-23870 > URL: https://issues.apache.org/jira/browse/HIVE-23870 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Attachments: image-2020-07-17-11-31-38-241.png > > Time Spent: 10m > Remaining Estimate: 0h > > Observed this when creating materialized view. > [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java#L85] > Same content is converted to Text multiple times. > !image-2020-07-17-11-31-38-241.png|width=1048,height=936! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23870) Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable
[ https://issues.apache.org/jira/browse/HIVE-23870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-23870: Summary: Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable (was: Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObjec) > Optimise multiple text conversions in > WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable > --- > > Key: HIVE-23870 > URL: https://issues.apache.org/jira/browse/HIVE-23870 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Attachments: image-2020-07-17-11-31-38-241.png > > > Observed this when creating materialized view. > [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java#L85] > Same content is converted to Text multiple times. > !image-2020-07-17-11-31-38-241.png|width=1048,height=936!
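The optimization HIVE-23870 proposes — avoiding repeated conversions of the same content — can be sketched as simple memoization. The class below is an illustrative stand-in, not Hive's actual HiveCharWritable: the padded string is computed on first access and reused on every subsequent read, instead of re-converting the same backing value each time an ObjectInspector touches the row.

```java
public class CharConversionSketch {
    // Hypothetical stand-in for HiveCharWritable: pads to a fixed length on read.
    static class CharWritable {
        private final String value;
        private final int maxLength;
        private String padded;       // cached conversion
        private int conversions = 0; // instrumentation for the sketch

        CharWritable(String value, int maxLength) {
            this.value = value;
            this.maxLength = maxLength;
        }

        String getPaddedValue() {
            if (padded == null) { // convert once, reuse afterwards
                conversions++;
                StringBuilder sb = new StringBuilder(value);
                while (sb.length() < maxLength) {
                    sb.append(' ');
                }
                padded = sb.toString();
            }
            return padded;
        }

        int getConversionCount() {
            return conversions;
        }
    }

    public static void main(String[] args) {
        CharWritable w = new CharWritable("abc", 5);
        // Repeated reads (as an ObjectInspector would issue per row/column access)
        // trigger only a single conversion.
        for (int i = 0; i < 1000; i++) {
            w.getPaddedValue();
        }
        System.out.println(w.getConversionCount()); // prints 1
    }
}
```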
[jira] [Resolved] (HIVE-23840) Use LLAP to get orc metadata
[ https://issues.apache.org/jira/browse/HIVE-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-23840. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the review [~szita]! > Use LLAP to get orc metadata > > > Key: HIVE-23840 > URL: https://issues.apache.org/jira/browse/HIVE-23840 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > HIVE-23824 added the possibility to access ORC metadata. We can use this to > decide which delta files should be read, and which could be omitted.
[jira] [Work logged] (HIVE-23840) Use LLAP to get orc metadata
[ https://issues.apache.org/jira/browse/HIVE-23840?focusedWorklogId=460984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460984 ] ASF GitHub Bot logged work on HIVE-23840: - Author: ASF GitHub Bot Created on: 20/Jul/20 09:32 Start Date: 20/Jul/20 09:32 Worklog Time Spent: 10m Work Description: pvary merged pull request #1251: URL: https://github.com/apache/hive/pull/1251 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460984) Time Spent: 1h 10m (was: 1h) > Use LLAP to get orc metadata > > > Key: HIVE-23840 > URL: https://issues.apache.org/jira/browse/HIVE-23840 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > HIVE-23824 added the possibility to access ORC metadata. We can use this to > decide which delta files should be read, and which could be omitted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161068#comment-17161068 ] Syed Shameerur Rahman commented on HIVE-23873: -- [~chiran54321] Interesting... We do have a qtest, external_jdbc_table4.q, which covers the above use case, and it passes. Assuming the above stack trace was generated with the master Hive branch: had there been a case-sensitivity issue, https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L229 , value should be null, and ultimately rowVal would be set to null, https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L233 > Querying Hive JDBCStorageHandler table fails with NPE > - > > Key: HIVE-23873 > URL: https://issues.apache.org/jira/browse/HIVE-23873 > Project: Hive > Issue Type: Bug > Components: HiveServer2, JDBC >Affects Versions: 3.1.0, 3.1.1, 3.1.2 >Reporter: Chiran Ravani >Assignee: Chiran Ravani >Priority: Critical > Attachments: HIVE-23873.01.patch > > > The scenario is a Hive table with the same schema as a table in Oracle; when we > query the table with data, it fails with an NPE. Below is the trace. 
> {code} > Caused by: java.io.IOException: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473) > ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > ... 34 more > Caused by: java.lang.NullPointerException > at > org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) > ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) > ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > at > org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473) > ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152] > ... 
34 more > {code} > The problem appears when column names in Oracle are in upper case: since Hive > forces table and column names to lower case at creation time, the user runs > into an NPE while fetching data. > While deserializing data, the input contains column names in lower case, so > the lookup fails to get the value > https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136 > {code} > rowVal = ((ObjectWritable)value).get(); > {code} > Log snippet: > = > {code} > 2020-07-17T16:49:09,598 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 > HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) > - Query to execute is [select * from TESTHIVEJDBCSTORAGE] > 2020-07-17T16:49:10,642 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 > HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = > ID > 2020-07-17T16:49:10,642 INFO [04ed42ec-91d2-4662-aee7-37e840a06036 > HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value > = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class > java.lang.Integer,value=1]} > {code} > Simple Reproducer for this case. > = > 1. Create table in Oracle > {code} > create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20)); > {code} > 2. Insert dummy data. > {code} > Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1'); > {code} > 3. Create
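One way the case mismatch described above can be handled — shown here purely as an illustrative sketch, not as the actual HIVE-23873 patch — is to expose the JDBC-provided row through a case-insensitive map, so that Hive's lower-cased column names still find the upper-cased keys reported by Oracle:

```java
import java.util.Map;
import java.util.TreeMap;

public class JdbcColumnLookupSketch {
    // Wrap the JDBC-provided row in a case-insensitive map so that Hive's
    // lower-cased column names match the database's upper-cased ones.
    static Map<String, Object> toCaseInsensitive(Map<String, Object> row) {
        Map<String, Object> wrapped = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        wrapped.putAll(row);
        return wrapped;
    }

    public static void main(String[] args) {
        // Simulated row from Oracle: column names are reported in upper case.
        Map<String, Object> row = new TreeMap<>();
        row.put("ID", 1);
        row.put("FNAME", "Name1");

        Map<String, Object> safe = toCaseInsensitive(row);
        // Hive looks columns up by their lower-cased names; without the
        // case-insensitive wrapper both lookups would return null — the
        // value that later surfaces as the NPE in JdbcSerDe.deserialize.
        System.out.println(safe.get("id") + "," + safe.get("fname")); // prints 1,Name1
    }
}
```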
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=460982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460982 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 20/Jul/20 09:31 Start Date: 20/Jul/20 09:31 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r457220034 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2066,6 +2066,10 @@ struct GetReplicationMetricsRequest { 3: optional i64 dumpExecutionId } +struct GetOpenTxnsRequest { + 1: required list excludeTxnTypes; Review comment: Could we make this optional? So if we later want to change the GetOpenTxnsRequest object we do not end up sending an empty list all the time? (as a general rule for new methods we create a Request object for them with all optional fields) Also this might need some checks later in the code, but we might want to merge the codepath for the original get_open_txns method with this new one as soon as possible. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460982) Time Spent: 3h 10m (was: 3h) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, > HIVE-23560.03.patch, HIVE-23560.04.patch, Optimize bootstrap dump to avoid > aborting all transactions.pdf > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. > This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=460978=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460978 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 20/Jul/20 09:28 Start Date: 20/Jul/20 09:28 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r457217636 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2802,6 +2806,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) void add_replication_metrics(1: ReplicationMetricList replicationMetricList) throws(1:MetaException o1) ReplicationMetricList get_replication_metrics(1: GetReplicationMetricsRequest rqst) throws(1:MetaException o1) + GetOpenTxnsResponse get_open_txns_req(1: GetOpenTxnsRequest getOpenTxnsRequest) Review comment: Fair point This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460978) Time Spent: 3h (was: 2h 50m) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, > HIVE-23560.03.patch, HIVE-23560.04.patch, Optimize bootstrap dump to avoid > aborting all transactions.pdf > > Time Spent: 3h > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. 
We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. > This doc attached talks about it in detail
[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true
[ https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=460972=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460972 ] ASF GitHub Bot logged work on HIVE-20441: - Author: ASF GitHub Bot Created on: 20/Jul/20 09:07 Start Date: 20/Jul/20 09:07 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1242: URL: https://github.com/apache/hive/pull/1242#discussion_r457202709 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java ## @@ -293,9 +293,10 @@ public FunctionInfo registerPermanentFunction(String functionName, if (registerToSession) { String qualifiedName = FunctionUtils.qualifyFunctionName( functionName, SessionState.get().getCurrentDatabase().toLowerCase()); - if (registerToSessionRegistry(qualifiedName, function) != null) { + FunctionInfo newFunction = registerToSessionRegistry(qualifiedName, function); Review comment: I tried to understand the goal of the change, but could not find the root cause, and do not see what I miss. Could you please explain when this code results in different return value before and after the patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460972) Time Spent: 1h 20m (was: 1h 10m) > NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true > > > Key: HIVE-20441 > URL: https://issues.apache.org/jira/browse/HIVE-20441 > Project: Hive > Issue Type: Bug > Components: CLI, HiveServer2 >Affects Versions: 1.2.1, 2.3.3 >Reporter: Hui Huang >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, > HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been > started, the new created function from other clients or hiveserver2 will be > loaded from the metastore at the first time. > When the udf is used in where clause, we got a NPE like: > {code:java} > Error executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: NullPointerException null > at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP > SHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO > T] > at > 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA > PSHOT] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA > PSHOT] > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at >
[jira] [Updated] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rossetti Wong updated HIVE-23815: - Flags: Patch > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of the metastore's > underlying datastore, like MySQL, Oracle and so on. You can get the number > of datastore reads and writes, the average time of transaction execution, the > total active connections and so on.
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=460968=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460968 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 20/Jul/20 08:43 Start Date: 20/Jul/20 08:43 Worklog Time Spent: 10m Work Description: rbalamohan commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r457184037 ## File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java ## @@ -362,16 +379,178 @@ public static void mergeBloomFilterBytes( // Just bitwise-OR the bits together - size/# functions should be the same, // rest of the data is serialized long values for the bitset which are supposed to be bitwise-ORed. -for (int idx = START_OF_SERIALIZED_LONGS; idx < bf1Length; ++idx) { +for (int idx = mergeStart; idx < mergeEnd; ++idx) { bf1Bytes[bf1Start + idx] |= bf2Bytes[bf2Start + idx]; } } + public static void mergeBloomFilterBytesFromInputColumn( + byte[] bf1Bytes, int bf1Start, int bf1Length, long bf1ExpectedEntries, + BytesColumnVector inputColumn, int batchSize, boolean selectedInUse, int[] selected, int numThreads) { +if (numThreads == 0) { + numThreads = Runtime.getRuntime().availableProcessors(); +} +if (numThreads < 0) { + throw new RuntimeException("invalid number of threads: " + numThreads); +} + +ExecutorService executor = Executors.newFixedThreadPool(numThreads); + +BloomFilterMergeWorker[] workers = new BloomFilterMergeWorker[numThreads]; +for (int f = 0; f < numThreads; f++) { + workers[f] = new BloomFilterMergeWorker(executor, bf1Bytes, bf1Start, bf1Length); +} + +// split every bloom filter (represented by a part of a byte[]) across workers +for (int j = 0; j < batchSize; j++) { + if (!selectedInUse && inputColumn.noNulls) { +splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], +inputColumn.length[j]); + } else if (!selectedInUse) { +if (!inputColumn.isNull[j]) { + 
splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], + inputColumn.length[j]); +} + } else if (inputColumn.noNulls) { +int i = selected[j]; +splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], +inputColumn.length[i]); + } else { +int i = selected[j]; +if (!inputColumn.isNull[i]) { + splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], + inputColumn.length[i]); +} + } +} + +for (int f = 0; f < numThreads; f++) { + executor.submit(workers[f]); +} + +executor.shutdown(); +try { + executor.awaitTermination(3600, TimeUnit.SECONDS); +} catch (InterruptedException e) { + throw new RuntimeException(e); +} + } + + private static void splitVectorAcrossWorkers(BloomFilterMergeWorker[] workers, byte[] bytes, + int start, int length) { +if (bytes == null || length == 0) { + return; +} +/* + * This will split a byte[] across workers as below: + * let's say there are 10 workers for 7813 bytes, in this case + * length: 7813, elementPerBatch: 781 + * bytes assigned to workers: inclusive lower bound, exclusive upper bound + * 1. worker: 5 -> 786 + * 2. worker: 786 -> 1567 + * 3. worker: 1567 -> 2348 + * 4. worker: 2348 -> 3129 + * 5. worker: 3129 -> 3910 + * 6. worker: 3910 -> 4691 + * 7. worker: 4691 -> 5472 + * 8. worker: 5472 -> 6253 + * 9. worker: 6253 -> 7034 + * 10. worker: 7034 -> 7813 (last element per batch is: 779) + * + * This way, a particular worker will be given with the same part + * of all bloom filters along with the shared base bloom filter, + * so the bitwise OR function will not be a subject of threading/sync issues. + */ +int elementPerBatch = +(int) Math.ceil((double) (length - START_OF_SERIALIZED_LONGS) / workers.length); + +for (int w = 0; w < workers.length; w++) { + int modifiedStart = START_OF_SERIALIZED_LONGS + w * elementPerBatch; + int modifiedLength = (w == workers.length - 1) +? 
length - (START_OF_SERIALIZED_LONGS + w * elementPerBatch) : elementPerBatch; + + ElementWrapper wrapper = + new ElementWrapper(bytes, start, length, modifiedStart, modifiedLength); + workers[w].add(wrapper); +} + } + + public static byte[] getInitialBytes(long expectedEntries) { +ByteArrayOutputStream bytesOut = null; +try { + bytesOut = new ByteArrayOutputStream(); + BloomKFilter bf = new BloomKFilter(expectedEntries); + BloomKFilter.serialize(bytesOut, bf); + return
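The split arithmetic that the comment in splitVectorAcrossWorkers describes can be checked with a small standalone sketch. The header size (5 bytes) and the 7813-byte / 10-worker example mirror the comment above; the class and method names here are illustrative, not Hive's:

```java
// Standalone sketch of the byte[] partitioning in splitVectorAcrossWorkers:
// every worker w gets the same slice [start, start + length) of each serialized
// bloom filter, so the bitwise-OR merge needs no synchronization between workers.
public class SplitSketch {
    static final int START_OF_SERIALIZED_LONGS = 5; // header bytes before the bitset

    // Returns one {start, length} pair per worker, covering the serialized bitset.
    static int[][] split(int length, int workers) {
        int elementPerBatch =
            (int) Math.ceil((double) (length - START_OF_SERIALIZED_LONGS) / workers);
        int[][] slices = new int[workers][2];
        for (int w = 0; w < workers; w++) {
            int start = START_OF_SERIALIZED_LONGS + w * elementPerBatch;
            // the last worker takes whatever remains, which may be shorter
            int len = (w == workers - 1) ? length - start : elementPerBatch;
            slices[w][0] = start;
            slices[w][1] = len;
        }
        return slices;
    }

    public static void main(String[] args) {
        // Reproduces the boundaries listed in the code comment: 5 -> 786, ... 7034 -> 7813
        for (int[] s : split(7813, 10)) {
            System.out.println(s[0] + " -> " + (s[0] + s[1]));
        }
    }
}
```

Running it reproduces the inclusive-lower/exclusive-upper bounds from the comment, including the shorter final batch of 779 bytes.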
[jira] [Updated] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rossetti Wong updated HIVE-23815: - External issue URL: https://github.com/apache/hive/pull/1281 > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of the metastore's underlying datastore, like MySQL, Oracle and so on. You can get the number of datastore reads and writes, the average transaction execution time, the total number of active connections, and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases
[ https://issues.apache.org/jira/browse/HIVE-23671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-23671: -- Resolution: Fixed Status: Resolved (was: Patch Available) > MSCK repair should handle transactional tables in certain usecases > -- > > Key: HIVE-23671 > URL: https://issues.apache.org/jira/browse/HIVE-23671 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 11h 50m > Remaining Estimate: 0h > > The MSCK REPAIR tool does not handle transactional tables very well. It can find and add new partitions the same way as for non-transactional tables, but since the writeId differences are not handled, the data cannot be read back from the new partitions. > We could handle some use cases where the writeIds in the HMS and the underlying data are not conflicting. If the HMS does not contain allocated writes for the table, we can seed the table with the writeIds read from the directory structure. > Real-life use cases could be: > * Copy data files from one cluster to another with a different HMS, create the table and call MSCK REPAIR > * If the HMS db is lost, recreate the table and call MSCK REPAIR > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases
[ https://issues.apache.org/jira/browse/HIVE-23671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161025#comment-17161025 ] Denys Kuzmenko commented on HIVE-23671: --- Pushed to master. Thank you for the patch, [~pvargacl]!! > MSCK repair should handle transactional tables in certain usecases > -- > > Key: HIVE-23671 > URL: https://issues.apache.org/jira/browse/HIVE-23671 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 11h 50m > Remaining Estimate: 0h > > The MSCK REPAIR tool does not handle transactional tables very well. It can find and add new partitions the same way as for non-transactional tables, but since the writeId differences are not handled, the data cannot be read back from the new partitions. > We could handle some use cases where the writeIds in the HMS and the underlying data are not conflicting. If the HMS does not contain allocated writes for the table, we can seed the table with the writeIds read from the directory structure. > Real-life use cases could be: > * Copy data files from one cluster to another with a different HMS, create the table and call MSCK REPAIR > * If the HMS db is lost, recreate the table and call MSCK REPAIR > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23837) HbaseStorageHandler is not configured properly when the FileSinkOperator is the child of a MergeJoinWork
[ https://issues.apache.org/jira/browse/HIVE-23837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko reassigned HIVE-23837: - Assignee: Denys Kuzmenko (was: Peter Varga) > HbaseStorageHandler is not configured properly when the FileSinkOperator is > the child of a MergeJoinWork > > > Key: HIVE-23837 > URL: https://issues.apache.org/jira/browse/HIVE-23837 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > If the FileSinkOperator's root operator is a MergeJoinWork the > HbaseStorageHandler.configureJobConf will never get called, and the execution > will miss the HBASE_AUTH_TOKEN and the hbase jars. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22869) Add locking benchmark to metastore-tools/metastore-benchmarks
[ https://issues.apache.org/jira/browse/HIVE-22869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-22869: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Add locking benchmark to metastore-tools/metastore-benchmarks > - > > Key: HIVE-22869 > URL: https://issues.apache.org/jira/browse/HIVE-22869 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22869.2.patch, HIVE-22869.3.patch, > HIVE-22869.4.patch, HIVE-22869.5.patch, HIVE-22869.6.patch, > HIVE-22869.7.patch, HIVE-22869.8.patch, HIVE-22869.9.patch, HIVE-22869.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Add the possibility to run benchmarks on opening lock in the HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22869) Add locking benchmark to metastore-tools/metastore-benchmarks
[ https://issues.apache.org/jira/browse/HIVE-22869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161023#comment-17161023 ] Denys Kuzmenko commented on HIVE-22869: --- Pushed to master. Thank you for the patch, [~zchovan]!! > Add locking benchmark to metastore-tools/metastore-benchmarks > - > > Key: HIVE-22869 > URL: https://issues.apache.org/jira/browse/HIVE-22869 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22869.2.patch, HIVE-22869.3.patch, > HIVE-22869.4.patch, HIVE-22869.5.patch, HIVE-22869.6.patch, > HIVE-22869.7.patch, HIVE-22869.8.patch, HIVE-22869.9.patch, HIVE-22869.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Add the possibility to run benchmarks on opening lock in the HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-12155) hive exited with status 5
[ https://issues.apache.org/jira/browse/HIVE-12155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161024#comment-17161024 ] Chunhui Yang commented on HIVE-12155: - As of July 20, 2020, has no one solved this problem yet? > hive exited with status 5 > - > > Key: HIVE-12155 > URL: https://issues.apache.org/jira/browse/HIVE-12155 > Project: Hive > Issue Type: Bug > Components: CLI, Clients >Affects Versions: 1.2.1 > Environment: sqoop 1.4.5 & hadoop 2.6 & hive 1.2.1 >Reporter: Qiuzhuang Lian >Priority: Major > > We run sqoop-hive import jobs via RunJar to harness parallel runs. > Sqoop hive import works very well, but suddenly the sqoop-hive import job JVM exits with a "Hive exited with status 5" error during the hive import phase, which invokes the Hive CLI via a Java Process. Furthermore, we can't find any related hive logs under /tmp/hive/hive_*.log. The error blocks all further sqoop import jobs. As a result, we have to restart the system, and then it works well again. > The log detail is as follows, > Encountered IOException running import job: java.io.IOException: Hive exited > with status 5 > at > org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:385) > at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:335) > at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:239) > at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:511) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23837) HbaseStorageHandler is not configured properly when the FileSinkOperator is the child of a MergeJoinWork
[ https://issues.apache.org/jira/browse/HIVE-23837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161022#comment-17161022 ] Denys Kuzmenko commented on HIVE-23837: --- Pushed to master. Thank you for the patch, [~pvargacl]!! > HbaseStorageHandler is not configured properly when the FileSinkOperator is > the child of a MergeJoinWork > > > Key: HIVE-23837 > URL: https://issues.apache.org/jira/browse/HIVE-23837 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > If the FileSinkOperator's root operator is a MergeJoinWork the > HbaseStorageHandler.configureJobConf will never get called, and the execution > will miss the HBASE_AUTH_TOKEN and the hbase jars. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23837) HbaseStorageHandler is not configured properly when the FileSinkOperator is the child of a MergeJoinWork
[ https://issues.apache.org/jira/browse/HIVE-23837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko resolved HIVE-23837. --- Resolution: Fixed > HbaseStorageHandler is not configured properly when the FileSinkOperator is > the child of a MergeJoinWork > > > Key: HIVE-23837 > URL: https://issues.apache.org/jira/browse/HIVE-23837 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > If the FileSinkOperator's root operator is a MergeJoinWork the > HbaseStorageHandler.configureJobConf will never get called, and the execution > will miss the HBASE_AUTH_TOKEN and the hbase jars. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=460959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460959 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 20/Jul/20 08:27 Start Date: 20/Jul/20 08:27 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r457171271 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2802,6 +2806,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) void add_replication_metrics(1: ReplicationMetricList replicationMetricList) throws(1:MetaException o1) ReplicationMetricList get_replication_metrics(1: GetReplicationMetricsRequest rqst) throws(1:MetaException o1) + GetOpenTxnsResponse get_open_txns_req(1: GetOpenTxnsRequest getOpenTxnsRequest) Review comment: get_open_txns_info doesn't take any input params. We needed to exclude certain types of txns and only return the other open txns. The other way could have been to add a new field in TxnInfo with the TxnType and still return all the open txns, but filter on the client side. Currently get_open_txns_info filters out the read txns without informing the client. So we thought it might be better to expose an explicit call to exclude specific txn types. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460959) Time Spent: 2h 50m (was: 2h 40m) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, > HIVE-23560.03.patch, HIVE-23560.04.patch, Optimize bootstrap dump to avoid > aborting all transactions.pdf > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. > This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
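The API trade-off debated in this review thread (filter open transactions on the server via a request object, versus returning everything and filtering on the client) can be sketched with hypothetical stand-in types. TxnType, TxnInfo, and getOpenTxns below are illustrative placeholders, not Hive's generated Thrift classes:

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

public class OpenTxnFilter {
    // Hypothetical txn categories, mirroring the read / repl-created / write
    // distinction discussed in the review.
    enum TxnType { DEFAULT, READ_ONLY, REPL_CREATED }

    static class TxnInfo {
        final long id;
        final TxnType type;
        TxnInfo(long id, TxnType type) { this.id = id; this.type = type; }
    }

    // Server-side variant: the request carries the txn types to exclude, so
    // read-only and repl-created txns never reach the client, and bootstrap
    // dump only sees the write transactions it actually needs to abort.
    static List<TxnInfo> getOpenTxns(List<TxnInfo> open, EnumSet<TxnType> excluded) {
        List<TxnInfo> result = new ArrayList<>();
        for (TxnInfo t : open) {
            if (!excluded.contains(t.type)) {
                result.add(t);
            }
        }
        return result;
    }
}
```

The client-side alternative mentioned in the thread would return the full list with a type field on each TxnInfo and apply the same predicate after the RPC, at the cost of shipping txns the caller will discard.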
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=460956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460956 ] ASF GitHub Bot logged work on HIVE-23815: - Author: ASF GitHub Bot Created on: 20/Jul/20 08:19 Start Date: 20/Jul/20 08:19 Worklog Time Spent: 10m Work Description: xinghuayu007 closed pull request #1227: URL: https://github.com/apache/hive/pull/1227 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460956) Time Spent: 3h 20m (was: 3h 10m) > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of the metastore's underlying datastore, like MySQL, Oracle and so on. You can get the number of datastore reads and writes, the average transaction execution time, the total number of active connections, and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22869) Add locking benchmark to metastore-tools/metastore-benchmarks
[ https://issues.apache.org/jira/browse/HIVE-22869?focusedWorklogId=460952=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460952 ] ASF GitHub Bot logged work on HIVE-22869: - Author: ASF GitHub Bot Created on: 20/Jul/20 08:04 Start Date: 20/Jul/20 08:04 Worklog Time Spent: 10m Work Description: deniskuzZ merged pull request #1073: URL: https://github.com/apache/hive/pull/1073 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460952) Time Spent: 2h 10m (was: 2h) > Add locking benchmark to metastore-tools/metastore-benchmarks > - > > Key: HIVE-22869 > URL: https://issues.apache.org/jira/browse/HIVE-22869 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22869.2.patch, HIVE-22869.3.patch, > HIVE-22869.4.patch, HIVE-22869.5.patch, HIVE-22869.6.patch, > HIVE-22869.7.patch, HIVE-22869.8.patch, HIVE-22869.9.patch, HIVE-22869.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Add the possibility to run benchmarks on opening lock in the HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=460951=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460951 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 20/Jul/20 07:57 Start Date: 20/Jul/20 07:57 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r457149267 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2802,6 +2806,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) void add_replication_metrics(1: ReplicationMetricList replicationMetricList) throws(1:MetaException o1) ReplicationMetricList get_replication_metrics(1: GetReplicationMetricsRequest rqst) throws(1:MetaException o1) + GetOpenTxnsResponse get_open_txns_req(1: GetOpenTxnsRequest getOpenTxnsRequest) Review comment: Why did you introduce a new HMS API method for this? I would add a new type attribute to TxnInfo and use get_open_txns_info instead This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460951) Time Spent: 2h 40m (was: 2.5h) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, > HIVE-23560.03.patch, HIVE-23560.04.patch, Optimize bootstrap dump to avoid > aborting all transactions.pdf > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. 
We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. > This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=460950=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460950 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 20/Jul/20 07:53 Start Date: 20/Jul/20 07:53 Worklog Time Spent: 10m Work Description: abstractdog opened a new pull request #1280: URL: https://github.com/apache/hive/pull/1280 …AFBloomFilterMerge Change-Id: I235248ad327b0cea91e637e74a0c67720710737e ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 460950) Remaining Estimate: 0h Time Spent: 10m > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 10m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. 
> For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 3 .. llap SUCCEEDED 1 100 > 0 0 > Map 1 .. llap SUCCEEDED 1263 126300 > 0 0 > Reducer 2 llap RUNNING 1 010 > 0 0 > Map 4 llap RUNNING 6154 0 207 5947 > 0 0 > Reducer 5 llapINITED 43 00 43 > 0 0 > Reducer 6 llapINITED 1 001 > 0 0 > -- > VERTICES: 02/06 [>>--] 16% ELAPSED TIME: 149.98 s > -- > {code} > For example, 70M entries in bloom filter leads to a 436 465 696 bits, so > merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR > operation, which is very hot codepath, but can be parallelized. -- This message was sent by Atlassian Jira (v8.3.4#803005)
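The "436 465 696 bits" figure quoted above is consistent with the standard optimal bloom filter sizing formula m = -n * ln(p) / (ln 2)^2, assuming a ~5% false-positive probability; the FPP value is an assumption here, not stated in the issue:

```java
public class BloomSize {
    // Optimal number of bits m for n entries at false-positive probability p:
    // m = -n * ln(p) / (ln 2)^2
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    public static void main(String[] args) {
        // Roughly 436.5M bits for 70M entries at 5% FPP, in line with the
        // figure quoted in the issue description.
        System.out.println(optimalBits(70_000_000L, 0.05));
    }
}
```

At that size, OR-merging 1263 such filters touches about 1263 * 54 MB of bitset data, which is why parallelizing the merge pays off.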
[jira] [Updated] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23880: -- Labels: pull-request-available (was: ) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 10m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. > For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 3 .. llap SUCCEEDED 1 100 > 0 0 > Map 1 .. 
llap SUCCEEDED 1263 126300 > 0 0 > Reducer 2 llap RUNNING 1 010 > 0 0 > Map 4 llap RUNNING 6154 0 207 5947 > 0 0 > Reducer 5 llapINITED 43 00 43 > 0 0 > Reducer 6 llapINITED 1 001 > 0 0 > -- > VERTICES: 02/06 [>>--] 16% ELAPSED TIME: 149.98 s > -- > {code} > For example, 70M entries in bloom filter leads to a 436 465 696 bits, so > merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR > operation, which is very hot codepath, but can be parallelized. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-23880: Description: Merging bloom filters in semijoin reduction can become the main bottleneck in case of large number of source mapper tasks (~1000, Map 1 in below example) and a large amount of expected entries (50M) in bloom filters. For example in TPCDS Q93: {code} select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ ss_customer_sk ,sum(act_sales) sumsales from (select ss_item_sk ,ss_ticket_number ,ss_customer_sk ,case when sr_return_quantity is not null then (ss_quantity-sr_return_quantity)*ss_sales_price else (ss_quantity*ss_sales_price) end act_sales from store_sales left outer join store_returns on (sr_item_sk = ss_item_sk and sr_ticket_number = ss_ticket_number) ,reason where sr_reason_sk = r_reason_sk and r_reason_desc = 'reason 66') t group by ss_customer_sk order by sumsales, ss_customer_sk limit 100; {code} On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 mins are spent with merging bloom filters (Reducer 2), as in: [^lipwig-output3605036885489193068.svg] {code} -- VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -- Map 3 .. llap SUCCEEDED 1 100 0 0 Map 1 .. llap SUCCEEDED 1263 126300 0 0 Reducer 2 llap RUNNING 1 010 0 0 Map 4 llap RUNNING 6154 0 207 5947 0 0 Reducer 5 llapINITED 43 00 43 0 0 Reducer 6 llapINITED 1 001 0 0 -- VERTICES: 02/06 [>>--] 16% ELAPSED TIME: 149.98 s -- {code} For example, 70M entries in bloom filter leads to a 436 465 696 bits, so merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR operation, which is very hot codepath, but can be parallelized. was: Merging bloom filters in semijoin reduction can become the main bottleneck in case of large number of source mapper tasks (~1000) and a large amount of expected entries (50M) in bloom filters. 
For example in TPCDS Q93: {code} select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ ss_customer_sk ,sum(act_sales) sumsales from (select ss_item_sk ,ss_ticket_number ,ss_customer_sk ,case when sr_return_quantity is not null then (ss_quantity-sr_return_quantity)*ss_sales_price else (ss_quantity*ss_sales_price) end act_sales from store_sales left outer join store_returns on (sr_item_sk = ss_item_sk and sr_ticket_number = ss_ticket_number) ,reason where sr_reason_sk = r_reason_sk and r_reason_desc = 'reason 66') t group by ss_customer_sk order by sumsales, ss_customer_sk limit 100; {code} On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 mins are spent with merging bloom filters, as in: > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Attachments: lipwig-output3605036885489193068.svg > > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. > For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk >
[jira] [Updated] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-23880: Attachment: lipwig-output3605036885489193068.svg > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Attachments: lipwig-output3605036885489193068.svg > > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000) and a large amount of > expected entries (50M) in bloom filters. > For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters, as in: -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-23880: Description: Merging bloom filters in semijoin reduction can become the main bottleneck in case of large number of source mapper tasks (~1000) and a large amount of expected entries (50M) in bloom filters. For example in TPCDS Q93: {code} select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ ss_customer_sk ,sum(act_sales) sumsales from (select ss_item_sk ,ss_ticket_number ,ss_customer_sk ,case when sr_return_quantity is not null then (ss_quantity-sr_return_quantity)*ss_sales_price else (ss_quantity*ss_sales_price) end act_sales from store_sales left outer join store_returns on (sr_item_sk = ss_item_sk and sr_ticket_number = ss_ticket_number) ,reason where sr_reason_sk = r_reason_sk and r_reason_desc = 'reason 66') t group by ss_customer_sk order by sumsales, ss_customer_sk limit 100; {code} On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 mins are spent with merging bloom filters, as in: > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Attachments: lipwig-output3605036885489193068.svg > > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000) and a large amount of > expected entries (50M) in bloom filters. 
> For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters, as in: -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23879) Data has been lost after table location was altered
[ https://issues.apache.org/jira/browse/HIVE-23879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Demyd updated HIVE-23879:
-------------------------
    Description:
When I alter the location of a non-empty table and then insert data into it, I no longer see the old data through HS2, but the old rows can still be found in MapR-FS under the old table location.

Steps to reproduce:
{code:sql}
1. connect to hs2 by beeline:
hive --service beeline -u "jdbc:hive2://:1/;"
2. create test db:
create database dbtest1 location 'hdfs:///dbtest1.db';
3. create test table:
create table dbtest1.t1 (id int);
4. insert data to table:
insert into dbtest1.t1 (id) values (1);
5. set new table location:
alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';
6. insert data to table:
insert into dbtest1.t1 (id) values (2);
{code}

Actual result:
{code:sql}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
1 row selected (0.097 seconds)
{code}

Expected result:
{code:sql}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
| 1      |
+--------+
1 row selected (0.097 seconds)
{code}

> Data has been lost after table location was altered
> ----------------------------------------------------
>
>                 Key: HIVE-23879
>                 URL: https://issues.apache.org/jira/browse/HIVE-23879
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Demyd
>            Priority: Major
>
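The behaviour reported above is consistent with ALTER TABLE ... SET LOCATION updating only metastore metadata and leaving the existing data files at the old path. A minimal local-filesystem sketch of the manual workaround — moving the files from the old location under the new one so the old rows stay visible — is below; the paths and the `MoveTableData`/`moveContents` names are illustrative stand-ins (using `java.nio` in place of HDFS), not a Hive or Hadoop API:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the workaround: after changing a table's location, move the
// data files left under the old path (e.g. hdfs:///dbtest1.db/t1) into
// the new one (e.g. hdfs:///dbtest1a/t1). Local temp dirs stand in for
// the distributed filesystem here.
public class MoveTableData {
    static void moveContents(Path oldLoc, Path newLoc) throws IOException {
        Files.createDirectories(newLoc);
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(oldLoc)) {
            for (Path entry : entries) {
                // Move each data file, keeping its file name.
                Files.move(entry, newLoc.resolve(entry.getFileName()));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path oldLoc = Files.createTempDirectory("t1_old");
        Path newLoc = Files.createTempDirectory("t1_new");
        // A data file written before the ALTER (name is illustrative).
        Files.writeString(oldLoc.resolve("000000_0"), "1\n");
        moveContents(oldLoc, newLoc);
        System.out.println(Files.exists(newLoc.resolve("000000_0"))); // true
    }
}
```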
[jira] [Assigned] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor reassigned HIVE-23880:
-----------------------------------
    Assignee: László Bodor

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-23880
>                 URL: https://issues.apache.org/jira/browse/HIVE-23880
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>