[jira] [Commented] (HIVE-23716) Support Anti Join in Hive

2020-07-20 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161706#comment-17161706
 ] 

Peter Vary commented on HIVE-23716:
---

[~maheshk114]: How efficient would this implementation be for ACID 
delete_deltas? Currently we use ACID-specific readers to read ACID files and 
remove deleted rows, but by using anti joins we could automatically get the perf 
benefits of the LLAP IO cache for delete deltas too.

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join, and a null filter on the right-side join key is 
> added to get the desired result. This causes:
>  # Extra computation — The left outer join projects redundant columns from the 
> right side, and filtering is then done to remove the redundant rows. This can 
> be avoided with an anti join, which projects only the required columns and 
> rows from the left-side table.
>  # Extra shuffle — With an anti join, the duplicate records moved to the join 
> node can be avoided at the child node. This can reduce a significant amount 
> of data movement if the number of distinct rows (join keys) is significant.
>  # Extra memory usage — For a map-based anti join, a hash set is sufficient, 
> as just the key is required to check whether a record matches the join 
> condition. For a left join, we need the key and the non-key columns as well, 
> so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales  ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> The number of distinct ws_order_number values in the web_sales table in a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when we convert 
> this query to an anti join, only 600 million rows are moved to the join node 
> instead of 7 billion.
> In the current patch, just one conversion is done: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes care 
> of subqueries with a “not exists” clause. Queries with “not exists” are first 
> converted to filter + left join and then converted to an anti join. Queries 
> with “not in” are not handled in the current patch.
> On the execution side, both merge join and map join with vectorized execution 
> are supported for anti join.
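
To make the memory point above concrete, here is a minimal, hypothetical sketch (plain Java collections, not Hive's vectorized operators) of why a hash set suffices for an anti join probe, while a left outer join needs a hash table carrying the value columns:

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AntiJoinSketch {
  // Anti join: emit left-side keys that have NO match on the right side.
  // Only the right-side keys are needed, so a HashSet is enough.
  static List<Long> antiJoin(List<Long> leftKeys, List<Long> rightKeys) {
    Set<Long> build = new HashSet<>(rightKeys);   // build side: distinct keys only
    List<Long> result = new ArrayList<>();
    for (Long key : leftKeys) {
      if (!build.contains(key)) {                 // keep rows with no match
        result.add(key);
      }
    }
    return result;
  }
}
{code}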



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=461406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461406
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 21/Jul/20 04:32
Start Date: 21/Jul/20 04:32
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r457829820



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -2162,7 +2162,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 "Whether Hive enables the optimization about converting common join 
into mapjoin based on the input file size. \n" +
 "If this parameter is on, and the sum of size for n-1 of the 
tables/partitions for a n-way join is smaller than the\n" +
 "specified size, the join is directly converted to a mapjoin (there is 
no conditional task)."),
-
+HIVE_CONVERT_ANTI_JOIN("hive.auto.convert.anti.join", false,

Review comment:
   Is there any reason why we should not enable this by default in master? 
It seems it is always beneficial to execute the anti join, since we already have 
a vectorized implementation too. That would also increase the test coverage for 
the feature.
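
For what it's worth, a hedged example of flipping the flag programmatically (the ConfVars constant name is taken from the quoted diff, not verified against the committed patch):

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

HiveConf conf = new HiveConf();
// enable the left-outer-join to anti-join conversion added by this patch
conf.setBoolVar(HiveConf.ConfVars.HIVE_CONVERT_ANTI_JOIN, true);
{code}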





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461406)
Time Spent: 1.5h  (was: 1h 20m)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query requiring an anti join is 
> converted to a left outer join, and a null filter on the right-side join key is 
> added to get the desired result. This causes:
>  # Extra computation — The left outer join projects redundant columns from the 
> right side, and filtering is then done to remove the redundant rows. This can 
> be avoided with an anti join, which projects only the required columns and 
> rows from the left-side table.
>  # Extra shuffle — With an anti join, the duplicate records moved to the join 
> node can be avoided at the child node. This can reduce a significant amount 
> of data movement if the number of distinct rows (join keys) is significant.
>  # Extra memory usage — For a map-based anti join, a hash set is sufficient, 
> as just the key is required to check whether a record matches the join 
> condition. For a left join, we need the key and the non-key columns as well, 
> so a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales  ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> The number of distinct ws_order_number values in the web_sales table in a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when we convert 
> this query to an anti join, only 600 million rows are moved to the join node 
> instead of 7 billion.
> In the current patch, just one conversion is done: the pattern 
> project->filter->left-join is converted to project->anti-join. This takes care 
> of subqueries with a “not exists” clause. Queries with “not exists” are first 
> converted to filter + left join and then converted to an anti join. Queries 
> with “not in” are not handled in the current patch.
> On the execution side, both merge join and map join with vectorized execution 
> are supported for anti join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23817) Pushing TopN Key operator PKFK inner joins

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23817?focusedWorklogId=461405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461405
 ]

ASF GitHub Bot logged work on HIVE-23817:
-

Author: ASF GitHub Bot
Created on: 21/Jul/20 04:26
Start Date: 21/Jul/20 04:26
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1228:
URL: https://github.com/apache/hive/pull/1228#discussion_r457824052



##
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g
##
@@ -471,6 +471,8 @@ Number
 (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)?
 ;
 
+PKFK_JOIN: 'PKFK_JOIN';

Review comment:
   Can we prefix this with `KW_` and move it above with the rest of the keywords?

##
File path: ql/src/test/queries/clientpositive/topnkey_inner_join.q
##
@@ -0,0 +1,50 @@
+drop table if exists customer;
+drop table if exists orders;
+
+create table customer (id int, name string, email string);
+create table orders (customer_id int not null enforced, amount int);
+
+alter table customer add constraint pk_customer_id primary key (id) disable 
novalidate rely;
+alter table orders add constraint fk_order_customer_id foreign key 
(customer_id) references customer(id) disable novalidate rely;
+
+insert into customer values
+  (4, 'Heisenberg', 'heisenb...@email.com'),
+  (3, 'Smith', 'sm...@email.com'),
+  (2, 'Jones', 'jo...@email.com'),
+  (1, 'Robinson', 'robin...@email.com');
+
+insert into orders values
+  (2, 200),
+  (3, 40),
+  (1, 100),
+  (1, 50),
+  (3, 30);
+
+set hive.optimize.topnkey=true;
+set hive.optimize.limittranspose=false;
+
+select 'positive: order by columns are coming from child table';
+-- FIXME: explain select * from customer join orders on customer.id = 
orders.customer_id order by customer.id limit 3;

Review comment:
   I see this example is below. Can the FIXME be removed?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java
##
@@ -442,6 +450,40 @@ private QueryBlockInfo convertSource(RelNode r) throws 
CalciteSemanticException
 return new QueryBlockInfo(s, ast);
   }
 
+  /**
+   * Add PK-FK join information to the AST as a query hint
+   * @param ast
+   * @param join
+   * @param swapSides whether the left and right input of the join is swapped
+   */
+  private void addPkFkInfoToAST(ASTNode ast, Join join, boolean swapSides) {
+List<RexNode> joinFilters = new ArrayList<>(RelOptUtil.conjunctions(join.getCondition()));
+RelMetadataQuery mq = join.getCluster().getMetadataQuery();
+HiveRelOptUtil.PKFKJoinInfo rightInputResult =
+HiveRelOptUtil.extractPKFKJoin(join, joinFilters, false, mq);
+HiveRelOptUtil.PKFKJoinInfo leftInputResult =
+HiveRelOptUtil.extractPKFKJoin(join, joinFilters, true, mq);
+// Add the fkJoinIndex (0=left, 1=right, if swapSides is false) to the AST
+// check if the nonFK side is filtered
+if (leftInputResult.isPkFkJoin && 
leftInputResult.additionalPredicates.isEmpty()) {
+  RelNode nonFkInput = join.getRight();
+  ast.addChild(pkFkHint(swapSides ? 1 : 0, 
HiveRelOptUtil.isRowFilteringPlan(mq, nonFkInput)));
+} else if (rightInputResult.isPkFkJoin && 
rightInputResult.additionalPredicates.isEmpty()) {
+  RelNode nonFkInput = join.getLeft();
+  ast.addChild(pkFkHint(swapSides ? 0 : 1, 
HiveRelOptUtil.isRowFilteringPlan(mq, nonFkInput)));
+}
+  }
+
+  private ASTNode pkFkHint(int fkTableIndex, boolean nonFkSideIsFiltered) {
+ParseDriver parseDriver = new ParseDriver();
+try {
+  return parseDriver.parseHint(String.format("PKFK_JOIN(%d, %s)",
+  fkTableIndex, nonFkSideIsFiltered ? NON_FK_FILTERED : 
"notFiltered"));

Review comment:
   Naming (NON_FK_FILTERED : "notFiltered") is a bit confusing. Can we 
simplify to NON_FK_FILTERED vs NON_FK_NOT_FILTERED? Create a String in the 
converter for both (or an enum).
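
A sketch of the suggested enum (the names are the reviewer's proposal, not committed code):

{code:java}
// Proposed constants for the second PKFK_JOIN hint argument.
enum NonFkFilterState {
  NON_FK_FILTERED,      // the non-FK side of the join is row-filtered
  NON_FK_NOT_FILTERED   // the non-FK side is not filtered
}
{code}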

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java
##
@@ -255,6 +260,88 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator 
topNKey) throws Semant
 }
   }
 
+  private void pushdownInnerJoin(TopNKeyOperator topNKey, int 
fkJoinInputIndex, boolean nonFkSideIsFiltered) throws SemanticException {
+TopNKeyDesc topNKeyDesc = topNKey.getConf();
+CommonJoinOperator<? extends JoinDesc> join =
+    (CommonJoinOperator<? extends JoinDesc>) topNKey.getParentOperators().get(0);
+List<Operator<? extends OperatorDesc>> joinInputs = join.getParentOperators();
+ReduceSinkOperator fkJoinInput = (ReduceSinkOperator) joinInputs.get(fkJoinInputIndex);
+if (nonFkSideIsFiltered) {
+  LOG.debug("Not pushing {} through {} as non FK side of the join is 
filtered", topNKey.getName(), join.getName());
+  return;
+}
+// Check column origins:
+//  1. If all OrderBy columns are coming from the child (FK) 

[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=461396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461396
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 21/Jul/20 03:37
Start Date: 21/Jul/20 03:37
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 commented on pull request #1281:
URL: https://github.com/apache/hive/pull/1281#issuecomment-66166


   @belugabehr, hi, could you find time to have a look at this new feature?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461396)
Time Spent: 3.5h  (was: 3h 20m)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This patch provides a way to get statistics about the metastore's underlying 
> datastore, such as MySQL, Oracle, and so on. You can get the number of 
> datastore reads and writes, the average transaction execution time, the total 
> number of active connections, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-20 Thread Chiran Ravani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161674#comment-17161674
 ] 

Chiran Ravani edited comment on HIVE-23873 at 7/21/20, 3:09 AM:


[~srahman] Thanks for shedding more light on this. I am now able to narrow the 
issue down further to CBO: the problem appears only when CBO is off. Below is 
the query run with CBO on for Derby.

{code}
20/07/20 19:49:18 [787073ad-c44e-4e04-8cfb-ba49e14602a0 main]: INFO 
dao.GenericJdbcDatabaseAccessor: Query to execute is [SELECT "IKEY", "bkey", 
"fkey", "dkey"
FROM "EXTERNAL_JDBC_SIMPLE_DERBY2_TABLE1"]
20/07/20 19:49:18 [Hive Hook Proto Log Writer 0]: INFO zlib.ZlibFactory: 
Successfully loaded & initialized native-zlib library
20/07/20 19:49:18 [Hive Hook Proto Log Writer 0]: INFO compress.CodecPool: Got 
brand-new compressor [.deflate]
20/07/20 19:49:18 [787073ad-c44e-4e04-8cfb-ba49e14602a0 main]: INFO 
jdbc.JdbcSerDe:  Blob data = {dkey=OW[class=class 
java.lang.Double,value=20.0], bkey=OW[class=class java.lang.Long,value=20], 
fkey=OW[class=class java.lang.Float,value=20.0], IKEY=OW[class=class 
java.lang.Integer,value=20]}
{code}

When CBO is turned off, the query picked is select * and the problem appears.

{code}
20/07/21 03:08:44 [ce531fd3-ca3f-4dc4-8681-3cf1233f27df main]: INFO 
jdbc.JdbcInputFormat: Creating 1 input split limit:-1, offset:0
20/07/21 03:08:44 [ce531fd3-ca3f-4dc4-8681-3cf1233f27df main]: INFO 
dao.GenericJdbcDatabaseAccessor: Query to execute is [select * from 
EXTERNAL_JDBC_SIMPLE_DERBY2_TABLE1]
Failed with exception java.io.IOException:java.lang.NullPointerException
20/07/21 03:08:44 [ce531fd3-ca3f-4dc4-8681-3cf1233f27df main]: ERROR CliDriver: 
Failed with exception java.io.IOException:java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
.
.
Caused by: java.lang.NullPointerException
at 
org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:235)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:619)
... 17 more
{code}

With Oracle, CBO fails and is skipped, causing the problem; below is my trace.
{code}
20/07/21 02:13:18 [Heartbeater-0]: INFO metastore.RetryingMetaStoreClient: 
RetryingMetaStoreClient proxy=class 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
ugi=h...@example.com (auth:KERBEROS) retries=24 delay=5 lifetime=0
20/07/21 02:14:31 [f68e0ab8-1585-4db5-a872-20790c5eb3dd main]: ERROR 
parse.CalcitePlanner: CBO failed, skipping CBO.
java.lang.IllegalArgumentException: Multiple entries with same key: 
APEX_ACTIVITY_LOG=JdbcTable {APEX_ACTIVITY_LOG} and APEX_ACTIVITY_LOG=JdbcTable 
{APEX_ACTIVITY_LOG}
at 
org.apache.hive.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:98)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:84)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:295)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.adapter.jdbc.JdbcSchema.computeTables(JdbcSchema.java:295) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.adapter.jdbc.JdbcSchema.getTableMap(JdbcSchema.java:351) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.adapter.jdbc.JdbcSchema.getTable(JdbcSchema.java:345) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genTableLogicalPlan(CalcitePlanner.java:3203)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5182)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1849)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1795)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) 

[jira] [Commented] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-20 Thread Chiran Ravani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161674#comment-17161674
 ] 

Chiran Ravani commented on HIVE-23873:
--

[~srahman] Thanks for shedding more light on this. I am now able to narrow the 
issue down further to CBO: the problem appears only when CBO is off. Below is 
the query run with CBO on for Derby.

{code}
20/07/20 19:49:18 [787073ad-c44e-4e04-8cfb-ba49e14602a0 main]: INFO 
dao.GenericJdbcDatabaseAccessor: Query to execute is [SELECT "IKEY", "bkey", 
"fkey", "dkey"
FROM "EXTERNAL_JDBC_SIMPLE_DERBY2_TABLE1"]
20/07/20 19:49:18 [Hive Hook Proto Log Writer 0]: INFO zlib.ZlibFactory: 
Successfully loaded & initialized native-zlib library
20/07/20 19:49:18 [Hive Hook Proto Log Writer 0]: INFO compress.CodecPool: Got 
brand-new compressor [.deflate]
20/07/20 19:49:18 [787073ad-c44e-4e04-8cfb-ba49e14602a0 main]: INFO 
jdbc.JdbcSerDe:  Blob data = {dkey=OW[class=class 
java.lang.Double,value=20.0], bkey=OW[class=class java.lang.Long,value=20], 
fkey=OW[class=class java.lang.Float,value=20.0], IKEY=OW[class=class 
java.lang.Integer,value=20]}
{code}

When CBO is turned off, the query picked is select * and the problem appears.

{code}
20/07/20 20:00:56 [bfdb833e-8366-451c-86ef-f38e2b002690 main]: INFO 
dao.GenericJdbcDatabaseAccessor: Query to execute is [select * from 
TESTHIVEJDBCSTORAGE]
20/07/20 20:00:56 [bfdb833e-8366-451c-86ef-f38e2b002690 main]: INFO 
jdbc.JdbcSerDe:  Blob data = {fname=OW[class=class 
java.lang.String,value=Name1], id=OW[class=class java.lang.Integer,value=1]}
{code}

With Oracle, CBO fails and is skipped, causing the problem; below is my trace.
{code}
20/07/21 02:13:18 [Heartbeater-0]: INFO metastore.RetryingMetaStoreClient: 
RetryingMetaStoreClient proxy=class 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
ugi=h...@example.com (auth:KERBEROS) retries=24 delay=5 lifetime=0
20/07/21 02:14:31 [f68e0ab8-1585-4db5-a872-20790c5eb3dd main]: ERROR 
parse.CalcitePlanner: CBO failed, skipping CBO.
java.lang.IllegalArgumentException: Multiple entries with same key: 
APEX_ACTIVITY_LOG=JdbcTable {APEX_ACTIVITY_LOG} and APEX_ACTIVITY_LOG=JdbcTable 
{APEX_ACTIVITY_LOG}
at 
org.apache.hive.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:98)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:84)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:295)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.adapter.jdbc.JdbcSchema.computeTables(JdbcSchema.java:295) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.adapter.jdbc.JdbcSchema.getTableMap(JdbcSchema.java:351) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.adapter.jdbc.JdbcSchema.getTable(JdbcSchema.java:345) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genTableLogicalPlan(CalcitePlanner.java:3203)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5182)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1849)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1795)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1556)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:541)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12460)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 

[jira] [Updated] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-20 Thread Chiran Ravani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chiran Ravani updated HIVE-23873:
-
Attachment: HIVE-23873.02.patch

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch
>
>
> The scenario is a Hive table having the same schema as a table in Oracle; 
> however, when we query the table with data, it fails with an NPE. Below is the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case, since in 
> Hive table and column names are forced to lowercase during creation; the user 
> then runs into an NPE while fetching data.
> While deserializing data, the input consists of column names in lower case, 
> so the lookup fails to get the value:
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
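
One possible direction, sketched here with hypothetical variable names (a sketch, not the actual patch): fall back to a case-insensitive lookup, since Hive lowercases column names while the JDBC row map may carry the datastore's upper-case names.

{code:java}
// Hypothetical sketch: try Hive's lower-case name first, then the upper-case form.
Object value = rowMap.get(columnName);
if (value == null) {
  value = rowMap.get(columnName.toUpperCase());
}
rowVal = (value != null) ? ((ObjectWritable) value).get() : null;
{code}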
> Log snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this case.
> =
> 1. Create table in Oracle
> {code}
> create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20));
> {code}
> 2. Insert dummy data.
> {code}
> Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1');
> {code}
> 3. Create JDBCStorageHandler table in Hive.
> {code}
> CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME 
> VARCHAR(20)) 
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' 
> TBLPROPERTIES ( 
> "hive.sql.database.type" = "ORACLE", 
> "hive.sql.jdbc.driver" = "oracle.jdbc.OracleDriver", 
> "hive.sql.jdbc.url" = "jdbc:oracle:thin:@orachehostname/XE", 
> "hive.sql.dbcp.username" = "chiran", 
> "hive.sql.dbcp.password" = "supersecurepassword", 
> "hive.sql.table" = "TESTHIVEJDBCSTORAGE", 
> "hive.sql.dbcp.maxActive" = "1" 

[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=461375&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461375
 ]

ASF GitHub Bot logged work on HIVE-23843:
-

Author: ASF GitHub Bot
Created on: 21/Jul/20 02:11
Start Date: 21/Jul/20 02:11
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on pull request #1250:
URL: https://github.com/apache/hive/pull/1250#issuecomment-661564023


   Simplified and revised the patch. Q67 @ 10 TB scale shows a good improvement 
(2050+ seconds --> 1500+ seconds) on an internal cluster.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461375)
Time Spent: 1h 10m  (was: 1h)

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> run into GC issues when multiple keys are involved in group-bys. It would be 
> good to provide an option for LRU-based eviction of 
> mapKeysAggregationBuffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=461374&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461374
 ]

ASF GitHub Bot logged work on HIVE-23843:
-

Author: ASF GitHub Bot
Created on: 21/Jul/20 02:07
Start Date: 21/Jul/20 02:07
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r457792912



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##
@@ -358,6 +360,34 @@ public void close(boolean aborted) throws HiveException {
  */
 private long numRowsCompareHashAggr;
 
+/**
+ * To track current memory usage.
+ */
+private long currMemUsed;
+
+/**
+ * Whether to make use of LRUCache for map aggr buffers or not.
+ */
+private boolean lruCache;
+
+class LRUCache extends LinkedHashMap<KeyWrapper, VectorAggregationBufferRow> {

Review comment:
   LinkedHashMap provides these semantics and already maintains the doubly 
linked list internally; it invokes removeEldestEntry as needed. However, I am 
planning to make it a lot simpler in the next iteration of the patch.
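
For reference, the standard LinkedHashMap-based LRU idiom being discussed (a generic sketch, not the patch itself):

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

class LruMap<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  LruMap(int maxEntries) {
    super(16, 0.75f, true);      // accessOrder=true: iteration order becomes LRU order
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries;  // called on each put; evicts the least recently used
  }
}
{code}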





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461374)
Time Spent: 1h  (was: 50m)

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> run into GC issues when multiple keys are involved in group-bys. It would be 
> good to provide an option for LRU-based eviction of 
> mapKeysAggregationBuffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23846) Avoid unnecessary serialization and deserialization of bitvectors

2020-07-20 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161655#comment-17161655
 ] 

Naveen Gangam commented on HIVE-23846:
--

I have +1'ed the fix on PR.

> Avoid unnecessary serialization and deserialization of bitvectors
> -
>
> Key: HIVE-23846
> URL: https://issues.apache.org/jira/browse/HIVE-23846
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In the method *getNdvEstimator* of *ColumnStatsDataInspector*, it calls 
> isSetBitVectors(), which serializes the bitvectors again even though we 
> already have the deserialized bitvectors in _ndvEstimator_. For example, we 
> can see this pattern in 
> [LongColumnStatsDataInspector|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/cache/LongColumnStatsDataInspector.java#L106].
> The method could check whether _ndvEstimator_ is set first, so that it won't 
> need to serialize and deserialize again.
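
A sketch of the proposed check ordering (method and field names are assumed from the cited inspector classes, not from a committed fix):

{code:java}
// Assumed sketch: consult the already-deserialized estimator first, so
// isSetBitVectors() does not re-serialize bitvectors that are already materialized.
public NumDistinctValueEstimator getNdvEstimator() {
  if (ndvEstimator != null) {
    return ndvEstimator;               // fast path: no serialization round-trip
  }
  if (isSetBitVectors()) {
    updateNdvEstimator();              // deserialize once and cache
  }
  return ndvEstimator;
}
{code}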



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23432) Add Ranger Replication Metrics

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23432?focusedWorklogId=461354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461354
 ]

ASF GitHub Bot logged work on HIVE-23432:
-

Author: ASF GitHub Bot
Created on: 21/Jul/20 00:33
Start Date: 21/Jul/20 00:33
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1015:
URL: https://github.com/apache/hive/pull/1015


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461354)
Time Spent: 20m  (was: 10m)

> Add Ranger Replication Metrics 
> ---
>
> Key: HIVE-23432
> URL: https://issues.apache.org/jira/browse/HIVE-23432
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23432.01.patch, HIVE-23432.02.patch, 
> HIVE-23432.03.patch, HIVE-23432.04.patch, HIVE-23432.05.patch, 
> HIVE-23432.06.patch, HIVE-23432.07.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?focusedWorklogId=461355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461355
 ]

ASF GitHub Bot logged work on HIVE-23438:
-

Author: ASF GitHub Bot
Created on: 21/Jul/20 00:33
Start Date: 21/Jul/20 00:33
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1014:
URL: https://github.com/apache/hive/pull/1014


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461355)
Time Spent: 20m  (was: 10m)

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23438.001.branch-2.3.patch, 
> HIVE-23438.branch-2.3.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT*
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> In src1 left join src2, src1 is the big table and src2 is the small table. The 
> join result between a big table row and the corresponding hashtable may be in 
> NO_MATCH state; however, these NO_MATCH rows are needed because of the LEFT 
> OUTER JOIN.
> In addition, these big table rows will not be spilled into the matchfile 
> related to this hashtable on disk, because only the SPILL state can use 
> `spillBigTableRow`. Instead, these big table rows are spilled into the 
> matchfile in the hashtables of table `srcpart` (the second small table).
> Finally, when reProcessBigTable runs, big table rows in the matchfile are read 
> only from `firstSmallTable`, so some data are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in 

[jira] [Work logged] (HIVE-23786) HMS Server side filter

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23786?focusedWorklogId=461314&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461314
 ]

ASF GitHub Bot logged work on HIVE-23786:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 22:54
Start Date: 20/Jul/20 22:54
Worklog Time Spent: 10m 
  Work Description: sam-an-cloudera commented on a change in pull request 
#1221:
URL: https://github.com/apache/hive/pull/1221#discussion_r457736948



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java
##
@@ -312,5 +578,13 @@ private String getCurrentUser() {
   private String getCurrentUser(HiveMetaStoreAuthorizableEvent 
authorizableEvent) {
 return authorizableEvent.getAuthzContext().getUGI().getShortUserName();
   }
+
+  private UserGroupInformation getUGI() {
+try {
+  return UserGroupInformation.getCurrentUser();
+} catch (IOException excp) {

Review comment:
   Good catch. I didn't know we were just copying and pasting. Other code 
using getCurrentUser was either crashing or rethrowing, so I've changed this 
one to throw and, more importantly, to return false to the skipAuthorization call.
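
A sketch of the change being described (the signature is assumed from the quoted diff):

{code:java}
private UserGroupInformation getUGI() throws IOException {
  // Rethrow instead of swallowing the IOException, so that callers can
  // fail closed (skipAuthorization() then evaluates to false on error).
  return UserGroupInformation.getCurrentUser();
}
{code}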





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461314)
Time Spent: 3.5h  (was: 3h 20m)

> HMS Server side filter
> --
>
> Key: HIVE-23786
> URL: https://issues.apache.org/jira/browse/HIVE-23786
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sam An
>Assignee: Sam An
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> HMS server side filter of results based on authorization. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-07-20 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161559#comment-17161559
 ] 

Stamatis Zampetakis commented on HIVE-23880:


Hi [~abstractdog], I was looking at this part of the code in the past, but I had 
the impression that the bitwise OR for the sizes you cite is on the order of a 
few seconds (2-3 sec), not on the order of minutes. Out of curiosity, how did 
you verify that 1-2 minutes are spent in the computation of the OR operation?

Apart from that, I'm curious about the benefit brought by parallelizing this 
computation. If I recall correctly, when I tried something similar for another 
scenario the improvement was rather subtle; context switching and cache misses, 
along with the extra code needed for the parallel version, counterbalanced the 
benefit. I think I have a micro-benchmark somewhere that I can adapt rather 
easily for this case.

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> At 10TB-30TB scale there is a chance that, out of 3-4 mins of query runtime, 
> 1-2 mins are spent merging bloom filters (Reducer 2), as in: 
> [^lipwig-output3605036885489193068.svg]
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 3 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 1 ..  llap SUCCEEDED   1263   126300  
>  0   0
> Reducer 2 llap   RUNNING  1  010  
>  0   0
> Map 4 llap   RUNNING   6154  0  207 5947  
>  0   0
> Reducer 5 llapINITED 43  00   43  
>  0   0
> Reducer 6 llapINITED  1  001  
>  0   0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in a bloom filter lead to 436,465,696 bits, so 
> merging 1263 bloom filters means running ~1263 * 436,465,696 bitwise OR 
> operations, which is a very hot codepath but can be parallelized.
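
A minimal sketch of the parallelization idea over plain long[] words (Hive's actual bloom filter layout and merge API may differ):

{code:java}
import java.util.List;
import java.util.stream.IntStream;

public class BloomMergeSketch {
  // OR-merge many equally sized bloom filters into 'target',
  // parallelizing across word indices instead of looping sequentially.
  static void mergeAll(long[] target, List<long[]> filters) {
    IntStream.range(0, target.length).parallel().forEach(i -> {
      long word = target[i];
      for (long[] filter : filters) {
        word |= filter[i];    // bitwise OR of the i-th word of every filter
      }
      target[i] = word;
    });
  }
}
{code}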



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-16490) Hive should not use private HDFS APIs for encryption

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16490?focusedWorklogId=461282&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461282
 ]

ASF GitHub Bot logged work on HIVE-16490:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 21:21
Start Date: 20/Jul/20 21:21
Worklog Time Spent: 10m 
  Work Description: umamaheswararao commented on pull request #1279:
URL: https://github.com/apache/hive/pull/1279#issuecomment-661339873


   Thanks @ashutoshc for the review! I have updated it. Please take a look. 
Thank you.
Thank you



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461282)
Time Spent: 40m  (was: 0.5h)

> Hive should not use private HDFS APIs for encryption
> 
>
> Key: HIVE-16490
> URL: https://issues.apache.org/jira/browse/HIVE-16490
> Project: Hive
>  Issue Type: Improvement
>  Components: Encryption
>Affects Versions: 2.2.0
>Reporter: Andrew Wang
>Assignee: Naveen Gangam
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When compiling against bleeding edge versions of Hive and Hadoop, we 
> discovered that HIVE-16047 references a private HDFS API, DFSClient, to get 
> at various encryption related information. The private API was recently 
> changed by HADOOP-14104, which broke Hive compilation.
> It'd be better to instead use publicly supported APIs. HDFS-11687 has been 
> filed to add whatever encryption APIs are needed by Hive. This JIRA is to 
> move Hive over to these new APIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23870) Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23870?focusedWorklogId=461259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461259
 ]

ASF GitHub Bot logged work on HIVE-23870:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 20:26
Start Date: 20/Jul/20 20:26
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1282:
URL: https://github.com/apache/hive/pull/1282#issuecomment-661314887


   https://github.com/apache/hadoop/pull/2157



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461259)
Time Spent: 40m  (was: 0.5h)

> Optimise multiple text conversions in 
> WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable
> ---
>
> Key: HIVE-23870
> URL: https://issues.apache.org/jira/browse/HIVE-23870
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2020-07-17-11-31-38-241.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Observed this when creating a materialized view.
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java#L85]
> The same content is converted to Text multiple times.
> !image-2020-07-17-11-31-38-241.png|width=1048,height=936!
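
Illustrative only (hypothetical names, not the Hive code path): the fix direction is to materialize the conversion once and reuse it.

{code:java}
import org.apache.hadoop.io.Text;

// Hypothetical sketch: convert the padded char value to Text a single time
// and hand the same instance to every consumer, instead of re-converting.
Text converted = new Text(paddedValue);   // one conversion
sinkA.write(converted);                   // both consumers reuse it
sinkB.write(converted);
{code}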



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-16490) Hive should not use private HDFS APIs for encryption

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16490?focusedWorklogId=461246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461246
 ]

ASF GitHub Bot logged work on HIVE-16490:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 20:06
Start Date: 20/Jul/20 20:06
Worklog Time Spent: 10m 
  Work Description: umamaheswararao commented on a change in pull request 
#1279:
URL: https://github.com/apache/hive/pull/1279#discussion_r457661391



##
File path: 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
##
@@ -1223,11 +1223,31 @@ public static boolean isHdfsEncryptionSupported() {
 private final Configuration conf;
 
 public HdfsEncryptionShim(URI uri, Configuration conf) throws IOException {
-  DistributedFileSystem dfs = (DistributedFileSystem)FileSystem.get(uri, 
conf);
-
   this.conf = conf;
-  this.keyProvider = dfs.getClient().getKeyProvider();
   this.hdfsAdmin = new HdfsAdmin(uri, conf);
+  this.keyProvider = getKeyProvider();
+}
+
+private KeyProvider getKeyProvider() throws IOException {
+  if (isMethodExist(HdfsAdmin.class, "getKeyProvider")) {

Review comment:
   That is great. I will remove it then. Thanks for pointing that out.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461246)
Time Spent: 0.5h  (was: 20m)

> Hive should not use private HDFS APIs for encryption
> 
>
> Key: HIVE-16490
> URL: https://issues.apache.org/jira/browse/HIVE-16490
> Project: Hive
>  Issue Type: Improvement
>  Components: Encryption
>Affects Versions: 2.2.0
>Reporter: Andrew Wang
>Assignee: Naveen Gangam
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When compiling against bleeding edge versions of Hive and Hadoop, we 
> discovered that HIVE-16047 references a private HDFS API, DFSClient, to get 
> at various encryption related information. The private API was recently 
> changed by HADOOP-14104, which broke Hive compilation.
> It'd be better to instead use publicly supported APIs. HDFS-11687 has been 
> filed to add whatever encryption APIs are needed by Hive. This JIRA is to 
> move Hive over to these new APIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-16490) Hive should not use private HDFS APIs for encryption

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16490?focusedWorklogId=461243&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461243
 ]

ASF GitHub Bot logged work on HIVE-16490:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 20:03
Start Date: 20/Jul/20 20:03
Worklog Time Spent: 10m 
  Work Description: ashutoshc commented on a change in pull request #1279:
URL: https://github.com/apache/hive/pull/1279#discussion_r457659517



##
File path: 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
##
@@ -1223,11 +1223,31 @@ public static boolean isHdfsEncryptionSupported() {
 private final Configuration conf;
 
 public HdfsEncryptionShim(URI uri, Configuration conf) throws IOException {
-  DistributedFileSystem dfs = (DistributedFileSystem)FileSystem.get(uri, 
conf);
-
   this.conf = conf;
-  this.keyProvider = dfs.getClient().getKeyProvider();
   this.hdfsAdmin = new HdfsAdmin(uri, conf);
+  this.keyProvider = getKeyProvider();
+}
+
+private KeyProvider getKeyProvider() throws IOException {
+  if (isMethodExist(HdfsAdmin.class, "getKeyProvider")) {

Review comment:
   this is not needed; the minimum Hadoop version required by Hive is 3.0.
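
Given that baseline, the reflection probe can collapse to a direct call (a sketch; HdfsAdmin.getKeyProvider() is the public API that HDFS-11687 added):

{code:java}
private KeyProvider getKeyProvider() throws IOException {
  // Hadoop >= 3.0 exposes this publicly, so no reflective method check is needed.
  return hdfsAdmin.getKeyProvider();
}
{code}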





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461243)
Time Spent: 20m  (was: 10m)

> Hive should not use private HDFS APIs for encryption
> 
>
> Key: HIVE-16490
> URL: https://issues.apache.org/jira/browse/HIVE-16490
> Project: Hive
>  Issue Type: Improvement
>  Components: Encryption
>Affects Versions: 2.2.0
>Reporter: Andrew Wang
>Assignee: Naveen Gangam
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When compiling against bleeding edge versions of Hive and Hadoop, we 
> discovered that HIVE-16047 references a private HDFS API, DFSClient, to get 
> at various encryption related information. The private API was recently 
> changed by HADOOP-14104, which broke Hive compilation.
> It'd be better to instead use publicly supported APIs. HDFS-11687 has been 
> filed to add whatever encryption APIs are needed by Hive. This JIRA is to 
> move Hive over to these new APIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23882:
--
Labels: pull-request-available  (was: )

> Compiler should skip MJ keyExpr for probe optimization
> --
>
> Key: HIVE-23882
> URL: https://issues.apache.org/jira/browse/HIVE-23882
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In probe we cannot currently support key expressions (on the big table side), 
> as the ORC ColumnVectors probe the small-table hashtable directly (there is no 
> expression evaluation at that level).
> TezCompiler should take this into account when picking MJs to push probe 
> details to.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23882?focusedWorklogId=461233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461233
 ]

ASF GitHub Bot logged work on HIVE-23882:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 19:30
Start Date: 20/Jul/20 19:30
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #1286:
URL: https://github.com/apache/hive/pull/1286


   Change-Id: I1033a65f26592ef3683b7aa0a669e0c378667278
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461233)
Remaining Estimate: 0h
Time Spent: 10m

> Compiler should skip MJ keyExpr for probe optimization
> --
>
> Key: HIVE-23882
> URL: https://issues.apache.org/jira/browse/HIVE-23882
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In probe we cannot currently support key expressions (on the big table side), 
> as the ORC ColumnVectors probe the small-table hashtable directly (there is no 
> expression evaluation at that level).
> TezCompiler should take this into account when picking MJs to push probe 
> details to.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23885) Remove Hive on Spark

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23885:
--
Labels: pull-request-available  (was: )

> Remove Hive on Spark
> 
>
> Key: HIVE-23885
> URL: https://issues.apache.org/jira/browse/HIVE-23885
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23885) Remove Hive on Spark

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23885?focusedWorklogId=461225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461225
 ]

ASF GitHub Bot logged work on HIVE-23885:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 19:02
Start Date: 20/Jul/20 19:02
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1285:
URL: https://github.com/apache/hive/pull/1285


   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461225)
Remaining Estimate: 0h
Time Spent: 10m

> Remove Hive on Spark
> 
>
> Key: HIVE-23885
> URL: https://issues.apache.org/jira/browse/HIVE-23885
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23885) Remove Hive on Spark

2020-07-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23885:
-


> Remove Hive on Spark
> 
>
> Key: HIVE-23885
> URL: https://issues.apache.org/jira/browse/HIVE-23885
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23870) Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23870?focusedWorklogId=461217&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461217
 ]

ASF GitHub Bot logged work on HIVE-23870:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 18:36
Start Date: 20/Jul/20 18:36
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1282:
URL: https://github.com/apache/hive/pull/1282#issuecomment-661261016


   This is an interesting observation.  However, maybe this should be 
contributed to the Hadoop project directly.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461217)
Time Spent: 0.5h  (was: 20m)

> Optimise multiple text conversions in 
> WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable
> ---
>
> Key: HIVE-23870
> URL: https://issues.apache.org/jira/browse/HIVE-23870
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2020-07-17-11-31-38-241.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Observed this when creating a materialized view.
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java#L85]
> Same content is converted to Text multiple times.
> !image-2020-07-17-11-31-38-241.png|width=1048,height=936!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23870) Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23870?focusedWorklogId=461211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461211
 ]

ASF GitHub Bot logged work on HIVE-23870:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 18:29
Start Date: 20/Jul/20 18:29
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1282:
URL: https://github.com/apache/hive/pull/1282#issuecomment-661258825


   So, this caches the charLength ?  That is the fix here?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461211)
Time Spent: 20m  (was: 10m)

> Optimise multiple text conversions in 
> WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable
> ---
>
> Key: HIVE-23870
> URL: https://issues.apache.org/jira/browse/HIVE-23870
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2020-07-17-11-31-38-241.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Observed this when creating a materialized view.
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java#L85]
> Same content is converted to Text multiple times.
> !image-2020-07-17-11-31-38-241.png|width=1048,height=936!
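
A minimal sketch of the memoization idea under discussion, with illustrative field and method names rather than HiveCharWritable's actual internals:

{code:java}
import org.apache.hadoop.io.Text;

// Convert once, reuse until the backing value changes.
final class CachedCharValue {
  private final Text value = new Text();
  private Text stripped;           // cached derived form
  private int cachedForLength = -1;

  void set(String s) {
    value.set(s);
    stripped = null;               // invalidate the cache on mutation
  }

  Text getStripped(int maxLength) {
    if (stripped == null || cachedForLength != maxLength) {
      String s = value.toString();
      stripped = new Text(s.length() > maxLength ? s.substring(0, maxLength) : s);
      cachedForLength = maxLength;
    }
    return stripped;
  }
}
{code}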



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23871) ObjectStore should properly handle MicroManaged Table properties

2020-07-20 Thread Ashutosh Chauhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-23871.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks, Panos!

> ObjectStore should properly handle MicroManaged Table properties
> 
>
> Key: HIVE-23871
> URL: https://issues.apache.org/jira/browse/HIVE-23871
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: table1
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> HIVE-23281 optimizes StorageDescriptor conversion as part of the ObjectStore 
> by skipping particular Table properties like SkewInfo, bucketCols, ordering 
> etc.
>  However, it does that for all Transactional Tables – not only ACID – causing 
> MicroManaged Tables to behave abnormally.
>  MicroManaged (insert_only) tables may be missing needed properties such as Storage 
> Desc Params – which may define how lines are delimited (as in the example 
> below):
> To repro the issue:
> {code:java}
> CREATE TRANSACTIONAL TABLE delim_table_trans(id INT, name STRING, safety INT) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
> LOAD DATA INPATH 'table1' OVERWRITE INTO TABLE delim_table_trans;
> describe formatted delim_table_trans;
> SELECT * FROM delim_table_trans;
> {code}
> Result:
> {code:java}
> Table Type:   MANAGED_TABLE
> Table Parameters:  
>   bucketing_version   2   
>   numFiles1   
>   numRows 0   
>   rawDataSize 0   
>   totalSize   72  
>   transactional   true
>   transactional_propertiesinsert_only 
>  A masked pattern was here 
>
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
>  
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat 
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
> Compressed:   No   
> Num Buckets:  -1   
> Bucket Columns:   []   
> Sort Columns: []   
> PREHOOK: query: SELECT * FROM delim_table_trans
> PREHOOK: type: QUERY
> PREHOOK: Input: default@delim_table_trans
>  A masked pattern was here 
> POSTHOOK: query: SELECT * FROM delim_table_trans
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@delim_table_trans
>  A masked pattern was here 
> NULL  NULLNULL
> NULL  NULLNULL
> NULL  NULLNULL
> NULL  NULLNULL
> NULL  NULLNULL
> NULL  NULLNULL
>  {code}
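
A minimal sketch of the distinction the fix needs to draw, using only the table parameters shown above (the class and method names are illustrative):

{code:java}
import java.util.Map;

public final class TxnTableKind {
  // The ObjectStore shortcut is only safe for full-ACID tables; micro-managed
  // (insert_only) tables still need their StorageDescriptor/SerDe parameters.
  public static boolean isFullAcid(Map<String, String> tableParams) {
    boolean transactional =
        "true".equalsIgnoreCase(tableParams.get("transactional"));
    boolean insertOnly =
        "insert_only".equalsIgnoreCase(tableParams.get("transactional_properties"));
    return transactional && !insertOnly;
  }
}
{code}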



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23707) Unable to create materialized views with transactions enabled with MySQL metastore

2020-07-20 Thread wenjun ma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161435#comment-17161435
 ] 

wenjun ma commented on HIVE-23707:
--

I tried MS SQL and it works following your reproduction steps.

> Unable to create materialized views with transactions enabled with MySQL 
> metastore
> --
>
> Key: HIVE-23707
> URL: https://issues.apache.org/jira/browse/HIVE-23707
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2
>Reporter: Dustin Koupal
>Assignee: wenjun ma
>Priority: Blocker
>
> When attempting to create a materialized view with transactions enabled, we 
> get the following exception:
>  
> {code:java}
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed to 
> generate new Mapping of type 
> org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type 
> CLOB declared for field 
> "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java 
> type java.lang.String cant be mapped for this datastore.ERROR : FAILED: 
> Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:Failed to generate new Mapping of type 
> org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type 
> CLOB declared for field 
> "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java 
> type java.lang.String cant be mapped for this datastore.JDBC type CLOB 
> declared for field 
> "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java 
> type java.lang.String cant be mapped for this 
> datastore.org.datanucleus.exceptions.NucleusException: JDBC type CLOB 
> declared for field 
> "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java 
> type java.lang.String cant be mapped for this datastore. at 
> org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1386)
>  at 
> org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1616)
>  at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.prepareDatastoreMapping(SingleFieldMapping.java:59)
>  at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.initialize(SingleFieldMapping.java:48)
>  at 
> org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getMapping(RDBMSMappingManager.java:482)
>  at 
> org.datanucleus.store.rdbms.table.ClassTable.manageMembers(ClassTable.java:536)
>  at 
> org.datanucleus.store.rdbms.table.ClassTable.manageClass(ClassTable.java:442) 
> at 
> org.datanucleus.store.rdbms.table.ClassTable.initializeForClass(ClassTable.java:1270)
>  at 
> org.datanucleus.store.rdbms.table.ClassTable.initialize(ClassTable.java:276) 
> at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3279)
>  at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2889)
>  at 
> org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
>  at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
>  at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
>  at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2088)
>  at 
> org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1271)
>  at 
> org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3760)
>  at 
> org.datanucleus.state.StateManagerImpl.setIdentity(StateManagerImpl.java:2267)
>  at 
> org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:484)
>  at 
> org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:120)
>  at 
> org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:218)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2079)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1923)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1778)
>  at 
> org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217)
>  at 
> org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:724)
>  at 
> org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:749)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:1308)
>  at 

[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=461160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461160
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 16:37
Start Date: 20/Jul/20 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1147:
URL: https://github.com/apache/hive/pull/1147#discussion_r457545081



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinWithFilterToAntiJoinRule.java
##
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.util.ImmutableBitSet;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Planner rule that converts a join plus filter to anti join.
+ */
+public class HiveJoinWithFilterToAntiJoinRule extends RelOptRule {
+  protected static final Logger LOG = 
LoggerFactory.getLogger(HiveJoinWithFilterToAntiJoinRule.class);
+  public static final HiveJoinWithFilterToAntiJoinRule INSTANCE = new 
HiveJoinWithFilterToAntiJoinRule();
+
+  //HiveProject(fld=[$0])
+  //  HiveFilter(condition=[IS NULL($1)])
+  //HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], 
cost=[not available])
+  //
+  // TO
+  //
+  //HiveProject(fld_tbl=[$0])
+  //  HiveAntiJoin(condition=[=($0, $1)], joinType=[anti])
+  //
+  public HiveJoinWithFilterToAntiJoinRule() {
+super(operand(Project.class, operand(Filter.class, operand(Join.class, 
RelOptRule.any(,
+"HiveJoinWithFilterToAntiJoinRule:filter");
+  }
+
+  // is null filter over a left join.
+  public void onMatch(final RelOptRuleCall call) {
+final Project project = call.rel(0);
+final Filter filter = call.rel(1);
+final Join join = call.rel(2);
+perform(call, project, filter, join);
+  }
+
+  protected void perform(RelOptRuleCall call, Project project, Filter filter, 
Join join) {
+LOG.debug("Matched HiveAntiJoinRule");

Review comment:
   sure ..will do that 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461160)
Time Spent: 1h 20m  (was: 1h 10m)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. The query for an anti join is 
> converted to a left outer join, and a null filter on the right-side join key is 
> added to get the desired result. This causes:
>  # Extra computation — The left outer join projects the redundant columns 
> from the right side. Along with that, filtering is done to remove the redundant 
> rows. This can be avoided with anti join, as anti join will project 
> only the required columns and rows from 

[jira] [Work logged] (HIVE-23716) Support Anti Join in Hive

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?focusedWorklogId=461139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461139
 ]

ASF GitHub Bot logged work on HIVE-23716:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 15:51
Start Date: 20/Jul/20 15:51
Worklog Time Spent: 10m 
  Work Description: ramesh0201 commented on pull request #1147:
URL: https://github.com/apache/hive/pull/1147#issuecomment-661124391


   Runtime changes look good to me +1.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461139)
Time Spent: 1h 10m  (was: 1h)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. The query for an anti join is 
> converted to a left outer join, and a null filter on the right-side join key is 
> added to get the desired result. This causes:
>  # Extra computation — The left outer join projects the redundant columns 
> from the right side. Along with that, filtering is done to remove the redundant 
> rows. This can be avoided with anti join, as anti join will project 
> only the required columns and rows from the left side table.
>  # Extra shuffle — With anti join, the duplicate records moved to the join 
> node can be avoided at the child node. This can reduce a significant amount 
> of data movement if the number of distinct rows (join keys) is significant.
>  # Extra memory usage — With a map-based anti join, a hash set is 
> sufficient, as just the key is required to check if a record matches the 
> join condition. With a left join, we need the key and the non-key columns 
> as well, and thus a hash table is required.
> For a query like
> {code:java}
>  select wr_order_number FROM web_returns LEFT JOIN web_sales  ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> The number of distinct ws_order_number values in the web_sales table in a 
> typical 10TB TPCDS setup is just 10% of the total records. So when we convert 
> this query to anti join, instead of 7 billion rows, only 600 million rows are 
> moved to the join node.
> In the current patch, just one conversion is done. The pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause. Queries with “not exists” 
> are converted first to filter + left-join, and that is then converted to anti 
> join. Queries with “not in” are not handled in the current patch.
> From the execution side, both merge join and map join with vectorized 
> execution are supported for anti join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23673) Maven Standard Directories for accumulo-handler

2020-07-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-23673.
---
Resolution: Won't Fix

Actually, quite a few projects are like this.  This needs to be part of a 
bigger discussion.

> Maven Standard Directories for accumulo-handler
> ---
>
> Key: HIVE-23673
> URL: https://issues.apache.org/jira/browse/HIVE-23673
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23673) Maven Standard Directories for accumulo-handler

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23673?focusedWorklogId=461133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461133
 ]

ASF GitHub Bot logged work on HIVE-23673:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 15:42
Start Date: 20/Jul/20 15:42
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1088:
URL: https://github.com/apache/hive/pull/1088


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461133)
Time Spent: 1h 50m  (was: 1h 40m)

> Maven Standard Directories for accumulo-handler
> ---
>
> Key: HIVE-23673
> URL: https://issues.apache.org/jira/browse/HIVE-23673
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=461130&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461130
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 15:31
Start Date: 20/Jul/20 15:31
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r457496282



##
File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java
##
@@ -527,20 +527,19 @@ public void add(ElementWrapper wrapper) {
 @Override
 public void run() {
   while (!executor.isTerminated() && !queue.isEmpty()) {

Review comment:
   A bit unrelated, but since you're touching this code: this check is 
completely useless:
   
   ```
   while (!executor.isTerminated() && !queue.isEmpty()) {
 ...
   }
   ```
   
   I cannot think of many scenarios where a thread needs to check the state 
of its own `ExecutorService`.  If the `ExecutorService` is terminated, it will 
interrupt every thread in the pool, and that should cause the thread to stop 
running.  Also, checking whether the `Queue` is empty is improper.  Two threads 
can check the state of the Queue (size = 1), see the same non-empty queue, and 
both try to read, even if there is only one item left.  Both should just try to 
`take`; one will succeed and the other will not.

##
File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java
##
@@ -527,20 +527,19 @@ public void add(ElementWrapper wrapper) {
 @Override
 public void run() {
   while (!executor.isTerminated() && !queue.isEmpty()) {
-ElementWrapper currentBf = queue.poll();
+ElementWrapper currentBf = null;
+try {
+  currentBf = queue.take();
+} catch (InterruptedException e) {

Review comment:
   Do not ignore.  An Interrupt means that it's time to exit.
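
   A minimal sketch of the worker-loop shape these two comments point at (names are illustrative, not the patch's):
   
   ```java
   import java.util.concurrent.BlockingQueue;
   
   // Block on take() instead of polling isTerminated()/isEmpty(), and treat
   // interruption as the shutdown signal.
   final class MergeLoop implements Runnable {
     private final BlockingQueue<byte[]> queue;
   
     MergeLoop(BlockingQueue<byte[]> queue) {
       this.queue = queue;
     }
   
     @Override
     public void run() {
       while (true) {
         final byte[] chunk;
         try {
           chunk = queue.take(); // blocks; no isEmpty() check-then-act race
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt(); // restore the flag and exit
           return;
         }
         merge(chunk);
       }
     }
   
     private void merge(byte[] chunk) {
       // bitwise-OR the chunk into the shared aggregation buffer (omitted)
     }
   }
   ```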

##
File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java
##
@@ -506,18 +505,19 @@ public ElementWrapper(byte[] bytes, int start, int 
length, int modifiedStart, in
   }
 
   private static class BloomFilterMergeWorker implements Runnable {
-Queue queue = new LinkedBlockingDeque<>();
+ArrayBlockingQueue queue;

Review comment:
   Use the generic `BlockingQueue` here.

##
File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java
##
@@ -506,18 +505,19 @@ public ElementWrapper(byte[] bytes, int start, int 
length, int modifiedStart, in
   }
 
   private static class BloomFilterMergeWorker implements Runnable {
-Queue queue = new LinkedBlockingDeque<>();
+ArrayBlockingQueue queue;
 private ExecutorService executor;
 
 private byte[] bfAggregation;
 private int bfAggregationStart;
 private int bfAggregationLength;
 
-public BloomFilterMergeWorker(ExecutorService executor, byte[] 
bfAggregation, int bfAggregationStart, int bfAggregationLength) {
+public BloomFilterMergeWorker(ExecutorService executor, int batchSize, 
byte[] bfAggregation, int bfAggregationStart, int bfAggregationLength) {
   this.executor = executor;

Review comment:
   Do not capture this value.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461130)
Time Spent: 40m  (was: 0.5h)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> the case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> 

[jira] [Commented] (HIVE-23707) Unable to create materialized views with transactions enabled with MySQL metastore

2020-07-20 Thread Dustin Koupal (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161321#comment-17161321
 ] 

Dustin Koupal commented on HIVE-23707:
--

Thanks for checking.  To confirm, did you try with MySQL or MS SQL?  We ran 
into this issue with the MySQL server included in EMR.

> Unable to create materialized views with transactions enabled with MySQL 
> metastore
> --
>
> Key: HIVE-23707
> URL: https://issues.apache.org/jira/browse/HIVE-23707
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2
>Reporter: Dustin Koupal
>Assignee: wenjun ma
>Priority: Blocker
>
> When attempting to create a materialized view with transactions enabled, we 
> get the following exception:
>  
> {code:java}
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed to 
> generate new Mapping of type 
> org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type 
> CLOB declared for field 
> "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java 
> type java.lang.String cant be mapped for this datastore.ERROR : FAILED: 
> Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:Failed to generate new Mapping of type 
> org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type 
> CLOB declared for field 
> "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java 
> type java.lang.String cant be mapped for this datastore.JDBC type CLOB 
> declared for field 
> "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java 
> type java.lang.String cant be mapped for this 
> datastore.org.datanucleus.exceptions.NucleusException: JDBC type CLOB 
> declared for field 
> "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java 
> type java.lang.String cant be mapped for this datastore. at 
> org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1386)
>  at 
> org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1616)
>  at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.prepareDatastoreMapping(SingleFieldMapping.java:59)
>  at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.initialize(SingleFieldMapping.java:48)
>  at 
> org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getMapping(RDBMSMappingManager.java:482)
>  at 
> org.datanucleus.store.rdbms.table.ClassTable.manageMembers(ClassTable.java:536)
>  at 
> org.datanucleus.store.rdbms.table.ClassTable.manageClass(ClassTable.java:442) 
> at 
> org.datanucleus.store.rdbms.table.ClassTable.initializeForClass(ClassTable.java:1270)
>  at 
> org.datanucleus.store.rdbms.table.ClassTable.initialize(ClassTable.java:276) 
> at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3279)
>  at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2889)
>  at 
> org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
>  at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
>  at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
>  at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2088)
>  at 
> org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1271)
>  at 
> org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3760)
>  at 
> org.datanucleus.state.StateManagerImpl.setIdentity(StateManagerImpl.java:2267)
>  at 
> org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:484)
>  at 
> org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:120)
>  at 
> org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:218)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2079)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1923)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1778)
>  at 
> org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217)
>  at 
> org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:724)
>  at 
> org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:749)
>  at 
> 

[jira] [Work logged] (HIVE-23865) Use More Java Collections Class

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23865?focusedWorklogId=461112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461112
 ]

ASF GitHub Bot logged work on HIVE-23865:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 14:42
Start Date: 20/Jul/20 14:42
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1267:
URL: https://github.com/apache/hive/pull/1267


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461112)
Time Spent: 50m  (was: 40m)

> Use More Java Collections Class
> ---
>
> Key: HIVE-23865
> URL: https://issues.apache.org/jira/browse/HIVE-23865
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23865) Use More Java Collections Class

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23865?focusedWorklogId=461109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461109
 ]

ASF GitHub Bot logged work on HIVE-23865:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 14:38
Start Date: 20/Jul/20 14:38
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1267:
URL: https://github.com/apache/hive/pull/1267


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461109)
Time Spent: 40m  (was: 0.5h)

> Use More Java Collections Class
> ---
>
> Key: HIVE-23865
> URL: https://issues.apache.org/jira/browse/HIVE-23865
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23875) Add VSCode files to gitignore

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23875?focusedWorklogId=461086&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461086
 ]

ASF GitHub Bot logged work on HIVE-23875:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 14:10
Start Date: 20/Jul/20 14:10
Worklog Time Spent: 10m 
  Work Description: HunterL opened a new pull request #1276:
URL: https://github.com/apache/hive/pull/1276


   Added VSCode files to gitignore



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461086)
Time Spent: 0.5h  (was: 20m)

> Add VSCode files to gitignore
> -
>
> Key: HIVE-23875
> URL: https://issues.apache.org/jira/browse/HIVE-23875
> Project: Hive
>  Issue Type: Improvement
>Reporter: Hunter Logan
>Assignee: Hunter Logan
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> gitignore currently includes Eclipse and Intellij specific files, should 
> include VSCode as well.
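
For reference, the entries such a change would typically add (exact patterns are an assumption, not the patch's contents):

{code}
.vscode/
*.code-workspace
{code}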



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23875) Add VSCode files to gitignore

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23875?focusedWorklogId=461080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461080
 ]

ASF GitHub Bot logged work on HIVE-23875:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 13:58
Start Date: 20/Jul/20 13:58
Worklog Time Spent: 10m 
  Work Description: HunterL closed pull request #1276:
URL: https://github.com/apache/hive/pull/1276


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461080)
Time Spent: 20m  (was: 10m)

> Add VSCode files to gitignore
> -
>
> Key: HIVE-23875
> URL: https://issues.apache.org/jira/browse/HIVE-23875
> Project: Hive
>  Issue Type: Improvement
>Reporter: Hunter Logan
>Assignee: Hunter Logan
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> gitignore currently includes Eclipse and Intellij specific files, should 
> include VSCode as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=461077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461077
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 13:51
Start Date: 20/Jul/20 13:51
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r457402888



##
File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java
##
@@ -362,16 +379,178 @@ public static void mergeBloomFilterBytes(
 
 // Just bitwise-OR the bits together - size/# functions should be the same,
 // rest of the data is serialized long values for the bitset which are 
supposed to be bitwise-ORed.
-for (int idx = START_OF_SERIALIZED_LONGS; idx < bf1Length; ++idx) {
+for (int idx = mergeStart; idx < mergeEnd; ++idx) {
   bf1Bytes[bf1Start + idx] |= bf2Bytes[bf2Start + idx];
 }
   }
 
+  public static void mergeBloomFilterBytesFromInputColumn(
+  byte[] bf1Bytes, int bf1Start, int bf1Length, long bf1ExpectedEntries,
+  BytesColumnVector inputColumn, int batchSize, boolean selectedInUse, 
int[] selected, int numThreads) {
+if (numThreads == 0) {
+  numThreads = Runtime.getRuntime().availableProcessors();
+}
+if (numThreads < 0) {
+  throw new RuntimeException("invalid number of threads: " + numThreads);
+}
+
+ExecutorService executor = Executors.newFixedThreadPool(numThreads);
+
+BloomFilterMergeWorker[] workers = new BloomFilterMergeWorker[numThreads];
+for (int f = 0; f < numThreads; f++) {
+  workers[f] = new BloomFilterMergeWorker(executor, bf1Bytes, bf1Start, 
bf1Length);
+}
+
+// split every bloom filter (represented by a part of a byte[]) across 
workers
+for (int j = 0; j < batchSize; j++) {
+  if (!selectedInUse && inputColumn.noNulls) {
+splitVectorAcrossWorkers(workers, inputColumn.vector[j], 
inputColumn.start[j],
+inputColumn.length[j]);
+  } else if (!selectedInUse) {
+if (!inputColumn.isNull[j]) {
+  splitVectorAcrossWorkers(workers, inputColumn.vector[j], 
inputColumn.start[j],
+  inputColumn.length[j]);
+}
+  } else if (inputColumn.noNulls) {
+int i = selected[j];
+splitVectorAcrossWorkers(workers, inputColumn.vector[i], 
inputColumn.start[i],
+inputColumn.length[i]);
+  } else {
+int i = selected[j];
+if (!inputColumn.isNull[i]) {
+  splitVectorAcrossWorkers(workers, inputColumn.vector[i], 
inputColumn.start[i],
+  inputColumn.length[i]);
+}
+  }
+}
+
+for (int f = 0; f < numThreads; f++) {
+  executor.submit(workers[f]);
+}
+
+executor.shutdown();
+try {
+  executor.awaitTermination(3600, TimeUnit.SECONDS);
+} catch (InterruptedException e) {
+  throw new RuntimeException(e);
+}
+  }
+
+  private static void splitVectorAcrossWorkers(BloomFilterMergeWorker[] 
workers, byte[] bytes,
+  int start, int length) {
+if (bytes == null || length == 0) {
+  return;
+}
+/*
+ * This will split a byte[] across workers as below:
+ * let's say there are 10 workers for 7813 bytes, in this case
+ * length: 7813, elementPerBatch: 781
+ * bytes assigned to workers: inclusive lower bound, exclusive upper bound
+ * 1. worker: 5 -> 786
+ * 2. worker: 786 -> 1567
+ * 3. worker: 1567 -> 2348
+ * 4. worker: 2348 -> 3129
+ * 5. worker: 3129 -> 3910
+ * 6. worker: 3910 -> 4691
+ * 7. worker: 4691 -> 5472
+ * 8. worker: 5472 -> 6253
+ * 9. worker: 6253 -> 7034
+ * 10. worker: 7034 -> 7813 (last element per batch is: 779)
+ *
+ * This way, a particular worker will be given with the same part
+ * of all bloom filters along with the shared base bloom filter,
+ * so the bitwise OR function will not be a subject of threading/sync 
issues.
+ */
+int elementPerBatch =
+(int) Math.ceil((double) (length - START_OF_SERIALIZED_LONGS) / 
workers.length);
+
+for (int w = 0; w < workers.length; w++) {
+  int modifiedStart = START_OF_SERIALIZED_LONGS + w * elementPerBatch;
+  int modifiedLength = (w == workers.length - 1)
+? length - (START_OF_SERIALIZED_LONGS + w * elementPerBatch) : 
elementPerBatch;
+
+  ElementWrapper wrapper =
+  new ElementWrapper(bytes, start, length, modifiedStart, 
modifiedLength);
+  workers[w].add(wrapper);
+}
+  }
+
+  public static byte[] getInitialBytes(long expectedEntries) {
+ByteArrayOutputStream bytesOut = null;
+try {
+  bytesOut = new ByteArrayOutputStream();
+  BloomKFilter bf = new BloomKFilter(expectedEntries);
+  BloomKFilter.serialize(bytesOut, bf);
+  return 

[jira] [Commented] (HIVE-23883) Streaming does not flush the side file

2020-07-20 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161246#comment-17161246
 ] 

Peter Vary commented on HIVE-23883:
---

CC: [~kuczoram], [~klcopp]

> Streaming does not flush the side file
> --
>
> Key: HIVE-23883
> URL: https://issues.apache.org/jira/browse/HIVE-23883
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Reporter: Peter Vary
>Priority: Major
>
> When a streaming write commits a mid-batch write with 
> {{connection.commitTransaction()}}, it tries to flush the sideFile with 
> {{OrcInputFormat.SHIMS.hflush(flushLengths)}}. This uses 
> FSOutputSummer.flush, which does not flush the buffered data to disk, so the 
> actual data is not written.
> The check had to be removed from the end of the streaming tests in 
> {{TestCrudCompactorOnTez.java}}:
> {code:java}
>   CompactorTestUtilities.checkAcidVersion(fs.listFiles(new 
> Path(table.getSd().getLocation()), true), fs,
>   conf.getBoolVar(HiveConf.ConfVars.HIVE_WRITE_ACID_VERSION_FILE),
>   new String[] { AcidUtils.DELTA_PREFIX });
> {code}
> These checks verify the {{_flush_length}} files, and they would fail 
> otherwise.
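
A minimal standalone illustration of the flush/hflush/hsync distinction at issue (the path is a placeholder):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class FlushSemantics {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataOutputStream out = fs.create(new Path("/tmp/_flush_length"))) {
      out.writeLong(42L);
      out.flush();  // may leave bytes in FSOutputSummer's checksum buffer
      out.hflush(); // pushes data to the pipeline; new readers can see it
      out.hsync();  // additionally forces the data to durable storage
    }
  }
}
{code}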



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23836) Make "cols" dependent so that it cascade deletes

2020-07-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-23836.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master! Thanks.

> Make "cols" dependent so that it cascade deletes
> 
>
> Key: HIVE-23836
> URL: https://issues.apache.org/jira/browse/HIVE-23836
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {quote}
> If you want the deletion of a persistent object to cause the deletion of 
> related objects then you need to mark the related fields in the mapping to be 
> "dependent".
> {quote}
> http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields
> http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object
> The database won't do it:
> {code:sql|title=Derby Schema}
> ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY 
> ("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO 
> ACTION;
> {code}
> https://github.com/apache/hive/blob/65cf6957cf9432277a096f91b40985237274579f/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L452
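
Hive's metastore model is mapped via package.jdo XML rather than annotations, but the equivalent JDO annotation form illustrates the idea (class and field names here are illustrative):

{code:java}
import java.util.List;
import javax.jdo.annotations.Element;
import javax.jdo.annotations.PersistenceCapable;

// Marking the collection elements "dependent" makes JDO cascade-delete
// them when the owning object is deleted.
@PersistenceCapable
class MColumnDescriptorSketch {
  @Element(dependent = "true")
  List<MFieldSchemaSketch> cols;
}

@PersistenceCapable
class MFieldSchemaSketch {
  String name;
  String type;
}
{code}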



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23836) Make "cols" dependent so that it cascade deletes

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23836?focusedWorklogId=461066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461066
 ]

ASF GitHub Bot logged work on HIVE-23836:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 13:31
Start Date: 20/Jul/20 13:31
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #1239:
URL: https://github.com/apache/hive/pull/1239


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461066)
Time Spent: 40m  (was: 0.5h)

> Make "cols" dependent so that it cascade deletes
> 
>
> Key: HIVE-23836
> URL: https://issues.apache.org/jira/browse/HIVE-23836
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {quote}
> If you want the deletion of a persistent object to cause the deletion of 
> related objects then you need to mark the related fields in the mapping to be 
> "dependent".
> {quote}
> http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields
> http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object
> The database won't do it:
> {code:sql|title=Derby Schema}
> ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY 
> ("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO 
> ACTION;
> {code}
> https://github.com/apache/hive/blob/65cf6957cf9432277a096f91b40985237274579f/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L452



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23881?focusedWorklogId=461056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461056
 ]

ASF GitHub Bot logged work on HIVE-23881:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 13:01
Start Date: 20/Jul/20 13:01
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1284:
URL: https://github.com/apache/hive/pull/1284#discussion_r457361443



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -2651,6 +2651,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest 
req)
 
   // Transaction and lock management calls
   // Get just list of open transactions
+  //Deprecated use get_open_txns_req

Review comment:
   Added





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461056)
Time Spent: 50m  (was: 40m)

> Deprecate get_open_txns to use get_open_txns_req method.
> 
>
> Key: HIVE-23881
> URL: https://issues.apache.org/jira/browse/HIVE-23881
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23881.01.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23881?focusedWorklogId=461053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461053
 ]

ASF GitHub Bot logged work on HIVE-23881:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 12:48
Start Date: 20/Jul/20 12:48
Worklog Time Spent: 10m 
  Work Description: aasha opened a new pull request #1284:
URL: https://github.com/apache/hive/pull/1284


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461053)
Time Spent: 0.5h  (was: 20m)

> Deprecate get_open_txns to use get_open_txns_req method.
> 
>
> Key: HIVE-23881
> URL: https://issues.apache.org/jira/browse/HIVE-23881
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23881?focusedWorklogId=461054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461054
 ]

ASF GitHub Bot logged work on HIVE-23881:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 12:51
Start Date: 20/Jul/20 12:51
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1284:
URL: https://github.com/apache/hive/pull/1284#discussion_r457354181



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -2651,6 +2651,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest 
req)
 
   // Transaction and lock management calls
   // Get just list of open transactions
+  //Deprecated use get_open_txns_req

Review comment:
   nit: Could you please add 1 space here? :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461054)
Time Spent: 40m  (was: 0.5h)

> Deprecate get_open_txns to use get_open_txns_req method.
> 
>
> Key: HIVE-23881
> URL: https://issues.apache.org/jira/browse/HIVE-23881
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23881.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.

2020-07-20 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23881:
---
Attachment: HIVE-23881.01.patch
Status: Patch Available  (was: Open)

> Deprecate get_open_txns to use get_open_txns_req method.
> 
>
> Key: HIVE-23881
> URL: https://issues.apache.org/jira/browse/HIVE-23881
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23881.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23881?focusedWorklogId=461050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461050
 ]

ASF GitHub Bot logged work on HIVE-23881:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 12:42
Start Date: 20/Jul/20 12:42
Worklog Time Spent: 10m 
  Work Description: aasha closed pull request #1283:
URL: https://github.com/apache/hive/pull/1283


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461050)
Time Spent: 20m  (was: 10m)

> Deprecate get_open_txns to use get_open_txns_req method.
> 
>
> Key: HIVE-23881
> URL: https://issues.apache.org/jira/browse/HIVE-23881
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization

2020-07-20 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-23882:
-


> Compiler should skip MJ keyExpr for probe optimization
> --
>
> Key: HIVE-23882
> URL: https://issues.apache.org/jira/browse/HIVE-23882
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>
> In probe we cannot currently support key expressions (on the big table side), 
> as ORC CVs probe the small-table HT directly (there is no expr evaluation at 
> that level).
> TezCompiler should take this into account when picking MJs to push probe 
> details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23881?focusedWorklogId=461046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461046
 ]

ASF GitHub Bot logged work on HIVE-23881:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 12:19
Start Date: 20/Jul/20 12:19
Worklog Time Spent: 10m 
  Work Description: aasha opened a new pull request #1283:
URL: https://github.com/apache/hive/pull/1283


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461046)
Remaining Estimate: 0h
Time Spent: 10m

> Deprecate get_open_txns to use get_open_txns_req method.
> 
>
> Key: HIVE-23881
> URL: https://issues.apache.org/jira/browse/HIVE-23881
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23881:
--
Labels: pull-request-available  (was: )

> Deprecate get_open_txns to use get_open_txns_req method.
> 
>
> Key: HIVE-23881
> URL: https://issues.apache.org/jira/browse/HIVE-23881
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=461009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461009
 ]

ASF GitHub Bot logged work on HIVE-20441:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 10:19
Start Date: 20/Jul/20 10:19
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1242:
URL: https://github.com/apache/hive/pull/1242#discussion_r457248607



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java
##
@@ -293,9 +293,10 @@ public FunctionInfo registerPermanentFunction(String 
functionName,
 if (registerToSession) {
   String qualifiedName = FunctionUtils.qualifyFunctionName(
   functionName, SessionState.get().getCurrentDatabase().toLowerCase());
-  if (registerToSessionRegistry(qualifiedName, function) != null) {
+  FunctionInfo newFunction = registerToSessionRegistry(qualifiedName, 
function);

Review comment:
   The call ```FunctionRegistry.getFunctionInfo(String functionName)``` 
makes HS2 look up the function in the MetaStore when the function is not 
found in the session or system registry and hive.allow.udf.load.on.demand is 
enabled. If the function is found, a FunctionInfo created by ```new 
FunctionInfo(functionName, className, resources)``` is returned, but the 
genericUDF field of that FunctionInfo is null: 
   
https://github.com/apache/hive/blob/fa086ecce543384993a61d5154b14fa3b80df3b5/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java#L67-L74
   So when TypeCheckProcFactory.DefaultExprProcessor builds the function expr 
from the AstNode, 
   
https://github.com/apache/hive/blob/fa086ecce543384993a61d5154b14fa3b80df3b5/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L935-L948
the genericUDF obtained via ```GenericUDF genericUDF = fi.getGenericUDF();``` is 
null, and if that genericUDF is later used to create the function expr desc, an 
NPE is thrown.
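   To make the failure path concrete, here is a minimal, self-contained sketch 
(hedged: simplified stand-in classes, not the actual Hive sources; the real 
change is the diff above, which prefers the session-registry copy):
   
   ```java
   // Sketch: why the metastore-lookup path yields a null genericUDF, and why
   // returning the session-registry FunctionInfo avoids the NPE.
   public class UdfLoadOnDemandSketch {
   
     interface GenericUDF { }
   
     static class FunctionInfo {
       private final String name;
       private GenericUDF genericUDF;
   
       // Metastore-lookup path: only (name, className) are known, no UDF
       // instance, so genericUDF intentionally stays null.
       FunctionInfo(String name, String className) {
         this.name = name;
       }
   
       // Session-registry path: the UDF instance is created and attached.
       FunctionInfo(String name, GenericUDF udf) {
         this.name = name;
         this.genericUDF = udf;
       }
   
       GenericUDF getGenericUDF() { return genericUDF; }
     }
   
     // Returning the session copy (newFunction) instead of the metastore copy
     // (function) is the essence of the change discussed here.
     static FunctionInfo registerPermanentFunction(String name, String className) {
       FunctionInfo function = new FunctionInfo(name, className);
       FunctionInfo newFunction = registerToSessionRegistry(name, function);
       return newFunction != null ? newFunction : function;
     }
   
     static FunctionInfo registerToSessionRegistry(String name, FunctionInfo fi) {
       return new FunctionInfo(name, new GenericUDF() { });
     }
   
     public static void main(String[] args) {
       FunctionInfo fi = registerPermanentFunction("my_udf", "com.example.MyUdf");
       // With the metastore copy, getGenericUDF() returns null and expr-desc
       // creation throws the NullPointerException shown in the issue.
       System.out.println(fi.getGenericUDF() != null ? "genericUDF initialized" : "NPE ahead");
     }
   }
   ```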





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461009)
Time Spent: 1h 40m  (was: 1.5h)

> NPE in GenericUDF  when hive.allow.udf.load.on.demand is set to true
> 
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been 
> started, a newly created function from other clients or hiveserver2 will be 
> loaded from the metastore the first time it is used. 
> When the udf is used in a where clause, we get a NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> 

[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=461012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461012
 ]

ASF GitHub Bot logged work on HIVE-20441:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 10:22
Start Date: 20/Jul/20 10:22
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1242:
URL: https://github.com/apache/hive/pull/1242#discussion_r457248607



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java
##
@@ -293,9 +293,10 @@ public FunctionInfo registerPermanentFunction(String 
functionName,
 if (registerToSession) {
   String qualifiedName = FunctionUtils.qualifyFunctionName(
   functionName, SessionState.get().getCurrentDatabase().toLowerCase());
-  if (registerToSessionRegistry(qualifiedName, function) != null) {
+  FunctionInfo newFunction = registerToSessionRegistry(qualifiedName, 
function);

Review comment:
   The call ```FunctionRegistry.getFunctionInfo(String functionName)``` 
makes HS2 look up the function in the MetaStore when the function is not 
found in the session or system registry and hive.allow.udf.load.on.demand is 
enabled. If the function is found, a FunctionInfo created by ```new 
FunctionInfo(functionName, className, resources)``` is returned, but the 
genericUDF field of that FunctionInfo is null: 
   
https://github.com/apache/hive/blob/fa086ecce543384993a61d5154b14fa3b80df3b5/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java#L67-L74
   So when TypeCheckProcFactory.DefaultExprProcessor builds the function expr 
from the AstNode, 
   
https://github.com/apache/hive/blob/fa086ecce543384993a61d5154b14fa3b80df3b5/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L935-L948
the genericUDF obtained via ```GenericUDF genericUDF = fi.getGenericUDF();``` is 
null, and if that genericUDF is later used to create the function expr desc, an 
NPE is thrown.
   
https://github.com/apache/hive/blob/fa086ecce543384993a61d5154b14fa3b80df3b5/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java#L117-L123





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461012)
Time Spent: 2h  (was: 1h 50m)

> NPE in GenericUDF  when hive.allow.udf.load.on.demand is set to true
> 
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been 
> started, a newly created function from other clients or hiveserver2 will be 
> loaded from the metastore the first time it is used. 
> When the udf is used in a where clause, we get a NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> 

[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=461011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461011
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 10:22
Start Date: 20/Jul/20 10:22
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on pull request #1271:
URL: https://github.com/apache/hive/pull/1271#issuecomment-660941689


   @kgyrtkirk Could you please take a look?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461011)
Time Spent: 50m  (was: 40m)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering, we expect the expression proxy 
> class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ), while when dropping partitions we serialize the drop partition filter 
> expression as ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ), which is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> 

[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=461010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461010
 ]

ASF GitHub Bot logged work on HIVE-20441:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 10:21
Start Date: 20/Jul/20 10:21
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1242:
URL: https://github.com/apache/hive/pull/1242#discussion_r457255692



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java
##
@@ -293,9 +293,10 @@ public FunctionInfo registerPermanentFunction(String 
functionName,
 if (registerToSession) {
   String qualifiedName = FunctionUtils.qualifyFunctionName(
   functionName, SessionState.get().getCurrentDatabase().toLowerCase());
-  if (registerToSessionRegistry(qualifiedName, function) != null) {
+  FunctionInfo newFunction = registerToSessionRegistry(qualifiedName, 
function);

Review comment:
   ```registerToSessionRegistry``` eventually initializes the genericUDF 
by calling ```FunctionInfo(FunctionType functionType, String displayName, 
GenericUDF genericUDF, FunctionResource... resources)``` 
https://github.com/apache/hive/blob/fa086ecce543384993a61d5154b14fa3b80df3b5/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java#L76-L83,
 so the genericUDF field of the ```FunctionInfo``` returned by this call is not 
null.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461010)
Time Spent: 1h 50m  (was: 1h 40m)

> NPE in GenericUDF  when hive.allow.udf.load.on.demand is set to true
> 
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been 
> started, a newly created function from other clients or hiveserver2 will be 
> loaded from the metastore the first time it is used. 
> When the udf is used in a where clause, we get a NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> 

[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=461005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461005
 ]

ASF GitHub Bot logged work on HIVE-20441:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 10:10
Start Date: 20/Jul/20 10:10
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1242:
URL: https://github.com/apache/hive/pull/1242#discussion_r457248607



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java
##
@@ -293,9 +293,10 @@ public FunctionInfo registerPermanentFunction(String 
functionName,
 if (registerToSession) {
   String qualifiedName = FunctionUtils.qualifyFunctionName(
   functionName, SessionState.get().getCurrentDatabase().toLowerCase());
-  if (registerToSessionRegistry(qualifiedName, function) != null) {
+  FunctionInfo newFunction = registerToSessionRegistry(qualifiedName, 
function);

Review comment:
   The call ```FunctionRegistry.getFunctionInfo(String functionName)``` 
makes HS2 look up the function in the MetaStore when the function is not 
found in the session or system registry and hive.allow.udf.load.on.demand is 
enabled. If the function is found, a FunctionInfo created by ```new 
FunctionInfo(functionName, className, resources)``` is returned, but the 
genericUDF field of that FunctionInfo is null, 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java#L67-L74
 . So when TypeCheckProcFactory.DefaultExprProcessor builds the function expr 
from the AstNode, 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L935-L948,
 the genericUDF obtained via ```GenericUDF genericUDF = fi.getGenericUDF();``` is 
null, and if that genericUDF is later used to create the function expr desc, an 
NPE is thrown.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461005)
Time Spent: 1.5h  (was: 1h 20m)

> NPE in GenericUDF  when hive.allow.udf.load.on.demand is set to true
> 
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been 
> started, a newly created function from other clients or hiveserver2 will be 
> loaded from the metastore the first time it is used. 
> When the udf is used in a where clause, we get a NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  

[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=461000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461000
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 10:02
Start Date: 20/Jul/20 10:02
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r457242768



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -2066,6 +2066,10 @@ struct GetReplicationMetricsRequest {
   3: optional i64 dumpExecutionId
 }
 
+struct GetOpenTxnsRequest {
+  1: required list<TxnType> excludeTxnTypes;

Review comment:
   https://issues.apache.org/jira/browse/HIVE-23881





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 461000)
Time Spent: 3.5h  (was: 3h 20m)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, 
> HIVE-23560.03.patch, HIVE-23560.04.patch, Optimize bootstrap dump to avoid 
> aborting all transactions.pdf
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl created 
> transactions as is.
> The attached doc talks about it in detail



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.

2020-07-20 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi reassigned HIVE-23881:
--

Assignee: Aasha Medhi

> Deprecate get_open_txns to use get_open_txns_req method.
> 
>
> Key: HIVE-23881
> URL: https://issues.apache.org/jira/browse/HIVE-23881
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=460999&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460999
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 09:59
Start Date: 20/Jul/20 09:59
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r457240568



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -2066,6 +2066,10 @@ struct GetReplicationMetricsRequest {
   3: optional i64 dumpExecutionId
 }
 
+struct GetOpenTxnsRequest {
+  1: required list<TxnType> excludeTxnTypes;

Review comment:
   Yes, makes sense. This pull request is already merged. I will make the 
param optional in another pull request and will create a ticket to use this new 
method across the codebase and deprecate the original get_open_txns method.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 460999)
Time Spent: 3h 20m  (was: 3h 10m)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, 
> HIVE-23560.03.patch, HIVE-23560.04.patch, Optimize bootstrap dump to avoid 
> aborting all transactions.pdf
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl created 
> transactions as is.
> The attached doc talks about it in detail



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23870) Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23870:
--
Labels: pull-request-available  (was: )

> Optimise multiple text conversions in 
> WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable
> ---
>
> Key: HIVE-23870
> URL: https://issues.apache.org/jira/browse/HIVE-23870
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2020-07-17-11-31-38-241.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Observed this when creating materialized view.
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java#L85]
> Same content is converted to Text multiple times.
> !image-2020-07-17-11-31-38-241.png|width=1048,height=936!
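A hedged sketch of the optimisation direction the description suggests (illustrative names only, not the HiveCharWritable internals): memoize the converted form so repeated getPrimitiveJavaObject-style calls do not redo the same conversion.

{code:java}
import java.nio.charset.StandardCharsets;

public class CachedConversionSketch {
  private final byte[] utf8Bytes;  // stored representation of the char value
  private String cached;           // converted form, computed at most once

  CachedConversionSketch(byte[] utf8Bytes) {
    this.utf8Bytes = utf8Bytes;
  }

  String getStrippedValue() {
    if (cached == null) {  // convert on first access only
      cached = new String(utf8Bytes, StandardCharsets.UTF_8).trim();
    }
    return cached;
  }

  public static void main(String[] args) {
    CachedConversionSketch w =
        new CachedConversionSketch("abc  ".getBytes(StandardCharsets.UTF_8));
    // Both calls now share one conversion instead of re-encoding each time.
    System.out.println(w.getStrippedValue());
    System.out.println(w.getStrippedValue());
  }
}
{code}

In the real Writable such a cache would of course need to be invalidated whenever the underlying value is set.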



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23870) Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23870?focusedWorklogId=460994&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460994
 ]

ASF GitHub Bot logged work on HIVE-23870:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 09:52
Start Date: 20/Jul/20 09:52
Worklog Time Spent: 10m 
  Work Description: rbalamohan opened a new pull request #1282:
URL: https://github.com/apache/hive/pull/1282


   Observed runtime dropping from "7600s --> 4800s" in an internal cluster when 
running a job which creates a materialized view on warehouse/inventory/date_dim 
(where warehouse had char/varchar columns).
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 460994)
Remaining Estimate: 0h
Time Spent: 10m

> Optimise multiple text conversions in 
> WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable
> ---
>
> Key: HIVE-23870
> URL: https://issues.apache.org/jira/browse/HIVE-23870
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: image-2020-07-17-11-31-38-241.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Observed this when creating materialized view.
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java#L85]
> Same content is converted to Text multiple times.
> !image-2020-07-17-11-31-38-241.png|width=1048,height=936!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23870) Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable

2020-07-20 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23870:

Summary: Optimise multiple text conversions in 
WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable  
(was: Optimise multiple text conversions in 
WritableHiveCharObjectInspector.getPrimitiveJavaObjec)

> Optimise multiple text conversions in 
> WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable
> ---
>
> Key: HIVE-23870
> URL: https://issues.apache.org/jira/browse/HIVE-23870
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: image-2020-07-17-11-31-38-241.png
>
>
> Observed this when creating materialized view.
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveCharObjectInspector.java#L85]
> Same content is converted to Text multiple times.
> !image-2020-07-17-11-31-38-241.png|width=1048,height=936!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23840) Use LLAP to get orc metadata

2020-07-20 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-23840.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~szita]!

> Use LLAP to get orc metadata
> 
>
> Key: HIVE-23840
> URL: https://issues.apache.org/jira/browse/HIVE-23840
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HIVE-23824 added the possibility to access ORC metadata. We can use this to 
> decide which delta files should be read, and which could be omitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23840) Use LLAP to get orc metadata

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23840?focusedWorklogId=460984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460984
 ]

ASF GitHub Bot logged work on HIVE-23840:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 09:32
Start Date: 20/Jul/20 09:32
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #1251:
URL: https://github.com/apache/hive/pull/1251


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 460984)
Time Spent: 1h 10m  (was: 1h)

> Use LLAP to get orc metadata
> 
>
> Key: HIVE-23840
> URL: https://issues.apache.org/jira/browse/HIVE-23840
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HIVE-23824 added the possibility to access ORC metadata. We can use this to 
> decide which delta files should be read, and which could be omitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-20 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161068#comment-17161068
 ] 

Syed Shameerur Rahman commented on HIVE-23873:
--

[~chiran54321] Interesting... We do have a qtest external_jdbc_table4.q which 
covers the above use case, and it passes.  Assuming the above stack trace was 
generated with the master hive branch, had there been a case-sensitivity issue 
https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L229
 , value should be null and ultimately rowVal will be set to null 
https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L233
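A tiny, hedged illustration of that case mismatch (plain java.util maps standing in for the deserializer's row map, not the JdbcSerDe code itself):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class CaseMismatchSketch {
  public static void main(String[] args) {
    // Row as fetched from Oracle: upper-case column keys.
    Map<String, Object> row = new HashMap<>();
    row.put("ID", 1);
    row.put("FNAME", "Name1");

    // Hive asks with its lower-cased column name: the lookup misses,
    // so the value is null -> rowVal becomes null downstream.
    System.out.println(row.get("id"));   // null

    // One possible fix direction: normalize keys once when the row is built.
    Map<String, Object> normalized = new HashMap<>();
    row.forEach((k, v) -> normalized.put(k.toLowerCase(), v));
    System.out.println(normalized.get("id"));  // 1
  }
}
{code}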
 

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch
>
>
> The scenario is a Hive table having the same schema as a table in Oracle; 
> however, when we query the table with data it fails with an NPE. Below is the 
> trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case: since in 
> Hive, table and column names are forced to lowercase during creation, the 
> user runs into an NPE while fetching data.
> While deserializing data, the input consists of column names in lower case, 
> so the lookup fails to get the value
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
> Log Snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this case.
> =
> 1. Create table in Oracle
> {code}
> create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20));
> {code}
> 2. Insert dummy data.
> {code}
> Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1');
> {code}
> 3. Create 

[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=460982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460982
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 09:31
Start Date: 20/Jul/20 09:31
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r457220034



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -2066,6 +2066,10 @@ struct GetReplicationMetricsRequest {
   3: optional i64 dumpExecutionId
 }
 
+struct GetOpenTxnsRequest {
+  1: required list<TxnType> excludeTxnTypes;

Review comment:
   Could we make this optional? So if we later want to change the 
GetOpenTxnsRequest object we do not end up sending an empty list all the time? 
(as a general rule for new methods we create a Request object for them with all 
optional fields)
   Also this might need some checks later in the code, but we might want to 
merge the codepath for the original get_open_txns method with this new one as 
soon as possible.
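   A hedged Java-side illustration of the all-optional-Request pattern (an 
illustrative class, not the generated Thrift code): an unset field keeps the 
old get_open_txns behaviour, so fields can be added later without breaking 
callers.
   
   ```java
   import java.util.Collections;
   import java.util.List;
   
   public class OpenTxnsRequestSketch {
     private List<String> excludeTxnTypes;  // optional: null means "not set"
   
     public void setExcludeTxnTypes(List<String> types) {
       this.excludeTxnTypes = types;
     }
   
     public List<String> effectiveExcludes() {
       // Treat "unset" as "exclude nothing": an old client that never sets
       // the field gets exactly the original get_open_txns semantics.
       return excludeTxnTypes == null ? Collections.emptyList() : excludeTxnTypes;
     }
   
     public static void main(String[] args) {
       OpenTxnsRequestSketch req = new OpenTxnsRequestSketch();
       System.out.println(req.effectiveExcludes());   // []
       req.setExcludeTxnTypes(List.of("READ_ONLY"));
       System.out.println(req.effectiveExcludes());   // [READ_ONLY]
     }
   }
   ```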





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 460982)
Time Spent: 3h 10m  (was: 3h)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, 
> HIVE-23560.03.patch, HIVE-23560.04.patch, Optimize bootstrap dump to avoid 
> aborting all transactions.pdf
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl created 
> transactions as is.
> The attached doc talks about it in detail



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=460978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460978
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 09:28
Start Date: 20/Jul/20 09:28
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r457217636



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -2802,6 +2806,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest 
req)
 
   void add_replication_metrics(1: ReplicationMetricList replicationMetricList) 
throws(1:MetaException o1)
   ReplicationMetricList get_replication_metrics(1: 
GetReplicationMetricsRequest rqst) throws(1:MetaException o1)
+  GetOpenTxnsResponse get_open_txns_req(1: GetOpenTxnsRequest 
getOpenTxnsRequest)

Review comment:
   Fair point





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 460978)
Time Spent: 3h  (was: 2h 50m)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, 
> HIVE-23560.03.patch, HIVE-23560.04.patch, Optimize bootstrap dump to avoid 
> aborting all transactions.pdf
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl created 
> transactions as is.
> The attached doc talks about it in detail



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20441) NPE in GenericUDF when hive.allow.udf.load.on.demand is set to true

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20441?focusedWorklogId=460972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460972
 ]

ASF GitHub Bot logged work on HIVE-20441:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 09:07
Start Date: 20/Jul/20 09:07
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1242:
URL: https://github.com/apache/hive/pull/1242#discussion_r457202709



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java
##
@@ -293,9 +293,10 @@ public FunctionInfo registerPermanentFunction(String 
functionName,
 if (registerToSession) {
   String qualifiedName = FunctionUtils.qualifyFunctionName(
   functionName, SessionState.get().getCurrentDatabase().toLowerCase());
-  if (registerToSessionRegistry(qualifiedName, function) != null) {
+  FunctionInfo newFunction = registerToSessionRegistry(qualifiedName, 
function);

Review comment:
   I tried to understand the goal of the change, but could not find the 
root cause, and do not see what I am missing.
   Could you please explain when this code results in a different return value 
before and after the patch?
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 460972)
Time Spent: 1h 20m  (was: 1h 10m)

> NPE in GenericUDF  when hive.allow.udf.load.on.demand is set to true
> 
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been 
> started, a newly created function from other clients or hiveserver2 will be 
> loaded from the metastore the first time it is used. 
> When the udf is used in a where clause, we get a NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> 

[jira] [Updated] (HIVE-23815) output statistics of underlying datastore

2020-07-20 Thread Rossetti Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rossetti Wong updated HIVE-23815:
-
Flags: Patch

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This patch provides a way to get statistics about the metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number 
> of datastore reads and writes, the average transaction execution time, the 
> total number of active connections, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=460968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460968
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 08:43
Start Date: 20/Jul/20 08:43
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r457184037



##
File path: storage-api/src/java/org/apache/hive/common/util/BloomKFilter.java
##
@@ -362,16 +379,178 @@ public static void mergeBloomFilterBytes(
 
 // Just bitwise-OR the bits together - size/# functions should be the same,
 // rest of the data is serialized long values for the bitset which are 
supposed to be bitwise-ORed.
-for (int idx = START_OF_SERIALIZED_LONGS; idx < bf1Length; ++idx) {
+for (int idx = mergeStart; idx < mergeEnd; ++idx) {
   bf1Bytes[bf1Start + idx] |= bf2Bytes[bf2Start + idx];
 }
   }
 
+  public static void mergeBloomFilterBytesFromInputColumn(
+  byte[] bf1Bytes, int bf1Start, int bf1Length, long bf1ExpectedEntries,
+  BytesColumnVector inputColumn, int batchSize, boolean selectedInUse, 
int[] selected, int numThreads) {
+if (numThreads == 0) {
+  numThreads = Runtime.getRuntime().availableProcessors();
+}
+if (numThreads < 0) {
+  throw new RuntimeException("invalid number of threads: " + numThreads);
+}
+
+ExecutorService executor = Executors.newFixedThreadPool(numThreads);
+
+BloomFilterMergeWorker[] workers = new BloomFilterMergeWorker[numThreads];
+for (int f = 0; f < numThreads; f++) {
+  workers[f] = new BloomFilterMergeWorker(executor, bf1Bytes, bf1Start, 
bf1Length);
+}
+
+// split every bloom filter (represented by a part of a byte[]) across 
workers
+for (int j = 0; j < batchSize; j++) {
+  if (!selectedInUse && inputColumn.noNulls) {
+splitVectorAcrossWorkers(workers, inputColumn.vector[j], 
inputColumn.start[j],
+inputColumn.length[j]);
+  } else if (!selectedInUse) {
+if (!inputColumn.isNull[j]) {
+  splitVectorAcrossWorkers(workers, inputColumn.vector[j], 
inputColumn.start[j],
+  inputColumn.length[j]);
+}
+  } else if (inputColumn.noNulls) {
+int i = selected[j];
+splitVectorAcrossWorkers(workers, inputColumn.vector[i], 
inputColumn.start[i],
+inputColumn.length[i]);
+  } else {
+int i = selected[j];
+if (!inputColumn.isNull[i]) {
+  splitVectorAcrossWorkers(workers, inputColumn.vector[i], 
inputColumn.start[i],
+  inputColumn.length[i]);
+}
+  }
+}
+
+for (int f = 0; f < numThreads; f++) {
+  executor.submit(workers[f]);
+}
+
+executor.shutdown();
+try {
+  executor.awaitTermination(3600, TimeUnit.SECONDS);
+} catch (InterruptedException e) {
+  throw new RuntimeException(e);
+}
+  }
+
+  private static void splitVectorAcrossWorkers(BloomFilterMergeWorker[] 
workers, byte[] bytes,
+  int start, int length) {
+if (bytes == null || length == 0) {
+  return;
+}
+/*
+ * This will split a byte[] across workers as below:
+ * let's say there are 10 workers for 7813 bytes, in this case
+ * length: 7813, elementPerBatch: 781
+ * bytes assigned to workers: inclusive lower bound, exclusive upper bound
+ * 1. worker: 5 -> 786
+ * 2. worker: 786 -> 1567
+ * 3. worker: 1567 -> 2348
+ * 4. worker: 2348 -> 3129
+ * 5. worker: 3129 -> 3910
+ * 6. worker: 3910 -> 4691
+ * 7. worker: 4691 -> 5472
+ * 8. worker: 5472 -> 6253
+ * 9. worker: 6253 -> 7034
+ * 10. worker: 7034 -> 7813 (last element per batch is: 779)
+ *
+ * This way, a particular worker will be given with the same part
+ * of all bloom filters along with the shared base bloom filter,
+ * so the bitwise OR function will not be a subject of threading/sync 
issues.
+ */
+int elementPerBatch =
+(int) Math.ceil((double) (length - START_OF_SERIALIZED_LONGS) / 
workers.length);
+
+for (int w = 0; w < workers.length; w++) {
+  int modifiedStart = START_OF_SERIALIZED_LONGS + w * elementPerBatch;
+  int modifiedLength = (w == workers.length - 1)
+? length - (START_OF_SERIALIZED_LONGS + w * elementPerBatch) : 
elementPerBatch;
+
+  ElementWrapper wrapper =
+  new ElementWrapper(bytes, start, length, modifiedStart, 
modifiedLength);
+  workers[w].add(wrapper);
+}
+  }
+
+  public static byte[] getInitialBytes(long expectedEntries) {
+ByteArrayOutputStream bytesOut = null;
+try {
+  bytesOut = new ByteArrayOutputStream();
+  BloomKFilter bf = new BloomKFilter(expectedEntries);
+  BloomKFilter.serialize(bytesOut, bf);
+  return 

[jira] [Updated] (HIVE-23815) output statistics of underlying datastore

2020-07-20 Thread Rossetti Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rossetti Wong updated HIVE-23815:
-
External issue URL: https://github.com/apache/hive/pull/1281

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This patch provides a way to get statistics about the metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number 
> of datastore reads and writes, the average transaction execution time, the 
> total number of active connections, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-07-20 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-23671:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables too well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data can not be read back 
> from the new partitions.
> We could handle some usecases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writes for 
> the table, we can seed the table with the writeIds read from the directory 
> structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-07-20 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161025#comment-17161025
 ] 

Denys Kuzmenko commented on HIVE-23671:
---

Pushed to master.
Thank you for the patch, [~pvargacl]!!

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables too well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data can not be read back 
> from the new partitions.
> We could handle some usecases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writeIds for 
> the table, we can seed the table with the writeIds read from the directory 
> structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23837) HbaseStorageHandler is not configured properly when the FileSinkOperator is the child of a MergeJoinWork

2020-07-20 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-23837:
-

Assignee: Denys Kuzmenko  (was: Peter Varga)

> HbaseStorageHandler is not configured properly when the FileSinkOperator is 
> the child of a MergeJoinWork
> 
>
> Key: HIVE-23837
> URL: https://issues.apache.org/jira/browse/HIVE-23837
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> If the FileSinkOperator's root operator is a MergeJoinWork, 
> HbaseStorageHandler.configureJobConf will never get called, and the execution 
> will miss the HBASE_AUTH_TOKEN and the hbase jars.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22869) Add locking benchmark to metastore-tools/metastore-benchmarks

2020-07-20 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-22869:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add locking benchmark to metastore-tools/metastore-benchmarks
> -
>
> Key: HIVE-22869
> URL: https://issues.apache.org/jira/browse/HIVE-22869
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22869.2.patch, HIVE-22869.3.patch, 
> HIVE-22869.4.patch, HIVE-22869.5.patch, HIVE-22869.6.patch, 
> HIVE-22869.7.patch, HIVE-22869.8.patch, HIVE-22869.9.patch, HIVE-22869.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Add the possibility to run benchmarks on opening locks in the HMS



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22869) Add locking benchmark to metastore-tools/metastore-benchmarks

2020-07-20 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161023#comment-17161023
 ] 

Denys Kuzmenko commented on HIVE-22869:
---

Pushed to master.
Thank you for the patch, [~zchovan]!!

> Add locking benchmark to metastore-tools/metastore-benchmarks
> -
>
> Key: HIVE-22869
> URL: https://issues.apache.org/jira/browse/HIVE-22869
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22869.2.patch, HIVE-22869.3.patch, 
> HIVE-22869.4.patch, HIVE-22869.5.patch, HIVE-22869.6.patch, 
> HIVE-22869.7.patch, HIVE-22869.8.patch, HIVE-22869.9.patch, HIVE-22869.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Add the possibility to run benchmarks on opening locks in the HMS



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-12155) hive exited with status 5

2020-07-20 Thread Chunhui Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-12155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161024#comment-17161024
 ] 

Chunhui Yang commented on HIVE-12155:
-

As of July 20, 2020, has no one solved this problem yet?

> hive exited with status 5
> -
>
> Key: HIVE-12155
> URL: https://issues.apache.org/jira/browse/HIVE-12155
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Clients
>Affects Versions: 1.2.1
> Environment: sqoop 1.4.5 & hadoop 2.6 & hive 1.2.1
>Reporter: Qiuzhuang Lian
>Priority: Major
>
> We run sqoop-hive import jobs via RunJar to run them in parallel. 
> Sqoop hive import works very well, but suddenly the sqoop-hive import job JVM 
> exits with a "Hive exited with status 5" error during the hive import phase, 
> which invokes the HIVE CLI via a java Process. Furthermore, we can't find any 
> related hive logs under /tmp/hive/hive_*.log. The error blocks all further 
> sqoop import jobs; as a result, we have to restart the system, after which it 
> works well again. The log detail is as follows,
> Encountered IOException running import job: java.io.IOException: Hive exited 
> with status 5
> at 
> org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:385)
> at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:335)
> at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:239)
> at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:511)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23837) HbaseStorageHandler is not configured properly when the FileSinkOperator is the child of a MergeJoinWork

2020-07-20 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161022#comment-17161022
 ] 

Denys Kuzmenko commented on HIVE-23837:
---

Pushed to master.
Thank you for the patch, [~pvargacl]!!

> HbaseStorageHandler is not configured properly when the FileSinkOperator is 
> the child of a MergeJoinWork
> 
>
> Key: HIVE-23837
> URL: https://issues.apache.org/jira/browse/HIVE-23837
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> If the FileSinkOperator's root operator is a MergeJoinWork, 
> HbaseStorageHandler.configureJobConf will never get called, and the execution 
> will miss the HBASE_AUTH_TOKEN and the hbase jars.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23837) HbaseStorageHandler is not configured properly when the FileSinkOperator is the child of a MergeJoinWork

2020-07-20 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-23837.
---
Resolution: Fixed

> HbaseStorageHandler is not configured properly when the FileSinkOperator is 
> the child of a MergeJoinWork
> 
>
> Key: HIVE-23837
> URL: https://issues.apache.org/jira/browse/HIVE-23837
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> If the FileSinkOperator's root operator is a MergeJoinWork, 
> HbaseStorageHandler.configureJobConf will never get called, and the execution 
> will miss the HBASE_AUTH_TOKEN and the hbase jars.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=460959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460959
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 08:27
Start Date: 20/Jul/20 08:27
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r457171271



##
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -2802,6 +2806,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req)
 
   void add_replication_metrics(1: ReplicationMetricList replicationMetricList) throws(1:MetaException o1)
   ReplicationMetricList get_replication_metrics(1: GetReplicationMetricsRequest rqst) throws(1:MetaException o1)
+  GetOpenTxnsResponse get_open_txns_req(1: GetOpenTxnsRequest getOpenTxnsRequest)

Review comment:
   get_open_txns_info doesn't take any input params. We needed to exclude certain types of txns and only return the other open txns.
   The other way could have been to add a new field in TxnInfo with the TxnType and still return all the open txns, but filter on the client side.
   Currently get_open_txns_info filters out the read txns without informing the client, so we thought it might be better to expose an explicit call to exclude specific txn types.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 460959)
Time Spent: 2h 50m  (was: 2h 40m)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, 
> HIVE-23560.03.patch, HIVE-23560.04.patch, Optimize bootstrap dump to avoid 
> aborting all transactions.pdf
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently, before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl-created 
> transactions as is.
> The attached doc talks about it in detail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
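For contrast, a minimal sketch of the client-side alternative discussed in this review thread, assuming the proposed TxnType field were added to TxnInfo (the getType() accessor below is that proposed field, not part of the current thrift struct; showTxns() is the existing client wrapper for get_open_txns_info):

{code:java}
import java.util.List;
import java.util.stream.Collectors;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.GetOpenTxnsInfoResponse;
import org.apache.hadoop.hive.metastore.api.TxnInfo;
import org.apache.hadoop.hive.metastore.api.TxnType;

public class OpenWriteTxns {
  // Fetch all open txns and drop read-only and repl-created ones on the
  // client side. TxnInfo.getType() is the *proposed* field from this
  // discussion, so this sketch does not compile against the current API.
  static List<TxnInfo> openWriteTxns(IMetaStoreClient client) throws Exception {
    GetOpenTxnsInfoResponse resp = client.showTxns();
    return resp.getOpen_txns().stream()
        .filter(t -> t.getType() != TxnType.READ_ONLY
                  && t.getType() != TxnType.REPL_CREATED)
        .collect(Collectors.toList());
  }
}
{code}

The patch instead adds the dedicated get_open_txns_req call, so the server does the filtering and the behaviour of the existing get_open_txns_info stays untouched.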


[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=460956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460956
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 08:19
Start Date: 20/Jul/20 08:19
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 closed pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 460956)
Time Spent: 3h 20m  (was: 3h 10m)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This patch provides a way to get statistics about the metastore's 
> underlying datastore, such as MySQL or Oracle. You can get the number 
> of datastore reads and writes, the average transaction execution time, 
> the total number of active connections, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22869) Add locking benchmark to metastore-tools/metastore-benchmarks

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22869?focusedWorklogId=460952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460952
 ]

ASF GitHub Bot logged work on HIVE-22869:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 08:04
Start Date: 20/Jul/20 08:04
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1073:
URL: https://github.com/apache/hive/pull/1073


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 460952)
Time Spent: 2h 10m  (was: 2h)

> Add locking benchmark to metastore-tools/metastore-benchmarks
> -
>
> Key: HIVE-22869
> URL: https://issues.apache.org/jira/browse/HIVE-22869
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22869.2.patch, HIVE-22869.3.patch, 
> HIVE-22869.4.patch, HIVE-22869.5.patch, HIVE-22869.6.patch, 
> HIVE-22869.7.patch, HIVE-22869.8.patch, HIVE-22869.9.patch, HIVE-22869.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Add the possibility to run benchmarks on opening locks in the HMS



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=460951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460951
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 07:57
Start Date: 20/Jul/20 07:57
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r457149267



##
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -2802,6 +2806,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req)
 
   void add_replication_metrics(1: ReplicationMetricList replicationMetricList) throws(1:MetaException o1)
   ReplicationMetricList get_replication_metrics(1: GetReplicationMetricsRequest rqst) throws(1:MetaException o1)
+  GetOpenTxnsResponse get_open_txns_req(1: GetOpenTxnsRequest getOpenTxnsRequest)

Review comment:
   Why did you introduce a new HMS API method for this?
   
   I would add a new type attribute to TxnInfo and use get_open_txns_info 
instead





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 460951)
Time Spent: 2h 40m  (was: 2.5h)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, 
> HIVE-23560.03.patch, HIVE-23560.04.patch, Optimize bootstrap dump to avoid 
> aborting all transactions.pdf
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently, before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl-created 
> transactions as is.
> The attached doc talks about it in detail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=460950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460950
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 20/Jul/20 07:53
Start Date: 20/Jul/20 07:53
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request #1280:
URL: https://github.com/apache/hive/pull/1280


   …AFBloomFilterMerge
   
   Change-Id: I235248ad327b0cea91e637e74a0c67720710737e
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 460950)
Remaining Estimate: 0h
Time Spent: 10m

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example below) 
> and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On a 10TB-30TB scale there is a chance that, out of 3-4 mins of query runtime, 
> 1-2 mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 3 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 1 ..  llap SUCCEEDED   1263   126300  
>  0   0
> Reducer 2 llap   RUNNING  1  010  
>  0   0
> Map 4 llap   RUNNING   6154  0  207 5947  
>  0   0
> Reducer 5 llapINITED 43  00   43  
>  0   0
> Reducer 6 llapINITED  1  001  
>  0   0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in a bloom filter lead to 436 465 696 bits, so 
> merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR 
> operations, which is a very hot codepath, but can be parallelized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
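The parallelization opportunity described above can be illustrated with a small standalone sketch (illustrative names, not the patch's actual classes): each thread owns a disjoint slice of every long[] bit array, so the bitwise ORs into the shared base need no synchronization, much like the per-worker byte ranges in the splitVectorAcrossWorkers snippet earlier in this digest:

{code:java}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelBloomMerge {
  // OR-merge many bloom filter bit arrays into 'base' with nThreads workers.
  // Worker t touches only indices [t*slice, min(len, (t+1)*slice)), so the
  // workers never write the same long and need no locks.
  public static void mergeInto(long[] base, List<long[]> filters, int nThreads)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    int slice = (base.length + nThreads - 1) / nThreads; // ceiling division
    for (int t = 0; t < nThreads; t++) {
      final int from = t * slice;
      final int to = Math.min(base.length, from + slice);
      pool.execute(() -> {
        for (long[] f : filters) {
          for (int i = from; i < to; i++) {
            base[i] |= f[i]; // disjoint slices: no races between workers
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}
{code}

With ~1263 filters of ~436M bits each, the total OR work is fixed, but slicing it across threads takes the single-reducer hot loop off the critical path.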


[jira] [Updated] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23880:
--
Labels: pull-request-available  (was: )

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example below) 
> and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On a 10TB-30TB scale there is a chance that, out of 3-4 mins of query runtime, 
> 1-2 mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 3 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 1 ..  llap SUCCEEDED   1263   126300  
>  0   0
> Reducer 2 llap   RUNNING  1  010  
>  0   0
> Map 4 llap   RUNNING   6154  0  207 5947  
>  0   0
> Reducer 5 llapINITED 43  00   43  
>  0   0
> Reducer 6 llapINITED  1  001  
>  0   0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in a bloom filter lead to 436 465 696 bits, so 
> merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR 
> operations, which is a very hot codepath, but can be parallelized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-07-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23880:

Description: 
Merging bloom filters in semijoin reduction can become the main bottleneck in 
case of a large number of source mapper tasks (~1000, Map 1 in the example below) 
and a large number of expected entries (50M) in the bloom filters.

For example in TPCDS Q93:
{code}
select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
ss_customer_sk
,sum(act_sales) sumsales
  from (select ss_item_sk
  ,ss_ticket_number
  ,ss_customer_sk
  ,case when sr_return_quantity is not null then 
(ss_quantity-sr_return_quantity)*ss_sales_price
else 
(ss_quantity*ss_sales_price) end act_sales
from store_sales left outer join store_returns on (sr_item_sk = 
ss_item_sk
   and 
sr_ticket_number = ss_ticket_number)
,reason
where sr_reason_sk = r_reason_sk
  and r_reason_desc = 'reason 66') t
  group by ss_customer_sk
  order by sumsales, ss_customer_sk
limit 100;
{code}

On a 10TB-30TB scale there is a chance that, out of 3-4 mins of query runtime, 
1-2 mins are spent merging bloom filters (Reducer 2), as in:  
[^lipwig-output3605036885489193068.svg] 

{code}
--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED
--
Map 3 ..  llap SUCCEEDED  1  100
   0   0
Map 1 ..  llap SUCCEEDED   1263   126300
   0   0
Reducer 2 llap   RUNNING  1  010
   0   0
Map 4 llap   RUNNING   6154  0  207 5947
   0   0
Reducer 5 llapINITED 43  00   43
   0   0
Reducer 6 llapINITED  1  001
   0   0
--
VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
--
{code}

For example, 70M entries in a bloom filter lead to 436 465 696 bits, so 
merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR 
operations, which is a very hot codepath, but can be parallelized.


  was:
Merging bloom filters in semijoin reduction can become the main bottleneck in 
case of a large number of source mapper tasks (~1000) and a large number of 
expected entries (50M) in the bloom filters.

For example in TPCDS Q93:
{code}
select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
ss_customer_sk
,sum(act_sales) sumsales
  from (select ss_item_sk
  ,ss_ticket_number
  ,ss_customer_sk
  ,case when sr_return_quantity is not null then 
(ss_quantity-sr_return_quantity)*ss_sales_price
else 
(ss_quantity*ss_sales_price) end act_sales
from store_sales left outer join store_returns on (sr_item_sk = 
ss_item_sk
   and 
sr_ticket_number = ss_ticket_number)
,reason
where sr_reason_sk = r_reason_sk
  and r_reason_desc = 'reason 66') t
  group by ss_customer_sk
  order by sumsales, ss_customer_sk
limit 100;
{code}

On a 10TB-30TB scale there is a chance that, out of 3-4 mins of query runtime, 
1-2 mins are spent merging bloom filters, as in: 


> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: lipwig-output3605036885489193068.svg
>
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example below) 
> and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   

[jira] [Updated] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-07-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23880:

Attachment: lipwig-output3605036885489193068.svg

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: lipwig-output3605036885489193068.svg
>
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000) and a large number of 
> expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On a 10TB-30TB scale there is a chance that, out of 3-4 mins of query runtime, 
> 1-2 mins are spent merging bloom filters, as in: 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-07-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23880:

Description: 
Merging bloom filters in semijoin reduction can become the main bottleneck in 
case of a large number of source mapper tasks (~1000) and a large number of 
expected entries (50M) in the bloom filters.

For example in TPCDS Q93:
{code}
select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
ss_customer_sk
,sum(act_sales) sumsales
  from (select ss_item_sk
  ,ss_ticket_number
  ,ss_customer_sk
  ,case when sr_return_quantity is not null then 
(ss_quantity-sr_return_quantity)*ss_sales_price
else 
(ss_quantity*ss_sales_price) end act_sales
from store_sales left outer join store_returns on (sr_item_sk = 
ss_item_sk
   and 
sr_ticket_number = ss_ticket_number)
,reason
where sr_reason_sk = r_reason_sk
  and r_reason_desc = 'reason 66') t
  group by ss_customer_sk
  order by sumsales, ss_customer_sk
limit 100;
{code}

On a 10TB-30TB scale there is a chance that, out of 3-4 mins of query runtime, 
1-2 mins are spent merging bloom filters, as in: 

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: lipwig-output3605036885489193068.svg
>
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000) and a large number of 
> expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On a 10TB-30TB scale there is a chance that, out of 3-4 mins of query runtime, 
> 1-2 mins are spent merging bloom filters, as in: 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23879) Data has been lost after table location was altered

2020-07-20 Thread Demyd (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demyd updated HIVE-23879:
-
Description: 
When I alter the location of a non-empty table and then insert data into it, 
I don't see the old data when querying through HS2, but I can still find it in 
maprfs at the old table location.

Steps to reproduce:
{code:sql}
1. connect to hs2 by beeline:
 hive --service beeline -u "jdbc:hive2://:1/;"

2. create test db:
 create database dbtest1 location 'hdfs:///dbtest1.db';

3. create test table:
 create table dbtest1.t1 (id int);

4. insert data to table:
 insert into dbtest1.t1 (id) values (1);

5. set new table location:
 alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';

6. insert data to table:
 insert into dbtest1.t1 (id) values (2);
{code}

Actual result:

{code:sql}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
1 row selected (0.097 seconds)
{code}

Expected result:

{code:sql}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
| 1      |
+--------+
2 rows selected (0.097 seconds)
{code}


  was:
When I alter the location of a non-empty table and then insert data into it, 
I don't see the old data when querying through HS2, but I can still find it in 
maprfs at the old table location.

Steps to reproduce:
{code:sql}
1. connect to hs2 by beeline:
 hive --service beeline -u "jdbc:hive2://:1/;"

2. create test db:
 create database dbtest1 location 'hdfs:///dbtest1.db';

3. create test table:
 create table dbtest1.t1 (id int);

4. insert data to table:
 insert into dbtest1.t1 (id) values (1);

5. set new table location:
 alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';

6. insert data to table:
 insert into dbtest1.t1 (id) values (2);
{code}

Actual result:

{code:sql}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
1 row selected (0.097 seconds)
{code}

Expected result:

{code:sql}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
| 1      |
+--------+
2 rows selected (0.097 seconds)
{code}


> Data has been lost after table location was altered
> ---
>
> Key: HIVE-23879
> URL: https://issues.apache.org/jira/browse/HIVE-23879
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Demyd
>Priority: Major
>
> When I alter the location of a non-empty table and then insert data into it, 
> I don't see the old data when querying through HS2, but I can still find it 
> in maprfs at the old table location.
> Steps to reproduce:
> {code:sql}
> 1. connect to hs2 by beeline:
>  hive --service beeline -u "jdbc:hive2://:1/;"
> 2. create test db:
>  create database dbtest1 location 'hdfs:///dbtest1.db';
> 3. create test table:
>  create table dbtest1.t1 (id int);
> 4. insert data to table:
>  insert into dbtest1.t1 (id) values (1);
> 5. set new table location:
>  alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';
> 6. insert data to table:
>  insert into dbtest1.t1 (id) values (2);
> {code}
> Actual result:
> {code:sql}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> +--------+
> 1 row selected (0.097 seconds)
> {code}
> Expected result:
> {code:sql}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> +--------+
> | 1      |
> +--------+
> 2 rows selected (0.097 seconds)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23879) Data has been lost after table location was altered

2020-07-20 Thread Demyd (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demyd updated HIVE-23879:
-
Description: 
When I alter the location of a non-empty table and then insert data into it, 
I don't see the old data when querying through HS2, but I can still find it in 
maprfs at the old table location.

Steps to reproduce:
{code:sql}
1. connect to hs2 by beeline:
 hive --service beeline -u "jdbc:hive2://:1/;"

2. create test db:
 create database dbtest1 location 'hdfs:///dbtest1.db';

3. create test table:
 create table dbtest1.t1 (id int);

4. insert data to table:
 insert into dbtest1.t1 (id) values (1);

5. set new table location:
 alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';

6. insert data to table:
 insert into dbtest1.t1 (id) values (2);
{code}

Actual result:

{code:sql}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
1 row selected (0.097 seconds)
{code}

Expected result:

{code:sql}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
| 1      |
+--------+
2 rows selected (0.097 seconds)
{code}


  was:
When I alter the location of a non-empty table and then insert data into it, 
I don't see the old data when querying through HS2, but I can still find it in 
maprfs at the old table location.

Steps to reproduce:
{code:sql}
1. connect to hs2 by beeline:
 hive --service beeline -u "jdbc:hive2://:1/;"

2. create test db:
 create database dbtest1 location 'hdfs:///dbtest1.db';

3. create test table:
 create table dbtest1.t1 (id int);

4. insert data to table:
 insert into dbtest1.t1 (id) values (1);

5. set new table location:
 alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';

6. insert data to table:
 insert into dbtest1.t1 (id) values (2);
{code}

Actual result:
{code:sql}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
1 row selected (0.097 seconds)
{code}

Expected result:
{code:sql}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
| 1      |
+--------+
2 rows selected (0.097 seconds)
{code}


> Data has been lost after table location was altered
> ---
>
> Key: HIVE-23879
> URL: https://issues.apache.org/jira/browse/HIVE-23879
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Demyd
>Priority: Major
>
> When I alter the location of a non-empty table and then insert data into it, 
> I don't see the old data when querying through HS2, but I can still find it 
> in maprfs at the old table location.
> Steps to reproduce:
> {code:sql}
> 1. connect to hs2 by beeline:
>  hive --service beeline -u "jdbc:hive2://:1/;"
> 2. create test db:
>  create database dbtest1 location 'hdfs:///dbtest1.db';
> 3. create test table:
>  create table dbtest1.t1 (id int);
> 4. insert data to table:
>  insert into dbtest1.t1 (id) values (1);
> 5. set new table location:
>  alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';
> 6. insert data to table:
>  insert into dbtest1.t1 (id) values (2);
> {code}
> Actual result:
> {code:sql}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> +--------+
> 1 row selected (0.097 seconds)
> {code}
> Expected result:
> {code:sql}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> +--------+
> | 1      |
> +--------+
> 2 rows selected (0.097 seconds)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23879) Data has been lost after table location was altered

2020-07-20 Thread Demyd (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demyd updated HIVE-23879:
-
Description: 
When I alter the location of a non-empty table and then insert data into it, 
I don't see the old data when querying through HS2, but I can still find it in 
maprfs at the old table location.

Steps to reproduce:
{code:sql}
1. connect to hs2 by beeline:
 hive --service beeline -u "jdbc:hive2://:1/;"

2. create test db:
 create database dbtest1 location 'hdfs:///dbtest1.db';

3. create test table:
 create table dbtest1.t1 (id int);

4. insert data to table:
 insert into dbtest1.t1 (id) values (1);

5. set new table location:
 alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';

6. insert data to table:
 insert into dbtest1.t1 (id) values (2);
{code}

Actual result:
{code:sql}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
1 row selected (0.097 seconds)
{code}

Expected result:
{code:sql}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
| 1      |
+--------+
2 rows selected (0.097 seconds)
{code}


  was:
When I alter the location of a non-empty table and then insert data into it, 
I don't see the old data when querying through HS2, but I can still find it in 
maprfs at the old table location.

Steps to reproduce:
{code:java}
1. connect to hs2 by beeline:
 hive --service beeline -u "jdbc:hive2://:1/;"

2. create test db:
 create database dbtest1 location 'hdfs:///dbtest1.db';

3. create test table:
 create table dbtest1.t1 (id int);

4. insert data to table:
 insert into dbtest1.t1 (id) values (1);

5. set new table location:
 alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';

6. insert data to table:
 insert into dbtest1.t1 (id) values (2);
{code}

Actual result:
{code:java}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
1 row selected (0.097 seconds)
{code}

Expected result:
{code:java}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
| 1      |
+--------+
2 rows selected (0.097 seconds)
{code}


> Data has been lost after table location was altered
> ---
>
> Key: HIVE-23879
> URL: https://issues.apache.org/jira/browse/HIVE-23879
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Demyd
>Priority: Major
>
> When I alter the location of a non-empty table and then insert data into it, 
> I don't see the old data when querying through HS2, but I can still find it 
> in maprfs at the old table location.
> Steps to reproduce:
> {code:sql}
> 1. connect to hs2 by beeline:
>  hive --service beeline -u "jdbc:hive2://:1/;"
> 2. create test db:
>  create database dbtest1 location 'hdfs:///dbtest1.db';
> 3. create test table:
>  create table dbtest1.t1 (id int);
> 4. insert data to table:
>  insert into dbtest1.t1 (id) values (1);
> 5. set new table location:
>  alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';
> 6. insert data to table:
>  insert into dbtest1.t1 (id) values (2);
> {code}
> Actual result:
> {code:sql}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> +--------+
> 1 row selected (0.097 seconds)
> {code}
> Expected result:
> {code:sql}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> +--------+
> | 1      |
> +--------+
> 2 rows selected (0.097 seconds)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23879) Data has been lost after table location was altered

2020-07-20 Thread Demyd (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demyd updated HIVE-23879:
-
Description: 
When I alter the location of a non-empty table and then insert data into it, 
I don't see the old data when querying through HS2, but I can still find it in 
maprfs at the old table location.

Steps to reproduce:
{code:java}
1. connect to hs2 by beeline:
 hive --service beeline -u "jdbc:hive2://:1/;"

2. create test db:
 create database dbtest1 location 'hdfs:///dbtest1.db';

3. create test table:
 create table dbtest1.t1 (id int);

4. insert data to table:
 insert into dbtest1.t1 (id) values (1);

5. set new table location:
 alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';

6. insert data to table:
 insert into dbtest1.t1 (id) values (2);
{code}

Actual result:
{code:java}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
1 row selected (0.097 seconds)
{code}

Expected result:
{code:java}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
| 1      |
+--------+
2 rows selected (0.097 seconds)
{code}


  was:
When I alter the location of a non-empty table and then insert data into it, 
I don't see the old data when querying through HS2, but I can still find it in 
maprfs at the old table location.

Steps to reproduce:
{code}
1. connect to hs2 by beeline:
 hive --service beeline -u "jdbc:hive2://:1/;auth=maprsasl;ssl=true"

2. create test db:
 create database dbtest1 location 'maprfs:///dbtest1.db';

3. create test table:
 create table dbtest1.t1 (id int);

4. insert data to table:
 insert into dbtest1.t1 (id) values (1);

5. set new table location:
 alter table dbtest1.t1 set location 'maprfs:///dbtest1a/t1';

6. insert data to table:
 insert into dbtest1.t1 (id) values (2);
{code}

Actual result:
{code}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
1 row selected (0.097 seconds)
{code}

Expected result:

{code}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
| 1      |
+--------+
2 rows selected (0.097 seconds)
{code}


> Data has been lost after table location was altered
> ---
>
> Key: HIVE-23879
> URL: https://issues.apache.org/jira/browse/HIVE-23879
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Demyd
>Priority: Major
>
> When I alter the location of a non-empty table and then insert data into it, 
> I don't see the old data when querying through HS2, but I can still find it 
> in maprfs at the old table location.
> Steps to reproduce:
> {code:java}
> 1. connect to hs2 by beeline:
>  hive --service beeline -u "jdbc:hive2://:1/;"
> 2. create test db:
>  create database dbtest1 location 'hdfs:///dbtest1.db';
> 3. create test table:
>  create table dbtest1.t1 (id int);
> 4. insert data to table:
>  insert into dbtest1.t1 (id) values (1);
> 5. set new table location:
>  alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';
> 6. insert data to table:
>  insert into dbtest1.t1 (id) values (2);
> {code}
> Actual result:
> {code:java}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> +--------+
> 1 row selected (0.097 seconds)
> {code}
> Expected result:
> {code:java}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> +--------+
> | 1      |
> +--------+
> 2 rows selected (0.097 seconds)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23879) Data has been lost after table location was altered

2020-07-20 Thread Demyd (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demyd updated HIVE-23879:
-
Description: 
When I alter the location of a non-empty table and then insert data into it, 
I don't see the old data when querying through HS2, but I can still find it in 
maprfs at the old table location.

Steps to reproduce:
{code}
1. connect to hs2 by beeline:
 hive --service beeline -u "jdbc:hive2://:1/;auth=maprsasl;ssl=true"

2. create test db:
 create database dbtest1 location 'maprfs:///dbtest1.db';

3. create test table:
 create table dbtest1.t1 (id int);

4. insert data to table:
 insert into dbtest1.t1 (id) values (1);

5. set new table location:
 alter table dbtest1.t1 set location 'maprfs:///dbtest1a/t1';

6. insert data to table:
 insert into dbtest1.t1 (id) values (2);
{code}

Actual result:
{code}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
1 row selected (0.097 seconds)
{code}

Expected result:

{code}
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
| 1      |
+--------+
2 rows selected (0.097 seconds)
{code}


  was:
When I alter the location of a non-empty table and then insert data into it, 
I don't see the old data when querying through HS2, but I can still find it in 
maprfs at the old table location.

Steps to reproduce:
1. connect to hs2 by beeline:
hive --service beeline -u "jdbc:hive2://:1/;"

2. create test db:
create database dbtest1 location 'hdfs:///dbtest1.db';

3. create test table:
create table dbtest1.t1 (id int);

4. insert data to table:
insert into dbtest1.t1 (id) values (1);

5. set new table location:
alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';

6. insert data to table:
insert into dbtest1.t1 (id) values (2);

Actual result:
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
1 row selected (0.097 seconds)

Expected result:
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
| 1      |
+--------+
2 rows selected (0.097 seconds)


> Data has been lost after table location was altered
> ---
>
> Key: HIVE-23879
> URL: https://issues.apache.org/jira/browse/HIVE-23879
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Demyd
>Priority: Major
>
> When I alter the location of a non-empty table and then insert data into it, 
> I don't see the old data when querying through HS2, but I can still find it 
> in maprfs at the old table location.
> Steps to reproduce:
> {code}
> 1. connect to hs2 by beeline:
>  hive --service beeline -u "jdbc:hive2://:1/;auth=maprsasl;ssl=true"
> 2. create test db:
>  create database dbtest1 location 'maprfs:///dbtest1.db';
> 3. create test table:
>  create table dbtest1.t1 (id int);
> 4. insert data to table:
>  insert into dbtest1.t1 (id) values (1);
> 5. set new table location:
>  alter table dbtest1.t1 set location 'maprfs:///dbtest1a/t1';
> 6. insert data to table:
>  insert into dbtest1.t1 (id) values (2);
> {code}
> Actual result:
> {code}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> +--------+
> 1 row selected (0.097 seconds)
> {code}
> Expected result:
> {code}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> +--------+
> | 1      |
> +--------+
> 2 rows selected (0.097 seconds)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-07-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-23880:
---

Assignee: László Bodor

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)