[jira] [Work logged] (HIVE-24526) Get grouped locations of external table data using metatool.

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24526?focusedWorklogId=530548&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530548
 ]

ASF GitHub Bot logged work on HIVE-24526:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 04/Jan/21 07:49
Start Date: 04/Jan/21 07:49
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1768:
URL: https://github.com/apache/hive/pull/1768#discussion_r551156543



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/metatool/MetaToolTaskDiffExtTblLocs.java
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.tools.metatool;
+
+import org.codehaus.jettison.json.JSONArray;
+import org.codehaus.jettison.json.JSONException;
+import org.codehaus.jettison.json.JSONObject;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class MetaToolTaskDiffExtTblLocs extends MetaToolTask {
+  private static final Logger LOG = LoggerFactory.getLogger(MetaToolTaskDiffExtTblLocs.class);
+  @Override
+  void execute() {
+    String[] args = getCl().getDiffExtTblLocsParams();
+    try {
+      File file1 = new File(args[0]);
+      File file2 = new File(args[1]);
+      String ouputDir = args[2];

Review comment:
   Validate the args count and print a usage message on failure.
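
   For illustration, such a guard could look like the following minimal sketch (the usage text and the exact option name are assumptions, not taken from the patch):

   ```java
   // Hypothetical sketch of the requested validation: fail fast with a
   // usage message when the argument count is wrong.
   if (args == null || args.length != 3) {
     System.out.println("Usage: metatool -diffExtTblLocs <file1> <file2> <outputDir>");
     return;
   }
   ```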

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/metatool/MetaToolTaskDiffExtTblLocs.java
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.tools.metatool;
+
+import org.codehaus.jettison.json.JSONArray;
+import org.codehaus.jettison.json.JSONException;
+import org.codehaus.jettison.json.JSONObject;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class MetaToolTaskDiffExtTblLocs extends MetaToolTask {
+  private static final Logger LOG = LoggerFactory.getLogger(MetaToolTaskDiffExtTblLocs.class);
+  @Override
+  void execute() {
+    String[] args = getCl().getDiffExtTblLocsParams();
+    try {
+      File file1 = new File(args[0]);
+      File file2 = new File(args[1]);
+      String ouputDir = args[2];
+      String outFileName = "diff_" + System.currentTimeMillis();
+      System.out.println("Writing diff to " + outFileName);
+      if (!file1.exists()) {
+        System.out.println("Input " + args[0] + " does not exist.");
+        return;
+      }
+      if (!file2.exists()) {
+        System.out.println("Input " 

[jira] [Work logged] (HIVE-24519) Optimize MV: Materialized views should not rebuild when tables are not modified

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24519?focusedWorklogId=530546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530546
 ]

ASF GitHub Bot logged work on HIVE-24519:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 04/Jan/21 07:41
Start Date: 04/Jan/21 07:41
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1772:
URL: https://github.com/apache/hive/pull/1772#discussion_r551156318



##########
File path: ql/src/test/results/clientpositive/llap/materialized_view_rebuild_2.q.out
##########
@@ -0,0 +1,171 @@
+PREHOOK: query: create table t1(col0 int) stored as orc TBLPROPERTIES 
('transactional'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@t1
+POSTHOOK: query: create table t1(col0 int) stored as orc TBLPROPERTIES 
('transactional'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@t1
+PREHOOK: query: insert into t1(col0) values(1)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@t1
+POSTHOOK: query: insert into t1(col0) values(1)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@t1
+POSTHOOK: Lineage: t1.col0 SCRIPT []
+PREHOOK: query: create materialized view mat1 as
+select col0 from t1 where col0 = 1
+PREHOOK: type: CREATE_MATERIALIZED_VIEW
+PREHOOK: Input: default@t1
+PREHOOK: Output: database:default
+PREHOOK: Output: default@mat1
+POSTHOOK: query: create materialized view mat1 as
+select col0 from t1 where col0 = 1
+POSTHOOK: type: CREATE_MATERIALIZED_VIEW
+POSTHOOK: Input: default@t1
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@mat1
+Materialized view default.mat1 is up to date. Cancelling rebuild.
+PREHOOK: query: explain
+alter materialized view mat1 rebuild
+PREHOOK: type: ALTER_MATERIALIZED_VIEW_REBUILD
+POSTHOOK: query: explain
+alter materialized view mat1 rebuild
+POSTHOOK: type: ALTER_MATERIALIZED_VIEW_REBUILD
+STAGE DEPENDENCIES:
+
+STAGE PLANS:
+Materialized view default.mat1 is up to date. Cancelling rebuild.
+PREHOOK: query: alter materialized view mat1 rebuild
+PREHOOK: type: ALTER_MATERIALIZED_VIEW_REBUILD
+POSTHOOK: query: alter materialized view mat1 rebuild
+POSTHOOK: type: ALTER_MATERIALIZED_VIEW_REBUILD
+PREHOOK: query: insert into t1(col0) values(1)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@t1
+POSTHOOK: query: insert into t1(col0) values(1)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@t1
+POSTHOOK: Lineage: t1.col0 SCRIPT []
+PREHOOK: query: explain
+alter materialized view mat1 rebuild
+PREHOOK: type: QUERY
+PREHOOK: Input: default@t1
+PREHOOK: Output: default@mat1
+POSTHOOK: query: explain
+alter materialized view mat1 rebuild
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@t1
+POSTHOOK: Output: default@mat1
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-2 depends on stages: Stage-1
+  Stage-0 depends on stages: Stage-2
+  Stage-3 depends on stages: Stage-0
+  Stage-4 depends on stages: Stage-3
+
+STAGE PLANS:
+  Stage: Stage-1
+Tez
#### A masked pattern was here ####
+  Edges:
+Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
#### A masked pattern was here ####
+  Vertices:
+Map 1 
+Map Operator Tree:
+TableScan
+  alias: t1
+  filterExpr: ((ROW__ID.writeid > 1L) and (col0 = 1)) (type: 
boolean)
+  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
+  Filter Operator
+predicate: ((ROW__ID.writeid > 1L) and (col0 = 1)) (type: 
boolean)
+Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: COMPLETE
+Select Operator
+  expressions: 1 (type: int)
+  outputColumnNames: _col0
+  Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
+  File Output Operator
+compressed: false
+Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
+table:
+input format: 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
+output format: 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
+serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
+name: default.mat1
+  Select Operator
+expressions: _col0 (type: int)
+outputColumnNames: col0
+Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column 

[jira] [Work logged] (HIVE-24519) Optimize MV: Materialized views should not rebuild when tables are not modified

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24519?focusedWorklogId=530527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530527
 ]

ASF GitHub Bot logged work on HIVE-24519:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 04/Jan/21 07:00
Start Date: 04/Jan/21 07:00
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1772:
URL: https://github.com/apache/hive/pull/1772#discussion_r551144553



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java
##########
@@ -63,6 +66,20 @@ public void analyzeInternal(ASTNode root) throws SemanticException {
       unparseTranslator.addTableNameTranslation(tableTree, SessionState.get().getCurrentDatabase());
       return;
     }
+
+    try {
+      Boolean outdated = db.isOutdatedMaterializedView(getTxnMgr(), tableName);
+      if (outdated != null && !outdated) {
+        String msg = String.format("Materialized view %s.%s is up to date. Cancelling rebuild.",
+            tableName.getDb(), tableName.getTable());
+        LOG.info(msg);
+        console.printInfo(msg, false);

Review comment:
   Changed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 530527)
Time Spent: 1h  (was: 50m)

> Optimize MV: Materialized views should not rebuild when tables are not 
> modified
> --------------------------------------------------------------------------------
>
> Key: HIVE-24519
> URL: https://issues.apache.org/jira/browse/HIVE-24519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> e.g
> {noformat}
> create materialized view c_c_address as 
> select c_customer_sk from customer c, customer_address ca where 
> c_current_addr_sk = ca.ca_address_id;
> ALTER MATERIALIZED VIEW c_c_address REBUILD; <-- This shouldn't trigger 
> rebuild, when source tables are not modified
>  {noformat}
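
The hunk quoted above in this message implements that guard; condensed (a restatement of the quoted hunk, not additional patch code):

{code:java}
// Condensed from the hunk above: skip the rebuild when the view is
// known to be up to date with respect to its source tables.
Boolean outdated = db.isOutdatedMaterializedView(getTxnMgr(), tableName);
if (outdated != null && !outdated) {
  // Sources unmodified since the last rebuild: cancel the rebuild.
}
{code}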



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24519) Optimize MV: Materialized views should not rebuild when tables are not modified

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24519?focusedWorklogId=530525&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530525
 ]

ASF GitHub Bot logged work on HIVE-24519:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 04/Jan/21 06:56
Start Date: 04/Jan/21 06:56
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1772:
URL: https://github.com/apache/hive/pull/1772#discussion_r551143401



##########
File path: ql/src/test/results/clientpositive/llap/materialized_view_rebuild_2.q.out
##########
@@ -0,0 +1,171 @@
+PREHOOK: query: create table t1(col0 int) stored as orc TBLPROPERTIES 
('transactional'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@t1
+POSTHOOK: query: create table t1(col0 int) stored as orc TBLPROPERTIES 
('transactional'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@t1
+PREHOOK: query: insert into t1(col0) values(1)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@t1
+POSTHOOK: query: insert into t1(col0) values(1)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@t1
+POSTHOOK: Lineage: t1.col0 SCRIPT []
+PREHOOK: query: create materialized view mat1 as
+select col0 from t1 where col0 = 1
+PREHOOK: type: CREATE_MATERIALIZED_VIEW
+PREHOOK: Input: default@t1
+PREHOOK: Output: database:default
+PREHOOK: Output: default@mat1
+POSTHOOK: query: create materialized view mat1 as
+select col0 from t1 where col0 = 1
+POSTHOOK: type: CREATE_MATERIALIZED_VIEW
+POSTHOOK: Input: default@t1
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@mat1
+Materialized view default.mat1 is up to date. Cancelling rebuild.

Review comment:
   This message belongs to the next command output:
   ```
   explain
   alter materialized view mat1 rebuild;
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 530525)
Time Spent: 50m  (was: 40m)

> Optimize MV: Materialized views should not rebuild when tables are not 
> modified
> --------------------------------------------------------------------------------
>
> Key: HIVE-24519
> URL: https://issues.apache.org/jira/browse/HIVE-24519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> e.g
> {noformat}
> create materialized view c_c_address as 
> select c_customer_sk from customer c, customer_address ca where 
> c_current_addr_sk = ca.ca_address_id;
> ALTER MATERIALIZED VIEW c_c_address REBUILD; <-- This shouldn't trigger 
> rebuild, when source tables are not modified
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-01-03 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24579:
-----------------------------
Priority: Major  (was: Critical)

> Incorrect Result For Groupby With Limit
> ---------------------------------------
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes an incorrect result.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   properties:
> COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns id
> columns.comments 
> columns.types int
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location file:/user/hive/warehouse/test
> name default.test
> numFiles 0
> numRows 0
> rawDataSize 0
> serialization.ddl struct test { i32 id}
> serialization.format 1
> serialization.lib 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 0
> transient_lastDdlTime 1609730190
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
>   COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
>   bucket_count -1
>   bucketing_version 2
>   column.name.delimiter ,
>   columns id
>   columns.comments 
>   columns.types int
>   file.inputformat 
> org.apache.hadoop.mapred.TextInputFormat
>   file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   location file:/user/hive/warehouse/test
>   name default.test
>   numFiles 0
>   

[jira] [Updated] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-01-03 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24579:
-----------------------------
Description: 
{code:sql}
create table test(id int);
explain extended select id,count(*) from test group by id limit 10;
{code}

There is an unexpected TopN in the map phase, which causes an incorrect result.


{code:sql}
STAGE PLANS:
  Stage: Stage-1
Tez
  DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
  DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: test
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  GatherStats: false
  Select Operator
expressions: id (type: int)
outputColumnNames: id
Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  aggregations: count()
  keys: id (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
tag: -1
TopN: 10
TopN Hash Memory Usage: 0.1
value expressions: _col1 (type: bigint)
auto parallelism: true
Execution mode: vectorized
Path -> Alias:
  file:/user/hive/warehouse/test [test]
Path -> Partition:
  file:/user/hive/warehouse/test 
Partition
  base file name: test
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  properties:
COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
bucket_count -1
bucketing_version 2
column.name.delimiter ,
columns id
columns.comments 
columns.types int
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location file:/user/hive/warehouse/test
name default.test
numFiles 0
numRows 0
rawDataSize 0
serialization.ddl struct test { i32 id}
serialization.format 1
serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 0
transient_lastDdlTime 1609730190
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
  COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
  bucket_count -1
  bucketing_version 2
  column.name.delimiter ,
  columns id
  columns.comments 
  columns.types int
  file.inputformat org.apache.hadoop.mapred.TextInputFormat
  file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  location file:/user/hive/warehouse/test
  name default.test
  numFiles 0
  numRows 0
  rawDataSize 0
  serialization.ddl struct test { i32 id}
  serialization.format 1
  serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  totalSize 0
  transient_lastDdlTime 1609730190
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.test
  name: default.test
Truncated Path -> Alias:
  /test [test]
Reducer 

[jira] [Updated] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-01-03 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24579:
-----------------------------
Description: 
{code:sql}
create table test(id int);
explain extended select id,count(*) from test group by id limit 10;
{code}

There is an unexpected TopN in the map phase, which causes an incorrect result.


{code:sql}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
  DagId: root_20210104140946_940cd4ce-8bb5-41ac-91ec-1185245da009:4
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
  DagName: root_20210104140946_940cd4ce-8bb5-41ac-91ec-1185245da009:4
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: test
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  GatherStats: false
  Select Operator
expressions: id (type: int)
outputColumnNames: id
Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
Top N Key Operator
  sort order: +
  keys: id (type: int)
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  top n: 10
  Group By Operator
aggregations: count()
keys: id (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
Reduce Output Operator
  key expressions: _col0 (type: int)
  null sort order: a
  sort order: +
  Map-reduce partition columns: _col0 (type: int)
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  tag: -1
  TopN: 10
  TopN Hash Memory Usage: 0.1
  value expressions: _col1 (type: bigint)
  auto parallelism: true
Execution mode: vectorized
Path -> Alias:
  file:/user/hive/warehouse/test [test]
Path -> Partition:
  file:/user/hive/warehouse/test 
Partition
  base file name: test
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  properties:
COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
bucket_count -1
bucketing_version 2
column.name.delimiter ,
columns id
columns.comments 
columns.types int
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location file:/user/hive/warehouse/test
name default.test
numFiles 0
numRows 0
rawDataSize 0
serialization.ddl struct test { i32 id}
serialization.format 1
serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 0
transient_lastDdlTime 1609730190
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
  COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
  bucket_count -1
  bucketing_version 2
  column.name.delimiter ,
  columns id
  columns.comments 
  columns.types int
  file.inputformat org.apache.hadoop.mapred.TextInputFormat
  file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  location file:/user/hive/warehouse/test
  name default.test
  numFiles 0
  numRows 0
  rawDataSize 0
  serialization.ddl struct test { i32 id}
  serialization.format 1
  serialization.lib 

[jira] [Work logged] (HIVE-15820) comment at the head of beeline -e

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15820?focusedWorklogId=530520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530520
 ]

ASF GitHub Bot logged work on HIVE-15820:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 04/Jan/21 06:11
Start Date: 04/Jan/21 06:11
Worklog Time Spent: 10m 
  Work Description: ujc714 commented on pull request #1814:
URL: https://github.com/apache/hive/pull/1814#issuecomment-753778414


   The function trim() is called by HiveStringUtils.removeComments(String, int[]). Because HiveStringUtils.removeComments(String, int[]) processes a multiline statement as a single-line string, only the leading spaces of the first line and the trailing spaces of the last line are removed. After I replaced HiveStringUtils.removeComments(String, int[]) with HiveStringUtils.removeComments(String), the leading and trailing spaces of every line are trimmed. That is why 6 tests failed.
   
   Rather than change HiveStringUtils.removeComments(String, int[]), I changed the test files, since the new behaviour is the expected one.
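
   The difference boils down to trimming the whole payload once versus trimming each line. A minimal standalone sketch of the two behaviours (plain String handling for illustration, not the HiveStringUtils implementation):

   ```java
   // Whole-string trim: only the first line's leading spaces and the
   // last line's trailing spaces disappear; inner padding survives.
   String sql = "  -- comment\n  select 1  \n";
   String wholeTrim = sql.trim();

   // Per-line trim: every line loses its leading and trailing spaces.
   String perLineTrim = String.join("\n",
       java.util.Arrays.stream(sql.split("\n"))
           .map(String::trim)
           .toArray(String[]::new));
   ```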



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 530520)
Time Spent: 20m  (was: 10m)

> comment at the head of beeline -e
> ---------------------------------
>
> Key: HIVE-15820
> URL: https://issues.apache.org/jira/browse/HIVE-15820
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.1, 2.1.1
>Reporter: muxin
>Assignee: muxin
>Priority: Major
>  Labels: patch, pull-request-available
> Attachments: HIVE-15820.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> $ beeline -u jdbc:hive2://localhost:10000 -n test -e "
> > --asdfasdfasdfasdf
> > select * from test_table;
> > "
> The expected result of the above command should be all rows of test_table (the same 
> as when run in beeline interactive mode), but it does not output anything.
> The cause is that the -e option reads the commands as one string, and the method 
> dispatch(String line) first calls the function isComment(String line), which uses
>  'lineTrimmed.startsWith("#") || lineTrimmed.startsWith("--")' 
> to treat the whole command string as a comment.
> Two ways can be considered to fix this problem:
> 1. in method initArgs(String[] args), split the command by '\n' into a command list 
> before dispatching when cl.getOptionValues('e') != null (a sketch of this option 
> follows the code below)
> 2. in method dispatch(String line), remove comments using this:
> static String removeComments(String line) {
>   if (line == null || line.isEmpty()) {
>     return line;
>   }
>   StringBuilder builder = new StringBuilder();
>   int escape = -1;
>   for (int index = 0; index < line.length(); index++) {
>     if (index < line.length() - 1 && line.charAt(index) == line.charAt(index + 1)) {
>       if (escape == -1 && line.charAt(index) == '-') {
>         // find \n as the end of comment
>         index = line.indexOf('\n', index + 1);
>         // there is no sql after this comment, so just break out
>         if (-1 == index) {
>           break;
>         }
>       }
>     }
>     char letter = line.charAt(index);
>     if (letter == escape) {
>       escape = -1; // Turn escape off.
>     } else if (escape == -1 && (letter == '\'' || letter == '"')) {
>       escape = letter; // Turn escape on.
>     }
>     builder.append(letter);
>   }
>   return builder.toString();
> }
> The second way can be a general solution to remove all comments starting with 
> '--' in a SQL statement.
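
A hypothetical sketch of the first option (the field and method names are illustrative, not Beeline's actual ones): split the -e payload into lines before dispatching, so a leading "--" line comments out only itself rather than the whole payload.

{code:java}
// Hypothetical sketch, not the actual Beeline code.
String payload = cl.getOptionValues('e')[0]; // the -e argument as one string
for (String command : payload.split("\n")) {
  if (!command.trim().isEmpty()) {
    dispatch(command); // per-line dispatch, as in interactive mode
  }
}
{code}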



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24502) Store table level regular expression used during dump for table level replication

2021-01-03 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-24502:
-----------------------------------
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Merged to master. Thanks for the patch [~aasha] and for the review [~pkumarsinha].

> Store table level regular expression used during dump for table level 
> replication
> ----------------------------------------------------------------------------------
>
> Key: HIVE-24502
> URL: https://issues.apache.org/jira/browse/HIVE-24502
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24502.01.patch, HIVE-24502.02.patch, 
> HIVE-24502.03.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Store include table list and exclude table list as part of the dump metadata file



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24502) Store table level regular expression used during dump for table level replication

2021-01-03 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257945#comment-17257945
 ] 

Pravin Sinha commented on HIVE-24502:
-------------------------------------

+1

> Store table level regular expression used during dump for table level 
> replication
> ----------------------------------------------------------------------------------
>
> Key: HIVE-24502
> URL: https://issues.apache.org/jira/browse/HIVE-24502
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24502.01.patch, HIVE-24502.02.patch, 
> HIVE-24502.03.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Store include table list and exclude table list as part of the dump metadata file



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=530507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530507
 ]

ASF GitHub Bot logged work on HIVE-24471:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 04/Jan/21 04:07
Start Date: 04/Jan/21 04:07
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r551109482



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByCombiner.java
##########
@@ -0,0 +1,282 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByCombiner;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.BaseWork;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.Serializer;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+
+// Combiner for normal group by operator. In case of map side aggregate, the partially
+// aggregated records are sorted based on group by key. If because of some reasons, like hash
+// table memory exceeded the limit or the first few batches of records have less ndvs, the
+// aggregation is not done, then here the aggregation can be done cheaply as the records
+// are sorted based on group by key.
+public class GroupByCombiner extends VectorGroupByCombiner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(
+      org.apache.hadoop.hive.ql.exec.GroupByCombiner.class.getName());
+
+  private transient GenericUDAFEvaluator[] aggregationEvaluators;
+  Deserializer valueDeserializer;
+  GenericUDAFEvaluator.AggregationBuffer[] aggregationBuffers;
+  GroupByOperator groupByOperator;
+  Serializer valueSerializer;
+  ObjectInspector aggrObjectInspector;
+  DataInputBuffer valueBuffer;
+  Object[] cachedValues;
+
+  public GroupByCombiner(TaskContext taskContext) throws HiveException, IOException {
+    super(taskContext);
+    if (rw != null) {
+      try {
+        groupByOperator = (GroupByOperator) rw.getReducer();
+
+        ArrayList<ObjectInspector> ois = new ArrayList<>();
+        ois.add(keyObjectInspector);
+        ois.add(valueObjectInspector);
+        ObjectInspector[] rowObjectInspector = new ObjectInspector[1];
+        rowObjectInspector[0] =
+            ObjectInspectorFactory.getStandardStructObjectInspector(Utilities.reduceFieldNameList,
+                ois);
+        groupByOperator.setInputObjInspectors(rowObjectInspector);
+        groupByOperator.initializeOp(conf);
+        aggregationBuffers = groupByOperator.getAggregationBuffers();
+        aggregationEvaluators = groupByOperator.getAggregationEvaluator();
+
+        TableDesc valueTableDesc = rw.getTagToValueDesc().get(0);
+        if 

[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=530508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530508
 ]

ASF GitHub Bot logged work on HIVE-24471:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 04/Jan/21 04:07
Start Date: 04/Jan/21 04:07
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r551109525



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByCombiner.java
##########
@@ -0,0 +1,282 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByCombiner;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.BaseWork;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.Serializer;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+
+// Combiner for normal group by operator. In case of map side aggregate, the partially
+// aggregated records are sorted based on group by key. If because of some reasons, like hash
+// table memory exceeded the limit or the first few batches of records have less ndvs, the
+// aggregation is not done, then here the aggregation can be done cheaply as the records
+// are sorted based on group by key.
+public class GroupByCombiner extends VectorGroupByCombiner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(
+      org.apache.hadoop.hive.ql.exec.GroupByCombiner.class.getName());
+
+  private transient GenericUDAFEvaluator[] aggregationEvaluators;
+  Deserializer valueDeserializer;
+  GenericUDAFEvaluator.AggregationBuffer[] aggregationBuffers;
+  GroupByOperator groupByOperator;
+  Serializer valueSerializer;
+  ObjectInspector aggrObjectInspector;
+  DataInputBuffer valueBuffer;
+  Object[] cachedValues;
+
+  public GroupByCombiner(TaskContext taskContext) throws HiveException, IOException {
+    super(taskContext);
+    if (rw != null) {
+      try {
+        groupByOperator = (GroupByOperator) rw.getReducer();
+
+        ArrayList<ObjectInspector> ois = new ArrayList<>();
+        ois.add(keyObjectInspector);
+        ois.add(valueObjectInspector);
+        ObjectInspector[] rowObjectInspector = new ObjectInspector[1];
+        rowObjectInspector[0] =
+            ObjectInspectorFactory.getStandardStructObjectInspector(Utilities.reduceFieldNameList,
+                ois);
+        groupByOperator.setInputObjInspectors(rowObjectInspector);
+        groupByOperator.initializeOp(conf);
+        aggregationBuffers = groupByOperator.getAggregationBuffers();
+        aggregationEvaluators = groupByOperator.getAggregationEvaluator();
+
+        TableDesc valueTableDesc = rw.getTagToValueDesc().get(0);
+        if 

[jira] [Work logged] (HIVE-24434) Filter out materialized views for rewriting if plan pattern is not allowed

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24434?focusedWorklogId=530505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530505
 ]

ASF GitHub Bot logged work on HIVE-24434:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 04/Jan/21 03:53
Start Date: 04/Jan/21 03:53
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1782:
URL: https://github.com/apache/hive/pull/1782#discussion_r551107074



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -1831,7 +1832,8 @@ public RelOptMaterialization getMaterializedViewForRebuild(String dbName, String
 
   private List getValidMaterializedViews(List materializedViewTables,
       List tablesUsed, boolean forceMVContentsUpToDate, boolean expandGroupingSets,
-      HiveTxnManager txnMgr) throws HiveException {
+      HiveTxnManager txnMgr, EnumSet scope)

Review comment:
   We already have a `Materialization` class, so I renamed this one to `HiveRelOptMaterialization`, which fixed this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 530505)
Time Spent: 50m  (was: 40m)

> Filter out materialized views for rewriting if plan pattern is not allowed
> ---------------------------------------------------------------------------
>
> Key: HIVE-24434
> URL: https://issues.apache.org/jira/browse/HIVE-24434
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Some materialized views are not enabled for Calcite-based rewriting. Rules 
> for validating materialized views are implemented by HIVE-20748. 
> Since text-based materialized view query rewrite doesn't have such 
> limitations, some logic must be implemented to flag each materialized view as 
> enabled for text-based rewrite only or for both.
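
A rough illustration of such a flag (hypothetical names, not necessarily Hive's actual classes):

{code:java}
import java.util.EnumSet;

// Hypothetical sketch: tag each materialized view with the rewrite
// engines that are allowed to use it.
enum RewriteAlgorithm { CALCITE, TEXT }

class RewriteScopeExample {
  // A view whose plan pattern is not allowed for Calcite-based rewrite
  // would be flagged for text-based rewrite only:
  static final EnumSet<RewriteAlgorithm> TEXT_ONLY = EnumSet.of(RewriteAlgorithm.TEXT);
  // A view with a supported plan is enabled for both:
  static final EnumSet<RewriteAlgorithm> ALL = EnumSet.allOf(RewriteAlgorithm.class);
}
{code}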



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24434) Filter out materialized views for rewriting if plan pattern is not allowed

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24434?focusedWorklogId=530504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530504
 ]

ASF GitHub Bot logged work on HIVE-24434:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 04/Jan/21 03:52
Start Date: 04/Jan/21 03:52
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1782:
URL: https://github.com/apache/hive/pull/1782#discussion_r551106837



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Materialization.java
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.metadata;
+
+import org.apache.calcite.plan.RelOptMaterialization;
+
+import java.util.EnumSet;
+
+import static org.apache.commons.collections.CollectionUtils.intersection;
+
+/**
+ * Wrapper class of {@link RelOptMaterialization} and corresponding flags.
+ */
+public class Materialization {

Review comment:
   Changed this to extend `RelOptMaterialization`. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 530504)
Time Spent: 40m  (was: 0.5h)

> Filter out materialized views for rewriting if plan pattern is not allowed
> ---------------------------------------------------------------------------
>
> Key: HIVE-24434
> URL: https://issues.apache.org/jira/browse/HIVE-24434
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Some materialized views are not enabled for Calcite-based rewriting. Rules 
> for validating materialized views are implemented by HIVE-20748. 
> Since text-based materialized view query rewrite doesn't have such 
> limitations, some logic must be implemented to flag each materialized view as 
> enabled for text-based rewrite only or for both.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24434) Filter out materialized views for rewriting if plan pattern is not allowed

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24434?focusedWorklogId=530503&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530503
 ]

ASF GitHub Bot logged work on HIVE-24434:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 04/Jan/21 03:51
Start Date: 04/Jan/21 03:51
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1782:
URL: https://github.com/apache/hive/pull/1782#discussion_r551106677



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CBOPlan.java
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.calcite.rel.RelNode;
+
+/**
+ * Wrapper of Calcite plan.
+ */
+public class CBOPlan {
+  private final RelNode plan;
+  private final String invalidAutomaticRewritingMaterializationReason;
+
+  public CBOPlan(RelNode plan, String invalidAutomaticRewritingMaterializationReason) {
+    this.plan = plan;
+    this.invalidAutomaticRewritingMaterializationReason = invalidAutomaticRewritingMaterializationReason;
+  }
+
+  /**
+   * Root node of plan.
+   * @return Root {@link RelNode}
+   */
+  public RelNode getPlan() {
+return plan;
+  }
+
+  /**
+   * Returns an error message if this plan can not be a definition of a Materialized view which is an input of
+   * Calcite based materialized view query rewrite.
+   * null or empty string otherwise.

Review comment:
   fixed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 530503)
Time Spent: 0.5h  (was: 20m)

> Filter out materialized views for rewriting if plan pattern is not allowed
> ---------------------------------------------------------------------------
>
> Key: HIVE-24434
> URL: https://issues.apache.org/jira/browse/HIVE-24434
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Some materialized views are not enabled for Calcite-based rewriting. Rules 
> for validating materialized views are implemented by HIVE-20748. 
> Since text-based materialized view query rewrite doesn't have such 
> limitations, some logic must be implemented to flag each materialized view as 
> enabled for text-based rewrite only or for both.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-01-03 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24579:
-----------------------------
Description: 
{code:sql}
create table test(id int);
explain extended select id,count(*) from test group by id limit 10;
{code}

There is an unexpected TopN in the map phase, which causes an incorrect result.


{code:sql}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
  DagId: root_20210104113831_2451d621-8f77-4a29-9da6-3a65bc4d9e56:2
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
  DagName: root_20210104113831_2451d621-8f77-4a29-9da6-3a65bc4d9e56:2
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: test
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  GatherStats: false
  Select Operator
expressions: id (type: int)
outputColumnNames: id
Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  aggregations: count()
  keys: id (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
tag: -1
value expressions: _col1 (type: bigint)
auto parallelism: true
Execution mode: vectorized
Path -> Alias:
  file:/user/hive/warehouse/test [test]
Path -> Partition:
  file:/user/hive/warehouse/test 
Partition
  base file name: test
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  properties:
COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
bucket_count -1
bucketing_version 2
column.name.delimiter ,
columns id
columns.comments 
columns.types int
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location file:/user/hive/warehouse/test
name default.test
numFiles 0
numRows 0
rawDataSize 0
serialization.ddl struct test { i32 id}
serialization.format 1
serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 0
transient_lastDdlTime 1609730190
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
  COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
  bucket_count -1
  bucketing_version 2
  column.name.delimiter ,
  columns id
  columns.comments 
  columns.types int
  file.inputformat org.apache.hadoop.mapred.TextInputFormat
  file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  location file:/user/hive/warehouse/test
  name default.test
  numFiles 0
  numRows 0
  rawDataSize 0
  serialization.ddl struct test { i32 id}
  serialization.format 1
  serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  totalSize 0
  transient_lastDdlTime 1609730190
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.test
  name: default.test
Truncated Path -> Alias:
  /test [test]
Reducer 

[jira] [Updated] (HIVE-24580) Add support for combiner in hash mode group aggregation (Support for distinct)

2021-01-03 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24580:
---------------------------------------
Labels:   (was: pull-request-available)

> Add support for combiner in hash mode group aggregation (Support for distinct)
> -------------------------------------------------------------------------------
>
> Key: HIVE-24580
> URL: https://issues.apache.org/jira/browse/HIVE-24580
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> For distinct, the number of aggregation functions does not match the number of 
> value columns, and this needs special handling in the combiner logic.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24580) Add support for combiner in hash mode group aggregation (Support for distinct)

2021-01-03 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24580:
---------------------------------------
Description: For distinct, the number of aggregation functions does not match the 
number of value columns, and this needs special handling in the combiner logic.  
(was: In map side group aggregation, partial grouped aggregation is calculated to 
reduce the data written to disk by the map task. In case of hash aggregation, where 
the input data is not sorted, a hash table is used (with sorting also being performed 
before flushing). If the hash table size increases beyond a configurable limit, data 
is flushed to disk and a new hash table is generated. If the reduction achieved by 
the hash table is less than the min hash aggregation reduction calculated during 
compile time, the map side aggregation is converted to streaming mode. So if the 
first few batches of records do not result in a significant reduction, the mode is 
switched to streaming mode. This may impact performance if the subsequent batches of 
records have fewer distinct values. 

To improve performance in both hash and streaming mode, a combiner can be added to 
the map task after the keys are sorted. This will make sure that the aggregation is 
done where possible and will reduce the data written to disk.)

> Add support for combiner in hash mode group aggregation (Support for distinct)
> -------------------------------------------------------------------------------
>
> Key: HIVE-24580
> URL: https://issues.apache.org/jira/browse/HIVE-24580
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> For distinct, the number of aggregation functions does not match the number of 
> value columns, and this needs special handling in the combiner logic.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-01-03 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257932#comment-17257932
 ] 

Nemon Lou commented on HIVE-24579:
----------------------------------

A workaround is to set hive.limit.pushdown.memory.usage=0 (i.e. run "set hive.limit.pushdown.memory.usage=0;" before the query).

 

> Incorrect Result For Groupby With Limit
> ---------------------------------------
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Critical
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes an incorrect result.
> {code:sql}
> STAGE PLANS:
>  Stage: Stage-1
>  Map Reduce
>  Map Operator Tree:
>  TableScan
>  alias: test
>  Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column 
> stats: NONE
>  GatherStats: false
>  Select Operator
>  expressions: id (type: int)
>  outputColumnNames: id
>  Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column 
> stats: NONE
>  Group By Operator
>  aggregations: count()
>  keys: id (type: int)
>  mode: hash
>  outputColumnNames: _col0, _col1
>  Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column 
> stats: NONE
>  Reduce Output Operator
>  key expressions: _col0 (type: int)
>  null sort order: a
>  sort order: +
>  Map-reduce partition columns: _col0 (type: int)
>  Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column 
> stats: NONE
>  tag: -1
>  TopN: 10
>  TopN Hash Memory Usage: 0.1
>  value expressions: _col1 (type: bigint)
>  auto parallelism: false
>  Path -> Alias:
>  file:/user/hive/warehouse/test [test]
>  Path -> Partition:
>  file:/user/hive/warehouse/test 
>  Partition
>  base file name: test
>  input format: org.apache.hadoop.mapred.TextInputFormat
>  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>  properties:
>  COLUMN_STATS_ACCURATE \{"BASIC_STATS":"true"}
>  bucket_count -1
>  column.name.delimiter ,
>  columns id
>  columns.comments 
>  columns.types int
>  file.inputformat org.apache.hadoop.mapred.TextInputFormat
>  file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>  location file:/user/hive/warehouse/test
>  name default.test
>  numFiles 0
>  numRows 0
>  rawDataSize 0
>  serialization.ddl struct test \{ i32 id}
>  serialization.format 1
>  serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>  totalSize 0
>  transient_lastDdlTime 1609730036
>  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>  
>  input format: org.apache.hadoop.mapred.TextInputFormat
>  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>  properties:
>  COLUMN_STATS_ACCURATE \{"BASIC_STATS":"true"}
>  bucket_count -1
>  column.name.delimiter ,
>  columns id
>  columns.comments 
>  columns.types int
>  file.inputformat org.apache.hadoop.mapred.TextInputFormat
>  file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>  location file:/user/hive/warehouse/test
>  name default.test
>  numFiles 0
>  numRows 0
>  rawDataSize 0
>  serialization.ddl struct test \{ i32 id}
>  serialization.format 1
>  serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>  totalSize 0
>  transient_lastDdlTime 1609730036
>  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>  name: default.test
>  name: default.test
>  Truncated Path -> Alias:
>  /test [test]
>  Needs Tagging: false
>  Reduce Operator Tree:
>  Group By Operator
>  aggregations: count(VALUE._col0)
>  keys: KEY._col0 (type: int)
>  mode: mergepartial
>  outputColumnNames: _col0, _col1
>  Statistics: Num rows: 168 Data size: 672 Basic stats: COMPLETE Column stats: 
> NONE
>  Limit
>  Number of rows: 10
>  Statistics: Num rows: 10 Data size: 40 Basic stats: COMPLETE Column stats: 
> NONE
>  File Output Operator
>  compressed: false
>  GlobalTableId: 0
>  directory: 
> file:/tmp/root/bd08973b-b58c-4185-9072-c1891f67878d/hive_2021-01-04_11-14-01_745_4475755683092435506-1/-mr-10001/.hive-staging_hive_2021-01-04_11-14-01_745_4475755683092435506-1/-ext-10002
>  NumFilesPerFileSink: 1
>  Statistics: Num rows: 10 Data size: 40 Basic stats: COMPLETE Column stats: 
> NONE
>  Stats Publishing Key Prefix: 
> file:/tmp/root/bd08973b-b58c-4185-9072-c1891f67878d/hive_2021-01-04_11-14-01_745_4475755683092435506-1/-mr-10001/.hive-staging_hive_2021-01-04_11-14-01_745_4475755683092435506-1/-ext-10002/
>  table:
>  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>  properties:
>  columns _col0,_col1
>  columns.types int:bigint
>  escape.delim \
>  hive.serialization.extend.additional.nesting.levels true
>  serialization.escape.crlf true
>  

[jira] [Updated] (HIVE-24580) Add support for combiner in hash mode group aggregation (Support for distinct)

2021-01-03 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24580:
---
Parent: HIVE-24471
Issue Type: Sub-task  (was: Bug)

> Add support for combiner in hash mode group aggregation (Support for distinct)
> --
>
> Key: HIVE-24580
> URL: https://issues.apache.org/jira/browse/HIVE-24580
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> In map-side group aggregation, a partial grouped aggregation is calculated to 
> reduce the data written to disk by the map task. In the case of hash 
> aggregation, where the input data is not sorted, a hash table is used (with 
> sorting also being performed before flushing). If the hash table size grows 
> beyond a configurable limit, the data is flushed to disk and a new hash table 
> is created. If the reduction achieved by the hash table is less than the 
> minimum hash aggregation reduction estimated at compile time, the map-side 
> aggregation is converted to streaming mode. So if the first few batches of 
> records do not yield a significant reduction, the mode is switched to 
> streaming. This may hurt performance if the subsequent batches of records 
> have fewer distinct values. 
> To improve performance in both hash and streaming modes, a combiner can be 
> added to the map task after the keys are sorted. This ensures that the 
> aggregation is performed where possible and reduces the data written to disk.
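
A hedged sketch of the configuration knobs behind the behaviour described 
above (the property names are from Hive's standard configuration; the table t 
is illustrative):

{code:sql}
-- Map-side hash aggregation and its fallback-to-streaming thresholds.
set hive.map.aggr=true;                    -- enable map-side aggregation
set hive.map.aggr.hash.percentmemory=0.5;  -- memory fraction before the hash table is flushed
set hive.map.aggr.hash.min.reduction=0.5;  -- minimum reduction ratio before switching to streaming
select id, count(*) from t group by id;
{code}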



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24580) Add support for combiner in hash mode group aggregation (Support for distinct)

2021-01-03 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24580:
--


> Add support for combiner in hash mode group aggregation (Support for distinct)
> --
>
> Key: HIVE-24580
> URL: https://issues.apache.org/jira/browse/HIVE-24580
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> In map-side group aggregation, a partial grouped aggregation is calculated to 
> reduce the data written to disk by the map task. In the case of hash 
> aggregation, where the input data is not sorted, a hash table is used (with 
> sorting also being performed before flushing). If the hash table size grows 
> beyond a configurable limit, the data is flushed to disk and a new hash table 
> is created. If the reduction achieved by the hash table is less than the 
> minimum hash aggregation reduction estimated at compile time, the map-side 
> aggregation is converted to streaming mode. So if the first few batches of 
> records do not yield a significant reduction, the mode is switched to 
> streaming. This may hurt performance if the subsequent batches of records 
> have fewer distinct values. 
> To improve performance in both hash and streaming modes, a combiner can be 
> added to the map task after the keys are sorted. This ensures that the 
> aggregation is performed where possible and reduces the data written to disk.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24302) Cleaner shouldn't run if it can't remove obsolete files

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24302?focusedWorklogId=530498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530498
 ]

ASF GitHub Bot logged work on HIVE-24302:
-

Author: ASF GitHub Bot
Created on: 04/Jan/21 01:10
Start Date: 04/Jan/21 01:10
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1612:
URL: https://github.com/apache/hive/pull/1612


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 530498)
Time Spent: 50m  (was: 40m)

> Cleaner shouldn't run if it can't remove obsolete files
> ---
>
> Key: HIVE-24302
> URL: https://issues.apache.org/jira/browse/HIVE-24302
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Example:
>  # open txn 5, leave it open (maybe it's a long-running compaction)
>  # insert into table t in txns 6, 7 with writeids 1, 2
>  # compactor.Worker runs on table t and compacts writeids 1, 2
>  # compactor.Cleaner picks up the compaction queue entry, but doesn't delete 
> any files because the min global open txnid is 5, which cannot see writeIds 
> 1, 2.
>  # Cleaner marks the compactor queue entry as cleaned and removes the entry 
> from the queue.
> delta_1 and delta_2 will remain in the file system until another compaction 
> is run on table t.
> Step 5 should not happen; we should skip calling markCleaned() and leave the 
> entry in the queue in the "ready to clean" state. markCleaned() should be 
> called only after txn 5 is closed and, following that, the cleaner runs 
> successfully.
> This will potentially slow down the cleaner, but on the other hand it won't 
> silently "fail", i.e. fail to do its job.
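
A hedged SQL sketch of the scenario above (the table t, the txn numbers, and 
the delta directory names are illustrative):

{code:sql}
-- session A: a long-running operation holds txn 5 open
-- session B:
insert into t values (1);        -- txn 6, writeId 1 -> delta_1
insert into t values (2);        -- txn 7, writeId 2 -> delta_2
alter table t compact 'major';   -- Worker compacts writeIds 1 and 2
-- The Cleaner then runs while txn 5 is still open: it cannot remove
-- delta_1/delta_2, yet it marks the queue entry as cleaned anyway.
{code}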



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.6

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=530491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530491
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 03/Jan/21 23:46
Start Date: 03/Jan/21 23:46
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-753694143


   It seems the current Config File provider for Jenkins does not allow 
snapshots for testing Hive (needed for the above changes until we do an ORC 
release).
   @kgyrtkirk any idea if we could add Apache snapshots to the existing Jenkins 
maven repo configuration? It looks like access to that Config file is 
restricted.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 530491)
Time Spent: 40m  (was: 0.5h)

> Upgrade ORC version to 1.6.6
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on the 1.5.X line and, in order to take advantage 
> of the latest ORC improvements such as column encryption, we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store File Footers, 
> Tails, and streams, and to un/compress RG data. As there were many internal 
> changes from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), 
> the upgrade is not straightforward.
> This umbrella Jira tracks the upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.6

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=530485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530485
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 03/Jan/21 22:45
Start Date: 03/Jan/21 22:45
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #1823:
URL: https://github.com/apache/hive/pull/1823


   ### What changes were proposed in this pull request?
   Bump the Apache ORC version to 1.6.6
   
   ### Why are the changes needed?
   So Hive can take advantage of the latest ORC features and bug fixes
   
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   
   ### How was this patch tested?
   Internal tests + q files



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 530485)
Time Spent: 0.5h  (was: 20m)

> Upgrade ORC version to 1.6.6
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on the 1.5.X line and, in order to take advantage 
> of the latest ORC improvements such as column encryption, we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store File Footers, 
> Tails, and streams, and to un/compress RG data. As there were many internal 
> changes from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), 
> the upgrade is not straightforward.
> This umbrella Jira tracks the upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.6

2021-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=530483=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-530483
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 03/Jan/21 22:44
Start Date: 03/Jan/21 22:44
Worklog Time Spent: 10m 
  Work Description: pgaref closed pull request #1785:
URL: https://github.com/apache/hive/pull/1785


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 530483)
Time Spent: 20m  (was: 10m)

> Upgrade ORC version to 1.6.6
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on the 1.5.X line and, in order to take advantage 
> of the latest ORC improvements such as column encryption, we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store File Footers, 
> Tails, and streams, and to un/compress RG data. As there were many internal 
> changes from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), 
> the upgrade is not straightforward.
> This umbrella Jira tracks the upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)