[jira] [Commented] (HIVE-27081) Revert HIVE-26717 and HIVE-26718

2023-09-15 Thread Jacques (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765510#comment-17765510
 ] 

Jacques commented on HIVE-27081:


Is this off the table indefinitely? In our cluster we have a large number of 
insert-only ACID tables (legacy Impala compatibility), and rebalance compaction 
would be extremely useful in those cases as well.
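
For context, this is roughly the setup we have in mind (table name and DDL are 
illustrative; the REBALANCE compaction type is the one introduced by 
HIVE-26717/HIVE-26718):
{code:sql}
-- Hypothetical insert-only ACID table kept for legacy Impala compatibility
CREATE TABLE legacy_events (id int, payload string)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true',
               'transactional_properties' = 'insert_only');

-- Manual rebalance compaction as introduced by HIVE-26717/HIVE-26718;
-- this is the operation that appears to be out of scope for insert-only tables
ALTER TABLE legacy_events COMPACT 'REBALANCE';
{code}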

> Revert HIVE-26717 and HIVE-26718
> 
>
> Key: HIVE-27081
> URL: https://issues.apache.org/jira/browse/HIVE-27081
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-beta-1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Due to some unexpected challenges, the scope of rebalance compaction has been 
> reduced: only manual rebalance on full-ACID tables needs to be supported.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-22215) Compaction of sorted table

2023-09-15 Thread Jacques (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765506#comment-17765506
 ] 

Jacques commented on HIVE-22215:


Any information / view on when compaction of SORTED tables will be supported? 

> Compaction of sorted table
> --
>
> Key: HIVE-22215
> URL: https://issues.apache.org/jira/browse/HIVE-22215
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Pawel Jurkiewicz
>Priority: Major
>
> I recently came across an issue regarding compacting tables with sorting.
> I am creating two tables and populating them with test data: both are ACID, 
> but only one is sorted.
> {code:sql}
> USE priv;
> DROP TABLE IF EXISTS test_data;
> DROP TABLE IF EXISTS test_compact_insert_with_sorting;
> DROP TABLE IF EXISTS test_compact_insert_without_sorting;
> CREATE TABLE test_data AS SELECT 'foobar' col;
> CREATE TABLE test_compact_insert_with_sorting (col string) 
> CLUSTERED BY (col) SORTED BY (col) INTO 1 BUCKETS
> TBLPROPERTIES ('transactional' = 'true', 
> 'transactional_properties'='insert_only');
> CREATE TABLE test_compact_insert_without_sorting (col string) 
> CLUSTERED BY (col) INTO 1 BUCKETS
> TBLPROPERTIES ('transactional' = 'true', 
> 'transactional_properties'='insert_only');
> INSERT OVERWRITE TABLE test_compact_insert_with_sorting SELECT col FROM 
> test_data;
> INSERT OVERWRITE TABLE test_compact_insert_without_sorting SELECT col FROM 
> test_data;  INSERT OVERWRITE TABLE test_compact_insert_with_sorting SELECT 
> col FROM test_data;
> INSERT OVERWRITE TABLE test_compact_insert_without_sorting SELECT col FROM 
> test_data; 
> {code}
> As expected, after these operations two base files were created for each 
> table:
> {code:bash}
> $ hdfs dfs -ls /warehouse/tablespace/managed/hive/priv.db/test_compact_insert*
> Found 2 items
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_001
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_002
> Found 2 items
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_001
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_002
> {code}
> But after running manual compaction on those tables:
> {code:sql}
> USE priv;
> ALTER TABLE test_compact_insert_with_sorting COMPACT 'MAJOR';
> ALTER TABLE test_compact_insert_without_sorting COMPACT 'MAJOR';
> {code}
> Turns out only the one without sorting got compacted:
> {code:bash}
> hdfs dfs -ls /warehouse/tablespace/managed/hive/priv.db/test_compact*
> Found 2 items
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_001
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_002
> Found 1 items
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_002
> {code}
> Compactions inspection returns:
> {code:bash}
> $ beeline -e 'show compactions' | grep priv | grep test_compact
> | 7598474   | priv  | test_compact_insert_with_sorting   |  ---   
> | MAJOR  | succeeded  | 
> master-01.pd.my-domain.com.pl-51  | 1568812155386  | 11 | None
> |
> | 7598475   | priv  | test_compact_insert_without_sorting|  ---   
> | MAJOR  | succeeded  |  ---  
>| 1568812155403  | 298| None
> {code}
> Is this by design? Both compaction states are 'succeeded', but only the one 
> that actually reduced the number of base files took any time. Another 
> remarkable behavior: the compaction of the sorted table still has a worker 
> assigned; does that mean it is still in progress?





[jira] [Commented] (HIVE-27078) Bucket Map Join can hang if the source vertex parallelism is changed by reducer autoparallelism

2023-05-04 Thread Jacques (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719342#comment-17719342
 ] 

Jacques commented on HIVE-27078:


Any progress on this issue? We basically cannot enable bucket map join on our 
cluster because of this.
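
In case it helps others: as a stopgap we are considering disabling auto reducer 
parallelism, so that the source vertex parallelism is never changed after the 
CUSTOM_EDGE is set up (this is a workaround, not a fix, and it gives up the 
benefits of auto parallelism):
{code}
-- Session-level workaround (sketch): prevent ShuffleVertexManager from
-- reducing reducer parallelism, so Map 1 sees the expected number of inputs
set hive.tez.auto.reducer.parallelism=false;
{code}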

> Bucket Map Join can hang if the source vertex parallelism is changed by 
> reducer autoparallelism
> ---
>
> Key: HIVE-27078
> URL: https://issues.apache.org/jira/browse/HIVE-27078
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Major
>
> Considering this DAG:
> {code}
> | Map 1 <- Reducer 3 (CUSTOM_EDGE)   |
> | Map 2 <- Map 4 (CUSTOM_EDGE)   |
> | Map 5 <- Map 1 (CUSTOM_EDGE)   |
> | Reducer 3 <- Map 2 (SIMPLE_EDGE)   
> {code}
> This can be simplified further; it was just picked from a customer query. The 
> problematic vertices and edge are:
> {code}
> | Map 1 <- Reducer 3 (CUSTOM_EDGE)   |
> {code}
> Reducer 3 was initially scheduled with 20 tasks, and auto reducer parallelism 
> later decided that only 4 tasks are needed:
> {code}
> 2023-02-07 13:00:36,078 [INFO] [App Shared Pool - #4] 
> |vertexmanager.ShuffleVertexManager|: Reducing auto parallelism for vertex: 
> Reducer 3 from 20 to 4
> {code}
> In this case, Map 1 can hang, as it still expects 20 inputs:
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 4 .. container SUCCEEDED 16 1600  
>  0   0
> Map 2 .. container SUCCEEDED 48 4800  
>  0   0
> Reducer 3 .. container SUCCEEDED  4  400  
>  0   0
> Map 1container   RUNNING192  0   13  179  
>  0   0
> Map 5containerINITED241  00  241  
>  0   0
> --
> VERTICES: 03/05  [===>>---] 13%   ELAPSED TIME: 901.18 s
> --
> {code}
> In the logs it looks like this:
> {code}
> 2022-12-08 09:42:26,845 [INFO] [I/O Setup 2 Start: {Reducer 3}] 
> |impl.ShuffleManager|: Reducer_3: numInputs=20, 
> compressionCodec=org.apache.hadoop.io.compress.SnappyCodec, numFetchers=10, 
> ifileBufferSize=4096, ifileReadAheadEnabled=true, 
> ifileReadAheadLength=4194304, localDiskFetchEnabled=true, 
> sharedFetchEnabled=false, keepAlive=true, keepAliveMaxConnections=20, 
> connectionTimeout=18, readTimeout=18, bufferSize=8192, 
> bufferSize=8192, maxTaskOutputAtOnce=20, asyncHttp=false
> ...receives the input event:
> 2022-12-08 09:42:27,134 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Routing events from heartbeat response to task, 
> currentTaskAttemptId=attempt_1670331499491_1408_1_03_39_0, eventCount=1 
> fromEventId=0 nextFromEventId=0
> ...but then it hangs while waiting for further inputs:
> "TezChild" #29 daemon prio=5 os_prio=0 tid=0x7f3fae141000 nid=0x9581 
> waiting on condition [0x7f3f737ba000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00071ad90a00> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>   at 
> java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
>   at 
> java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
>   at 
> org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager.getNextInput(ShuffleManager.java:1033)
>   at 
> org.apache.tez.runtime.library.common.readers.UnorderedKVReader.moveToNextInput(UnorderedKVReader.java:202)
>   at 
> org.apache.tez.runtime.library.common.readers.UnorderedKVReader.next(UnorderedKVReader.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:129)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:385)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:454)
>   at 
> 

[jira] [Commented] (HIVE-17342) Where condition with 1=0 should be treated similar to limit 0

2022-08-31 Thread Jacques (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598219#comment-17598219
 ] 

Jacques commented on HIVE-17342:


Another example with the same query as in the samples above:
set hive.cbo.enable=true;
set hive.optimize.limittranspose=true;
create table t1 (a1 int, b1 int);
explain cbo select y from (select a1 y from t1 where b1 > 10) q WHERE 1=0;

CBO PLAN:
HiveProject(a1=[$0])
  HiveFilter(condition=[false])
    HiveTableScan(table=[[schema, t1]], table:alias=[t1])
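
For comparison, where we can control the SQL, rewriting the predicate as LIMIT 0 
is short-circuited as expected; the ask here is for WHERE 1=0 to get the same 
treatment:
{code:sql}
-- Same sample query rewritten with LIMIT 0, which Hive does not execute
select y from (select a1 y from t1 where b1 > 10) q limit 0;
{code}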

> Where condition with 1=0 should be treated similar to limit 0
> -
>
> Key: HIVE-17342
> URL: https://issues.apache.org/jira/browse/HIVE-17342
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Krisztian Kasa
>Priority: Minor
>
> In some cases, queries may get executed with a where condition of "1=0" just 
> to get the schema, e.g.
> {noformat}
> SELECT * FROM (select avg(d_year) as  y from date_dim where d_year>1999) q 
> WHERE 1=0
> {noformat}
> Currently Hive executes the query; it would be good to treat this like 
> "limit 0", which does not execute the query.
> {code}
> hive> explain SELECT * FROM (select avg(d_year) as  y from date_dim where 
> d_year>1999) q WHERE 1=0;
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized, llap
>   File Output Operator [FS_13]
> Group By Operator [GBY_12] (rows=1 width=76)
>   Output:["_col0"],aggregations:["avg(VALUE._col0)"]
> <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap
>   PARTITION_ONLY_SHUFFLE [RS_11]
> Group By Operator [GBY_10] (rows=1 width=76)
>   Output:["_col0"],aggregations:["avg(d_year)"]
>   Filter Operator [FIL_9] (rows=1 width=0)
> predicate:false
> TableScan [TS_0] (rows=1 width=0)
>   
> default@date_dim,date_dim,Tbl:PARTIAL,Col:NONE,Output:["d_year"]
> {code}
> It does generate 0 splits, but does send a DAG plan to the AM and receive 0 
> rows as output.





[jira] [Commented] (HIVE-17342) Where condition with 1=0 should be treated similar to limit 0

2022-08-31 Thread Jacques (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598213#comment-17598213
 ] 

Jacques commented on HIVE-17342:


Can you please confirm in which Hive version this is resolved?
I tested with Hive 3.1.3 (included in CDP 7.1.7 SP1) and am still getting:

explain
SELECT * FROM sch.tb1 where 1 = 0;

Plan optimized by CBO.

Stage-0
  Fetch Operator
    limit:-1
    Select Operator [SEL_2]
      Output:["_col0","_col1","_col2","_col3","_col4"]
      Filter Operator [FIL_4]
        predicate:false
        TableScan [TS_0]
          Output:["_col0","_col1","_col2","_col3","_col4"]

 

(Selecting from a view = the majority of my use cases:)
explain
SELECT * FROM sch.vw1 where 1 = 0;

Plan optimized by CBO.

Stage-0
  Fetch Operator
    limit:-1
    Select Operator [SEL_3]
      Output:["_col0"]
      Filter Operator [FIL_2]
        predicate:false
        Select Operator [SEL_1]
          TableScan [TS_0]
            properties:{"insideView":"TRUE"}

> Where condition with 1=0 should be treated similar to limit 0
> -
>
> Key: HIVE-17342
> URL: https://issues.apache.org/jira/browse/HIVE-17342
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Krisztian Kasa
>Priority: Minor
>
> In some cases, queries may get executed with a where condition of "1=0" just 
> to get the schema, e.g.
> {noformat}
> SELECT * FROM (select avg(d_year) as  y from date_dim where d_year>1999) q 
> WHERE 1=0
> {noformat}
> Currently Hive executes the query; it would be good to treat this like 
> "limit 0", which does not execute the query.
> {code}
> hive> explain SELECT * FROM (select avg(d_year) as  y from date_dim where 
> d_year>1999) q WHERE 1=0;
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized, llap
>   File Output Operator [FS_13]
> Group By Operator [GBY_12] (rows=1 width=76)
>   Output:["_col0"],aggregations:["avg(VALUE._col0)"]
> <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap
>   PARTITION_ONLY_SHUFFLE [RS_11]
> Group By Operator [GBY_10] (rows=1 width=76)
>   Output:["_col0"],aggregations:["avg(d_year)"]
>   Filter Operator [FIL_9] (rows=1 width=0)
> predicate:false
> TableScan [TS_0] (rows=1 width=0)
>   
> default@date_dim,date_dim,Tbl:PARTIAL,Col:NONE,Output:["d_year"]
> {code}
> It does generate 0 splits, but does send a DAG plan to the AM and receive 0 
> rows as output.





[jira] [Commented] (HIVE-17342) Where condition with 1=0 should be treated similar to limit 0

2022-08-17 Thread Jacques (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580663#comment-17580663
 ] 

Jacques commented on HIVE-17342:


I just want to add my support for this Jira. We have a scenario where a 
third-party tool generates the "WHERE 0=1" syntax, so we have no way to change 
it to LIMIT 0. In our use case, the data warehouse design also makes heavy use 
of Hive views, which makes the issue even worse (the query executes against a 
view that is often quite complex).

The expectation is that Hive should be able to optimize the execution away, as 
databases like SQL Server do, and as Hive itself already does with LIMIT 0.

The impact on our development team is actually quite severe, since these types 
of queries are executed in the background by the tool regularly during 
development, leading to minutes-long wait times.

> Where condition with 1=0 should be treated similar to limit 0
> -
>
> Key: HIVE-17342
> URL: https://issues.apache.org/jira/browse/HIVE-17342
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
>
> In some cases, queries may get executed with a where condition of "1=0" just 
> to get the schema, e.g.
> {noformat}
> SELECT * FROM (select avg(d_year) as  y from date_dim where d_year>1999) q 
> WHERE 1=0
> {noformat}
> Currently Hive executes the query; it would be good to treat this like 
> "limit 0", which does not execute the query.
> {code}
> hive> explain SELECT * FROM (select avg(d_year) as  y from date_dim where 
> d_year>1999) q WHERE 1=0;
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized, llap
>   File Output Operator [FS_13]
> Group By Operator [GBY_12] (rows=1 width=76)
>   Output:["_col0"],aggregations:["avg(VALUE._col0)"]
> <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap
>   PARTITION_ONLY_SHUFFLE [RS_11]
> Group By Operator [GBY_10] (rows=1 width=76)
>   Output:["_col0"],aggregations:["avg(d_year)"]
>   Filter Operator [FIL_9] (rows=1 width=0)
> predicate:false
> TableScan [TS_0] (rows=1 width=0)
>   
> default@date_dim,date_dim,Tbl:PARTIAL,Col:NONE,Output:["d_year"]
> {code}
> It does generate 0 splits, but does send a DAG plan to the AM and receive 0 
> rows as output.





[jira] [Commented] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores

2021-04-08 Thread Jacques (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317139#comment-17317139
 ] 

Jacques commented on HIVE-11388:


The documentation for hive.compactor.initiator.on
at [https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties] 
still says "It's critical that this is enabled on exactly one metastore 
service instance (not enforced yet)."
I.e., it is missing the "As of Hive 1.3.0 this property may be enabled on any 
number of standalone metastore instances." comment that was added to the page 
linked above.

Shouldn't both pages be kept in sync?
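
For reference, the property in question is set in hive-site.xml on the 
metastore instance(s); a minimal fragment (values illustrative):
{code:xml}
<!-- Enables the compaction Initiator thread in this metastore instance.
     Per HIVE-11388, as of Hive 1.3.0 this may be enabled on any number of
     standalone metastore instances. -->
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
{code}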

> Allow ACID Compactor components to run in multiple metastores
> -
>
> Key: HIVE-11388
> URL: https://issues.apache.org/jira/browse/HIVE-11388
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-11388.2.patch, HIVE-11388.4.patch, 
> HIVE-11388.5.patch, HIVE-11388.6.patch, HIVE-11388.7.patch, 
> HIVE-11388.branch-1.patch, HIVE-11388.patch
>
>
> (this description is no longer accurate; see further comments)
> org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs 
> inside the metastore service to manage compactions of ACID tables.  There 
> should be exactly 1 instance of this thread (even with multiple Thrift 
> services).
> This is documented in 
> https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration
>  but not enforced.
> Should add enforcement, since more than 1 Initiator could cause concurrent 
> attempts to compact the same table/partition - which will not work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)