[jira] [Created] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible
Aditya Shah created HIVE-23804:
-----------------------------------

             Summary: Adding defaults for Columns Stats table in the schema to make them backward compatible
                 Key: HIVE-23804
                 URL: https://issues.apache.org/jira/browse/HIVE-23804
             Project: Hive
          Issue Type: Sub-task
    Affects Versions: 2.3.7, 2.1.1
            Reporter: Aditya Shah
            Assignee: Aditya Shah


Since the table/partition column statistics tables added a new `CAT_NAME` column with a `NOT NULL` constraint in versions >3.0.0, queries that analyze statistics break for Hive versions <3.0.0 when run against an upgraded DB. One such miss is handled in HIVE-21739.


--
This message was sent by Atlassian Jira (v8.3.4#803005)
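A minimal sketch of the kind of schema change this would involve, assuming the MySQL flavor of the upgrade scripts and assuming the default catalog name 'hive' (the exact statements and column definition here are illustrative, not taken from the actual patch):

{code:java}
-- Hypothetical sketch: give CAT_NAME a default so pre-catalog clients,
-- which never set the column, can still insert statistics rows.
ALTER TABLE TAB_COL_STATS MODIFY CAT_NAME varchar(256) NOT NULL DEFAULT 'hive';
ALTER TABLE PART_COL_STATS MODIFY CAT_NAME varchar(256) NOT NULL DEFAULT 'hive';
{code}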
[jira] [Created] (HIVE-23803) Initiator misses compactions of table which were just allowed auto compaction after a given interval
Aditya Shah created HIVE-23803:
-----------------------------------

             Summary: Initiator misses compactions of table which were just allowed auto compaction after a given interval
                 Key: HIVE-23803
                 URL: https://issues.apache.org/jira/browse/HIVE-23803
             Project: Hive
          Issue Type: Bug
            Reporter: Aditya Shah


After HIVE-21917 the Initiator only looks at completed transaction components entries whose timestamp falls within the past check interval. But if a table has had `NO_AUTO_COMPACTION` set to true for a while (at least one check interval) and it is then toggled to false, auto compaction will not happen for that table until a new entry appears in the completed transaction components.


--
This message was sent by Atlassian Jira (v8.3.4#803005)
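The toggle described above can be sketched with table properties along these lines (`t` is a hypothetical ACID table; the comments describe the reported behavior):

{code:java}
-- Disable auto compaction and leave it off for at least one check interval.
ALTER TABLE t SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true');

-- ...writes happen here, creating completed transaction components entries...

-- Re-enable auto compaction. The Initiator now only considers entries from
-- the past check interval, so the older writes are never picked up until a
-- fresh write creates a new entry.
ALTER TABLE t SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='false');
{code}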
[jira] [Created] (HIVE-23724) Hive ACID Lock conflicts not getting resolved correctly.
Aditya Shah created HIVE-23724:
-----------------------------------

             Summary: Hive ACID Lock conflicts not getting resolved correctly.
                 Key: HIVE-23724
                 URL: https://issues.apache.org/jira/browse/HIVE-23724
             Project: Hive
          Issue Type: Bug
          Components: Transactions
    Affects Versions: 3.1.2
            Reporter: Aditya Shah
            Assignee: Aditya Shah


Steps to reproduce:

1. `Drop database temp cascade`
2. In parallel (after 1. has started but while it is still running) fire `create table temp.temp_table (a int, b int) clustered by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true')`
3. In parallel (after 2. has started but while it is still running) fire `insert overwrite table temp.temp_table values (1,2)`

Note: the above can easily be reproduced by a unit test in testDbTxnManager.

Observation: the exclusive lock for the table in 3. is granted even though the exclusive lock on the DB acquired in 1. is still held and the shared read lock on the DB for 2. is still waiting.

Cause of issue: while acquiring a lock, if we choose to ignore a conflict between the desired lock and one of the existing locks, we immediately allow the desired lock to be acquired without checking it against all the remaining locks. The scenario above hits one such ignore-conflict condition between 2. and 3. Other combinations where this may occur are possible, for example when we request a lock with the same txn id. Although Hive guarantees that that scenario cannot occur today, because all lock requests for a txn are made at the same time and the failure of one guarantees the failure of all, we will have to be extra careful with it in the future.

Resolution: whenever we ignore a conflict we should keep checking against all the remaining locks and only then allow the lock to be acquired.


--
This message was sent by Atlassian Jira (v8.3.4#803005)
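For reference, the three statements from the steps above as they would be fired, each from its own session (the comments summarize the lock behavior reported in this issue):

{code:java}
-- Session 1: takes an exclusive lock on the database.
Drop database temp cascade;

-- Session 2, while 1. is still running: its shared read lock on the DB waits.
create table temp.temp_table (a int, b int) clustered by (a) into 2 buckets
stored as orc TBLPROPERTIES ('transactional'='true');

-- Session 3, while 2. is still running: its exclusive table lock is granted,
-- even though the DB locks above are still held or waiting.
insert overwrite table temp.temp_table values (1,2);
{code}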
[jira] [Created] (HIVE-22964) MM table split computation is very slow
Aditya Shah created HIVE-22964:
-----------------------------------

             Summary: MM table split computation is very slow
                 Key: HIVE-22964
                 URL: https://issues.apache.org/jira/browse/HIVE-22964
             Project: Hive
          Issue Type: Improvement
            Reporter: Aditya Shah
            Assignee: Aditya Shah


Since for MM tables we process the paths prior to inputFormat.getSplits(), we end up listing the whole table at once. This could be optimized.


--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22764) Create new command for "optimize" compaction and have basic implementation.
Aditya Shah created HIVE-22764:
-----------------------------------

             Summary: Create new command for "optimize" compaction and have basic implementation.
                 Key: HIVE-22764
                 URL: https://issues.apache.org/jira/browse/HIVE-22764
             Project: Hive
          Issue Type: Sub-task
            Reporter: Aditya Shah
            Assignee: Aditya Shah


Created a new blocking compaction (added compaction type "optimize") by adding a lock request on the compaction's transaction. It works mostly like mmMajorCompaction and writes files without ROW_IDs. I have also added a table property to provide the "optimize" columns that the compactor uses to cluster the data by.


--
This message was sent by Atlassian Jira (v8.3.4#803005)
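Under this proposal, usage might look something like the following. This is only a sketch: the "optimize" compaction type and the clustering property name shown here are hypothetical illustrations of the description above, not existing Hive syntax.

{code:java}
-- Hypothetical: declare the columns the optimizing compactor clusters data by.
ALTER TABLE t SET TBLPROPERTIES ('compactor.optimize.columns'='a,b');

-- Hypothetical: trigger the new blocking "optimize" compaction.
ALTER TABLE t COMPACT 'optimize';
{code}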
[jira] [Created] (HIVE-22701) New Compaction for subsequent read's optimisations.
Aditya Shah created HIVE-22701:
-----------------------------------

             Summary: New Compaction for subsequent read's optimisations.
                 Key: HIVE-22701
                 URL: https://issues.apache.org/jira/browse/HIVE-22701
             Project: Hive
          Issue Type: New Feature
          Components: Transactions
            Reporter: Aditya Shah


Introducing a new compaction type, say "OPTIMIZE", to have the following optimizations for better reads:
1. Sort data
2. Re-bucket data
3. Z-ordering
4. Removing ROW_IDs

I've attached a [design doc|https://docs.google.com/document/d/10zWk7FR6I0CMy57Uykbkcox4HZTMQv2sgLoZrHVeLYU/edit?usp=sharing] with the JIRA. Feel free to comment on the same.

cc: [~t3rmin4t0r] [~pvary] [~lpinter] [~asomani]


--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22636) Data loss on skewjoin for ACID tables.
Aditya Shah created HIVE-22636:
-----------------------------------

             Summary: Data loss on skewjoin for ACID tables.
                 Key: HIVE-22636
                 URL: https://issues.apache.org/jira/browse/HIVE-22636
             Project: Hive
          Issue Type: Bug
    Affects Versions: 4.0.0
            Reporter: Aditya Shah


I am trying to do a skew join, writing the result into a full-ACID table, and the results are incorrect. The issue is similar to the one seen for MM tables in HIVE-16051, where the fix was to skip the skew join for MM tables.

Steps to reproduce: I used a qtest similar to HIVE-16051:

{code:java}
--! qt:dataset:src1
--! qt:dataset:src
-- MASK_LINEAGE

set hive.mapred.mode=nonstrict;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=2;
set hive.optimize.metadataonly=false;

CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties ("transactional"="true");

FROM src src1 JOIN src src2 ON (src1.key = src2.key)
INSERT into TABLE skewjoin_acid SELECT src1.key, src2.value;

select count(distinct key) from skewjoin_acid;

drop table skewjoin_acid;
{code}

The expected result for the count was 309 but I got 173.


--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22582) Avoid reading table as ACID when table name is starting with "delta" , but table is not transactional and BI Split Strategy is used
Aditya Shah created HIVE-22582:
-----------------------------------

             Summary: Avoid reading table as ACID when table name is starting with "delta" , but table is not transactional and BI Split Strategy is used
                 Key: HIVE-22582
                 URL: https://issues.apache.org/jira/browse/HIVE-22582
             Project: Hive
          Issue Type: Bug
            Reporter: Aditya Shah


The issue was fixed in HIVE-22473, but that fix missed a check for the BI split strategy.

Steps to reproduce:

{code:java}
set hive.exec.orc.split.strategy=BI;

create table delta_result (a int) stored as orc tblproperties('transactional'='false');
insert into delta_result select 1;
select * from delta_result;
{code}

Exception stack trace:

{code:java}
Caused by: java.lang.RuntimeException: ORC split generation failed with exception: String index out of range: -1
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1929)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:2016)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:461)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:430)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:336)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:576)
	... 50 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
	at java.lang.String.substring(String.java:1967)
	at org.apache.hadoop.hive.ql.io.AcidUtils.parsedDelta(AcidUtils.java:1128)
	at org.apache.hadoop.hive.ql.io.AcidUtils$ParsedDeltaLight.parse(AcidUtils.java:921)
	at org.apache.hadoop.hive.ql.io.AcidUtils.getLogicalLength(AcidUtils.java:2084)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:1115)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1905)
	... 55 more
{code}


--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22561) Data loss on map join for bucketed, partitioned table
Aditya Shah created HIVE-22561:
-----------------------------------

             Summary: Data loss on map join for bucketed, partitioned table
                 Key: HIVE-22561
                 URL: https://issues.apache.org/jira/browse/HIVE-22561
             Project: Hive
          Issue Type: Bug
    Affects Versions: 3.1.2
            Reporter: Aditya Shah
         Attachments: Screenshot 2019-11-28 at 8.45.17 PM.png, image-2019-11-28-20-46-25-432.png


A map join on a column (one involved in neither the bucketing nor the partitioning) causes data loss.

Steps to reproduce:

Env: [hive-dev-box|https://github.com/kgyrtkirk/hive-dev-box], hive 3.1.2.

Create tables:

{code:java}
CREATE TABLE `testj2`(
  `id` int,
  `bn` string,
  `cn` string,
  `ad` map<string,int>,
  `mi` array<int>)
PARTITIONED BY (
  `br` string)
CLUSTERED BY (
  bn)
INTO 2 BUCKETS
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES (
  'bucketing_version'='2');

CREATE TABLE `testj1`(
  `id` int,
  `can` string,
  `cn` string,
  `ad` map<string,int>,
  `av` boolean,
  `mi` array<int>)
PARTITIONED BY (
  `brand` string)
CLUSTERED BY (
  can)
INTO 2 BUCKETS
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES (
  'bucketing_version'='2');
{code}

Insert some data in both:

{code:java}
insert into testj1 values
  (100, 'mes_1', 'customer_1', map('city1', 560077), false, array(5, 10), 'brand_1'),
  (101, 'mes_2', 'customer_2', map('city2', 560078), true, array(10, 20), 'brand_2'),
  (102, 'mes_3', 'customer_3', map('city3', 560079), false, array(15, 30), 'brand_3'),
  (103, 'mes_4', 'customer_4', map('city4', 560080), true, array(20, 40), 'brand_4'),
  (104, 'mes_5', 'customer_5', map('city5', 560081), false, array(25, 50), 'brand_5');

insert into table testj2 values
  (100, 'tv_0', 'customer_0', map('city0', 560076), array(0, 0, 0), 'tv'),
  (101, 'tv_1', 'customer_1', map('city1', 560077), array(20, 25, 30), 'tv'),
  (102, 'tv_2', 'customer_2', map('city2', 560078), array(40, 50, 60), 'tv'),
  (103, 'tv_3', 'customer_3', map('city3', 560079), array(60, 75, 90), 'tv'),
  (104, 'tv_4', 'customer_4', map('city4', 560080), array(80, 100, 120), 'tv');
{code}

Do a join between them:

{code:java}
select t1.id, t1.can, t1.cn, t2.bn, t2.ad, t2.br
FROM testj1 t1 JOIN testj2 t2 on (t1.id = t2.id)
order by t1.id;
{code}

Observed results: !image-2019-11-28-20-46-25-432.png|width=524,height=100!

In the plan, I can see a map join. Disabling it gives the correct result.


--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22407) Hive metastore upgrade scripts have incorrect (or outdated) comment syntax
Aditya Shah created HIVE-22407:
-----------------------------------

             Summary: Hive metastore upgrade scripts have incorrect (or outdated) comment syntax
                 Key: HIVE-22407
                 URL: https://issues.apache.org/jira/browse/HIVE-22407
             Project: Hive
          Issue Type: Bug
          Components: Standalone Metastore
    Affects Versions: 3.1.2, 4.0.0
            Reporter: Aditya Shah
            Assignee: Aditya Shah


MySQL requires single-line comments starting with `--` to have at least one space after the dashes. The current upgrade scripts in standalone-metastore do not follow this, which causes MySQL to throw an exception.

ref: [https://dev.mysql.com/doc/refman/5.7/en/ansi-diff-comments.html]


--
This message was sent by Atlassian Jira (v8.3.4#803005)
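The difference is minimal but fatal under MySQL, as a trivial example shows (per the MySQL manual linked above):

{code:java}
--This comment has no space after the dashes: MySQL raises a syntax error.
-- This comment has a space after the dashes: MySQL accepts it.
SELECT 1;
{code}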
[jira] [Created] (HIVE-22067) Null pointer exception for update query on a partitioned acid table
Aditya Shah created HIVE-22067:
-----------------------------------

             Summary: Null pointer exception for update query on a partitioned acid table
                 Key: HIVE-22067
                 URL: https://issues.apache.org/jira/browse/HIVE-22067
             Project: Hive
          Issue Type: Bug
            Reporter: Aditya Shah


In the case of an acid table, the final paths array of the filesink operator is populated using the bucket id as the index. This leaves null entries in the final paths when we don't write to some of the buckets, and committing the paths in closeOp then results in an NPE.

Observed for the following query:

{code:java}
CREATE TABLE if not exists test_bckt_part(a int) partitioned by (b int) stored as orc;
CREATE TABLE test_src_delete (a int, b int) CLUSTERED BY (b) into 5 BUCKETS;
INSERT INTO TABLE test_src_delete values (1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23),(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23);

set tez.grouping.split-count=5;

INSERT OVERWRITE TABLE test_bckt_part SELECT * FROM test_src_delete;
Alter table test_bckt_part SET TBLPROPERTIES ('transactional'='true');
update test_bckt_part set a=99 where b=23;
{code}


--
This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22004) Non-acid to acid conversion doesn't handle random filenames
Aditya Shah created HIVE-22004:
-----------------------------------

             Summary: Non-acid to acid conversion doesn't handle random filenames
                 Key: HIVE-22004
                 URL: https://issues.apache.org/jira/browse/HIVE-22004
             Project: Hive
          Issue Type: Bug
            Reporter: Aditya Shah


Right now the only filename patterns supported for a non-acid to acid table's files (original files) are the ones created by Hive itself (eg. 00, 00_COPY_1, bucket_0, etc). But at the same time a non-acid Hive table supports reading from files with random filenames. We should support the same for acid tables.

One way to handle this would be to rename such files. Although a rename is not a costly operation on HDFS, for non-acid tables located on a blobstore like S3, renaming randomly named files adds costly steps to the conversion to acid.

Current scenario: what we do now for original files is assign them a logical bucket id; for unrecognized patterns we assign -1 and ignore those files.

Proposed alternatives:
1) Assume logical bucket id 0 for all the random files and let them belong to the same bucket, similar to how we handle multiple files with the same bucket id (_copy_N).
2) Lexicographically sort all the random files and sequentially assign them bucket ids, similar to the handling of multiple files for a non-bucketed table, where we extract the bucket id simply from the filenames.


--
This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-21821) Backport HIVE-21739 to branch-3.1
Aditya Shah created HIVE-21821:
-----------------------------------

             Summary: Backport HIVE-21739 to branch-3.1
                 Key: HIVE-21821
                 URL: https://issues.apache.org/jira/browse/HIVE-21821
             Project: Hive
          Issue Type: Bug
            Reporter: Aditya Shah
            Assignee: Aditya Shah


--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21751) HMS database install tests broken due to db scripts being moved to new module.
Aditya Shah created HIVE-21751:
-----------------------------------

             Summary: HMS database install tests broken due to db scripts being moved to new module.
                 Key: HIVE-21751
                 URL: https://issues.apache.org/jira/browse/HIVE-21751
             Project: Hive
          Issue Type: Sub-task
            Reporter: Aditya Shah


As the upgrade and schema scripts were moved to a new module (standalone-metastore), the db install tests introduced in HIVE-9800 break. The paths need to be corrected to avoid keeping multiple copies of the scripts.


--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.
Aditya Shah created HIVE-21739:
-----------------------------------

             Summary: Make metastore DB backward compatible with pre-catalog versions of hive.
                 Key: HIVE-21739
                 URL: https://issues.apache.org/jira/browse/HIVE-21739
             Project: Hive
          Issue Type: Sub-task
    Affects Versions: 2.1.1, 1.2.0
            Reporter: Aditya Shah
            Assignee: Aditya Shah


Since the addition of the foreign key constraint between the database ('DBS') table and the catalogs ('CTLGS') table in HIVE-18755, we are unable to run a simple create database command with an older version of the Metastore server. This is because older versions have a JDO schema matching the older 'DBS' schema, which did not have the additional 'CTLG_NAME' column. The error is as follows:

{code:java}
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Exception thrown flushing changes to datastore)
java.sql.BatchUpdateException: Cannot add or update a child row: a foreign key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
{code}


--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21650) QOutProcessor should provide configurable partial masks for qtests
Aditya Shah created HIVE-21650:
-----------------------------------

             Summary: QOutProcessor should provide configurable partial masks for qtests
                 Key: HIVE-21650
                 URL: https://issues.apache.org/jira/browse/HIVE-21650
             Project: Hive
          Issue Type: Improvement
          Components: Test, Testing Infrastructure
            Reporter: Aditya Shah
            Assignee: Aditya Shah
             Fix For: 4.0.0


QOutProcessor masks a whole line of output in q.out files whenever it sees any of the target mask patterns. This prevents us from testing a whole class of things, for example the directories being formed for an acid table. Internal configurations through which we can provide additional partial masks would let us cover such cases and make our tests better.


--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21280) Null pointer exception on running compaction against a MM table.
Aditya Shah created HIVE-21280:
-----------------------------------

             Summary: Null pointer exception on running compaction against a MM table.
                 Key: HIVE-21280
                 URL: https://issues.apache.org/jira/browse/HIVE-21280
             Project: Hive
          Issue Type: Bug
    Affects Versions: 3.1.1, 3.0.0
            Reporter: Aditya Shah


Running compaction on an MM table hit a null pointer exception while getting the HDFS session path. It appears that the session state was not started for these queries. Even after making it start, compaction fails further while running a Tez task for the insert overwrite of the temp table with the contents of the original table. The cause is that the Tez session state cannot initialize: an IllegalArgumentException is thrown while setting up the caller context in the Tez task, because the caller id, which is built from the query id, is an empty string. I do think the session state needs to be started, and each of the queries run for compaction (I'm also doubtful about the stats updater thread's queries) should have a query id.

Some details are as follows.

Steps to reproduce:
1) Use beeline with HS2 and HMS
2) Create an MM table
3) Insert a few values into the table
4) alter table mm_table compact 'major';

Stack trace on HMS:

{code:java}
compactor.Worker: Caught exception while trying to compact id:8,dbname:default,tableName:acid_mm_orc,partName:null,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestWriteId:0. Marking failed to avoid repeated failures, java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to run create temporary table default.tmp_compactor_acid_mm_orc_1550222367257(`a` int, `b` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'WITH SERDEPROPERTIES ( 'serialization.format'='1')STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://localhost:9000/user/hive/warehouse/acid_mm_orc/_tmp_2d8a096c-2db5-4ed8-921c-b3f6d31e079e/_base' TBLPROPERTIES ('transactional'='false')
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runMmCompaction(CompactorMR.java:373)
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:241)
	at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:174)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to run create temporary table default.tmp_compactor_acid_mm_orc_1550222367257(`a` int, `b` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'WITH SERDEPROPERTIES ( 'serialization.format'='1')STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://localhost:9000/user/hive/warehouse/acid_mm_orc/_tmp_2d8a096c-2db5-4ed8-921c-b3f6d31e079e/_base' TBLPROPERTIES ('transactional'='false')
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runOnDriver(CompactorMR.java:525)
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runMmCompaction(CompactorMR.java:365)
	... 2 more
Caused by: java.lang.NullPointerException: Non-local session path expected to be non-null
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:228)
	at org.apache.hadoop.hive.ql.session.SessionState.getHDFSSessionPath(SessionState.java:815)
	at org.apache.hadoop.hive.ql.Context.<init>(Context.java:309)
	at org.apache.hadoop.hive.ql.Context.<init>(Context.java:295)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:591)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1684)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1807)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1567)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1556)
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runOnDriver(CompactorMR.java:522)
	... 3 more
{code}

cc: [~ekoifman] [~vgumashta] [~sershe]


--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20456) Query fails with FNFException using MR with skewjoin enabled and auto convert join disabled
Aditya Shah created HIVE-20456:
-----------------------------------

             Summary: Query fails with FNFException using MR with skewjoin enabled and auto convert join disabled
                 Key: HIVE-20456
                 URL: https://issues.apache.org/jira/browse/HIVE-20456
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 3.1.0, 2.1.1, 1.2.0
            Reporter: Aditya Shah
            Assignee: Aditya Shah


When skew join is enabled and auto convert join is disabled, the query fails with a file not found exception. The following reproduces the error:

{code:java}
set hive.optimize.skewjoin = true;
set hive.auto.convert.join = false;
set hive.groupby.orderby.position.alias = true;
set hive.on.master=true;
set hive.execution.engine=mr;

drop database if exists test cascade;
create database if not exists test;
use test;

CREATE EXTERNAL TABLE test_table1 ( `a` int , `b` int, `c` int) PARTITIONED BY ( `d` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

CREATE EXTERNAL TABLE test_table2 ( `a` int , `b` int, `c` int) PARTITIONED BY ( `d` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

CREATE EXTERNAL TABLE test_table3 ( `a` int , `b` int, `c` int) PARTITIONED BY ( `e` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ( 'field.delim'='\u0001', 'serialization.format'='\u0001')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

CREATE EXTERNAL TABLE test_table4 (`a` int , `b` int, `c` int) PARTITIONED BY ( `e` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ( 'field.delim'='\u0001', 'serialization.format'='\u0001')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

with temp1 as (
  select g.a, n.b, u.c from test_table2 g
  inner join test_table4 u on g.a = u.a
  inner join test_table3 n on u.b = n.b
),
temp2 as (
  select * from test_table4 where a = 2
),
temp21 as (
  select g.b, n.c, u.a from temp2 g
  inner join test_table3 u on g.b = u.b
  inner join test_table2 n on u.c = n.c
  group by g.b, n.c, u.a
),
stack as (
  select * from temp1
  union all
  select * from temp21
)
select * from stack;
{code}

The query runs perfectly fine when Tez is used, or with the other combinations of the skew join and auto convert join settings. On diagnosing the issue: when a conditional task resolves tasks, it puts the resolved task directly into the runnable state without checking its parent dependencies or whether the task is already queued.


--
This message was sent by Atlassian JIRA (v7.6.3#76005)