[jira] [Created] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible

2020-07-06 Thread Aditya Shah (Jira)
Aditya Shah created HIVE-23804:
--

 Summary: Adding defaults for Columns Stats table in the schema to 
make them backward compatible
 Key: HIVE-23804
 URL: https://issues.apache.org/jira/browse/HIVE-23804
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 2.3.7, 2.1.1
Reporter: Aditya Shah
Assignee: Aditya Shah


Since the table/partition column statistics tables gained a new `CAT_NAME` 
column with a `NOT NULL` constraint in version >3.0.0, queries that analyze 
statistics break for Hive versions <3.0.0 when run against an upgraded DB. One 
such miss was handled in HIVE-21739.
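
A minimal sketch of why a column default helps here, using Python's built-in sqlite3 as a stand-in for the metastore DB. The table is a cut-down, hypothetical version of a column statistics table, and the default value 'hive' is an assumption, not something stated in this issue.

```python
import sqlite3

# Cut-down, hypothetical stand-in for a column statistics table.
COLUMNS = "DB_NAME TEXT NOT NULL, TABLE_NAME TEXT NOT NULL, COLUMN_NAME TEXT NOT NULL"
WITHOUT_DEFAULT = f"CREATE TABLE TAB_COL_STATS (CAT_NAME TEXT NOT NULL, {COLUMNS})"
# Assumption: 'hive' is used as the default catalog name.
WITH_DEFAULT = f"CREATE TABLE TAB_COL_STATS (CAT_NAME TEXT NOT NULL DEFAULT 'hive', {COLUMNS})"


def insert_without_catalog(ddl: str) -> str:
    """Create the table from `ddl`, then insert a row the way a pre-catalog
    client would: without mentioning CAT_NAME at all."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        try:
            conn.execute(
                "INSERT INTO TAB_COL_STATS (DB_NAME, TABLE_NAME, COLUMN_NAME) "
                "VALUES ('default', 't1', 'c1')"
            )
        except sqlite3.IntegrityError as exc:
            # NOT NULL without a default rejects the old-style insert.
            return f"failed: {exc}"
        (cat_name,) = conn.execute("SELECT CAT_NAME FROM TAB_COL_STATS").fetchone()
        return f"ok: CAT_NAME={cat_name}"
    finally:
        conn.close()
```

With `WITHOUT_DEFAULT` the old-style insert is rejected by the `NOT NULL` constraint; with `WITH_DEFAULT` it succeeds and the row picks up the default catalog name.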



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23803) Initiator misses compactions of table which were just allowed auto compaction after a given interval

2020-07-06 Thread Aditya Shah (Jira)
Aditya Shah created HIVE-23803:
--

 Summary: Initiator misses compactions of table which were just 
allowed auto compaction after a given interval
 Key: HIVE-23803
 URL: https://issues.apache.org/jira/browse/HIVE-23803
 Project: Hive
  Issue Type: Bug
Reporter: Aditya Shah


After HIVE-21917 the initiator only looks at completed transaction component 
entries whose timestamp falls within the past check interval. But if a table 
has had `NO_AUTO_COMPACTION` set to true for a while (at least one check 
interval) and is then toggled to false, auto compaction will not happen for 
that table until there is a new entry in completed transaction components.
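
The gap can be illustrated with a small model. This is only a sketch under simplifying assumptions: the `Table` class, the `candidates` function, and the single timestamp per table are hypothetical and do not mirror the initiator's real code.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    no_auto_compaction: bool
    last_completed_txn_ts: float  # newest completed-transaction-component entry


def candidates(tables, now, check_interval):
    """Tables a time-windowed initiator would consider: auto compaction is
    allowed AND there is a completed-transaction-component entry inside the
    last check interval."""
    window_start = now - check_interval
    return [
        t.name
        for t in tables
        if not t.no_auto_compaction and t.last_completed_txn_ts >= window_start
    ]


# Written at t=100 while auto compaction was disabled.
t = Table("acid_tbl", no_auto_compaction=True, last_completed_txn_ts=100.0)
# Much later the flag is toggled, but nothing new has been written.
t.no_auto_compaction = False
missed = candidates([t], now=1000.0, check_interval=300.0)
# Only a fresh write brings the table back into the window.
t.last_completed_txn_ts = 1000.0
picked_up = candidates([t], now=1000.0, check_interval=300.0)
```

In this model `missed` is empty even though the table has uncompacted data, and the table only shows up in `picked_up` after a new write.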





[jira] [Created] (HIVE-23724) Hive ACID Lock conflicts not getting resolved correctly.

2020-06-18 Thread Aditya Shah (Jira)
Aditya Shah created HIVE-23724:
--

 Summary: Hive ACID Lock conflicts not getting resolved correctly.
 Key: HIVE-23724
 URL: https://issues.apache.org/jira/browse/HIVE-23724
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.1.2
Reporter: Aditya Shah
Assignee: Aditya Shah


Steps to reproduce:

1. `Drop database temp cascade`
2. In parallel (after 1. has started but while 1. is still running), fire a 
`create table temp.temp_table (a int, b int) clustered by (a) into 2 buckets 
stored as orc TBLPROPERTIES ('transactional'='true')`
3. In parallel (after 2. has started but while 2. is still running), fire an 
`insert overwrite table temp.temp_table values (1,2)`

Note: The above can be easily reproduced by a unit test in testDbTxnManager.

Observation: The exclusive lock for the table in 3. is granted although the 
exclusive lock for the DB acquired in 1. is still held and the shared read lock 
on the DB for 2. is still waiting.

Cause of issue: While acquiring a lock, if we choose to ignore a conflict 
between the desired lock and one of the existing locks, we immediately allow 
the desired lock to be acquired without checking it against the remaining 
existing locks. The scenario above hits one such ignore-conflict condition 
between 2. and 3. There could be other combinations where this occurs, for 
example when we request a lock with the same txn id. Hive guarantees that this 
scenario will not occur, because all lock requests related to a txn are made at 
the same time and the failure of one guarantees the failure of all, but we will 
have to be extra careful with it in the future.

Resolution: Whenever we ignore a conflict, we should keep checking against all 
the existing locks and only then allow the lock to be acquired.
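
The difference between the two behaviours can be sketched as follows. This is an illustrative model, not Hive's lock manager: the `Lock` class, the conflict rule, and the "same txn id" ignore rule are all simplifying assumptions chosen to match the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lock:
    txn_id: int
    db: str
    table: Optional[str]  # None means a DB-level lock
    exclusive: bool


def conflicts(desired: Lock, existing: Lock) -> bool:
    """Assumed rule: overlapping scope and at least one side exclusive."""
    same_scope = desired.db == existing.db and (
        desired.table is None
        or existing.table is None
        or desired.table == existing.table
    )
    return same_scope and (desired.exclusive or existing.exclusive)


def ignorable(desired: Lock, existing: Lock) -> bool:
    """Assumed ignore rule: a conflict within the same txn is ignored."""
    return desired.txn_id == existing.txn_id


def can_acquire_buggy(desired: Lock, existing_locks) -> bool:
    for existing in existing_locks:
        if not conflicts(desired, existing):
            continue
        if ignorable(desired, existing):
            return True  # bug: stops before checking the remaining locks
        return False
    return True


def can_acquire_fixed(desired: Lock, existing_locks) -> bool:
    # Keep scanning: an ignored conflict must not short-circuit the check.
    for existing in existing_locks:
        if conflicts(desired, existing) and not ignorable(desired, existing):
            return False
    return True


existing_locks = [
    Lock(txn_id=7, db="temp", table=None, exclusive=False),  # same txn, ignorable
    Lock(txn_id=5, db="temp", table=None, exclusive=True),   # DB drop still holds this
]
desired = Lock(txn_id=7, db="temp", table="temp_table", exclusive=True)
```

Here `can_acquire_buggy(desired, existing_locks)` grants the lock after seeing only the first, ignorable conflict, while `can_acquire_fixed` keeps going and is blocked by the exclusive DB lock.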





[jira] [Created] (HIVE-22964) MM table split computation is very slow

2020-03-02 Thread Aditya Shah (Jira)
Aditya Shah created HIVE-22964:
--

 Summary: MM table split computation is very slow
 Key: HIVE-22964
 URL: https://issues.apache.org/jira/browse/HIVE-22964
 Project: Hive
  Issue Type: Improvement
Reporter: Aditya Shah
Assignee: Aditya Shah


For MM tables we process the paths prior to inputFormat.getSplits(), so we end 
up listing the whole table at once. This could be optimized.





[jira] [Created] (HIVE-22764) Create new command for "optimize" compaction and have basic implementation.

2020-01-23 Thread Aditya Shah (Jira)
Aditya Shah created HIVE-22764:
--

 Summary: Create new command for "optimize" compaction and have 
basic implementation.
 Key: HIVE-22764
 URL: https://issues.apache.org/jira/browse/HIVE-22764
 Project: Hive
  Issue Type: Sub-task
Reporter: Aditya Shah
Assignee: Aditya Shah


Created a new blocking compaction (added compaction type "optimize") by adding 
a lock request on the compaction's transaction. It works mostly like 
mmMajorCompaction and writes files w/o row_IDs. I have also added a table 
property that specifies the optimize columns, which the compactor uses to 
cluster the data.





[jira] [Created] (HIVE-22701) New Compaction for subsequent read's optimisations.

2020-01-07 Thread Aditya Shah (Jira)
Aditya Shah created HIVE-22701:
--

 Summary: New Compaction for subsequent read's optimisations.
 Key: HIVE-22701
 URL: https://issues.apache.org/jira/browse/HIVE-22701
 Project: Hive
  Issue Type: New Feature
  Components: Transactions
Reporter: Aditya Shah


Introducing a new compaction type, say "OPTIMIZE", that applies the following 
optimizations for better reads:

1. Sort data
2. Re-bucket data
3. z-ordering
4. removing ROW_IDs

I've attached a [design doc|https://docs.google.com/document/d/10zWk7FR6I0CMy57Uykbkcox4HZTMQv2sgLoZrHVeLYU/edit?usp=sharing] 
to the JIRA. Feel free to comment on it.
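
Of the optimizations listed above, z-ordering is the least self-explanatory, so here is a minimal sketch of the general technique (bit interleaving, also called a Morton code) for two integer columns. It illustrates the idea only; it is not taken from the design doc, and the function name and the 16-bit width are assumptions.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y.

    Bits of x land in even positions and bits of y in odd positions, so rows
    that are close in both columns tend to get close z-values.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


rows = [(3, 0), (0, 1), (1, 1), (2, 2), (0, 0), (1, 0)]
# Sorting by the interleaved key clusters rows on both columns at once.
z_sorted = sorted(rows, key=lambda r: z_value(r[0], r[1]))
```

Sorting by `z_value` instead of by a single column lets a reader prune files using min/max statistics on either column.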

cc: [~t3rmin4t0r] [~pvary]  [~lpinter]  [~asomani]





[jira] [Created] (HIVE-22636) Data loss on skewjoin for ACID tables.

2019-12-12 Thread Aditya Shah (Jira)
Aditya Shah created HIVE-22636:
--

 Summary: Data loss on skewjoin for ACID tables.
 Key: HIVE-22636
 URL: https://issues.apache.org/jira/browse/HIVE-22636
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Aditya Shah


I am trying to do a skewjoin and write the result into a FullAcid table. The 
results are incorrect. The issue is similar to the one seen for MM tables in 
HIVE-16051, where the fix was to skip the skewjoin for MM tables. 

Steps to reproduce:

Used a qtest similar to HIVE-16051:
{code:java}
--! qt:dataset:src1
--! qt:dataset:src

-- MASK_LINEAGE
set hive.mapred.mode=nonstrict;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=2;
set hive.optimize.metadataonly=false;

CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties 
("transactional"="true");
FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE 
skewjoin_acid SELECT src1.key, src2.value;
select count(distinct key) from skewjoin_acid;
drop table skewjoin_acid;
{code}
The expected result for the count was 309, but the query returned 173. 

 





[jira] [Created] (HIVE-22582) Avoid reading table as ACID when table name is starting with "delta" , but table is not transactional and BI Split Strategy is used

2019-12-05 Thread Aditya Shah (Jira)
Aditya Shah created HIVE-22582:
--

 Summary: Avoid reading table as ACID when table name is starting 
with "delta" , but table is not transactional and BI Split Strategy is used
 Key: HIVE-22582
 URL: https://issues.apache.org/jira/browse/HIVE-22582
 Project: Hive
  Issue Type: Bug
Reporter: Aditya Shah


The issue was fixed in HIVE-22473, but the fix missed a check for the BI Split 
Strategy.

Steps to reproduce: 
{code:java}
set hive.exec.orc.split.strategy=BI;
create table delta_result (a int) stored as orc 
tblproperties('transactional'='false');
insert into delta_result select 1;
select * from delta_result;
{code}
Exception Stack Trace:
{code:java}
Caused by: java.lang.RuntimeException: ORC split generation failed with 
exception: String index out of range: -1
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1929)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:2016)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:461)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:430)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:336)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:576)
... 50 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
range: -1
at java.lang.String.substring(String.java:1967)
at 
org.apache.hadoop.hive.ql.io.AcidUtils.parsedDelta(AcidUtils.java:1128)
at 
org.apache.hadoop.hive.ql.io.AcidUtils$ParsedDeltaLight.parse(AcidUtils.java:921)
at 
org.apache.hadoop.hive.ql.io.AcidUtils.getLogicalLength(AcidUtils.java:2084)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:1115)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1905)
... 55 more
{code}
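
The stack trace above ends in delta-directory name parsing. A minimal sketch of why a name-prefix check misfires for a table called `delta_result`, and how a stricter match avoids it. The regular expression is an assumption about the usual `delta_<min>_<max>[_<stmt>]` directory layout, and this is not the fix that went into Hive.

```python
import re

# Assumed layout of a delta directory name: delta_<minWriteId>_<maxWriteId>[_<stmtId>]
DELTA_DIR = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")


def looks_like_delta_naive(dir_name: str) -> bool:
    """Prefix-only check: also matches ordinary names such as 'delta_result'."""
    return dir_name.startswith("delta_")


def parse_delta(dir_name: str):
    """Return (min_write_id, max_write_id), or None if this is not a delta dir."""
    match = DELTA_DIR.match(dir_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

`looks_like_delta_naive("delta_result")` is true, while `parse_delta("delta_result")` returns `None` instead of failing on a missing separator.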





[jira] [Created] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-11-28 Thread Aditya Shah (Jira)
Aditya Shah created HIVE-22561:
--

 Summary: Data loss on map join for bucketed, partitioned table
 Key: HIVE-22561
 URL: https://issues.apache.org/jira/browse/HIVE-22561
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.2
Reporter: Aditya Shah
 Attachments: Screenshot 2019-11-28 at 8.45.17 PM.png, 
image-2019-11-28-20-46-25-432.png

A map join on a column (which is involved in neither bucketing nor 
partitioning) causes data loss. 

Steps to reproduce:

Env: [hive-dev-box|https://github.com/kgyrtkirk/hive-dev-box] hive 3.1.2.

Create tables:

 
{code:java}
CREATE TABLE `testj2`(
  `id` int, 
  `bn` string, 
  `cn` string, 
  `ad` map<string,int>, 
  `mi` array<int>)
PARTITIONED BY ( 
  `br` string)
CLUSTERED BY ( 
  bn) 
INTO 2 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES (
  'bucketing_version'='2');

CREATE TABLE `testj1`(
  `id` int, 
  `can` string, 
  `cn` string, 
  `ad` map<string,int>, 
  `av` boolean, 
  `mi` array<int>)
PARTITIONED BY ( 
  `brand` string)
CLUSTERED BY ( 
  can) 
INTO 2 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES (
  'bucketing_version'='2');
{code}
Insert some data into both:
{code:java}
insert into testj1 values (100, 'mes_1', 'customer_1',  map('city1', 560077), 
false, array(5, 10), 'brand_1'),
(101, 'mes_2', 'customer_2',  map('city2', 560078), true, array(10, 20), 
'brand_2'),
(102, 'mes_3', 'customer_3',  map('city3', 560079), false, array(15, 30), 
'brand_3'),
(103, 'mes_4', 'customer_4',  map('city4', 560080), true, array(20, 40), 
'brand_4'),
(104, 'mes_5', 'customer_5',  map('city5', 560081), false, array(25, 50), 
'brand_5');

insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', 
560076),array(0, 0, 0), 'tv'),
(101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv'),
(102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv'),
(103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv'),
(104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv');
{code}
Do a join between them:
{code:java}
select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj2 t2 
on (t1.id = t2.id) order by t1.id;
{code}
Observed results:

!image-2019-11-28-20-46-25-432.png|width=524,height=100!

In the plan, I can see a map join. Disabling it gives the correct result.

 

 





[jira] [Created] (HIVE-22407) Hive metastore upgrade scripts have incorrect (or outdated) comment syntax

2019-10-26 Thread Aditya Shah (Jira)
Aditya Shah created HIVE-22407:
--

 Summary: Hive metastore upgrade scripts have incorrect (or 
outdated) comment syntax
 Key: HIVE-22407
 URL: https://issues.apache.org/jira/browse/HIVE-22407
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Affects Versions: 3.1.2, 4.0.0
Reporter: Aditya Shah
Assignee: Aditya Shah


MySQL requires the `--` that starts a single-line comment to be followed by at 
least one whitespace character. Comments in the current upgrade scripts in the 
standalone-metastore that lack this whitespace cause the scripts to throw an 
exception. 

ref: [https://dev.mysql.com/doc/refman/5.7/en/ansi-diff-comments.html]  
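
A minimal sketch of how offending lines could be found in a script. The helper name is hypothetical, and it only looks at comments that start a line, not trailing comments after a statement.

```python
import re

# '--' at the start of a line (after optional indentation) that is
# immediately followed by a non-whitespace character.
BAD_COMMENT = re.compile(r"^\s*--(?=\S)")


def bad_comment_lines(script: str):
    """Return the 1-based numbers of lines whose leading '--' lacks the
    whitespace that MySQL expects after it."""
    return [
        number
        for number, line in enumerate(script.splitlines(), start=1)
        if BAD_COMMENT.match(line)
    ]


sample = "-- ok\n--Bad\nSELECT 1;\n  --also bad\n--"
flagged = bad_comment_lines(sample)
```

A bare `--` at the end of a line is not flagged, since the newline counts as whitespace.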





[jira] [Created] (HIVE-22067) Null pointer exception for update query on a partitioned acid table

2019-08-01 Thread Aditya Shah (JIRA)
Aditya Shah created HIVE-22067:
--

 Summary: Null pointer exception for update query on a partitioned 
acid table
 Key: HIVE-22067
 URL: https://issues.apache.org/jira/browse/HIVE-22067
 Project: Hive
  Issue Type: Bug
Reporter: Aditya Shah


In case of an acid table, the final paths (array) of the filesink operator are 
populated using the bucket id as the index. This leaves null entries in the 
final paths when we don't write to some of the buckets. As a result, committing 
the paths in closeOp throws an NPE.

Observed for the following query:

 
{code:java}
CREATE TABLE if not exists test_bckt_part(a int) partitioned by (b int)
stored as orc;
CREATE TABLE test_src_delete (a int, b int) CLUSTERED BY (b) into 5 BUCKETS;
INSERT INTO TABLE test_src_delete values 
(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23),(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23);
set tez.grouping.split-count=5;
INSERT OVERWRITE TABLE test_bckt_part SELECT * FROM test_src_delete;
Alter table test_bckt_part SET TBLPROPERTIES ('transactional'='true');
update test_bckt_part set a=99 where b=23;
{code}
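
The failure mode described above can be modelled in a few lines. This is a sketch only: the function names and the `_tmp.` prefix are hypothetical, `AttributeError` on `None` stands in for Java's NPE, and skipping empty slots is just one possible guard, not necessarily the fix chosen for Hive.

```python
def populate_final_paths(num_buckets, written_buckets):
    """Fill a fixed-size array using the bucket id as the index, so buckets
    that were never written leave None entries behind."""
    final_paths = [None] * num_buckets
    for bucket in written_buckets:
        final_paths[bucket] = f"_tmp.bucket_{bucket:05d}"
    return final_paths


def commit_unsafe(final_paths):
    # Dereferences every slot; a None slot raises AttributeError.
    return [path.replace("_tmp.", "") for path in final_paths]


def commit_safe(final_paths):
    # Skips slots for buckets that were never written.
    return [path.replace("_tmp.", "") for path in final_paths if path is not None]


paths = populate_final_paths(5, [0, 3])
committed = commit_safe(paths)
```

Calling `commit_unsafe(paths)` on the same array fails on the first empty slot.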
 

 

 

 





[jira] [Created] (HIVE-22004) Non-acid to acid conversion doesn't handle random filenames

2019-07-17 Thread Aditya Shah (JIRA)
Aditya Shah created HIVE-22004:
--

 Summary: Non-acid to acid conversion doesn't handle random 
filenames
 Key: HIVE-22004
 URL: https://issues.apache.org/jira/browse/HIVE-22004
 Project: Hive
  Issue Type: Bug
Reporter: Aditya Shah


Right now the only supported filename patterns for the files of a table being 
converted from non-acid to acid (original files) are the ones created by Hive 
itself (e.g. 00, 00_COPY_1, bucket_0, etc.). At the same time, Hive supports 
reading non-acid tables whose files have random filenames. We should support 
the same for acid tables.

One way to handle this would be to rename such files. Rename is not a costly 
operation on HDFS, but for non-acid tables located on a blobstore like S3, 
renaming files with random names would add costly steps to the conversion to 
acid.

Current scenario: For original files we assign each file a logical bucket id; 
for unrecognized patterns we assign -1 and ignore those files.

Proposed alternatives:

1) For all the random files, assume the logical bucket id is 0 and let the 
files belong to the same bucket, similar to what we do for multiple files with 
the same bucket id (_copy_N). 
2) Lexicographically sort all the random files and sequentially assign them a 
bucket id, similar to the handling of multiple files for a non-bucketed table, 
where we extract the bucket id simply from the filenames.
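
The current behaviour and the two alternatives can be compared with a small sketch. The filename patterns are simplified assumptions about Hive-generated names, and the function names are hypothetical.

```python
import re

# Simplified, assumed patterns for Hive-generated file names.
KNOWN_PATTERNS = [
    re.compile(r"^(\d+)_\d+(?:_copy_\d+)?$"),  # e.g. 000001_0, 000001_0_copy_1
    re.compile(r"^bucket_(\d+)$"),             # e.g. bucket_00002
]


def bucket_id_current(name: str) -> int:
    """Current behaviour: -1 (file ignored) for unrecognized names."""
    for pattern in KNOWN_PATTERNS:
        match = pattern.match(name)
        if match:
            return int(match.group(1))
    return -1


def assign_alternative_1(names):
    """Alternative 1: every unrecognized file goes to logical bucket 0."""
    return {name: max(bucket_id_current(name), 0) for name in names}


def assign_alternative_2(names):
    """Alternative 2: sort unrecognized files lexicographically and hand out
    bucket ids sequentially, starting at 0."""
    result = {name: bucket_id_current(name) for name in names}
    unknown = sorted(name for name, bucket in result.items() if bucket < 0)
    for bucket, name in enumerate(unknown):
        result[name] = bucket
    return result


files = ["000001_0", "part-b.orc", "data.orc", "bucket_00002"]
alt1 = assign_alternative_1(files)
alt2 = assign_alternative_2(files)
```

Alternative 1 puts both random files into bucket 0, while alternative 2 spreads them over buckets 0 and 1 in sorted order.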





[jira] [Created] (HIVE-21821) Backport HIVE-21739 to branch-3.1

2019-06-03 Thread Aditya Shah (JIRA)
Aditya Shah created HIVE-21821:
--

 Summary: Backport HIVE-21739 to branch-3.1
 Key: HIVE-21821
 URL: https://issues.apache.org/jira/browse/HIVE-21821
 Project: Hive
  Issue Type: Bug
Reporter: Aditya Shah
Assignee: Aditya Shah








[jira] [Created] (HIVE-21751) HMS database install tests broken due to db scripts being moved to new module.

2019-05-18 Thread Aditya Shah (JIRA)
Aditya Shah created HIVE-21751:
--

 Summary: HMS database install tests broken due to db scripts being 
moved to new module.
 Key: HIVE-21751
 URL: https://issues.apache.org/jira/browse/HIVE-21751
 Project: Hive
  Issue Type: Sub-task
Reporter: Aditya Shah


As the upgrade and schema scripts have moved to a new module 
(standalone-metastore), the db install tests introduced in HIVE-9800 break. The 
paths need to be corrected to avoid multiple copies of the scripts.





[jira] [Created] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-16 Thread Aditya Shah (JIRA)
Aditya Shah created HIVE-21739:
--

 Summary: Make metastore DB backward compatible with pre-catalog 
versions of hive.
 Key: HIVE-21739
 URL: https://issues.apache.org/jira/browse/HIVE-21739
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 2.1.1, 1.2.0
Reporter: Aditya Shah
Assignee: Aditya Shah


Since the addition of the foreign key constraint between the database ('DBS') 
table and the catalogs ('CTLGS') table in HIVE-18755, we are unable to run a 
simple create database command with an older version of the Metastore Server. 
This is because older versions use a JDO schema based on the older 'DBS' 
schema, which did not have the additional 'CTLG_NAME' column.

The error is as follows: 
{code:java}
org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:Exception thrown flushing changes to datastore)

java.sql.BatchUpdateException: Cannot add or update a child row: a foreign key 
constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN KEY 
("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
{code}





[jira] [Created] (HIVE-21650) QOutProcessor should provide configurable partial masks for qtests

2019-04-25 Thread Aditya Shah (JIRA)
Aditya Shah created HIVE-21650:
--

 Summary: QOutProcessor should provide configurable partial masks 
for qtests
 Key: HIVE-21650
 URL: https://issues.apache.org/jira/browse/HIVE-21650
 Project: Hive
  Issue Type: Improvement
  Components: Test, Testing Infrastructure
Reporter: Aditya Shah
Assignee: Aditya Shah
 Fix For: 4.0.0


QOutProcessor masks a large amount of output in q.out files whenever it sees 
any of the target mask patterns. This keeps us from writing many kinds of 
tests, for example tests of the directories created for an acid table. 
Internal configuration that lets us provide additional partial masks would 
help us cover such cases and make our tests better.
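
The difference between the two masking styles can be sketched as follows. This is illustrative only: the function names, the sample pattern, and the replacement text for the partial mask are assumptions, not QOutProcessor's configuration.

```python
import re

MASK_LINE = "#### A masked pattern was here ####"


def mask_whole_line(line: str, patterns) -> str:
    """Replace the entire line as soon as any pattern matches."""
    if any(re.search(pattern, line) for pattern in patterns):
        return MASK_LINE
    return line


def mask_partially(line: str, partial_masks) -> str:
    """Replace only the matched part, keeping the rest of the line testable."""
    for pattern, replacement in partial_masks:
        line = re.sub(pattern, replacement, line)
    return line


line = "hdfs://localhost:9000/warehouse/t/delta_0000001_0000001_0000/bucket_00000"
fully_masked = mask_whole_line(line, [r"hdfs://"])
partly_masked = mask_partially(line, [(r"hdfs://[^/]+", "hdfs://### HDFS PATH ###")])
```

With the partial mask the environment-specific host is hidden, but the delta directory and bucket file remain visible in the q.out file.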





[jira] [Created] (HIVE-21280) Null pointer exception on runnning compaction against a MM table.

2019-02-16 Thread Aditya Shah (JIRA)
Aditya Shah created HIVE-21280:
--

 Summary: Null pointer exception on runnning compaction against a 
MM table.
 Key: HIVE-21280
 URL: https://issues.apache.org/jira/browse/HIVE-21280
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.1, 3.0.0
Reporter: Aditya Shah


On running compaction on an MM table, I got a null pointer exception while 
getting the HDFS session path. It seems the session state was not started for 
these queries. Even after starting it, compaction fails further on when running 
a TezTask for the insert overwrite on the temp table with the contents of the 
original table. The cause is that the Tez session state cannot be initialized: 
an IllegalArgumentException is thrown while setting up the caller context in 
the Tez task, because the caller id, which uses the query id, is an empty 
string. 
I think the session state needs to be started and each of the queries run for 
compaction (I also have doubts about the stats updater thread's queries) should 
have a query id. Some details are as follows:


Steps to reproduce:
1) Using beeline with HS2 and HMS
2) create an MM table
3) Insert a few values in the table
4) alter table mm_table compact 'major'; 

Stack trace on HMS:
{code:java}
compactor.Worker: Caught exception while trying to compact 
id:8,dbname:default,tableName:acid_mm_orc,partName:null,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestWriteId:0.
 Marking failed to avoid repeated failures, java.io.IOException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to run create 
temporary table default.tmp_compactor_acid_mm_orc_1550222367257(`a` int, `b` 
string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'WITH 
SERDEPROPERTIES (
'serialization.format'='1')STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 
'hdfs://localhost:9000/user/hive/warehouse/acid_mm_orc/_tmp_2d8a096c-2db5-4ed8-921c-b3f6d31e079e/_base'
 TBLPROPERTIES ('transactional'='false')
at 
org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runMmCompaction(CompactorMR.java:373)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:241)
at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:174)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to run 
create temporary table default.tmp_compactor_acid_mm_orc_1550222367257(`a` int, 
`b` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'WITH 
SERDEPROPERTIES (
'serialization.format'='1')STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 
'hdfs://localhost:9000/user/hive/warehouse/acid_mm_orc/_tmp_2d8a096c-2db5-4ed8-921c-b3f6d31e079e/_base'
 TBLPROPERTIES ('transactional'='false')
at 
org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runOnDriver(CompactorMR.java:525)
at 
org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runMmCompaction(CompactorMR.java:365)
... 2 more
Caused by: java.lang.NullPointerException: Non-local session path expected to 
be non-null
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:228)
at 
org.apache.hadoop.hive.ql.session.SessionState.getHDFSSessionPath(SessionState.java:815)
at org.apache.hadoop.hive.ql.Context.<init>(Context.java:309)
at org.apache.hadoop.hive.ql.Context.<init>(Context.java:295)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:591)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1684)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1807)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1567)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1556)
at 
org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.runOnDriver(CompactorMR.java:522)
... 3 more
{code}
cc: [~ekoifman] [~vgumashta] [~sershe]





[jira] [Created] (HIVE-20456) Query fails with FNFException using MR with skewjoin enabled and auto convert join disabled

2018-08-24 Thread Aditya Shah (JIRA)
Aditya Shah created HIVE-20456:
--

 Summary: Query fails with FNFException using MR with skewjoin 
enabled and auto convert join disabled
 Key: HIVE-20456
 URL: https://issues.apache.org/jira/browse/HIVE-20456
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.0, 2.1.1, 1.2.0
Reporter: Aditya Shah
Assignee: Aditya Shah


When skew join is enabled and auto convert join is disabled the query fails 
with file not found exception. The following query reproduces the error:


{code:sql}
set hive.optimize.skewjoin = true;
set hive.auto.convert.join = false;
set hive.groupby.orderby.position.alias = true;
set hive.on.master=true;
set hive.execution.engine=mr;

drop database if exists test cascade;
create database if not exists test;
use test;

CREATE EXTERNAL TABLE test_table1
( `a` int , `b` int, `c` int)
PARTITIONED BY (
`d` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;

CREATE EXTERNAL TABLE test_table2
( `a` int , `b` int, `c` int)
PARTITIONED BY (
`d` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';


CREATE EXTERNAL TABLE test_table3
( `a` int , `b` int, `c` int)
PARTITIONED BY (
`e` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\u0001',
'serialization.format'='\u0001')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';


CREATE EXTERNAL TABLE test_table4 (`a` int , `b` int, `c` int)
PARTITIONED BY (
`e` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\u0001',
'serialization.format'='\u0001')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';


with
temp1 as (
select
g.a,
n.b,
u.c
from
test_table2 g
inner join test_table4 u on g.a = u.a
inner join test_table3 n on u.b = n.b
),
temp2 as (
select * from test_table4 where a = 2
),
temp21 as (
select
g.b,
n.c,
u.a
from
temp2 g
inner join test_table3 u on g.b = u.b
inner join test_table2 n on u.c = n.c
group by g.b, n.c, u.a
),
stack as (
select * from temp1
union all
select * from temp21
)
select * from stack;
{code}

The query runs perfectly fine when Tez is used or when other combinations of 
skew join and auto convert join are set. Diagnosis showed that when a 
conditional task resolves its tasks, it puts each resolved task directly into 
the runnable state without checking its parent dependencies or whether the task 
is already queued.
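
The two missing checks can be illustrated with a toy scheduler. This is a sketch under simplifying assumptions: the `Task` class and both enqueue functions are hypothetical and do not reproduce Hive's task runner.

```python
class Task:
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)
        self.done = False


def enqueue_resolved_unchecked(resolved, runnable):
    """Behaviour described above: resolved tasks go straight to runnable."""
    for task in resolved:
        runnable.append(task)


def enqueue_resolved_checked(resolved, runnable):
    """Only enqueue a task whose parents have all finished and that is not
    already queued."""
    for task in resolved:
        parents_done = all(parent.done for parent in task.parents)
        if parents_done and task not in runnable:
            runnable.append(task)


parent = Task("stage-1")
child = Task("stage-2", parents=[parent])

unchecked_queue = []
enqueue_resolved_unchecked([child, child], unchecked_queue)

checked_queue = []
enqueue_resolved_checked([child, child], checked_queue)
queued_before_parent_done = [t.name for t in checked_queue]

parent.done = True
enqueue_resolved_checked([child, child], checked_queue)
queued_after_parent_done = [t.name for t in checked_queue]
```

Without the checks the child is queued twice while its parent is still running; with them it is queued once, and only after the parent has finished.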


