date:20160818

[jira] [Commented] (HIVE-12077) MSCK Repair table should fix partitions in batches

2016-08-18 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427659#comment-15427659
 ] 

Lefty Leverenz commented on HIVE-12077:
---

Doc note:  HIVE-14571 tracks documenting the new configuration parameter 
*hive.msck.repair.batch.size*.

> MSCK Repair table should fix partitions in batches 
> ---
>
> Key: HIVE-12077
> URL: https://issues.apache.org/jira/browse/HIVE-12077
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ryan P
>Assignee: Chinna Rao Lalam
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-12077.1.patch, HIVE-12077.2.patch, 
> HIVE-12077.3.patch, HIVE-12077.4.patch, HIVE-12077.5.patch
>
>
> If a user attempts to run MSCK REPAIR TABLE on a directory with a large 
> number of untracked partitions, HMS will hit an out-of-memory error (OOME). I 
> suspect this is because it attempts one large bulk load in an effort to save 
> time, which can produce a collection so large that HMS eventually runs out of 
> memory. 
> Instead, I suggest that Hive include a configurable batch size that HMS can 
> use to break up the load. 
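
The batching idea in the description above can be illustrated with a minimal
sketch. This is not the code from the HIVE-12077 patches: the class
MsckBatchSketch, the method toBatches, and the sample partition names are
hypothetical, and treating a batch size of zero or less as "everything in one
batch" is an assumption based on the default described in HIVE-14571.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the code from the HIVE-12077 patches.
final class MsckBatchSketch {

    // Splits the untracked partitions into sublists of at most batchSize entries.
    // Assumption: batchSize <= 0 means "no batching", i.e. one batch with everything.
    static <T> List<List<T>> toBatches(List<T> partitions, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        if (partitions.isEmpty()) {
            return batches;
        }
        if (batchSize <= 0) {
            batches.add(new ArrayList<>(partitions));
            return batches;
        }
        for (int start = 0; start < partitions.size(); start += batchSize) {
            int end = Math.min(start + batchSize, partitions.size());
            batches.add(new ArrayList<>(partitions.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> untracked = new ArrayList<>();
        for (int day = 1; day <= 7; day++) {
            untracked.add("ds=2016-08-0" + day); // made-up partition names
        }
        // Each batch would be handed to the metastore as a separate, smaller request.
        for (List<String> batch : toBatches(untracked, 3)) {
            System.out.println("add batch of " + batch.size() + ": " + batch);
        }
    }
}
```

Bounding the size of each request caps how many partition objects have to be
held in memory at once, at the cost of more round trips.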



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-12077) MSCK Repair table should fix partitions in batches

2016-08-18 Thread Lefty Leverenz (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-12077:
--
Labels: TODOC2.2  (was: )

> MSCK Repair table should fix partitions in batches 
> ---
>
> Key: HIVE-12077
> URL: https://issues.apache.org/jira/browse/HIVE-12077
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ryan P
>Assignee: Chinna Rao Lalam
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-12077.1.patch, HIVE-12077.2.patch, 
> HIVE-12077.3.patch, HIVE-12077.4.patch, HIVE-12077.5.patch
>
>
> If a user attempts to run MSCK REPAIR TABLE on a directory with a large 
> number of untracked partitions, HMS will hit an out-of-memory error (OOME). I 
> suspect this is because it attempts one large bulk load in an effort to save 
> time, which can produce a collection so large that HMS eventually runs out of 
> memory. 
> Instead, I suggest that Hive include a configurable batch size that HMS can 
> use to break up the load. 




[jira] [Commented] (HIVE-14571) Document configuration hive.msck.repair.batch.size

2016-08-18 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427653#comment-15427653
 ] 

Lefty Leverenz commented on HIVE-14571:
---

Review of text in the Description:

bq.  ... execute all the partitions at one short.

The correct phrase is "at one shot" but since that's metaphoric, perhaps "at 
once" would be better.

> Document configuration hive.msck.repair.batch.size
> --
>
> Key: HIVE-14571
> URL: https://issues.apache.org/jira/browse/HIVE-14571
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
>Priority: Minor
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
>
> Update here 
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)]
> {quote}
> When there is a large number of untracked partitions for the MSCK REPAIR 
> TABLE command, there is a provision to run the msck repair table batch wise 
> to avoid OOME. By giving the configured batch size for the property 
> *hive.msck.repair.batch.size* it can run in the batches internally. The 
> default value of the property is zero, it means it will execute all the 
> partitions at one short.
> {quote}




[jira] [Commented] (HIVE-14571) Document configuration hive.msck.repair.batch.size

2016-08-18 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427651#comment-15427651
 ] 

Lefty Leverenz commented on HIVE-14571:
---

*hive.msck.repair.batch.size* also needs to be documented in the Configuration 
Properties wikidoc here:

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

Added a TODOC2.2 label.

> Document configuration hive.msck.repair.batch.size
> --
>
> Key: HIVE-14571
> URL: https://issues.apache.org/jira/browse/HIVE-14571
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
>Priority: Minor
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
>
> Update here 
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)]
> {quote}
> When there is a large number of untracked partitions for the MSCK REPAIR 
> TABLE command, there is a provision to run the msck repair table batch wise 
> to avoid OOME. By giving the configured batch size for the property 
> *hive.msck.repair.batch.size* it can run in the batches internally. The 
> default value of the property is zero, it means it will execute all the 
> partitions at one short.
> {quote}




[jira] [Updated] (HIVE-14571) Document configuration hive.msck.repair.batch.size

2016-08-18 Thread Lefty Leverenz (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-14571:
--
Labels: TODOC2.2  (was: )

> Document configuration hive.msck.repair.batch.size
> --
>
> Key: HIVE-14571
> URL: https://issues.apache.org/jira/browse/HIVE-14571
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
>Priority: Minor
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
>
> Update here 
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)]
> {quote}
> When there is a large number of untracked partitions for the MSCK REPAIR 
> TABLE command, there is a provision to run the msck repair table batch wise 
> to avoid OOME. By giving the configured batch size for the property 
> *hive.msck.repair.batch.size* it can run in the batches internally. The 
> default value of the property is zero, it means it will execute all the 
> partitions at one short.
> {quote}




[jira] [Commented] (HIVE-14565) CBO (Calcite Return Path) Handle field access for nested column

2016-08-18 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427645#comment-15427645
 ] 

Vineet Garg commented on HIVE-14565:


Looks good to me. I was wondering whether we handle arrays, but it looks like 
array access is handled and converted to an index(array<>, idx) function call. 
Not sure about map yet.

> CBO (Calcite Return Path) Handle field access for nested column
> ---
>
> Key: HIVE-14565
> URL: https://issues.apache.org/jira/browse/HIVE-14565
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Affects Versions: 2.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-14565.1.patch, HIVE-14565.patch
>
>
> ExprNodeConverter doesn't handle field access currently.




[jira] [Commented] (HIVE-14566) LLAP IO reads timestamp wrongly

2016-08-18 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427635#comment-15427635
 ] 

Hive QA commented on HIVE-14566:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12824472/HIVE-14566.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10443 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testSelectThriftSerializeInTasks
org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/934/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/934/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-934/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12824472 - PreCommit-HIVE-MASTER-Build

> LLAP IO reads timestamp wrongly
> ---
>
> Key: HIVE-14566
> URL: https://issues.apache.org/jira/browse/HIVE-14566
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0, 2.0.1, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-14566.1.patch, HIVE-14566.2.patch, 
> HIVE-14566.3.patch
>
>
> HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap.
> It reads timestamp wrongly.
> {code:title=LLAP IO Enabled}
> hive> select atimestamp1 from alltypesorc3xcols limit 10;
> OK
> 1969-12-31 15:59:46.674
> NULL
> 1969-12-31 15:59:55.787
> 1969-12-31 15:59:44.187
> 1969-12-31 15:59:50.434
> 1969-12-31 16:00:15.007
> 1969-12-31 16:00:07.021
> 1969-12-31 16:00:04.963
> 1969-12-31 15:59:52.176
> 1969-12-31 15:59:44.569
> {code}
> {code:title=LLAP IO Disabled}
> hive> select atimestamp1 from alltypesorc3xcols limit 10;
> OK
> 1969-12-31 15:59:46.674
> NULL
> 1969-12-31 15:59:55.787
> 1969-12-31 15:59:44.187
> 1969-12-31 15:59:50.434
> 1969-12-31 16:00:14.007
> 1969-12-31 16:00:06.021
> 1969-12-31 16:00:03.963
> 1969-12-31 15:59:52.176
> 1969-12-31 15:59:44.569
> {code}
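
Comparing the two listings above row by row makes the discrepancy easier to
see. The sketch below parses the sample values copied from the two outputs and
reports the rows that differ. The class TimestampDiffSketch and its methods are
hypothetical helpers for illustration only, and the sketch says nothing about
the root cause in the LLAP IO reader.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for comparing the two query outputs; not Hive code.
final class TimestampDiffSketch {

    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Rows copied from the "LLAP IO Enabled" listing.
    static final String[] LLAP_ON = {
        "1969-12-31 15:59:46.674", "NULL", "1969-12-31 15:59:55.787",
        "1969-12-31 15:59:44.187", "1969-12-31 15:59:50.434", "1969-12-31 16:00:15.007",
        "1969-12-31 16:00:07.021", "1969-12-31 16:00:04.963", "1969-12-31 15:59:52.176",
        "1969-12-31 15:59:44.569"
    };

    // Rows copied from the "LLAP IO Disabled" listing.
    static final String[] LLAP_OFF = {
        "1969-12-31 15:59:46.674", "NULL", "1969-12-31 15:59:55.787",
        "1969-12-31 15:59:44.187", "1969-12-31 15:59:50.434", "1969-12-31 16:00:14.007",
        "1969-12-31 16:00:06.021", "1969-12-31 16:00:03.963", "1969-12-31 15:59:52.176",
        "1969-12-31 15:59:44.569"
    };

    // Millisecond difference (first minus second) between two timestamp strings.
    static long deltaMillis(String a, String b) {
        return Duration.between(LocalDateTime.parse(b, FMT), LocalDateTime.parse(a, FMT))
                .toMillis();
    }

    // Zero-based indexes of rows whose text differs between the two outputs.
    static List<Integer> differingRows(String[] llapOn, String[] llapOff) {
        List<Integer> rows = new ArrayList<>();
        int n = Math.min(llapOn.length, llapOff.length);
        for (int i = 0; i < n; i++) {
            if (!llapOn[i].equals(llapOff[i])) {
                rows.add(i);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (int row : differingRows(LLAP_ON, LLAP_OFF)) {
            String on = LLAP_ON[row];
            String off = LLAP_OFF[row];
            if (on.equals("NULL") || off.equals("NULL")) {
                System.out.println("row " + row + ": " + on + " vs " + off);
            } else {
                System.out.println("row " + row + ": " + on + " vs " + off
                        + " (" + deltaMillis(on, off) + " ms apart)");
            }
        }
    }
}
```

On the sample rows, this flags the three values at or after 16:00:00, each of
which is exactly one second later with LLAP IO enabled.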




[jira] [Commented] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters

2016-08-18 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427633#comment-15427633
 ] 

Lefty Leverenz commented on HIVE-14522:
---

Doc note:  This removes *hive.outerjoin.supports.filters* from HiveConf.java in 
release 2.2.0.  It hasn't been documented in the wiki yet, but ought to be.  
(Created in 0.7.0 by HIVE-1534.)

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

Added a TODOC2.2 label.

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure 
> for auto_join_filters
> ---
>
> Key: HIVE-14522
> URL: https://issues.apache.org/jira/browse/HIVE-14522
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch
>
>
> {code}
> CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
> CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY (value) INTO 2 BUCKETS;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2;
> SET hive.optimize.bucketmapjoin = true;
> SET hive.optimize.bucketmapjoin.sortedmerge = true;
> SET hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SET hive.outerjoin.supports.filters = false;
> {code}
> {code}
> SELECT sum(hash(a.key,a.value,b.key,b.value))
> FROM myinput1 a LEFT OUTER JOIN myinput1 b
>   ON a.key > 40 AND a.value > 50 AND a.key = a.value
>  AND b.key > 40 AND b.value > 50 AND b.key = b.value;
> {code}
> {code}
> Expected result: 3078400
> Actual result: 4937935
> {code}




[jira] [Updated] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters

2016-08-18 Thread Lefty Leverenz (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-14522:
--
Labels: TODOC2.2  (was: )

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure 
> for auto_join_filters
> ---
>
> Key: HIVE-14522
> URL: https://issues.apache.org/jira/browse/HIVE-14522
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch
>
>
> {code}
> CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
> CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY (value) INTO 2 BUCKETS;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2;
> SET hive.optimize.bucketmapjoin = true;
> SET hive.optimize.bucketmapjoin.sortedmerge = true;
> SET hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SET hive.outerjoin.supports.filters = false;
> {code}
> {code}
> SELECT sum(hash(a.key,a.value,b.key,b.value))
> FROM myinput1 a LEFT OUTER JOIN myinput1 b
>   ON a.key > 40 AND a.value > 50 AND a.key = a.value
>  AND b.key > 40 AND b.value > 50 AND b.key = b.value;
> {code}
> {code}
> Expected result: 3078400
> Actual result: 4937935
> {code}




[jira] [Updated] (HIVE-13874) Tighten up EOF checking in Fast DeserializeRead classes; display better exception information; add new Unit Tests

2016-08-18 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13874:

Attachment: HIVE-13874.04.patch

> Tighten up EOF checking in Fast DeserializeRead classes; display better 
> exception information; add new Unit Tests
> -
>
> Key: HIVE-13874
> URL: https://issues.apache.org/jira/browse/HIVE-13874
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13874.01.patch, HIVE-13874.02.patch, 
> HIVE-13874.03.patch, HIVE-13874.04.patch
>
>
>  Tighten up EOF bounds checking in LazyBinaryDeserializeRead so bytes beyond 
> stated row end are never read.  Use WritableUtils.decodeVIntSize to check for 
> room ahead like regular LazyBinary code does.
> Display more detailed information when an exception is thrown by 
> DeserializeRead classes.
> Add Unit Tests, including some designed to catch errors like HIVE-13818.
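
The "check for room ahead" idea in the description above can be sketched as
follows. This is not the code from the HIVE-13874 patches: the class
VIntBoundsSketch and the method checkVIntFits are hypothetical, and
decodeVIntSize here is a local re-implementation intended to mirror the size
rule of Hadoop's WritableUtils.decodeVIntSize so that the snippet stays
self-contained.

```java
import java.io.EOFException;

// Hypothetical illustration only; not the LazyBinaryDeserializeRead code.
final class VIntBoundsSketch {

    // Local copy of the size rule for Hadoop's variable-length integers:
    // the first byte tells how many bytes the whole value occupies (1 to 9).
    static int decodeVIntSize(byte first) {
        if (first >= -112) {
            return 1;
        } else if (first < -120) {
            return -119 - first;
        }
        return -111 - first;
    }

    // Returns the encoded size of the vint starting at 'offset', or throws an
    // EOFException carrying the offsets if the value would extend past 'end'
    // (the exclusive end of the row). Assumes end <= bytes.length.
    static int checkVIntFits(byte[] bytes, int offset, int end) throws EOFException {
        if (offset >= end) {
            throw new EOFException(
                    "No room for a vint at offset " + offset + "; row ends at " + end);
        }
        int size = decodeVIntSize(bytes[offset]);
        if (offset + size > end) {
            throw new EOFException("vint at offset " + offset + " needs " + size
                    + " bytes but row ends at " + end);
        }
        return size;
    }

    public static void main(String[] args) {
        byte[] row = {(byte) -113, 0x7f}; // a two-byte vint
        try {
            System.out.println("size with full row: " + checkVIntFits(row, 0, 2));
            checkVIntFits(row, 0, 1); // stated row end cuts the value short
        } catch (EOFException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking the size before reading means bytes beyond the stated row end are
never touched, and the exception message records where the read would have
overrun.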




[jira] [Updated] (HIVE-13874) Tighten up EOF checking in Fast DeserializeRead classes; display better exception information; add new Unit Tests

2016-08-18 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13874:

Status: Patch Available  (was: In Progress)

> Tighten up EOF checking in Fast DeserializeRead classes; display better 
> exception information; add new Unit Tests
> -
>
> Key: HIVE-13874
> URL: https://issues.apache.org/jira/browse/HIVE-13874
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13874.01.patch, HIVE-13874.02.patch, 
> HIVE-13874.03.patch, HIVE-13874.04.patch
>
>
>  Tighten up EOF bounds checking in LazyBinaryDeserializeRead so bytes beyond 
> stated row end are never read.  Use WritableUtils.decodeVIntSize to check for 
> room ahead like regular LazyBinary code does.
> Display more detailed information when an exception is thrown by 
> DeserializeRead classes.
> Add Unit Tests, including some designed to catch errors like HIVE-13818.




[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-18 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427615#comment-15427615
 ] 

Prasanth Jayachandran commented on HIVE-14502:
--

orc_merge12.q will be fixed in HIVE-14566. The remaining 3 issues are not 
really critical; I will fix them in a follow-up. 

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch, 
> HIVE-14502.3.patch, HIVE-14502.4.patch
>
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly, it makes sense to move the mini tez 
> tests over to mini llap tests.  




[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-18 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427614#comment-15427614
 ] 

Prasanth Jayachandran commented on HIVE-14502:
--

The vector_join_part_col_char.q test output for llap is similar to mr. SHOW 
PARTITIONS for partitions of char type shows padded spaces in mr and llap, but 
tez trims the spaces. I will investigate further why tez shows partitions 
without padded spaces.

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch, 
> HIVE-14502.3.patch, HIVE-14502.4.patch
>
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly, it makes sense to move the mini tez 
> tests over to mini llap tests.  




[jira] [Updated] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-18 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14502:
-
Attachment: HIVE-14502.4.patch

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch, 
> HIVE-14502.3.patch, HIVE-14502.4.patch
>
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly, it makes sense to move the mini tez 
> tests over to mini llap tests.  




[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427543#comment-15427543
 ] 

Hive QA commented on HIVE-14574:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12824479/HIVE-14574.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10443 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testSelectThriftSerializeInTasks
org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/933/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/933/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-933/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12824479 - PreCommit-HIVE-MASTER-Build

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.01.patch, HIVE-14574.patch
>
>





[jira] [Commented] (HIVE-14583) Error: Error while compiling statement: FAILED: SemanticException Generate Map Join Task Error: Encountered unregistered class ID: 95

2016-08-18 Thread huangxiangang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427540#comment-15427540
 ] 

huangxiangang commented on HIVE-14583:
--

Here are some other logs:

2016-08-18 15:13:59,712 | WARN  | HiveServer2-Handler-Pool: Thread-42427169 | 
Runtime exception be caught while deserializing object using kryo:Unable to 
find class: com.huawei.udaf.GroupConcatSimpleUDAF$GroupConcatSimpleUDAFEvaluator
Serialization trace:
udafEvaluator 
(org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator)
genericUDAFWritableEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc)
aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc)
conf (org.apache.hadoop.hive.ql.exec.GroupByOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.GroupByOperator)
opParseCtxMap (org.apache.hadoop.hive.ql.plan.MapWork)
mapWork (org.apache.hadoop.hive.ql.plan.MapredWork) | 
org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:948)
2016-08-18 15:13:59,713 | ERROR | HiveServer2-Handler-Pool: Thread-42427169 | 
FAILED: SemanticException Generate Map Join Task Error: Index: 93, Size: 0
org.apache.hadoop.hive.ql.parse.SemanticException: Generate Map Join Task 
Error: Index: 93, Size: 0
at 
org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:513)
at 
org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:182)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
at 
org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79)
at 
org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:104)
at 
org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:290)
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:217)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9309)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:331)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:384)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:280)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1035)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1018)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:103)
   at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:208)
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:256)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:311)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:298)
at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
at com.sun.proxy.$Proxy20.executeStatementAsync(Unknown Source)
at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:236)
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at

[jira] [Commented] (HIVE-14583) Error: Error while compiling statement: FAILED: SemanticException Generate Map Join Task Error: Encountered unregistered class ID: 95

2016-08-18 Thread huangxiangang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427535#comment-15427535
 ] 

huangxiangang commented on HIVE-14583:
--

The error is not reproducible: when I tried to run the query again, it 
succeeded. Also, I cannot find the keyword "kryo" in the log, so I think it may 
be some other issue.

> Error: Error while compiling statement: FAILED: SemanticException Generate 
> Map Join Task Error: Encountered unregistered class ID: 95
> -
>
> Key: HIVE-14583
> URL: https://issues.apache.org/jira/browse/HIVE-14583
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, CLI, Clients
>Affects Versions: 0.13.0
>Reporter: huangxiangang
>Assignee: huangxiangang
>
> Sometimes, when I run a Hive join query, I get the following error:
> Error: Error while compiling statement: FAILED: SemanticException Generate 
> Map Join Task Error: Encountered unregistered class ID: 95
> However, when I retry it several times, it may succeed.




[jira] [Commented] (HIVE-14427) CompactionTxnHandler.markCleaned() can delete aborted txns

2016-08-18 Thread Barna Zsombor Klara (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427526#comment-15427526
 ] 

Barna Zsombor Klara commented on HIVE-14427:


Hi [~ekoifman],

Are you currently working on this, or could I take a look at it?

Thanks,
Zsombor

> CompactionTxnHandler.markCleaned() can delete aborted txns
> --
>
> Key: HIVE-14427
> URL: https://issues.apache.org/jira/browse/HIVE-14427
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>
> We can modify 
> {noformat}
> s = "select distinct txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid 
> and txn_state = '" +
>   TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and 
> tc_table = '" +
>   info.tableName + "'" + (info.highestTxnId == 0 ? "" : " and txn_id 
> <= " + info.highestTxnId);
> {noformat}
> to use select txn_id, count(*) ... group by txn_id so that we know the number 
> of components in each TXN.
> Then when running "delete from TXN_COMPONENTS where..." we know how many rows 
> were deleted.
> If the sum of all counts from the first query matches the total number of rows 
> deleted, we know that all aborted txns in this set are empty and thus can be 
> deleted here.
> This means we clean up aborted txns from the TXNS table sooner and avoid a 
> large join in _cleanEmptyAbortedTxns()_.  Also, the delete on TXNS here will 
> have PKs in its WHERE clause, so it should be cheap.
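
The count comparison proposed above can be sketched in memory, without any
JDBC. This is only an illustration of the check, not CompactionTxnHandler code:
the class AbortedTxnCleanupSketch, the method allAbortedTxnsEmpty, and the
sample counts are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the proposed check; not CompactionTxnHandler code.
final class AbortedTxnCleanupSketch {

    // componentCounts: txn_id -> count(*) as returned by the grouped select.
    // rowsDeleted: the row count reported by the delete on TXN_COMPONENTS.
    // Returns true when every counted component row was removed, i.e. the
    // aborted txns in this set have no components left.
    static boolean allAbortedTxnsEmpty(Map<Long, Integer> componentCounts, long rowsDeleted) {
        long expected = 0;
        for (int count : componentCounts.values()) {
            expected += count;
        }
        return expected == rowsDeleted;
    }

    public static void main(String[] args) {
        Map<Long, Integer> counts = new LinkedHashMap<>();
        counts.put(17L, 3); // made-up txn ids and component counts
        counts.put(21L, 1);

        System.out.println("4 rows deleted -> safe to delete from TXNS: "
                + allAbortedTxnsEmpty(counts, 4));
        System.out.println("3 rows deleted -> safe to delete from TXNS: "
                + allAbortedTxnsEmpty(counts, 3));
    }
}
```

When the check passes, the txn ids from the first query are already known, so
the follow-up delete on TXNS can filter on them directly.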




[jira] [Updated] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-18 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14502:
-
Attachment: HIVE-14502.3.patch

The 4 qfiles that have yet to be moved to llap must be in minitez.shared. That 
was causing the previous failures. 

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch, 
> HIVE-14502.3.patch
>
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly it makes sense to move over mini tez 
> tests to mini llap tests.  




[jira] [Commented] (HIVE-14583) Error: Error while compiling statement: FAILED: SemanticException Generate Map Join Task Error: Encountered unregistered class ID: 95

2016-08-18 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427506#comment-15427506
 ] 

Prasanth Jayachandran commented on HIVE-14583:
--

Can you please provide a small reproducible test case? There were several fixes 
that went in after 0.13.0 related to kryo. We might no longer have this issue 
in recent releases.

> Error: Error while compiling statement: FAILED: SemanticException Generate 
> Map Join Task Error: Encountered unregistered class ID: 95
> -
>
> Key: HIVE-14583
> URL: https://issues.apache.org/jira/browse/HIVE-14583
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, CLI, Clients
>Affects Versions: 0.13.0
>Reporter: huangxiangang
>Assignee: huangxiangang
>
> Sometimes, when I run a Hive join query, I get the following error:
> Error: Error while compiling statement: FAILED: SemanticException Generate 
> Map Join Task Error: Encountered unregistered class ID: 95
> However, when I retry the query several times, it may succeed.




[jira] [Updated] (HIVE-14583) Error: Error while compiling statement: FAILED: SemanticException Generate Map Join Task Error: Encountered unregistered class ID: 95

2016-08-18 Thread huangxiangang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangxiangang updated HIVE-14583:
-
Component/s: Clients
 CLI

> Error: Error while compiling statement: FAILED: SemanticException Generate 
> Map Join Task Error: Encountered unregistered class ID: 95
> -
>
> Key: HIVE-14583
> URL: https://issues.apache.org/jira/browse/HIVE-14583
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, CLI, Clients
>Affects Versions: 0.13.0
>Reporter: huangxiangang
>Assignee: huangxiangang
>
> Sometimes, when I run a Hive join query, I get the following error:
> Error: Error while compiling statement: FAILED: SemanticException Generate 
> Map Join Task Error: Encountered unregistered class ID: 95
> However, when I retry the query several times, it may succeed.




[jira] [Updated] (HIVE-14560) Support exchange partition between s3 and hdfs tables

2016-08-18 Thread Abdullah Yousufi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Yousufi updated HIVE-14560:

Attachment: (was: HIVE-14560.02.patch)

> Support exchange partition between s3 and hdfs tables
> -
>
> Key: HIVE-14560
> URL: https://issues.apache.org/jira/browse/HIVE-14560
> Project: Hive
>  Issue Type: Bug
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
> Fix For: 2.2.0
>
> Attachments: HIVE-14560.02.patch, HIVE-14560.patch
>
>
> {code}
> alter table s3_tbl exchange partition (country='USA', state='CA') with table 
> hdfs_tbl;
> {code}
> results in:
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got 
> exception: java.lang.IllegalArgumentException Wrong FS: 
> s3a://hive-on-s3/s3_tbl/country=USA/state=CA, expected: 
> hdfs://localhost:9000) (state=08S01,code=1)
> {code}
> because the check for whether the s3 destination table path exists occurs on 
> the hdfs filesystem.
> Furthermore, exchanging between s3 and hdfs fails because the hdfs rename 
> operation is not supported across filesystems. Fix uses copy + deletion in 
> the case that the file systems differ.
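
The rename versus copy + deletion decision described above can be sketched with the JDK alone. This is a hedged illustration, not the patch: the class and method names are hypothetical, and it assumes that two paths are on the same file system when their URI scheme and authority match.

```java
import java.net.URI;

// Hypothetical helper; the actual fix works against Hadoop FileSystem objects.
final class CrossFsMoveSketch {

    /** Null-safe, case-insensitive comparison of URI components. */
    private static boolean sameComponent(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    /** Assumption: same scheme and authority means same file system. */
    static boolean sameFileSystem(URI src, URI dst) {
        return sameComponent(src.getScheme(), dst.getScheme())
            && sameComponent(src.getAuthority(), dst.getAuthority());
    }

    /** A rename is only possible within one file system; otherwise copy, then delete. */
    static String moveStrategy(URI src, URI dst) {
        return sameFileSystem(src, dst) ? "rename" : "copy+delete";
    }

    public static void main(String[] args) {
        URI s3 = URI.create("s3a://hive-on-s3/s3_tbl/country=USA/state=CA");
        URI hdfs = URI.create("hdfs://localhost:9000/warehouse/hdfs_tbl"); // path is illustrative
        System.out.println(moveStrategy(s3, hdfs)); // prints "copy+delete"
    }
}
```

The same comparison also hints at the first half of the bug: an existence check for the destination has to be issued against the destination's own file system, not the source's.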




[jira] [Updated] (HIVE-14560) Support exchange partition between s3 and hdfs tables

2016-08-18 Thread Abdullah Yousufi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Yousufi updated HIVE-14560:

Attachment: HIVE-14560.02.patch

> Support exchange partition between s3 and hdfs tables
> -
>
> Key: HIVE-14560
> URL: https://issues.apache.org/jira/browse/HIVE-14560
> Project: Hive
>  Issue Type: Bug
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
> Fix For: 2.2.0
>
> Attachments: HIVE-14560.02.patch, HIVE-14560.patch
>
>
> {code}
> alter table s3_tbl exchange partition (country='USA', state='CA') with table 
> hdfs_tbl;
> {code}
> results in:
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got 
> exception: java.lang.IllegalArgumentException Wrong FS: 
> s3a://hive-on-s3/s3_tbl/country=USA/state=CA, expected: 
> hdfs://localhost:9000) (state=08S01,code=1)
> {code}
> because the check for whether the s3 destination table path exists occurs on 
> the hdfs filesystem.
> Furthermore, exchanging between s3 and hdfs fails because the hdfs rename 
> operation is not supported across filesystems. Fix uses copy + deletion in 
> the case that the file systems differ.




[jira] [Updated] (HIVE-14559) Remove setting hive.execution.engine in qfiles

2016-08-18 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14559:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks for the review! Committed to master.

> Remove setting hive.execution.engine in qfiles
> --
>
> Key: HIVE-14559
> URL: https://issues.apache.org/jira/browse/HIVE-14559
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.2.0
>
> Attachments: HIVE-14559.1.patch
>
>
> Some qfiles explicitly set the execution engine. If we run those tests on 
> different Mini CliDrivers, they could be very slow. 




[jira] [Commented] (HIVE-14559) Remove setting hive.execution.engine in qfiles

2016-08-18 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427463#comment-15427463
 ] 

Prasanth Jayachandran commented on HIVE-14559:
--

Test failures are not related to this patch.

> Remove setting hive.execution.engine in qfiles
> --
>
> Key: HIVE-14559
> URL: https://issues.apache.org/jira/browse/HIVE-14559
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14559.1.patch
>
>
> Some qfiles explicitly set the execution engine. If we run those tests on 
> different Mini CliDrivers, they could be very slow. 




[jira] [Commented] (HIVE-14554) Hive ptest should delete the itests/thirdparty directory everytime it builds hive

2016-08-18 Thread Rui Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427464#comment-15427464
 ] 

Rui Li commented on HIVE-14554:
---

The checksum way sounds good to me.

> Hive ptest should delete the itests/thirdparty directory everytime it builds 
> hive
> -
>
> Key: HIVE-14554
> URL: https://issues.apache.org/jira/browse/HIVE-14554
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>
> The {{itests/thirdparty}} directory is created by hive on spark when 
> downloading the spark-assembly file. Hive ptest should delete this directory 
> every time it runs a new set of tests to avoid conflicts when a new spark 
> tarball is submitted.




[jira] [Updated] (HIVE-14560) Support exchange partition between s3 and hdfs tables

2016-08-18 Thread Abdullah Yousufi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Yousufi updated HIVE-14560:

Attachment: HIVE-14560.02.patch

Removed whitespace lines modified by editor + change FileUtils.copy() to copy()

> Support exchange partition between s3 and hdfs tables
> -
>
> Key: HIVE-14560
> URL: https://issues.apache.org/jira/browse/HIVE-14560
> Project: Hive
>  Issue Type: Bug
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
> Fix For: 2.2.0
>
> Attachments: HIVE-14560.02.patch, HIVE-14560.patch
>
>
> {code}
> alter table s3_tbl exchange partition (country='USA', state='CA') with table 
> hdfs_tbl;
> {code}
> results in:
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got 
> exception: java.lang.IllegalArgumentException Wrong FS: 
> s3a://hive-on-s3/s3_tbl/country=USA/state=CA, expected: 
> hdfs://localhost:9000) (state=08S01,code=1)
> {code}
> because the check for whether the s3 destination table path exists occurs on 
> the hdfs filesystem.
> Furthermore, exchanging between s3 and hdfs fails because the hdfs rename 
> operation is not supported across filesystems. Fix uses copy + deletion in 
> the case that the file systems differ.




[jira] [Commented] (HIVE-14559) Remove setting hive.execution.engine in qfiles

2016-08-18 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427454#comment-15427454
 ] 

Hive QA commented on HIVE-14559:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12824213/HIVE-14559.1.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10442 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testSelectThriftSerializeInTasks
org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/932/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/932/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-932/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12824213 - PreCommit-HIVE-MASTER-Build

> Remove setting hive.execution.engine in qfiles
> --
>
> Key: HIVE-14559
> URL: https://issues.apache.org/jira/browse/HIVE-14559
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14559.1.patch
>
>
> Some qfiles explicitly set the execution engine. If we run those tests on 
> different Mini CliDrivers, they could be very slow. 




[jira] [Commented] (HIVE-14580) Introduce || operator

2016-08-18 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427453#comment-15427453
 ] 

Ashutosh Chauhan commented on HIVE-14580:
-

https://docs.oracle.com/cd/B19306_01/server.102/b14200/operators003.htm is an 
example. 

> Introduce || operator
> -
>
> Key: HIVE-14580
> URL: https://issues.apache.org/jira/browse/HIVE-14580
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Ashutosh Chauhan
>
> Functionally equivalent to concat() udf. But standard allows usage of || for 
> string concatenations.




[jira] [Updated] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14574:

Attachment: HIVE-14574.01.patch

Added test and naming... this is the most over-engineered test ever. The number 
trends toward something more reasonable with more splits, strangely enough :)
I think we can make a separate jira for the machine accounting described 
above.

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.01.patch, HIVE-14574.patch
>
>





[jira] [Commented] (HIVE-14580) Introduce || operator

2016-08-18 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427447#comment-15427447
 ] 

Pengcheng Xiong commented on HIVE-14580:


[~ashutoshc], how do we distinguish this from the mathematical "||"? Thanks.

> Introduce || operator
> -
>
> Key: HIVE-14580
> URL: https://issues.apache.org/jira/browse/HIVE-14580
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Ashutosh Chauhan
>
> Functionally equivalent to concat() udf. But standard allows usage of || for 
> string concatenations.




[jira] [Commented] (HIVE-14579) Add extract udf

2016-08-18 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427430#comment-15427430
 ] 

Ashutosh Chauhan commented on HIVE-14579:
-

Part of standard sql and heavily used in timeseries based datasets.

> Add extract udf
> ---
>
> Key: HIVE-14579
> URL: https://issues.apache.org/jira/browse/HIVE-14579
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF
>Reporter: Ashutosh Chauhan
>
> https://www.postgresql.org/docs/9.1/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT




[jira] [Commented] (HIVE-14570) Create table with column names ROW__ID, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE sucess but query fails

2016-08-18 Thread Niklaus Xiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427424#comment-15427424
 ] 

Niklaus Xiao commented on HIVE-14570:
-

The test failures are unrelated.

> Create table with column names ROW__ID, INPUT__FILE__NAME, 
> BLOCK__OFFSET__INSIDE__FILE sucess but query fails
> -
>
> Key: HIVE-14570
> URL: https://issues.apache.org/jira/browse/HIVE-14570
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Niklaus Xiao
>Assignee: Niklaus Xiao
> Fix For: 2.2.0
>
> Attachments: HIVE-14570.patch
>
>
> {code}
> 0: jdbc:hive2://189.39.151.74:21066/> create table foo1(ROW__ID string);
> No rows affected (0.281 seconds)
> 0: jdbc:hive2://189.39.151.74:21066/> create table 
> foo2(BLOCK__OFFSET__INSIDE__FILE string);
> No rows affected (0.323 seconds)
> 0: jdbc:hive2://189.39.151.74:21066/> create table foo3(INPUT__FILE__NAME 
> string);
> No rows affected (0.307 seconds)
> 0: jdbc:hive2://189.39.151.74:21066/> select * from foo1;
> Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 
> Invalid column reference 'TOK_ALLCOLREF' (state=42000,code=4)
> 0: jdbc:hive2://189.39.151.74:21066/> select * from foo2;
> Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 
> Invalid column reference 'TOK_ALLCOLREF' (state=42000,code=4)
> 0: jdbc:hive2://189.39.151.74:21066/> select * from foo3;
> Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 
> Invalid column reference 'TOK_ALLCOLREF' (state=42000,code=4)
> {code}
> We should prevent users from creating tables with column names that are the 
> same as virtual column names.
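
A minimal sketch of such a check, in plain Java. The class and method names are hypothetical, and the reserved set lists only the three virtual column names mentioned in this issue; Hive may define others. The comparison is case-insensitive on the assumption that column names are matched without regard to case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical validation helper; not the code from the attached patch.
final class VirtualColumnNameCheckSketch {

    // Only the names quoted in this issue; treat the list as incomplete.
    private static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
        "ROW__ID", "INPUT__FILE__NAME", "BLOCK__OFFSET__INSIDE__FILE"));

    /** Returns the requested column names that collide with a virtual column name. */
    static List<String> conflictingColumns(List<String> columnNames) {
        List<String> conflicts = new ArrayList<>();
        for (String name : columnNames) {
            if (RESERVED.contains(name.toUpperCase(Locale.ROOT))) {
                conflicts.add(name);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        // A CREATE TABLE with these columns would be rejected because of "row__id".
        System.out.println(conflictingColumns(Arrays.asList("id", "row__id", "value")));
    }
}
```

Rejecting the statement at CREATE TABLE time, when this list is non-empty, would surface the problem before any query hits the TOK_ALLCOLREF error shown above.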




[jira] [Commented] (HIVE-13874) Tighten up EOF checking in Fast DeserializeRead classes; display better exception information; add new Unit Tests

2016-08-18 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427420#comment-15427420
 ] 

Sergey Shelukhin commented on HIVE-13874:
-

A couple of comments on RB.

> Tighten up EOF checking in Fast DeserializeRead classes; display better 
> exception information; add new Unit Tests
> -
>
> Key: HIVE-13874
> URL: https://issues.apache.org/jira/browse/HIVE-13874
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13874.01.patch, HIVE-13874.02.patch, 
> HIVE-13874.03.patch
>
>
>  Tighten up EOF bounds checking in LazyBinaryDeserializeRead so bytes beyond 
> stated row end are never read.  Use WritableUtils.decodeVIntSize to check for 
> room ahead like regular LazyBinary code does.
> Display more detailed information when an exception is thrown by 
> DeserializeRead classes.
> Add Unit Tests, including some designed to catch errors like HIVE-13818.
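
The "check for room ahead" idea can be sketched as follows. This is not the LazyBinaryDeserializeRead code: the class and the hasRoomForVInt method are hypothetical, and decodeVIntSize is reimplemented locally so the example is self-contained; it is meant to approximate Hadoop's WritableUtils.decodeVIntSize and should be checked against that class before being relied on.

```java
// Hypothetical sketch of a bounds check before reading a variable-length integer.
final class VIntBoundsCheckSketch {

    /**
     * Size in bytes of a vint, derived from its first byte. Local approximation
     * of Hadoop's WritableUtils.decodeVIntSize (assumption, see lead-in).
     */
    static int decodeVIntSize(byte first) {
        if (first >= -112) {
            return 1;                // value is stored in the first byte itself
        } else if (first < -120) {
            return -119 - first;     // negative multi-byte value
        }
        return -111 - first;         // positive multi-byte value
    }

    /** True only if the whole vint starting at offset lies before the stated row end. */
    static boolean hasRoomForVInt(byte[] bytes, int offset, int end) {
        if (offset >= end) {
            return false;            // not even the first byte is inside the row
        }
        return offset + decodeVIntSize(bytes[offset]) <= end;
    }

    public static void main(String[] args) {
        byte[] row = {(byte) -113, 1}; // a two-byte vint
        System.out.println(hasRoomForVInt(row, 0, 2)); // fits inside the row
        System.out.println(hasRoomForVInt(row, 0, 1)); // would read past the row end
    }
}
```

A reader that performs this check before every vint can raise a descriptive EOF exception instead of silently consuming bytes that belong to the next row.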




[jira] [Updated] (HIVE-14565) CBO (Calcite Return Path) Handle field access for nested column

2016-08-18 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14565:

Status: Open  (was: Patch Available)

> CBO (Calcite Return Path) Handle field access for nested column
> ---
>
> Key: HIVE-14565
> URL: https://issues.apache.org/jira/browse/HIVE-14565
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Affects Versions: 2.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-14565.1.patch, HIVE-14565.patch
>
>
> ExprNodeConverter doesn't handle field access currently.




[jira] [Updated] (HIVE-14565) CBO (Calcite Return Path) Handle field access for nested column

2016-08-18 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14565:

Status: Patch Available  (was: Open)

> CBO (Calcite Return Path) Handle field access for nested column
> ---
>
> Key: HIVE-14565
> URL: https://issues.apache.org/jira/browse/HIVE-14565
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Affects Versions: 2.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-14565.1.patch, HIVE-14565.patch
>
>
> ExprNodeConverter doesn't handle field access currently.




[jira] [Updated] (HIVE-14565) CBO (Calcite Return Path) Handle field access for nested column

2016-08-18 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14565:

Attachment: HIVE-14565.1.patch

[~jcamachorodriguez] Can you please review this?

> CBO (Calcite Return Path) Handle field access for nested column
> ---
>
> Key: HIVE-14565
> URL: https://issues.apache.org/jira/browse/HIVE-14565
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Affects Versions: 2.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-14565.1.patch, HIVE-14565.patch
>
>
> ExprNodeConverter doesn't handle field access currently.




[jira] [Commented] (HIVE-14566) LLAP IO reads timestamp wrongly

2016-08-18 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427415#comment-15427415
 ] 

Sergey Shelukhin commented on HIVE-14566:
-

+1 pending tests

> LLAP IO reads timestamp wrongly
> ---
>
> Key: HIVE-14566
> URL: https://issues.apache.org/jira/browse/HIVE-14566
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0, 2.0.1, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-14566.1.patch, HIVE-14566.2.patch, 
> HIVE-14566.3.patch
>
>
> HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap.
> It reads timestamp wrongly.
> {code:title=LLAP IO Enabled}
> hive> select atimestamp1 from alltypesorc3xcols limit 10;
> OK
> 1969-12-31 15:59:46.674
> NULL
> 1969-12-31 15:59:55.787
> 1969-12-31 15:59:44.187
> 1969-12-31 15:59:50.434
> 1969-12-31 16:00:15.007
> 1969-12-31 16:00:07.021
> 1969-12-31 16:00:04.963
> 1969-12-31 15:59:52.176
> 1969-12-31 15:59:44.569
> {code}
> {code:title=LLAP IO Disabled}
> hive> select atimestamp1 from alltypesorc3xcols limit 10;
> OK
> 1969-12-31 15:59:46.674
> NULL
> 1969-12-31 15:59:55.787
> 1969-12-31 15:59:44.187
> 1969-12-31 15:59:50.434
> 1969-12-31 16:00:14.007
> 1969-12-31 16:00:06.021
> 1969-12-31 16:00:03.963
> 1969-12-31 15:59:52.176
> 1969-12-31 15:59:44.569
> {code}




[jira] [Updated] (HIVE-14566) LLAP IO reads timestamp wrongly

2016-08-18 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14566:
-
Attachment: HIVE-14566.3.patch

Added a junit test.

> LLAP IO reads timestamp wrongly
> ---
>
> Key: HIVE-14566
> URL: https://issues.apache.org/jira/browse/HIVE-14566
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0, 2.0.1, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-14566.1.patch, HIVE-14566.2.patch, 
> HIVE-14566.3.patch
>
>
> HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap.
> It reads timestamp wrongly.
> {code:title=LLAP IO Enabled}
> hive> select atimestamp1 from alltypesorc3xcols limit 10;
> OK
> 1969-12-31 15:59:46.674
> NULL
> 1969-12-31 15:59:55.787
> 1969-12-31 15:59:44.187
> 1969-12-31 15:59:50.434
> 1969-12-31 16:00:15.007
> 1969-12-31 16:00:07.021
> 1969-12-31 16:00:04.963
> 1969-12-31 15:59:52.176
> 1969-12-31 15:59:44.569
> {code}
> {code:title=LLAP IO Disabled}
> hive> select atimestamp1 from alltypesorc3xcols limit 10;
> OK
> 1969-12-31 15:59:46.674
> NULL
> 1969-12-31 15:59:55.787
> 1969-12-31 15:59:44.187
> 1969-12-31 15:59:50.434
> 1969-12-31 16:00:14.007
> 1969-12-31 16:00:06.021
> 1969-12-31 16:00:03.963
> 1969-12-31 15:59:52.176
> 1969-12-31 15:59:44.569
> {code}




[jira] [Updated] (HIVE-14165) Remove Hive file listing during split computation

2016-08-18 Thread Abdullah Yousufi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Yousufi updated HIVE-14165:

Attachment: HIVE-14165.02.patch

> Remove Hive file listing during split computation
> -
>
> Key: HIVE-14165
> URL: https://issues.apache.org/jira/browse/HIVE-14165
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14165.02.patch, HIVE-14165.patch
>
>
> The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's 
> FileInputFormat.java will list the files during split computation anyway to 
> determine their size. One way to remove this is to catch the 
> InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the 
> Hive side instead of doing the file listing beforehand.
> For S3 select queries on partitioned tables, this results in a 2x speedup.




[jira] [Commented] (HIVE-14503) Remove explicit order by in qfiles for union tests

2016-08-18 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427367#comment-15427367
 ] 

Hive QA commented on HIVE-14503:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12824030/HIVE-14503.3.patch

{color:green}SUCCESS:{color} +1 due to 35 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10442 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testSelectThriftSerializeInTasks
org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/931/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/931/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-931/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12824030 - PreCommit-HIVE-MASTER-Build

> Remove explicit order by in qfiles for union tests
> --
>
> Key: HIVE-14503
> URL: https://issues.apache.org/jira/browse/HIVE-14503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, 
> HIVE-14503.3.patch, HIVE-14503.4.patch
>
>
> Identify qfiles with explicit order by and replace them with 
> SORT_QUERY_RESULTS




[jira] [Updated] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters

2016-08-18 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14522:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Vineet!

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure 
> for auto_join_filters
> ---
>
> Key: HIVE-14522
> URL: https://issues.apache.org/jira/browse/HIVE-14522
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Fix For: 2.2.0
>
> Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch
>
>
> {code}
> CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS; 
> CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY 
> (value) INTO 2 BUCKETS; 
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2;
> SET hive.optimize.bucketmapjoin = true;
> SET hive.optimize.bucketmapjoin.sortedmerge = true;
> SET hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SET hive.outerjoin.supports.filters = false;
> {code}
> {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT 
> OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND 
> b.key > 40 AND b.value > 50 AND b.key = b.value; {code}
> {code} Expected result: 3078400 Actual result: 4937935 {code}




[jira] [Commented] (HIVE-14554) Hive ptest should delete the itests/thirdparty directory everytime it builds hive

2016-08-18 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HIVE-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427358#comment-15427358
 ] 

Sergio Peña commented on HIVE-14554:


The best solution is to switch to a dependency model, but that is not a trivial 
task. We're working on making that move.

Another quick solution is to add a checksum file to the spark assembly file. 
Then hive will download the checksum, and compare it against the local checksum 
file. If they're different, then it will download the spark-assembly, otherwise 
it just uses the local. I like this approach more.
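
The checksum comparison described above could look roughly like this. It is a sketch under assumptions: the class and method names are hypothetical, the upstream checksum is assumed to have been downloaded already and is passed in as a string, and the local checksum is assumed to sit in a small text file next to the spark-assembly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for the "download only when the checksum changed" idea.
final class ChecksumCheckSketch {

    /** Reads the local checksum file, or returns null if it is missing or unreadable. */
    static String readChecksumOrNull(Path checksumFile) {
        if (!Files.exists(checksumFile)) {
            return null;
        }
        try {
            return new String(Files.readAllBytes(checksumFile), StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            return null; // treat an unreadable file like a missing one
        }
    }

    /** Download when there is no local checksum or it differs from the upstream one. */
    static boolean needsDownload(String localChecksum, String upstreamChecksum) {
        return localChecksum == null
            || !localChecksum.equalsIgnoreCase(upstreamChecksum.trim());
    }

    public static void main(String[] args) {
        // The file name below is illustrative only.
        String local = readChecksumOrNull(Paths.get("spark-assembly.checksum"));
        System.out.println(needsDownload(local, "0123abcd"));
    }
}
```

Compared with deleting the directory on every run, this keeps the cached tarball whenever the upstream file is unchanged.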

> Hive ptest should delete the itests/thirdparty directory everytime it builds 
> hive
> -
>
> Key: HIVE-14554
> URL: https://issues.apache.org/jira/browse/HIVE-14554
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>
> The {{itests/thirdparty}} directory is created by hive on spark when 
> downloading the spark-assembly file. Hive ptest should delete this directory 
> every time it runs a new set of tests to avoid conflicts when a new spark 
> tarball is submitted.




[jira] [Commented] (HIVE-14565) CBO (Calcite Return Path) Handle field access for nested column

2016-08-18 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427357#comment-15427357
 ] 

Vineet Garg commented on HIVE-14565:


Thanks Ashutosh 

> CBO (Calcite Return Path) Handle field access for nested column
> ---
>
> Key: HIVE-14565
> URL: https://issues.apache.org/jira/browse/HIVE-14565
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Affects Versions: 2.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-14565.patch
>
>
> ExprNodeConverter doesn't handle field access currently.




[jira] [Commented] (HIVE-14554) Hive ptest should delete the itests/thirdparty directory everytime it builds hive

2016-08-18 Thread Rui Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427348#comment-15427348
 ] 

Rui Li commented on HIVE-14554:
---

Oh sorry I misunderstood it. Just thought more about this, maybe it's better to 
associate a timestamp with the file and only re-download when upstream 
timestamp updates? I guess that's similar to how maven handles snapshots.

> Hive ptest should delete the itests/thirdparty directory everytime it builds 
> hive
> -
>
> Key: HIVE-14554
> URL: https://issues.apache.org/jira/browse/HIVE-14554
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>
> The {{itests/thirdparty}} directory is created by hive on spark when 
> downloading the spark-assembly file. Hive ptest should delete this directory 
> every time it runs a new set of tests to avoid conflicts when a new spark 
> tarball is submitted.




[jira] [Commented] (HIVE-14503) Remove explicit order by in qfiles for union tests

2016-08-18 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427341#comment-15427341
 ] 

Siddharth Seth commented on HIVE-14503:
---

+1.

> Remove explicit order by in qfiles for union tests
> --
>
> Key: HIVE-14503
> URL: https://issues.apache.org/jira/browse/HIVE-14503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, 
> HIVE-14503.3.patch, HIVE-14503.4.patch
>
>
> Identify qfiles with explicit order by and replace them with 
> SORT_QUERY_RESULTS




[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-18 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427340#comment-15427340
 ] 

Siddharth Seth commented on HIVE-14502:
---

+1. After jenkins gets back.

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch
>
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly it makes sense to move over mini tez 
> tests to mini llap tests.  




[jira] [Updated] (HIVE-14561) Minor ptest2 improvements

2016-08-18 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14561:
--
Issue Type: Sub-task  (was: Task)
Parent: HIVE-13503

> Minor ptest2 improvements
> -
>
> Key: HIVE-14561
> URL: https://issues.apache.org/jira/browse/HIVE-14561
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14561.01.patch
>
>
> Re-purposed to track a few more improvements.
> - Update spring framework to work with Java8
> - Change elapseTime logging to milliseconds from seconds
> - Add thread name to log files.
> - Allow an empty logsEndPoint if outputDir is not specified
> - Log configuration when starting in a web server
> - Allow tests to be run even if no qtests property is set
> - Fix an exception on test completion when using FixedExecutionContextProvider




[jira] [Commented] (HIVE-14563) StatsOptimizer treats NULL in a wrong way

2016-08-18 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427338#comment-15427338
 ] 

Ashutosh Chauhan commented on HIVE-14563:
-

+1

> StatsOptimizer treats NULL in a wrong way
> -
>
> Key: HIVE-14563
> URL: https://issues.apache.org/jira/browse/HIVE-14563
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14563.01.patch
>
>
> {code}
> POSTHOOK: query: explain select count(key) from (select null as key from 
> src)src
> POSTHOOK: type: QUERY
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: 1
>   Processor Tree:
> ListSink
> PREHOOK: query: select count(key) from (select null as key from src)src
> PREHOOK: type: QUERY
> PREHOOK: Input: default@src
>  A masked pattern was here 
> POSTHOOK: query: select count(key) from (select null as key from src)src
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@src
>  A masked pattern was here 
> 500
> {code}
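
For reference, standard SQL semantics are that COUNT(col) skips NULL values 
while COUNT(*) counts rows, so the query above would be expected to return 0 
rather than 500 (assuming src holds 500 rows, as the output suggests). A 
minimal Java sketch of that distinction follows; the class and method names 
are hypothetical and it does not reproduce StatsOptimizer itself.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration of COUNT(*) versus COUNT(col) over one column.
class CountSemantics {

  /** COUNT(*): every row counts, whatever the column holds. */
  static long countStar(List<?> column) {
    return column.size();
  }

  /** COUNT(col): rows whose column value is NULL are skipped. */
  static long countColumn(List<?> column) {
    return column.stream().filter(Objects::nonNull).count();
  }

  public static void main(String[] args) {
    // Mirrors "select null as key from src" with 500 rows in src.
    List<String> keys = Collections.<String>nCopies(500, null);
    System.out.println("count(*)   = " + countStar(keys));
    System.out.println("count(key) = " + countColumn(keys));
  }
}
```

An optimizer that answers count(key) from the table's row-count statistic 
alone would effectively compute countStar, which matches the 500 shown in the 
report.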




[jira] [Updated] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-18 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14576:
--
Parent Issue: HIVE-14547  (was: HIVE-13503)

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.




[jira] [Comment Edited] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427319#comment-15427319
 ] 

Sergey Shelukhin edited comment on HIVE-14574 at 8/18/16 10:59 PM:
---

We can easily achieve unique IDs by taking ZK node name (which is unique and 
sequential). However, as the new node is re-added to the tail on every restart, 
it throws everything off.
What we want conceptually is that restarted nodes go into the same position in 
the order as the nodes they replaced, but that is difficult to achieve (or 
impossible, I am not sure we have such a concept with Slider). We can just have 
sequential numbers in ZK to take, with every registering node fighting for the 
lowest number. I wonder if there's already a primitive for that in curator ;) 
That way the replacement nodes take the place of the nodes that died, most of 
the time, and leave the running ones undisturbed for most of the time.
We can also assume we usually restart in the same place and order by the first 
time there was LLAP on that particular node for that cluster, then by name. 
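
The "fight for the lowest number" idea can be sketched in Java as below. This 
is an in-memory stand-in only: it makes no ZooKeeper or Curator calls, and the 
SlotRegistry class with its register/unregister methods is a hypothetical name 
invented for this illustration.

```java
import java.util.TreeMap;

// Hypothetical in-memory model of nodes claiming the lowest free slot number.
class SlotRegistry {
  // slot number -> node id, kept sorted so the first gap is easy to find
  private final TreeMap<Integer, String> slots = new TreeMap<>();

  /** Claims the lowest unused slot for the node and returns that slot. */
  synchronized int register(String nodeId) {
    int slot = 0;
    for (int taken : slots.keySet()) {
      if (taken != slot) {
        break; // found a gap left by a node that went away
      }
      slot++;
    }
    slots.put(slot, nodeId);
    return slot;
  }

  /** Releases a slot, for example when its node dies. */
  synchronized void unregister(int slot) {
    slots.remove(slot);
  }

  /** Returns the node currently holding the slot, or null if it is free. */
  synchronized String owner(int slot) {
    return slots.get(slot);
  }

  public static void main(String[] args) {
    SlotRegistry registry = new SlotRegistry();
    int a = registry.register("node-a");
    int b = registry.register("node-b");
    int c = registry.register("node-c");
    registry.unregister(b);                    // node-b dies
    int d = registry.register("node-d");       // replacement node starts
    System.out.println(a + " " + b + " " + c + " " + d);
  }
}
```

Because the replacement takes over the freed slot, the surviving nodes keep 
their positions in the ordering, which is the property the comment is after.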


was (Author: sershe):
We can easily achieve unique IDs by taking ZK node name (which is unique and 
sequential). However, as the new node is re-added to the tail on every restart, 
it throws everything off.
What we want conceptually is that restarted nodes go into the same position in 
the order as the nodes they replaced, but that is difficult to achieve (or 
impossible, I am not sure we have such a concept with Slider). We can just have 
sequential numbers in ZK to take, with every registering node fighting for the 
lowest number. I wonder if there's already a primitive for that in curator ;)
We can also assume we usually restart in the same place and order by the first 
time there was LLAP on that particular node for that cluster, then by name. 

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.patch
>
>





[jira] [Comment Edited] (HIVE-14373) Add integration tests for hive on S3

2016-08-18 Thread Abdullah Yousufi (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427324#comment-15427324
 ] 

Abdullah Yousufi edited comment on HIVE-14373 at 8/18/16 10:54 PM:
---

Hey everyone, I've updated RB with this patch, which takes into account 
HIVE-1's removal of vm files. I've also added features like source and 
output table paths, as well as making the bucket path a property to pass in 
when running the test. Major thanks to [~yalovyyi] for the reference patch and 
everyone else for their feedback so far.


was (Author: ayousufi):
Hey everyone, I've updated RB with this patch, which takes into account 
HIVE-1's removal of vm files. I've also added features like source and 
output table paths, as well as making the bucket path a property to pass in 
when running the test. Major thanks to [~yalovyyi]'s for the reference patch 
and everyone else for their feedback so far.

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.patch
>
>
> With Hive doing improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be able to be executed by HiveQA because it will need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify it works
> - the xml file should not be part of the commit, and hiveqa should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure




[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-08-18 Thread Abdullah Yousufi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Yousufi updated HIVE-14373:

Attachment: HIVE-14373.03.patch

Hey everyone, I've updated RB with this patch, which takes into account 
HIVE-1's removal of vm files. I've also added features like source and 
output table paths, as well as making the bucket path a property to pass in 
when running the test. Major thanks to [~yalovyyi] for the reference patch 
and everyone else for their feedback so far.

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.patch
>
>
> With Hive doing improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be able to be executed by HiveQA because it will need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify it works
> - the xml file should not be part of the commit, and hiveqa should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure




[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427319#comment-15427319
 ] 

Sergey Shelukhin commented on HIVE-14574:
-

We can easily achieve unique IDs by taking ZK node name (which is unique and 
sequential). However, as the new node is re-added to the tail on every restart, 
it throws everything off.
What we want conceptually is that restarted nodes go into the same position in 
the order as the nodes they replaced, but that is difficult to achieve (or 
impossible, I am not sure we have such a concept with Slider). We can just have 
sequential numbers in ZK to take, with every registering node fighting for the 
lowest number. I wonder if there's already a primitive for that in curator ;)
We can also assume we usually restart in the same place and order by the first 
time there was LLAP on that particular node for that cluster, then by name. 

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.patch
>
>





[jira] [Commented] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters

2016-08-18 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427314#comment-15427314
 ] 

Ashutosh Chauhan commented on HIVE-14522:
-

+1

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure 
> for auto_join_filters
> ---
>
> Key: HIVE-14522
> URL: https://issues.apache.org/jira/browse/HIVE-14522
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch
>
>
> {code}
> CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS; 
> CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY 
> (value) INTO 2 BUCKETS; 
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2;
> SET hive.optimize.bucketmapjoin = true;
> SET hive.optimize.bucketmapjoin.sortedmerge = true;
> SET hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SET hive.outerjoin.supports.filters = false;
> {code}
> {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT 
> OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND 
> b.key > 40 AND b.value > 50 AND b.key = b.value; {code}
> {code} Expected result: 3078400 Actual result: 4937935 {code}




[jira] [Updated] (HIVE-14561) Minor ptest2 improvements

2016-08-18 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14561:
--
Status: Patch Available  (was: Open)

> Minor ptest2 improvements
> -
>
> Key: HIVE-14561
> URL: https://issues.apache.org/jira/browse/HIVE-14561
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14561.01.patch
>
>
> Re-purposed to track a few more improvements.
> - Update spring framework to work with Java8
> - Change elapseTime logging to milliseconds from seconds
> - Add thread name to log files.
> - Allow an empty logsEndPoint if outputDir is not specified
> - Log configuration when starting in a web server
> - Allow tests to be run even if no qtests property is set
> - Fix an exception on test completion when using FixedExecutionContextProvider




[jira] [Updated] (HIVE-13403) Make Streaming API not create empty buckets (at least as an option)

2016-08-18 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13403:
-
Status: Open  (was: Patch Available)

> Make Streaming API not create empty buckets (at least as an option)
> ---
>
> Key: HIVE-13403
> URL: https://issues.apache.org/jira/browse/HIVE-13403
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-13403.1.patch, HIVE-13403.2.patch, 
> HIVE-13403.3.patch
>
>
> As of HIVE-11983, when a TransactionBatch is opened in StreamingAPI, a full 
> complement of bucket files (AbstractRecordWriter.createRecordUpdaters()) is 
> created on disk even though some may end up receiving no data.
> It would be better to create them on demand and not clog the FS.
> Tez can handle missing (empty) buckets and on MR bucket join algorithms will 
> check if all buckets are there and bail out if not.  
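
Creating bucket writers on demand, as the description proposes, might look 
like the following Java sketch. It is not the Streaming API: the 
LazyBucketWriters class is hypothetical, and a StringBuilder stands in for 
whatever per-bucket file or record updater the real code would open.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: open a per-bucket writer only when a row needs it.
class LazyBucketWriters {
  private final int totalBuckets;
  // bucket id -> stand-in for an open bucket file
  private final Map<Integer, StringBuilder> writers = new HashMap<>();

  LazyBucketWriters(int totalBuckets) {
    this.totalBuckets = totalBuckets;
  }

  /** Returns the writer for a bucket, creating it on first use only. */
  StringBuilder writerFor(int bucket) {
    if (bucket < 0 || bucket >= totalBuckets) {
      throw new IllegalArgumentException("no such bucket: " + bucket);
    }
    return writers.computeIfAbsent(bucket, b -> new StringBuilder());
  }

  /** Number of buckets that were actually materialized. */
  int createdCount() {
    return writers.size();
  }

  public static void main(String[] args) {
    LazyBucketWriters batch = new LazyBucketWriters(16);
    batch.writerFor(3).append("row-1\n");
    batch.writerFor(3).append("row-2\n");
    batch.writerFor(9).append("row-3\n");
    System.out.println("buckets created: " + batch.createdCount());
  }
}
```

The eager alternative would populate all 16 entries in the constructor, which 
corresponds to the empty bucket files the issue wants to avoid.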




[jira] [Updated] (HIVE-13403) Make Streaming API not create empty buckets (at least as an option)

2016-08-18 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13403:
-
Attachment: HIVE-13403.3.patch

> Make Streaming API not create empty buckets (at least as an option)
> ---
>
> Key: HIVE-13403
> URL: https://issues.apache.org/jira/browse/HIVE-13403
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-13403.1.patch, HIVE-13403.2.patch, 
> HIVE-13403.3.patch
>
>
> As of HIVE-11983, when a TransactionBatch is opened in StreamingAPI, a full 
> complement of bucket files (AbstractRecordWriter.createRecordUpdaters()) is 
> created on disk even though some may end up receiving no data.
> It would be better to create them on demand and not clog the FS.
> Tez can handle missing (empty) buckets and on MR bucket join algorithms will 
> check if all buckets are there and bail out if not.  




[jira] [Updated] (HIVE-13403) Make Streaming API not create empty buckets (at least as an option)

2016-08-18 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13403:
-
Status: Patch Available  (was: Open)

> Make Streaming API not create empty buckets (at least as an option)
> ---
>
> Key: HIVE-13403
> URL: https://issues.apache.org/jira/browse/HIVE-13403
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-13403.1.patch, HIVE-13403.2.patch, 
> HIVE-13403.3.patch
>
>
> As of HIVE-11983, when a TransactionBatch is opened in StreamingAPI, a full 
> complement of bucket files (AbstractRecordWriter.createRecordUpdaters()) is 
> created on disk even though some may end up receiving no data.
> It would be better to create them on demand and not clog the FS.
> Tez can handle missing (empty) buckets and on MR bucket join algorithms will 
> check if all buckets are there and bail out if not.  




[jira] [Commented] (HIVE-14566) LLAP IO reads timestamp wrongly

2016-08-18 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427294#comment-15427294
 ] 

Owen O'Malley commented on HIVE-14566:
--

I really think that we need to make a context object for passing information 
down to the tree reader. Otherwise, we are going to get killed by adding 
parameters to this, especially when ORC makes it out of Hive.

How about something like:

{code}
public interface Context {
  SchemaEvolution getEvolution();
  boolean skipCorrupt();
  String writerTimezone();
}

public static TreeReader createTreeReader(TypeDescription readerType,
                                          Context context) throws IOException {
{code}

Then we can add new information without making a huge change that touches 
all of the methods.
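
To illustrate why a context object keeps signatures stable, here is a 
self-contained Java sketch. It is not ORC code: ReaderContext and its setters 
are hypothetical, SchemaEvolution is omitted so the example needs no ORC 
classes, and the factory returns a String instead of a TreeReader.

```java
// Hypothetical sketch of the context-object pattern proposed above.
class ReaderContextSketch {

  /** Read-only view handed down to the reader factory. */
  interface Context {
    boolean skipCorrupt();
    String writerTimezone();
  }

  /** Mutable holder with chained setters; new options get added here. */
  static final class ReaderContext implements Context {
    private boolean skipCorrupt = false;
    private String writerTimezone = "UTC";

    ReaderContext setSkipCorrupt(boolean value) {
      this.skipCorrupt = value;
      return this;
    }

    ReaderContext setWriterTimezone(String value) {
      this.writerTimezone = value;
      return this;
    }

    @Override
    public boolean skipCorrupt() {
      return skipCorrupt;
    }

    @Override
    public String writerTimezone() {
      return writerTimezone;
    }
  }

  /** The factory takes one context argument however many options exist. */
  static String createTreeReader(String readerType, Context context) {
    return readerType + "[tz=" + context.writerTimezone()
        + ", skipCorrupt=" + context.skipCorrupt() + "]";
  }

  public static void main(String[] args) {
    Context ctx = new ReaderContext()
        .setWriterTimezone("America/Los_Angeles")
        .setSkipCorrupt(true);
    System.out.println(createTreeReader("timestamp", ctx));
  }
}
```

Adding another piece of information then means one new accessor on Context 
and one setter on the holder, while every createTreeReader call site stays 
unchanged.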

> LLAP IO reads timestamp wrongly
> ---
>
> Key: HIVE-14566
> URL: https://issues.apache.org/jira/browse/HIVE-14566
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0, 2.0.1, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-14566.1.patch, HIVE-14566.2.patch
>
>
> HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap.
> It reads timestamp wrongly.
> {code:title=LLAP IO Enabled}
> hive> select atimestamp1 from alltypesorc3xcols limit 10;
> OK
> 1969-12-31 15:59:46.674
> NULL
> 1969-12-31 15:59:55.787
> 1969-12-31 15:59:44.187
> 1969-12-31 15:59:50.434
> 1969-12-31 16:00:15.007
> 1969-12-31 16:00:07.021
> 1969-12-31 16:00:04.963
> 1969-12-31 15:59:52.176
> 1969-12-31 15:59:44.569
> {code}
> {code:title=LLAP IO Disabled}
> hive> select atimestamp1 from alltypesorc3xcols limit 10;
> OK
> 1969-12-31 15:59:46.674
> NULL
> 1969-12-31 15:59:55.787
> 1969-12-31 15:59:44.187
> 1969-12-31 15:59:50.434
> 1969-12-31 16:00:14.007
> 1969-12-31 16:00:06.021
> 1969-12-31 16:00:03.963
> 1969-12-31 15:59:52.176
> 1969-12-31 15:59:44.569
> {code}




[jira] [Commented] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters

2016-08-18 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427288#comment-15427288
 ] 

Vineet Garg commented on HIVE-14522:


Created: https://reviews.apache.org/r/51226/

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure 
> for auto_join_filters
> ---
>
> Key: HIVE-14522
> URL: https://issues.apache.org/jira/browse/HIVE-14522
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch
>
>
> {code}
> CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS; 
> CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY 
> (value) INTO 2 BUCKETS; 
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2;
> SET hive.optimize.bucketmapjoin = true;
> SET hive.optimize.bucketmapjoin.sortedmerge = true;
> SET hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SET hive.outerjoin.supports.filters = false;
> {code}
> {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT 
> OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND 
> b.key > 40 AND b.value > 50 AND b.key = b.value; {code}
> {code} Expected result: 3078400 Actual result: 4937935 {code}




[jira] [Commented] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters

2016-08-18 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427277#comment-15427277
 ] 

Ashutosh Chauhan commented on HIVE-14522:
-

Can you create a RB for this?

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure 
> for auto_join_filters
> ---
>
> Key: HIVE-14522
> URL: https://issues.apache.org/jira/browse/HIVE-14522
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch
>
>
> {code}
> CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS; 
> CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY 
> (value) INTO 2 BUCKETS; 
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2;
> SET hive.optimize.bucketmapjoin = true;
> SET hive.optimize.bucketmapjoin.sortedmerge = true;
> SET hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SET hive.outerjoin.supports.filters = false;
> {code}
> {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT 
> OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND 
> b.key > 40 AND b.value > 50 AND b.key = b.value; {code}
> {code} Expected result: 3078400 Actual result: 4937935 {code}




[jira] [Commented] (HIVE-14503) Remove explicit order by in qfiles for union tests

2016-08-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427270#comment-15427270
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14503:
--

Looks good; conditional +1 based on a clean run. Addressing the comment I 
added in the RB would also be nice to have.


> Remove explicit order by in qfiles for union tests
> --
>
> Key: HIVE-14503
> URL: https://issues.apache.org/jira/browse/HIVE-14503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, 
> HIVE-14503.3.patch, HIVE-14503.4.patch
>
>
> Identify qfiles with explicit order by and replace them with 
> SORT_QUERY_RESULTS




[jira] [Commented] (HIVE-14564) Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException.

2016-08-18 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427266#comment-15427266
 ] 

Ashutosh Chauhan commented on HIVE-14564:
-

[~zxu] Thanks for the patch. Can you add a testcase to demonstrate the problem 
you are facing here?

> Column Pruning generates out of order columns in SelectOperator which cause 
> ArrayIndexOutOfBoundsException.
> ---
>
> Key: HIVE-14564
> URL: https://issues.apache.org/jira/browse/HIVE-14564
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: HIVE-14564.000.patch
>
>
> Column Pruning generates out of order columns in SelectOperator which cause 
> ArrayIndexOutOfBoundsException.
> {code}
> 2016-07-26 21:49:24,390 FATAL [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
>   ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at java.lang.System.arraycopy(Native Method)
>   at org.apache.hadoop.io.Text.set(Text.java:225)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:550)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:377)
>   ... 13 more
> {code}
> The exception occurs because serialization and deserialization don't match.
> The previous MapReduce job serialized rows with LazyBinarySerDe using a 
> different column order. When the current MapReduce job deserializes the 
> intermediate sequence file generated by the previous job, LazyBinaryStruct 
> uses the wrong column order and returns corrupted data. The column mismatch 
> between serialization and deserialization is caused by SelectOperator's 
> column pruning in {{ColumnPrunerSelectProc}}.
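
The failure mode described above, where a reader assumes a different column 
order than the writer used, can be reproduced in miniature with plain Java 
streams. This sketch does not use LazyBinarySerDe; ColumnOrderMismatch and its 
methods are hypothetical names, and the (name, id) row layout is invented for 
the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical miniature of a writer/reader column-order mismatch.
class ColumnOrderMismatch {

  /** Serializes one row as (name: string, id: int), in that order. */
  static byte[] write(String name, int id) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeUTF(name);
      out.writeInt(id);
    }
    return buffer.toByteArray();
  }

  /** Reads the columns in the order they were written. */
  static String readSameOrder(byte[] row) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(row));
    String name = in.readUTF();
    int id = in.readInt();
    return name + ":" + id;
  }

  /** Reads (id, name) instead, as a reader with a stale column order would. */
  static String readSwappedOrder(byte[] row) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(row));
    int id = in.readInt();      // actually the string's length prefix and first bytes
    String name = in.readUTF(); // treats string bytes as a length, then overruns
    return name + ":" + id;
  }

  public static void main(String[] args) throws IOException {
    byte[] row = write("alice", 42);
    System.out.println("same order:    " + readSameOrder(row));
    try {
      System.out.println("swapped order: " + readSwappedOrder(row));
    } catch (IOException e) {
      System.out.println("swapped order failed: " + e);
    }
  }
}
```

The swapped reader interprets string bytes as a length and runs off the end of 
the buffer, which is the same kind of out-of-bounds access that the stack 
trace shows inside LazyBinaryString.init.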




[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427231#comment-15427231
 ] 

Gopal V commented on HIVE-14574:


> In terms of the 3 byte difference in ORC 

That's the file MAGIC of "ORC" ... 3 byte.

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.patch
>
>





[jira] [Updated] (HIVE-14503) Remove explicit order by in qfiles for union tests

2016-08-18 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14503:
-
Attachment: HIVE-14503.4.patch

Reuploading the correct patch.

> Remove explicit order by in qfiles for union tests
> --
>
> Key: HIVE-14503
> URL: https://issues.apache.org/jira/browse/HIVE-14503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, 
> HIVE-14503.3.patch, HIVE-14503.4.patch
>
>
> Identify qfiles with explicit order by and replace them with 
> SORT_QUERY_RESULTS
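
As a rough illustration of why sorted output can stand in for an explicit order by in a test, the sketch below compares two result sets after sorting their lines. It is not the Hive test driver's code; the class and method names are hypothetical, and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the Hive test driver's implementation.
class SortedResultComparisonSketch {

    // Compares two result sets line by line after sorting both,
    // so the row order produced by the query no longer matters.
    static boolean sameRowsIgnoringOrder(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("1\tval_1", "2\tval_2", "3\tval_3");
        List<String> actual = Arrays.asList("3\tval_3", "1\tval_1", "2\tval_2");
        System.out.println(sameRowsIgnoringOrder(expected, actual));
    }
}
```

The trade-off is that the query plan no longer contains the sort stage, so a test that relies on this no longer exercises order by itself.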




[jira] [Assigned] (HIVE-14577) Sync up configs for MiniTez and MiniLlap

2016-08-18 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-14577:


Assignee: Prasanth Jayachandran

> Sync up configs for MiniTez and MiniLlap
> 
>
> Key: HIVE-14577
> URL: https://issues.apache.org/jira/browse/HIVE-14577
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Some configs, like hive.explain.user, differ between MiniTez and MiniLlap. 
> There could be others as well. We should sync up the configs that could 
> affect the plan between tez and llap so that it will be easier to compare the 
> test output files.




[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-18 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427246#comment-15427246
 ] 

Prasanth Jayachandran commented on HIVE-14502:
--

Created HIVE-14577


> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch
>
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver takes around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly it makes sense to move the mini tez 
> tests over to mini llap tests.  




[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427244#comment-15427244
 ] 

Siddharth Seth commented on HIVE-14574:
---

Another thing to consider here is that knownLocations is a list sorted by name 
(random uuid i think). A new node could show up anywhere in this array. The 
name generation would need to be fixed as well if doing something like this. 
That could be as simple as a counter in ZK - but I think a fix is required for 
something like this to actually work.
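
The concern about a name-sorted list can be made concrete with a small sketch. It assumes, as the comment above suggests, that a split is mapped to an index into the sorted list. The class name, method name, and node names below are hypothetical and not taken from the LLAP code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of index shifts in a name-sorted node list.
class SortedLocationShiftSketch {

    // Counts how many existing nodes end up at a different index
    // once newNode joins the name-sorted list.
    static int countShiftedNodes(List<String> sortedNodes, String newNode) {
        List<String> after = new ArrayList<>(sortedNodes);
        after.add(newNode);
        Collections.sort(after);
        int shifted = 0;
        for (int i = 0; i < sortedNodes.size(); i++) {
            if (after.indexOf(sortedNodes.get(i)) != i) {
                shifted++;
            }
        }
        return shifted;
    }

    public static void main(String[] args) {
        // Random-looking names: the new node sorts into the middle.
        List<String> randomNames = Arrays.asList("3f2a", "7c1d", "b84e", "e09a");
        System.out.println(countShiftedNodes(randomNames, "5d77"));

        // Counter-style, zero-padded names: the new node sorts last.
        List<String> counterNames =
                Arrays.asList("node-0001", "node-0002", "node-0003", "node-0004");
        System.out.println(countShiftedNodes(counterNames, "node-0005"));
    }
}
```

With random names most existing indexes can move when one node joins, while monotonically increasing names keep every existing index stable. Note that counter-based names only sort in creation order if they are zero-padded or compared numerically.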

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.patch
>
>





[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-18 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427239#comment-15427239
 ] 

Prasanth Jayachandran commented on HIVE-14502:
--

Sure. Will create one. It was easier for me to verify the test results by 
explicitly setting explain user to true for these tests. 

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch
>
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver takes around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly it makes sense to move the mini tez 
> tests over to mini llap tests.  




[jira] [Commented] (HIVE-14562) CBO (Calcite Return Path) Wrong results for limit + offset

2016-08-18 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427234#comment-15427234
 ] 

Ashutosh Chauhan commented on HIVE-14562:
-

[~jcamachorodriguez] Can you please review this?

> CBO (Calcite Return Path) Wrong results for limit + offset
> --
>
> Key: HIVE-14562
> URL: https://issues.apache.org/jira/browse/HIVE-14562
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Affects Versions: 2.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-14562.patch
>
>
> offset is missed altogether.




[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427233#comment-15427233
 ] 

Prasanth Jayachandran commented on HIVE-14574:
--

The 3 byte difference will still be there based on what split strategy is 
chose. If a big file is chosen by ETL split strategy the first split will start 
from 3 offset. If chosen by BI split strategy the first split will start from 
0. My fix was related to inconsistently choosing strategies based on AM cache 
being on or off. 

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.patch
>
>





[jira] [Comment Edited] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427233#comment-15427233
 ] 

Prasanth Jayachandran edited comment on HIVE-14574 at 8/18/16 9:55 PM:
---

The 3 byte difference will still be there based on what split strategy is 
chosen. If a big file is chosen by ETL split strategy the first split will 
start from 3 offset. If chosen by BI split strategy the first split will start 
from 0. My fix was related to inconsistently choosing strategies based on AM 
cache being on or off. 


was (Author: prasanth_j):
The 3 byte difference will still be there based on what split strategy is 
chose. If a big file is chosen by ETL split strategy the first split will start 
from 3 offset. If chosen by BI split strategy the first split will start from 
0. My fix was related to inconsistently choosing strategies based on AM cache 
being on or off. 

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.patch
>
>





[jira] [Commented] (HIVE-14563) StatsOptimizer treats NULL in a wrong way

2016-08-18 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427232#comment-15427232
 ] 

Ashutosh Chauhan commented on HIVE-14563:
-

Can you create a RB for it?

> StatsOptimizer treats NULL in a wrong way
> -
>
> Key: HIVE-14563
> URL: https://issues.apache.org/jira/browse/HIVE-14563
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14563.01.patch
>
>
> {code}
> POSTHOOK: query: explain select count(key) from (select null as key from 
> src)src
> POSTHOOK: type: QUERY
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: 1
>   Processor Tree:
> ListSink
> PREHOOK: query: select count(key) from (select null as key from src)src
> PREHOOK: type: QUERY
> PREHOOK: Input: default@src
> #### A masked pattern was here ####
> POSTHOOK: query: select count(key) from (select null as key from src)src
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@src
> #### A masked pattern was here ####
> 500
> {code}
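
For reference, under standard SQL semantics count(expr) counts only the rows where expr is not NULL, so the query above should return 0 rather than 500. The sketch below models that difference in plain Java. It is not StatsOptimizer code; the names are hypothetical, and the 500-row column simply mirrors the row count shown in the output above.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical model of COUNT(*) versus COUNT(column) semantics.
class NullCountSketch {

    // COUNT(*): every row counts, whatever the column holds.
    static long countStar(List<?> column) {
        return column.size();
    }

    // COUNT(column): rows where the column is NULL are skipped.
    static long countColumn(List<?> column) {
        return column.stream().filter(Objects::nonNull).count();
    }

    public static void main(String[] args) {
        // "select null as key from src" over 500 rows.
        List<String> key = Collections.nCopies(500, null);
        System.out.println("count(*)   = " + countStar(key));
        System.out.println("count(key) = " + countColumn(key));
    }
}
```

A shortcut that answers count(key) from the table's row count is only safe when the column is known to contain no NULLs.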




[jira] [Updated] (HIVE-14503) Remove explicit order by in qfiles for union tests

2016-08-18 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14503:
-
Attachment: (was: HIVE-14503.4.patch)

> Remove explicit order by in qfiles for union tests
> --
>
> Key: HIVE-14503
> URL: https://issues.apache.org/jira/browse/HIVE-14503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, 
> HIVE-14503.3.patch, HIVE-14503.4.patch
>
>
> Identify qfiles with explicit order by and replace them with 
> SORT_QUERY_RESULTS




[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-18 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427223#comment-15427223
 ] 

Siddharth Seth commented on HIVE-14502:
---

Should we change that to be consistent in a follow-up jira, to avoid unnecessary 
sets in the q files?

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch
>
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver takes around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly it makes sense to move the mini tez 
> tests over to mini llap tests.  




[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427219#comment-15427219
 ] 

Siddharth Seth commented on HIVE-14574:
---

bq. block boundaries and stripe boundaries
I'm not sure why this comment was even added in there. In terms of the 3 byte 
difference in ORC - [~prasanth_j] may have fixed that already.

Will Hashing.consistentHash generate the same values across different JVMs? I 
think I had considered this earlier, and finally went with murmur since that 
does generate the same value across different JVMs / machines.

Needs new unit tests to validate behaviour. Don't think there's any reason for 
the existing ones to break.
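
On the cross-JVM question, a consistent hash that relies only on fixed-width integer and IEEE double arithmetic does not depend on object identity hashes or on per-process state. The sketch below is the jump consistent hash algorithm by Lamping and Veach, swapped in here purely for illustration. It is not taken from the HIVE-14574 patch, and no claim is made that it matches the output of Hashing.consistentHash. The class and method names are hypothetical.

```java
// Hypothetical sketch: jump consistent hash (Lamping and Veach).
class JumpConsistentHashSketch {

    // Maps a 64-bit key to a bucket in [0, buckets). Requires buckets >= 1.
    static int bucketFor(long key, int buckets) {
        long b = -1;
        long j = 0;
        while (j < buckets) {
            b = j;
            // Linear congruential step; pure 64-bit integer arithmetic.
            key = key * 2862933555777941757L + 1;
            // Next jump target; always at least b + 1.
            j = (long) ((b + 1) * ((double) (1L << 31) / (double) ((key >>> 33) + 1)));
        }
        return (int) b;
    }

    public static void main(String[] args) {
        int moved = 0;
        int total = 10000;
        for (long key = 0; key < total; key++) {
            if (bucketFor(key, 10) != bucketFor(key, 11)) {
                moved++;
            }
        }
        // Keys that move always land on the newly added bucket.
        System.out.println(moved + " of " + total + " keys moved when growing 10 -> 11");
    }
}
```

When the bucket count grows from n to n + 1, a key either keeps its bucket or moves to the new bucket n, which is the property that limits the impact of cluster changes. It only helps if existing nodes keep their positions when a node is added.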

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.patch
>
>





[jira] [Commented] (HIVE-14503) Remove explicit order by in qfiles for union tests

2016-08-18 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427216#comment-15427216
 ] 

Prasanth Jayachandran commented on HIVE-14503:
--

I left union23.q to have union and order by in the .4 patch. 

> Remove explicit order by in qfiles for union tests
> --
>
> Key: HIVE-14503
> URL: https://issues.apache.org/jira/browse/HIVE-14503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, 
> HIVE-14503.3.patch, HIVE-14503.4.patch
>
>
> Identify qfiles with explicit order by and replace them with 
> SORT_QUERY_RESULTS




[jira] [Updated] (HIVE-14503) Remove explicit order by in qfiles for union tests

2016-08-18 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14503:
-
Attachment: HIVE-14503.4.patch

> Remove explicit order by in qfiles for union tests
> --
>
> Key: HIVE-14503
> URL: https://issues.apache.org/jira/browse/HIVE-14503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, 
> HIVE-14503.3.patch, HIVE-14503.4.patch
>
>
> Identify qfiles with explicit order by and replace them with 
> SORT_QUERY_RESULTS




[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427204#comment-15427204
 ] 

Sergey Shelukhin commented on HIVE-14574:
-

Hrrm. That sounds like some voodoo magic. Sure...

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.patch
>
>





[jira] [Commented] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters

2016-08-18 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427208#comment-15427208
 ] 

Hive QA commented on HIVE-14522:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12824421/HIVE-14522.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10442 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testSelectThriftSerializeInTasks
org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/930/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/930/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-930/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12824421 - PreCommit-HIVE-MASTER-Build

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure 
> for auto_join_filters
> ---
>
> Key: HIVE-14522
> URL: https://issues.apache.org/jira/browse/HIVE-14522
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch
>
>
> {code}
> CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS; 
> CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY 
> (value) INTO 2 BUCKETS; 
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2;
> SET hive.optimize.bucketmapjoin = true;
> SET hive.optimize.bucketmapjoin.sortedmerge = true;
> SET hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SET hive.outerjoin.supports.filters = false;
> {code}
> {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT 
> OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND 
> b.key > 40 AND b.value > 50 AND b.key = b.value; {code}
> {code} Expected result: 3078400 Actual result: 4937935 {code}




[jira] [Updated] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-18 Thread Vaibhav Gumashta (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14576:

Summary: Testing: Fixes to TestHBaseMinimrCliDriver  (was: Fixes to 
TestHBaseMinimrCliDriver)

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.




[jira] [Updated] (HIVE-14512) Testing: Evaluate and fix overheads in executing a single q test

2016-08-18 Thread Vaibhav Gumashta (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14512:

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-13503

> Testing: Evaluate and fix overheads in executing a single q test 
> -
>
> Key: HIVE-14512
> URL: https://issues.apache.org/jira/browse/HIVE-14512
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>





[jira] [Updated] (HIVE-14566) LLAP IO reads timestamp wrongly

2016-08-18 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14566:
-
Attachment: HIVE-14566.2.patch

Addressed [~sershe]'s review comments.

> LLAP IO reads timestamp wrongly
> ---
>
> Key: HIVE-14566
> URL: https://issues.apache.org/jira/browse/HIVE-14566
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0, 2.0.1, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-14566.1.patch, HIVE-14566.2.patch
>
>
> HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap.
> It reads timestamp wrongly.
> {code:title=LLAP IO Enabled}
> hive> select atimestamp1 from alltypesorc3xcols limit 10;
> OK
> 1969-12-31 15:59:46.674
> NULL
> 1969-12-31 15:59:55.787
> 1969-12-31 15:59:44.187
> 1969-12-31 15:59:50.434
> 1969-12-31 16:00:15.007
> 1969-12-31 16:00:07.021
> 1969-12-31 16:00:04.963
> 1969-12-31 15:59:52.176
> 1969-12-31 15:59:44.569
> {code}
> {code:title=LLAP IO Disabled}
> hive> select atimestamp1 from alltypesorc3xcols limit 10;
> OK
> 1969-12-31 15:59:46.674
> NULL
> 1969-12-31 15:59:55.787
> 1969-12-31 15:59:44.187
> 1969-12-31 15:59:50.434
> 1969-12-31 16:00:14.007
> 1969-12-31 16:00:06.021
> 1969-12-31 16:00:03.963
> 1969-12-31 15:59:52.176
> 1969-12-31 15:59:44.569
> {code}




[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-18 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427198#comment-15427198
 ] 

Prasanth Jayachandran commented on HIVE-14502:
--

User explain is set to true for MiniTez and false for MiniLlap in hive-site. 
Changing it to true for MiniLlap in hive-site will affect more tests, so I 
explicitly set it to true in the qfiles. 

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch
>
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver takes around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly it makes sense to move the mini tez 
> tests over to mini llap tests.  




[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427192#comment-15427192
 ] 

Gopal V commented on HIVE-14574:


[~sershe]: adding to my build for today - minor comment

{code}
block boundaries and stripe boundaries
{code}

The start offset can vary by 3-5 bytes depending on this too - round that down 
to a multiple of 8?
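
A minimal sketch of the suggested rounding, assuming non-negative byte offsets. The class and method names are hypothetical and not from the patch.

```java
// Hypothetical sketch: normalize a split start offset before hashing it.
class SplitOffsetRoundingSketch {

    // Clears the low three bits, i.e. rounds down to a multiple of 8.
    static long roundDownToMultipleOf8(long offset) {
        return offset & ~7L;
    }

    public static void main(String[] args) {
        long[] offsets = {0L, 3L, 5L, 8L, 11L};
        for (long offset : offsets) {
            System.out.println(offset + " -> " + roundDownToMultipleOf8(offset));
        }
    }
}
```

With this normalization, start offsets of 0, 3 and 5 all hash as 0, so a difference of a few bytes in where the first split begins would no longer send the same file to a different node.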

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-18 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427191#comment-15427191
 ] 

Siddharth Seth commented on HIVE-14502:
---

Mostly looks good. Not sure why "+set hive.explain.user=true;" has been 
added to some of the q files.

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch
>
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver takes around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly it makes sense to move the mini tez 
> tests over to mini llap tests.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14574:

Attachment: HIVE-14574.patch

[~gopalv] [~sseth] can you take a look?

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.patch
>
>





[jira] [Updated] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14574:

Status: Patch Available  (was: Open)

> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14572) Investigate jenkins test report timings

2016-08-18 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427158#comment-15427158
 ] 

Zoltan Haindrich commented on HIVE-14572:
-

I think that after ptest is done...the TEST*xml-s are parsed into a report by 
some jenkins plugin...and that plugin fails to correctly aggregate the results.

I guess that surefire-report plugin doesn't even run - but I might be wrong ;)

> Investigate jenkins test report timings
> ---
>
> Key: HIVE-14572
> URL: https://issues.apache.org/jira/browse/HIVE-14572
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>
> [~sseth] has noticed some odd timings in the jenkins reports.
> I've created a sample project to emulate a clidriver run during qtest.
> The testclass:
> * 1 sec beforeclass
> * 3x 0.2s test
> created using junit4 parameterized.
> Double checkout; the second project runs different tests...or at least they have 
> different names.
> Here are my preliminary findings:
> || thing || expected || 2.16 || 2.19.1 ||
> | total time | ~3.4s | 1.2s | 3.4s |
> | package time | ~3.4s | 0.61s | 1.7s |
> | class time | ~3.4s | 0.61s | 1.7s |
> | testcase times | ~.2s | ~.2s | ~.2s |
> notes:
> * using 2.16, beforeclass timings are totally hidden or lost
> * 2.19.1 does account for beforeclass but still fails to correctly aggregate 
> the two runs of the similarly named testclasses
> It might be worth a try to look at the bleeding edge of this jenkins plugin...
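
The figures in the table can be roughly reproduced with simple arithmetic. The sketch below uses only the durations stated in the description (1 s beforeclass, three 0.2 s tests, two runs); the class and method names are hypothetical. The gap between the computed 1.6 s per run and the reported 1.7 s is presumably harness overhead, which is an assumption on my part.

```java
// Hypothetical sketch of the timing arithmetic behind the table.
class TestTimingArithmeticSketch {

    static final int BEFORE_CLASS_MS = 1000; // 1 sec beforeclass
    static final int TEST_MS = 200;          // each test takes 0.2 s
    static final int TESTS_PER_CLASS = 3;    // 3x test
    static final int RUNS = 2;               // double checkout

    // Time for one run of the test class, with or without beforeclass.
    static int classTimeMs(boolean includeBeforeClass) {
        int tests = TESTS_PER_CLASS * TEST_MS;
        return includeBeforeClass ? BEFORE_CLASS_MS + tests : tests;
    }

    // Time summed over both runs.
    static int totalTimeMs(boolean includeBeforeClass) {
        return RUNS * classTimeMs(includeBeforeClass);
    }

    public static void main(String[] args) {
        System.out.println("with beforeclass:    class=" + classTimeMs(true)
                + "ms total=" + totalTimeMs(true) + "ms");
        System.out.println("without beforeclass: class=" + classTimeMs(false)
                + "ms total=" + totalTimeMs(false) + "ms");
    }
}
```

Dropping beforeclass gives 0.6 s per class and 1.2 s in total, which lines up with the 2.16 column. Including it gives 1.6 s per class and 3.2 s in total, close to the 2.19.1 column, where the class and package rows still show a single run's value instead of the sum over both runs.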




[jira] [Commented] (HIVE-14165) Remove Hive file listing during split computation

2016-08-18 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427148#comment-15427148
 ] 

Steve Loughran commented on HIVE-14165:
---

the faster list status is only applicable on a recursive listing; if you are 
listing one directory, it's just the same time as before

> Remove Hive file listing during split computation
> -
>
> Key: HIVE-14165
> URL: https://issues.apache.org/jira/browse/HIVE-14165
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14165.patch
>
>
> The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's 
> FileInputFormat.java will list the files during split computation anyway to 
> determine their size. One way to remove this is to catch the 
> InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the 
> Hive side instead of doing the file listing beforehand.
> For S3 select queries on partitioned tables, this results in a 2x speedup.
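
The idea of dropping the up-front listing and handling the failure instead can be sketched with a local-filesystem analogy. This uses java.nio.file only and is not the Hadoop or Hive API: the class and method names are hypothetical, and NoSuchFileException merely stands in for the exception named in the description.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical local-filesystem analogy; not FetchOperator or FileInputFormat.
class SingleListingSketch {

    // Lists the directory exactly once. There is no earlier existence check
    // or pre-listing; a missing input is handled where the listing happens.
    static List<Path> listOnce(Path dir) {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                files.add(p);
            }
        } catch (NoSuchFileException e) {
            // Missing input: treat it as having no files.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(listOnce(Paths.get("no-such-input-dir")).size());
    }
}
```

On an object store each listing is a remote call, so replacing two listings per input path with one is where a saving of this kind would come from.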




[jira] [Commented] (HIVE-14461) Investigate HBaseMinimrCliDriver tests

2016-08-18 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427151#comment-15427151
 ] 

Zoltan Haindrich commented on HIVE-14461:
-

the git history has led me to:
{code}
commit 8018e513e484b8d4ef29beb1f263c0a32df8bc33
Author: Brock Noland 
Date:   Thu Oct 31 18:27:31 2013 +

HIVE-5610 - Merge maven branch into trunk (patch)
{code}
It looks like this was the point when pom.xml was born...so missing this 
small needle in the haystack was clearly an unintentional change ;)

I think these tests should be re-enabled...and that lone qfile's extension should be 
corrected

> Investigate HBaseMinimrCliDriver tests
> --
>
> Key: HIVE-14461
> URL: https://issues.apache.org/jira/browse/HIVE-14461
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>
> during HIVE-1 I've encountered an odd thing:
> HBaseMinimrCliDriver only executes a single test...and that test is set using 
> the qfile selector...which looks out of place.
> The only test it executes doesn't follow regular qtest file naming...and has 
> an extension 'm'
> At least the file should be renamed, but I think the change wasn't 
> intentional




[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-08-18 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427131#comment-15427131
 ] 

Zoltan Haindrich commented on HIVE-14373:
-

That's great [~ayousufi], I will try to help any way I can!

My patch has uncovered a few issues and also left a few questions unanswered - 
I will be working on those during next week...but I will try not to create any 
more glitches before this gets in.

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.02.patch, HIVE-14373.patch
>
>
> With Hive doing improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be able to be executed by HiveQA because they will need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify it works
> - the xml file should not be part of the commit, and hiveqa should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure




[jira] [Commented] (HIVE-14503) Remove explicit order by in qfiles for union tests

2016-08-18 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427130#comment-15427130
 ] 

Siddharth Seth commented on HIVE-14503:
---

Mostly looks good to me.

As an example - ql/src/test/queries/clientpositive/union_script.q - This would 
result in a plan change. Is that significant for the tests? For a lot of the .q 
files the order is removed after a previous insert into ... UNION ... - so I 
don't think it matters there. Maybe we should leave at least one query in place 
with a union and order by.

Triggered another run on jenkins.

> Remove explicit order by in qfiles for union tests
> --
>
> Key: HIVE-14503
> URL: https://issues.apache.org/jira/browse/HIVE-14503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, 
> HIVE-14503.3.patch
>
>
> Identify qfiles with explicit order by and replace them with 
> SORT_QUERY_RESULTS




[jira] [Commented] (HIVE-14566) LLAP IO reads timestamp wrongly

2016-08-18 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427129#comment-15427129
 ] 

Prasanth Jayachandran commented on HIVE-14566:
--

The issue is actually not a 1 second difference. That just happened to be the 
case in the test case (the data/files/alltypesorc3xcols file was written with a 
different timezone). The actual issue is that the llap reader was not making 
timezone adjustments when reading timestamp columns, causing a difference in 
results. The non-llap reader used to make the timezone adjustments at the start 
of a stripe. This was missing for llap: 
https://github.com/apache/hive/blob/master/orc/src/java/org/apache/orc/impl/TreeReaderFactory.java#L870

Each stripe in orc maintains the timezone that was used by the writer. The 
reader reads the timestamp values using the reader's timezone and, knowing the 
writer's timezone information from the stripe footer, makes offset adjustments 
to read timestamps correctly. 
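
The offset adjustment described above can be sketched roughly as follows. This is only an illustration, not the actual ORC or LLAP reader code: the class and method names (TimestampZoneAdjustSketch, adjustToReaderZone) are hypothetical, and it assumes the stored value is epoch milliseconds written under the writer's timezone, with the goal of showing the same wall-clock time in the reader's timezone.

```java
import java.util.TimeZone;

public class TimestampZoneAdjustSketch {
    // Hypothetical helper (not an ORC API): shift a value written under
    // writerZone so that it shows the same wall-clock time in readerZone.
    public static long adjustToReaderZone(long writerMillis,
                                          TimeZone writerZone,
                                          TimeZone readerZone) {
        // Offsets from UTC, in milliseconds, that each zone applies at this instant.
        long writerOffset = writerZone.getOffset(writerMillis);
        long readerOffset = readerZone.getOffset(writerMillis);
        // Without this correction the raw value would be rendered using only
        // the reader's offset, which shifts the displayed time.
        return writerMillis + (writerOffset - readerOffset);
    }

    public static void main(String[] args) {
        // Assumed example zones; a real reader would take the writer zone
        // from the stripe footer.
        TimeZone writer = TimeZone.getTimeZone("America/Los_Angeles");
        TimeZone reader = TimeZone.getTimeZone("UTC");
        System.out.println(adjustToReaderZone(0L, writer, reader));
    }
}
```

When both zones are the same the adjustment is zero, so the value passes through unchanged. The sketch evaluates both offsets at the writer's instant, which is an approximation close to daylight-saving transitions.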

> LLAP IO reads timestamp wrongly
> ---
>
> Key: HIVE-14566
> URL: https://issues.apache.org/jira/browse/HIVE-14566
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0, 2.0.1, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-14566.1.patch
>
>
> HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap.
> It reads timestamp wrongly.
> {code:title=LLAP IO Enabled}
> hive> select atimestamp1 from alltypesorc3xcols limit 10;
> OK
> 1969-12-31 15:59:46.674
> NULL
> 1969-12-31 15:59:55.787
> 1969-12-31 15:59:44.187
> 1969-12-31 15:59:50.434
> 1969-12-31 16:00:15.007
> 1969-12-31 16:00:07.021
> 1969-12-31 16:00:04.963
> 1969-12-31 15:59:52.176
> 1969-12-31 15:59:44.569
> {code}
> {code:title=LLAP IO Disabled}
> hive> select atimestamp1 from alltypesorc3xcols limit 10;
> OK
> 1969-12-31 15:59:46.674
> NULL
> 1969-12-31 15:59:55.787
> 1969-12-31 15:59:44.187
> 1969-12-31 15:59:50.434
> 1969-12-31 16:00:14.007
> 1969-12-31 16:00:06.021
> 1969-12-31 16:00:03.963
> 1969-12-31 15:59:52.176
> 1969-12-31 15:59:44.569
> {code}




[jira] [Updated] (HIVE-14559) Remove setting hive.execution.engine in qfiles

2016-08-18 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14559:
-
Status: Patch Available  (was: Open)

> Remove setting hive.execution.engine in qfiles
> --
>
> Key: HIVE-14559
> URL: https://issues.apache.org/jira/browse/HIVE-14559
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14559.1.patch
>
>
> Some qfiles explicitly set the execution engine. If we run those tests on 
> different Mini CliDrivers, they could be very slow. 




[jira] [Commented] (HIVE-14554) Hive ptest should delete the itests/thirdparty directory everytime it builds hive

2016-08-18 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427114#comment-15427114
 ] 

Sergey Shelukhin commented on HIVE-14554:
-

I thought this JIRA was only for pre-commit tests :)

> Hive ptest should delete the itests/thirdparty directory everytime it builds 
> hive
> -
>
> Key: HIVE-14554
> URL: https://issues.apache.org/jira/browse/HIVE-14554
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>
> The {{itests/thirdparty}} directory is created by hive on spark when 
> downloading the spark-assembly file. Hive ptest should delete this directory 
> every time it runs a new set of tests to avoid conflicts when a new spark 
> tarball is submitted.




[jira] [Commented] (HIVE-14572) Investigate jenkins test report timings

2016-08-18 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427093#comment-15427093
 ] 

Siddharth Seth commented on HIVE-14572:
---

I see the following in ptest2. Not sure if this is what generates the report.
{code}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-report-plugin</artifactId>
  <version>2.15</version>
</plugin>
{code}

> Investigate jenkins test report timings
> ---
>
> Key: HIVE-14572
> URL: https://issues.apache.org/jira/browse/HIVE-14572
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>
> [~sseth] has noticed some odd timings in the jenkins reports
> I've created a sample project, to emulate a clidriver run during qtest:
> the testclass:
> * 1 sec beforeclass
> * 3x 0.2s test
> created using junit4 parameterized.
> Double checkout; the second project runs different tests...or at least they 
> have different names.
> here are my preliminary findings:
> || thing || expected || 2.16 || 2.19.1 ||
> | total time | ~3.4s | 1.2s | 3.4s |
> | package time | ~3.4s | 0.61s | 1.7s |
> | class time | ~3.4s | 0.61s | 1.7s |
> | testcase times | ~.2s | ~.2s | ~.2s |
> notes:
> * using 2.16, beforeclass timings are totally hidden or lost
> * 2.19.1 does account for beforeclass but still fails to correctly aggregate 
> the two runs of the similarly named testclasses
> it might be worth a try to look at the bleeding edge of this jenkins plugin...




[jira] [Assigned] (HIVE-14561) Minor ptest2 improvements

2016-08-18 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned HIVE-14561:
-

Assignee: Siddharth Seth

> Minor ptest2 improvements
> -
>
> Key: HIVE-14561
> URL: https://issues.apache.org/jira/browse/HIVE-14561
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14561.01.patch
>
>
> Re-purposed to track a few more improvements.
> - Update spring framework to work with Java8
> - Change elapseTime logging to milliseconds from seconds
> - Add thread name to log files.
> - Allow an empty logsEndPoint if outputDir is not specified
> - Log configuration when starting in a web server
> - Allow tests to be run even if no qtests property is set
> - Fix an exception on test completion when using FixedExecutionContextProvider




[jira] [Updated] (HIVE-14561) Minor ptest2 improvements

2016-08-18 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14561:
--
Attachment: HIVE-14561.01.patch

Patch to address the changes. Most of the changes are tested. Still verifying 
some of the last-minute changes, like thread names and times.

[~vgumashta], [~spena] - could you please take a look. There's no point running 
the precommit since it does not test anything here.
Unit tests pass locally.

I'm going to open follow-up jiras to allow the pre-setup and batch-exec 
files to be configurable.

> Minor ptest2 improvements
> -
>
> Key: HIVE-14561
> URL: https://issues.apache.org/jira/browse/HIVE-14561
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
> Attachments: HIVE-14561.01.patch
>
>
> Re-purposed to track a few more improvements.
> - Update spring framework to work with Java8
> - Change elapseTime logging to milliseconds from seconds
> - Add thread name to log files.
> - Allow an empty logsEndPoint if outputDir is not specified
> - Log configuration when starting in a web server
> - Allow tests to be run even if no qtests property is set
> - Fix an exception on test completion when using FixedExecutionContextProvider



