[jira] [Commented] (HIVE-12077) MSCK Repair table should fix partitions in batches
[ https://issues.apache.org/jira/browse/HIVE-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427659#comment-15427659 ] Lefty Leverenz commented on HIVE-12077: --- Doc note: HIVE-14571 tracks documenting the new configuration parameter *hive.msck.repair.batch.size*. > MSCK Repair table should fix partitions in batches > --- > > Key: HIVE-12077 > URL: https://issues.apache.org/jira/browse/HIVE-12077 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Ryan P >Assignee: Chinna Rao Lalam > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-12077.1.patch, HIVE-12077.2.patch, > HIVE-12077.3.patch, HIVE-12077.4.patch, HIVE-12077.5.patch > > > If a user attempts to run MSCK REPAIR TABLE on a directory with a large > number of untracked partitions HMS will OOME. I suspect this is because it > attempts to do one large bulk load in an effort to save time. Ultimately this > can lead to a collection so large in size that HMS eventually hits an Out of > Memory Exception. > Instead I suggest that Hive include a configurable batch size that HMS can > use to break up the load. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12077) MSCK Repair table should fix partitions in batches
[ https://issues.apache.org/jira/browse/HIVE-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-12077: -- Labels: TODOC2.2 (was: ) > MSCK Repair table should fix partitions in batches > --- > > Key: HIVE-12077 > URL: https://issues.apache.org/jira/browse/HIVE-12077 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Ryan P >Assignee: Chinna Rao Lalam > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-12077.1.patch, HIVE-12077.2.patch, > HIVE-12077.3.patch, HIVE-12077.4.patch, HIVE-12077.5.patch > > > If a user attempts to run MSCK REPAIR TABLE on a directory with a large > number of untracked partitions HMS will OOME. I suspect this is because it > attempts to do one large bulk load in an effort to save time. Ultimately this > can lead to a collection so large in size that HMS eventually hits an Out of > Memory Exception. > Instead I suggest that Hive include a configurable batch size that HMS can > use to break up the load. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14571) Document configuration hive.msck.repair.batch.size
[ https://issues.apache.org/jira/browse/HIVE-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427653#comment-15427653 ] Lefty Leverenz commented on HIVE-14571: --- Review of text in the Description: bq. ... execute all the partitions at one short. The correct phrase is "at one shot" but since that's metaphoric, perhaps "at once" would be better. > Document configuration hive.msck.repair.batch.size > -- > > Key: HIVE-14571 > URL: https://issues.apache.org/jira/browse/HIVE-14571 > Project: Hive > Issue Type: Improvement > Components: Documentation >Reporter: Chinna Rao Lalam >Assignee: Chinna Rao Lalam >Priority: Minor > Labels: TODOC2.2 > Fix For: 2.2.0 > > > Update here > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)] > {quote} > When there is a large number of untracked partitions for the MSCK REPAIR > TABLE command, there is a provision to run the msck repair table batch wise > to avoid OOME. By giving the configured batch size for the property > *hive.msck.repair.batch.size* it can run in the batches internally. The > default value of the property is zero, it means it will execute all the > partitions at one short. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14571) Document configuration hive.msck.repair.batch.size
[ https://issues.apache.org/jira/browse/HIVE-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427651#comment-15427651 ] Lefty Leverenz commented on HIVE-14571: --- *hive.msck.repair.batch.size* also needs to be documented in the Configuration Properties wikidoc here: * [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution] Added a TODOC2.2 label. > Document configuration hive.msck.repair.batch.size > -- > > Key: HIVE-14571 > URL: https://issues.apache.org/jira/browse/HIVE-14571 > Project: Hive > Issue Type: Improvement > Components: Documentation >Reporter: Chinna Rao Lalam >Assignee: Chinna Rao Lalam >Priority: Minor > Labels: TODOC2.2 > Fix For: 2.2.0 > > > Update here > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)] > {quote} > When there is a large number of untracked partitions for the MSCK REPAIR > TABLE command, there is a provision to run the msck repair table batch wise > to avoid OOME. By giving the configured batch size for the property > *hive.msck.repair.batch.size* it can run in the batches internally. The > default value of the property is zero, it means it will execute all the > partitions at one short. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14571) Document configuration hive.msck.repair.batch.size
[ https://issues.apache.org/jira/browse/HIVE-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-14571: -- Labels: TODOC2.2 (was: ) > Document configuration hive.msck.repair.batch.size > -- > > Key: HIVE-14571 > URL: https://issues.apache.org/jira/browse/HIVE-14571 > Project: Hive > Issue Type: Improvement > Components: Documentation >Reporter: Chinna Rao Lalam >Assignee: Chinna Rao Lalam >Priority: Minor > Labels: TODOC2.2 > Fix For: 2.2.0 > > > Update here > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)] > {quote} > When there is a large number of untracked partitions for the MSCK REPAIR > TABLE command, there is a provision to run the msck repair table batch wise > to avoid OOME. By giving the configured batch size for the property > *hive.msck.repair.batch.size* it can run in the batches internally. The > default value of the property is zero, it means it will execute all the > partitions at one short. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
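The batching behaviour described for *hive.msck.repair.batch.size* can be sketched roughly as follows. This is only an illustration and not the code from the HIVE-12077 patches: the class name MsckBatchSketch and the method toBatches are hypothetical, and treating negative values like zero is an assumption of the sketch. The description only states that the default of zero processes all partitions at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not code from the HIVE-12077 patches.
class MsckBatchSketch {

    // Splits the untracked partition names into batches of at most batchSize.
    // A batchSize of 0 (the documented default) keeps everything in one batch;
    // handling negative values the same way is an assumption of this sketch.
    static List<List<String>> toBatches(List<String> partitions, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        if (partitions.isEmpty()) {
            return batches;
        }
        if (batchSize <= 0) {
            batches.add(new ArrayList<>(partitions));
            return batches;
        }
        for (int start = 0; start < partitions.size(); start += batchSize) {
            int end = Math.min(start + batchSize, partitions.size());
            batches.add(new ArrayList<>(partitions.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("dt=1", "dt=2", "dt=3", "dt=4", "dt=5");
        System.out.println(toBatches(parts, 2)); // three batches
        System.out.println(toBatches(parts, 0)); // one batch with everything
    }
}
```

Keeping each batch small bounds the size of the collection handed over in a single call, which is the memory concern raised in HIVE-12077.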
[jira] [Commented] (HIVE-14565) CBO (Calcite Return Path) Handle field access for nested column
[ https://issues.apache.org/jira/browse/HIVE-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427645#comment-15427645 ] Vineet Garg commented on HIVE-14565: Looks good to me. I was wondering whether we handle arrays, but it looks like array access is handled and converted to an index(array<>, idx) function call. Not sure about map yet. > CBO (Calcite Return Path) Handle field access for nested column > --- > > Key: HIVE-14565 > URL: https://issues.apache.org/jira/browse/HIVE-14565 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer >Affects Versions: 2.1.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-14565.1.patch, HIVE-14565.patch > > > ExprNodeConverter doesn't handle field access currently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14566) LLAP IO reads timestamp wrongly
[ https://issues.apache.org/jira/browse/HIVE-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427635#comment-15427635 ] Hive QA commented on HIVE-14566: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12824472/HIVE-14566.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10443 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1] org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs org.apache.hive.jdbc.TestJdbcWithMiniHS2.testSelectThriftSerializeInTasks org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/934/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/934/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-934/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12824472 - PreCommit-HIVE-MASTER-Build > LLAP IO reads timestamp wrongly > --- > > Key: HIVE-14566 > URL: https://issues.apache.org/jira/browse/HIVE-14566 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0, 2.0.1, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-14566.1.patch, HIVE-14566.2.patch, > HIVE-14566.3.patch > > > HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap. > It reads timestamp wrongly. > {code:title=LLAP IO Enabled} > hive> select atimestamp1 from alltypesorc3xcols limit 10; > OK > 1969-12-31 15:59:46.674 > NULL > 1969-12-31 15:59:55.787 > 1969-12-31 15:59:44.187 > 1969-12-31 15:59:50.434 > 1969-12-31 16:00:15.007 > 1969-12-31 16:00:07.021 > 1969-12-31 16:00:04.963 > 1969-12-31 15:59:52.176 > 1969-12-31 15:59:44.569 > {code} > {code:title=LLAP IO Disabled} > hive> select atimestamp1 from alltypesorc3xcols limit 10; > OK > 1969-12-31 15:59:46.674 > NULL > 1969-12-31 15:59:55.787 > 1969-12-31 15:59:44.187 > 1969-12-31 15:59:50.434 > 1969-12-31 16:00:14.007 > 1969-12-31 16:00:06.021 > 1969-12-31 16:00:03.963 > 1969-12-31 15:59:52.176 > 1969-12-31 15:59:44.569 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters
[ https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427633#comment-15427633 ] Lefty Leverenz commented on HIVE-14522: --- Doc note: This removes *hive.outerjoin.supports.filters* from HiveConf.java in release 2.2.0. It hasn't been documented in the wiki yet, but ought to be. (Created in 0.7.0 by HIVE-1534.) * [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution] Added a TODOC2.2 label. > CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure > for auto_join_filters > --- > > Key: HIVE-14522 > URL: https://issues.apache.org/jira/browse/HIVE-14522 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Vineet Garg >Assignee: Vineet Garg > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch > > > {code} > CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY > (key) INTO 2 BUCKETS; > CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY > (value) INTO 2 BUCKETS; > LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1; > LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1; > LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2; > LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2; > SET hive.optimize.bucketmapjoin = true; > SET hive.optimize.bucketmapjoin.sortedmerge = true; > SET hive.input.format = > org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; > SET hive.outerjoin.supports.filters = false; > {code} > {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT > OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND > b.key > 40 AND b.value > 50 AND b.key = b.value; {code} > {code} Expected result: 3078400 Actual result: 4937935 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters
[ https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-14522: -- Labels: TODOC2.2 (was: ) > CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure > for auto_join_filters > --- > > Key: HIVE-14522 > URL: https://issues.apache.org/jira/browse/HIVE-14522 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Vineet Garg >Assignee: Vineet Garg > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch > > > {code} > CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY > (key) INTO 2 BUCKETS; > CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY > (value) INTO 2 BUCKETS; > LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1; > LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1; > LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2; > LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2; > SET hive.optimize.bucketmapjoin = true; > SET hive.optimize.bucketmapjoin.sortedmerge = true; > SET hive.input.format = > org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; > SET hive.outerjoin.supports.filters = false; > {code} > {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT > OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND > b.key > 40 AND b.value > 50 AND b.key = b.value; {code} > {code} Expected result: 3078400 Actual result: 4937935 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13874) Tighten up EOF checking in Fast DeserializeRead classes; display better exception information; add new Unit Tests
[ https://issues.apache.org/jira/browse/HIVE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13874: Attachment: HIVE-13874.04.patch > Tighten up EOF checking in Fast DeserializeRead classes; display better > exception information; add new Unit Tests > - > > Key: HIVE-13874 > URL: https://issues.apache.org/jira/browse/HIVE-13874 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13874.01.patch, HIVE-13874.02.patch, > HIVE-13874.03.patch, HIVE-13874.04.patch > > > Tighten up EOF bounds checking in LazyBinaryDeserializeRead so bytes beyond > stated row end are never read. Use WritableUtils.decodeVIntSize to check for > room ahead like regular LazyBinary code does. > Display more detailed information when an exception is thrown by > DeserializeRead classes. > Add Unit Tests, including some designed to catch errors like HIVE-13818. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13874) Tighten up EOF checking in Fast DeserializeRead classes; display better exception information; add new Unit Tests
[ https://issues.apache.org/jira/browse/HIVE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13874: Status: Patch Available (was: In Progress) > Tighten up EOF checking in Fast DeserializeRead classes; display better > exception information; add new Unit Tests > - > > Key: HIVE-13874 > URL: https://issues.apache.org/jira/browse/HIVE-13874 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13874.01.patch, HIVE-13874.02.patch, > HIVE-13874.03.patch, HIVE-13874.04.patch > > > Tighten up EOF bounds checking in LazyBinaryDeserializeRead so bytes beyond > stated row end are never read. Use WritableUtils.decodeVIntSize to check for > room ahead like regular LazyBinary code does. > Display more detailed information when an exception is thrown by > DeserializeRead classes. > Add Unit Tests, including some designed to catch errors like HIVE-13818. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
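The "check for room ahead" idea in the HIVE-13874 description can be sketched as below. This is a hedged illustration, not the patch itself: VIntBoundsSketch and hasRoomForVInt are hypothetical names, and decodeVIntSize is re-implemented locally with the same arithmetic as Hadoop's WritableUtils.decodeVIntSize(byte) so that the example has no Hadoop dependency. The sketch also assumes that rowEnd is an exclusive offset no larger than the buffer length.

```java
// Hypothetical illustration only; not code from the HIVE-13874 patches.
class VIntBoundsSketch {

    // Same arithmetic as Hadoop's WritableUtils.decodeVIntSize(byte):
    // returns the total number of bytes (marker byte included) of the VInt
    // whose first byte is firstByte.
    static int decodeVIntSize(byte firstByte) {
        if (firstByte >= -112) {
            return 1;
        } else if (firstByte < -120) {
            return -119 - firstByte;
        }
        return -111 - firstByte;
    }

    // True when a complete VInt starting at offset ends at or before rowEnd
    // (exclusive), so that no byte beyond the stated row end would be read.
    static boolean hasRoomForVInt(byte[] bytes, int offset, int rowEnd) {
        if (offset >= rowEnd) {
            return false;
        }
        return offset + decodeVIntSize(bytes[offset]) <= rowEnd;
    }

    public static void main(String[] args) {
        byte[] row = new byte[] {-113, 1, 0};
        System.out.println(hasRoomForVInt(row, 0, 1)); // VInt needs 2 bytes, row ends after 1
        System.out.println(hasRoomForVInt(row, 0, 2)); // VInt fits exactly
    }
}
```

A reader would call such a check before decoding and raise a descriptive exception when it returns false, instead of reading past the row boundary.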
[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427615#comment-15427615 ] Prasanth Jayachandran commented on HIVE-14502: -- orc_merge12.q will be fixed in HIVE-14566. The remaining 3 issues are not really critical. Will fix them in a follow-up. > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch, > HIVE-14502.3.patch, HIVE-14502.4.patch > > > Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver takes around 3hr 15mins to run around 400 tests. To > cut this test time significantly, it makes sense to move the mini tez > tests over to mini llap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427614#comment-15427614 ] Prasanth Jayachandran commented on HIVE-14502: -- The vector_join_part_col_char.q test output for llap is similar to mr. SHOW PARTITIONS for partitions with a char type shows padded spaces in mr and llap, but tez trims the spaces. I will investigate further why tez shows partitions without the padded spaces. > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch, > HIVE-14502.3.patch, HIVE-14502.4.patch > > > Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver takes around 3hr 15mins to run around 400 tests. To > cut this test time significantly, it makes sense to move the mini tez > tests over to mini llap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14502: - Attachment: HIVE-14502.4.patch > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch, > HIVE-14502.3.patch, HIVE-14502.4.patch > > > Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver takes around 3hr 15mins to run around 400 tests. To > cut this test time significantly, it makes sense to move the mini tez > tests over to mini llap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427543#comment-15427543 ] Hive QA commented on HIVE-14574: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12824479/HIVE-14574.01.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10443 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1] org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs org.apache.hive.jdbc.TestJdbcWithMiniHS2.testSelectThriftSerializeInTasks org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/933/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/933/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-933/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12824479 - PreCommit-HIVE-MASTER-Build > use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.01.patch, HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14583) Error: Error while compiling statement: FAILED: SemanticException Generate Map Join Task Error: Encountered unregistered class ID: 95
[ https://issues.apache.org/jira/browse/HIVE-14583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427540#comment-15427540 ] huangxiangang commented on HIVE-14583: -- There are some other logs: 2016-08-18 15:13:59,712 | WARN | HiveServer2-Handler-Pool: Thread-42427169 | Runtime exception be caught while deserializing object using kryo:Unable to find class: com.huawei.udaf.GroupConcatSimpleUDAF$GroupConcatSimpleUDAFEvaluator Serialization trace: udafEvaluator (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator) genericUDAFWritableEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc) aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc) conf (org.apache.hadoop.hive.ql.exec.GroupByOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.GroupByOperator) opParseCtxMap (org.apache.hadoop.hive.ql.plan.MapWork) mapWork (org.apache.hadoop.hive.ql.plan.MapredWork) | org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:948) 2016-08-18 15:13:59,713 | ERROR | HiveServer2-Handler-Pool: Thread-42427169 | FAILED: SemanticException Generate Map Join Task Error: Index: 93, Size: 0 org.apache.hadoop.hive.ql.parse.SemanticException: Generate Map Join Task Error: Index: 93, Size: 0 at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:513) at org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:182) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79) at 
org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:104) at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:290) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:217) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9309) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:331) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:384) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:280) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1035) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1018) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:103) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:208) at org.apache.hive.service.cli.operation.Operation.run(Operation.java:256) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:311) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:298) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79) at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37) at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493) at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60) at com.sun.proxy.$Proxy20.executeStatementAsync(Unknown Source) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:236) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at
[jira] [Commented] (HIVE-14583) Error: Error while compiling statement: FAILED: SemanticException Generate Map Join Task Error: Encountered unregistered class ID: 95
[ https://issues.apache.org/jira/browse/HIVE-14583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427535#comment-15427535 ] huangxiangang commented on HIVE-14583: -- The error is not reproducible: when I try to run the query again, it succeeds. I also cannot find the keyword "kryo" in the log, so I think it may be some other issue. > Error: Error while compiling statement: FAILED: SemanticException Generate > Map Join Task Error: Encountered unregistered class ID: 95 > - > > Key: HIVE-14583 > URL: https://issues.apache.org/jira/browse/HIVE-14583 > Project: Hive > Issue Type: Bug > Components: Beeline, CLI, Clients >Affects Versions: 0.13.0 >Reporter: huangxiangang >Assignee: huangxiangang > > Sometimes, when I run a Hive join query, I get the following error: > Error: Error while compiling statement: FAILED: SemanticException Generate > Map Join Task Error: Encountered unregistered class ID: 95 > But when I try to run it several times, it may succeed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14427) CompactionTxnHandler.markCleaned() can delete aborted txns
[ https://issues.apache.org/jira/browse/HIVE-14427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427526#comment-15427526 ] Barna Zsombor Klara commented on HIVE-14427: Hi [~ekoifman] are you currently working on this, or could I take a look at it? Thanks, Zsombor > CompactionTxnHandler.markCleaned() can delete aborted txns > -- > > Key: HIVE-14427 > URL: https://issues.apache.org/jira/browse/HIVE-14427 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Eugene Koifman > > We can modify > {noformat} > s = "select distinct txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid > and txn_state = '" + > TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and > tc_table = '" + > info.tableName + "'" + (info.highestTxnId == 0 ? "" : " and txn_id > <= " + info.highestTxnId); > {noformat} > to use select txn_id, count(*) ... group by txn_id so that we know the number > of components in a TXN. > Then when running "delete from TXN_COMPONENTS where..." we know how many rows > were deleted. > If the sum of all values from 1st query matched total number of rows deleted, > we know that all Aborted txns in this set are empty and thus can be deleted > here. > This means we clean up aborted txns from TXNS table quicker and avoid a large > join in _cleanEmptyAbortedTxns()_. Also, doing delete on TXNS here will have > PKs in WHERE clause so it should be cheap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
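The bookkeeping proposed in the HIVE-14427 description can be sketched as follows. This is an illustration under stated assumptions, not CompactionTxnHandler code: the class AbortedTxnCleanupSketch and both method names are hypothetical, the map stands in for the result of the proposed "select txn_id, count(*) ... group by txn_id" query, and rowsDeleted stands in for the row count reported by the delete on TXN_COMPONENTS.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the proposed check; not Hive code.
class AbortedTxnCleanupSketch {

    // componentCounts maps txn_id to the count(*) returned by the grouped query.
    // Returns true when the delete removed every component row of these aborted
    // txns, meaning the txns are now empty and could be deleted from TXNS here.
    static boolean allAbortedTxnsEmpty(Map<Long, Integer> componentCounts, int rowsDeleted) {
        long expected = 0;
        for (int count : componentCounts.values()) {
            expected += count;
        }
        return expected == rowsDeleted;
    }

    // Builds the follow-up delete keyed on txn_id. Callers should skip this
    // when componentCounts is empty, since an empty IN list is not valid SQL.
    static String buildDeleteFromTxns(Map<Long, Integer> componentCounts) {
        StringBuilder sb = new StringBuilder("delete from TXNS where txn_id in (");
        boolean first = true;
        for (Long txnId : componentCounts.keySet()) {
            if (!first) {
                sb.append(", ");
            }
            sb.append(txnId);
            first = false;
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        Map<Long, Integer> counts = new LinkedHashMap<>();
        counts.put(7L, 2);
        counts.put(9L, 1);
        if (allAbortedTxnsEmpty(counts, 3)) {
            System.out.println(buildDeleteFromTxns(counts));
        }
    }
}
```

When the counts do not match, the sketch leaves the TXNS rows alone, so the existing _cleanEmptyAbortedTxns()_ path would still handle them later.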
[jira] [Updated] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14502: - Attachment: HIVE-14502.3.patch 4 qfiles that are yet to be moved to llap must be in minitez.shared. That was causing the previous failures. > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch, > HIVE-14502.3.patch > > > Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver takes around 3hr 15mins to run around 400 tests. To > cut this test time significantly, it makes sense to move the mini tez > tests over to mini llap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14583) Error: Error while compiling statement: FAILED: SemanticException Generate Map Join Task Error: Encountered unregistered class ID: 95
[ https://issues.apache.org/jira/browse/HIVE-14583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427506#comment-15427506 ] Prasanth Jayachandran commented on HIVE-14583: -- Can you please provide a small reproducible test case? There were several fixes that went in after 0.13.0 related to kryo. We might no longer have this issue in recent releases. > Error: Error while compiling statement: FAILED: SemanticException Generate > Map Join Task Error: Encountered unregistered class ID: 95 > - > > Key: HIVE-14583 > URL: https://issues.apache.org/jira/browse/HIVE-14583 > Project: Hive > Issue Type: Bug > Components: Beeline, CLI, Clients >Affects Versions: 0.13.0 >Reporter: huangxiangang >Assignee: huangxiangang > > Sometimes, when I run a Hive join query, I get the following error: > Error: Error while compiling statement: FAILED: SemanticException Generate > Map Join Task Error: Encountered unregistered class ID: 95 > But when I try to run it several times, it may succeed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14583) Error: Error while compiling statement: FAILED: SemanticException Generate Map Join Task Error: Encountered unregistered class ID: 95
[ https://issues.apache.org/jira/browse/HIVE-14583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huangxiangang updated HIVE-14583: - Component/s: Clients CLI > Error: Error while compiling statement: FAILED: SemanticException Generate > Map Join Task Error: Encountered unregistered class ID: 95 > - > > Key: HIVE-14583 > URL: https://issues.apache.org/jira/browse/HIVE-14583 > Project: Hive > Issue Type: Bug > Components: Beeline, CLI, Clients >Affects Versions: 0.13.0 >Reporter: huangxiangang >Assignee: huangxiangang > > Sometimes, when I run a Hive join query, I get the following error: > Error: Error while compiling statement: FAILED: SemanticException Generate > Map Join Task Error: Encountered unregistered class ID: 95 > But when I run it several more times, it may succeed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14560) Support exchange partition between s3 and hdfs tables
[ https://issues.apache.org/jira/browse/HIVE-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdullah Yousufi updated HIVE-14560: Attachment: (was: HIVE-14560.02.patch) > Support exchange partition between s3 and hdfs tables > - > > Key: HIVE-14560 > URL: https://issues.apache.org/jira/browse/HIVE-14560 > Project: Hive > Issue Type: Bug >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi > Fix For: 2.2.0 > > Attachments: HIVE-14560.02.patch, HIVE-14560.patch > > > {code} > alter table s3_tbl exchange partition (country='USA', state='CA') with table > hdfs_tbl; > {code} > results in: > {code} > Error: Error while processing statement: FAILED: Execution Error, return code > 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got > exception: java.lang.IllegalArgumentException Wrong FS: > s3a://hive-on-s3/s3_tbl/country=USA/state=CA, expected: > hdfs://localhost:9000) (state=08S01,code=1) > {code} > because the check for whether the s3 destination table path exists occurs on > the hdfs filesystem. > Furthermore, exchanging between s3 to hdfs fails because the hdfs rename > operation is not supported across filesystems. Fix uses copy + deletion in > the case that the file systems differ. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14560) Support exchange partition between s3 and hdfs tables
[ https://issues.apache.org/jira/browse/HIVE-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdullah Yousufi updated HIVE-14560: Attachment: HIVE-14560.02.patch > Support exchange partition between s3 and hdfs tables > - > > Key: HIVE-14560 > URL: https://issues.apache.org/jira/browse/HIVE-14560 > Project: Hive > Issue Type: Bug >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi > Fix For: 2.2.0 > > Attachments: HIVE-14560.02.patch, HIVE-14560.patch > > > {code} > alter table s3_tbl exchange partition (country='USA', state='CA') with table > hdfs_tbl; > {code} > results in: > {code} > Error: Error while processing statement: FAILED: Execution Error, return code > 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got > exception: java.lang.IllegalArgumentException Wrong FS: > s3a://hive-on-s3/s3_tbl/country=USA/state=CA, expected: > hdfs://localhost:9000) (state=08S01,code=1) > {code} > because the check for whether the s3 destination table path exists occurs on > the hdfs filesystem. > Furthermore, exchanging between s3 to hdfs fails because the hdfs rename > operation is not supported across filesystems. Fix uses copy + deletion in > the case that the file systems differ. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14559) Remove setting hive.execution.engine in qfiles
[ https://issues.apache.org/jira/browse/HIVE-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14559: - Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Thanks for the review! Committed to master. > Remove setting hive.execution.engine in qfiles > -- > > Key: HIVE-14559 > URL: https://issues.apache.org/jira/browse/HIVE-14559 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 2.2.0 > > Attachments: HIVE-14559.1.patch > > > Some qfiles are explicitly setting execution engine. If we run those tests on > different Mini CliDriver's it could be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14559) Remove setting hive.execution.engine in qfiles
[ https://issues.apache.org/jira/browse/HIVE-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427463#comment-15427463 ] Prasanth Jayachandran commented on HIVE-14559: -- Test failures are not related to this patch. > Remove setting hive.execution.engine in qfiles > -- > > Key: HIVE-14559 > URL: https://issues.apache.org/jira/browse/HIVE-14559 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14559.1.patch > > > Some qfiles are explicitly setting execution engine. If we run those tests on > different Mini CliDriver's it could be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14554) Hive ptest should delete the itests/thirdparty directory everytime it builds hive
[ https://issues.apache.org/jira/browse/HIVE-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427464#comment-15427464 ] Rui Li commented on HIVE-14554: --- The checksum way sounds good to me. > Hive ptest should delete the itests/thirdparty directory everytime it builds > hive > - > > Key: HIVE-14554 > URL: https://issues.apache.org/jira/browse/HIVE-14554 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > > The {{itests/thirdparty}} directory is created by hive on spark when > downloading the spark-assembly file. Hive ptest should delete this directory > every time it runs a new set of tests to avoid conflicts when a new spark > tarball is submitted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14560) Support exchange partition between s3 and hdfs tables
[ https://issues.apache.org/jira/browse/HIVE-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdullah Yousufi updated HIVE-14560: Attachment: HIVE-14560.02.patch Removed whitespace lines modified by editor + change FileUtils.copy() to copy() > Support exchange partition between s3 and hdfs tables > - > > Key: HIVE-14560 > URL: https://issues.apache.org/jira/browse/HIVE-14560 > Project: Hive > Issue Type: Bug >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi > Fix For: 2.2.0 > > Attachments: HIVE-14560.02.patch, HIVE-14560.patch > > > {code} > alter table s3_tbl exchange partition (country='USA', state='CA') with table > hdfs_tbl; > {code} > results in: > {code} > Error: Error while processing statement: FAILED: Execution Error, return code > 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got > exception: java.lang.IllegalArgumentException Wrong FS: > s3a://hive-on-s3/s3_tbl/country=USA/state=CA, expected: > hdfs://localhost:9000) (state=08S01,code=1) > {code} > because the check for whether the s3 destination table path exists occurs on > the hdfs filesystem. > Furthermore, exchanging between s3 to hdfs fails because the hdfs rename > operation is not supported across filesystems. Fix uses copy + deletion in > the case that the file systems differ. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
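The copy + deletion fallback described in HIVE-14560 can be illustrated with a small sketch. This is not the patch itself: the helper names `same_filesystem` and `move_partition` are hypothetical, local directories stand in for the s3a and hdfs locations, and the only idea taken from the issue is "rename when both paths are on one filesystem, otherwise copy and then delete the source".

```python
import os
import shutil
from urllib.parse import urlparse


def same_filesystem(src_uri, dst_uri):
    """Treat two URIs as one filesystem when scheme and authority both match."""
    src, dst = urlparse(src_uri), urlparse(dst_uri)
    return (src.scheme, src.netloc) == (dst.scheme, dst.netloc)


def move_partition(src_dir, dst_dir, src_uri, dst_uri):
    """Move a partition directory; local dirs stand in for the real stores.

    src_uri and dst_uri are only used to decide which strategy applies.
    Returns the name of the strategy that was used.
    """
    if same_filesystem(src_uri, dst_uri):
        # A rename is cheap, but only valid within a single filesystem.
        os.rename(src_dir, dst_dir)
        return "rename"
    # Across filesystems, fall back to copying and then deleting the source.
    shutil.copytree(src_dir, dst_dir)
    shutil.rmtree(src_dir)
    return "copy+delete"
```

With the URIs from the error message above, `same_filesystem` returns False because `s3a://hive-on-s3` and `hdfs://localhost:9000` differ in both scheme and authority, so the copy path is taken.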
[jira] [Commented] (HIVE-14559) Remove setting hive.execution.engine in qfiles
[ https://issues.apache.org/jira/browse/HIVE-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427454#comment-15427454 ] Hive QA commented on HIVE-14559: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12824213/HIVE-14559.1.patch {color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10442 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1] org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs org.apache.hive.jdbc.TestJdbcWithMiniHS2.testSelectThriftSerializeInTasks org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/932/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/932/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-932/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12824213 - PreCommit-HIVE-MASTER-Build > Remove setting hive.execution.engine in qfiles > -- > > Key: HIVE-14559 > URL: https://issues.apache.org/jira/browse/HIVE-14559 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14559.1.patch > > > Some qfiles are explicitly setting execution engine. If we run those tests on > different Mini CliDriver's it could be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14580) Introduce || operator
[ https://issues.apache.org/jira/browse/HIVE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427453#comment-15427453 ] Ashutosh Chauhan commented on HIVE-14580: - https://docs.oracle.com/cd/B19306_01/server.102/b14200/operators003.htm is an example. > Introduce || operator > - > > Key: HIVE-14580 > URL: https://issues.apache.org/jira/browse/HIVE-14580 > Project: Hive > Issue Type: Sub-task > Components: SQL >Reporter: Ashutosh Chauhan > > Functionally equivalent to concat() udf. But standard allows usage of || for > string concatenations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14574: Attachment: HIVE-14574.01.patch Added test and naming... this is the most over-engineered test ever. The number trends toward more reasonable values with more splits, strangely enough :) I think we can make a separate jira for the additional machine accounting described above. > use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.01.patch, HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
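For readers unfamiliar with the technique named in HIVE-14574, here is a minimal, generic sketch of consistent hashing. It is an assumption-laden illustration, not the code in the attached patch: the `HashRing` class, the choice of MD5, and the 100 virtual points per node are all made up for the example. The property it shows is the one the issue title relies on: when a node leaves, only the keys that node owned get reassigned.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual points per node."""

    def __init__(self, nodes, replicas=100):
        # Each node contributes `replicas` points, which evens out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Building one ring with four nodes and another with one of them removed, then comparing `node_for` over the same set of split names, shows that every split whose owner changed was previously on the removed node.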
[jira] [Commented] (HIVE-14580) Introduce || operator
[ https://issues.apache.org/jira/browse/HIVE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427447#comment-15427447 ] Pengcheng Xiong commented on HIVE-14580: [~ashutoshc]. how do we distinguish this from mathematic "||"? Thanks. > Introduce || operator > - > > Key: HIVE-14580 > URL: https://issues.apache.org/jira/browse/HIVE-14580 > Project: Hive > Issue Type: Sub-task > Components: SQL >Reporter: Ashutosh Chauhan > > Functionally equivalent to concat() udf. But standard allows usage of || for > string concatenations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14579) Add extract udf
[ https://issues.apache.org/jira/browse/HIVE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427430#comment-15427430 ] Ashutosh Chauhan commented on HIVE-14579: - Part of standard sql and heavily used in timeseries based datasets. > Add extract udf > --- > > Key: HIVE-14579 > URL: https://issues.apache.org/jira/browse/HIVE-14579 > Project: Hive > Issue Type: Sub-task > Components: UDF >Reporter: Ashutosh Chauhan > > https://www.postgresql.org/docs/9.1/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14570) Create table with column names ROW__ID, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE sucess but query fails
[ https://issues.apache.org/jira/browse/HIVE-14570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427424#comment-15427424 ] Niklaus Xiao commented on HIVE-14570: - Unrelated tests failure. > Create table with column names ROW__ID, INPUT__FILE__NAME, > BLOCK__OFFSET__INSIDE__FILE sucess but query fails > - > > Key: HIVE-14570 > URL: https://issues.apache.org/jira/browse/HIVE-14570 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.3.0, 2.2.0 >Reporter: Niklaus Xiao >Assignee: Niklaus Xiao > Fix For: 2.2.0 > > Attachments: HIVE-14570.patch > > > {code} > 0: jdbc:hive2://189.39.151.74:21066/> create table foo1(ROW__ID string); > No rows affected (0.281 seconds) > 0: jdbc:hive2://189.39.151.74:21066/> create table > foo2(BLOCK__OFFSET__INSIDE__FILE string); > No rows affected (0.323 seconds) > 0: jdbc:hive2://189.39.151.74:21066/> create table foo3(INPUT__FILE__NAME > string); > No rows affected (0.307 seconds) > 0: jdbc:hive2://189.39.151.74:21066/> select * from foo1; > Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 > Invalid column reference 'TOK_ALLCOLREF' (state=42000,code=4) > 0: jdbc:hive2://189.39.151.74:21066/> select * from foo2; > Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 > Invalid column reference 'TOK_ALLCOLREF' (state=42000,code=4) > 0: jdbc:hive2://189.39.151.74:21066/> select * from foo3; > Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 > Invalid column reference 'TOK_ALLCOLREF' (state=42000,code=4) > {code} > We should prevent user from creating table with column names the same as > Virtual Column names -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13874) Tighten up EOF checking in Fast DeserializeRead classes; display better exception information; add new Unit Tests
[ https://issues.apache.org/jira/browse/HIVE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427420#comment-15427420 ] Sergey Shelukhin commented on HIVE-13874: - couple comments on RB > Tighten up EOF checking in Fast DeserializeRead classes; display better > exception information; add new Unit Tests > - > > Key: HIVE-13874 > URL: https://issues.apache.org/jira/browse/HIVE-13874 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13874.01.patch, HIVE-13874.02.patch, > HIVE-13874.03.patch > > > Tighten up EOF bounds checking in LazyBinaryDeserializeRead so bytes beyond > stated row end are never read. Use WritableUtils.decodeVIntSize to check for > room ahead like regular LazyBinary code does. > Display more detailed information when an exception is thrown by > DeserializeRead classes. > Add Unit Tests, including some designed that catch the errors like HIVE-13818. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14565) CBO (Calcite Return Path) Handle field access for nested column
[ https://issues.apache.org/jira/browse/HIVE-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-14565: Status: Open (was: Patch Available) > CBO (Calcite Return Path) Handle field access for nested column > --- > > Key: HIVE-14565 > URL: https://issues.apache.org/jira/browse/HIVE-14565 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer >Affects Versions: 2.1.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-14565.1.patch, HIVE-14565.patch > > > ExprNodeConverter doesn't handle field access currently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14565) CBO (Calcite Return Path) Handle field access for nested column
[ https://issues.apache.org/jira/browse/HIVE-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-14565: Status: Patch Available (was: Open) > CBO (Calcite Return Path) Handle field access for nested column > --- > > Key: HIVE-14565 > URL: https://issues.apache.org/jira/browse/HIVE-14565 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer >Affects Versions: 2.1.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-14565.1.patch, HIVE-14565.patch > > > ExprNodeConverter doesn't handle field access currently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14565) CBO (Calcite Return Path) Handle field access for nested column
[ https://issues.apache.org/jira/browse/HIVE-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-14565: Attachment: HIVE-14565.1.patch [~jcamachorodriguez] Can you please review this? > CBO (Calcite Return Path) Handle field access for nested column > --- > > Key: HIVE-14565 > URL: https://issues.apache.org/jira/browse/HIVE-14565 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer >Affects Versions: 2.1.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-14565.1.patch, HIVE-14565.patch > > > ExprNodeConverter doesn't handle field access currently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14566) LLAP IO reads timestamp wrongly
[ https://issues.apache.org/jira/browse/HIVE-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427415#comment-15427415 ] Sergey Shelukhin commented on HIVE-14566: - +1 pending tests > LLAP IO reads timestamp wrongly > --- > > Key: HIVE-14566 > URL: https://issues.apache.org/jira/browse/HIVE-14566 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0, 2.0.1, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-14566.1.patch, HIVE-14566.2.patch, > HIVE-14566.3.patch > > > HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap. > It reads timestamp wrongly. > {code:title=LLAP IO Enabled} > hive> select atimestamp1 from alltypesorc3xcols limit 10; > OK > 1969-12-31 15:59:46.674 > NULL > 1969-12-31 15:59:55.787 > 1969-12-31 15:59:44.187 > 1969-12-31 15:59:50.434 > 1969-12-31 16:00:15.007 > 1969-12-31 16:00:07.021 > 1969-12-31 16:00:04.963 > 1969-12-31 15:59:52.176 > 1969-12-31 15:59:44.569 > {code} > {code:title=LLAP IO Disabled} > hive> select atimestamp1 from alltypesorc3xcols limit 10; > OK > 1969-12-31 15:59:46.674 > NULL > 1969-12-31 15:59:55.787 > 1969-12-31 15:59:44.187 > 1969-12-31 15:59:50.434 > 1969-12-31 16:00:14.007 > 1969-12-31 16:00:06.021 > 1969-12-31 16:00:03.963 > 1969-12-31 15:59:52.176 > 1969-12-31 15:59:44.569 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14566) LLAP IO reads timestamp wrongly
[ https://issues.apache.org/jira/browse/HIVE-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14566: - Attachment: HIVE-14566.3.patch Added a junit test. > LLAP IO reads timestamp wrongly > --- > > Key: HIVE-14566 > URL: https://issues.apache.org/jira/browse/HIVE-14566 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0, 2.0.1, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-14566.1.patch, HIVE-14566.2.patch, > HIVE-14566.3.patch > > > HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap. > It reads timestamp wrongly. > {code:title=LLAP IO Enabled} > hive> select atimestamp1 from alltypesorc3xcols limit 10; > OK > 1969-12-31 15:59:46.674 > NULL > 1969-12-31 15:59:55.787 > 1969-12-31 15:59:44.187 > 1969-12-31 15:59:50.434 > 1969-12-31 16:00:15.007 > 1969-12-31 16:00:07.021 > 1969-12-31 16:00:04.963 > 1969-12-31 15:59:52.176 > 1969-12-31 15:59:44.569 > {code} > {code:title=LLAP IO Disabled} > hive> select atimestamp1 from alltypesorc3xcols limit 10; > OK > 1969-12-31 15:59:46.674 > NULL > 1969-12-31 15:59:55.787 > 1969-12-31 15:59:44.187 > 1969-12-31 15:59:50.434 > 1969-12-31 16:00:14.007 > 1969-12-31 16:00:06.021 > 1969-12-31 16:00:03.963 > 1969-12-31 15:59:52.176 > 1969-12-31 15:59:44.569 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14165) Remove Hive file listing during split computation
[ https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdullah Yousufi updated HIVE-14165: Attachment: HIVE-14165.02.patch > Remove Hive file listing during split computation > - > > Key: HIVE-14165 > URL: https://issues.apache.org/jira/browse/HIVE-14165 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi > Attachments: HIVE-14165.02.patch, HIVE-14165.patch > > > The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's > FileInputFormat.java will list the files during split computation anyway to > determine their size. One way to remove this is to catch the > InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the > Hive side instead of doing the file listing beforehand. > For S3 select queries on partitioned tables, this results in a 2x speedup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14503) Remove explicit order by in qfiles for union tests
[ https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427367#comment-15427367 ] Hive QA commented on HIVE-14503: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12824030/HIVE-14503.3.patch {color:green}SUCCESS:{color} +1 due to 35 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10442 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1] org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs org.apache.hive.jdbc.TestJdbcWithMiniHS2.testSelectThriftSerializeInTasks org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/931/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/931/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-931/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12824030 - PreCommit-HIVE-MASTER-Build > Remove explicit order by in qfiles for union tests > -- > > Key: HIVE-14503 > URL: https://issues.apache.org/jira/browse/HIVE-14503 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, > HIVE-14503.3.patch, HIVE-14503.4.patch > > > Identify qfiles with explicit order by and replace them with > SORT_QUERY_RESULTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters
[ https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-14522: Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks, Vineet! > CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure > for auto_join_filters > --- > > Key: HIVE-14522 > URL: https://issues.apache.org/jira/browse/HIVE-14522 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Vineet Garg >Assignee: Vineet Garg > Fix For: 2.2.0 > > Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch > > > {code} > CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY > (key) INTO 2 BUCKETS; > CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY > (value) INTO 2 BUCKETS; > LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1; > LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1; > LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2; > LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2; > SET hive.optimize.bucketmapjoin = true; > SET hive.optimize.bucketmapjoin.sortedmerge = true; > SET hive.input.format = > org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; > SET hive.outerjoin.supports.filters = false; > {code} > {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT > OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND > b.key > 40 AND b.value > 50 AND b.key = b.value; {code} > {code} Expected result: 3078400 Actual result: 4937935 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14554) Hive ptest should delete the itests/thirdparty directory everytime it builds hive
[ https://issues.apache.org/jira/browse/HIVE-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427358#comment-15427358 ] Sergio Peña commented on HIVE-14554: The best solution is to switch to a dependency model, but that is not a trivial task. We're working on that to make the move. Another quick solution is to add a checksum file to the spark assembly file. Then hive will download the checksum and compare it against the local checksum file. If they're different, then it will download the spark-assembly; otherwise it just uses the local copy. I like this approach more. > Hive ptest should delete the itests/thirdparty directory everytime it builds > hive > - > > Key: HIVE-14554 > URL: https://issues.apache.org/jira/browse/HIVE-14554 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > > The {{itests/thirdparty}} directory is created by hive on spark when > downloading the spark-assembly file. Hive ptest should delete this directory > every time it runs a new set of tests to avoid conflicts when a new spark > tarball is submitted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
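The checksum comparison proposed in the comment above could look roughly like the following. This is only a sketch under assumptions: the function names `file_checksum` and `needs_download` are hypothetical, SHA-256 is an arbitrary choice (the thread does not name an algorithm), and here the local file is hashed directly rather than read from a stored checksum file.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path, upstream_checksum):
    """Decide whether the local tarball must be fetched again.

    upstream_checksum is assumed to have been downloaded separately.
    """
    if not os.path.exists(local_path):
        return True  # nothing cached yet
    return file_checksum(local_path) != upstream_checksum
```

The build would call `needs_download` before fetching the tarball and skip the download when it returns False, so a changed upstream tarball is picked up without deleting the directory on every run.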
[jira] [Commented] (HIVE-14565) CBO (Calcite Return Path) Handle field access for nested column
[ https://issues.apache.org/jira/browse/HIVE-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427357#comment-15427357 ] Vineet Garg commented on HIVE-14565: Thanks Ashutosh > CBO (Calcite Return Path) Handle field access for nested column > --- > > Key: HIVE-14565 > URL: https://issues.apache.org/jira/browse/HIVE-14565 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer >Affects Versions: 2.1.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-14565.patch > > > ExprNodeConverter doesn't handle field access currently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14554) Hive ptest should delete the itests/thirdparty directory everytime it builds hive
[ https://issues.apache.org/jira/browse/HIVE-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427348#comment-15427348 ] Rui Li commented on HIVE-14554: --- Oh sorry I misunderstood it. Just thought more about this, maybe it's better to associate a timestamp with the file and only re-download when upstream timestamp updates? I guess that's similar to how maven handles snapshots. > Hive ptest should delete the itests/thirdparty directory everytime it builds > hive > - > > Key: HIVE-14554 > URL: https://issues.apache.org/jira/browse/HIVE-14554 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > > The {{itests/thirdparty}} directory is created by hive on spark when > downloading the spark-assembly file. Hive ptest should delete this directory > every time it runs a new set of tests to avoid conflicts when a new spark > tarball is submitted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14503) Remove explicit order by in qfiles for union tests
[ https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427341#comment-15427341 ] Siddharth Seth commented on HIVE-14503: --- +1. > Remove explicit order by in qfiles for union tests > -- > > Key: HIVE-14503 > URL: https://issues.apache.org/jira/browse/HIVE-14503 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, > HIVE-14503.3.patch, HIVE-14503.4.patch > > > Identify qfiles with explicit order by and replace them with > SORT_QUERY_RESULTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427340#comment-15427340 ] Siddharth Seth commented on HIVE-14502: --- +1. After jenkins gets back. > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch > > > Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To > cut down this test time significantly it makes sense to move over mini tez > tests to mini llap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14561) Minor ptest2 improvements
[ https://issues.apache.org/jira/browse/HIVE-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14561: -- Issue Type: Sub-task (was: Task) Parent: HIVE-13503 > Minor ptest2 improvements > - > > Key: HIVE-14561 > URL: https://issues.apache.org/jira/browse/HIVE-14561 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14561.01.patch > > > Re-purposed to track a few more improvements. > - Update spring framework to work with Java8 > - Change elapseTime logging to milliseconds from seconds > - Add thread name to log files. > - Allow an empty logsEndPoint if outputDir is not specified > - Log configuration when starting in a web server > - Allow tests to be run even if no qtests property is set > - Fix an exception on test completion when using FixedExecutionContextProvider -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14563) StatsOptimizer treats NULL in a wrong way
[ https://issues.apache.org/jira/browse/HIVE-14563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427338#comment-15427338 ] Ashutosh Chauhan commented on HIVE-14563: - +1 > StatsOptimizer treats NULL in a wrong way > - > > Key: HIVE-14563 > URL: https://issues.apache.org/jira/browse/HIVE-14563 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14563.01.patch > > > {code} > POSTHOOK: query: explain select count(key) from (select null as key from > src)src > POSTHOOK: type: QUERY > STAGE DEPENDENCIES: > Stage-0 is a root stage > STAGE PLANS: > Stage: Stage-0 > Fetch Operator > limit: 1 > Processor Tree: > ListSink > PREHOOK: query: select count(key) from (select null as key from src)src > PREHOOK: type: QUERY > PREHOOK: Input: default@src > A masked pattern was here > POSTHOOK: query: select count(key) from (select null as key from src)src > POSTHOOK: type: QUERY > POSTHOOK: Input: default@src > A masked pattern was here > 500 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
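Editor's sketch of why the 500 in the output above is wrong: in standard SQL, COUNT(column) skips NULL values while COUNT(*) counts rows, so counting a column that is NULL in every row should give 0. The 500 is presumably the row count of src taken from statistics. The class and method names below are hypothetical and only model the two semantics; they are not StatsOptimizer code.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical model of the two COUNT flavours; not Hive code.
final class CountSemantics {
    private CountSemantics() {}

    /** COUNT(*): every row counts, whatever its values. */
    static long countStar(List<?> rows) {
        return rows.size();
    }

    /** COUNT(col): only rows whose column value is non-NULL count. */
    static long countColumn(List<?> columnValues) {
        return columnValues.stream().filter(Objects::nonNull).count();
    }
}
```

Answering count(key) from a stored row count is only safe when the column is known to contain no NULLs, which is the distinction the issue title points at.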
[jira] [Updated] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver
[ https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14576: -- Parent Issue: HIVE-14547 (was: HIVE-13503) > Testing: Fixes to TestHBaseMinimrCliDriver > -- > > Key: HIVE-14576 > URL: https://issues.apache.org/jira/browse/HIVE-14576 > Project: Hive > Issue Type: Sub-task >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > > 1. Runtime over 1000s. > 2. Runs as an isolated test. > Need to fix both. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427319#comment-15427319 ] Sergey Shelukhin edited comment on HIVE-14574 at 8/18/16 10:59 PM: --- We can easily achieve unique IDs by taking ZK node name (which is unique and sequential). However, as the new node is re-added to the tail on every restart, it throws everything off. What we want conceptually is that restarted nodes go into the same position in the order as the nodes they replaced, but that is difficult to achieve (or impossible, I am not sure we have such a concept with Slider). We can just have sequential numbers in ZK to take, with every registering node fighting for the lowest number. I wonder if there's already a primitive for that in curator ;) That way the replacement nodes take the place of the nodes that died, most of the time, and leave the running ones undisturbed for most of the time. We can also assume we usually restart in the same place and order by the first time there was LLAP on that particular node for that cluster, then by name. was (Author: sershe): We can easily achieve unique IDs by taking ZK node name (which is unique and sequential). However, as the new node is re-added to the tail on every restart, it throws everything off. What we want conceptually is that restarted nodes go into the same position in the order as the nodes they replaced, but that is difficult to achieve (or impossible, I am not sure we have such a concept with Slider). We can just have sequential numbers in ZK to take, with every registering node fighting for the lowest number. I wonder if there's already a primitive for that in curator ;) We can also assume we usually restart in the same place and order by the first time there was LLAP on that particular node for that cluster, then by name. 
> use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427324#comment-15427324 ] Abdullah Yousufi edited comment on HIVE-14373 at 8/18/16 10:54 PM: --- Hey everyone, I've updated RB with this patch, which takes into account HIVE-1's removal of vm files. I've also added features like source and output table paths, as well as making the bucket path a property to pass in when running the test. Major thanks to [~yalovyyi] for the reference patch and everyone else for their feedback so far. was (Author: ayousufi): Hey everyone, I've updated RB with this patch, which takes into account HIVE-1's removal of vm files. I've also added features like source and output table paths, as well as making the bucket path a property to pass in when running the test. Major thanks to [~yalovyyi]'s for the reference patch and everyone else for their feedback so far. > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Abdullah Yousufi > Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, > HIVE-14373.patch > > > With Hive doing improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests won't be able to be executed by HiveQA because they will need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify it works > - the xml file should not be part of the commit, and hiveqa should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdullah Yousufi updated HIVE-14373: Attachment: HIVE-14373.03.patch Hey everyone, I've updated RB with this patch, which takes into account HIVE-1's removal of vm files. I've also added features like source and output table paths, as well as making the bucket path a property to pass in when running the test. Major thanks to [~yalovyyi]'s for the reference patch and everyone else for their feedback so far. > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Abdullah Yousufi > Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, > HIVE-14373.patch > > > With Hive doing improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests won't be able to be executed by HiveQA because they will need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify it works > - the xml file should not be part of the commit, and hiveqa should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427319#comment-15427319 ] Sergey Shelukhin commented on HIVE-14574: - We can easily achieve unique IDs by taking ZK node name (which is unique and sequential). However, as the new node is re-added to the tail on every restart, it throws everything off. What we want conceptually is that restarted nodes go into the same position in the order as the nodes they replaced, but that is difficult to achieve (or impossible, I am not sure we have such a concept with Slider). We can just have sequential numbers in ZK to take, with every registering node fighting for the lowest number. I wonder if there's already a primitive for that in curator ;) We can also assume we usually restart in the same place and order by the first time there was LLAP on that particular node for that cluster, then by name. > use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
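Editor's sketch of the "every registering node fights for the lowest number" idea from the comment above. It uses an in-memory registry as a stand-in for ZooKeeper, so it shows only the ordering behaviour, not the distributed coordination. `SlotRegistry` and its methods are hypothetical names, not a Curator or Hive API.

```java
import java.util.BitSet;

// In-memory stand-in for a ZooKeeper-backed slot registry; purely illustrative.
final class SlotRegistry {
    private final BitSet taken = new BitSet();

    /** Claims the lowest slot number that is currently free and returns it. */
    synchronized int claimLowestFree() {
        int slot = taken.nextClearBit(0);
        taken.set(slot);
        return slot;
    }

    /** Frees a slot when its node dies or deregisters. */
    synchronized void release(int slot) {
        taken.clear(slot);
    }
}
```

If three nodes hold slots 0, 1 and 2 and the node in slot 1 dies, the next node to register takes slot 1, so the positions of the surviving nodes are undisturbed. In a real deployment the claim would have to be an atomic operation in ZooKeeper, which is assumed here rather than shown.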
[jira] [Commented] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters
[ https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427314#comment-15427314 ] Ashutosh Chauhan commented on HIVE-14522: - +1 > CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure > for auto_join_filters > --- > > Key: HIVE-14522 > URL: https://issues.apache.org/jira/browse/HIVE-14522 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch > > > {code} > CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY > (key) INTO 2 BUCKETS; > CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY > (value) INTO 2 BUCKETS; > LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1; > LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1; > LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2; > LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2; > SET hive.optimize.bucketmapjoin = true; > SET hive.optimize.bucketmapjoin.sortedmerge = true; > SET hive.input.format = > org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; > SET hive.outerjoin.supports.filters = false; > {code} > {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT > OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND > b.key > 40 AND b.value > 50 AND b.key = b.value; {code} > {code} Expected result: 3078400 Actual result: 4937935 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14561) Minor ptest2 improvements
[ https://issues.apache.org/jira/browse/HIVE-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14561: -- Status: Patch Available (was: Open) > Minor ptest2 improvements > - > > Key: HIVE-14561 > URL: https://issues.apache.org/jira/browse/HIVE-14561 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14561.01.patch > > > Re-purposed to track a few more improvements. > - Update spring framework to work with Java8 > - Change elapseTime logging to milliseconds from seconds > - Add thread name to log files. > - Allow an empty logsEndPoint if outputDir is not specified > - Log configuration when starting in a web server > - Allow tests to be run even if no qtests property is set > - Fix an exception on test completion when using FixedExecutionContextProvider -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13403) Make Streaming API not create empty buckets (at least as an option)
[ https://issues.apache.org/jira/browse/HIVE-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-13403: - Status: Open (was: Patch Available) > Make Streaming API not create empty buckets (at least as an option) > --- > > Key: HIVE-13403 > URL: https://issues.apache.org/jira/browse/HIVE-13403 > Project: Hive > Issue Type: Bug > Components: HCatalog, Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng >Priority: Critical > Attachments: HIVE-13403.1.patch, HIVE-13403.2.patch, > HIVE-13403.3.patch > > > As of HIVE-11983, when a TransactionBatch is opened in StreamingAPI, a full > complement of bucket files (AbstractRecordWriter.createRecordUpdaters()) is > created on disk even though some may end up receiving no data. > It would be better to create them on demand and not clog the FS. > Tez can handle missing (empty) buckets and on MR, bucket join algorithms will > check if all buckets are there and bail out if not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13403) Make Streaming API not create empty buckets (at least as an option)
[ https://issues.apache.org/jira/browse/HIVE-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-13403: - Attachment: HIVE-13403.3.patch > Make Streaming API not create empty buckets (at least as an option) > --- > > Key: HIVE-13403 > URL: https://issues.apache.org/jira/browse/HIVE-13403 > Project: Hive > Issue Type: Bug > Components: HCatalog, Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng >Priority: Critical > Attachments: HIVE-13403.1.patch, HIVE-13403.2.patch, > HIVE-13403.3.patch > > > As of HIVE-11983, when a TransactionBatch is opened in StreamingAPI, a full > complement of bucket files (AbstractRecordWriter.createRecordUpdaters()) is > created on disk even though some may end up receiving no data. > It would be better to create them on demand and not clog the FS. > Tez can handle missing (empty) buckets and on MR, bucket join algorithms will > check if all buckets are there and bail out if not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13403) Make Streaming API not create empty buckets (at least as an option)
[ https://issues.apache.org/jira/browse/HIVE-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-13403: - Status: Patch Available (was: Open) > Make Streaming API not create empty buckets (at least as an option) > --- > > Key: HIVE-13403 > URL: https://issues.apache.org/jira/browse/HIVE-13403 > Project: Hive > Issue Type: Bug > Components: HCatalog, Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng >Priority: Critical > Attachments: HIVE-13403.1.patch, HIVE-13403.2.patch, > HIVE-13403.3.patch > > > As of HIVE-11983, when a TransactionBatch is opened in StreamingAPI, a full > complement of bucket files (AbstractRecordWriter.createRecordUpdaters()) is > created on disk even though some may end up receiving no data. > It would be better to create them on demand and not clog the FS. > Tez can handle missing (empty) buckets and on MR, bucket join algorithms will > check if all buckets are there and bail out if not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14566) LLAP IO reads timestamp wrongly
[ https://issues.apache.org/jira/browse/HIVE-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427294#comment-15427294 ] Owen O'Malley commented on HIVE-14566: -- I really think that we need to make a context object for passing information down to the tree reader. Otherwise, we are going to get killed by adding parameters to this, especially when ORC makes it out of Hive. How about something like: {code} public interface Context { SchemaEvolution getEvolution(); boolean skipCorrupt(); String writerTimezone(); } public static TreeReader createTreeReader(TypeDescription readerType, Context context) throws IOException { {code} Then we can add new information without making a huge change that touches all of the methods. > LLAP IO reads timestamp wrongly > --- > > Key: HIVE-14566 > URL: https://issues.apache.org/jira/browse/HIVE-14566 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0, 2.0.1, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-14566.1.patch, HIVE-14566.2.patch > > > HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap. > It reads timestamp wrongly. > {code:title=LLAP IO Enabled} > hive> select atimestamp1 from alltypesorc3xcols limit 10; > OK > 1969-12-31 15:59:46.674 > NULL > 1969-12-31 15:59:55.787 > 1969-12-31 15:59:44.187 > 1969-12-31 15:59:50.434 > 1969-12-31 16:00:15.007 > 1969-12-31 16:00:07.021 > 1969-12-31 16:00:04.963 > 1969-12-31 15:59:52.176 > 1969-12-31 15:59:44.569 > {code} > {code:title=LLAP IO Disabled} > hive> select atimestamp1 from alltypesorc3xcols limit 10; > OK > 1969-12-31 15:59:46.674 > NULL > 1969-12-31 15:59:55.787 > 1969-12-31 15:59:44.187 > 1969-12-31 15:59:50.434 > 1969-12-31 16:00:14.007 > 1969-12-31 16:00:06.021 > 1969-12-31 16:00:03.963 > 1969-12-31 15:59:52.176 > 1969-12-31 15:59:44.569 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
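Editor's sketch of the context-object pattern proposed in the comment above, reduced to something self-contained. The types here are hypothetical stand-ins: `ReaderContext` mirrors only two of the three proposed methods (the `SchemaEvolution` accessor is omitted), and `ReaderFactory.describeReader` returns a string instead of building a real `TreeReader`. None of this is the actual ORC API.

```java
// Hypothetical stand-in for the proposed Context interface; not the real ORC type.
interface ReaderContext {
    boolean skipCorrupt();
    String writerTimezone();
}

// Simple immutable implementation used to pass settings down in one object.
final class SimpleReaderContext implements ReaderContext {
    private final boolean skipCorrupt;
    private final String writerTimezone;

    SimpleReaderContext(boolean skipCorrupt, String writerTimezone) {
        this.skipCorrupt = skipCorrupt;
        this.writerTimezone = writerTimezone;
    }

    @Override
    public boolean skipCorrupt() {
        return skipCorrupt;
    }

    @Override
    public String writerTimezone() {
        return writerTimezone;
    }
}

final class ReaderFactory {
    private ReaderFactory() {}

    /** The signature stays fixed; new settings become new methods on ReaderContext. */
    static String describeReader(String readerType, ReaderContext context) {
        if ("timestamp".equals(readerType)) {
            return "timestamp reader (writer tz=" + context.writerTimezone() + ")";
        }
        return readerType + " reader (skipCorrupt=" + context.skipCorrupt() + ")";
    }
}
```

The design point is that a later setting, such as the writer timezone needed for the timestamp bug in this issue, becomes one more accessor on the context rather than one more parameter threaded through every factory and constructor.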
[jira] [Commented] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters
[ https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427288#comment-15427288 ] Vineet Garg commented on HIVE-14522: Created: https://reviews.apache.org/r/51226/ > CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure > for auto_join_filters > --- > > Key: HIVE-14522 > URL: https://issues.apache.org/jira/browse/HIVE-14522 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch > > > {code} > CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY > (key) INTO 2 BUCKETS; > CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY > (value) INTO 2 BUCKETS; > LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1; > LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1; > LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2; > LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2; > SET hive.optimize.bucketmapjoin = true; > SET hive.optimize.bucketmapjoin.sortedmerge = true; > SET hive.input.format = > org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; > SET hive.outerjoin.supports.filters = false; > {code} > {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT > OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND > b.key > 40 AND b.value > 50 AND b.key = b.value; {code} > {code} Expected result: 3078400 Actual result: 4937935 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters
[ https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427277#comment-15427277 ] Ashutosh Chauhan commented on HIVE-14522: - Can you create a RB for this? > CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure > for auto_join_filters > --- > > Key: HIVE-14522 > URL: https://issues.apache.org/jira/browse/HIVE-14522 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch > > > {code} > CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY > (key) INTO 2 BUCKETS; > CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY > (value) INTO 2 BUCKETS; > LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1; > LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1; > LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2; > LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2; > SET hive.optimize.bucketmapjoin = true; > SET hive.optimize.bucketmapjoin.sortedmerge = true; > SET hive.input.format = > org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; > SET hive.outerjoin.supports.filters = false; > {code} > {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT > OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND > b.key > 40 AND b.value > 50 AND b.key = b.value; {code} > {code} Expected result: 3078400 Actual result: 4937935 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14503) Remove explicit order by in qfiles for union tests
[ https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427270#comment-15427270 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-14503: -- looks good, conditional +1 based on clean run. Also, will be a "nice to have": the comment I had added in the RB. > Remove explicit order by in qfiles for union tests > -- > > Key: HIVE-14503 > URL: https://issues.apache.org/jira/browse/HIVE-14503 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, > HIVE-14503.3.patch, HIVE-14503.4.patch > > > Identify qfiles with explicit order by and replace them with > SORT_QUERY_RESULTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14564) Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException.
[ https://issues.apache.org/jira/browse/HIVE-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427266#comment-15427266 ] Ashutosh Chauhan commented on HIVE-14564: - [~zxu] Thanks for the patch. Can you add a testcase to demonstrate the problem you are facing here? > Column Pruning generates out of order columns in SelectOperator which cause > ArrayIndexOutOfBoundsException. > --- > > Key: HIVE-14564 > URL: https://issues.apache.org/jira/browse/HIVE-14564 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.1.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: HIVE-14564.000.patch > > > Column Pruning generates out of order columns in SelectOperator which cause > ArrayIndexOutOfBoundsException. > {code} > 2016-07-26 21:49:24,390 FATAL [main] > org.apache.hadoop.hive.ql.exec.mr.ExecMapper: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507) > at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ArrayIndexOutOfBoundsException > at > org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) > at > 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) > at > org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497) > ... 9 more > Caused by: java.lang.ArrayIndexOutOfBoundsException > at java.lang.System.arraycopy(Native Method) > at org.apache.hadoop.io.Text.set(Text.java:225) > at > org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48) > at > org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264) > at > org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201) > at > org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64) > at > org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94) > at > org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) > at > org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) > at > org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:550) > at > org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:377) > ... 13 more > {code} > The exception occurs because serialization and deserialization don't match. > The serialization by LazyBinarySerDe in the previous MapReduce job used a > different column order. When the current MapReduce job deserializes the > intermediate sequence file generated by the previous MapReduce job, > LazyBinaryStruct reads the columns in the wrong order and produces > corrupted data. The column mismatch between serialization and > deserialization is caused by SelectOperator's column pruning in > {{ColumnPrunerSelectProc}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427231#comment-15427231 ] Gopal V commented on HIVE-14574: > In terms of the 3 byte difference in ORC That's the file MAGIC of "ORC" ... 3 byte. > use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14503) Remove explicit order by in qfiles for union tests
[ https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14503: - Attachment: HIVE-14503.4.patch Reuploading the correct patch. > Remove explicit order by in qfiles for union tests > -- > > Key: HIVE-14503 > URL: https://issues.apache.org/jira/browse/HIVE-14503 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, > HIVE-14503.3.patch, HIVE-14503.4.patch > > > Identify qfiles with explicit order by and replace them with > SORT_QUERY_RESULTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14577) Sync up configs for MiniTez and MiniLlap
[ https://issues.apache.org/jira/browse/HIVE-14577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-14577: Assignee: Prasanth Jayachandran > Sync up configs for MiniTez and MiniLlap > > > Key: HIVE-14577 > URL: https://issues.apache.org/jira/browse/HIVE-14577 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > Some configs, such as hive.explain.user, are different for MiniTez and MiniLlap. > Similarly there could be others. We should sync up the configs that could > affect the plan between tez and llap so that it will be easier to compare the > test output files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427246#comment-15427246 ] Prasanth Jayachandran commented on HIVE-14502: -- Created HIVE-14577 > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch > > > Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To > cut down this test time significantly it makes sense to move over mini tez > tests to mini llap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427244#comment-15427244 ] Siddharth Seth commented on HIVE-14574: --- Another thing to consider here is that knownLocations is a list sorted by name (random uuid i think). A new node could show up anywhere in this array. The name generation would need to be fixed as well if doing something like this. That could be as simple as a counter in ZK - but I think a fix is required for something like this to actually work. > use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
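Editor's sketch of the consistent hashing named in the issue title, to contrast with picking a location by position in a sorted `knownLocations` array. On a hash ring a split's location depends on hashed node names rather than on array index, so a node joining or leaving only moves the splits that hashed to its part of the ring. This is a generic textbook ring, not the attached patch; `HashRing`, the virtual-node count, the bit-mixing hash, and the node and key names in any usage are all assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Generic consistent-hash ring; illustrative only, not the HIVE-14574 patch.
final class HashRing {
    private static final int VIRTUAL_NODES = 64;
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    /** Spreads the bits of String.hashCode(); a real system would likely use a stronger hash. */
    private static int hash(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h;
    }

    /** Places several virtual points for the node on the ring. */
    void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    /** Removes every point that belongs to the node. */
    void removeNode(String node) {
        ring.values().removeIf(node::equals);
    }

    /** Returns the owner of the first ring point at or after the key's hash, wrapping around. */
    String locate(String splitKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes registered");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(splitKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

Removing one node from this ring leaves every split that was located on another node where it was. Note that stable placement across restarts still requires stable node names, which is the naming concern raised in the comment above.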
[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427239#comment-15427239 ] Prasanth Jayachandran commented on HIVE-14502: -- Sure. Will create one. It was easier for me to verify the test results by explicitly setting explain user to true for these tests. > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch > > > Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To > cut down this test time significantly it makes sense to move over mini tez > tests to mini llap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14562) CBO (Calcite Return Path) Wrong results for limit + offset
[ https://issues.apache.org/jira/browse/HIVE-14562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427234#comment-15427234 ] Ashutosh Chauhan commented on HIVE-14562: - [~jcamachorodriguez] Can you please review this? > CBO (Calcite Return Path) Wrong results for limit + offset > -- > > Key: HIVE-14562 > URL: https://issues.apache.org/jira/browse/HIVE-14562 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer >Affects Versions: 2.1.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-14562.patch > > > offset is missed altogether. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427233#comment-15427233 ] Prasanth Jayachandran commented on HIVE-14574: -- The 3 byte difference will still be there based on what split strategy is chose. If a big file is chosen by ETL split strategy the first split will start from 3 offset. If chosen by BI split strategy the first split will start from 0. My fix was related to inconsistently choosing strategies based on AM cache being on or off. > use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427233#comment-15427233 ] Prasanth Jayachandran edited comment on HIVE-14574 at 8/18/16 9:55 PM: --- The 3 byte difference will still be there based on what split strategy is chosen. If a big file is chosen by ETL split strategy the first split will start from 3 offset. If chosen by BI split strategy the first split will start from 0. My fix was related to inconsistently choosing strategies based on AM cache being on or off. was (Author: prasanth_j): The 3 byte difference will still be there based on what split strategy is chose. If a big file is chosen by ETL split strategy the first split will start from 3 offset. If chosen by BI split strategy the first split will start from 0. My fix was related to inconsistently choosing strategies based on AM cache being on or off. > use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14563) StatsOptimizer treats NULL in a wrong way
[ https://issues.apache.org/jira/browse/HIVE-14563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427232#comment-15427232 ]
Ashutosh Chauhan commented on HIVE-14563:
-
Can you create a RB for it?
> StatsOptimizer treats NULL in a wrong way
> -
>
> Key: HIVE-14563
> URL: https://issues.apache.org/jira/browse/HIVE-14563
> Project: Hive
> Issue Type: Bug
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-14563.01.patch
>
>
> {code}
> POSTHOOK: query: explain select count(key) from (select null as key from src)src
> POSTHOOK: type: QUERY
> STAGE DEPENDENCIES:
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-0
> Fetch Operator
> limit: 1
> Processor Tree:
> ListSink
> PREHOOK: query: select count(key) from (select null as key from src)src
> PREHOOK: type: QUERY
> PREHOOK: Input: default@src
> A masked pattern was here
> POSTHOOK: query: select count(key) from (select null as key from src)src
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@src
> A masked pattern was here
> 500
> {code}
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
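For reference, standard SQL semantics say that count(expr) counts only non-NULL values, so the query in the report should return 0 rather than 500. The sketch below checks this with SQLite through Python's sqlite3 module, which is only a stand-in engine and not Hive. The src table here is a made-up 500-row table that mimics the test table's row count.

```python
import sqlite3

# In-memory stand-in for the 500-row "src" test table (contents are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [(str(i), f"val_{i}") for i in range(500)],
)

# count(key) must skip NULLs; count(*) counts rows regardless of NULLs.
count_key = conn.execute(
    "SELECT count(key) FROM (SELECT NULL AS key FROM src) s"
).fetchone()[0]
count_star = conn.execute(
    "SELECT count(*) FROM (SELECT NULL AS key FROM src) s"
).fetchone()[0]

print("count(key):", count_key)
print("count(*):", count_star)
```

The 500 shown in the report matches what count(*) returns here, which suggests the row count is being used to answer a count over a column that is entirely NULL.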
[jira] [Updated] (HIVE-14503) Remove explicit order by in qfiles for union tests
[ https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14503: - Attachment: (was: HIVE-14503.4.patch) > Remove explicit order by in qfiles for union tests > -- > > Key: HIVE-14503 > URL: https://issues.apache.org/jira/browse/HIVE-14503 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, > HIVE-14503.3.patch, HIVE-14503.4.patch > > > Identify qfiles with explicit order by and replace them with > SORT_QUERY_RESULTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427223#comment-15427223 ] Siddharth Seth commented on HIVE-14502: --- Should we change that to be consistent in a follow-up jira, to avoid unnecessary sets in the q files? > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch > > > Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To > cut down this test time significantly it makes sense to move over mini tez > tests to mini llap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427219#comment-15427219 ] Siddharth Seth commented on HIVE-14574: --- bq. block boundaries and stripe boundaries I'm not sure why this comment was even added in there. In terms of the 3 byte difference in ORC - [~prasanth_j] may have fixed that already. Will Hashing.consistentHash generate the same values across different JVMs? I think I had considered this earlier, and finally went with murmur since that does generate the same value across different JVMs / machines. Needs new unit tests to validate behaviour. Don't think there's any reason for the existing ones to break. > use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
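The cross-JVM question comes down to whether the hash depends only on the input bytes. As a rough Python analogue (an assumption for illustration, not Hive code): Python's built-in hash() for strings is salted per process unless PYTHONHASHSEED is fixed, so it would be unsuitable for this purpose, while a digest over the bytes is the same on every machine. The sketch uses MD5 from hashlib purely as a stand-in for a deterministic hash such as murmur; stable_hash64 and pick_node are hypothetical helper names, and the modulo step is plain bucketing rather than consistent hashing.

```python
import hashlib


def stable_hash64(name: str) -> int:
    """Hash that depends only on the input bytes, so every process agrees."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_node(split_id: str, nodes: list) -> str:
    """Plain modulo placement, only to show the result is reproducible."""
    return nodes[stable_hash64(split_id) % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]  # invented node names
split_id = "/warehouse/t/part-00000:0"  # invented split identifier
print(hex(stable_hash64(split_id)), pick_node(split_id, nodes))
```

A unit test along these lines could pin a few known input/output pairs, so that a change in the hash function shows up as a test failure rather than as cache misses after a restart.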
[jira] [Commented] (HIVE-14503) Remove explicit order by in qfiles for union tests
[ https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427216#comment-15427216 ] Prasanth Jayachandran commented on HIVE-14503: -- I left union23.q to have union and order by in the .4 patch. > Remove explicit order by in qfiles for union tests > -- > > Key: HIVE-14503 > URL: https://issues.apache.org/jira/browse/HIVE-14503 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, > HIVE-14503.3.patch, HIVE-14503.4.patch > > > Identify qfiles with explicit order by and replace them with > SORT_QUERY_RESULTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14503) Remove explicit order by in qfiles for union tests
[ https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14503: - Attachment: HIVE-14503.4.patch > Remove explicit order by in qfiles for union tests > -- > > Key: HIVE-14503 > URL: https://issues.apache.org/jira/browse/HIVE-14503 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, > HIVE-14503.3.patch, HIVE-14503.4.patch > > > Identify qfiles with explicit order by and replace them with > SORT_QUERY_RESULTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427204#comment-15427204 ] Sergey Shelukhin commented on HIVE-14574: - Hrrm. That sounds like some voodoo magic. Sure... > use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters
[ https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427208#comment-15427208 ]
Hive QA commented on HIVE-14522:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12824421/HIVE-14522.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10442 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testSelectThriftSerializeInTasks
org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/930/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/930/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-930/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12824421 - PreCommit-HIVE-MASTER-Build
> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure
> for auto_join_filters
> ---
>
> Key: HIVE-14522
> URL: https://issues.apache.org/jira/browse/HIVE-14522
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Attachments: HIVE-14522.1.patch, HIVE-14522.2.patch
>
>
> {code}
> CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY
> (key) INTO 2 BUCKETS;
> CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY
> (value) INTO 2 BUCKETS;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2;
> SET hive.optimize.bucketmapjoin = true;
> SET hive.optimize.bucketmapjoin.sortedmerge = true;
> SET hive.input.format =
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SET hive.outerjoin.supports.filters = false;
> {code}
> {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT
> OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND
> b.key > 40 AND b.value > 50 AND b.key = b.value; {code}
> {code} Expected result: 3078400 Actual result: 4937935 {code}
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver
[ https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-14576: Summary: Testing: Fixes to TestHBaseMinimrCliDriver (was: Fixes to TestHBaseMinimrCliDriver) > Testing: Fixes to TestHBaseMinimrCliDriver > -- > > Key: HIVE-14576 > URL: https://issues.apache.org/jira/browse/HIVE-14576 > Project: Hive > Issue Type: Sub-task >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > > 1. Runtime over 1000s. > 2. Runs as an isolated test. > Need to fix both. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14512) Testing: Evaluate and fix overheads in executing a single q test
[ https://issues.apache.org/jira/browse/HIVE-14512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-14512: Issue Type: Sub-task (was: Bug) Parent: HIVE-13503 > Testing: Evaluate and fix overheads in executing a single q test > - > > Key: HIVE-14512 > URL: https://issues.apache.org/jira/browse/HIVE-14512 > Project: Hive > Issue Type: Sub-task >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14566) LLAP IO reads timestamp wrongly
[ https://issues.apache.org/jira/browse/HIVE-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14566: - Attachment: HIVE-14566.2.patch Addressed [~sershe]'s review comments. > LLAP IO reads timestamp wrongly > --- > > Key: HIVE-14566 > URL: https://issues.apache.org/jira/browse/HIVE-14566 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0, 2.0.1, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-14566.1.patch, HIVE-14566.2.patch > > > HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap. > It reads timestamp wrongly. > {code:title=LLAP IO Enabled} > hive> select atimestamp1 from alltypesorc3xcols limit 10; > OK > 1969-12-31 15:59:46.674 > NULL > 1969-12-31 15:59:55.787 > 1969-12-31 15:59:44.187 > 1969-12-31 15:59:50.434 > 1969-12-31 16:00:15.007 > 1969-12-31 16:00:07.021 > 1969-12-31 16:00:04.963 > 1969-12-31 15:59:52.176 > 1969-12-31 15:59:44.569 > {code} > {code:title=LLAP IO Disabled} > hive> select atimestamp1 from alltypesorc3xcols limit 10; > OK > 1969-12-31 15:59:46.674 > NULL > 1969-12-31 15:59:55.787 > 1969-12-31 15:59:44.187 > 1969-12-31 15:59:50.434 > 1969-12-31 16:00:14.007 > 1969-12-31 16:00:06.021 > 1969-12-31 16:00:03.963 > 1969-12-31 15:59:52.176 > 1969-12-31 15:59:44.569 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
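The comments on HIVE-14566 attribute the differing rows to a missing time zone adjustment: ORC records the writer's time zone per stripe, and the reader is expected to shift values by the difference between the writer's and its own offset. The sketch below only illustrates that idea and is not the ORC reader's code. The helper names are hypothetical, fixed-offset zones are assumed (real zones have daylight-saving rules), and the 8-hour gap is chosen for clarity rather than taken from the test data.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)


def encode(wall: datetime, tz: timezone) -> int:
    """Writer side: interpret a naive wall-clock value in tz, return epoch millis."""
    return (wall.replace(tzinfo=tz) - EPOCH) // ONE_MS


def decode(millis: int, tz: timezone) -> datetime:
    """Reader side: render epoch millis as a naive wall-clock value in tz."""
    return (EPOCH + millis * ONE_MS).astimezone(tz).replace(tzinfo=None)


def adjust(millis: int, writer_tz: timezone, reader_tz: timezone) -> int:
    """Shift by (writer offset - reader offset) so the wall-clock value survives."""
    delta = writer_tz.utcoffset(None) - reader_tz.utcoffset(None)
    return millis + delta // ONE_MS


writer_tz = timezone(timedelta(hours=-8))  # assumed writer zone
reader_tz = timezone.utc                   # assumed reader zone
wall = datetime(1969, 12, 31, 16, 0, 14, 7000)

stored = encode(wall, writer_tz)
unadjusted = decode(stored, reader_tz)
adjusted = decode(adjust(stored, writer_tz, reader_tz), reader_tz)
print("written:   ", wall)
print("unadjusted:", unadjusted)
print("adjusted:  ", adjusted)
```

A reader that skips the adjust step shows a shifted wall-clock value whenever its zone rules differ from the writer's, which is the kind of mismatch the two listings in the issue description show.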
[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427198#comment-15427198 ] Prasanth Jayachandran commented on HIVE-14502: -- User explain is set to true for MiniTez and false for MiniLlap in hive-site. Changing it to true for MiniLlap in hive-site will affect more tests, so I explicitly set it to true in the qfiles. > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch > > > Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To > cut down this test time significantly it makes sense to move over mini tez > tests to mini llap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427192#comment-15427192 ] Gopal V commented on HIVE-14574: [~sershe]: adding to my build for today - minor comment {code} block boundaries and stripe boundaries {code} The start offset can vary by 3-5 bytes depending on this too - round that down to a multiple of 8? > use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
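The rounding suggestion can be written down directly. This is a minimal sketch with a hypothetical helper name (align_down); it only shows the arithmetic, not where the patch would apply it.

```python
def align_down(offset: int, alignment: int = 8) -> int:
    """Round offset down to the nearest multiple of alignment."""
    return offset - (offset % alignment)


# Start offsets 0 and 3 (the two split strategies mentioned in the thread)
# collapse to the same value; 7 and 9 do not.
for start in (0, 3, 5, 7, 9):
    print(start, "->", align_down(start))
```

Offsets 0, 3 and 5 all map to 0, so splits that differ only by those few bytes would hash identically. The approach does not help when two offsets straddle a multiple of 8 (for example 7 and 9), so it reduces the mismatch rather than removing it.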
[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427191#comment-15427191 ] Siddharth Seth commented on HIVE-14502: --- Mostly looks good. Not sure why "+set hive.explain.user=true;" has been added to some of the q files. > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14502.1.patch, HIVE-14502.2.patch > > > Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To > cut down this test time significantly it makes sense to move over mini tez > tests to mini llap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14574: Attachment: HIVE-14574.patch [~gopalv] [~sseth] can you take a look? > use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes
[ https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14574: Status: Patch Available (was: Open) > use consistent hashing for LLAP consistent splits to alleviate impact from > cluster changes > -- > > Key: HIVE-14574 > URL: https://issues.apache.org/jira/browse/HIVE-14574 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14574.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14572) Investigate jenkins test report timings
[ https://issues.apache.org/jira/browse/HIVE-14572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427158#comment-15427158 ]
Zoltan Haindrich commented on HIVE-14572:
-
I think that after ptest is done...the TEST*xml-s are parsed into a report by some jenkins plugin...and that plugin fails to correctly aggregate the results. I guess that surefire-report plugin doesn't even run - but I might be wrong ;)
> Investigate jenkins test report timings
> ---
>
> Key: HIVE-14572
> URL: https://issues.apache.org/jira/browse/HIVE-14572
> Project: Hive
> Issue Type: Sub-task
> Components: Tests
> Reporter: Zoltan Haindrich
>
> [~sseth] has noticed some odd timings in the jenkins reports
> I've created a sample project, to emulate a clidriver run during qtest:
> the testclass:
> * 1 sec beforeclass
> * 3x 0.2s test
> created using junit4 parameterized.
> Double checkout; second project runs different tests...or at least they have
> different name.
> here are my preliminary findings:
> || thing || expected || 2.16 || 2.19.1 ||
> | total time | ~3.4s | 1.2s | 3.4s |
> | package time | ~3.4s | 0.61s | 1.7s |
> | class time | ~3.4s | 0.61s | 1.7s |
> | testcase times | ~.2s | ~.2s | ~.2s |
> notes:
> * using 2.16 beforeclass timings are totally hidden or lost
> * 2.19.1 does account for beforeclass but still fails to correctly aggregate
> the two runs of the similarly named testclasses
> it might be worth a try to look at the bleeding edge of this jenkins plugin...
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
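The numbers in the findings table can be checked against the sample project's description. The sketch below just redoes that arithmetic in milliseconds. The constant names are made up, and the small gap between the nominal 3.2 s and the "~3.4s" in the table is presumably measurement overhead (an assumption, since the thread does not say).

```python
# Nominal durations from the sample project description, in milliseconds.
SETUP_MS = 1000      # one beforeclass per run
TEST_MS = 200        # each parameterized test
TESTS_PER_RUN = 3
RUNS = 2             # "double checkout": the test class runs twice

run_with_setup = SETUP_MS + TESTS_PER_RUN * TEST_MS
run_without_setup = TESTS_PER_RUN * TEST_MS
total_with_setup = RUNS * run_with_setup
total_without_setup = RUNS * run_without_setup

print("one run, setup included:", run_with_setup, "ms")
print("one run, setup dropped: ", run_without_setup, "ms")
print("both runs, setup included:", total_with_setup, "ms")
print("both runs, setup dropped: ", total_without_setup, "ms")
```

Read this way, the 2.16 column (0.61s per class, 1.2s total) matches the sums with the beforeclass time dropped, and the 2.19.1 column (1.7s per class, 3.4s total) matches a single run with setup included at class level, with only the total covering both runs.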
[jira] [Commented] (HIVE-14165) Remove Hive file listing during split computation
[ https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427148#comment-15427148 ] Steve Loughran commented on HIVE-14165: --- the faster list status is only applicable on a recursive listing; if you are listing one directory, it's just the same time as before > Remove Hive file listing during split computation > - > > Key: HIVE-14165 > URL: https://issues.apache.org/jira/browse/HIVE-14165 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi > Attachments: HIVE-14165.patch > > > The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's > FileInputFormat.java will list the files during split computation anyway to > determine their size. One way to remove this is to catch the > InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the > Hive side instead of doing the file listing beforehand. > For S3 select queries on partitioned tables, this results in a 2x speedup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14461) Investigate HBaseMinimrCliDriver tests
[ https://issues.apache.org/jira/browse/HIVE-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427151#comment-15427151 ] Zoltan Haindrich commented on HIVE-14461: - the git history has led me to: {code} commit 8018e513e484b8d4ef29beb1f263c0a32df8bc33 Author: Brock Noland Date: Thu Oct 31 18:27:31 2013 + HIVE-5610 - Merge maven branch into trunk (patch) {code} It looks like this was the point when pom.xml was born...so missing this small needle in the haystack was clearly an unintentional change ;) I think these tests should be re-enabled...and that lone qfile's extension be corrected > Investigate HBaseMinimrCliDriver tests > -- > > Key: HIVE-14461 > URL: https://issues.apache.org/jira/browse/HIVE-14461 > Project: Hive > Issue Type: Sub-task > Components: Tests >Reporter: Zoltan Haindrich > > during HIVE-1 I've encountered an odd thing: > HBaseMinimrCliDriver only executes a single test...and that test is set using > the qfile selector...which looks out of place. > The only test it executes doesn't follow regular qtest file naming...and has > an extension 'm' > At least the file should be renamed...but I think the change wasn't > intentional -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427131#comment-15427131 ] Zoltan Haindrich commented on HIVE-14373: - That's great [~ayousufi], I will try to help any way I can! My patch has uncovered a few issues and also left a few questions unanswered - I will be working on those during next week...but I will try not to create any more glitches before this gets in. > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Abdullah Yousufi > Attachments: HIVE-14373.02.patch, HIVE-14373.patch > > > With Hive doing improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests won't be able to be executed by HiveQA because it will need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify it works > - the xml file should not be part of the commit, and hiveqa should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14503) Remove explicit order by in qfiles for union tests
[ https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427130#comment-15427130 ] Siddharth Seth commented on HIVE-14503: --- Mostly looks good to me. As an example - ql/src/test/queries/clientpositive/union_script.q - This would result in a plan change. Is that significant for the tests? For a lot of the .q files the order is removed after a previous insert into ... UNION ... - so I don't think it matters there. Maybe we should leave at least one query in place with a union and order by. Triggered another run on jenkins. > Remove explicit order by in qfiles for union tests > -- > > Key: HIVE-14503 > URL: https://issues.apache.org/jira/browse/HIVE-14503 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, > HIVE-14503.3.patch > > > Identify qfiles with explicit order by and replace them with > SORT_QUERY_RESULTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
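The trade-off in this thread is between sorting inside the query (ORDER BY, which changes the plan) and sorting the output before it is compared with the expected file. The sketch below shows the second idea in isolation. It is not the q-test driver's implementation, and same_rows is a hypothetical helper name.

```python
def same_rows(expected: str, actual: str) -> bool:
    """Compare two result listings while ignoring row order."""
    return sorted(expected.splitlines()) == sorted(actual.splitlines())


# A UNION without ORDER BY may return the same rows in a different order.
expected_output = "1\tval_1\n2\tval_2\n3\tval_3"
actual_output = "3\tval_3\n1\tval_1\n2\tval_2"

print(same_rows(expected_output, actual_output))          # order differs only
print(same_rows(expected_output, "1\tval_1\n2\tval_2"))   # a row is missing
```

Because the comparison no longer depends on row order, the query itself does not need an ORDER BY to be deterministic. That is also why keeping at least one union query with an explicit order by, as suggested above, is needed to retain coverage of that plan shape.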
[jira] [Commented] (HIVE-14566) LLAP IO reads timestamp wrongly
[ https://issues.apache.org/jira/browse/HIVE-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427129#comment-15427129 ] Prasanth Jayachandran commented on HIVE-14566: -- The issue is actually not 1 second difference. It happened to be the case in the test case (data/files/alltypesorc3xcols file was written with different timezone). The actual issue is, llap reader was not making timezone adjustments when reading timestamp columns causing difference in results. The non-llap reader used to make the timezone adjustments during start of stripe. This was missing for llap https://github.com/apache/hive/blob/master/orc/src/java/org/apache/orc/impl/TreeReaderFactory.java#L870 Each stripe in orc maintains the timezone that was used by the writer. The reader reads the timestamp values using reader's timezone and by knowing the writer's timezone information from the stripe footer, the reader will make offset adjustments to read timestamp correctly. > LLAP IO reads timestamp wrongly > --- > > Key: HIVE-14566 > URL: https://issues.apache.org/jira/browse/HIVE-14566 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0, 2.0.1, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-14566.1.patch > > > HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap. > It reads timestamp wrongly. 
> {code:title=LLAP IO Enabled} > hive> select atimestamp1 from alltypesorc3xcols limit 10; > OK > 1969-12-31 15:59:46.674 > NULL > 1969-12-31 15:59:55.787 > 1969-12-31 15:59:44.187 > 1969-12-31 15:59:50.434 > 1969-12-31 16:00:15.007 > 1969-12-31 16:00:07.021 > 1969-12-31 16:00:04.963 > 1969-12-31 15:59:52.176 > 1969-12-31 15:59:44.569 > {code} > {code:title=LLAP IO Disabled} > hive> select atimestamp1 from alltypesorc3xcols limit 10; > OK > 1969-12-31 15:59:46.674 > NULL > 1969-12-31 15:59:55.787 > 1969-12-31 15:59:44.187 > 1969-12-31 15:59:50.434 > 1969-12-31 16:00:14.007 > 1969-12-31 16:00:06.021 > 1969-12-31 16:00:03.963 > 1969-12-31 15:59:52.176 > 1969-12-31 15:59:44.569 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14559) Remove setting hive.execution.engine in qfiles
[ https://issues.apache.org/jira/browse/HIVE-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14559: - Status: Patch Available (was: Open) > Remove setting hive.execution.engine in qfiles > -- > > Key: HIVE-14559 > URL: https://issues.apache.org/jira/browse/HIVE-14559 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14559.1.patch > > > Some qfiles are explicitly setting execution engine. If we run those tests on > different Mini CliDriver's it could be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14554) Hive ptest should delete the itests/thirdparty directory everytime it builds hive
[ https://issues.apache.org/jira/browse/HIVE-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427114#comment-15427114 ] Sergey Shelukhin commented on HIVE-14554: - I thought this JIRA was only for pre-commit tests :) > Hive ptest should delete the itests/thirdparty directory everytime it builds > hive > - > > Key: HIVE-14554 > URL: https://issues.apache.org/jira/browse/HIVE-14554 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > > The {{itests/thirdparty}} directory is created by hive on spark when > downloading the spark-assembly file. Hive ptest should delete this directory > every time it runs a new set of tests to avoid conflicts when a new spark > tarball is submitted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14572) Investigate jenkins test report timings
[ https://issues.apache.org/jira/browse/HIVE-14572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427093#comment-15427093 ]
Siddharth Seth commented on HIVE-14572:
---
I see the following in ptest2. Not sure if this is what generates the report.
{code}
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-report-plugin</artifactId>
<version>2.15</version>
{code}
> Investigate jenkins test report timings
> ---
>
> Key: HIVE-14572
> URL: https://issues.apache.org/jira/browse/HIVE-14572
> Project: Hive
> Issue Type: Sub-task
> Components: Tests
> Reporter: Zoltan Haindrich
>
> [~sseth] has noticed some odd timings in the jenkins reports
> I've created a sample project, to emulate a clidriver run during qtest:
> the testclass:
> * 1 sec beforeclass
> * 3x 0.2s test
> created using junit4 parameterized.
> Double checkout; second project runs different tests...or at least they have
> different name.
> here are my preliminary findings:
> || thing || expected || 2.16 || 2.19.1 ||
> | total time | ~3.4s | 1.2s | 3.4s |
> | package time | ~3.4s | 0.61s | 1.7s |
> | class time | ~3.4s | 0.61s | 1.7s |
> | testcase times | ~.2s | ~.2s | ~.2s |
> notes:
> * using 2.16 beforeclass timings are totally hidden or lost
> * 2.19.1 does account for beforeclass but still fails to correctly aggregate
> the two runs of the similarly named testclasses
> it might be worth a try to look at the bleeding edge of this jenkins plugin...
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14561) Minor ptest2 improvements
[ https://issues.apache.org/jira/browse/HIVE-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Seth reassigned HIVE-14561:
------------------------------------
    Assignee: Siddharth Seth
> Minor ptest2 improvements
> -------------------------
>
>                 Key: HIVE-14561
>                 URL: https://issues.apache.org/jira/browse/HIVE-14561
>             Project: Hive
>          Issue Type: Task
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HIVE-14561.01.patch
>
>
> Re-purposed to track a few more improvements.
> - Update Spring framework to work with Java 8
> - Change elapseTime logging to milliseconds from seconds
> - Add thread name to log files
> - Allow an empty logsEndPoint if outputDir is not specified
> - Log configuration when starting in a web server
> - Allow tests to be run even if no qtests property is set
> - Fix an exception on test completion when using FixedExecutionContextProvider
[jira] [Updated] (HIVE-14561) Minor ptest2 improvements
[ https://issues.apache.org/jira/browse/HIVE-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Seth updated HIVE-14561:
---------------------------------
    Attachment: HIVE-14561.01.patch
Patch to address the changes. Most of the changes are tested. I am still verifying some of the last-minute changes, such as thread names and times.
[~vgumashta], [~spena] - could you please take a look?
There's no point running the precommit since it does not test anything here. Unit tests pass locally.
I'm going to open follow-up JIRAs to allow the pre-setup and batch-exec files to be configurable.
> Minor ptest2 improvements
> -------------------------
>
>                 Key: HIVE-14561
>                 URL: https://issues.apache.org/jira/browse/HIVE-14561
>             Project: Hive
>          Issue Type: Task
>            Reporter: Siddharth Seth
>         Attachments: HIVE-14561.01.patch
>
>
> Re-purposed to track a few more improvements.
> - Update Spring framework to work with Java 8
> - Change elapseTime logging to milliseconds from seconds
> - Add thread name to log files
> - Allow an empty logsEndPoint if outputDir is not specified
> - Log configuration when starting in a web server
> - Allow tests to be run even if no qtests property is set
> - Fix an exception on test completion when using FixedExecutionContextProvider