[jira] [Commented] (HIVE-15399) Parser change for UniqueJoin
[ https://issues.apache.org/jira/browse/HIVE-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744483#comment-15744483 ]

Hive QA commented on HIVE-15399:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842938/HIVE-15399.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10797 tests executed

*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=110)
	[tez_joins_explain.q,transform2.q,groupby5.q,cbo_semijoin.q,bucketmapjoin13.q,union_remove_6_subq.q,groupby2_map_multi_distinct.q,load_dyn_part9.q,multi_insert_gby2.q,vectorization_11.q,groupby_position.q,avro_compression_enabled_native.q,smb_mapjoin_8.q,join21.q,auto_join16.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=92)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[uniquejoin] (batchId=85)
org.apache.hadoop.hive.ql.parse.TestIUD.testSelectStarFromAnonymousVirtTable1Row (batchId=257)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2555/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2555/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2555/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842938 - PreCommit-HIVE-Build

> Parser change for UniqueJoin
> ----------------------------
>
> Key: HIVE-15399
> URL: https://issues.apache.org/jira/browse/HIVE-15399
> Project: Hive
> Issue Type: Bug
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-15399.01.patch
>
> UniqueJoin was introduced in HIVE-591 ("Add Unique Join", Emil Ibrishimov via namit). It sounds like there is only one q test for unique join, i.e., uniquejoin.q. In that q test, the unique join source only comes from a table. However, in the parser, its source can come not only from tableSource but also from
> {code}
> partitionedTableFunction | tableSource | subQuerySource | virtualTableSource
> {code}
> I think it would be better to change the parser and restrict the rule to match users' real usage.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
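The grammar restriction proposed in HIVE-15399 can be illustrated with a small, hypothetical Java model (Hive's actual grammar is written in ANTLR, so this is only an analogy): of the four source kinds the current rule accepts, only a plain table source would remain legal for UNIQUEJOIN.

```java
import java.util.EnumSet;
import java.util.Set;

public class UniqueJoinSourceCheck {
    // The four source kinds the current grammar rule accepts (hypothetical model).
    enum SourceKind { PARTITIONED_TABLE_FUNCTION, TABLE_SOURCE, SUBQUERY_SOURCE, VIRTUAL_TABLE_SOURCE }

    // After the proposed parser change, only TABLE_SOURCE would be legal.
    static final Set<SourceKind> ALLOWED = EnumSet.of(SourceKind.TABLE_SOURCE);

    static boolean isLegalUniqueJoinSource(SourceKind kind) {
        return ALLOWED.contains(kind);
    }

    public static void main(String[] args) {
        System.out.println(isLegalUniqueJoinSource(SourceKind.TABLE_SOURCE));    // true
        System.out.println(isLegalUniqueJoinSource(SourceKind.SUBQUERY_SOURCE)); // false
    }
}
```

The failing TestNegativeCliDriver.testCliDriver[uniquejoin] run above is consistent with such a change: a source that used to parse now needs to be rejected with a negative test.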
[jira] [Commented] (HIVE-15074) Schematool provides a way to detect invalid entries in VERSION table
[ https://issues.apache.org/jira/browse/HIVE-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744480#comment-15744480 ]

Lefty Leverenz commented on HIVE-15074:
---------------------------------------

Okay, I added a TODOC2.2 label. Thanks.

> Schematool provides a way to detect invalid entries in VERSION table
> --------------------------------------------------------------------
>
> Key: HIVE-15074
> URL: https://issues.apache.org/jira/browse/HIVE-15074
> Project: Hive
> Issue Type: Sub-task
> Components: Metastore
> Reporter: Yongzhi Chen
> Assignee: Chaoyu Tang
> Priority: Minor
> Labels: TODOC2.2
> Fix For: 2.2.0
> Attachments: HIVE-15074.1.patch, HIVE-15074.patch
>
> For some unknown reason, we have seen that a customer's HMS cannot start because there are multiple entries in its VERSION table. Schematool should provide a way to validate the HMS database and provide warning and fix options for this kind of issue.
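The kind of validation HIVE-15074 describes can be sketched as follows. This is an illustrative model only, with the VERSION rows stubbed as a list of strings; it is not the actual schematool code: the metastore's VERSION table must contain exactly one row.

```java
import java.util.List;

public class VersionTableCheck {
    // Hypothetical model: each string is the schema-version value of one VERSION row.
    static String validate(List<String> versionRows) {
        if (versionRows.isEmpty()) {
            return "ERROR: VERSION table is empty";
        }
        if (versionRows.size() > 1) {
            // The multiple-entries case observed at the customer site.
            return "ERROR: multiple entries in VERSION table: " + versionRows;
        }
        return "OK: schema version " + versionRows.get(0);
    }

    public static void main(String[] args) {
        System.out.println(validate(List.of("2.2.0")));
        System.out.println(validate(List.of("2.1.0", "2.2.0")));
    }
}
```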
[jira] [Updated] (HIVE-15074) Schematool provides a way to detect invalid entries in VERSION table
[ https://issues.apache.org/jira/browse/HIVE-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-15074:
----------------------------------

Labels: TODOC2.2 (was: )

> Schematool provides a way to detect invalid entries in VERSION table
> --------------------------------------------------------------------
>
> Key: HIVE-15074
> URL: https://issues.apache.org/jira/browse/HIVE-15074
> Project: Hive
> Issue Type: Sub-task
> Components: Metastore
> Reporter: Yongzhi Chen
> Assignee: Chaoyu Tang
> Priority: Minor
> Labels: TODOC2.2
> Fix For: 2.2.0
> Attachments: HIVE-15074.1.patch, HIVE-15074.patch
>
> For some unknown reason, we have seen that a customer's HMS cannot start because there are multiple entries in its VERSION table. Schematool should provide a way to validate the HMS database and provide warning and fix options for this kind of issue.
[jira] [Commented] (HIVE-15423) Allowing Hive to reverse map IP from hostname for partition info
[ https://issues.apache.org/jira/browse/HIVE-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744434#comment-15744434 ]

ASF GitHub Bot commented on HIVE-15423:
---------------------------------------

GitHub user subahugu opened a pull request:

    https://github.com/apache/hive/pull/122

    HIVE-15423: Allowing Hive to reverse map IP from hostname for partition info

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/subahugu/hive branch-1.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/122.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #122

commit 754783def2ac81b54f1a24403a78b9b47ed6e091
Author: suresh.bahuguna
Date: 2016-12-13T07:23:55Z

    HIVE-15423: Allowing Hive to reverse map IP from hostname for partition info

> Allowing Hive to reverse map IP from hostname for partition info
> ----------------------------------------------------------------
>
> Key: HIVE-15423
> URL: https://issues.apache.org/jira/browse/HIVE-15423
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.1
> Reporter: Suresh Bahuguna
>
> Hive and the NameNode disagree on hostnames when running queries with 2 MR jobs: Hive tries to find partition info using an hdfs://<hostname>:<port> key, whereas the info was hashed under hdfs://<IP>:<port>.
> Exception raised in HiveFileFormatUtils.java:
> ---------------------------------------------
> java.io.IOException: cannot find dir = hdfs://hd-nn-24:9000/tmp/hive-admin/hive_2013-08-30_06-11-52_007_1545561832334194535/-mr-10002/00_0 in pathToPartitionInfo: [hdfs://192.168.156.24:9000/tmp/hive-admin/hive_2013-08-30_06-11-52_007_1545561832334194535/-mr-10002]
> at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java
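The mismatch in the exception above (hd-nn-24 vs. 192.168.156.24) can be illustrated with a small sketch that rewrites a URI's host to its resolved IP address. This is a hypothetical normalization approach for illustration, not the code in the attached pull request:

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

public class AuthorityNormalizer {
    // Rewrite the host part of a URI to its resolved IP so that
    // "hdfs://<hostname>:<port>/path" and "hdfs://<IP>:<port>/path" compare equal.
    static String normalize(String uriStr) {
        URI uri = URI.create(uriStr);
        try {
            String ip = InetAddress.getByName(uri.getHost()).getHostAddress();
            return uri.getScheme() + "://" + ip + ":" + uri.getPort() + uri.getPath();
        } catch (UnknownHostException e) {
            return uriStr; // fall back to the original form if resolution fails
        }
    }

    public static void main(String[] args) {
        // "localhost" stands in for a namenode hostname here.
        System.out.println(normalize("hdfs://localhost:9000/tmp/hive-admin/-mr-10002"));
    }
}
```

With both the lookup key and the pathToPartitionInfo entries passed through such a normalization, the two spellings of the same namenode would hash to the same key.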
[jira] [Updated] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset
[ https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-15422:
------------------------------------

Summary: HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset (was: HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number of objects for partitioned dataset)

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-15422.1.patch, Profiler_Snapshot_HIVE-15422.png
>
> When executing the following query in LLAP (single instance) in a 5-node cluster, heavy GC pressure was observed.
> {noformat}
> select a.type, a.city, a.frequency, b.city, b.country, b.lat, b.lon
> from (select 'depart' as type, origin as city, count(origin) as frequency
>       from flights
>       group by origin
>       order by frequency desc, type) as a
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has around 7,000 partitions in S3. Profiling revealed a large number of objects created just in path comparisons in HiveInputFormat. HIVE-15405 reduces the number of path comparisons in FileUtils, but Hive still ends up doing many comparisons in HiveInputFormat::pushProjectionsAndFilters.
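The churn pattern behind HIVE-15422 can be sketched with a simplified model, using java.net.URI in place of Hadoop's Path; this is not the actual HiveInputFormat code. Re-stringifying URIs on every (split, alias) comparison allocates per comparison, while normalizing each alias path once and using set membership does not:

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PathComparisonSketch {
    // Naive: re-derives a normalized string for every (split, alias) pair,
    // allocating fresh objects on each comparison. With ~7,000 partitions,
    // this is where the garbage comes from in the simplified model.
    static boolean naiveContains(List<URI> aliasPaths, URI split) {
        for (URI alias : aliasPaths) {
            if (alias.normalize().toString().equals(split.normalize().toString())) {
                return true;
            }
        }
        return false;
    }

    // Cheaper: normalize each alias path once up front and use set membership,
    // so each lookup is one hash probe with no per-pair allocation.
    static Set<String> precompute(List<URI> aliasPaths) {
        Set<String> normalized = new HashSet<>();
        for (URI alias : aliasPaths) {
            normalized.add(alias.normalize().toString());
        }
        return normalized;
    }

    public static void main(String[] args) {
        List<URI> aliases = List.of(URI.create("s3a://bucket/flights/part=1/"),
                                    URI.create("s3a://bucket/flights/part=2/"));
        URI split = URI.create("s3a://bucket/flights/part=2/");
        Set<String> cache = precompute(aliases);
        System.out.println(naiveContains(aliases, split));                 // true
        System.out.println(cache.contains(split.normalize().toString()));  // true
    }
}
```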
[jira] [Updated] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number of objects for partitioned dataset
[ https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-15422:
------------------------------------

Status: Patch Available (was: Open)

> HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number of objects for partitioned dataset
> ------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-15422.1.patch, Profiler_Snapshot_HIVE-15422.png
>
> When executing the following query in LLAP (single instance) in a 5-node cluster, heavy GC pressure was observed.
> {noformat}
> select a.type, a.city, a.frequency, b.city, b.country, b.lat, b.lon
> from (select 'depart' as type, origin as city, count(origin) as frequency
>       from flights
>       group by origin
>       order by frequency desc, type) as a
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has around 7,000 partitions in S3. Profiling revealed a large number of objects created just in path comparisons in HiveInputFormat. HIVE-15405 reduces the number of path comparisons in FileUtils, but Hive still ends up doing many comparisons in HiveInputFormat::pushProjectionsAndFilters.
[jira] [Updated] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number of objects for partitioned dataset
[ https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-15422:
------------------------------------

Attachment: Profiler_Snapshot_HIVE-15422.png

> HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number of objects for partitioned dataset
> ------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-15422.1.patch, Profiler_Snapshot_HIVE-15422.png
>
> When executing the following query in LLAP (single instance) in a 5-node cluster, heavy GC pressure was observed.
> {noformat}
> select a.type, a.city, a.frequency, b.city, b.country, b.lat, b.lon
> from (select 'depart' as type, origin as city, count(origin) as frequency
>       from flights
>       group by origin
>       order by frequency desc, type) as a
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has around 7,000 partitions in S3. Profiling revealed a large number of objects created just in path comparisons in HiveInputFormat. HIVE-15405 reduces the number of path comparisons in FileUtils, but Hive still ends up doing many comparisons in HiveInputFormat::pushProjectionsAndFilters.
[jira] [Updated] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number of objects for partitioned dataset
[ https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-15422:
------------------------------------

Attachment: HIVE-15422.1.patch

> HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number of objects for partitioned dataset
> ------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-15422.1.patch, Profiler_Snapshot_HIVE-15422.png
>
> When executing the following query in LLAP (single instance) in a 5-node cluster, heavy GC pressure was observed.
> {noformat}
> select a.type, a.city, a.frequency, b.city, b.country, b.lat, b.lon
> from (select 'depart' as type, origin as city, count(origin) as frequency
>       from flights
>       group by origin
>       order by frequency desc, type) as a
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has around 7,000 partitions in S3. Profiling revealed a large number of objects created just in path comparisons in HiveInputFormat. HIVE-15405 reduces the number of path comparisons in FileUtils, but Hive still ends up doing many comparisons in HiveInputFormat::pushProjectionsAndFilters.
[jira] [Updated] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-15386:
--------------------------

Affects Version/s: (was: 2.2.0)
                   2.1.1

> Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor
> ------------------------------------------------------------------------------------
>
> Key: HIVE-15386
> URL: https://issues.apache.org/jira/browse/HIVE-15386
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Affects Versions: 2.1.1
> Reporter: zhihai xu
> Assignee: zhihai xu
> Fix For: 2.2.0
> Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, HIVE-15386.002.patch
>
> Expose Spark task counts and stage IDs in SparkTask from SparkJobMonitor, so that this information can be used by a Hive hook to monitor Spark jobs.
[jira] [Updated] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-15386:
--------------------------

Resolution: Fixed
Fix Version/s: 2.2.0
Status: Resolved (was: Patch Available)

Committed to master. Thanks Zhihai for the contribution and Xuefu for the review :)

> Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor
> ------------------------------------------------------------------------------------
>
> Key: HIVE-15386
> URL: https://issues.apache.org/jira/browse/HIVE-15386
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Affects Versions: 2.2.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Fix For: 2.2.0
> Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, HIVE-15386.002.patch
>
> Expose Spark task counts and stage IDs in SparkTask from SparkJobMonitor, so that this information can be used by a Hive hook to monitor Spark jobs.
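The shape of the HIVE-15386 change can be sketched as a hypothetical accessor pattern; the field and method names below are invented for illustration and are not those of the actual patch. The job monitor records counts and stage IDs on the task object while the job runs, and a Hive hook reads them afterwards:

```java
import java.util.ArrayList;
import java.util.List;

public class SparkTaskSketch {
    // Hypothetical fields mirroring what a job monitor could populate.
    private int totalTaskCount;
    private int failedTaskCount;
    private final List<Integer> stageIds = new ArrayList<>();

    // Called by the monitor as it polls job progress.
    void update(int total, int failed, List<Integer> stages) {
        this.totalTaskCount = total;
        this.failedTaskCount = failed;
        stageIds.clear();
        stageIds.addAll(stages);
    }

    // Read by a Hive hook after the task finishes.
    int getTotalTaskCount() { return totalTaskCount; }
    int getFailedTaskCount() { return failedTaskCount; }
    List<Integer> getStageIds() { return List.copyOf(stageIds); }

    public static void main(String[] args) {
        SparkTaskSketch task = new SparkTaskSketch();
        task.update(42, 0, List.of(0, 1));
        System.out.println(task.getTotalTaskCount() + " tasks, stages " + task.getStageIds());
    }
}
```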
[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744366#comment-15744366 ]

Rui Li commented on HIVE-13278:
-------------------------------

Hi [~xuefuz], the conclusion is that we somehow try to read reduce.xml for map-only jobs, and yes, it happens with MR as well. The call path is {{HiveOutputFormatImpl.checkOutputSpecs -> Utilities.getMapRedWork}}. HiveOutputFormatImpl needs the MapRedWork because it performs some checks on all the FS operators. Since an FS only exists at the end of a job, my suggestion is to first try to get the MapWork. If the MapWork has an FS in it, this is a map-only job and we don't have to look for the ReduceWork. However, [~stakiar] found that some map-only jobs may not have an FS in the MapWork, e.g. {{ANALYZE TABLE}}. For a complete fix, we'd need a flag in the JobConf indicating whether the job is map-only. Alternatively, we can use my solution, which resolves the issue for most cases.

Some special handling may be needed for HoS. In HoS, each map.xml and reduce.xml resides in a different path. We can use {{mapred.task.is.map}} to determine whether the JobConf is for a MapWork or a ReduceWork, and then call getMapWork or getReduceWork respectively.

> Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark
> ------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
> Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on:
> Apache Hive 2.0.0
> Apache Spark 1.6.0
> Reporter: Xin Hao
> Assignee: Sahil Takiar
> Priority: Minor
>
> Many redundant 'File not found' messages appeared in the container log during query execution with Hive on Spark. Certainly, it doesn't prevent the query from running successfully, so it is marked as Minor currently.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}
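The decision logic Rui Li proposes for HoS can be sketched as follows, with the JobConf stubbed as a plain map; this is an illustrative model, not the actual Hive code. A map task consults {{mapred.task.is.map}} and never probes for reduce.xml, which is the source of the 'File not found' noise:

```java
import java.util.Map;

public class PlanFileChooser {
    // Decide which plan file to read for the current task. In Hive on Spark,
    // map.xml and reduce.xml live under different paths, so a map task should
    // load only map.xml and never look for reduce.xml.
    static String planFileFor(Map<String, String> jobConf) {
        // mapred.task.is.map is a standard MapReduce property; default to the
        // map side when it is absent (assumption made for this sketch).
        boolean isMap = Boolean.parseBoolean(jobConf.getOrDefault("mapred.task.is.map", "true"));
        return isMap ? "map.xml" : "reduce.xml";
    }

    public static void main(String[] args) {
        System.out.println(planFileFor(Map.of("mapred.task.is.map", "true")));  // map.xml
        System.out.println(planFileFor(Map.of("mapred.task.is.map", "false"))); // reduce.xml
    }
}
```

In the real code the chosen name would be passed to getMapWork or getReduceWork respectively; the point of the sketch is only that the branch happens before any filesystem call, so the namenode is never asked for a reduce.xml that cannot exist.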
[jira] [Updated] (HIVE-15399) Parser change for UniqueJoin
[ https://issues.apache.org/jira/browse/HIVE-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-15399:
-----------------------------------

Status: Patch Available (was: Open)

[~ashutoshc], could you take a look? Thanks.

> Parser change for UniqueJoin
> ----------------------------
>
> Key: HIVE-15399
> URL: https://issues.apache.org/jira/browse/HIVE-15399
> Project: Hive
> Issue Type: Bug
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-15399.01.patch
>
> UniqueJoin was introduced in HIVE-591 ("Add Unique Join", Emil Ibrishimov via namit). It sounds like there is only one q test for unique join, i.e., uniquejoin.q. In that q test, the unique join source only comes from a table. However, in the parser, its source can come not only from tableSource but also from
> {code}
> partitionedTableFunction | tableSource | subQuerySource | virtualTableSource
> {code}
> I think it would be better to change the parser and restrict the rule to match users' real usage.
[jira] [Updated] (HIVE-15399) Parser change for UniqueJoin
[ https://issues.apache.org/jira/browse/HIVE-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-15399:
-----------------------------------

Attachment: HIVE-15399.01.patch

> Parser change for UniqueJoin
> ----------------------------
>
> Key: HIVE-15399
> URL: https://issues.apache.org/jira/browse/HIVE-15399
> Project: Hive
> Issue Type: Bug
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-15399.01.patch
>
> UniqueJoin was introduced in HIVE-591 ("Add Unique Join", Emil Ibrishimov via namit). It sounds like there is only one q test for unique join, i.e., uniquejoin.q. In that q test, the unique join source only comes from a table. However, in the parser, its source can come not only from tableSource but also from
> {code}
> partitionedTableFunction | tableSource | subQuerySource | virtualTableSource
> {code}
> I think it would be better to change the parser and restrict the rule to match users' real usage.
[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by
[ https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-13452:
-----------------------------------

Resolution: Fixed
Fix Version/s: 2.2.0
Status: Resolved (was: Patch Available)

Pushed to master. Thanks [~ashutoshc] for the review!

> StatsOptimizer should return no rows on empty table with group by
> -----------------------------------------------------------------
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer
> Affects Versions: 1.0.0, 1.2.0, 1.1.0, 2.0.0, 2.1.0
> Reporter: Ashutosh Chauhan
> Assignee: Pengcheng Xiong
> Fix For: 2.2.0
> Attachments: HIVE-13452.01.patch
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with StatsOptimizer on, Hive returns one row with value 0.
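Why both queries in HIVE-13452 should return no rows can be shown with a small model (illustrative Java, not Hive internals): a global count over empty input yields one row with value 0, while a grouped count yields one row per group, and an empty input has no groups. Answering the grouped query from table stats as if it were a global count is exactly the bug.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EmptyGroupBySketch {
    // select count(1) from t  -> always exactly one row.
    static long globalCount(List<Integer> rows) {
        return rows.size();
    }

    // select count(1) from t group by k -> one row per distinct key;
    // an empty table produces no groups, hence no rows.
    static Map<Integer, Long> groupedCount(List<Integer> keys) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (Integer k : keys) {
            counts.merge(k, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(globalCount(List.of()));   // 0  (one row, value 0)
        System.out.println(groupedCount(List.of()));  // {} (no rows)
    }
}
```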
[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by
[ https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-13452:
-----------------------------------

Affects Version/s: 1.0.0
                   1.2.0
                   1.1.0
                   2.0.0
                   2.1.0

> StatsOptimizer should return no rows on empty table with group by
> -----------------------------------------------------------------
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer
> Affects Versions: 1.0.0, 1.2.0, 1.1.0, 2.0.0, 2.1.0
> Reporter: Ashutosh Chauhan
> Assignee: Pengcheng Xiong
> Fix For: 2.2.0
> Attachments: HIVE-13452.01.patch
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with StatsOptimizer on, Hive returns one row with value 0.
[jira] [Commented] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by
[ https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744204#comment-15744204 ]

Hive QA commented on HIVE-13452:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842907/HIVE-13452.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10782 tests executed

*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=144)
	[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2554/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2554/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2554/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842907 - PreCommit-HIVE-Build

> StatsOptimizer should return no rows on empty table with group by
> -----------------------------------------------------------------
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer
> Reporter: Ashutosh Chauhan
> Assignee: Pengcheng Xiong
> Attachments: HIVE-13452.01.patch
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with StatsOptimizer on, Hive returns one row with value 0.
[jira] [Comment Edited] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744091#comment-15744091 ]

Xuefu Zhang edited comment on HIVE-13278 at 12/13/16 4:15 AM:
--------------------------------------------------------------

[~lirui], I'm not sure I understand the conclusion here, so a summary would be great. The annoying log is one thing, but another, more important problem is that trying to read reduce.xml for map-only tasks puts load on the namenode. I'm wondering if it's possible to avoid making that call. Can you share your thoughts? Thanks.

BTW, this seems to happen with Hive on MR as well.

was (Author: xuefuz):
[~lirui], I'm not sure I understand the conclusion here, so a summary would be great. The annoying log is one thing, but another, more important problem is that trying to read reduce.xml for map-only tasks puts load on the namenode. I'm wondering if it's possible to avoid making that call. Can you share your thoughts? Thanks.

> Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark
> ------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
> Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on:
> Apache Hive 2.0.0
> Apache Spark 1.6.0
> Reporter: Xin Hao
> Assignee: Sahil Takiar
> Priority: Minor
>
> Many redundant 'File not found' messages appeared in the container log during query execution with Hive on Spark. Certainly, it doesn't prevent the query from running successfully, so it is marked as Minor currently.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}
[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744091#comment-15744091 ]

Xuefu Zhang commented on HIVE-13278:
------------------------------------

[~lirui], I'm not sure I understand the conclusion here, so a summary would be great. The annoying log is one thing, but another, more important problem is that trying to read reduce.xml for map-only tasks puts load on the namenode. I'm wondering if it's possible to avoid making that call. Can you share your thoughts? Thanks.

> Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark
> ------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
> Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on:
> Apache Hive 2.0.0
> Apache Spark 1.6.0
> Reporter: Xin Hao
> Assignee: Sahil Takiar
> Priority: Minor
>
> Many redundant 'File not found' messages appeared in the container log during query execution with Hive on Spark. Certainly, it doesn't prevent the query from running successfully, so it is marked as Minor currently.
> Error message example: > {noformat} > 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: > /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at 
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
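On Xuefu's question above about avoiding the reduce.xml read for map-only work: one option is a guard before the HDFS call, rather than catching the resulting FileNotFoundException. This is only a hedged sketch with a hypothetical helper, not the actual Hive code:

```java
public class ReducePlanGuard {
    // Hypothetical guard: a map-only task has no reduce.xml, so probing HDFS
    // for it can only end in a FileNotFoundException plus a wasted namenode RPC.
    static boolean shouldFetchReducePlan(boolean taskHasReducer) {
        return taskHasReducer;
    }

    public static void main(String[] args) {
        // Map-only task: skip the lookup entirely instead of catching the error.
        System.out.println(shouldFetchReducePlan(false));
        // Task with a reducer: the plan file is expected to exist.
        System.out.println(shouldFetchReducePlan(true));
    }
}
```

Whether the task type is reliably known at that point in the real code path is exactly the open question in the thread.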
[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release
[ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744088#comment-15744088 ] Hive QA commented on HIVE-14007: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12842906/HIVE-14007.patch {color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 45 failed/errored test(s), 9958 tests executed *Failed tests:* {noformat} TestBitFieldReader - did not produce a TEST-*.xml file (likely timed out) (batchId=237) TestBitPack - did not produce a TEST-*.xml file (likely timed out) (batchId=237) TestColumnStatistics - did not produce a TEST-*.xml file (likely timed out) (batchId=235) TestColumnStatisticsImpl - did not produce a TEST-*.xml file (likely timed out) (batchId=236) TestDataReaderProperties - did not produce a TEST-*.xml file (likely timed out) (batchId=236) TestDynamicArray - did not produce a TEST-*.xml file (likely timed out) (batchId=236) TestFileDump - did not produce a TEST-*.xml file (likely timed out) (batchId=235) TestInStream - did not produce a TEST-*.xml file (likely timed out) (batchId=236) TestIntegerCompressionReader - did not produce a TEST-*.xml file (likely timed out) (batchId=236) TestJsonFileDump - did not produce a TEST-*.xml file (likely timed out) (batchId=235) TestMemoryManager - did not produce a TEST-*.xml file (likely timed out) (batchId=237) TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=133) [mapreduce2.q,orc_llap_counters1.q,bucket6.q,insert_into1.q,empty_dir_in_table.q,orc_merge1.q,script_env_var1.q,orc_merge_diff_fs.q,llapdecider.q,load_hdfs_file_with_space_in_the_name.q,llap_nullscan.q,orc_ppd_basic.q,transform_ppr1.q,rcfile_merge4.q,orc_merge3.q] TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=137) 
[orc_merge2.q,insert_into2.q,reduce_deduplicate.q,orc_llap_counters.q,cte_4.q,schemeAuthority2.q,file_with_header_footer.q,rcfile_merge3.q] TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=144) [vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q] TestNewIntegerEncoding - did not produce a TEST-*.xml file (likely timed out) (batchId=238) TestOrcNullOptimization - did not produce a TEST-*.xml file (likely timed out) (batchId=235) TestOrcTimezone1 - did not produce a TEST-*.xml file (likely timed out) (batchId=235) TestOrcTimezone2 - did not produce a TEST-*.xml file (likely timed out) (batchId=235) TestOrcTimezone3 - did not produce a TEST-*.xml file (likely timed out) (batchId=235) TestOrcWideTable - did not produce a TEST-*.xml file (likely timed out) (batchId=236) TestOutStream - did not produce a TEST-*.xml file (likely timed out) (batchId=237) TestRLEv2 - did not produce a TEST-*.xml file (likely timed out) (batchId=236) TestReaderImpl - did not produce a TEST-*.xml file (likely timed out) (batchId=237) TestRecordReaderImpl - did not produce a TEST-*.xml file (likely timed out) (batchId=237) TestRunLengthByteReader - did not produce a TEST-*.xml file (likely timed out) (batchId=236) TestRunLengthIntegerReader - did not produce a TEST-*.xml file (likely timed out) (batchId=237) TestSchemaEvolution - did not produce a TEST-*.xml file (likely timed out) (batchId=237) 
TestSerializationUtils - did not produce a TEST-*.xml file (likely timed out) (batchId=237) TestStreamName - did not produce a TEST-*.xml file (likely timed out) (batchId=236) TestStringDictionary - did not produce a TEST-*.xml file (likely timed out) (batchId=235) TestStringRedBlackTree - did not produce a TEST-*.xml file (likely timed out) (batchId=236) TestTypeDescription - did not produce a TEST-*.xml file (likely timed out) (batchId=238) TestUnrolledBitPack - did not produce a TEST-*.xml file (likely timed out) (batchId=235) TestVectorOrcFile - did not produce a TEST-*.xml file (likely timed out) (batchId=235) TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251) TestZlib - did not produce a TEST-*.xml file (likely timed out) (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15) org.apache.hadoop.
[jira] [Commented] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744066#comment-15744066 ] Xuefu Zhang commented on HIVE-15386: Patch looks good to me as well. +1 > Expose Spark task counts and stage Ids information in SparkTask from > SparkJobMonitor > > > Key: HIVE-15386 > URL: https://issues.apache.org/jira/browse/HIVE-15386 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.2.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, > HIVE-15386.002.patch > > > Expose Spark task counts and stage Ids information in SparkTask from > SparkJobMonitor. So this information can be used by a Hive hook to monitor > Spark jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15335) Fast Decimal
[ https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743990#comment-15743990 ] Sergey Shelukhin commented on HIVE-15335: - Did a pass of everything except FastHiveDecimalImpl.java itself. Will look at that later this week > Fast Decimal > > > Key: HIVE-15335 > URL: https://issues.apache.org/jira/browse/HIVE-15335 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, > HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, > HIVE-15335.06.patch, HIVE-15335.07.patch > > > Replace HiveDecimal implementation that currently represents the decimal > internally as a BigDecimal with a faster version that does not allocate extra > objects > Replace HiveDecimalWritable implementation with a faster version that has new > mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and > stores the result as a fast decimal instead of a slow byte array containing a > serialized BigInteger. > Provide faster ways to serialize/deserialize decimals. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
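The allocation argument in the Fast Decimal description can be illustrated with a toy comparison (illustrative only, not FastHiveDecimalImpl): an immutable BigDecimal allocates a fresh object per operation, while a mutable/primitive representation of values that fit in a scaled long does not:

```java
import java.math.BigDecimal;

public class DecimalAllocDemo {
    public static void main(String[] args) {
        // Immutable path: each add() returns a freshly allocated BigDecimal.
        BigDecimal slow = BigDecimal.ZERO;
        for (int i = 0; i < 1000; i++) {
            slow = slow.add(BigDecimal.ONE); // new object every iteration
        }
        // "Mutable" fast path for values that fit in a long: no per-operation
        // allocation, same arithmetic result.
        long fastUnscaled = 0;
        for (int i = 0; i < 1000; i++) {
            fastUnscaled += 1;
        }
        System.out.println(slow.longValueExact() == fastUnscaled);
    }
}
```

The real implementation has to handle precision/scale enforcement and multi-word values, which this sketch ignores.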
[jira] [Commented] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource
[ https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743964#comment-15743964 ] Hive QA commented on HIVE-15421: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12842890/HIVE-15421.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10811 tests executed *Failed tests:* {noformat} TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2552/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2552/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2552/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12842890 - PreCommit-HIVE-Build > Assumption in exception handling can be wrong in DagUtils.localizeResource > -- > > Key: HIVE-15421 > URL: https://issues.apache.org/jira/browse/HIVE-15421 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-15421.1.patch > > > In localizeResource, once we get an IOException, we always assume it is due > to another thread writing the same file. But that is not always the case. > Even without interference from other threads, we may still get an > IOException (RemoteException) due to failure of copyFromLocalFile in a > specific environment, for example, in a kerberized HDFS encryption zone where > the TGT is expired. > We'd better fail early with a different message to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
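A minimal sketch of the "fail early with a different message" idea (hypothetical helper and names; the real DagUtils logic differs): distinguish the benign concurrent-write case from a hard copy failure such as an expired TGT, instead of treating every IOException the same:

```java
import java.io.IOException;

public class LocalizeSketch {
    // Hypothetical classifier: decide whether an IOException plausibly came
    // from a concurrent writer (destination already exists) or from a hard
    // failure such as an expired TGT in a kerberized encryption zone.
    static String classify(IOException e, boolean destinationExists) {
        if (destinationExists) {
            // Another thread may have localized the same file; safe to verify/retry.
            return "concurrent-write";
        }
        // Fail early with a distinct message to avoid confusion.
        return "copy-failed: " + e.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(classify(new IOException("token expired"), false));
        System.out.println(classify(new IOException("exists"), true));
    }
}
```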
[jira] [Updated] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way
[ https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-15147: Attachment: HIVE-15147.02.WIP.noout.patch Added the cleanup, and fixed the initial issues; when running the tests now I see cache data being used, both to read entire files, and to complement file data. > LLAP: use LLAP cache for non-columnar formats in a somewhat general way > --- > > Key: HIVE-15147 > URL: https://issues.apache.org/jira/browse/HIVE-15147 > Project: Hive > Issue Type: New Feature >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-15147.01.WIP.noout.patch, > HIVE-15147.02.WIP.noout.patch, HIVE-15147.WIP.noout.patch > > > The primary goal for the first pass is caching text files. Nothing would > prevent other formats from using the same path, in principle, although, as > was originally done with ORC, it may be better to have native caching support > optimized for each particular format. > Given that caching pure text is not smart, and we already have ORC-encoded > cache that is columnar due to ORC file structure, we will transform data into > columnar ORC. > The general idea is to treat all the data in the world as merely ORC that was > compressed with some poor compression codec, such as csv. Using the original > IF and serde, as well as an ORC writer (with some heavyweight optimizations > disabled, potentially), we can "uncompress" the csv/whatever data into its > "original" ORC representation, then cache it efficiently, by column, and also > reuse a lot of the existing code. > Various other points: > 1) Caching granularity will have to be somehow determined (i.e. how do we > slice the file horizontally, to avoid caching entire columns). As with ORC > uncompressed files, the specific offsets don't really matter as long as they > are consistent between reads. 
The problem is that the file offsets will > actually need to be propagated to the new reader from the original > inputformat. Row counts are easier to use but there's a problem of how to > actually map them to missing ranges to read from disk. > 2) Obviously, for row-based formats, if any one column that is to be read has > been evicted or is otherwise missing, "all the columns" have to be read for > the corresponding slice to cache and read that one column. The vague plan is > to handle this implicitly, similarly to how ORC reader handles CB-RG overlaps > - it will just so happen that a missing column in disk range list to retrieve > will expand the disk-range-to-read into the whole horizontal slice of the > file. > 3) Granularity/etc. won't work for gzipped text. If anything at all is > evicted, the entire file has to be re-read. Gzipped text is a ridiculous > feature, so this is by design. > 4) In the future, it would be possible to also build some form of > metadata/indexes for this cached data to do PPD, etc. This is out of the > scope for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15313) Add export spark.yarn.archive or spark.yarn.jars variable in Hive on Spark document
[ https://issues.apache.org/jira/browse/HIVE-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-15313: Description: According to [wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started], we ran queries in HOS16 and HOS20 in yarn mode. The following table shows the difference in query time between HOS16 and HOS20. ||Version||Total time||Time for Jobs||Time for preparing jobs|| |Spark16|51|39|12| |Spark20|54|40|14| HOS20 spends more time (2 secs) on preparing jobs than HOS16. After reviewing the Spark source code, we found that the following point causes this: code:[Client#distribute|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L546]. In spark20, if Spark cannot find spark.yarn.archive or spark.yarn.jars in the Spark configuration file, it will first copy all jars in $SPARK_HOME/jars to a tmp directory and upload that tmp directory to the distributed cache. Compare [spark16|https://github.com/apache/spark/blob/branch-1.6/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1145]: in spark16, it searches for spark-assembly*.jar and uploads it to the distributed cache. In spark20, it spends 2 more seconds copying all jars in $SPARK_HOME/jars to a tmp directory if we don't set "spark.yarn.archive" or "spark.yarn.jars". 
We can accelerate the startup of Hive on Spark 2.0 by setting "spark.yarn.archive" or "spark.yarn.jars": set "spark.yarn.archive": {code} cd $SPARK_HOME/jars zip spark-archive.zip ./*.jar # this is important: enter the jars folder, then zip $ hadoop fs -copyFromLocal spark-archive.zip $ echo "spark.yarn.archive=hdfs:///xxx:8020/spark-archive.zip" >> conf/spark-defaults.conf {code} set "spark.yarn.jars": {code} $ hadoop fs -mkdir spark-2.0.0-bin-hadoop $ hadoop fs -copyFromLocal $SPARK_HOME/jars/* spark-2.0.0-bin-hadoop $ echo "spark.yarn.jars=hdfs:///xxx:8020/spark-2.0.0-bin-hadoop/*" >> conf/spark-defaults.conf {code} I suggest adding this part to the wiki. performance.improvement.after.set.spark.yarn.archive.PNG shows the detailed performance improvement after setting spark.yarn.archive for small queries. was: According to [wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started], we ran queries in HOS16 and HOS20 in yarn mode. The following table shows the difference in query time between HOS16 and HOS20. ||Version||Total time||Time for Jobs||Time for preparing jobs|| |Spark16|51|39|12| |Spark20|54|40|14| HOS20 spends more time (2 secs) on preparing jobs than HOS16. After reviewing the Spark source code, we found that the following point causes this: code:[Client#distribute|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L546]. In spark20, if Spark cannot find spark.yarn.archive or spark.yarn.jars in the Spark configuration file, it will first copy all jars in $SPARK_HOME/jars to a tmp directory and upload that tmp directory to the distributed cache. Compare [spark16|https://github.com/apache/spark/blob/branch-1.6/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1145]: in spark16, it searches for spark-assembly*.jar and uploads it to the distributed cache. 
In spark20, it spends 2 more seconds copying all jars in $SPARK_HOME/jars to a tmp directory if we don't set "spark.yarn.archive" or "spark.yarn.jars". We can accelerate the startup of Hive on Spark 2.0 by setting "spark.yarn.archive" or "spark.yarn.jars": set "spark.yarn.archive": {code} zip spark-archive.zip $SPARK_HOME/jars/* $ hadoop fs -copyFromLocal spark-archive.zip $ echo "spark.yarn.archive=hdfs:///xxx:8020/spark-archive.zip" >> conf/spark-defaults.conf {code} set "spark.yarn.jars": {code} $ hadoop fs -mkdir spark-2.0.0-bin-hadoop $ hadoop fs -copyFromLocal $SPARK_HOME/jars/* spark-2.0.0-bin-hadoop $ echo "spark.yarn.jars=hdfs:///xxx:8020/spark-2.0.0-bin-hadoop/*" >> conf/spark-defaults.conf {code} I suggest adding this part to the wiki. performance.improvement.after.set.spark.yarn.archive.PNG shows the detailed performance improvement after setting spark.yarn.archive for small queries. > Add export spark.yarn.archive or spark.yarn.jars variable in Hive on Spark > document > --- > > Key: HIVE-15313 > URL: https://issues.apache.org/jira/browse/HIVE-15313 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Priority: Minor > Attachments: performance.improvement.after.set.spark.yarn.archive.PNG > > > According to > [wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started], > we ran queries in HOS16 and HOS20 in yarn mode. > The following table shows the difference in query time between HOS16 and HOS20. > ||Version||Total time||Time for Jobs||Time for prep
[jira] [Commented] (HIVE-15407) add distcp to classpath by default, because hive depends on it.
[ https://issues.apache.org/jira/browse/HIVE-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743900#comment-15743900 ] Fei Hui commented on HIVE-15407: Hi [~prasanth_j], could you please give suggestions and review it? > add distcp to classpath by default, because hive depends on it. > > > Key: HIVE-15407 > URL: https://issues.apache.org/jira/browse/HIVE-15407 > Project: Hive > Issue Type: Bug > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Fei Hui >Assignee: Fei Hui > Attachments: HIVE-15407.1.patch > > > When I run Hive queries, I get errors as follows: > java.lang.NoClassDefFoundError: org/apache/hadoop/tools/DistCpOptions > ... > I dug into the code and found that Hive depends on distcp, but distcp is not on > the classpath by default. > I considered adding distcp to the Hadoop classpath by default in the Hadoop project, > but the Hadoop committers will not do that (see the discussion in HADOOP-13865). They > proposed resolving this problem in Hive, > so I added distcp to the classpath in Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743862#comment-15743862 ] Rui Li commented on HIVE-15386: --- Thanks for the update [~zxu]. +1. [~xuefuz] do you have any further comments? > Expose Spark task counts and stage Ids information in SparkTask from > SparkJobMonitor > > > Key: HIVE-15386 > URL: https://issues.apache.org/jira/browse/HIVE-15386 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.2.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, > HIVE-15386.002.patch > > > Expose Spark task counts and stage Ids information in SparkTask from > SparkJobMonitor. So this information can be used by a Hive hook to monitor > Spark jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15118) Remove unused 'COLUMNS' table from derby schema
[ https://issues.apache.org/jira/browse/HIVE-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743852#comment-15743852 ] Hive QA commented on HIVE-15118: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12842887/HIVE-15118.3.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10811 tests executed *Failed tests:* {noformat} TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2551/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2551/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2551/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12842887 - PreCommit-HIVE-Build > Remove unused 'COLUMNS' table from derby schema > --- > > Key: HIVE-15118 > URL: https://issues.apache.org/jira/browse/HIVE-15118 > Project: Hive > Issue Type: Sub-task > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Minor > Attachments: HIVE-15118.1.patch, HIVE-15118.2.patch, > HIVE-15118.3.patch > > > COLUMNS table is unused any more. Other databases already removed it. Remove > from derby as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15383) Add additional info to 'desc function extended' output
[ https://issues.apache.org/jira/browse/HIVE-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743769#comment-15743769 ] Hive QA commented on HIVE-15383: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12842875/HIVE-15383.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10781 tests executed *Failed tests:* {noformat} TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=144) [vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q] TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_index] (batchId=33) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] (batchId=59) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_stddev_pop] (batchId=71) 
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2549/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2549/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2549/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12842875 - PreCommit-HIVE-Build > Add additional info to 'desc function extended' output > -- > > Key: HIVE-15383 > URL: https://issues.apache.org/jira/browse/HIVE-15383 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Trivial > Attachments: HIVE-15383.1.patch, HIVE-15383.2.patch > > > Add additional info to the output to 'desc function extended'. The resources > would be helpful for the user to check which jars are referred. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by
[ https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13452: --- Status: Patch Available (was: Open) > StatsOptimizer should return no rows on empty table with group by > - > > Key: HIVE-13452 > URL: https://issues.apache.org/jira/browse/HIVE-13452 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Ashutosh Chauhan >Assignee: Pengcheng Xiong > Attachments: HIVE-13452.01.patch > > > {code} > create table t1 (a int); > analyze table t1 compute statistics; > analyze table t1 compute statistics for columns; > select count(1) from t1 group by 1; > set hive.compute.query.using.stats=true; > select count(1) from t1 group by 1; > {code} > In both cases result set should be empty. However, with statsoptimizer on > Hive returns one row with value 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by
[ https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13452: --- Attachment: HIVE-13452.01.patch > StatsOptimizer should return no rows on empty table with group by > - > > Key: HIVE-13452 > URL: https://issues.apache.org/jira/browse/HIVE-13452 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Ashutosh Chauhan >Assignee: Pengcheng Xiong > Attachments: HIVE-13452.01.patch > > > {code} > create table t1 (a int); > analyze table t1 compute statistics; > analyze table t1 compute statistics for columns; > select count(1) from t1 group by 1; > set hive.compute.query.using.stats=true; > select count(1) from t1 group by 1; > {code} > In both cases result set should be empty. However, with statsoptimizer on > Hive returns one row with value 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by
[ https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13452: --- Status: Open (was: Patch Available) > StatsOptimizer should return no rows on empty table with group by > - > > Key: HIVE-13452 > URL: https://issues.apache.org/jira/browse/HIVE-13452 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Ashutosh Chauhan >Assignee: Pengcheng Xiong > Attachments: HIVE-13452.01.patch > > > {code} > create table t1 (a int); > analyze table t1 compute statistics; > analyze table t1 compute statistics for columns; > select count(1) from t1 group by 1; > set hive.compute.query.using.stats=true; > select count(1) from t1 group by 1; > {code} > In both cases result set should be empty. However, with statsoptimizer on > Hive returns one row with value 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by
[ https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13452: --- Attachment: (was: HIVE-13452.01.patch) > StatsOptimizer should return no rows on empty table with group by > - > > Key: HIVE-13452 > URL: https://issues.apache.org/jira/browse/HIVE-13452 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Ashutosh Chauhan >Assignee: Pengcheng Xiong > > {code} > create table t1 (a int); > analyze table t1 compute statistics; > analyze table t1 compute statistics for columns; > select count(1) from t1 group by 1; > set hive.compute.query.using.stats=true; > select count(1) from t1 group by 1; > {code} > In both cases result set should be empty. However, with statsoptimizer on > Hive returns one row with value 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
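The expected semantics in the repro above can be checked outside Hive with a small Java analogue: a global aggregate over empty input yields one row, while a grouped aggregate yields no rows at all; that is why the grouped query should return an empty result set rather than a single 0.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EmptyGroupBySemantics {
    public static void main(String[] args) {
        List<Integer> empty = Collections.emptyList();
        // Global aggregate: always produces exactly one value, even on empty input.
        long global = empty.stream().count();
        // Grouped aggregate: one entry per group; an empty input has no groups,
        // matching SQL's GROUP BY semantics of an empty result set.
        Map<Integer, Long> grouped = empty.stream()
                .collect(Collectors.groupingBy(x -> 1, Collectors.counting()));
        System.out.println(global);
        System.out.println(grouped.size());
    }
}
```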
[jira] [Updated] (HIVE-14007) Replace ORC module with ORC release
[ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-14007: - Attachment: HIVE-14007.patch > Replace ORC module with ORC release > --- > > Key: HIVE-14007 > URL: https://issues.apache.org/jira/browse/HIVE-14007 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, > HIVE-14007.patch, HIVE-14007.patch > > > This completes moving the core ORC reader & writer to the ORC project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13680) HiveServer2: Provide a way to compress ResultSets
[ https://issues.apache.org/jira/browse/HIVE-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743598#comment-15743598 ] Hive QA commented on HIVE-13680: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12842862/HIVE-13680.6.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10795 tests executed *Failed tests:* {noformat} TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=144) [vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q] TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=252) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=44) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] 
(batchId=151) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2548/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2548/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2548/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12842862 - PreCommit-HIVE-Build > HiveServer2: Provide a way to compress ResultSets > - > > Key: HIVE-13680 > URL: https://issues.apache.org/jira/browse/HIVE-13680 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, JDBC >Reporter: Vaibhav Gumashta >Assignee: Kevin Liew > Attachments: HIVE-13680.2.patch, HIVE-13680.3.patch, > HIVE-13680.4.patch, HIVE-13680.6.patch, HIVE-13680.patch, SnappyCompDe.zip, > proposal.pdf > > > With HIVE-12049 in, we can provide an option to compress ResultSets before > writing to disk. The user can specify a compression library via a config > param which can be used in the tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource
[ https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743526#comment-15743526 ] Daniel Dai commented on HIVE-15421: --- Can you reference the HDFS jira and the fixed version in a code comment? > Assumption in exception handling can be wrong in DagUtils.localizeResource > -- > > Key: HIVE-15421 > URL: https://issues.apache.org/jira/browse/HIVE-15421 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-15421.1.patch > > > In localizeResource once we got an IOException, we always assume this is due > to another thread writing the same file. But that is not always the case. > Even without the interference from other threads, it may still get an > IOException (RemoteException) due to failure of copyFromLocalFile in a > specific environment, for example, in a kerberized HDFS encryption zone where > the TGT is expired. > We'd better fail early with different message to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
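The fail-early behavior proposed in the issue can be sketched generically (a hedged Python sketch, not the actual DagUtils code; RemoteCopyError and the function names are invented stand-ins): classify the IOException before assuming a concurrent writer.

```python
# Hedged sketch (not the real DagUtils.localizeResource): distinguish a
# remote copy failure, where retrying cannot help, from a generic local
# IOError that might mean another thread is writing the same file.
class RemoteCopyError(IOError):
    """Stand-in for a RemoteException raised by copyFromLocalFile."""

def localize_resource(copy, wait_for_other_writer):
    try:
        copy()
    except RemoteCopyError as e:
        # e.g. an expired TGT in a kerberized HDFS encryption zone:
        # fail early with a specific message instead of waiting.
        raise RuntimeError(f"copyFromLocalFile failed, not retrying: {e}") from e
    except IOError:
        # Only the generic case is treated as a possible concurrent writer.
        wait_for_other_writer()
```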
[jira] [Commented] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743445#comment-15743445 ] Hive QA commented on HIVE-15386: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12842852/HIVE-15386.002.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10780 tests executed *Failed tests:* {noformat} TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=144) [vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q] TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] (batchId=92) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2547/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2547/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2547/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12842852 - PreCommit-HIVE-Build > Expose Spark task counts and stage Ids information in SparkTask from > SparkJobMonitor > > > Key: HIVE-15386 > URL: https://issues.apache.org/jira/browse/HIVE-15386 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.2.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, > HIVE-15386.002.patch > > > Expose Spark task counts and stage Ids information in SparkTask from > SparkJobMonitor. So these information can be used by hive hook to monitor > spark jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15118) Remove unused 'COLUMNS' table from derby schema
[ https://issues.apache.org/jira/browse/HIVE-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743444#comment-15743444 ] Naveen Gangam commented on HIVE-15118: -- [~ychena] It was done in the {{008-HIVE-2246.derby.sql}} file. There is a {{008-REVERT-HIVE-2246.derby.sql}} script that reverts the rename, but it never gets called from any of the real upgrade scripts. So I assume it was meant for folks who wanted to manually revert the change. > Remove unused 'COLUMNS' table from derby schema > --- > > Key: HIVE-15118 > URL: https://issues.apache.org/jira/browse/HIVE-15118 > Project: Hive > Issue Type: Sub-task > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Minor > Attachments: HIVE-15118.1.patch, HIVE-15118.2.patch, > HIVE-15118.3.patch > > > COLUMNS table is unused any more. Other databases already removed it. Remove > from derby as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15118) Remove unused 'COLUMNS' table from derby schema
[ https://issues.apache.org/jira/browse/HIVE-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743435#comment-15743435 ] Yongzhi Chen commented on HIVE-15118: - [~aihuaxu], could you point me to the script that renames columns to columns_old? > Remove unused 'COLUMNS' table from derby schema > --- > > Key: HIVE-15118 > URL: https://issues.apache.org/jira/browse/HIVE-15118 > Project: Hive > Issue Type: Sub-task > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Minor > Attachments: HIVE-15118.1.patch, HIVE-15118.2.patch, > HIVE-15118.3.patch > > > COLUMNS table is unused any more. Other databases already removed it. Remove > from derby as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release
[ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743404#comment-15743404 ] Owen O'Malley commented on HIVE-14007: -- {quote} The current behavior and the fix to encode column names was shipped in hive 2 and this patch fundamentally changes how alter table statements/schema evolution works. {quote} Ok, it is trivial to add a configuration knob like "orc.ignore.names.for.evolution", which we can do. I've filed ORC-120 to handle the transition to the richer schema evolution. As for your other concerns, I've presented a solution and you keep raising the same vague concerns. Having the two implementations of ORC is *really* problematic and is leading to Hive not getting bug fixes and new features. That will continue to worsen as the two code branches continue to diverge. > Replace ORC module with ORC release > --- > > Key: HIVE-14007 > URL: https://issues.apache.org/jira/browse/HIVE-14007 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, > HIVE-14007.patch > > > This completes moving the core ORC reader & writer to the ORC project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15401) Import constraints into HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-15401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-15401: -- Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Patch committed to master. Thanks Daniel for the review. > Import constraints into HBase metastore > --- > > Key: HIVE-15401 > URL: https://issues.apache.org/jira/browse/HIVE-15401 > Project: Hive > Issue Type: Sub-task > Components: HBase Metastore >Affects Versions: 2.1.1 >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 2.2.0 > > Attachments: HIVE-15401.patch > > > Since HIVE-15342 added support for primary and foreign keys in the HBase > metastore we should support them in HBaseImport as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource
[ https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743377#comment-15743377 ] Wei Zheng commented on HIVE-15421: -- [~daijy] Can you take a look please? > Assumption in exception handling can be wrong in DagUtils.localizeResource > -- > > Key: HIVE-15421 > URL: https://issues.apache.org/jira/browse/HIVE-15421 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-15421.1.patch > > > In localizeResource once we got an IOException, we always assume this is due > to another thread writing the same file. But that is not always the case. > Even without the interference from other threads, it may still get an > IOException (RemoteException) due to failure of copyFromLocalFile in a > specific environment, for example, in a kerberized HDFS encryption zone where > the TGT is expired. > We'd better fail early with different message to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource
[ https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-15421: - Status: Patch Available (was: Open) > Assumption in exception handling can be wrong in DagUtils.localizeResource > -- > > Key: HIVE-15421 > URL: https://issues.apache.org/jira/browse/HIVE-15421 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-15421.1.patch > > > In localizeResource once we got an IOException, we always assume this is due > to another thread writing the same file. But that is not always the case. > Even without the interference from other threads, it may still get an > IOException (RemoteException) due to failure of copyFromLocalFile in a > specific environment, for example, in a kerberized HDFS encryption zone where > the TGT is expired. > We'd better fail early with different message to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource
[ https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-15421: - Attachment: (was: HIVE-15421.1.patch) > Assumption in exception handling can be wrong in DagUtils.localizeResource > -- > > Key: HIVE-15421 > URL: https://issues.apache.org/jira/browse/HIVE-15421 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-15421.1.patch > > > In localizeResource once we got an IOException, we always assume this is due > to another thread writing the same file. But that is not always the case. > Even without the interference from other threads, it may still get an > IOException (RemoteException) due to failure of copyFromLocalFile in a > specific environment, for example, in a kerberized HDFS encryption zone where > the TGT is expired. > We'd better fail early with different message to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource
[ https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-15421: - Attachment: HIVE-15421.1.patch > Assumption in exception handling can be wrong in DagUtils.localizeResource > -- > > Key: HIVE-15421 > URL: https://issues.apache.org/jira/browse/HIVE-15421 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-15421.1.patch > > > In localizeResource once we got an IOException, we always assume this is due > to another thread writing the same file. But that is not always the case. > Even without the interference from other threads, it may still get an > IOException (RemoteException) due to failure of copyFromLocalFile in a > specific environment, for example, in a kerberized HDFS encryption zone where > the TGT is expired. > We'd better fail early with different message to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource
[ https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-15421: - Attachment: HIVE-15421.1.patch > Assumption in exception handling can be wrong in DagUtils.localizeResource > -- > > Key: HIVE-15421 > URL: https://issues.apache.org/jira/browse/HIVE-15421 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-15421.1.patch > > > In localizeResource once we got an IOException, we always assume this is due > to another thread writing the same file. But that is not always the case. > Even without the interference from other threads, it may still get an > IOException (RemoteException) due to failure of copyFromLocalFile in a > specific environment, for example, in a kerberized HDFS encryption zone where > the TGT is expired. > We'd better fail early with different message to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-15418) "select 'abc'" will throw 'Cannot find path in conf'
[ https://issues.apache.org/jira/browse/HIVE-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu resolved HIVE-15418. - Resolution: Cannot Reproduce Assignee: (was: Aihua Xu) Seems it's caused by older hadoop version. > "select 'abc'" will throw 'Cannot find path in conf' > > > Key: HIVE-15418 > URL: https://issues.apache.org/jira/browse/HIVE-15418 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Aihua Xu > > Here is the stack trace. Seems it's a regression since it worked with earlier > version. > {noformat} > 2016-12-09T16:32:37,577 ERROR [56fa1999-ffbe-42c0-bb91-61211cd62476 main] > CliDriver: Failed with exception java.io.IOException:java.io.IOException: > Cannot find path in conf > java.io.IOException: java.io.IOException: Cannot find path in conf > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147) > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2191) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:777) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: 
java.io.IOException: Cannot find path in conf > at > org.apache.hadoop.hive.ql.io.NullRowsInputFormat.getSplits(NullRowsInputFormat.java:165) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) > ... 15 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15118) Remove unused 'COLUMNS' table from derby schema
[ https://issues.apache.org/jira/browse/HIVE-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743365#comment-15743365 ] Naveen Gangam commented on HIVE-15118: -- LGTM pending tests .. +1 for me. > Remove unused 'COLUMNS' table from derby schema > --- > > Key: HIVE-15118 > URL: https://issues.apache.org/jira/browse/HIVE-15118 > Project: Hive > Issue Type: Sub-task > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Minor > Attachments: HIVE-15118.1.patch, HIVE-15118.2.patch, > HIVE-15118.3.patch > > > COLUMNS table is unused any more. Other databases already removed it. Remove > from derby as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15118) Remove unused 'COLUMNS' table from derby schema
[ https://issues.apache.org/jira/browse/HIVE-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-15118: Attachment: HIVE-15118.3.patch patch-3: address comments. During upgrade, the columns table was renamed to columns_old, so we need to drop columns_old instead of columns. > Remove unused 'COLUMNS' table from derby schema > --- > > Key: HIVE-15118 > URL: https://issues.apache.org/jira/browse/HIVE-15118 > Project: Hive > Issue Type: Sub-task > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Minor > Attachments: HIVE-15118.1.patch, HIVE-15118.2.patch, > HIVE-15118.3.patch > > > COLUMNS table is unused any more. Other databases already removed it. Remove > from derby as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by
[ https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13452: --- Status: Patch Available (was: Open) > StatsOptimizer should return no rows on empty table with group by > - > > Key: HIVE-13452 > URL: https://issues.apache.org/jira/browse/HIVE-13452 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Ashutosh Chauhan >Assignee: Pengcheng Xiong > Attachments: HIVE-13452.01.patch > > > {code} > create table t1 (a int); > analyze table t1 compute statistics; > analyze table t1 compute statistics for columns; > select count(1) from t1 group by 1; > set hive.compute.query.using.stats=true; > select count(1) from t1 group by 1; > {code} > In both cases result set should be empty. However, with statsoptimizer on > Hive returns one row with value 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by
[ https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13452: --- Status: Open (was: Patch Available) > StatsOptimizer should return no rows on empty table with group by > - > > Key: HIVE-13452 > URL: https://issues.apache.org/jira/browse/HIVE-13452 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Ashutosh Chauhan >Assignee: Pengcheng Xiong > Attachments: HIVE-13452.01.patch > > > {code} > create table t1 (a int); > analyze table t1 compute statistics; > analyze table t1 compute statistics for columns; > select count(1) from t1 group by 1; > set hive.compute.query.using.stats=true; > select count(1) from t1 group by 1; > {code} > In both cases result set should be empty. However, with statsoptimizer on > Hive returns one row with value 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15420) LLAP UI: Relativize resources to allow proxied/secured views
[ https://issues.apache.org/jira/browse/HIVE-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-15420: --- Status: Patch Available (was: Open) > LLAP UI: Relativize resources to allow proxied/secured views > - > > Key: HIVE-15420 > URL: https://issues.apache.org/jira/browse/HIVE-15420 > Project: Hive > Issue Type: Bug > Components: llap, Web UI >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-15420.1.patch > > > If the UI is secured behind a gateway firewall instance, this allows for the > UI to function with a base URL like http:///proxy/ > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15420) LLAP UI: Relativize resources to allow proxied/secured views
[ https://issues.apache.org/jira/browse/HIVE-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-15420: --- Description: If the UI is secured behind a gateway firewall instance, this allows for the UI to function with a base URL like http:///proxy/ NO PRECOMMIT TESTS was:If the UI is secured behind a gateway firewall instance, this allows for the UI to function with a base URL like http:///proxy/ > LLAP UI: Relativize resources to allow proxied/secured views > - > > Key: HIVE-15420 > URL: https://issues.apache.org/jira/browse/HIVE-15420 > Project: Hive > Issue Type: Bug > Components: llap, Web UI >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-15420.1.patch > > > If the UI is secured behind a gateway firewall instance, this allows for the > UI to function with a base URL like http:///proxy/ > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15420) LLAP UI: Relativize resources to allow proxied/secured views
[ https://issues.apache.org/jira/browse/HIVE-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-15420: --- Attachment: HIVE-15420.1.patch > LLAP UI: Relativize resources to allow proxied/secured views > - > > Key: HIVE-15420 > URL: https://issues.apache.org/jira/browse/HIVE-15420 > Project: Hive > Issue Type: Bug > Components: llap, Web UI >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-15420.1.patch > > > If the UI is secured behind a gateway firewall instance, this allows for the > UI to function with a base URL like http:///proxy/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-15420) LLAP UI: Relativize resources to allow proxied/secured views
[ https://issues.apache.org/jira/browse/HIVE-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-15420: -- Assignee: Gopal V > LLAP UI: Relativize resources to allow proxied/secured views > - > > Key: HIVE-15420 > URL: https://issues.apache.org/jira/browse/HIVE-15420 > Project: Hive > Issue Type: Bug > Components: llap, Web UI >Reporter: Gopal V >Assignee: Gopal V > > If the UI is secured behind a gateway firewall instance, this allows for the > UI to function with a base URL like http:///proxy/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15397) metadata-only queries may return incorrect results with empty tables
[ https://issues.apache.org/jira/browse/HIVE-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743268#comment-15743268 ] Ashutosh Chauhan commented on HIVE-15397: - +1 > metadata-only queries may return incorrect results with empty tables > > > Key: HIVE-15397 > URL: https://issues.apache.org/jira/browse/HIVE-15397 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-15397.01.patch, HIVE-15397.patch > > > Queries like select 1=1 from t group by 1=1 may return rows, based on > OneNullRowInputFormat, even if the source table is empty. For now, add some > basic detection of empty tables and turn this off by default (since we can't > know whether a table is empty or not based on there being some files, without > reading them). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15417) Glitches using ACID's row__id hidden column
[ https://issues.apache.org/jira/browse/HIVE-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743265#comment-15743265 ] Hive QA commented on HIVE-15417: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12842851/HIVE-15417.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10796 tests executed *Failed tests:* {noformat} TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=116) [join39.q,bucketsortoptimize_insert_7.q,vector_distinct_2.q,join11.q,union13.q,dynamic_rdd_cache.q,auto_sortmerge_join_16.q,windowing.q,union_remove_3.q,skewjoinopt7.q,stats7.q,annotate_stats_join.q,multi_insert_lateral_view.q,ptf_streaming.q,join_1to1.q] TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=93) org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=216) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2546/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2546/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2546/ Messages: {noformat} 
Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12842851 - PreCommit-HIVE-Build > Glitches using ACID's row__id hidden column > --- > > Key: HIVE-15417 > URL: https://issues.apache.org/jira/browse/HIVE-15417 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Carter Shanklin >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-15417.01.patch, HIVE-15417.02.patch, > HIVE-15417.patch > > > This only works if you turn PPD off. > {code:sql} > drop table if exists hello_acid; > create table hello_acid (key int, value int) > partitioned by (load_date date) > clustered by(key) into 3 buckets > stored as orc tblproperties ('transactional'='true'); > insert into hello_acid partition (load_date='2016-03-01') values (1, 1); > insert into hello_acid partition (load_date='2016-03-02') values (2, 2); > insert into hello_acid partition (load_date='2016-03-03') values (3, 3); > {code} > {code} > hive> set hive.optimize.ppd=true; > hive> select tid from (select row__id.transactionid as tid from hello_acid) > sub where tid = 15; > FAILED: SemanticException MetaException(message:cannot find field row__id > from [0:load_date]) > hive> set hive.optimize.ppd=false; > hive> select tid from (select row__id.transactionid as tid from hello_acid) > sub where tid = 15; > OK > tid > 15 > Time taken: 0.075 seconds, Fetched: 1 row(s) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
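The PPD failure in the repro above is consistent with a pushdown that does not check whether the filtered column exists in the scan schema. A minimal Python sketch of that check (illustrative only, not Hive's optimizer; the function and schema are invented):

```python
# Illustrative sketch (not Hive's optimizer): a filter can only be pushed
# below a projection if the underlying scan exposes the filtered column.
# In the repro above the partition scan schema is [load_date], and the
# virtual row__id column is absent, so pushing 'tid = 15' must not happen.
def try_push_down(predicate_col, scan_schema):
    if predicate_col in scan_schema:
        return "pushed"       # evaluate the filter at the scan
    return "kept above"       # leave the filter above the projection

print(try_push_down("load_date", ["load_date"]))  # pushed
print(try_push_down("row__id", ["load_date"]))    # kept above
```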
[jira] [Commented] (HIVE-15118) Remove unused 'COLUMNS' table from derby schema
[ https://issues.apache.org/jira/browse/HIVE-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743236#comment-15743236 ] Naveen Gangam commented on HIVE-15118: -- [~aihuaxu] Functionally, the second version of the patch looks fine. Just a couple of nits: 1) the {{UPDATE "APP".VERSION SET SCHEMA_VERSION=}} command should be the last command in this file for a couple of reasons: a) for consistency with the other upgrade schema files and b) more importantly, the version in this table is relied upon heavily to determine the schema version, so the upgrade will not be considered complete until the version is set. If for some reason the upgrade script terminates abruptly before this version is set, the missing version would be an indication that something went wrong during the upgrade. With the current patch, if the script ends right after the version is set, the drop table command will not be processed but the schema version would indicate that the upgrade succeeded. 2) Can you move the {{DROP TABLE}} command to a separate file, say 037-HIVE-15118.derby.sql, and then just call RUN on this file from the upgrade script? Thanks > Remove unused 'COLUMNS' table from derby schema > --- > > Key: HIVE-15118 > URL: https://issues.apache.org/jira/browse/HIVE-15118 > Project: Hive > Issue Type: Sub-task > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Minor > Attachments: HIVE-15118.1.patch, HIVE-15118.2.patch > > > The COLUMNS table is no longer used. Other databases have already removed it. Remove it > from derby as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
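To make the suggested ordering concrete, the upgrade script could be laid out roughly as follows. This is only a sketch: the file name 037-HIVE-15118.derby.sql comes from the comment above, while the exact table name, version string, and WHERE clause are assumptions, not taken from the patch.

```sql
-- Hypothetical 037-HIVE-15118.derby.sql (a separate file, invoked via RUN):
-- DROP TABLE "APP"."COLUMNS";

-- In the main upgrade script, run the per-issue file first ...
RUN '037-HIVE-15118.derby.sql';

-- ... and set the schema version as the very last statement, so that an
-- upgrade aborted partway through leaves the old version in place as evidence.
UPDATE "APP".VERSION SET SCHEMA_VERSION='2.2.0' WHERE VER_ID=1;
```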
[jira] [Commented] (HIVE-15294) Capture additional metadata to replicate a simple insert at destination
[ https://issues.apache.org/jira/browse/HIVE-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743228#comment-15743228 ] Thejas M Nair commented on HIVE-15294: -- [~vgumashta] Can you please create a review board link or pull request (without the generated files) ? > Capture additional metadata to replicate a simple insert at destination > --- > > Key: HIVE-15294 > URL: https://issues.apache.org/jira/browse/HIVE-15294 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-15294.1.patch > > > For replicating inserts like {{INSERT INTO ... SELECT ... FROM}}, we will > need to capture the newly added files in the notification message to be able > to replicate the event at destination. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15383) Add additional info to 'desc function extended' output
[ https://issues.apache.org/jira/browse/HIVE-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-15383: Attachment: HIVE-15383.2.patch > Add additional info to 'desc function extended' output > -- > > Key: HIVE-15383 > URL: https://issues.apache.org/jira/browse/HIVE-15383 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Trivial > Attachments: HIVE-15383.1.patch, HIVE-15383.2.patch > > > Add additional info to the output to 'desc function extended'. The resources > would be helpful for the user to check which jars are referred. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15383) Add additional info to 'desc function extended' output
[ https://issues.apache.org/jira/browse/HIVE-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-15383: Attachment: (was: HIVE-15383.2.patch) > Add additional info to 'desc function extended' output > -- > > Key: HIVE-15383 > URL: https://issues.apache.org/jira/browse/HIVE-15383 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Trivial > Attachments: HIVE-15383.1.patch, HIVE-15383.2.patch > > > Add additional info to the output to 'desc function extended'. The resources > would be helpful for the user to check which jars are referred. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release
[ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743156#comment-15743156 ] Gunther Hagleitner commented on HIVE-14007: --- [~owen.omalley] this partially addresses my concerns but you're being vague about what exactly and when. So I will try to be clear. I'm -1 on this. Blockers for me are: a) change breaks compatibility. (schema evolution) b) there's no documented and proven way to actually be able to turn these features/bugs around in 3 days. (many open questions, what's considered public v private, backwards compat criteria of the api, documentation for api itself, documentation of release mechanics, what are the version numbers, who votes on it, do a feature end to end to demonstrate.) c) not sure why it's up to you who gets to play and who doesn't. for me at the very least anyone who's been working on ACID, ORC, LLAP, cloud/s3 or vectorization should keep their committership/pmc in ORC (or have it transferred/added). i'm fine if folks explicitly opt out of that. given how these things are interrelated, anything else will just make it harder/impossible for hive devs to do their work. > Replace ORC module with ORC release > --- > > Key: HIVE-14007 > URL: https://issues.apache.org/jira/browse/HIVE-14007 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, > HIVE-14007.patch > > > This completes moving the core ORC reader & writer to the ORC project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions
[ https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743099#comment-15743099 ] Hive QA commented on HIVE-15376: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12842845/HIVE-15376.6.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10795 tests executed *Failed tests:* {noformat} TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=123) [groupby_complex_types.q,multigroupby_singlemr.q,mapjoin_decimal.q,groupby7.q,join5.q,bucketmapjoin_negative2.q,vectorization_div0.q,union_script.q,add_part_multiple.q,limit_pushdown.q,union_remove_17.q,uniquejoin.q,metadata_only_queries_with_filters.q,union25.q,load_dyn_part13.q] TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=44) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2545/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2545/console Test logs: 
http://104.198.109.242/logs/PreCommit-HIVE-Build-2545/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12842845 - PreCommit-HIVE-Build > Improve heartbeater scheduling for transactions > --- > > Key: HIVE-15376 > URL: https://issues.apache.org/jira/browse/HIVE-15376 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, > HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch > > > HIVE-12366 improved the heartbeater logic by bringing down the gap between > the lock acquisition and first heartbeat, but that's not enough, there may > still be some issue, e.g. > Time A: a transaction is opened > Time B: acquireLocks is called (blocking call), but it can take a long time > to actually acquire the locks and return if the system is busy > Time C: as acquireLocks returns, the first heartbeat is sent > If hive.txn.timeout < C - A, then the transaction will be timed out and > aborted, thus causing failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
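The race in the description reduces to a simple timeline check. The sketch below is plain Python for illustration only: txn_times_out and the timeline values are hypothetical, not Hive APIs or defaults.

```python
def txn_times_out(open_time, first_heartbeat_time, txn_timeout):
    """A transaction is aborted if no heartbeat arrives within
    txn_timeout seconds of the transaction being opened."""
    return (first_heartbeat_time - open_time) > txn_timeout

# Timeline from the description: the txn is opened at A, acquireLocks
# blocks until C, and the first heartbeat is only sent at C.
A = 0.0              # Time A: transaction opened
C = 400.0            # Time C: acquireLocks returns; first heartbeat sent
txn_timeout = 300.0  # stand-in for hive.txn.timeout

# Since txn_timeout < C - A, the transaction is aborted before it ever
# heartbeats, which is the failure mode this issue describes.
assert txn_times_out(A, C, txn_timeout)
```

One way to close the race, presumably what the later patches aim at, is to schedule heartbeats relative to transaction open rather than only after acquireLocks returns, so the C - A gap no longer matters.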
[jira] [Commented] (HIVE-15416) CAST to string does not work for large decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743068#comment-15743068 ] Sergey Shelukhin commented on HIVE-15416: - That's probably because some code goes through float or double. I remember seeing code like that in some conversion cases. > CAST to string does not work for large decimal numbers > -- > > Key: HIVE-15416 > URL: https://issues.apache.org/jira/browse/HIVE-15416 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: Pavel Benes > > The cast of large decimal values to string does not work and produces NULL > values. > Steps to reproduce: > {code} > hive> create table test_hive_bug30(decimal_col DECIMAL(30,0)); > OK > {code} > {code} > hive> insert into test_hive_bug30 VALUES (123), > (9), > (99),(999); > Query ID = benesp_20161212135717_5d16d7f4-7b84-409e-ad00-36085deaae54 > Total jobs = 1 > Launching Job 1 out of 1 > Status: Running (Executing on YARN cluster with App id > application_1480833176011_2469) > > VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED > KILLED > > Map 1 .. SUCCEEDED 1 100 0 > 0 > > VERTICES: 01/01 [==>>] 100% ELAPSED TIME: 7.69 s > > Loading data to table default.test_hive_bug30 > Table default.test_hive_bug30 stats: [numFiles=1, numRows=4, totalSize=68, > rawDataSize=64] > OK > Time taken: 8.239 seconds > {code} > {code} > hive> select CAST(decimal_col AS STRING) from test_hive_bug30; > OK > 123 > NULL > NULL > NULL > Time taken: 0.043 seconds, Fetched: 4 row(s) > {code} > The numbers with 29 and 30 digits should be exported, but they are converted > to NULL instead. > The values are stored correctly as can be seen here: > {code} > hive> select * from test_hive_bug30; > OK > 123 > 9 > 99 > NULL > Time taken: 0.447 seconds, Fetched: 4 row(s) > {code} > The same issue does not exist for smaller numbers (e.g. DECIMAL(10)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
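Sergey's hypothesis, that some conversion path routes through float or double, matches the symptoms: an IEEE-754 double carries only about 15-16 significant decimal digits, so 29- and 30-digit values cannot survive the round trip while DECIMAL(10) values can. A small illustration outside Hive (plain Python, not Hive code):

```python
# A double has a 53-bit significand, good for ~15-16 decimal digits.
# Any conversion step routed through double beyond that silently
# loses precision, consistent with the NULLs in the bug report.
small = 1234567890        # 10 digits: fits a double exactly
big = 10**30 - 1          # a 30-digit value, like the repro

assert int(float(small)) == small   # exact round trip
assert int(float(big)) != big       # precision lost in the round trip
```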
[jira] [Comment Edited] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743019#comment-15743019 ] Anthony Hsu edited comment on HIVE-4095 at 12/12/16 8:16 PM: - [~leftylev]: Based on my testing with versions 0.13.1, 1.1.0, and 2.2.0 (trunk), the destination table should come first. I clarified the example in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ExchangePartition. was (Author: erwaman): [~leftylev]: Based on my testing with versions 0.13.1, 1.1.0, and 2.2.0 (trunk), the destination table should come first. > Add exchange partition in Hive > -- > > Key: HIVE-4095 > URL: https://issues.apache.org/jira/browse/HIVE-4095 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain >Assignee: Dheeraj Kumar Singh > Fix For: 0.12.0 > > Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, > HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, > HIVE-4095.part12.patch.txt, hive.4095.1.patch, hive.4095.refresh.patch, > hive.4095.svn.thrift.patch, hive.4095.svn.thrift.patch.refresh > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743019#comment-15743019 ] Anthony Hsu commented on HIVE-4095: --- [~leftylev]: Based on my testing with versions 0.13.1, 1.1.0, and 2.2.0 (trunk), the destination table should come first. > Add exchange partition in Hive > -- > > Key: HIVE-4095 > URL: https://issues.apache.org/jira/browse/HIVE-4095 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain >Assignee: Dheeraj Kumar Singh > Fix For: 0.12.0 > > Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, > HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, > HIVE-4095.part12.patch.txt, hive.4095.1.patch, hive.4095.refresh.patch, > hive.4095.svn.thrift.patch, hive.4095.svn.thrift.patch.refresh > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15397) metadata-only queries may return incorrect results with empty tables
[ https://issues.apache.org/jira/browse/HIVE-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743015#comment-15743015 ] Sergey Shelukhin commented on HIVE-15397: - [~ashutoshc] ping? > metadata-only queries may return incorrect results with empty tables > > > Key: HIVE-15397 > URL: https://issues.apache.org/jira/browse/HIVE-15397 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-15397.01.patch, HIVE-15397.patch > > > Queries like select 1=1 from t group by 1=1 may return rows, based on > OneNullRowInputFormat, even if the source table is empty. For now, add some > basic detection of empty tables and turn this off by default (since we can't > know whether a table is empty or not based on there being some files, without > reading them). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15108) allow Hive script to skip hadoop version check and HBase classpath
[ https://issues.apache.org/jira/browse/HIVE-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742999#comment-15742999 ] Sergey Shelukhin commented on HIVE-15108: - Is hive tool itself documented? if yes, I can add them there > allow Hive script to skip hadoop version check and HBase classpath > -- > > Key: HIVE-15108 > URL: https://issues.apache.org/jira/browse/HIVE-15108 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 2.2.0 > > Attachments: HIVE-15108.patch, HIVE-15108.patch > > > Both will be performed by default, as before -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null
[ https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743001#comment-15743001 ] Anthony Hsu commented on HIVE-15353: Canceled and resubmitted patch. Will see if PreCommit tests run. > Metastore throws NPE if StorageDescriptor.cols is null > -- > > Key: HIVE-15353 > URL: https://issues.apache.org/jira/browse/HIVE-15353 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0, 2.2.0 >Reporter: Anthony Hsu >Assignee: Anthony Hsu > Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, > HIVE-15353.3.patch > > > When using the HiveMetaStoreClient API directly to talk to the metastore, you > get NullPointerExceptions when StorageDescriptor.cols is null in the > Table/Partition object in the following calls: > * create_table > * alter_table > * alter_partition > Calling add_partition with StorageDescriptor.cols set to null causes null to > be stored in the metastore database and subsequent calls to alter_partition > for that partition to fail with an NPE. > Null checks should be added to eliminate the NPEs in the metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
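The null checks the description asks for amount to normalizing a missing column list before it is used. A language-neutral sketch follows (Python, with a dict standing in for the Thrift StorageDescriptor; the actual fix would be Java null checks in the metastore, and the field names here are illustrative):

```python
def normalize_cols(sd):
    """Treat a None cols list as an empty list, so code paths such as
    create_table/alter_table/alter_partition can iterate over the
    columns without tripping over a null value."""
    if sd.get("cols") is None:
        sd["cols"] = []
    return sd

# A descriptor created with cols left unset, as in the reported NPE:
sd = {"location": "/warehouse/t/part=1", "cols": None}
assert normalize_cols(sd)["cols"] == []
```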
[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null
[ https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Hsu updated HIVE-15353: --- Status: Open (was: Patch Available) > Metastore throws NPE if StorageDescriptor.cols is null > -- > > Key: HIVE-15353 > URL: https://issues.apache.org/jira/browse/HIVE-15353 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0, 2.2.0 >Reporter: Anthony Hsu >Assignee: Anthony Hsu > Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, > HIVE-15353.3.patch > > > When using the HiveMetaStoreClient API directly to talk to the metastore, you > get NullPointerExceptions when StorageDescriptor.cols is null in the > Table/Partition object in the following calls: > * create_table > * alter_table > * alter_partition > Calling add_partition with StorageDescriptor.cols set to null causes null to > be stored in the metastore database and subsequent calls to alter_partition > for that partition to fail with an NPE. > Null checks should be added to eliminate the NPEs in the metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null
[ https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Hsu updated HIVE-15353: --- Status: Patch Available (was: Open) > Metastore throws NPE if StorageDescriptor.cols is null > -- > > Key: HIVE-15353 > URL: https://issues.apache.org/jira/browse/HIVE-15353 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0, 2.2.0 >Reporter: Anthony Hsu >Assignee: Anthony Hsu > Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, > HIVE-15353.3.patch > > > When using the HiveMetaStoreClient API directly to talk to the metastore, you > get NullPointerExceptions when StorageDescriptor.cols is null in the > Table/Partition object in the following calls: > * create_table > * alter_table > * alter_partition > Calling add_partition with StorageDescriptor.cols set to null causes null to > be stored in the metastore database and subsequent calls to alter_partition > for that partition to fail with an NPE. > Null checks should be added to eliminate the NPEs in the metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13680) HiveServer2: Provide a way to compress ResultSets
[ https://issues.apache.org/jira/browse/HIVE-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Liew updated HIVE-13680: -- Attachment: HIVE-13680.6.patch Latest patch attached. The server-side framework will remain in this JIRA while the sample compressor moves to HIVE-15384. > HiveServer2: Provide a way to compress ResultSets > - > > Key: HIVE-13680 > URL: https://issues.apache.org/jira/browse/HIVE-13680 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, JDBC >Reporter: Vaibhav Gumashta >Assignee: Kevin Liew > Attachments: HIVE-13680.2.patch, HIVE-13680.3.patch, > HIVE-13680.4.patch, HIVE-13680.6.patch, HIVE-13680.patch, SnappyCompDe.zip, > proposal.pdf > > > With HIVE-12049 in, we can provide an option to compress ResultSets before > writing to disk. The user can specify a compression library via a config > param which can be used in the tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15384) Compressor plugin
[ https://issues.apache.org/jira/browse/HIVE-15384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Liew updated HIVE-15384: -- Summary: Compressor plugin (was: Compressor plugin framework) > Compressor plugin > - > > Key: HIVE-15384 > URL: https://issues.apache.org/jira/browse/HIVE-15384 > Project: Hive > Issue Type: Sub-task >Reporter: Ziyang Zhao > > Splitting server framework into separate JIRA from compressor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-14007) Replace ORC module with ORC release
[ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-14007: - Comment: was deleted (was: .bq The other thing I think we need community wide clarity on before you rip out orc is how we’re going to keep developing hive afterwards. Right now there’s a cyclic dependency. Hive -> ORC -> Hive - because of a shared storage api. There is agreement within Hive to release the storage-api independently from Hive. That would break the cycle and allow a non-cyclic release process. I'll file a Hive jira to do that work. Avoiding have two copies of code makes the whole ecosystem stronger by making sure that fixes get applied everywhere. I'd suggest leaving storage-api in the Hive source tree rather than making its own git repository. .bq There are features that touch all three. And it turns out these are more frequent than expected. They come in waves. In the last three months, there have been 2 changes to storage-api. Most of the patches are in either storage-api or ORC. For example, HIVE-14453 only touches ORC. .bq How do you propose to handle development and release of these features given the cyclic dependency? How do you work out feature branches/ snapshots? For changes that touch one or the other, you'd commit the relevant change and release either storage-api or ORC and have a jira that updates the version in Hive. In the worst case, where the change spreads among the three artifacts, you would: * commit to storage-api & ORC * release them * upgrade the pom in Hive .bq If a successful feature commit requires sequential hive and orc releases, then that means minimum several months before commit and that's not great. How will this be done? No, ORC releases typically take 3 days. Storage API is much simpler and should also take 3 days. By being much smaller and more focused, they are much more nimble. 
Furthermore, the two votes could completely overlap, so the total time to get the change into Hive would be roughly 3 days. .bq Looking over the PMC and committer lists in ORC it looks like many people working on ACID, vectorization or llap will lose the ability to do what they are doing today with this change. When we set up the ORC project, we were pretty inclusive in the committer list and we continue to add new committers and PMC members. I'll take at the contributors to the Hive ORC module to look for new committers.) > Replace ORC module with ORC release > --- > > Key: HIVE-14007 > URL: https://issues.apache.org/jira/browse/HIVE-14007 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, > HIVE-14007.patch > > > This completes moving the core ORC reader & writer to the ORC project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15294) Capture additional metadata to replicate a simple insert at destination
[ https://issues.apache.org/jira/browse/HIVE-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742927#comment-15742927 ] Hive QA commented on HIVE-15294: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12842837/HIVE-15294.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10810 tests executed *Failed tests:* {noformat} TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=44) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2544/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2544/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2544/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12842837 - PreCommit-HIVE-Build > Capture additional metadata to replicate a simple insert at destination > --- > > Key: HIVE-15294 > URL: https://issues.apache.org/jira/browse/HIVE-15294 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-15294.1.patch > > > For replicating inserts like {{INSERT INTO ... SELECT ... FROM}}, we will > need to capture the newly added files in the notification message to be able > to replicate the event at destination. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release
[ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742925#comment-15742925 ] Owen O'Malley commented on HIVE-14007: -- {quote} The other thing I think we need community wide clarity on before you rip out orc is how we’re going to keep developing hive afterwards. Right now there’s a cyclic dependency. Hive -> ORC -> Hive - because of a shared storage api. {quote} There is agreement within Hive to release the storage-api independently from Hive. That would break the cycle and allow a non-cyclic release process. I'll file a Hive jira to do that work. Avoiding having two copies of code makes the whole ecosystem stronger by making sure that fixes get applied everywhere. I'd suggest leaving storage-api in the Hive source tree rather than making its own git repository. {quote} There are features that touch all three. And it turns out these are more frequent than expected. {quote} They come in waves. In the last three months, there have been 2 changes to storage-api. Most of the patches are in either storage-api or ORC. For example, HIVE-14453 only touches ORC. {quote} How do you propose to handle development and release of these features given the cyclic dependency? How do you work out feature branches/ snapshots? {quote} For changes that touch one or the other, you'd commit the relevant change and release either storage-api or ORC and have a jira that updates the version in Hive. In the worst case, where the change spreads among the three artifacts, you would: * commit to storage-api & ORC * release them * upgrade the pom in Hive {quote} If a successful feature commit requires sequential hive and orc releases, then that means minimum several months before commit and that's not great. How will this be done? {quote} No, ORC releases typically take 3 days. Storage API is much simpler and should also take 3 days. By being much smaller and more focused, they are much more nimble. 
Furthermore, the two votes could completely overlap, so the total time to get the change into Hive would be roughly 3 days. {quote} Looking over the PMC and committer lists in ORC it looks like many people working on ACID, vectorization or llap will lose the ability to do what they are doing today with this change. {quote} When we set up the ORC project, we were pretty inclusive in the committer list and we continue to add new committers and PMC members. I'll take a look at the contributors to the Hive ORC module to look for new committers. > Replace ORC module with ORC release > --- > > Key: HIVE-14007 > URL: https://issues.apache.org/jira/browse/HIVE-14007 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, > HIVE-14007.patch > > > This completes moving the core ORC reader & writer to the ORC project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release
[ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742914#comment-15742914 ] Owen O'Malley commented on HIVE-14007: -- .bq The other thing I think we need community wide clarity on before you rip out orc is how we’re going to keep developing hive afterwards. Right now there’s a cyclic dependency. Hive -> ORC -> Hive - because of a shared storage api. There is agreement within Hive to release the storage-api independently from Hive. That would break the cycle and allow a non-cyclic release process. I'll file a Hive jira to do that work. Avoiding have two copies of code makes the whole ecosystem stronger by making sure that fixes get applied everywhere. I'd suggest leaving storage-api in the Hive source tree rather than making its own git repository. .bq There are features that touch all three. And it turns out these are more frequent than expected. They come in waves. In the last three months, there have been 2 changes to storage-api. Most of the patches are in either storage-api or ORC. For example, HIVE-14453 only touches ORC. .bq How do you propose to handle development and release of these features given the cyclic dependency? How do you work out feature branches/ snapshots? For changes that touch one or the other, you'd commit the relevant change and release either storage-api or ORC and have a jira that updates the version in Hive. In the worst case, where the change spreads among the three artifacts, you would: * commit to storage-api & ORC * release them * upgrade the pom in Hive .bq If a successful feature commit requires sequential hive and orc releases, then that means minimum several months before commit and that's not great. How will this be done? No, ORC releases typically take 3 days. Storage API is much simpler and should also take 3 days. By being much smaller and more focused, they are much more nimble. 
Furthermore, the two votes could completely overlap, so the total time to get the change into Hive would be roughly 3 days. .bq Looking over the PMC and committer lists in ORC it looks like many people working on ACID, vectorization or llap will lose the ability to do what they are doing today with this change. When we set up the ORC project, we were pretty inclusive in the committer list and we continue to add new committers and PMC members. I'll take at the contributors to the Hive ORC module to look for new committers. > Replace ORC module with ORC release > --- > > Key: HIVE-14007 > URL: https://issues.apache.org/jira/browse/HIVE-14007 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, > HIVE-14007.patch > > > This completes moving the core ORC reader & writer to the ORC project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742839#comment-15742839 ] zhihai xu edited comment on HIVE-15386 at 12/12/16 7:15 PM: Yes, thanks for the review [~lirui]! Using PerfLogger to get submitTime is better; I also use PerfLogger to get finishTime in the new patch HIVE-15386.002.patch, so all the timing information is based on PerfLogger for consistency. Please review it! was (Author: zxu): Yes, thanks for the review [~lirui]! Using PerfLogger to get submitTime is better; I also use PerfLogger to get finishTime, so all the timing information is based on PerfLogger for consistency. Please review it! > Expose Spark task counts and stage Ids information in SparkTask from > SparkJobMonitor > > > Key: HIVE-15386 > URL: https://issues.apache.org/jira/browse/HIVE-15386 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.2.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, > HIVE-15386.002.patch > > > Expose Spark task counts and stage Ids information in SparkTask from > SparkJobMonitor, so this information can be used by Hive hooks to monitor > Spark jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742839#comment-15742839 ] zhihai xu commented on HIVE-15386: -- Yes, thanks for the review [~lirui]! Using PerfLogger to get submitTime is better; I also use PerfLogger to get finishTime, so all the timing information is based on PerfLogger for consistency. Please review it! > Expose Spark task counts and stage Ids information in SparkTask from > SparkJobMonitor > > > Key: HIVE-15386 > URL: https://issues.apache.org/jira/browse/HIVE-15386 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.2.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, > HIVE-15386.002.patch > > > Expose Spark task counts and stage Ids information in SparkTask from > SparkJobMonitor, so this information can be used by Hive hooks to monitor > Spark jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated HIVE-15386: - Attachment: HIVE-15386.002.patch > Expose Spark task counts and stage Ids information in SparkTask from > SparkJobMonitor > > > Key: HIVE-15386 > URL: https://issues.apache.org/jira/browse/HIVE-15386 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.2.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, > HIVE-15386.002.patch > > > Expose Spark task counts and stage Ids information in SparkTask from > SparkJobMonitor, so this information can be used by Hive hooks to monitor > Spark jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15417) Glitches using ACID's row__id hidden column
[ https://issues.apache.org/jira/browse/HIVE-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15417: --- Attachment: HIVE-15417.02.patch Regenerating q file and making new test behavior deterministic. > Glitches using ACID's row__id hidden column > --- > > Key: HIVE-15417 > URL: https://issues.apache.org/jira/browse/HIVE-15417 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Carter Shanklin >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-15417.01.patch, HIVE-15417.02.patch, > HIVE-15417.patch > > > This only works if you turn PPD off. > {code:sql} > drop table if exists hello_acid; > create table hello_acid (key int, value int) > partitioned by (load_date date) > clustered by(key) into 3 buckets > stored as orc tblproperties ('transactional'='true'); > insert into hello_acid partition (load_date='2016-03-01') values (1, 1); > insert into hello_acid partition (load_date='2016-03-02') values (2, 2); > insert into hello_acid partition (load_date='2016-03-03') values (3, 3); > {code} > {code} > hive> set hive.optimize.ppd=true; > hive> select tid from (select row__id.transactionid as tid from hello_acid) > sub where tid = 15; > FAILED: SemanticException MetaException(message:cannot find field row__id > from [0:load_date]) > hive> set hive.optimize.ppd=false; > hive> select tid from (select row__id.transactionid as tid from hello_acid) > sub where tid = 15; > OK > tid > 15 > Time taken: 0.075 seconds, Fetched: 1 row(s) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14948) properly handle special characters in identifiers
[ https://issues.apache.org/jira/browse/HIVE-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742804#comment-15742804 ] Eugene Koifman commented on HIVE-14948: --- Support for quoted identifiers was added in HIVE-6013. Even though a QuotedIdentifier token type was created there, no ASTNode of that type is generated. The ` (back tick) is stripped away in the grammar - see HiveLexer.g, where QuotedIdentifier is defined. > properly handle special characters in identifiers > - > > Key: HIVE-14948 > URL: https://issues.apache.org/jira/browse/HIVE-14948 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-14948.01.patch, HIVE-14948.02.patch > > > The treatment of quoted identifiers in HIVE-14943 is inconsistent. Need to > clean this up and, if possible, only quote those identifiers that need to be > quoted in the generated SQL statement -- This message was sent by Atlassian JIRA (v6.3.4#6332)
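For context, the back-tick stripping happens in the lexer action itself, before the parser ever sees the token. The following is a paraphrased sketch of what such an ANTLR rule looks like, not the exact HiveLexer.g source:

```antlr
// Sketch of an ANTLR lexer rule that strips the quoting back ticks and
// unescapes doubled back ticks, so the parser only sees the bare
// identifier text and no QuotedIdentifier ASTNode is ever produced.
QuotedIdentifier
    : '`' ( '``' | ~'`' )* '`'
      { setText(getText().substring(1, getText().length() - 1).replaceAll("``", "`")); }
    ;
```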
[jira] [Commented] (HIVE-15413) Primary key constraints forced to be unique across database and table names
[ https://issues.apache.org/jira/browse/HIVE-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742801#comment-15742801 ] Ashutosh Chauhan commented on HIVE-15413: - A thing to keep in mind here is that, unlike other databases, Hive does allow referencing tables from different databases in a single query, so making constraint names unique per database may have consequences for that. > Primary key constraints forced to be unique across database and table names > --- > > Key: HIVE-15413 > URL: https://issues.apache.org/jira/browse/HIVE-15413 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0 >Reporter: Alan Gates >Priority: Critical > > In the RDBMS underlying the metastore, the table that stores primary and > foreign keys has its own primary key (at the RDBMS level) of > (constraint_name, position). This means that a constraint name must be > unique across all tables and databases in a system. This is not reasonable. > Database and table name should be included in the RDBMS primary key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
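The fix the issue suggests amounts to widening the RDBMS-level key so uniqueness is scoped by the owning database and table. A hypothetical DDL sketch; the table and column names below are illustrative, not the metastore's actual schema:

```sql
-- Before: PRIMARY KEY (CONSTRAINT_NAME, POSITION) forces constraint names
-- to be unique across every database and table in the metastore.
-- After: including the owning database and table scopes uniqueness
-- the way the issue description proposes.
ALTER TABLE KEY_CONSTRAINTS
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (DB_NAME, TABLE_NAME, CONSTRAINT_NAME, POSITION);
```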
[jira] [Commented] (HIVE-15417) Glitches using ACID's row__id hidden column
[ https://issues.apache.org/jira/browse/HIVE-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742780#comment-15742780 ] Hive QA commented on HIVE-15417: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12842823/HIVE-15417.01.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10811 tests executed *Failed tests:* {noformat} TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=250) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=44) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nested_column_pruning] (batchId=31) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=70) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=134) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=150) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2543/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2543/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2543/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests 
failed {noformat} This message is automatically generated. ATTACHMENT ID: 12842823 - PreCommit-HIVE-Build > Glitches using ACID's row__id hidden column > --- > > Key: HIVE-15417 > URL: https://issues.apache.org/jira/browse/HIVE-15417 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Carter Shanklin >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-15417.01.patch, HIVE-15417.patch > > > This only works if you turn PPD off. > {code:sql} > drop table if exists hello_acid; > create table hello_acid (key int, value int) > partitioned by (load_date date) > clustered by(key) into 3 buckets > stored as orc tblproperties ('transactional'='true'); > insert into hello_acid partition (load_date='2016-03-01') values (1, 1); > insert into hello_acid partition (load_date='2016-03-02') values (2, 2); > insert into hello_acid partition (load_date='2016-03-03') values (3, 3); > {code} > {code} > hive> set hive.optimize.ppd=true; > hive> select tid from (select row__id.transactionid as tid from hello_acid) > sub where tid = 15; > FAILED: SemanticException MetaException(message:cannot find field row__id > from [0:load_date]) > hive> set hive.optimize.ppd=false; > hive> select tid from (select row__id.transactionid as tid from hello_acid) > sub where tid = 15; > OK > tid > 15 > Time taken: 0.075 seconds, Fetched: 1 row(s) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15414) Fix batchSize for TestNegativeCliDriver
[ https://issues.apache.org/jira/browse/HIVE-15414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742752#comment-15742752 ] Sergio Peña commented on HIVE-15414: Thanks. I made the change in the profile to use 400:
qFileTest.clientNegative.driver = TestNegativeCliDriver
qFileTest.clientNegative.directory = ql/src/test/queries/clientnegative
qFileTest.clientNegative.batchSize = 400
We should see the results later, after the last Jenkins build finishes. > Fix batchSize for TestNegativeCliDriver > --- > > Key: HIVE-15414 > URL: https://issues.apache.org/jira/browse/HIVE-15414 > Project: Hive > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > > While analyzing the console output of pre-commit console logs, I noticed that > TestNegativeCliDriver batches ~770 qfiles, which doesn't look right. > 2016-12-09 22:23:58,945 DEBUG [TestExecutor] ExecutionPhase.execute:96 > PBatch: QFileTestBatch [batchId=84, size=774, driver=TestNegativeCliDriver, > queryFilesProperty=qfile, > name=84-TestNegativeCliDriver-nopart_insert.q-input41.q-having1.q-and-771-more.. > > I think {{qFileTest.clientNegative.batchSize = 1000}} in > {{test-configuration2.properties}} is probably the reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742755#comment-15742755 ] Zhiyuan Yang commented on HIVE-14731: - [~jcamachorodriguez], I'm not sure whether it's a good idea to enable this new feature by default for users, but I'll use [~hagleitn]'s way to enable it for tests first. > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, > HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, > HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, > HIVE-14731.8.patch, HIVE-14731.9.patch > > > Given that the cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15415) Random "java.util.ConcurrentModificationException"
[ https://issues.apache.org/jira/browse/HIVE-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742738#comment-15742738 ] Vihang Karajgaonkar commented on HIVE-15415: Hi [~BigDataOrange], can you paste the stack trace here so that we can confirm whether it is the same as HIVE-15355? Thanks! > Random "java.util.ConcurrentModificationException" > -- > > Key: HIVE-15415 > URL: https://issues.apache.org/jira/browse/HIVE-15415 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.1.0 > Environment: Hadoop 2.7.3, Hive 2.1.0 >Reporter: Alexandre Linte > > I'm regularly facing Hive job failures through Oozie or through the beeline > CLI. The jobs exit with an error "FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.MoveTask. > java.util.ConcurrentModificationException (state=08S01,code=1)" but not 100% > of the time. > It's also important to underline that only one user is working on the table > when the jobs are running. > - stderr > {noformat} > Connecting to jdbc:hive2://hiveserver2.bigdata.fr:1/default > Connected to: Apache Hive (version 2.1.0) > Driver: Hive JDBC (version 2.1.0) > Transaction isolation: TRANSACTION_REPEATABLE_READ > No rows affected (1.475 seconds) > No rows affected (0.004 seconds) > No rows affected (0.004 seconds) > No rows affected (58.977 seconds) > No rows affected (5.524 seconds) > No rows affected (5.235 seconds) > Error: Error while processing statement: FAILED: Execution Error, return code > 1 from org.apache.hadoop.hive.ql.exec.MoveTask. > java.util.ConcurrentModificationException (state=08S01,code=1) > Closing: 0: jdbc:hive2://hiveserver2.bigdata.fr:1/default > Intercepting System.exit(2) > {noformat} > - stdout > {noformat} > Beeline command arguments : > -u > jdbc:hive2://hiveserver2.bigdata.fr:1/default > -n > my_user > -p > DUMMY > -d > org.apache.hive.jdbc.HiveDriver > -f > full_job > -a > delegationToken > --hiveconf > mapreduce.job.tags=oozie-75b060aacd7ec48c4ed637855e413280 > Fetching child yarn jobs > tag id : oozie-75b060aacd7ec48c4ed637855e413280 > Child yarn jobs are found - > = > >>> Invoking Beeline command line now >>> > 0: jdbc:hive2://hiveserver2.bigdata.fr> use my_db; > 0: jdbc:hive2://hiveserver2.bigdata.fr> set hive.execution.engine=tez; > 0: jdbc:hive2://hiveserver2.bigdata.fr> set tez.queue.name=tez_queue; > 0: jdbc:hive2://hiveserver2.bigdata.fr> > 0: jdbc:hive2://hiveserver2.bigdata.fr> insert overwrite table main_table_fd_livcfm > . . . . . . . . . . . . . . . . . . . . . . .> select > . . . . . . . . . . . . . . . . . . . . . . .> col.co_cd as co_cd, > . . . . . . . . . . . . . . . . . . . . . . .> col.line_co_cd as line_co_cd, > . . . . . . . . . . . . . . . . . . . . . . .> unix_timestamp(min(tt.statut_dt)) as statut_dt > . . . . . . . . . . . . . . . . . . . . . . .> from dlk_scf_rn_customer_order_line col > . . . . . . . . . . . . . . . . . . . . . . .> join dlk_scf_rn_shipment_handling_utility shu > . . . . . . . . . . . . . . . . . . . . . . .> on shu.co_cd = col.co_cd > . . . . . . . . . . . . . . . . . . . . . . .> and shu.line_co_cd = col.line_co_cd > . . . . . . . . . . . . . . . . . . . . . . .> join ( select scaler_internal_ref, statut_dt, recep_number, state, reason > . . . . . . . . . . . . . . . . . . . . . . .> from dlk_scf_rn_transport_tracking where state='LIV' and reason='CFM' ) tt > . . . . . . . . . . . . . . . . . . . . . . .> on concat('CAL',shu.c_waybill_no) = tt.scaler_internal_ref group by col.co_cd, col.line_co_cd; > Heart beat > Heart beat > 0: jdbc:hive2://hiveserver2.bigdata.fr> > 0: jdbc:hive2://hiveserver2.bigdata.fr> insert overwrite table main_table_fd_cae > . . . . . . . . . . . . . . . . . . . . . . .> select > . . . . . . . . . . . . . . . . . . . . . . .> po_cd as cae, line_po_cd as lcae, origin_co_cd, origin_line_co_cd > . . . . . . . . . . . . . . . . . . . . . . .> from dlk_scf_rn_purchase_order_line > . . . . . . . . . . . . . . . . . . . . . . .> where instr(po_cd,"7")=1; > 0: jdbc:hive2://hiveserver2.bigdata.fr> > 0: jdbc:hive2://hiveserver2.bigdata.fr> insert overwrite table main_table_fd_cai > . . . . . . . . . . . . . . . . . . . . . . .> select > . . . . . . . . . . . . . . . . . . . . . . .> po_cd as cai, line_po_cd as lcai, origin_co_cd, origin_line_co_cd > . . . . . . . . . . . . . . . . . . . . . . .> from dlk_scf_rn_purchase_order_line > . . . . . . . . . . . . . . . . . . . . . . .> where in
[jira] [Commented] (HIVE-15417) Glitches using ACID's row__id hidden column
[ https://issues.apache.org/jira/browse/HIVE-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742739#comment-15742739 ] Ashutosh Chauhan commented on HIVE-15417: - +1 I assume failures are unrelated. > Glitches using ACID's row__id hidden column > --- > > Key: HIVE-15417 > URL: https://issues.apache.org/jira/browse/HIVE-15417 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Carter Shanklin >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-15417.01.patch, HIVE-15417.patch > > > This only works if you turn PPD off. > {code:sql} > drop table if exists hello_acid; > create table hello_acid (key int, value int) > partitioned by (load_date date) > clustered by(key) into 3 buckets > stored as orc tblproperties ('transactional'='true'); > insert into hello_acid partition (load_date='2016-03-01') values (1, 1); > insert into hello_acid partition (load_date='2016-03-02') values (2, 2); > insert into hello_acid partition (load_date='2016-03-03') values (3, 3); > {code} > {code} > hive> set hive.optimize.ppd=true; > hive> select tid from (select row__id.transactionid as tid from hello_acid) > sub where tid = 15; > FAILED: SemanticException MetaException(message:cannot find field row__id > from [0:load_date]) > hive> set hive.optimize.ppd=false; > hive> select tid from (select row__id.transactionid as tid from hello_acid) > sub where tid = 15; > OK > tid > 15 > Time taken: 0.075 seconds, Fetched: 1 row(s) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15414) Fix batchSize for TestNegativeCliDriver
[ https://issues.apache.org/jira/browse/HIVE-15414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742728#comment-15742728 ] Vihang Karajgaonkar commented on HIVE-15414: Thanks [~pvary]. I think you may be right, but I thought of checking it either way in the hope that it might improve the run time further :) [~spena] Based on these logs from the latest run: 2016-12-12 17:21:30,903 INFO [HostExecutor 46] LocalCommand.:45 Starting LocalCommandId=205113: ssh -v -i /home/hiveptest/.ssh/hive-ptest-user-key -l hiveptest 104.154.105.18 'bash /home/hiveptest/104.154.105.18-hiveptest-0/scratch/hiveptest-84-TestNegativeCliDriver-nopart_insert.q-input41.q-having1.q-and-771-more.sh' 2016-12-12 17:33:28,341 INFO [HostExecutor 46] LocalCommand.awaitProcessCompletion:67 Finished LocalCommandId=205113. ElapsedTime(ms)=717437 2016-12-12 17:33:28,341 INFO [HostExecutor 46] HostExecutor.executeTestBatch:261 Completed executing tests for batch [84-TestNegativeCliDriver-nopart_insert.q-input41.q-having1.q-and-771-more] on host 104.154.105.18. ElapsedTime(ms)=717437 It takes about ~700 sec. I did a quick search of the logs, and it looks like most other batches take between 200 and 300 sec, so we may see some benefit if we reduce the limit from 1000 to 400 so that the tests are divided into 2 batches. > Fix batchSize for TestNegativeCliDriver > --- > > Key: HIVE-15414 > URL: https://issues.apache.org/jira/browse/HIVE-15414 > Project: Hive > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > > While analyzing the console output of pre-commit console logs, I noticed that > TestNegativeCliDriver batches ~770 qfiles, which doesn't look right. > 2016-12-09 22:23:58,945 DEBUG [TestExecutor] ExecutionPhase.execute:96 > PBatch: QFileTestBatch [batchId=84, size=774, driver=TestNegativeCliDriver, > queryFilesProperty=qfile, > name=84-TestNegativeCliDriver-nopart_insert.q-input41.q-having1.q-and-771-more.. 
> > I think {{qFileTest.clientNegative.batchSize = 1000}} in > {{test-configuration2.properties}} is probably the reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742721#comment-15742721 ] Gunther Hagleitner commented on HIVE-14731: --- Can you also please turn the feature on in "data/conf/hive-site.xml"? That will enable it for all unit tests. I think the "off" case will implicitly be tested through the MR/Spark/etc test drivers. > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, > HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, > HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, > HIVE-14731.8.patch, HIVE-14731.9.patch > > > Given that the cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
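Turning a feature on for the unit-test config is a one-property change in that file; a sketch (the property name below is an assumption for illustration, as the thread does not name it):

```xml
<!-- data/conf/hive-site.xml: enable the Tez cartesian product edge for
     all unit tests. The property name here is illustrative only. -->
<property>
  <name>hive.tez.cartesian-product.enabled</name>
  <value>true</value>
</property>
```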
[jira] [Commented] (HIVE-15401) Import constraints into HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-15401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742695#comment-15742695 ] Daniel Dai commented on HIVE-15401: --- +1 > Import constraints into HBase metastore > --- > > Key: HIVE-15401 > URL: https://issues.apache.org/jira/browse/HIVE-15401 > Project: Hive > Issue Type: Sub-task > Components: HBase Metastore >Affects Versions: 2.1.1 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-15401.patch > > > Since HIVE-15342 added support for primary and foreign keys in the HBase > metastore we should support them in HBaseImport as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions
[ https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-15376: - Status: Patch Available (was: Open) > Improve heartbeater scheduling for transactions > --- > > Key: HIVE-15376 > URL: https://issues.apache.org/jira/browse/HIVE-15376 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, > HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch > > > HIVE-12366 improved the heartbeater logic by bringing down the gap between > the lock acquisition and first heartbeat, but that's not enough, there may > still be some issue, e.g. > Time A: a transaction is opened > Time B: acquireLocks is called (blocking call), but it can take a long time > to actually acquire the locks and return if the system is busy > Time C: as acquireLocks returns, the first heartbeat is sent > If hive.txn.timeout < C - A, then the transaction will be timed out and > aborted, thus causing failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions
[ https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-15376: - Attachment: HIVE-15376.6.patch > Improve heartbeater scheduling for transactions > --- > > Key: HIVE-15376 > URL: https://issues.apache.org/jira/browse/HIVE-15376 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, > HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch > > > HIVE-12366 improved the heartbeater logic by bringing down the gap between > the lock acquisition and first heartbeat, but that's not enough, there may > still be some issue, e.g. > Time A: a transaction is opened > Time B: acquireLocks is called (blocking call), but it can take a long time > to actually acquire the locks and return if the system is busy > Time C: as acquireLocks returns, the first heartbeat is sent > If hive.txn.timeout < C - A, then the transaction will be timed out and > aborted, thus causing failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions
[ https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-15376: - Status: Open (was: Patch Available) > Improve heartbeater scheduling for transactions > --- > > Key: HIVE-15376 > URL: https://issues.apache.org/jira/browse/HIVE-15376 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, > HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch > > > HIVE-12366 improved the heartbeater logic by bringing down the gap between > the lock acquisition and first heartbeat, but that's not enough, there may > still be some issue, e.g. > Time A: a transaction is opened > Time B: acquireLocks is called (blocking call), but it can take a long time > to actually acquire the locks and return if the system is busy > Time C: as acquireLocks returns, the first heartbeat is sent > If hive.txn.timeout < C - A, then the transaction will be timed out and > aborted, thus causing failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions
[ https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-15376: - Description: HIVE-12366 improved the heartbeater logic by bringing down the gap between the lock acquisition and first heartbeat, but that's not enough, there may still be some issue, e.g. Time A: a transaction is opened Time B: acquireLocks is called (blocking call), but it can take a long time to actually acquire the locks and return if the system is busy Time C: as acquireLocks returns, the first heartbeat is sent If hive.txn.timeout < C - A, then the transaction will be timed out and aborted, thus causing failure. > Improve heartbeater scheduling for transactions > --- > > Key: HIVE-15376 > URL: https://issues.apache.org/jira/browse/HIVE-15376 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, > HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch > > > HIVE-12366 improved the heartbeater logic by bringing down the gap between > the lock acquisition and first heartbeat, but that's not enough, there may > still be some issue, e.g. > Time A: a transaction is opened > Time B: acquireLocks is called (blocking call), but it can take a long time > to actually acquire the locks and return if the system is busy > Time C: as acquireLocks returns, the first heartbeat is sent > If hive.txn.timeout < C - A, then the transaction will be timed out and > aborted, thus causing failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
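The A/B/C timing race in the description can be made concrete with a small model. This is a sketch of the failure condition only, not Hive's actual Heartbeater code; the class and method names are hypothetical:

```java
// Models the race described above: a transaction opens at time A, but the
// first heartbeat is only sent at time C, after acquireLocks returns.
// If C - A exceeds hive.txn.timeout, the transaction is aborted.
public class HeartbeatTiming {
    static boolean timesOut(long openMs, long firstHeartbeatMs, long txnTimeoutMs) {
        return firstHeartbeatMs - openMs > txnTimeoutMs;
    }

    public static void main(String[] args) {
        long timeout = 300_000;                             // hive.txn.timeout = 5 min
        // Busy cluster: acquireLocks blocked for ~6 min before returning.
        System.out.println(timesOut(0, 360_000, timeout));  // true -> txn aborted
        // Scheduling the first heartbeat at transaction open avoids the race.
        System.out.println(timesOut(0, 1_000, timeout));    // false
    }
}
```

Scheduling heartbeats from the moment the transaction opens, rather than after the blocking lock acquisition returns, removes the dependence of C - A on lock-manager contention.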
[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742658#comment-15742658 ] Jesus Camacho Rodriguez commented on HIVE-14731: [~aplusplus], is there a reason to not enable the new edge by default? That would increase test coverage. > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, > HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, > HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, > HIVE-14731.8.patch, HIVE-14731.9.patch > > > Given cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15413) Primary key constraints forced to be unique across database and table names
[ https://issues.apache.org/jira/browse/HIVE-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742643#comment-15742643 ] Alan Gates commented on HIVE-15413: --- Just making it unique within the database seems fine to me. > Primary key constraints forced to be unique across database and table names > --- > > Key: HIVE-15413 > URL: https://issues.apache.org/jira/browse/HIVE-15413 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0 >Reporter: Alan Gates >Priority: Critical > > In the RDBMS underlying the metastore, the table that stores primary and > foreign keys has its own primary key (at the RDBMS level) of > (constraint_name, position). This means that a constraint name must be > unique across all tables and databases in a system. This is not reasonable. > Database and table name should be included in the RDBMS primary key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742640#comment-15742640 ] Gunther Hagleitner commented on HIVE-14731: --- Two quick comments: a) listing the entire output of the x prod in the golden file makes the patch unnecessarily large. Can you either do a sum(hash(...)) over the output or use smaller tables? b) this patch probably warrants more tests, unless there are other places in the q files that already cover that. > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, > HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, > HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, > HIVE-14731.8.patch, HIVE-14731.9.patch > > > Given cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
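The sum(hash(...)) suggestion keeps golden files small because the .q.out file records a single checksum row instead of every row of the cross product; a sketch against the standard test fixture (assuming the usual src table):

```sql
-- Instead of listing the full cross-product output in the golden file,
-- record one checksum row that still changes if any result row changes.
select sum(hash(a.key, a.value, b.key, b.value))
from src a join src b;
```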
[jira] [Updated] (HIVE-15294) Capture additional metadata to replicate a simple insert at destination
[ https://issues.apache.org/jira/browse/HIVE-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-15294:

    Status: Patch Available  (was: Open)

> Capture additional metadata to replicate a simple insert at destination
> ---
>
>                 Key: HIVE-15294
>                 URL: https://issues.apache.org/jira/browse/HIVE-15294
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>            Reporter: Vaibhav Gumashta
>            Assignee: Vaibhav Gumashta
>         Attachments: HIVE-15294.1.patch
>
> For replicating inserts like {{INSERT INTO ... SELECT ... FROM}}, we will
> need to capture the newly added files in the notification message to be able
> to replicate the event at destination.
[jira] [Updated] (HIVE-15294) Capture additional metadata to replicate a simple insert at destination
[ https://issues.apache.org/jira/browse/HIVE-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-15294:

    Attachment: HIVE-15294.1.patch

> Capture additional metadata to replicate a simple insert at destination
> ---
>
>                 Key: HIVE-15294
>                 URL: https://issues.apache.org/jira/browse/HIVE-15294
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>            Reporter: Vaibhav Gumashta
>            Assignee: Vaibhav Gumashta
>         Attachments: HIVE-15294.1.patch
>
> For replicating inserts like {{INSERT INTO ... SELECT ... FROM}}, we will
> need to capture the newly added files in the notification message to be able
> to replicate the event at destination.
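To make the HIVE-15294 description concrete: the point is that the insert notification must carry the list of files the INSERT produced, so the destination can replay the event by copying exactly those files. A sketch of such a message (the JSON shape, paths, and field names here are hypothetical; the real format is defined by Hive's metastore messaging layer):

```python
import json

# Hypothetical insert-event payload; only the *idea* of a "files" list
# comes from the issue, the concrete fields are illustrative.
insert_event = {
    "eventType": "INSERT",
    "db": "default",
    "table": "t1",
    # The newly added files captured at the source, which the destination
    # needs in order to replicate the insert.
    "files": [
        "hdfs://nn:8020/warehouse/t1/000000_0",
        "hdfs://nn:8020/warehouse/t1/000001_0",
    ],
}

# Notification messages travel as serialized text; round-trip it.
payload = json.dumps(insert_event)
roundtrip = json.loads(payload)
```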
[jira] [Commented] (HIVE-14735) Build Infra: Spark artifacts download takes a long time
[ https://issues.apache.org/jira/browse/HIVE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742633#comment-15742633 ] Hive QA commented on HIVE-14735:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842623/HIVE-14735.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10795 tests executed

*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=108)
	[groupby_grouping_id2.q,input17.q,bucketmapjoin12.q,ppd_gby_join.q,auto_join10.q,ptf_rcfile.q,vectorized_rcfile_columnar.q,vector_elt.q,ppd_join5.q,ppd_join.q,join_filters_overlap.q,join_cond_pushdown_1.q,timestamp_3.q,load_dyn_part6.q,stats_noscan_2.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=150)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2542/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2542/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2542/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842623 - PreCommit-HIVE-Build

> Build Infra: Spark artifacts download takes a long time
> ---
>
>                 Key: HIVE-14735
>                 URL: https://issues.apache.org/jira/browse/HIVE-14735
>             Project: Hive
>          Issue Type: Bug
>          Components: Build Infrastructure
>            Reporter: Vaibhav Gumashta
>            Assignee: Zoltan Haindrich
>         Attachments: HIVE-14735.1.patch, HIVE-14735.1.patch, HIVE-14735.1.patch, HIVE-14735.1.patch, HIVE-14735.2.patch, HIVE-14735.3.patch
>
> In particular this command:
> {{curl -Sso ./../thirdparty/spark-1.6.0-bin-hadoop2-without-hive.tgz http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.6.0-bin-hadoop2-without-hive.tgz}}
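One common way to cut the cost of a slow download like the quoted curl command is to cache the artifact and skip the transfer when it is already on disk. A hedged sketch of that idea (the wrapper function is hypothetical, not the approach the attached patch necessarily takes; only the tarball name and URL come from the issue):

```python
import os
import tempfile
import urllib.request

def fetch_if_missing(dest, url):
    # Hypothetical caching wrapper: short-circuit before any network
    # traffic when the destination file already exists.
    if os.path.exists(dest):
        return "cached"
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, dest)  # the slow step from the issue
    return "downloaded"

# Demonstrate the cache hit without touching the network: an existing
# file is returned immediately.
with tempfile.NamedTemporaryFile() as f:
    status = fetch_if_missing(
        f.name,
        "http://d3jw87u4immizc.cloudfront.net/spark-tarball/"
        "spark-1.6.0-bin-hadoop2-without-hive.tgz",
    )
print(status)  # cached
```

On a build box the same check (e.g. `[ -f "$tarball" ] || curl -Sso "$tarball" "$url"` in shell) turns every run after the first into a no-op.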