[jira] [Commented] (HIVE-15399) Parser change for UniqueJoin

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744483#comment-15744483
 ] 

Hive QA commented on HIVE-15399:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842938/HIVE-15399.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10797 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=110)

[tez_joins_explain.q,transform2.q,groupby5.q,cbo_semijoin.q,bucketmapjoin13.q,union_remove_6_subq.q,groupby2_map_multi_distinct.q,load_dyn_part9.q,multi_insert_gby2.q,vectorization_11.q,groupby_position.q,avro_compression_enabled_native.q,smb_mapjoin_8.q,join21.q,auto_join16.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=92)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[uniquejoin] 
(batchId=85)
org.apache.hadoop.hive.ql.parse.TestIUD.testSelectStarFromAnonymousVirtTable1Row
 (batchId=257)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2555/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2555/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2555/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842938 - PreCommit-HIVE-Build

> Parser change for UniqueJoin
> 
>
> Key: HIVE-15399
> URL: https://issues.apache.org/jira/browse/HIVE-15399
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15399.01.patch
>
>
> UniqueJoin was introduced in HIVE-591 ("Add Unique Join", Emil Ibrishimov 
> via namit). It seems that there is only one q test for unique join, i.e., 
> uniquejoin.q. In that q test, the unique join source only comes from a table. 
> However, in the parser, its source can come from any of
> {code}
> partitionedTableFunction | tableSource | subQuerySource | virtualTableSource
> {code}
> I think it would be better to change the parser and restrict this to match 
> users' actual usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15074) Schematool provides a way to detect invalid entries in VERSION table

2016-12-12 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744480#comment-15744480
 ] 

Lefty Leverenz commented on HIVE-15074:
---

Okay, I added a TODOC2.2 label.  Thanks.

> Schematool provides a way to detect invalid entries in VERSION table
> 
>
> Key: HIVE-15074
> URL: https://issues.apache.org/jira/browse/HIVE-15074
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Yongzhi Chen
>Assignee: Chaoyu Tang
>Priority: Minor
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15074.1.patch, HIVE-15074.patch
>
>
> For some unknown reason, we have seen that a customer's HMS cannot start 
> because there are multiple entries in its VERSION table. Schematool should 
> provide a way to validate the HMS database and offer warning and fix options 
> for this kind of issue.
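The validation described above is essentially a row-count check on the VERSION table. A minimal sketch, using sqlite3 as a stand-in for the metastore RDBMS (the exact VERSION column names here are assumptions, not guaranteed to match every metastore schema version):

```python
import sqlite3

def validate_version_table(conn):
    """Return (ok, message) after checking the metastore VERSION table.

    A healthy metastore has exactly one row; zero rows means the schema
    was never initialized, and multiple rows prevent HMS startup.
    """
    rows = conn.execute("SELECT SCHEMA_VERSION FROM VERSION").fetchall()
    if len(rows) == 0:
        return False, "VERSION table is empty; initialize the schema first"
    if len(rows) > 1:
        versions = sorted({r[0] for r in rows})
        return False, "VERSION table has %d rows (versions: %s); cleanup needed" % (
            len(rows), ", ".join(versions))
    return True, "schema version %s" % rows[0][0]

# Demo with an in-memory database standing in for the metastore RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VERSION (VER_ID INT, SCHEMA_VERSION TEXT, VERSION_COMMENT TEXT)")
conn.execute("INSERT INTO VERSION VALUES (1, '2.1.0', 'Hive release version 2.1.0')")
conn.execute("INSERT INTO VERSION VALUES (2, '2.1.0', 'duplicate entry')")
ok, msg = validate_version_table(conn)
```

A "fix option" would then be a guarded DELETE keeping the single highest VER_ID row, run only after the warning is acknowledged.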



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15074) Schematool provides a way to detect invalid entries in VERSION table

2016-12-12 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15074:
--
Labels: TODOC2.2  (was: )

> Schematool provides a way to detect invalid entries in VERSION table
> 
>
> Key: HIVE-15074
> URL: https://issues.apache.org/jira/browse/HIVE-15074
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Yongzhi Chen
>Assignee: Chaoyu Tang
>Priority: Minor
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15074.1.patch, HIVE-15074.patch
>
>
> For some unknown reason, we have seen that a customer's HMS cannot start 
> because there are multiple entries in its VERSION table. Schematool should 
> provide a way to validate the HMS database and offer warning and fix options 
> for this kind of issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15423) Allowing Hive to reverse map IP from hostname for partition info

2016-12-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744434#comment-15744434
 ] 

ASF GitHub Bot commented on HIVE-15423:
---

GitHub user subahugu opened a pull request:

https://github.com/apache/hive/pull/122

HIVE-15423: Allowing Hive to reverse map IP from hostname for partition info

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/subahugu/hive branch-1.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/122.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #122


commit 754783def2ac81b54f1a24403a78b9b47ed6e091
Author: suresh.bahuguna 
Date:   2016-12-13T07:23:55Z

HIVE-15423: Allowing Hive to reverse map IP from hostname for partition info




> Allowing Hive to reverse map IP from hostname for partition info
> 
>
> Key: HIVE-15423
> URL: https://issues.apache.org/jira/browse/HIVE-15423
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Suresh Bahuguna
>
> Hive - Namenode hostname mismatch when running queries with 2 MR jobs.
> Hive tries to find partition info using hdfs://<hostname>:<port>, whereas 
> the info has been keyed using hdfs://<ip>:<port>.
> Exception raised in HiveFileFormatUtils.java:
> -
> java.io.IOException: cannot find dir = 
> hdfs://hd-nn-24:9000/tmp/hive-admin/hive_2013-08-30_06-11-52_007_1545561832334194535/-mr-10002/00_0
>  in pathToPartitionInfo: 
> [hdfs://192.168.156.24:9000/tmp/hive-admin/hive_2013-08-30_06-11-52_007_1545561832334194535/-mr-10002]
> at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java
> -
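One way to make the hostname and IP spellings of the namenode compare equal is to canonicalize the authority of every path before using it as a map key. A hedged sketch of that idea; the `resolve` callable here is a hypothetical stand-in (a real implementation would use reverse/forward DNS, e.g. `socket.gethostbyname`, with caching):

```python
def normalize_authority(path, resolve):
    """Rewrite the host in a scheme://host:port/... path via `resolve`,
    so hostname- and IP-based spellings of the same namenode produce
    identical map keys."""
    scheme, sep, rest = path.partition("://")
    if not sep:
        return path  # no authority to rewrite
    authority, slash, tail = rest.partition("/")
    host, colon, port = authority.partition(":")
    return scheme + "://" + resolve(host) + colon + port + slash + tail

# Static table standing in for DNS; real code would resolve dynamically.
table = {"hd-nn-24": "192.168.156.24"}
resolve = lambda h: table.get(h, h)

key = normalize_authority("hdfs://hd-nn-24:9000/tmp/hive-admin/-mr-10002", resolve)
# key == "hdfs://192.168.156.24:9000/tmp/hive-admin/-mr-10002"
```

With both the pathToPartitionInfo keys and the lookup path run through the same normalization, the mismatch in the exception above disappears.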



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset

2016-12-12 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15422:

Summary: HiveInputFormat::pushProjectionsAndFilters paths comparison 
generates huge number of objects for partitioned dataset  (was: 
HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number 
of objects for partitioned dataset)

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge 
> number of objects for partitioned dataset
> 
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15422.1.patch, Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) on a 5-node 
> cluster, significant GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has around 7,000 partitions in S3. Profiling revealed a 
> large number of objects created just for path comparisons in 
> HiveInputFormat. HIVE-15405 reduces the number of path comparisons in 
> FileUtils, but it still ends up doing many comparisons in 
> HiveInputFormat::pushProjectionsAndFilters.
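The object churn described above comes from re-deriving a comparable form of each partition path once per (alias, split) pair. A minimal sketch of the general remedy, precomputing a canonical index once per query; the normalization shown is a deliberate simplification, not Hive's actual path logic:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def canonical(path):
    # Strip the trailing slash so "dir" and "dir/" compare equal; real
    # normalization also has to handle scheme/authority differences.
    return path.rstrip("/")

class PartitionIndex:
    """Canonicalize the partition paths once per query instead of for
    every input split, trading a little memory for far fewer transient
    objects during split processing."""
    def __init__(self, partition_paths):
        self._index = {canonical(p) for p in partition_paths}

    def contains(self, split_path):
        return canonical(split_path) in self._index

# 7,000 partitions built once; each split lookup is then a single
# cached normalization plus a hash probe.
idx = PartitionIndex("s3a://bucket/flights/dt=%04d/" % d for d in range(7000))
hit = idx.contains("s3a://bucket/flights/dt=0042")
```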



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number of objects for partitioned dataset

2016-12-12 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15422:

Status: Patch Available  (was: Open)

> HiveInputFormat::pushProjectionsAndFilters path comparisons create huge 
> number of objects for partitioned dataset
> -
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15422.1.patch, Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) on a 5-node 
> cluster, significant GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has around 7,000 partitions in S3. Profiling revealed a 
> large number of objects created just for path comparisons in 
> HiveInputFormat. HIVE-15405 reduces the number of path comparisons in 
> FileUtils, but it still ends up doing many comparisons in 
> HiveInputFormat::pushProjectionsAndFilters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number of objects for partitioned dataset

2016-12-12 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15422:

Attachment: Profiler_Snapshot_HIVE-15422.png

> HiveInputFormat::pushProjectionsAndFilters path comparisons create huge 
> number of objects for partitioned dataset
> -
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15422.1.patch, Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) on a 5-node 
> cluster, significant GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has around 7,000 partitions in S3. Profiling revealed a 
> large number of objects created just for path comparisons in 
> HiveInputFormat. HIVE-15405 reduces the number of path comparisons in 
> FileUtils, but it still ends up doing many comparisons in 
> HiveInputFormat::pushProjectionsAndFilters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number of objects for partitioned dataset

2016-12-12 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15422:

Attachment: HIVE-15422.1.patch

> HiveInputFormat::pushProjectionsAndFilters path comparisons create huge 
> number of objects for partitioned dataset
> -
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15422.1.patch, Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) on a 5-node 
> cluster, significant GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has around 7,000 partitions in S3. Profiling revealed a 
> large number of objects created just for path comparisons in 
> HiveInputFormat. HIVE-15405 reduces the number of path comparisons in 
> FileUtils, but it still ends up doing many comparisons in 
> HiveInputFormat::pushProjectionsAndFilters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor

2016-12-12 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-15386:
--
Affects Version/s: (was: 2.2.0)
   2.1.1

> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor
> 
>
> Key: HIVE-15386
> URL: https://issues.apache.org/jira/browse/HIVE-15386
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.1.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.2.0
>
> Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, 
> HIVE-15386.002.patch
>
>
> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor so that this information can be used by Hive hooks to 
> monitor Spark jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor

2016-12-12 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-15386:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Zhihai for the contribution and Xuefu for the 
review :)

> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor
> 
>
> Key: HIVE-15386
> URL: https://issues.apache.org/jira/browse/HIVE-15386
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.2.0
>
> Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, 
> HIVE-15386.002.patch
>
>
> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor so that this information can be used by Hive hooks to 
> monitor Spark jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-12 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744366#comment-15744366
 ] 

Rui Li commented on HIVE-13278:
---

Hi [~xuefuz], the conclusion is that we somehow try to read reduce.xml for 
map-only jobs, and yes, it happens with MR as well. The call path is 
{{HiveOutputFormatImpl.checkOutputSpecs -> Utilities.getMapRedWork}}. The 
reason HiveOutputFormatImpl needs the MapRedWork is that it has to run some 
checks on all the FileSink (FS) operators. Since an FS only exists at the end 
of a job, my suggestion is that we first try to get the MapWork. If the 
MapWork has an FS in it, this is a map-only job, so we don't have to look for 
the ReduceWork. But [~stakiar] found that some map-only jobs may not have an 
FS in the MapWork, e.g. {{ANALYZE TABLE}}. For a complete fix, we'd need a 
flag in the JobConf indicating whether the job is map-only. Or we can use my 
solution, which solves the issue for most cases.

Some special handling for HoS may be needed: for HoS, map.xml and reduce.xml 
reside in different paths. We can use {{mapred.task.is.map}} to determine 
whether the JobConf is for the MapWork or the ReduceWork, and then call 
getMapWork or getReduceWork respectively.
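The decision logic suggested above can be sketched as follows. The `MapWork`/`JobConf` shapes and the `hive.job.is.map.only` flag are hypothetical simplifications for illustration, not real Hive names:

```python
def plan_files_to_fetch(map_work, job_conf):
    """Decide which plan files to read: fetch map.xml always; skip
    reduce.xml when the MapWork already contains a FileSink operator
    ("FS"), since FS only appears at the end of a job, meaning the job
    is map-only."""
    files = ["map.xml"]
    map_only = "FS" in map_work["operators"]
    # Some map-only jobs (e.g. ANALYZE TABLE) have no FS in the MapWork,
    # so a definitive answer needs an explicit flag in the JobConf;
    # the flag name here is made up for the sketch.
    if job_conf.get("hive.job.is.map.only") is not None:
        map_only = job_conf["hive.job.is.map.only"]
    if not map_only:
        files.append("reduce.xml")
    return files
```

The FS heuristic alone covers most cases and avoids the namenode lookup for the missing reduce.xml; the flag closes the remaining gap.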

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Sahil Takiar
>Priority: Minor
>
> Many redundant 'File not found' messages appear in the container logs during 
> query execution with Hive on Spark.
> This doesn't prevent the query from running successfully, so the issue is 
> marked as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15399) Parser change for UniqueJoin

2016-12-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15399:
---
Status: Patch Available  (was: Open)

[~ashutoshc], could you take a look? Thanks.

> Parser change for UniqueJoin
> 
>
> Key: HIVE-15399
> URL: https://issues.apache.org/jira/browse/HIVE-15399
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15399.01.patch
>
>
> UniqueJoin was introduced in HIVE-591 ("Add Unique Join", Emil Ibrishimov 
> via namit). It seems that there is only one q test for unique join, i.e., 
> uniquejoin.q. In that q test, the unique join source only comes from a table. 
> However, in the parser, its source can come from any of
> {code}
> partitionedTableFunction | tableSource | subQuerySource | virtualTableSource
> {code}
> I think it would be better to change the parser and restrict this to match 
> users' actual usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15399) Parser change for UniqueJoin

2016-12-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15399:
---
Attachment: HIVE-15399.01.patch

> Parser change for UniqueJoin
> 
>
> Key: HIVE-15399
> URL: https://issues.apache.org/jira/browse/HIVE-15399
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15399.01.patch
>
>
> UniqueJoin was introduced in HIVE-591 ("Add Unique Join", Emil Ibrishimov 
> via namit). It seems that there is only one q test for unique join, i.e., 
> uniquejoin.q. In that q test, the unique join source only comes from a table. 
> However, in the parser, its source can come from any of
> {code}
> partitionedTableFunction | tableSource | subQuerySource | virtualTableSource
> {code}
> I think it would be better to change the parser and restrict this to match 
> users' actual usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by

2016-12-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13452:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

pushed to master. Thanks [~ashutoshc] for the review!

> StatsOptimizer should return no rows on empty table with group by
> -
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.0.0, 1.2.0, 1.1.0, 2.0.0, 2.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-13452.01.patch
>
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with the 
> StatsOptimizer on, Hive returns one row with value 0.
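The expected semantics can be illustrated with a small reference implementation of COUNT with GROUP BY: an empty input produces zero groups and therefore zero output rows, which is exactly what a stats shortcut that answers COUNT from metadata (without noticing the GROUP BY) fails to reproduce:

```python
from collections import defaultdict

def count_group_by(rows, key_fn):
    """Reference semantics of SELECT COUNT(1) ... GROUP BY key:
    one output row per group, so an empty input yields zero rows."""
    groups = defaultdict(int)
    for row in rows:
        groups[key_fn(row)] += 1
    return list(groups.values())

# Empty table: the correct answer is an empty result set.
correct = count_group_by([], lambda r: 1)
# A stats-based shortcut that unconditionally answers COUNT from table
# statistics would instead emit a single row -- the bug described above.
buggy_stats_answer = [0]
```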



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by

2016-12-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13452:
---
Affects Version/s: 1.0.0
   1.2.0
   1.1.0
   2.0.0
   2.1.0

> StatsOptimizer should return no rows on empty table with group by
> -
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.0.0, 1.2.0, 1.1.0, 2.0.0, 2.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-13452.01.patch
>
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with the 
> StatsOptimizer on, Hive returns one row with value 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744204#comment-15744204
 ] 

Hive QA commented on HIVE-13452:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842907/HIVE-13452.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10782 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2554/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2554/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2554/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842907 - PreCommit-HIVE-Build

> StatsOptimizer should return no rows on empty table with group by
> -
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13452.01.patch
>
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with the 
> StatsOptimizer on, Hive returns one row with value 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744091#comment-15744091
 ] 

Xuefu Zhang edited comment on HIVE-13278 at 12/13/16 4:15 AM:
--

[~lirui], I'm not sure I understand the conclusion here, so a summary would 
be great. The annoying log is one thing, but another, more important problem 
is that trying to read reduce.xml for map-only tasks puts load on the 
namenode. I'm wondering if it's possible to avoid making that call. Can you 
share your thoughts? Thanks.

BTW, this seems to happen with Hive on MR as well.


was (Author: xuefuz):
[~lirui], I'm not sure if I understand the conclusion here, so a summary would 
be great. Annoying log is one thing, but another, more important problem is 
that trying to read reduce.xml for map-only tasks puts load on namenode. I'm 
wondering if it's possible to avoid making that call. Can you share your 
thoughts? Thanks.

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Sahil Takiar
>Priority: Minor
>
> Many redundant 'File not found' messages appear in the container logs during 
> query execution with Hive on Spark.
> This doesn't prevent the query from running successfully, so the issue is 
> marked as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744091#comment-15744091
 ] 

Xuefu Zhang commented on HIVE-13278:


[~lirui], I'm not sure I understand the conclusion here, so a summary would 
be great. The noisy log is one thing, but another, more important problem is 
that trying to read reduce.xml for map-only tasks puts load on the namenode. 
I'm wondering if it's possible to avoid making that call. Can you share your 
thoughts? Thanks.
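For illustration, the kind of guard being discussed might look like the sketch below. The names are hypothetical, not Hive's actual Utilities code: the idea is to decide from the compiled plan whether the task has reduce work, and skip the reduce.xml lookup (and hence the namenode RPC) entirely for map-only tasks.

```java
// Hypothetical sketch, not Hive's actual Utilities code: gate the
// reduce.xml lookup on whether the task has reduce work at all, so
// map-only tasks never issue the namenode RPC that produces the
// "File not found" noise.
public class PlanFetchGuard {
    // In real Hive this flag would come from the compiled plan; here it
    // is just a parameter for illustration.
    public static boolean shouldFetchReducePlan(boolean hasReduceWork) {
        return hasReduceWork; // skip the HDFS call for map-only work
    }

    public static String planFileFor(boolean hasReduceWork) {
        // Map-only tasks need only map.xml; tasks with a reducer also
        // need reduce.xml.
        return shouldFetchReducePlan(hasReduceWork) ? "reduce.xml" : "map.xml";
    }

    public static void main(String[] args) {
        System.out.println(planFileFor(false)); // map-only task
        System.out.println(planFileFor(true));  // task with a reducer
    }
}
```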

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Sahil Takiar
>Priority: Minor
>
> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark.
> It doesn't prevent the query from running successfully, so it is marked 
> as Minor for now.





[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744088#comment-15744088
 ] 

Hive QA commented on HIVE-14007:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842906/HIVE-14007.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 45 failed/errored test(s), 9958 tests 
executed
*Failed tests:*
{noformat}
TestBitFieldReader - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestBitPack - did not produce a TEST-*.xml file (likely timed out) (batchId=237)
TestColumnStatistics - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestColumnStatisticsImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestDataReaderProperties - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestDynamicArray - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestFileDump - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestInStream - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestIntegerCompressionReader - did not produce a TEST-*.xml file (likely timed 
out) (batchId=236)
TestJsonFileDump - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestMemoryManager - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=133)

[mapreduce2.q,orc_llap_counters1.q,bucket6.q,insert_into1.q,empty_dir_in_table.q,orc_merge1.q,script_env_var1.q,orc_merge_diff_fs.q,llapdecider.q,load_hdfs_file_with_space_in_the_name.q,llap_nullscan.q,orc_ppd_basic.q,transform_ppr1.q,rcfile_merge4.q,orc_merge3.q]
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=137)

[orc_merge2.q,insert_into2.q,reduce_deduplicate.q,orc_llap_counters.q,cte_4.q,schemeAuthority2.q,file_with_header_footer.q,rcfile_merge3.q]
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestNewIntegerEncoding - did not produce a TEST-*.xml file (likely timed out) 
(batchId=238)
TestOrcNullOptimization - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestOrcTimezone1 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestOrcTimezone2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestOrcTimezone3 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestOrcWideTable - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestOutStream - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestRLEv2 - did not produce a TEST-*.xml file (likely timed out) (batchId=236)
TestReaderImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestRecordReaderImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestRunLengthByteReader - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestRunLengthIntegerReader - did not produce a TEST-*.xml file (likely timed 
out) (batchId=237)
TestSchemaEvolution - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestSerializationUtils - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestStreamName - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestStringDictionary - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestStringRedBlackTree - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestTypeDescription - did not produce a TEST-*.xml file (likely timed out) 
(batchId=238)
TestUnrolledBitPack - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestVectorOrcFile - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
TestZlib - did not produce a TEST-*.xml file (likely timed out) (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.

[jira] [Commented] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor

2016-12-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744066#comment-15744066
 ] 

Xuefu Zhang commented on HIVE-15386:


Patch looks good to me as well. +1

> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor
> 
>
> Key: HIVE-15386
> URL: https://issues.apache.org/jira/browse/HIVE-15386
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, 
> HIVE-15386.002.patch
>
>
> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor, so this information can be used by Hive hooks to monitor 
> Spark jobs.





[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743990#comment-15743990
 ] 

Sergey Shelukhin commented on HIVE-15335:
-

Did a pass of everything except FastHiveDecimalImpl.java itself. Will look at 
that later this week.

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the 
> decimal internally as a BigDecimal, with a faster version that does not 
> allocate extra objects.
> Replace the HiveDecimalWritable implementation with a faster version that 
> has new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) 
> and stores the result as a fast decimal instead of a slow byte array 
> containing a serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
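A toy illustration of the mutable* idea described above (this is NOT FastHiveDecimalImpl; a real fast decimal stores the value across several long fields): an accumulator that updates in place avoids allocating a new object per operation, which is what BigDecimal-based arithmetic forces.

```java
import java.math.BigDecimal;

// Toy illustration only: the point is just that in-place updates avoid
// one allocation per arithmetic operation, unlike immutable BigDecimal.
public class MutableDecimalDemo {
    static final class MutableDecimal {
        long unscaled; // value * 10^scale, assuming it fits in a long
        final int scale;

        MutableDecimal(long unscaled, int scale) {
            this.unscaled = unscaled;
            this.scale = scale;
        }

        // In-place add of a value with the same scale: no new object.
        void mutableAdd(long otherUnscaled) {
            unscaled += otherUnscaled;
        }

        BigDecimal toBigDecimal() {
            return BigDecimal.valueOf(unscaled, scale);
        }
    }

    // Sum scaled values into a single reused accumulator.
    public static BigDecimal sumScaled(long[] unscaledValues, int scale) {
        MutableDecimal acc = new MutableDecimal(0, scale);
        for (long v : unscaledValues) {
            acc.mutableAdd(v);
        }
        return acc.toBigDecimal();
    }

    public static void main(String[] args) {
        // 1.25 + 1.25 + 1.25 at scale 2, one accumulator object in total
        System.out.println(sumScaled(new long[] {125, 125, 125}, 2)); // 3.75
    }
}
```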





[jira] [Commented] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743964#comment-15743964
 ] 

Hive QA commented on HIVE-15421:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842890/HIVE-15421.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10811 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2552/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2552/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2552/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842890 - PreCommit-HIVE-Build

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15421.1.patch
>
>
> In localizeResource, once we get an IOException, we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, we may still get an 
> IOException (RemoteException) because copyFromLocalFile fails in a specific 
> environment, for example in a kerberized HDFS encryption zone where the TGT 
> has expired.
> We'd better fail early with a different message to avoid confusion.
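A minimal sketch of the failure handling the description argues for, with hypothetical names rather than the real DagUtils API: treat an IOException as a concurrent-writer race only when the destination actually materialized, and otherwise fail early with a distinct message.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hedged sketch; the names are hypothetical, not Hive's DagUtils API.
// An IOException is treated as "another thread won the race" only when
// the destination actually exists; otherwise we fail early instead of
// waiting for a file that will never appear.
public class LocalizeSketch {
    interface Copier {
        void copy(Path dst) throws IOException;
    }

    public static void localize(Copier copier, Path dst) throws IOException {
        try {
            copier.copy(dst);
        } catch (IOException e) {
            if (Files.exists(dst)) {
                return; // plausible concurrent writer: the file is there
            }
            // e.g. the copy failed because the TGT expired
            throw new IOException("copy to " + dst
                + " failed and no concurrent writer produced the file", e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("localize-demo");
        Path present = tmp.resolve("resource.jar");
        Files.createFile(present);
        // The copy "fails", but the destination exists: treated as a race.
        localize(dst -> { throw new IOException("simulated failure"); }, present);
        System.out.println("recovered: destination produced by another writer");
    }
}
```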





[jira] [Updated] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way

2016-12-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15147:

Attachment: HIVE-15147.02.WIP.noout.patch

Added the cleanup and fixed the initial issues; when running the tests now I 
see cached data being used, both to read entire files and to complement file 
data.

> LLAP: use LLAP cache for non-columnar formats in a somewhat general way
> ---
>
> Key: HIVE-15147
> URL: https://issues.apache.org/jira/browse/HIVE-15147
> Project: Hive
>  Issue Type: New Feature
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15147.01.WIP.noout.patch, 
> HIVE-15147.02.WIP.noout.patch, HIVE-15147.WIP.noout.patch
>
>
> The primary goal for the first pass is caching text files. Nothing would 
> prevent other formats from using the same path, in principle, although, as 
> was originally done with ORC, it may be better to have native caching support 
> optimized for each particular format.
> Given that caching pure text is not smart, and we already have ORC-encoded 
> cache that is columnar due to ORC file structure, we will transform data into 
> columnar ORC.
> The general idea is to treat all the data in the world as merely ORC that was 
> compressed with some poor compression codec, such as csv. Using the original 
> IF and serde, as well as an ORC writer (with some heavyweight optimizations 
> disabled, potentially), we can "uncompress" the csv/whatever data into its 
> "original" ORC representation, then cache it efficiently, by column, and also 
> reuse a lot of the existing code.
> Various other points:
> 1) Caching granularity will have to be somehow determined (i.e. how do we 
> slice the file horizontally, to avoid caching entire columns). As with ORC 
> uncompressed files, the specific offsets don't really matter as long as they 
> are consistent between reads. The problem is that the file offsets will 
> actually need to be propagated to the new reader from the original 
> inputformat. Row counts are easier to use but there's a problem of how to 
> actually map them to missing ranges to read from disk.
> 2) Obviously, for row-based formats, if any one column that is to be read has 
> been evicted or is otherwise missing, "all the columns" have to be read for 
> the corresponding slice to cache and read that one column. The vague plan is 
> to handle this implicitly, similarly to how ORC reader handles CB-RG overlaps 
> - it will just so happen that a missing column in disk range list to retrieve 
> will expand the disk-range-to-read into the whole horizontal slice of the 
> file.
> 3) Granularity/etc. won't work for gzipped text. If anything at all is 
> evicted, the entire file has to be re-read. Gzipped text is a ridiculous 
> feature, so this is by design.
> 4) In the future, it would be possible to also build some form of 
> metadata/indexes for this cached data to do PPD, etc. This is out of scope 
> for now.
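Point 1 above can be sketched minimally, under the assumption that fixed-size horizontal slices are acceptable: if boundaries are derived only from the file length and a target slice size, independent readers compute identical offsets, so cached slices line up across reads.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the slicing idea, assuming fixed-size horizontal
// slices: boundaries depend only on the file length and a target slice
// size, so independent readers always compute the same [start, end)
// offsets and cache entries are consistent between reads.
public class SliceBoundaries {
    public static List<long[]> slices(long fileLen, long sliceSize) {
        List<long[]> out = new ArrayList<>();
        for (long start = 0; start < fileLen; start += sliceSize) {
            out.add(new long[] { start, Math.min(start + sliceSize, fileLen) });
        }
        return out;
    }

    public static void main(String[] args) {
        // A 250-byte file with 100-byte slices always yields the same cuts.
        for (long[] s : slices(250, 100)) {
            System.out.println(s[0] + "-" + s[1]);
        }
    }
}
```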





[jira] [Updated] (HIVE-15313) Add export spark.yarn.archive or spark.yarn.jars variable in Hive on Spark document

2016-12-12 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-15313:

Description: 
According to 
[wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started],
 run queries in HOS16 and HOS20 in yarn mode.
Following table shows the difference in query time between HOS16 and HOS20.
||Version||Total time||Time for Jobs||Time for preparing jobs||
|Spark16|51|39|12|
|Spark20|54|40|14| 

 HOS20 spends more time(2 secs) on preparing jobs than HOS16. After reviewing 
the source code of spark, found that following point causes this:
code:[Client#distribute|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L546],
 In spark20, if spark cannot find spark.yarn.archive and spark.yarn.jars in 
spark configuration file, it will first copy all jars in $SPARK_HOME/jars to a 
tmp directory and upload the tmp directory to distribute cache. Comparing 
[spark16|https://github.com/apache/spark/blob/branch-1.6/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1145],
 
In spark16, it searches spark-assembly*.jar and upload it to distribute cache.

In spark20, it spends 2 more seconds to copy all jars in $SPARK_HOME/jar to a 
tmp directory if we don't set "spark.yarn.archive" or "spark.yarn.jars".

We can accelerate the startup of hive on spark 20 by settintg 
"spark.yarn.archive" or "spark.yarn.jars":
set "spark.yarn.archive":
{code}
cd $SPARK_HOME/jars
zip spark-archive.zip ./*.jar # this is important, enter the jars folder then 
zip
$ hadoop fs -copyFromLocal spark-archive.zip 
$ echo "spark.yarn.archive=hdfs:///xxx:8020/spark-archive.zip" >> 
conf/spark-defaults.conf
{code}
set "spark.yarn.jars":
{code}
$ hadoop fs mkdir spark-2.0.0-bin-hadoop 
$hadoop fs -copyFromLocal $SPARK_HOME/jars/* spark-2.0.0-bin-hadoop 
$ echo "spark.yarn.jars=hdfs:///xxx:8020/spark-2.0.0-bin-hadoop/*" >> 
conf/spark-defaults.conf
{code}

Suggest to add this part in wiki.

performance.improvement.after.set.spark.yarn.archive.PNG shows the detail 
performance impovement after setting spark.yarn.archive in small queries.





  was:
According to the 
[wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started],
 we ran queries on HOS16 and HOS20 in yarn mode.
The following table shows the difference in query time between HOS16 and HOS20.
||Version||Total time||Time for Jobs||Time for preparing jobs||
|Spark16|51|39|12|
|Spark20|54|40|14|

HOS20 spends more time (2 secs) on preparing jobs than HOS16. After reviewing 
the Spark source code, we found the cause in 
[Client#distribute|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L546]:
in Spark 2.0, if Spark cannot find spark.yarn.archive or spark.yarn.jars in 
the Spark configuration file, it first copies all jars in $SPARK_HOME/jars to 
a tmp directory and uploads that directory to the distributed cache. Compare 
[spark16|https://github.com/apache/spark/blob/branch-1.6/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1145]:
Spark 1.6 looks for spark-assembly*.jar and uploads it to the distributed 
cache.

So Spark 2.0 spends 2 extra seconds copying all jars in $SPARK_HOME/jars to a 
tmp directory when "spark.yarn.archive" and "spark.yarn.jars" are not set.

We can accelerate the startup of Hive on Spark 2.0 by setting 
"spark.yarn.archive" or "spark.yarn.jars":
set "spark.yarn.archive":
{code}
 zip spark-archive.zip $SPARK_HOME/jars/*
$ hadoop fs -copyFromLocal spark-archive.zip 
$ echo "spark.yarn.archive=hdfs:///xxx:8020/spark-archive.zip" >> conf/spark-defaults.conf
{code}
set "spark.yarn.jars":
{code}
$ hadoop fs -mkdir spark-2.0.0-bin-hadoop
$ hadoop fs -copyFromLocal $SPARK_HOME/jars/* spark-2.0.0-bin-hadoop
$ echo "spark.yarn.jars=hdfs:///xxx:8020/spark-2.0.0-bin-hadoop/*" >> conf/spark-defaults.conf
{code}

We suggest adding this part to the wiki.

performance.improvement.after.set.spark.yarn.archive.PNG shows the detailed 
performance improvement after setting spark.yarn.archive for small queries.






> Add export spark.yarn.archive or spark.yarn.jars variable in Hive on Spark 
> document
> ---
>
> Key: HIVE-15313
> URL: https://issues.apache.org/jira/browse/HIVE-15313
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Priority: Minor
> Attachments: performance.improvement.after.set.spark.yarn.archive.PNG
>
>
> According to 
> [wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started],
>  run queries in HOS16 and HOS20 in yarn mode.
> Following table shows the difference in query time between HOS16 and HOS20.
> ||Version||Total time||Time for Jobs||Time for prep

[jira] [Commented] (HIVE-15407) add distcp to classpath by default, because hive depends on it.

2016-12-12 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743900#comment-15743900
 ] 

Fei Hui commented on HIVE-15407:


hi [~prasanth_j],
could you please review the patch and share any suggestions?

> add distcp to classpath by default, because hive depends on it. 
> 
>
> Key: HIVE-15407
> URL: https://issues.apache.org/jira/browse/HIVE-15407
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, CLI
>Affects Versions: 2.2.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-15407.1.patch
>
>
> When I run Hive queries, I get errors like the following:
> java.lang.NoClassDefFoundError: org/apache/hadoop/tools/DistCpOptions
> ...
> I dug into the code and found that Hive depends on distcp, but distcp is 
> not on the classpath by default.
> Adding distcp to the Hadoop classpath by default would be a change in the 
> Hadoop project, but the Hadoop committers will not do that; see the 
> discussion in HADOOP-13865. They propose resolving this problem in Hive.
> So I add distcp to the classpath in Hive.
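A hedged sketch of how such a dependency gap can be detected up front (this is illustrative, not the attached patch): probe for the distcp entry class at startup instead of failing deep inside query execution with a NoClassDefFoundError.

```java
// Hypothetical startup probe, not the actual HIVE-15407 patch: detect a
// missing distcp jar up front via Class.forName rather than failing deep
// inside query execution with NoClassDefFoundError.
public class DistcpProbe {
    public static boolean distcpOnClasspath() {
        try {
            Class.forName("org.apache.hadoop.tools.DistCpOptions");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (!distcpOnClasspath()) {
            // The usual remedy is what the patch proposes: put the
            // hadoop-distcp jar on the classpath in the launch scripts.
            System.out.println("distcp not on classpath");
        }
    }
}
```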





[jira] [Commented] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor

2016-12-12 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743862#comment-15743862
 ] 

Rui Li commented on HIVE-15386:
---

Thanks for the update [~zxu]. +1.
[~xuefuz] do you have any further comments?

> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor
> 
>
> Key: HIVE-15386
> URL: https://issues.apache.org/jira/browse/HIVE-15386
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, 
> HIVE-15386.002.patch
>
>
> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor, so this information can be used by Hive hooks to monitor 
> Spark jobs.





[jira] [Commented] (HIVE-15118) Remove unused 'COLUMNS' table from derby schema

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743852#comment-15743852
 ] 

Hive QA commented on HIVE-15118:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842887/HIVE-15118.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10811 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2551/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2551/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2551/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842887 - PreCommit-HIVE-Build

> Remove unused 'COLUMNS' table from derby schema
> ---
>
> Key: HIVE-15118
> URL: https://issues.apache.org/jira/browse/HIVE-15118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-15118.1.patch, HIVE-15118.2.patch, 
> HIVE-15118.3.patch
>
>
> The COLUMNS table is not used any more. Other databases have already 
> removed it; remove it from the Derby schema as well.





[jira] [Commented] (HIVE-15383) Add additional info to 'desc function extended' output

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743769#comment-15743769
 ] 

Hive QA commented on HIVE-15383:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842875/HIVE-15383.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10781 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_index] (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_stddev_pop] 
(batchId=71)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2549/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2549/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2549/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842875 - PreCommit-HIVE-Build

> Add additional info to 'desc function extended' output
> --
>
> Key: HIVE-15383
> URL: https://issues.apache.org/jira/browse/HIVE-15383
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: HIVE-15383.1.patch, HIVE-15383.2.patch
>
>
> Add additional info to the output of 'desc function extended'. The listed 
> resources would help users check which jars are referenced.





[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by

2016-12-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13452:
---
Status: Patch Available  (was: Open)

> StatsOptimizer should return no rows on empty table with group by
> -
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13452.01.patch
>
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with the 
> StatsOptimizer on, Hive returns one row with value 0.
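The expected semantics can be checked with a small sketch: GROUP BY emits one row per group and an empty input has no groups, while a global count without GROUP BY always emits exactly one row. A stats-only shortcut must preserve that distinction.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// GROUP BY emits one row per group; an empty input has no groups, so the
// grouped count must return zero rows, while a global count (no GROUP BY)
// always returns exactly one row.
public class GroupByEmptyDemo {
    public static Map<Integer, Long> countGroupedByOne(List<Integer> rows) {
        // analogous to: select count(1) from t1 group by 1
        return rows.stream()
            .collect(Collectors.groupingBy(a -> 1, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Integer> t1 = Collections.emptyList(); // empty table
        System.out.println(countGroupedByOne(t1).size()); // 0 result rows
        System.out.println(t1.stream().count()); // global count: one row, value 0
    }
}
```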





[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by

2016-12-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13452:
---
Attachment: HIVE-13452.01.patch

> StatsOptimizer should return no rows on empty table with group by
> -
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13452.01.patch
>
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with StatsOptimizer on, 
> Hive returns one row with value 0.





[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by

2016-12-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13452:
---
Status: Open  (was: Patch Available)

> StatsOptimizer should return no rows on empty table with group by
> -
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13452.01.patch
>
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with StatsOptimizer on, 
> Hive returns one row with value 0.





[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by

2016-12-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13452:
---
Attachment: (was: HIVE-13452.01.patch)

> StatsOptimizer should return no rows on empty table with group by
> -
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with StatsOptimizer on, 
> Hive returns one row with value 0.





[jira] [Updated] (HIVE-14007) Replace ORC module with ORC release

2016-12-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-14007:
-
Attachment: HIVE-14007.patch

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, 
> HIVE-14007.patch, HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.





[jira] [Commented] (HIVE-13680) HiveServer2: Provide a way to compress ResultSets

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743598#comment-15743598
 ] 

Hive QA commented on HIVE-13680:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842862/HIVE-13680.6.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10795 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=252)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2548/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2548/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2548/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842862 - PreCommit-HIVE-Build

> HiveServer2: Provide a way to compress ResultSets
> -
>
> Key: HIVE-13680
> URL: https://issues.apache.org/jira/browse/HIVE-13680
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Reporter: Vaibhav Gumashta
>Assignee: Kevin Liew
> Attachments: HIVE-13680.2.patch, HIVE-13680.3.patch, 
> HIVE-13680.4.patch, HIVE-13680.6.patch, HIVE-13680.patch, SnappyCompDe.zip, 
> proposal.pdf
>
>
> With HIVE-12049 in, we can provide an option to compress ResultSets before 
> writing to disk. The user can specify a compression library via a config 
> param which can be used in the tasks.
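The SnappyCompDe attachment on this issue suggests a pluggable compressor ("CompDe"). A minimal sketch of such a plug-in, using the JDK's Deflater as a stand-in codec (all class and method names here are hypothetical, not the proposed Hive API):

```java
// Hypothetical sketch of a pluggable result-set compressor of the kind
// HIVE-13680 proposes; the names are illustrative only. The server would pick
// the codec from a config param, and the JDBC client would apply the inverse.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CompDeSketch {
    /** Compress one serialized result-set batch before writing it out. */
    public static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out =
                 new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_SPEED))) {
            out.write(plain);
        }
        return bos.toByteArray();
    }

    /** Inverse of compress(); the client must use the matching codec. */
    public static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }
}
```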





[jira] [Commented] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743526#comment-15743526
 ] 

Daniel Dai commented on HIVE-15421:
---

Can you mention the HDFS jira and the fixed version in a code comment?

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15421.1.patch
>
>
> In localizeResource, once we get an IOException we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, copyFromLocalFile may still 
> fail with an IOException (RemoteException) in a specific environment, for 
> example a kerberized HDFS encryption zone where the TGT has expired.
> We'd better fail early with a different message to avoid confusion.
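The fail-early behavior described above can be sketched as follows (a hypothetical illustration, not the actual DagUtils code; `handleCopyFailure` and its boolean parameter stand in for the real FileSystem calls):

```java
// Hypothetical sketch of the fix HIVE-15421 asks for: only treat an
// IOException during localization as a concurrent-writer race when the
// destination file actually appeared; otherwise rethrow with a distinct
// message instead of silently assuming a race.
import java.io.IOException;

public class LocalizeSketch {
    /** destExists stands in for FileSystem.exists(dest) after a failed copy. */
    public static String handleCopyFailure(boolean destExists) throws IOException {
        if (destExists) {
            return "reuse"; // another thread localized the same file; reuse it
        }
        // Fail early with a distinct message rather than assuming a race.
        throw new IOException(
            "copyFromLocalFile failed and the destination is missing; "
            + "this is not a concurrent-writer race (expired TGT?)");
    }
}
```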





[jira] [Commented] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743445#comment-15743445
 ] 

Hive QA commented on HIVE-15386:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842852/HIVE-15386.002.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10780 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2547/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2547/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2547/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842852 - PreCommit-HIVE-Build

> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor
> 
>
> Key: HIVE-15386
> URL: https://issues.apache.org/jira/browse/HIVE-15386
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, 
> HIVE-15386.002.patch
>
>
> Expose Spark task counts and stage ID information in SparkTask from 
> SparkJobMonitor, so this information can be used by Hive hooks to monitor 
> Spark jobs.





[jira] [Commented] (HIVE-15118) Remove unused 'COLUMNS' table from derby schema

2016-12-12 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743444#comment-15743444
 ] 

Naveen Gangam commented on HIVE-15118:
--

[~ychena] It was done in the {{008-HIVE-2246.derby.sql}} file. There is a 
{{008-REVERT-HIVE-2246.derby.sql}} script that reverts this change where the 
rename occurs, but it never gets called from any of the real upgrade scripts, 
so I assume it was meant for folks who want to manually revert the change.


> Remove unused 'COLUMNS' table from derby schema
> ---
>
> Key: HIVE-15118
> URL: https://issues.apache.org/jira/browse/HIVE-15118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-15118.1.patch, HIVE-15118.2.patch, 
> HIVE-15118.3.patch
>
>
> The COLUMNS table is no longer used. Other databases have already removed it; 
> remove it from Derby as well.





[jira] [Commented] (HIVE-15118) Remove unused 'COLUMNS' table from derby schema

2016-12-12 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743435#comment-15743435
 ] 

Yongzhi Chen commented on HIVE-15118:
-

[~aihuaxu], could you point me to the script that changes columns to 
columns_old?

> Remove unused 'COLUMNS' table from derby schema
> ---
>
> Key: HIVE-15118
> URL: https://issues.apache.org/jira/browse/HIVE-15118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-15118.1.patch, HIVE-15118.2.patch, 
> HIVE-15118.3.patch
>
>
> The COLUMNS table is no longer used. Other databases have already removed it; 
> remove it from Derby as well.





[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release

2016-12-12 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743404#comment-15743404
 ] 

Owen O'Malley commented on HIVE-14007:
--

{quote}
The current behavior and the fix to encode column names was shipped in hive 2 
and this patch fundamentally changes how alter table statements/schema 
evolution works.
{quote}

Ok, it is trivial to add a configuration knob like 
"orc.ignore.names.for.evolution", which we can do. I've filed ORC-120 to handle 
the transition to the richer schema evolution.

As for your other concerns, I've presented a solution and you keep raising the 
same vague concerns. Having the two implementations of ORC is *really* 
problematic and is leading to Hive not getting bug fixes and new features. That 
will continue to worsen as the two code branches continue to diverge.

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, 
> HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.





[jira] [Updated] (HIVE-15401) Import constraints into HBase metastore

2016-12-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-15401:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Patch committed to master. Thanks, Daniel, for the review.

> Import constraints into HBase metastore
> ---
>
> Key: HIVE-15401
> URL: https://issues.apache.org/jira/browse/HIVE-15401
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Metastore
>Affects Versions: 2.1.1
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 2.2.0
>
> Attachments: HIVE-15401.patch
>
>
> Since HIVE-15342 added support for primary and foreign keys in the HBase 
> metastore we should support them in HBaseImport as well.





[jira] [Commented] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-12 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743377#comment-15743377
 ] 

Wei Zheng commented on HIVE-15421:
--

[~daijy] Can you take a look please?

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15421.1.patch
>
>
> In localizeResource, once we get an IOException we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, copyFromLocalFile may still 
> fail with an IOException (RemoteException) in a specific environment, for 
> example a kerberized HDFS encryption zone where the TGT has expired.
> We'd better fail early with a different message to avoid confusion.





[jira] [Updated] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-12 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15421:
-
Status: Patch Available  (was: Open)

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15421.1.patch
>
>
> In localizeResource, once we get an IOException we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, copyFromLocalFile may still 
> fail with an IOException (RemoteException) in a specific environment, for 
> example a kerberized HDFS encryption zone where the TGT has expired.
> We'd better fail early with a different message to avoid confusion.





[jira] [Updated] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-12 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15421:
-
Attachment: (was: HIVE-15421.1.patch)

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15421.1.patch
>
>
> In localizeResource, once we get an IOException we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, copyFromLocalFile may still 
> fail with an IOException (RemoteException) in a specific environment, for 
> example a kerberized HDFS encryption zone where the TGT has expired.
> We'd better fail early with a different message to avoid confusion.





[jira] [Updated] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-12 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15421:
-
Attachment: HIVE-15421.1.patch

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15421.1.patch
>
>
> In localizeResource, once we get an IOException we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, copyFromLocalFile may still 
> fail with an IOException (RemoteException) in a specific environment, for 
> example a kerberized HDFS encryption zone where the TGT has expired.
> We'd better fail early with a different message to avoid confusion.





[jira] [Updated] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-12 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15421:
-
Attachment: HIVE-15421.1.patch

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15421.1.patch
>
>
> In localizeResource, once we get an IOException we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, copyFromLocalFile may still 
> fail with an IOException (RemoteException) in a specific environment, for 
> example a kerberized HDFS encryption zone where the TGT has expired.
> We'd better fail early with a different message to avoid confusion.





[jira] [Resolved] (HIVE-15418) "select 'abc'" will throw 'Cannot find path in conf'

2016-12-12 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu resolved HIVE-15418.
-
Resolution: Cannot Reproduce
  Assignee: (was: Aihua Xu)

Seems it's caused by an older Hadoop version.

> "select 'abc'" will throw 'Cannot find path in conf'
> 
>
> Key: HIVE-15418
> URL: https://issues.apache.org/jira/browse/HIVE-15418
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>
> Here is the stack trace. Seems it's a regression since it worked with an 
> earlier version.
> {noformat}
> 2016-12-09T16:32:37,577 ERROR [56fa1999-ffbe-42c0-bb91-61211cd62476 main] 
> CliDriver: Failed with exception java.io.IOException:java.io.IOException: 
> Cannot find path in conf
> java.io.IOException: java.io.IOException: Cannot find path in conf
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2191)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:777)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.io.IOException: Cannot find path in conf
> at 
> org.apache.hadoop.hive.ql.io.NullRowsInputFormat.getSplits(NullRowsInputFormat.java:165)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
> ... 15 more
> {noformat}





[jira] [Commented] (HIVE-15118) Remove unused 'COLUMNS' table from derby schema

2016-12-12 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743365#comment-15743365
 ] 

Naveen Gangam commented on HIVE-15118:
--

LGTM pending tests. +1 from me.

> Remove unused 'COLUMNS' table from derby schema
> ---
>
> Key: HIVE-15118
> URL: https://issues.apache.org/jira/browse/HIVE-15118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-15118.1.patch, HIVE-15118.2.patch, 
> HIVE-15118.3.patch
>
>
> The COLUMNS table is no longer used. Other databases have already removed it; 
> remove it from Derby as well.





[jira] [Updated] (HIVE-15118) Remove unused 'COLUMNS' table from derby schema

2016-12-12 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15118:

Attachment: HIVE-15118.3.patch

patch-3: address comments. During upgrade, the columns table was renamed to 
columns_old, so we need to drop columns_old instead of columns.

> Remove unused 'COLUMNS' table from derby schema
> ---
>
> Key: HIVE-15118
> URL: https://issues.apache.org/jira/browse/HIVE-15118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-15118.1.patch, HIVE-15118.2.patch, 
> HIVE-15118.3.patch
>
>
> The COLUMNS table is no longer used. Other databases have already removed it; 
> remove it from Derby as well.





[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by

2016-12-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13452:
---
Status: Patch Available  (was: Open)

> StatsOptimizer should return no rows on empty table with group by
> -
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13452.01.patch
>
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with StatsOptimizer on, 
> Hive returns one row with value 0.





[jira] [Updated] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by

2016-12-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13452:
---
Status: Open  (was: Patch Available)

> StatsOptimizer should return no rows on empty table with group by
> -
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13452.01.patch
>
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with StatsOptimizer on, 
> Hive returns one row with value 0.





[jira] [Updated] (HIVE-15420) LLAP UI: Relativize resources to allow proxied/secured views

2016-12-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-15420:
---
Status: Patch Available  (was: Open)

> LLAP UI: Relativize resources to allow proxied/secured views 
> -
>
> Key: HIVE-15420
> URL: https://issues.apache.org/jira/browse/HIVE-15420
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Web UI
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-15420.1.patch
>
>
> If the UI is secured behind a gateway firewall instance, this allows for the 
> UI to function with a base URL like http:///proxy/
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-15420) LLAP UI: Relativize resources to allow proxied/secured views

2016-12-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-15420:
---
Description: 
If the UI is secured behind a gateway firewall instance, this allows for the UI 
to function with a base URL like http:///proxy/

NO PRECOMMIT TESTS

  was:If the UI is secured behind a gateway firewall instance, this allows for 
the UI to function with a base URL like http:///proxy/


> LLAP UI: Relativize resources to allow proxied/secured views 
> -
>
> Key: HIVE-15420
> URL: https://issues.apache.org/jira/browse/HIVE-15420
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Web UI
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-15420.1.patch
>
>
> If the UI is secured behind a gateway firewall instance, this allows for the 
> UI to function with a base URL like http:///proxy/
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-15420) LLAP UI: Relativize resources to allow proxied/secured views

2016-12-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-15420:
---
Attachment: HIVE-15420.1.patch

> LLAP UI: Relativize resources to allow proxied/secured views 
> -
>
> Key: HIVE-15420
> URL: https://issues.apache.org/jira/browse/HIVE-15420
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Web UI
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-15420.1.patch
>
>
> If the UI is secured behind a gateway firewall instance, this allows for the 
> UI to function with a base URL like http:///proxy/





[jira] [Assigned] (HIVE-15420) LLAP UI: Relativize resources to allow proxied/secured views

2016-12-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-15420:
--

Assignee: Gopal V

> LLAP UI: Relativize resources to allow proxied/secured views 
> -
>
> Key: HIVE-15420
> URL: https://issues.apache.org/jira/browse/HIVE-15420
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Web UI
>Reporter: Gopal V
>Assignee: Gopal V
>
> If the UI is secured behind a gateway firewall instance, this allows for the 
> UI to function with a base URL like http:///proxy/





[jira] [Commented] (HIVE-15397) metadata-only queries may return incorrect results with empty tables

2016-12-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743268#comment-15743268
 ] 

Ashutosh Chauhan commented on HIVE-15397:
-

+1

> metadata-only queries may return incorrect results with empty tables
> 
>
> Key: HIVE-15397
> URL: https://issues.apache.org/jira/browse/HIVE-15397
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15397.01.patch, HIVE-15397.patch
>
>
> Queries like select 1=1 from t group by 1=1 may return rows, based on 
> OneNullRowInputFormat, even if the source table is empty. For now, add some 
> basic detection of empty tables and turn this off by default (since we can't 
> know whether a table is empty or not based on there being some files, without 
> reading them).





[jira] [Commented] (HIVE-15417) Glitches using ACID's row__id hidden column

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743265#comment-15743265
 ] 

Hive QA commented on HIVE-15417:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842851/HIVE-15417.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10796 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=116)

[join39.q,bucketsortoptimize_insert_7.q,vector_distinct_2.q,join11.q,union13.q,dynamic_rdd_cache.q,auto_sortmerge_join_16.q,windowing.q,union_remove_3.q,skewjoinopt7.q,stats7.q,annotate_stats_join.q,multi_insert_lateral_view.q,ptf_streaming.q,join_1to1.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery 
(batchId=216)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2546/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2546/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2546/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842851 - PreCommit-HIVE-Build

> Glitches using ACID's row__id hidden column
> ---
>
> Key: HIVE-15417
> URL: https://issues.apache.org/jira/browse/HIVE-15417
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15417.01.patch, HIVE-15417.02.patch, 
> HIVE-15417.patch
>
>
> This only works if you turn PPD off.
> {code:sql}
> drop table if exists hello_acid;
> create table hello_acid (key int, value int)
> partitioned by (load_date date)
> clustered by(key) into 3 buckets
> stored as orc tblproperties ('transactional'='true');
> insert into hello_acid partition (load_date='2016-03-01') values (1, 1);
> insert into hello_acid partition (load_date='2016-03-02') values (2, 2);
> insert into hello_acid partition (load_date='2016-03-03') values (3, 3);
> {code}
> {code}
> hive> set hive.optimize.ppd=true;
> hive> select tid from (select row__id.transactionid as tid from hello_acid) 
> sub where tid = 15;
> FAILED: SemanticException MetaException(message:cannot find field row__id 
> from [0:load_date])
> hive> set hive.optimize.ppd=false;
> hive> select tid from (select row__id.transactionid as tid from hello_acid) 
> sub where tid = 15;
> OK
> tid
> 15
> Time taken: 0.075 seconds, Fetched: 1 row(s)
> {code}





[jira] [Commented] (HIVE-15118) Remove unused 'COLUMNS' table from derby schema

2016-12-12 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743236#comment-15743236
 ] 

Naveen Gangam commented on HIVE-15118:
--

[~aihuaxu] Functionally, the second version of the patch looks fine. Just a 
couple of nits: 
1) the {{UPDATE "APP".VERSION SET SCHEMA_VERSION=}} command should be the last 
command in this file, for a couple of reasons: a) for consistency with the other 
upgrade schema files, and b) more importantly, the version in this table is 
pretty heavily relied upon to determine the schema version, so the upgrade will 
not be considered complete until the version is set. If for some reason the 
upgrade script terminates abruptly, the version would not be set, which would be 
an indication that something went wrong during the upgrade. With the current 
patch, if the script ends right after the version is set, the drop table command 
will not be processed but the schema version would indicate that it succeeded.
2) Can you move the {{DROP TABLE}} command to a separate file, say 
037-HIVE-15118.derby.sql and then just call RUN on this file from the upgrade 
script? Thanks
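The layout suggested above can be sketched as follows (a hedged sketch: the 037-HIVE-15118.derby.sql file name and the {{UPDATE "APP".VERSION}} command come from the comment above, while the exact SCHEMA_VERSION value and WHERE clause are illustrative):

```sql
-- 037-HIVE-15118.derby.sql: the DROP in its own file, as suggested.
DROP TABLE "APP"."COLUMNS";

-- In the upgrade script itself: RUN the sub-script first, and set the
-- schema version last, so an aborted run leaves a detectable old version.
RUN '037-HIVE-15118.derby.sql';
UPDATE "APP".VERSION SET SCHEMA_VERSION='2.2.0' WHERE VER_ID=1;
```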

> Remove unused 'COLUMNS' table from derby schema
> ---
>
> Key: HIVE-15118
> URL: https://issues.apache.org/jira/browse/HIVE-15118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-15118.1.patch, HIVE-15118.2.patch
>
>
> The COLUMNS table is no longer used. Other databases have already removed it; 
> remove it from derby as well.





[jira] [Commented] (HIVE-15294) Capture additional metadata to replicate a simple insert at destination

2016-12-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743228#comment-15743228
 ] 

Thejas M Nair commented on HIVE-15294:
--

[~vgumashta] Can you please create a review board link or pull request (without 
the generated files)?


> Capture additional metadata to replicate a simple insert at destination
> ---
>
> Key: HIVE-15294
> URL: https://issues.apache.org/jira/browse/HIVE-15294
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15294.1.patch
>
>
> For replicating inserts like {{INSERT INTO ... SELECT ... FROM}}, we will 
> need to capture the newly added files in the notification message to be able 
> to replicate the event at destination. 





[jira] [Updated] (HIVE-15383) Add additional info to 'desc function extended' output

2016-12-12 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15383:

Attachment: HIVE-15383.2.patch

> Add additional info to 'desc function extended' output
> --
>
> Key: HIVE-15383
> URL: https://issues.apache.org/jira/browse/HIVE-15383
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: HIVE-15383.1.patch, HIVE-15383.2.patch
>
>
> Add additional info to the output of 'desc function extended'. The resources 
> would be helpful for the user to check which jars are referenced.





[jira] [Updated] (HIVE-15383) Add additional info to 'desc function extended' output

2016-12-12 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15383:

Attachment: (was: HIVE-15383.2.patch)

> Add additional info to 'desc function extended' output
> --
>
> Key: HIVE-15383
> URL: https://issues.apache.org/jira/browse/HIVE-15383
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: HIVE-15383.1.patch, HIVE-15383.2.patch
>
>
> Add additional info to the output of 'desc function extended'. The resources 
> would be helpful for the user to check which jars are referenced.





[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release

2016-12-12 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743156#comment-15743156
 ] 

Gunther Hagleitner commented on HIVE-14007:
---

[~owen.omalley] this partially addresses my concerns but you're being vague 
about what exactly and when. So I will try to be clear. I'm -1 on this. 
Blockers for me are:

a) change breaks compatibility. (schema evolution)

b) there's no documented and proven way to actually be able to turn these 
features/bugs around in 3 days. (many open questions, what's considered public 
v private, backwards compat criteria of the api, documentation for api itself, 
documentation of release mechanics, what are the version numbers, who votes on 
it, do a feature end to end to demonstrate.)

c) not sure why it's up to you who gets to play and who doesn't. For me, at the 
very least, anyone who's been working on ACID, ORC, LLAP, cloud/s3 or 
vectorization should keep their committership/PMC membership in ORC (or have it 
transferred/added). I'm fine if folks explicitly opt out of that. 

Given how these things are interrelated, anything else will just make it 
harder/impossible for Hive devs to do their work.



> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, 
> HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.





[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743099#comment-15743099
 ] 

Hive QA commented on HIVE-15376:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842845/HIVE-15376.6.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10795 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=123)

[groupby_complex_types.q,multigroupby_singlemr.q,mapjoin_decimal.q,groupby7.q,join5.q,bucketmapjoin_negative2.q,vectorization_div0.q,union_script.q,add_part_multiple.q,limit_pushdown.q,union_remove_17.q,uniquejoin.q,metadata_only_queries_with_filters.q,union25.q,load_dyn_part13.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2545/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2545/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2545/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842845 - PreCommit-HIVE-Build

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there may 
> still be an issue, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.
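The A/B/C timeline above boils down to one inequality; a minimal sketch with hypothetical timings (only hive.txn.timeout is a real Hive setting, the rest is illustrative):

```python
# Sketch of the timing hazard: the first heartbeat is only sent at
# Time C, after acquireLocks() returns, so a slow lock acquisition
# can outlive the transaction timeout. Numbers are hypothetical.
txn_timeout_s = 300.0       # hive.txn.timeout

txn_opened_at = 0.0         # Time A: transaction opened
locks_acquired_at = 450.0   # Time C: acquireLocks() finally returns
                            # and the first heartbeat goes out

# No heartbeat arrived within the timeout window, so the transaction
# was already reaped and aborted before Time C.
timed_out = (locks_acquired_at - txn_opened_at) > txn_timeout_s
print(timed_out)
```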





[jira] [Commented] (HIVE-15416) CAST to string does not work for large decimal numbers

2016-12-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743068#comment-15743068
 ] 

Sergey Shelukhin commented on HIVE-15416:
-

That's probably because some code goes thru float or double. I remember seeing 
code like that in some conversion cases.
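The hypothesis is easy to check outside Hive: an IEEE-754 double carries only about 15-16 significant decimal digits, so any conversion path that detours through float/double cannot round-trip a 29-digit value (a minimal Python sketch, not Hive's actual code path):

```python
# A 29-digit value like the ones in the bug report (all nines).
n = 10**29 - 1

# Round-tripping through a 64-bit double loses the low digits:
# the double's 53-bit mantissa holds far fewer than 29 digits.
roundtrip = int(float(n))

# n is odd, while roundtrip is a multiple of a large power of two,
# so the value cannot survive the detour intact.
print(n == roundtrip)   # False
```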

> CAST to string does not work for large decimal numbers
> --
>
> Key: HIVE-15416
> URL: https://issues.apache.org/jira/browse/HIVE-15416
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Pavel Benes
>
> The cast of large decimal values to string does not work and produces NULL 
> values. 
> Steps to reproduce:
> {code}
> hive> create table test_hive_bug30(decimal_col DECIMAL(30,0));
> OK
> {code}
> {code}
> hive> insert into test_hive_bug30 VALUES (123), 
> (9), 
> (99),(999);
> Query ID = benesp_20161212135717_5d16d7f4-7b84-409e-ad00-36085deaae54
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1480833176011_2469)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  1  100   0  
>  0
> 
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 7.69 s
> 
> Loading data to table default.test_hive_bug30
> Table default.test_hive_bug30 stats: [numFiles=1, numRows=4, totalSize=68, 
> rawDataSize=64]
> OK
> Time taken: 8.239 seconds
> {code}
> {code}
> hive> select CAST(decimal_col AS STRING) from test_hive_bug30;
> OK
> 123
> NULL
> NULL
> NULL
> Time taken: 0.043 seconds, Fetched: 4 row(s)
> {code}
> The numbers with 29 and 30 digits should be exported, but they are converted 
> to NULL instead. 
> The values are stored correctly as can be seen here:
> {code}
> hive> select * from test_hive_bug30;
> OK
> 123
> 9
> 99
> NULL
> Time taken: 0.447 seconds, Fetched: 4 row(s)
> {code}
> The same issue does not exist for smaller precisions (e.g. DECIMAL(10)).





[jira] [Comment Edited] (HIVE-4095) Add exchange partition in Hive

2016-12-12 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743019#comment-15743019
 ] 

Anthony Hsu edited comment on HIVE-4095 at 12/12/16 8:16 PM:
-

[~leftylev]: Based on my testing with versions 0.13.1, 1.1.0, and 2.2.0 
(trunk), the destination table should come first. I clarified the example in 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ExchangePartition.


was (Author: erwaman):
[~leftylev]: Based on my testing with versions 0.13.1, 1.1.0, and 2.2.0 
(trunk), the destination table should come first.

> Add exchange partition in Hive
> --
>
> Key: HIVE-4095
> URL: https://issues.apache.org/jira/browse/HIVE-4095
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Dheeraj Kumar Singh
> Fix For: 0.12.0
>
> Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, 
> HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, 
> HIVE-4095.part12.patch.txt, hive.4095.1.patch, hive.4095.refresh.patch, 
> hive.4095.svn.thrift.patch, hive.4095.svn.thrift.patch.refresh
>
>






[jira] [Commented] (HIVE-4095) Add exchange partition in Hive

2016-12-12 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743019#comment-15743019
 ] 

Anthony Hsu commented on HIVE-4095:
---

[~leftylev]: Based on my testing with versions 0.13.1, 1.1.0, and 2.2.0 
(trunk), the destination table should come first.
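For reference, with that ordering the statement reads as below (the table names and partition spec are hypothetical; only the destination-first argument order is the point being documented):

```sql
-- Destination table first, source table last.
ALTER TABLE target_tbl EXCHANGE PARTITION (load_date='2016-03-01')
WITH TABLE source_tbl;
```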

> Add exchange partition in Hive
> --
>
> Key: HIVE-4095
> URL: https://issues.apache.org/jira/browse/HIVE-4095
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Dheeraj Kumar Singh
> Fix For: 0.12.0
>
> Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, 
> HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, 
> HIVE-4095.part12.patch.txt, hive.4095.1.patch, hive.4095.refresh.patch, 
> hive.4095.svn.thrift.patch, hive.4095.svn.thrift.patch.refresh
>
>






[jira] [Commented] (HIVE-15397) metadata-only queries may return incorrect results with empty tables

2016-12-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743015#comment-15743015
 ] 

Sergey Shelukhin commented on HIVE-15397:
-

[~ashutoshc] ping?

> metadata-only queries may return incorrect results with empty tables
> 
>
> Key: HIVE-15397
> URL: https://issues.apache.org/jira/browse/HIVE-15397
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15397.01.patch, HIVE-15397.patch
>
>
> Queries like select 1=1 from t group by 1=1 may return rows, based on 
> OneNullRowInputFormat, even if the source table is empty. For now, add some 
> basic detection of empty tables and turn this off by default (since we can't 
> know whether a table is empty or not based on there being some files, without 
> reading them).





[jira] [Commented] (HIVE-15108) allow Hive script to skip hadoop version check and HBase classpath

2016-12-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742999#comment-15742999
 ] 

Sergey Shelukhin commented on HIVE-15108:
-

Is the hive tool itself documented? If yes, I can add them there

> allow Hive script to skip hadoop version check and HBase classpath
> --
>
> Key: HIVE-15108
> URL: https://issues.apache.org/jira/browse/HIVE-15108
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-15108.patch, HIVE-15108.patch
>
>
> Both will be performed by default, as before





[jira] [Commented] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-12 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743001#comment-15743001
 ] 

Anthony Hsu commented on HIVE-15353:


Canceled and resubmitted patch. Will see if PreCommit tests run.

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.
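The actual fix belongs in the Java metastore, but the defensive pattern being requested can be sketched in a few lines (the names below are hypothetical illustrations, not the real HiveMetaStore API):

```python
# Sketch of the requested null check: reject a missing cols list up
# front instead of letting create_table/alter_table/alter_partition
# fail later with a NullPointerException. Names are hypothetical.
def validate_storage_descriptor(sd: dict) -> None:
    if sd.get("cols") is None:
        raise ValueError("StorageDescriptor.cols must not be null")

validate_storage_descriptor({"cols": [("key", "int")]})  # accepted

try:
    validate_storage_descriptor({"cols": None})          # rejected up front
except ValueError as err:
    print(err)
```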





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-12 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Open  (was: Patch Available)

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-12 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Patch Available  (was: Open)

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.





[jira] [Updated] (HIVE-13680) HiveServer2: Provide a way to compress ResultSets

2016-12-12 Thread Kevin Liew (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Liew updated HIVE-13680:
--
Attachment: HIVE-13680.6.patch

Latest patch attached. The server-side framework will remain in this JIRA while 
the sample compressor moves to HIVE-15384.

> HiveServer2: Provide a way to compress ResultSets
> -
>
> Key: HIVE-13680
> URL: https://issues.apache.org/jira/browse/HIVE-13680
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Reporter: Vaibhav Gumashta
>Assignee: Kevin Liew
> Attachments: HIVE-13680.2.patch, HIVE-13680.3.patch, 
> HIVE-13680.4.patch, HIVE-13680.6.patch, HIVE-13680.patch, SnappyCompDe.zip, 
> proposal.pdf
>
>
> With HIVE-12049 in, we can provide an option to compress ResultSets before 
> writing to disk. The user can specify a compression library via a config 
> param which can be used in the tasks.





[jira] [Updated] (HIVE-15384) Compressor plugin

2016-12-12 Thread Kevin Liew (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Liew updated HIVE-15384:
--
Summary: Compressor plugin  (was: Compressor plugin framework)

> Compressor plugin
> -
>
> Key: HIVE-15384
> URL: https://issues.apache.org/jira/browse/HIVE-15384
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ziyang Zhao
>
> Splitting the server framework into a separate JIRA from the compressor





[jira] [Issue Comment Deleted] (HIVE-14007) Replace ORC module with ORC release

2016-12-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-14007:
-
Comment: was deleted

(was: bq.
The other thing I think we need community wide clarity on before you rip out 
orc is how we’re going to keep developing hive afterwards. Right now there’s a 
cyclic dependency. Hive -> ORC -> Hive - because of a shared storage api.

There is agreement within Hive to release the storage-api independently from 
Hive.  That would break the cycle and allow a non-cyclic release process. I'll 
file a Hive jira to do that work. Avoiding having two copies of the code makes the 
whole ecosystem stronger by making sure that fixes get applied everywhere. I'd 
suggest leaving storage-api in the Hive source tree rather than making its own 
git repository. 

bq.
There are features that touch all three. And it turns out these are more 
frequent than expected. 

They come in waves. In the last three months, there have been 2 changes to 
storage-api.
Most of the patches are in either storage-api or ORC.  For example, HIVE-14453 
only touches ORC.

bq.
How do you propose to handle development and release of these features given 
the cyclic dependency? How do you work out feature branches/ snapshots?

For changes that touch one or the other, you'd commit the relevant change and 
release either storage-api or ORC and have a jira that updates the version in 
Hive. In the worst case, where the change spreads among the three artifacts, 
you would:

* commit to storage-api & ORC
* release them
* upgrade the pom in Hive

bq.
If a successful feature commit requires sequential hive and orc releases, then 
that means minimum several months before commit and that's not great. How will 
this be done?

No, ORC releases typically take 3 days. Storage API is much simpler and should 
also take 3 days. By being much smaller and more focused, they are much more 
nimble. Furthermore, the two votes could completely overlap, so the total time 
to get the change into Hive would be roughly 3 days. 

bq.
Looking over the PMC and committer lists in ORC it looks like many people 
working on ACID, vectorization or llap will lose the ability to do what they 
are doing today with this change.

When we set up the ORC project, we were pretty inclusive in the committer list 
and we continue to add new committers and PMC members. I'll take a look at the 
contributors to the Hive ORC module to look for new committers.)

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, 
> HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.





[jira] [Commented] (HIVE-15294) Capture additional metadata to replicate a simple insert at destination

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742927#comment-15742927
 ] 

Hive QA commented on HIVE-15294:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842837/HIVE-15294.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10810 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2544/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2544/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2544/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842837 - PreCommit-HIVE-Build

> Capture additional metadata to replicate a simple insert at destination
> ---
>
> Key: HIVE-15294
> URL: https://issues.apache.org/jira/browse/HIVE-15294
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15294.1.patch
>
>
> For replicating inserts like {{INSERT INTO ... SELECT ... FROM}}, we will 
> need to capture the newly added files in the notification message to be able 
> to replicate the event at destination. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release

2016-12-12 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742925#comment-15742925
 ] 

Owen O'Malley commented on HIVE-14007:
--

{quote}
The other thing I think we need community wide clarity on before you rip out 
orc is how we’re going to keep developing hive afterwards. Right now there’s a 
cyclic dependency. Hive -> ORC -> Hive - because of a shared storage api.
{quote}

There is agreement within Hive to release the storage-api independently from 
Hive. That would break the cycle and allow a non-cyclic release process. I'll 
file a Hive jira to do that work. Avoiding having two copies of the code makes the 
whole ecosystem stronger by making sure that fixes get applied everywhere. I'd 
suggest leaving storage-api in the Hive source tree rather than making its own 
git repository.

{quote}
There are features that touch all three. And it turns out these are more 
frequent than expected.
{quote}

They come in waves. In the last three months, there have been 2 changes to 
storage-api. Most of the patches are in either storage-api or ORC. For example, 
HIVE-14453 only touches ORC.

{quote}
How do you propose to handle development and release of these features given 
the cyclic dependency? How do you work out feature branches/ snapshots?
{quote}

For changes that touch one or the other, you'd commit the relevant change and 
release either storage-api or ORC and have a jira that updates the version in 
Hive. In the worst case, where the change spreads among the three artifacts, 
you would:

* commit to storage-api & ORC
* release them
* upgrade the pom in Hive

{quote}
If a successful feature commit requires sequential hive and orc releases, then 
that means minimum several months before commit and that's not great. How will 
this be done?
{quote}

No, ORC releases typically take 3 days. Storage API is much simpler and should 
also take 3 days. By being much smaller and more focused, they are much more 
nimble. Furthermore, the two votes could completely overlap, so the total time 
to get the change into Hive would be roughly 3 days.

{quote}
Looking over the PMC and committer lists in ORC it looks like many people 
working on ACID, vectorization or llap will lose the ability to do what they 
are doing today with this change.
{quote}

When we set up the ORC project, we were pretty inclusive in the committer list 
and we continue to add new committers and PMC members. I'll take a look at the 
contributors to the Hive ORC module to find new committers.

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, 
> HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release

2016-12-12 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742914#comment-15742914
 ] 

Owen O'Malley commented on HIVE-14007:
--

{quote}
The other thing I think we need community wide clarity on before you rip out 
orc is how we’re going to keep developing hive afterwards. Right now there’s a 
cyclic dependency. Hive -> ORC -> Hive - because of a shared storage api.
{quote}

There is agreement within Hive to release the storage-api independently from 
Hive.  That would break the cycle and allow a non-cyclic release process. I'll 
file a Hive jira to do that work. Avoiding having two copies of the code makes the 
whole ecosystem stronger by making sure that fixes get applied everywhere. I'd 
suggest leaving storage-api in the Hive source tree rather than making its own 
git repository. 

{quote}
There are features that touch all three. And it turns out these are more 
frequent than expected. 
{quote}

They come in waves. In the last three months, there have been 2 changes to 
storage-api.
Most of the patches are in either storage-api or ORC.  For example, HIVE-14453 
only touches ORC.

{quote}
How do you propose to handle development and release of these features given 
the cyclic dependency? How do you work out feature branches/ snapshots?
{quote}

For changes that touch one or the other, you'd commit the relevant change and 
release either storage-api or ORC and have a jira that updates the version in 
Hive. In the worst case, where the change spreads among the three artifacts, 
you would:

* commit to storage-api & ORC
* release them
* upgrade the pom in Hive

{quote}
If a successful feature commit requires sequential hive and orc releases, then 
that means minimum several months before commit and that's not great. How will 
this be done?
{quote}

No, ORC releases typically take 3 days. Storage API is much simpler and should 
also take 3 days. By being much smaller and more focused, they are much more 
nimble. Furthermore, the two votes could completely overlap, so the total time 
to get the change into Hive would be roughly 3 days. 

{quote}
Looking over the PMC and committer lists in ORC it looks like many people 
working on ACID, vectorization or llap will lose the ability to do what they 
are doing today with this change.
{quote}

When we set up the ORC project, we were pretty inclusive in the committer list 
and we continue to add new committers and PMC members. I'll take a look at the 
contributors to the Hive ORC module to find new committers.

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, 
> HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor

2016-12-12 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742839#comment-15742839
 ] 

zhihai xu edited comment on HIVE-15386 at 12/12/16 7:15 PM:


Yes, thanks for the review [~lirui]! Using PerfLogger to get submitTime is 
better; I also use PerfLogger to get finishTime in the new patch 
HIVE-15386.002.patch, so all the timing information is based on PerfLogger for 
consistency. Please review it!


was (Author: zxu):
yes, thanks for the review [~lirui]! using PerfLogger to get submitTime is 
better, I also use  PerfLogger to get finishTime. So all the timing information 
is based on PerfLogger for consistency. Please review it!

> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor
> 
>
> Key: HIVE-15386
> URL: https://issues.apache.org/jira/browse/HIVE-15386
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, 
> HIVE-15386.002.patch
>
>
> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor, so this information can be used by Hive hooks to monitor 
> Spark jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor

2016-12-12 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742839#comment-15742839
 ] 

zhihai xu commented on HIVE-15386:
--

Yes, thanks for the review [~lirui]! Using PerfLogger to get submitTime is 
better; I also use PerfLogger to get finishTime, so all the timing information 
is based on PerfLogger for consistency. Please review it!

> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor
> 
>
> Key: HIVE-15386
> URL: https://issues.apache.org/jira/browse/HIVE-15386
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, 
> HIVE-15386.002.patch
>
>
> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor, so this information can be used by Hive hooks to monitor 
> Spark jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor

2016-12-12 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-15386:
-
Attachment: HIVE-15386.002.patch

> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor
> 
>
> Key: HIVE-15386
> URL: https://issues.apache.org/jira/browse/HIVE-15386
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, 
> HIVE-15386.002.patch
>
>
> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor, so this information can be used by Hive hooks to monitor 
> Spark jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15417) Glitches using ACID's row__id hidden column

2016-12-12 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15417:
---
Attachment: HIVE-15417.02.patch

Regenerating q file and making new test behavior deterministic.

> Glitches using ACID's row__id hidden column
> ---
>
> Key: HIVE-15417
> URL: https://issues.apache.org/jira/browse/HIVE-15417
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15417.01.patch, HIVE-15417.02.patch, 
> HIVE-15417.patch
>
>
> This only works if you turn PPD off.
> {code:sql}
> drop table if exists hello_acid;
> create table hello_acid (key int, value int)
> partitioned by (load_date date)
> clustered by(key) into 3 buckets
> stored as orc tblproperties ('transactional'='true');
> insert into hello_acid partition (load_date='2016-03-01') values (1, 1);
> insert into hello_acid partition (load_date='2016-03-02') values (2, 2);
> insert into hello_acid partition (load_date='2016-03-03') values (3, 3);
> {code}
> {code}
> hive> set hive.optimize.ppd=true;
> hive> select tid from (select row__id.transactionid as tid from hello_acid) 
> sub where tid = 15;
> FAILED: SemanticException MetaException(message:cannot find field row__id 
> from [0:load_date])
> hive> set hive.optimize.ppd=false;
> hive> select tid from (select row__id.transactionid as tid from hello_acid) 
> sub where tid = 15;
> OK
> tid
> 15
> Time taken: 0.075 seconds, Fetched: 1 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14948) properly handle special characters in identifiers

2016-12-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742804#comment-15742804
 ] 

Eugene Koifman commented on HIVE-14948:
---

Support for quoted identifiers was added in HIVE-6013.
Even though a QuotedIdentifier token type was created there, no ASTNode of that 
type is generated.
The ` (backtick) is stripped away in the grammar; see HiveLexer.g, where 
QuotedIdentifier is defined.

> properly handle special characters in identifiers
> -
>
> Key: HIVE-14948
> URL: https://issues.apache.org/jira/browse/HIVE-14948
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14948.01.patch, HIVE-14948.02.patch
>
>
> The treatment of quoted identifiers in HIVE-14943 is inconsistent.  Need to 
> clean this up and if possible only quote those identifiers that need to be 
> quoted in the generated SQL statement



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15413) Primary key constraints forced to be unique across database and table names

2016-12-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742801#comment-15742801
 ] 

Ashutosh Chauhan commented on HIVE-15413:
-

The thing to keep in mind here is that, unlike other databases, Hive allows 
referencing tables from different databases in a single query, so changing how 
constraint names are scoped may have consequences for that.

> Primary key constraints forced to be unique across database and table names
> ---
>
> Key: HIVE-15413
> URL: https://issues.apache.org/jira/browse/HIVE-15413
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Alan Gates
>Priority: Critical
>
> In the RDBMS underlying the metastore the table that stores primary and 
> foreign keys has it's own primary key (at the RDBMS level) of 
> (constraint_name, position).  This means that a constraint name must be 
> unique across all tables and databases in a system.  This is not reasonable.  
> Database and table name should be included in the RDBMS primary key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15417) Glitches using ACID's row__id hidden column

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742780#comment-15742780
 ] 

Hive QA commented on HIVE-15417:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842823/HIVE-15417.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10811 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nested_column_pruning] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=150)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2543/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2543/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2543/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842823 - PreCommit-HIVE-Build

> Glitches using ACID's row__id hidden column
> ---
>
> Key: HIVE-15417
> URL: https://issues.apache.org/jira/browse/HIVE-15417
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15417.01.patch, HIVE-15417.patch
>
>
> This only works if you turn PPD off.
> {code:sql}
> drop table if exists hello_acid;
> create table hello_acid (key int, value int)
> partitioned by (load_date date)
> clustered by(key) into 3 buckets
> stored as orc tblproperties ('transactional'='true');
> insert into hello_acid partition (load_date='2016-03-01') values (1, 1);
> insert into hello_acid partition (load_date='2016-03-02') values (2, 2);
> insert into hello_acid partition (load_date='2016-03-03') values (3, 3);
> {code}
> {code}
> hive> set hive.optimize.ppd=true;
> hive> select tid from (select row__id.transactionid as tid from hello_acid) 
> sub where tid = 15;
> FAILED: SemanticException MetaException(message:cannot find field row__id 
> from [0:load_date])
> hive> set hive.optimize.ppd=false;
> hive> select tid from (select row__id.transactionid as tid from hello_acid) 
> sub where tid = 15;
> OK
> tid
> 15
> Time taken: 0.075 seconds, Fetched: 1 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15414) Fix batchSize for TestNegativeCliDriver

2016-12-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742752#comment-15742752
 ] 

Sergio Peña commented on HIVE-15414:


Thanks. I changed the profile to use 400:

qFileTest.clientNegative.driver = TestNegativeCliDriver
qFileTest.clientNegative.directory = ql/src/test/queries/clientnegative
qFileTest.clientNegative.batchSize = 400

We should see the results later after the last Jenkins build finishes.

> Fix batchSize for TestNegativeCliDriver
> ---
>
> Key: HIVE-15414
> URL: https://issues.apache.org/jira/browse/HIVE-15414
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> While analyzing the console output of the pre-commit logs, I noticed that the 
> TestNegativeCliDriver batch contains ~770 qfiles, which doesn't look right.
> 2016-12-09 22:23:58,945 DEBUG [TestExecutor] ExecutionPhase.execute:96 
> PBatch: QFileTestBatch [batchId=84, size=774, driver=TestNegativeCliDriver, 
> queryFilesProperty=qfile, 
> name=84-TestNegativeCliDriver-nopart_insert.q-input41.q-having1.q-and-771-more..
>   
> I think {{qFileTest.clientNegative.batchSize = 1000}} in 
> {{test-configuration2.properties}} is probably the reason. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-12-12 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742755#comment-15742755
 ] 

Zhiyuan Yang commented on HIVE-14731:
-

[~jcamachorodriguez], I'm not sure whether it's a good idea to enable this new 
feature by default for users, but I'll use [~hagleitn]'s approach to enable it 
for tests first.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15415) Random "java.util.ConcurrentModificationException"

2016-12-12 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742738#comment-15742738
 ] 

Vihang Karajgaonkar commented on HIVE-15415:


Hi [~BigDataOrange] can you paste the stack trace here so that we can confirm 
if it is same as HIVE-15355? Thanks!

> Random "java.util.ConcurrentModificationException"
> --
>
> Key: HIVE-15415
> URL: https://issues.apache.org/jira/browse/HIVE-15415
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.1.0
> Environment: Hadoop 2.7.3, Hive 2.1.0
>Reporter: Alexandre Linte
>
> I'm regularly facing Hive job failures through Oozie or through the beeline 
> CLI. The jobs exit with an error "FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask. 
> java.util.ConcurrentModificationException (state=08S01,code=1)" but not 100% 
> of the time. 
> It's also important to underline that only one user is working on the table 
> when the jobs are running.
> - stderr
> {noformat}
> Connecting to jdbc:hive2://hiveserver2.bigdata.fr:1/default
> Connected to: Apache Hive (version 2.1.0)
> Driver: Hive JDBC (version 2.1.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> No rows affected (1.475 seconds)
> No rows affected (0.004 seconds)
> No rows affected (0.004 seconds)
> No rows affected (58.977 seconds)
> No rows affected (5.524 seconds)
> No rows affected (5.235 seconds)
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.MoveTask. 
> java.util.ConcurrentModificationException (state=08S01,code=1)
> Closing: 0: jdbc:hive2://hiveserver2.bigdata.fr:1/default
> Intercepting System.exit(2)
> {noformat}
> - stdout
> {noformat}
> Beeline command arguments :
>  -u
>  jdbc:hive2://hiveserver2.bigdata.fr:1/default
>  -n
>  my_user
>  -p
>  DUMMY
>  -d
>  org.apache.hive.jdbc.HiveDriver
>  -f
>  full_job
>  -a
>  delegationToken
>  --hiveconf
>  mapreduce.job.tags=oozie-75b060aacd7ec48c4ed637855e413280
> Fetching child yarn jobs
> tag id : oozie-75b060aacd7ec48c4ed637855e413280
> Child yarn jobs are found -
> =
> >>> Invoking Beeline command line now >>>
> 0: jdbc:hive2://hiveserver2.bigdata.fr> use my_db;
> 0: jdbc:hive2://hiveserver2.bigdata.fr> set hive.execution.engine=tez;
> 0: jdbc:hive2://hiveserver2.bigdata.fr> set tez.queue.name=tez_queue;
> 0: jdbc:hive2://hiveserver2.bigdata.fr>
> 0: jdbc:hive2://hiveserver2.bigdata.fr> insert overwrite table 
> main_table_fd_livcfm
> . . . . . . . . . . . . . . . . . . . . . . .> select
> . . . . . . . . . . . . . . . . . . . . . . .> col.co_cd as co_cd,
> . . . . . . . . . . . . . . . . . . . . . . .> col.line_co_cd as line_co_cd,
> . . . . . . . . . . . . . . . . . . . . . . .> 
> unix_timestamp(min(tt.statut_dt)) as statut_dt
> . . . . . . . . . . . . . . . . . . . . . . .> from 
> dlk_scf_rn_customer_order_line col
> . . . . . . . . . . . . . . . . . . . . . . .> join 
> dlk_scf_rn_shipment_handling_utility shu
> . . . . . . . . . . . . . . . . . . . . . . .> on shu.co_cd = col.co_cd
> . . . . . . . . . . . . . . . . . . . . . . .> and shu.line_co_cd = 
> col.line_co_cd
> . . . . . . . . . . . . . . . . . . . . . . .> join ( select 
> scaler_internal_ref, statut_dt, recep_number, state, reason
> . . . . . . . . . . . . . . . . . . . . . . .> from 
> dlk_scf_rn_transport_tracking where state='LIV' and reason='CFM' ) tt
> . . . . . . . . . . . . . . . . . . . . . . .> on 
> concat('CAL',shu.c_waybill_no) = tt.scaler_internal_ref group by 
> col.co_cd,col.line_co_cd;
> Heart beat
> Heart beat
> 0: jdbc:hive2://hiveserver2.bigdata.fr>
> 0: jdbc:hive2://hiveserver2.bigdata.fr> insert overwrite table 
> main_table_fd_cae
> . . . . . . . . . . . . . . . . . . . . . . .> select
> . . . . . . . . . . . . . . . . . . . . . . .> po_cd as cae, line_po_cd as 
> lcae, origin_co_cd, origin_line_co_cd
> . . . . . . . . . . . . . . . . . . . . . . .> from 
> dlk_scf_rn_purchase_order_line
> . . . . . . . . . . . . . . . . . . . . . . .> where instr(po_cd,"7")=1;
> 0: jdbc:hive2://hiveserver2.bigdata.fr>
> 0: jdbc:hive2://hiveserver2.bigdata.fr> insert overwrite table 
> main_table_fd_cai
> . . . . . . . . . . . . . . . . . . . . . . .> select
> . . . . . . . . . . . . . . . . . . . . . . .> po_cd as cai, line_po_cd as 
> lcai, origin_co_cd, origin_line_co_cd
> . . . . . . . . . . . . . . . . . . . . . . .> from 
> dlk_scf_rn_purchase_order_line
> . . . . . . . . . . . . . . . . . . . . . . .> where in

[jira] [Commented] (HIVE-15417) Glitches using ACID's row__id hidden column

2016-12-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742739#comment-15742739
 ] 

Ashutosh Chauhan commented on HIVE-15417:
-

+1 I assume failures are unrelated.

> Glitches using ACID's row__id hidden column
> ---
>
> Key: HIVE-15417
> URL: https://issues.apache.org/jira/browse/HIVE-15417
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15417.01.patch, HIVE-15417.patch
>
>
> This only works if you turn PPD off.
> {code:sql}
> drop table if exists hello_acid;
> create table hello_acid (key int, value int)
> partitioned by (load_date date)
> clustered by(key) into 3 buckets
> stored as orc tblproperties ('transactional'='true');
> insert into hello_acid partition (load_date='2016-03-01') values (1, 1);
> insert into hello_acid partition (load_date='2016-03-02') values (2, 2);
> insert into hello_acid partition (load_date='2016-03-03') values (3, 3);
> {code}
> {code}
> hive> set hive.optimize.ppd=true;
> hive> select tid from (select row__id.transactionid as tid from hello_acid) 
> sub where tid = 15;
> FAILED: SemanticException MetaException(message:cannot find field row__id 
> from [0:load_date])
> hive> set hive.optimize.ppd=false;
> hive> select tid from (select row__id.transactionid as tid from hello_acid) 
> sub where tid = 15;
> OK
> tid
> 15
> Time taken: 0.075 seconds, Fetched: 1 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15414) Fix batchSize for TestNegativeCliDriver

2016-12-12 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742728#comment-15742728
 ] 

Vihang Karajgaonkar commented on HIVE-15414:


Thanks [~pvary]. I think you may be right, but I thought of checking it either 
way in the hope that it might improve the run time further :) 

[~spena] Based on these logs on the latest run:

2016-12-12 17:21:30,903  INFO [HostExecutor 46] LocalCommand.:45 Starting 
LocalCommandId=205113: ssh -v -i /home/hiveptest/.ssh/hive-ptest-user-key  -l 
hiveptest 104.154.105.18 'bash 
/home/hiveptest/104.154.105.18-hiveptest-0/scratch/hiveptest-84-TestNegativeCliDriver-nopart_insert.q-input41.q-having1.q-and-771-more.sh'

2016-12-12 17:33:28,341  INFO [HostExecutor 46] 
LocalCommand.awaitProcessCompletion:67 Finished LocalCommandId=205113. 
ElapsedTime(ms)=717437
2016-12-12 17:33:28,341  INFO [HostExecutor 46] 
HostExecutor.executeTestBatch:261 Completed executing tests for batch 
[84-TestNegativeCliDriver-nopart_insert.q-input41.q-having1.q-and-771-more] on 
host 104.154.105.18. ElapsedTime(ms)=717437

It takes about ~700 sec.

I did a quick search of the logs and it looks like most other batches take 
between 200-300 sec, so we may see some benefit if we reduce the limit from 
1000 to 400 so that the tests are divided into two batches.


> Fix batchSize for TestNegativeCliDriver
> ---
>
> Key: HIVE-15414
> URL: https://issues.apache.org/jira/browse/HIVE-15414
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> While analyzing the console output of the pre-commit logs, I noticed that the 
> TestNegativeCliDriver batch contains ~770 qfiles, which doesn't look right.
> 2016-12-09 22:23:58,945 DEBUG [TestExecutor] ExecutionPhase.execute:96 
> PBatch: QFileTestBatch [batchId=84, size=774, driver=TestNegativeCliDriver, 
> queryFilesProperty=qfile, 
> name=84-TestNegativeCliDriver-nopart_insert.q-input41.q-having1.q-and-771-more..
>   
> I think {{qFileTest.clientNegative.batchSize = 1000}} in 
> {{test-configuration2.properties}} is probably the reason. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-12-12 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742721#comment-15742721
 ] 

Gunther Hagleitner commented on HIVE-14731:
---

Can you also please turn the feature on in "data/conf/hive-site.xml"? That will 
enable it for all unit tests. I think the "off" case will implicitly be tested 
through the MR/Spark/etc. test drivers.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15401) Import constraints into HBase metastore

2016-12-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742695#comment-15742695
 ] 

Daniel Dai commented on HIVE-15401:
---

+1

> Import constraints into HBase metastore
> ---
>
> Key: HIVE-15401
> URL: https://issues.apache.org/jira/browse/HIVE-15401
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Metastore
>Affects Versions: 2.1.1
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-15401.patch
>
>
> Since HIVE-15342 added support for primary and foreign keys in the HBase 
> metastore we should support them in HBaseImport as well.





[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-12 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Status: Patch Available  (was: Open)

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and first heartbeat, but that's not enough, there may 
> still be some issue, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.
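The A/B/C race described above can be illustrated with a small, self-contained simulation. This is a hypothetical sketch, not Hive's actual heartbeater or lock-manager code: `TXN_TIMEOUT`, `open_txn`, `acquire_locks`, and the background heartbeater thread are all stand-ins. It shows why scheduling heartbeats from transaction open (rather than only after acquireLocks returns) closes the gap.

```python
import threading
import time

TXN_TIMEOUT = 0.2  # stand-in for hive.txn.timeout (seconds)

def open_txn():
    """Time A: open a transaction; no heartbeat has been sent yet."""
    now = time.monotonic()
    return {"opened_at": now, "last_heartbeat": now}

def acquire_locks(delay):
    """Time B..C: a blocking call that is slow when the system is busy."""
    time.sleep(delay)

def timed_out(txn):
    """The transaction is reaped if no heartbeat arrived within the timeout."""
    return time.monotonic() - txn["last_heartbeat"] > TXN_TIMEOUT

# Current behavior: the first heartbeat is sent only after acquireLocks
# returns. If lock acquisition (C - A) exceeds the timeout, the txn is
# already considered dead by then.
txn = open_txn()
acquire_locks(0.3)
print(timed_out(txn))  # True: timed out before the first heartbeat

# Sketch of the fix: run a heartbeater from transaction open, so the
# B..C window is covered even while acquireLocks blocks.
txn = open_txn()
stop = threading.Event()

def heartbeater():
    while not stop.is_set():
        txn["last_heartbeat"] = time.monotonic()
        stop.wait(TXN_TIMEOUT / 2)

threading.Thread(target=heartbeater, daemon=True).start()
acquire_locks(0.3)
print(timed_out(txn))  # False: heartbeats kept the txn alive
stop.set()
```

The sketch only demonstrates the timing argument; how the real patch schedules the heartbeater is up to the implementation.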





[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-12 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Attachment: HIVE-15376.6.patch

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and first heartbeat, but that's not enough, there may 
> still be some issue, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.





[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-12 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Status: Open  (was: Patch Available)

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and first heartbeat, but that's not enough, there may 
> still be some issue, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.





[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-12 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Description: 
HIVE-12366 improved the heartbeater logic by bringing down the gap between the 
lock acquisition and the first heartbeat, but that's not enough; there can still 
be an issue, e.g.:
 Time A: a transaction is opened
 Time B: acquireLocks is called (a blocking call), but it can take a long time to 
actually acquire the locks and return if the system is busy
 Time C: as acquireLocks returns, the first heartbeat is sent
If hive.txn.timeout < C - A, the transaction will be timed out and aborted, 
causing a failure.

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and first heartbeat, but that's not enough, there may 
> still be some issue, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.





[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-12-12 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742658#comment-15742658
 ] 

Jesus Camacho Rodriguez commented on HIVE-14731:


[~aplusplus], is there a reason to not enable the new edge by default? That 
would increase test coverage.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Commented] (HIVE-15413) Primary key constraints forced to be unique across database and table names

2016-12-12 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742643#comment-15742643
 ] 

Alan Gates commented on HIVE-15413:
---

Just making it unique within the database seems fine to me.

> Primary key constraints forced to be unique across database and table names
> ---
>
> Key: HIVE-15413
> URL: https://issues.apache.org/jira/browse/HIVE-15413
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Alan Gates
>Priority: Critical
>
> In the RDBMS underlying the metastore, the table that stores primary and 
> foreign keys has its own primary key (at the RDBMS level) of 
> (constraint_name, position).  This means that a constraint name must be 
> unique across all tables and databases in a system.  This is not reasonable.  
> Database and table name should be included in the RDBMS primary key.





[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-12-12 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742640#comment-15742640
 ] 

Gunther Hagleitner commented on HIVE-14731:
---

Two quick comments: a) listing the entire output of the cross product in the 
golden file makes the patch unnecessarily large. Can you either do a 
sum(hash(...)) over the output or use smaller tables? b) this patch probably 
warrants more tests, unless other places in the q files already cover that.
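As a sketch of option (a), a q-file could compare a single aggregate row instead of the full cross-product output; the table and column names below are placeholders, though sum() and hash() are Hive built-ins commonly used this way in join q-files:

```sql
-- Hypothetical q-file pattern: fold the entire cross-product result into one
-- row so the golden file stays small.
SELECT sum(hash(a.key, a.value, b.key, b.value))
FROM src a
CROSS JOIN src b;
```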

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Updated] (HIVE-15294) Capture additional metadata to replicate a simple insert at destination

2016-12-12 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15294:

Status: Patch Available  (was: Open)

> Capture additional metadata to replicate a simple insert at destination
> ---
>
> Key: HIVE-15294
> URL: https://issues.apache.org/jira/browse/HIVE-15294
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15294.1.patch
>
>
> For replicating inserts like {{INSERT INTO ... SELECT ... FROM}}, we will 
> need to capture the newly added files in the notification message to be able 
> to replicate the event at destination. 





[jira] [Updated] (HIVE-15294) Capture additional metadata to replicate a simple insert at destination

2016-12-12 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15294:

Attachment: HIVE-15294.1.patch

> Capture additional metadata to replicate a simple insert at destination
> ---
>
> Key: HIVE-15294
> URL: https://issues.apache.org/jira/browse/HIVE-15294
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15294.1.patch
>
>
> For replicating inserts like {{INSERT INTO ... SELECT ... FROM}}, we will 
> need to capture the newly added files in the notification message to be able 
> to replicate the event at destination. 





[jira] [Commented] (HIVE-14735) Build Infra: Spark artifacts download takes a long time

2016-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742633#comment-15742633
 ] 

Hive QA commented on HIVE-14735:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842623/HIVE-14735.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10795 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=108)

[groupby_grouping_id2.q,input17.q,bucketmapjoin12.q,ppd_gby_join.q,auto_join10.q,ptf_rcfile.q,vectorized_rcfile_columnar.q,vector_elt.q,ppd_join5.q,ppd_join.q,join_filters_overlap.q,join_cond_pushdown_1.q,timestamp_3.q,load_dyn_part6.q,stats_noscan_2.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=150)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2542/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2542/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2542/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842623 - PreCommit-HIVE-Build

> Build Infra: Spark artifacts download takes a long time
> ---
>
> Key: HIVE-14735
> URL: https://issues.apache.org/jira/browse/HIVE-14735
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Vaibhav Gumashta
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14735.1.patch, HIVE-14735.1.patch, 
> HIVE-14735.1.patch, HIVE-14735.1.patch, HIVE-14735.2.patch, HIVE-14735.3.patch
>
>
> In particular this command:
> {{curl -Sso ./../thirdparty/spark-1.6.0-bin-hadoop2-without-hive.tgz 
> http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.6.0-bin-hadoop2-without-hive.tgz}}
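One way to cut the repeated download time is to cache the tarball locally and only fetch it when missing. The wrapper function below is a hypothetical sketch (the URL and filename come from the ticket; the caching logic is an assumption about what the build infra could do):

```shell
# Hypothetical caching wrapper: fetch the Spark tarball only when it is not
# already present, so repeated test runs skip the slow transfer.
fetch_spark_tarball() {
  tarball=$1
  url='http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.6.0-bin-hadoop2-without-hive.tgz'
  if [ ! -f "$tarball" ]; then
    mkdir -p "$(dirname "$tarball")"
    curl -Sso "$tarball" "$url"
  fi
}
```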




