Review Request 25571: Stats collection for columns fails on a partitioned table with null values in partitioning column

2014-09-12 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25571/
---

Review request for hive.


Bugs: HIVE-8062
https://issues.apache.org/jira/browse/HIVE-8062


Repository: hive-git


Description
---

Stats collection for columns fails on a partitioned table with null values in 
partitioning column


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 176a593 
  ql/src/test/queries/clientpositive/stats_only_null.q b47bc48 
  ql/src/test/results/clientpositive/stats_only_null.q.out 063da37 

Diff: https://reviews.apache.org/r/25571/diff/


Testing
---

Added new test.


Thanks,

Ashutosh Chauhan



[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-12 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8017:
-
Attachment: HIVE-8017.5-spark.patch

Update the golden file for union_remove_25

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
 HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch


 HiveKey should be used as the key type because it holds the hash code for 
 partitioning. While BytesWritable serves partitioning well for simple cases, 
 we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, 
 bucketed table, etc.
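
To make the point above concrete, here is a minimal, self-contained sketch in plain Java. It does not use the real Hive or Hadoop classes: `PlainBytesKey` and `HashCarryingKey` are hypothetical stand-ins for BytesWritable and HiveKey, and `partitionFor` is an assumed hash partitioner, not Spark's. It only shows why a key that carries an explicitly assigned hash code can land in a different partition than a key hashed from its serialized bytes.

```java
import java.util.Arrays;

// Hypothetical stand-in for a bytes-only key: the hash is derived from the bytes.
class PlainBytesKey {
    final byte[] bytes;

    PlainBytesKey(byte[] bytes) {
        this.bytes = bytes;
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PlainBytesKey && Arrays.equals(bytes, ((PlainBytesKey) o).bytes);
    }
}

// Hypothetical stand-in for a key that carries a hash code assigned upstream
// (for example from join or bucketing columns) instead of one derived from the bytes.
class HashCarryingKey extends PlainBytesKey {
    final int assignedHash;

    HashCarryingKey(byte[] bytes, int assignedHash) {
        super(bytes);
        this.assignedHash = assignedHash;
    }

    @Override
    public int hashCode() {
        return assignedHash;
    }
}

class KeyPartitionSketch {
    // Assumed hash partitioner: clear the sign bit, then take the remainder.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] serialized = {1, 2, 3};
        int upstreamHash = 7; // assumed value computed by the operator that emitted the key
        System.out.println("bytes-only key -> partition "
                + partitionFor(new PlainBytesKey(serialized), 4));
        System.out.println("hash-carrying key -> partition "
                + partitionFor(new HashCarryingKey(serialized, upstreamHash), 4));
    }
}
```

With a bytes-only key the partition is fixed by the serialized form. With the hash-carrying key it follows whatever hash the upstream operator assigned, which is the behaviour the description asks for in the join and bucketed-table cases.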



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Query regarding Hive configuration on windows OS and make connection through asp.net

2014-09-12 Thread Kapil Khare
Hi,

I want to configure Hive-0.13.1 on my Windows 7 machine, but errors appear 
during configuration. I have already installed hadoop-2.5.0 and the cygwin64 
terminal on my machine, and both are working fine.
I could not find a blog post or guide on the internet that covers Hive 
configuration on Windows. So I need your help with the steps for configuring 
Hive-0.13.1 on a Windows machine, and for making an ODBC connection from an 
ASP.NET application to Hive, both for queries and for data cubes.

Please suggest the steps.

Thanks,
Kapil Khare
Team Lead
Helm360
Phone: +91-120-499 3300
Mobile: +91- 9718012939
A-16, Sector 16 | Noida, UP, India 201 301
kkh...@helm360.com | http://www.helm360.com/








[jira] [Created] (HIVE-8069) CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR

2014-09-12 Thread Harish Butani (JIRA)
Harish Butani created HIVE-8069:
---

 Summary: CBO: RowResolver after SubQuery predicate handling should 
be reset to outer query block RR
 Key: HIVE-8069
 URL: https://issues.apache.org/jira/browse/HIVE-8069
 Project: Hive
  Issue Type: Sub-task
Reporter: Harish Butani
Assignee: Harish Butani








[jira] [Updated] (HIVE-8069) CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR

2014-09-12 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-8069:

Attachment: HIVE-8069.1.patch

 CBO: RowResolver after SubQuery predicate handling should be reset to outer 
 query block RR
 --

 Key: HIVE-8069
 URL: https://issues.apache.org/jira/browse/HIVE-8069
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-8069.1.patch








[jira] [Commented] (HIVE-8067) set default table permissions for table owner to have all privileges

2014-09-12 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131152#comment-14131152
 ] 

Lefty Leverenz commented on HIVE-8067:
--

Agreed, and the description should list all the possible values.  

Patch 1 adds defaults INSERT, SELECT, UPDATE, and DELETE but the old 
description says 'An example like select,drop will grant select and drop 
privilege to the owner of the table' and the wiki lists DROP instead of DELETE.

Is the wiki list complete and accurate? 

* ALL, ALTER, UPDATE, CREATE (irrelevant here), DROP, INDEX (not implemented), 
LOCK, SELECT, SHOW_DATABASE (irrelevant here)
* [Hive Default Authorization (Legacy Mode) -- Privileges | 
https://cwiki.apache.org/confluence/display/Hive/Hive+Default+Authorization+-+Legacy+Mode#HiveDefaultAuthorization-LegacyMode-Privileges]

 set default table permissions for table owner to have all privileges
 

 Key: HIVE-8067
 URL: https://issues.apache.org/jira/browse/HIVE-8067
 Project: Hive
  Issue Type: Bug
  Components: Authorization, SQLStandardAuthorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-8067.1.patch


 When tables are created without SQL standard based authorization being 
 enabled, the table owner does not have any privileges on the table.
 It makes sense to set the default privileges to be compatible with sql 
 standard mode's expected default privileges for the owner of the table, 
 instead of setting no privileges at all.





[jira] [Commented] (HIVE-8069) CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR

2014-09-12 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131153#comment-14131153
 ] 

Laljo John Pullokkaran commented on HIVE-8069:
--

+1

 CBO: RowResolver after SubQuery predicate handling should be reset to outer 
 query block RR
 --

 Key: HIVE-8069
 URL: https://issues.apache.org/jira/browse/HIVE-8069
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-8069.1.patch








[jira] [Created] (HIVE-8070) TestHWIServer failed due to wrong references to war and properties file

2014-09-12 Thread Bing Li (JIRA)
Bing Li created HIVE-8070:
-

 Summary: TestHWIServer failed due to wrong references to war and 
properties file
 Key: HIVE-8070
 URL: https://issues.apache.org/jira/browse/HIVE-8070
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.13.1
Reporter: Bing Li
Assignee: Bing Li
 Fix For: 0.14.0


In testServerInit() method of that test class, it's still using 
build.properties to retrieve the version # for the war file name





[jira] [Commented] (HIVE-8052) Vectorization: min() on TimeStamp datatype fails with error Vector aggregate not implemented: min for type: TIMESTAMP

2014-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131157#comment-14131157
 ] 

Hive QA commented on HIVE-8052:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668198/HIVE-8052.02.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6197 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/753/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/753/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-753/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668198

 Vectorization: min() on TimeStamp datatype fails with error Vector aggregate 
 not implemented: min for type: TIMESTAMP
 ---

 Key: HIVE-8052
 URL: https://issues.apache.org/jira/browse/HIVE-8052
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-8052.01.patch, HIVE-8052.02.patch


 Changes in HIVE-5760 to make explicit when timestamp and date can be 
 vectorized as Long were accidentally too strict for min, max, count, etc.





[jira] [Commented] (HIVE-8070) TestHWIServer failed due to wrong references to war and properties file

2014-09-12 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131159#comment-14131159
 ] 

Bing Li commented on HIVE-8070:
---

This JIRA is blocked by HIVE-7233

 TestHWIServer failed due to wrong references to war and properties file
 ---

 Key: HIVE-8070
 URL: https://issues.apache.org/jira/browse/HIVE-8070
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.13.1
Reporter: Bing Li
Assignee: Bing Li
 Fix For: 0.14.0


 In testServerInit() method of that test class, it's still using 
 build.properties to retrieve the version # for the war file name





[jira] [Updated] (HIVE-7868) AvroSerDe error handling could be improved

2014-09-12 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-7868:
---
Attachment: (was: HIVE-7868.patch)

 AvroSerDe error handling could be improved
 --

 Key: HIVE-7868
 URL: https://issues.apache.org/jira/browse/HIVE-7868
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Ferdinand Xu

 When an Avro schema is invalid, AvroSerDe returns an error message instead of 
 throwing an exception. This is described in 
 {{AvroSerdeUtils.determineSchemaOrReturnErrorSchema}}:
 {noformat}
   /**
    * Attempt to determine the schema via the usual means, but do not throw
    * an exception if we fail.  Instead, signal failure via a special
    * schema.  This is used because Hive calls init on the serde during
    * any call, including calls to update the serde properties, meaning
    * if the serde is in a bad state, there is no way to update that state.
    */
 {noformat}
 I believe we should find a way to provide a better experience to our users.
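
The trade-off described in that Javadoc can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `SchemaResolutionSketch`, the `schema.literal` property name, and the `ERROR_SCHEMA` marker are hypothetical and are not the actual AvroSerdeUtils code. It contrasts "signal failure via a special schema" with "throw an exception" so the two user experiences can be compared.

```java
import java.util.HashMap;
import java.util.Map;

class SchemaResolutionSketch {
    // Hypothetical marker standing in for the "special schema" that signals failure.
    static final String ERROR_SCHEMA = "ERROR_SCHEMA";

    // Strict variant: fails loudly when no usable schema is configured.
    static String determineSchemaOrThrow(Map<String, String> props) {
        String literal = props.get("schema.literal"); // hypothetical property name
        if (literal == null || literal.trim().isEmpty()) {
            throw new IllegalArgumentException("no usable schema in table properties");
        }
        return literal.trim();
    }

    // Lenient variant: never throws, so later calls that repair the properties can still run.
    static String determineSchemaOrReturnErrorSchema(Map<String, String> props) {
        try {
            return determineSchemaOrThrow(props);
        } catch (IllegalArgumentException e) {
            return ERROR_SCHEMA;
        }
    }

    public static void main(String[] args) {
        Map<String, String> broken = new HashMap<String, String>();
        // The lenient call succeeds and hands back the marker instead of failing.
        System.out.println(determineSchemaOrReturnErrorSchema(broken));
        try {
            determineSchemaOrThrow(broken);
        } catch (IllegalArgumentException e) {
            System.out.println("strict variant failed: " + e.getMessage());
        }
    }
}
```

The lenient variant keeps a misconfigured table usable long enough to fix its properties, at the cost of surfacing the problem only when the marker schema is later read. That delayed, indirect error is the user experience the issue asks to improve.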





[jira] [Updated] (HIVE-7868) AvroSerDe error handling could be improved

2014-09-12 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-7868:
---
Attachment: HIVE-7868.1.patch

 AvroSerDe error handling could be improved
 --

 Key: HIVE-7868
 URL: https://issues.apache.org/jira/browse/HIVE-7868
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: HIVE-7868.1.patch


 When an Avro schema is invalid, AvroSerDe returns an error message instead of 
 throwing an exception. This is described in 
 {{AvroSerdeUtils.determineSchemaOrReturnErrorSchema}}:
 {noformat}
   /**
    * Attempt to determine the schema via the usual means, but do not throw
    * an exception if we fail.  Instead, signal failure via a special
    * schema.  This is used because Hive calls init on the serde during
    * any call, including calls to update the serde properties, meaning
    * if the serde is in a bad state, there is no way to update that state.
    */
 {noformat}
 I believe we should find a way to provide a better experience to our users.





[jira] [Commented] (HIVE-8062) Stats collection for columns fails on a partitioned table with null values in partitioning column

2014-09-12 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131169#comment-14131169
 ] 

Gunther Hagleitner commented on HIVE-8062:
--

LGTM +1

 Stats collection for columns fails on a partitioned table with null values in 
 partitioning column
 -

 Key: HIVE-8062
 URL: https://issues.apache.org/jira/browse/HIVE-8062
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.14.0
Reporter: Deepesh Khandelwal
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8062.patch


 Steps to reproduce:
 1. Create a data file abc.txt with the following contents:
 {noformat}
 a,1
 b,
 {noformat}
 2. Use the Hive CLI to create and load the partitioned table:
 {noformat}
 hive> create table abc(a string, b int);
 OK
 Time taken: 0.272 seconds
 hive> load data local inpath 'abc.txt' into table abc;
 Loading data to table default.abc
 Table default.abc stats: [numFiles=1, numRows=0, totalSize=7, rawDataSize=0]
 OK
 Time taken: 0.463 seconds
 hive> create table abc1(a string) partitioned by (b int);
 OK
 Time taken: 0.098 seconds
 hive> set hive.exec.dynamic.partition.mode=nonstrict;
 hive> insert overwrite table abc1 partition (b) select a, b from abc;
 Query ID = hrt_qa_20140911210909_1200fae7-1e18-4e0d-b74f-040453c27cff
 Total jobs = 1
 Launching Job 1 out of 1
 Status: Running (application id: Executing on YARN cluster with App id 
 application_1410457588978_0063)
 Map 1: -/-  Reducer 2: 0/1
 Map 1: 0/1  Reducer 2: 0/1
 Map 1: 0(+1)/1  Reducer 2: 0/1
 Map 1: 1/1  Reducer 2: 0(+1)/1
 Map 1: 1/1  Reducer 2: 0/1
 Map 1: 1/1  Reducer 2: 1/1
 Status: Finished successfully
 Loading data to table default.abc1 partition (b=null)
   Loading partition {b=__HIVE_DEFAULT_PARTITION__}
 Partition default.abc1{b=__HIVE_DEFAULT_PARTITION__} stats: [numFiles=1, 
 numRows=2, totalSize=7, rawDataSize=5]
 OK
 Time taken: 7.49 seconds
 {noformat}
 3. Now run the analyze statistics command for columns:
 {noformat}
 hive> analyze table abc1 partition (b) compute statistics for columns;
 Query ID = hrt_qa_20140911211010_440bdb4a-6a0d-496b-9d2e-5fc84db3d0ee
 Total jobs = 1
 Launching Job 1 out of 1
 Status: Running (application id: Executing on YARN cluster with App id 
 application_1410457588978_0063)
 Map 1: 0(+1)/1  Reducer 2: 0/1
 Map 1: 1/1  Reducer 2: 0(+1)/1
 Map 1: 1/1  Reducer 2: 1/1
 Status: Finished successfully
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask
 {noformat}
 The analyze statistics for columns fails.
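
The reproduction above ends with a partition whose value is null and which is stored under the name `__HIVE_DEFAULT_PARTITION__`. The sketch below shows, in plain Java, one way code that builds partition names can tolerate such a null value. It is an assumption-laden illustration, not the contents of HIVE-8062.patch: `PartitionNameSketch` and `partitionName` are hypothetical, and only the placeholder string is taken from the CLI output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionNameSketch {
    // Placeholder name as printed by the Hive CLI in the report above.
    static final String DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__";

    // Builds "col1=val1/col2=val2", substituting the placeholder for null values
    // instead of calling toString() on them (which would throw).
    static String partitionName(List<String> columns, List<Object> values) {
        if (columns.size() != values.size()) {
            throw new IllegalArgumentException("column and value counts differ");
        }
        List<String> parts = new ArrayList<String>();
        for (int i = 0; i < columns.size(); i++) {
            Object value = values.get(i);
            String text = (value == null) ? DEFAULT_PARTITION_NAME : value.toString();
            parts.add(columns.get(i) + "=" + text);
        }
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        // Mirrors the reported case: partition column b with a null value.
        System.out.println(partitionName(Arrays.asList("b"), Arrays.<Object>asList((Object) null)));
    }
}
```

The design choice is simply to map null to the same placeholder the loader used when it created the partition, so the name built at statistics time matches the name stored at load time.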





[jira] [Work started] (HIVE-8070) TestHWIServer failed due to wrong references to war and properties file

2014-09-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-8070 started by Bing Li.
-
 TestHWIServer failed due to wrong references to war and properties file
 ---

 Key: HIVE-8070
 URL: https://issues.apache.org/jira/browse/HIVE-8070
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.13.1
Reporter: Bing Li
Assignee: Bing Li
 Fix For: 0.14.0

 Attachments: HIVE-8070.1.patch


 In testServerInit() method of that test class, it's still using 
 build.properties to retrieve the version # for the war file name





[jira] [Updated] (HIVE-8070) TestHWIServer failed due to wrong references to war and properties file

2014-09-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-8070:
--
Status: Patch Available  (was: In Progress)

The patch is generated for trunk

 TestHWIServer failed due to wrong references to war and properties file
 ---

 Key: HIVE-8070
 URL: https://issues.apache.org/jira/browse/HIVE-8070
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.13.1
Reporter: Bing Li
Assignee: Bing Li
 Fix For: 0.14.0

 Attachments: HIVE-8070.1.patch


 In testServerInit() method of that test class, it's still using 
 build.properties to retrieve the version # for the war file name





[jira] [Updated] (HIVE-8070) TestHWIServer failed due to wrong references to war and properties file

2014-09-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-8070:
--
Attachment: HIVE-8070.1.patch

 TestHWIServer failed due to wrong references to war and properties file
 ---

 Key: HIVE-8070
 URL: https://issues.apache.org/jira/browse/HIVE-8070
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.13.1
Reporter: Bing Li
Assignee: Bing Li
 Fix For: 0.14.0

 Attachments: HIVE-8070.1.patch


 In testServerInit() method of that test class, it's still using 
 build.properties to retrieve the version # for the war file name





[jira] [Commented] (HIVE-8062) Stats collection for columns fails on a partitioned table with null values in partitioning column

2014-09-12 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131174#comment-14131174
 ] 

Pengcheng Xiong commented on HIVE-8062:
---

+1

 Stats collection for columns fails on a partitioned table with null values in 
 partitioning column
 -

 Key: HIVE-8062
 URL: https://issues.apache.org/jira/browse/HIVE-8062
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.14.0
Reporter: Deepesh Khandelwal
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8062.patch







[jira] [Updated] (HIVE-7981) alias of compound aggregation functions fails in having clause

2014-09-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7981:

Attachment: HIVE-7981.2.patch.txt

 alias of compound aggregation functions fails in having clause
 --

 Key: HIVE-7981
 URL: https://issues.apache.org/jira/browse/HIVE-7981
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: eyal gruss
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7981.1.patch.txt, HIVE-7981.2.patch.txt


 hive> select max(time)-min(time) as span from mytable group by name having 
 span>0;
 FAILED: SemanticException [Error 10025]: Line 1:92 Expression not in GROUP BY 
 key '0'





[jira] [Created] (HIVE-8071) hive shell tries to write hive-exec.jar for each run

2014-09-12 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HIVE-8071:
--

 Summary: hive shell tries to write hive-exec.jar for each run
 Key: HIVE-8071
 URL: https://issues.apache.org/jira/browse/HIVE-8071
 Project: Hive
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan




For every run of the hive CLI there is a delay for the shell startup

14/07/31 23:07:19 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
14/07/31 23:07:19 INFO tez.DagUtils: Hive jar directory is hdfs://mac-10:8020/user/gopal/apps/2014-Jul-31/hive/
14/07/31 23:07:19 INFO tez.DagUtils: Localizing resource because it does not exist: file:/home/gopal/tez-autobuild/dist/hive/lib/hive-exec-0.14.0-SNAPSHOT.jar to dest: hdfs://mac-10:8020/user/gopal/apps/2014-Jul-31/hive/hive-exec-0.14.0-SNAPSHOTde1f82f0b5561d3db9e3080dfb2897210a3bda4ca5e7b14e881e381115837fd8.jar
14/07/31 23:07:19 INFO tez.DagUtils: Looks like another thread is writing the same file will wait.
14/07/31 23:07:19 INFO tez.DagUtils: Number of wait attempts: 5. Wait interval: 5000
14/07/31 23:07:19 INFO tez.DagUtils: Resource modification time: 1406870512963
14/07/31 23:07:20 INFO tez.TezSessionState: Opening new Tez Session (id: 02d6b558-44cc-4182-b2f2-6a37ffdd25d2, scratch dir: hdfs://mac-10:8020/tmp/hive-gopal/_tez_session_dir/02d6b558-44cc-4182-b2f2-6a37ffdd25d2)

I traced this to a method that creates PRIVATE LocalResources (LRs): the 
resource is marked as PRIVATE even if it comes from a common install dir.
{code}
  public LocalResource localizeResource(Path src, Path dest, Configuration conf)
      throws IOException {
    // (method body elided in the original report)
    return createLocalResource(destFS, dest, LocalResourceType.FILE,
        LocalResourceVisibility.PRIVATE);
  }
{code}





[jira] [Work started] (HIVE-7858) Parquet compression should be configurable via table property

2014-09-12 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-7858 started by Ferdinand Xu.
--
 Parquet compression should be configurable via table property
 -

 Key: HIVE-7858
 URL: https://issues.apache.org/jira/browse/HIVE-7858
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Ferdinand Xu

 ORC supports the orc.compress table property:
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC
 {noformat}
 create table Addresses (
   name string,
   street string,
   city string,
   state string,
   zip int
 ) stored as orc tblproperties ("orc.compress"="NONE");
 {noformat}
 I think it'd be great to support the same for Parquet.





[jira] [Updated] (HIVE-5690) Support subquery for single sourced multi query

2014-09-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5690:

Attachment: HIVE-5690.12.patch.txt

Another rebasing on trunk

 Support subquery for single sourced multi query
 ---

 Key: HIVE-5690
 URL: https://issues.apache.org/jira/browse/HIVE-5690
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D13791.1.patch, HIVE-5690.10.patch.txt, 
 HIVE-5690.11.patch.txt, HIVE-5690.12.patch.txt, HIVE-5690.2.patch.txt, 
 HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, 
 HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, 
 HIVE-5690.9.patch.txt


 Single sourced multi (insert) query is very useful for various ETL processes 
 but it does not allow subqueries. For example, 
 {noformat}
 explain from src 
 insert overwrite table x1 select * from (select distinct key,value) b order by key
 insert overwrite table x2 select * from (select distinct key,value) c order by value;
 {noformat}





[jira] [Commented] (HIVE-8040) Commit for HIVE-7925 breaks hadoop-1 build

2014-09-12 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131197#comment-14131197
 ] 

Navis commented on HIVE-8040:
-

I've changed ExitException to RuntimeException and confirmed that the test passes. 
[~mithun], could you check this?

 Commit for HIVE-7925 breaks hadoop-1 build
 --

 Key: HIVE-8040
 URL: https://issues.apache.org/jira/browse/HIVE-8040
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
 Attachments: HIVE-8040.1.patch.txt, HIVE-8040.2.patch.txt


 {code}
 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-metastore: Compilation failure
 [ERROR] /home/xzhang/apache/hive7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:[45,37] package org.apache.commons.math3.stat does not exist
 [ERROR] -> [Help 1]
 {code}
 Missing pom file changes maybe?





[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131201#comment-14131201
 ] 

Hive QA commented on HIVE-8017:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668276/HIVE-8017.5-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6343 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/125/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/125/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-125/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668276

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
 HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch


 HiveKey should be used as the key type because it holds the hash code for 
 partitioning. While BytesWritable serves partitioning well for simple cases, 
 we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, 
 bucketed table, etc.





[jira] [Commented] (HIVE-7163) ReduceRecordProcessor adds null values to getShuffleInputs - Causing NPE

2014-09-12 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131203#comment-14131203
 ] 

Prasanth J commented on HIVE-7163:
--

[~rajesh.balamohan] do you have a reproducible test case for this?


 ReduceRecordProcessor adds null values to getShuffleInputs - Causing NPE
 

 Key: HIVE-7163
 URL: https://issues.apache.org/jira/browse/HIVE-7163
 Project: Hive
  Issue Type: Bug
Reporter: Rajesh Balamohan
  Labels: tez

 2014-06-01 22:35:55,435 ERROR TezChild 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: 
 java.lang.NullPointerException
 at org.apache.tez.runtime.InputReadyTracker$InputReadyMonitor.<init>(InputReadyTracker.java:111)
 at org.apache.tez.runtime.InputReadyTracker.waitForAllInputsReady(InputReadyTracker.java:90)
 at org.apache.tez.runtime.api.impl.TezProcessorContextImpl.waitForAllInputsReady(TezProcessorContextImpl.java:109)
 at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:198)
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165)
 at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
 at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
 at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:173)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:173)
 at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 Caused by: java.lang.NullPointerException
 at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
 at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1125)
 at java.util.Collections$SetFromMap.add(Collections.java:3903)
 at java.util.AbstractCollection.addAll(AbstractCollection.java:334)
 at org.apache.tez.runtime.InputReadyTracker$InputReadyMonitor.<init>(InputReadyTracker.java:102)
 ... 17 more
 ReduceRecordProcessor.getShuffleInputs() adds null values to ShuffleInputs. 
 This is passed on to
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor: getShuffleInputs : 
 null, null
 Environment: Latest codebase
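
The failure mode in the trace can be reproduced in isolation: a set backed by a `ConcurrentHashMap` rejects null elements, so an `addAll` over a collection containing null throws a `NullPointerException`. The sketch below is only an illustration of that JDK behaviour and of one possible guard; `NullInputsDemo` and its methods are hypothetical and are not taken from the Hive or Tez code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class NullInputsDemo {
    /** Returns true if adding the inputs to a ConcurrentHashMap-backed set throws an NPE. */
    static boolean addAllThrowsNpe(List<String> inputs) {
        Set<String> pending = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        try {
            pending.addAll(inputs);
            return false;
        } catch (NullPointerException e) {
            return true; // ConcurrentHashMap does not accept null keys
        }
    }

    /** Drops null entries before they reach the concurrent set. */
    static List<String> withoutNulls(List<String> inputs) {
        List<String> result = new ArrayList<>();
        for (String input : inputs) {
            if (input != null) {
                result.add(input);
            }
        }
        return result;
    }
}
```

Whether the right fix is to filter nulls at the caller or to avoid producing them in `getShuffleInputs()` is a design question this sketch does not settle.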





Re: Timeline for release of Hive 0.14

2014-09-12 Thread Navis류승우
Hi,

I'd really appreciate it if HIVE-5690 could be included; it is becoming
harder and harder to rebase.
The other 79 patches assigned to me can wait.

Thanks,
Navis


2014-09-11 19:54 GMT+09:00 Vaibhav Gumashta vgumas...@hortonworks.com:

 Hi Vikram,

 Can we also add: https://issues.apache.org/jira/browse/HIVE-6799 
 https://issues.apache.org/jira/browse/HIVE-7935 to the list.

 Thanks,
 --Vaibhav

 On Wed, Sep 10, 2014 at 12:18 AM, Satish Mittal satish.mit...@inmobi.com
 wrote:

  Hi,
  Can you please include HIVE-7892 (Thrift Set type not working with Hive) as
  well? It is under code review.
 
  Regards,
  Satish
 
 
  On Tue, Sep 9, 2014 at 2:10 PM, Suma Shivaprasad 
  sumasai.shivapra...@gmail.com wrote:
 
   Please include https://issues.apache.org/jira/browse/HIVE-7694 as well. It
   is currently under review by Amareshwari and should be done in the next
   couple of days.
  
   Thanks
   Suma
  
  
   On Mon, Sep 8, 2014 at 5:44 PM, Alan Gates ga...@hortonworks.com
  wrote:
  
I'll review that.  I just need the time to test it against mysql, oracle,
and hopefully sqlserver.  But I think we can do this post branch if we need
to, as it's a bug fix rather than a feature.
   
Alan.
   
  Damien Carol dca...@blitzbs.com
 September 8, 2014 at 3:19
 Same request for https://issues.apache.org/jira/browse/HIVE-7689
   
I already provided a patch, re-based it many times and I'm waiting for a
review.
   
Regards,
   
Le 08/09/2014 12:08, amareshwarisr . a écrit :
   
  amareshwarisr . amareshw...@gmail.com
 September 8, 2014 at 3:08
Would like to include https://issues.apache.org/jira/browse/HIVE-2390 and
https://issues.apache.org/jira/browse/HIVE-7936 .
   
I can review and merge them.
   
Thanks
Amareshwari
   
   
   
  Vikram Dixit vik...@hortonworks.com
 September 5, 2014 at 17:53
Hi Folks,
   
I am going to start consolidating the items mentioned in this list and
create a wiki page to track it. I will wait till the end of next week to
create the branch taking into account Ashutosh's request.
   
Thanks
Vikram.
   
   
On Fri, Sep 5, 2014 at 5:39 PM, Ashutosh Chauhan 
 hashut...@apache.org
  
hashut...@apache.org
   
  Ashutosh Chauhan hashut...@apache.org
 September 5, 2014 at 17:39
Vikram,
   
Some of us are working on stabilizing cbo branch and trying to get it
merged into trunk. We feel we are close. May I request to defer cutting the
branch for few more days? Folks interested in this can track our progress
here : https://issues.apache.org/jira/browse/HIVE-7946
   
Thanks,
Ashutosh
   
   
On Fri, Aug 22, 2014 at 4:09 PM, Lars Francke 
 lars.fran...@gmail.com
lars.fran...@gmail.com
   
  Lars Francke lars.fran...@gmail.com
 August 22, 2014 at 16:09
Thank you for volunteering to do the release. I think a 0.14 release is a
good idea.

I have a couple of issues I'd like to get in too:

* Either HIVE-7107[0] (Fix an issue in the HiveServer1 JDBC driver) or
HIVE-6977[1] (Delete HiveServer1). The former needs a review, the latter a
patch
* HIVE-6123[2] Checkstyle in Maven needs a review

HIVE-7622[3] and HIVE-7543[4] are waiting for any reviews or comments on my
previous thread[5]. I'd still appreciate any helpers for reviews or even
just comments. I'd feel very sad if I had done all that work for nothing.
Hoping this thread gives me a wider audience. Both patches fix up issues
that should have been caught in earlier reviews as they are almost all
Checkstyle or other style violations but they make for huge patches. I
could also create hundreds of small issues or stop doing these things
entirely
   
   
   
[0] https://issues.apache.org/jira/browse/HIVE-7107
[1] https://issues.apache.org/jira/browse/HIVE-6977
[2] https://issues.apache.org/jira/browse/HIVE-6123
[3] https://issues.apache.org/jira/browse/HIVE-7622
[4] https://issues.apache.org/jira/browse/HIVE-7543
   
On Fri, Aug 22, 2014 at 11:01 PM, John Pullokkaran 
   
   

[jira] [Updated] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups

2014-09-12 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7156:
-
Attachment: HIVE-7156.4.patch

Fixes one last test failure.

 Group-By operator stat-annotation only uses distinct approx to generate 
 rollups
 ---

 Key: HIVE-7156
 URL: https://issues.apache.org/jira/browse/HIVE-7156
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Prasanth J
 Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch, 
 HIVE-7156.4.patch


 The stats annotation for a group-by only annotates the reduce-side row-count 
 with the distinct values.
 The map-side gets the row-count as the rows output instead of distinct * 
 parallelism, while the reducer side gets the correct parallelism.
 {code}
 hive> explain select distinct L_SHIPDATE from lineitem;
       Vertices:
         Map 1 
             Map Operator Tree:
                 TableScan
                   alias: lineitem
                   Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
                   Select Operator
                     expressions: l_shipdate (type: string)
                     outputColumnNames: l_shipdate
                     Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
                     Group By Operator
                       keys: l_shipdate (type: string)
                       mode: hash
                       outputColumnNames: _col0
                       Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
                       Reduce Output Operator
                         key expressions: _col0 (type: string)
                         sort order: +
                         Map-reduce partition columns: _col0 (type: string)
                         Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
             Execution mode: vectorized
         Reducer 2 
             Reduce Operator Tree:
               Group By Operator
                 keys: KEY._col0 (type: string)
                 mode: mergepartial
                 outputColumnNames: _col0
                 Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
                 Select Operator
                   expressions: _col0 (type: string)
                   outputColumnNames: _col0
                   Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
 {code}
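
One way to read the "distinct * parallelism" estimate in the description is sketched below. This is an interpretation, not the code in the attached patch: `GroupByEstimate`, `mapSideRows` and the cap at the input row count are assumptions made for the example, and the parallelism value used in the usage note is invented.

```java
final class GroupByEstimate {
    /**
     * Upper bound on the rows emitted by a map-side hash aggregation:
     * each of `parallelism` tasks can emit at most `distinctValues` rows,
     * and the total can never exceed the number of input rows.
     */
    static long mapSideRows(long inputRows, long distinctValues, long parallelism) {
        long perTaskBound;
        try {
            perTaskBound = Math.multiplyExact(distinctValues, parallelism);
        } catch (ArithmeticException overflow) {
            perTaskBound = Long.MAX_VALUE; // saturate instead of wrapping around
        }
        return Math.min(inputRows, perTaskBound);
    }
}
```

With the figures from the plan above (589709 input rows, 1955 distinct values) and a hypothetical parallelism of 10, `mapSideRows` gives 19550 rather than 589709. The cap also means the group-by estimate cannot exceed its input row count.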





[jira] [Commented] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups

2014-09-12 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131218#comment-14131218
 ] 

Prasanth J commented on HIVE-7156:
--

[~hagleitn]/[~gopalv]/[~rhbutani] can someone please take a look at this patch? 
The patch consists mostly of test file changes. The code changes are small.

 Group-By operator stat-annotation only uses distinct approx to generate 
 rollups
 ---

 Key: HIVE-7156
 URL: https://issues.apache.org/jira/browse/HIVE-7156
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Prasanth J
 Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch, 
 HIVE-7156.4.patch




Review Request 25575: HIVE-7615: Beeline should have an option for user to see the query progress

2014-09-12 Thread Dong Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25575/
---

Review request for hive.


Repository: hive-git


Description
---

When executing a query in Beeline, the user should have an option to see the 
progress through the outputs. Beeline could use the API introduced in HIVE-4629 
to get and display the logs to the client.
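
A minimal sketch of what such a progress display could look like on the client side: poll a log source while the query runs and forward each new line to the console. `QueryLogSource` is a hypothetical stand-in for the log API mentioned above (its method names are assumptions, not the HIVE-4629 signatures), and `ProgressPoller` is likewise invented for this example.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a statement-level log API. */
interface QueryLogSource {
    boolean hasMoreLogs();
    List<String> fetchNewLines(); // assumed to return only lines not returned before
}

final class ProgressPoller {
    /** Forwards log lines to the sink until the source is exhausted; returns the line count. */
    static int drain(QueryLogSource source, Consumer<String> sink, long pollMillis) {
        int shown = 0;
        while (source.hasMoreLogs()) {
            for (String line : source.fetchNewLines()) {
                sink.accept(line);
                shown++;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                break;
            }
        }
        return shown;
    }
}
```

In a real client this loop would presumably run on a separate thread while the statement executes, with `System.out::println` (or the client's own output stream) as the sink.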


Diffs
-

  beeline/pom.xml 45fa02b 
  beeline/src/java/org/apache/hive/beeline/Commands.java a92d69f 
  itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java e1d44ec 
  itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java ae128a9 
  jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 2cbf58c 

Diff: https://reviews.apache.org/r/25575/diff/


Testing
---

UT passed.


Thanks,

Dong Chen



[jira] [Updated] (HIVE-7615) Beeline should have an option for user to see the query progress

2014-09-12 Thread Dong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Chen updated HIVE-7615:

Status: Patch Available  (was: Open)

 Beeline should have an option for user to see the query progress
 

 Key: HIVE-7615
 URL: https://issues.apache.org/jira/browse/HIVE-7615
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Reporter: Dong Chen
Assignee: Dong Chen

 When executing a query in Beeline, the user should have an option to see the 
 progress through the outputs.
 Beeline could use the API introduced in HIVE-4629 to get and display the logs 
 to the client.





[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez

2014-09-12 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131221#comment-14131221
 ] 

Lefty Leverenz commented on HIVE-7826:
--

Typo fixed:  HIVE-8018.  The parameter name is now 
*hive.tez.dynamic.partition.pruning.max.data.size* for release 0.14.0.

 Dynamic partition pruning on Tez
 

 Key: HIVE-7826
 URL: https://issues.apache.org/jira/browse/HIVE-7826
 Project: Hive
  Issue Type: Bug
  Components: Tez
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: TODOC14, tez
 Fix For: 0.14.0

 Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, 
 HIVE-7826.4.patch, HIVE-7826.5.patch, HIVE-7826.6.patch, HIVE-7826.7.patch


 It's natural in a star schema to map one or more dimensions to partition 
 columns. Time or location are likely candidates. 
 It can also be useful to compute the partitions one would like to scan via 
 a subquery (where p in select ... from ...).
 The resulting joins in hive require a full table scan of the large table 
 though, because partition pruning takes place before the corresponding values 
 are known.
 On Tez it's relatively straightforward to send the values needed to prune to 
 the application master, where splits are generated and tasks are submitted. 
 Using these values we can strip out any unneeded partitions dynamically, 
 while the query is running.
 The approach is straightforward:
 - Insert synthetic conditions for each join representing x in (keys of other 
 side in join)
 - These conditions will be pushed as far down as possible
 - If the condition hits a table scan and the column involved is a partition 
 column:
   - Set up an operator to send key events to the AM
 - else:
   - Remove the synthetic predicate
 Add these properties:
 ||Property||Default Value||
 |{{hive.tez.dynamic.partition.pruning}}|true|
 |{{hive.tez.dynamic.partition.pruning.max.event.size}}|1*1024*1024L|
 |{{hive.tez.dynamic.parition.pruning.max.data.size}}|100*1024*1024L|
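
The pruning step itself can be sketched roughly as follows: once the key values from the other side of the join are known, only partitions whose value appears among them need to be scanned. `PartitionPruner`, `prune` and the `maxKeys` fallback are hypothetical; in particular, treating "too many keys" as "skip pruning" is an assumption loosely modelled on the size limits in the table above, not a description of the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class PartitionPruner {
    /**
     * Keeps only the partitions whose value appears among the join keys seen on
     * the other side. If more than maxKeys keys were collected, pruning is
     * skipped and every partition is scanned (assumed fallback for this sketch).
     */
    static List<String> prune(List<String> partitionValues, Set<String> joinKeys, int maxKeys) {
        if (joinKeys.size() > maxKeys) {
            return new ArrayList<>(partitionValues);
        }
        List<String> kept = new ArrayList<>();
        for (String value : partitionValues) {
            if (joinKeys.contains(value)) {
                kept.add(value);
            }
        }
        return kept;
    }
}
```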





[jira] [Commented] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups

2014-09-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131223#comment-14131223
 ] 

Gopal V commented on HIVE-7156:
---

I just re-ran my test with this patch applied, and the group-by still increases 
the row-count?

{code}
hive> explain select distinct L_SHIPDATE from lineitem;

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: gopal_20140912004141_1f23f948-7852-4882-9f3e-1810904988b8:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: lineitem
                  Statistics: Num rows: 589709 Data size: 4833087637230 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: l_shipdate (type: string)
                    outputColumnNames: l_shipdate
                    Statistics: Num rows: 589709 Data size: 4833087637230 Basic stats: COMPLETE Column stats: COMPLETE
                    Group By Operator
                      keys: l_shipdate (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 113279805705920 Data size: 10648301736356480 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 113279805705920 Data size: 10648301736356480 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
{code}

 Group-By operator stat-annotation only uses distinct approx to generate 
 rollups
 ---

 Key: HIVE-7156
 URL: https://issues.apache.org/jira/browse/HIVE-7156
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Prasanth J
 Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch, 
 HIVE-7156.4.patch




[jira] [Resolved] (HIVE-8041) Hadoop-2 build is broken with JDK6

2014-09-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis resolved HIVE-8041.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Committed to trunk.

 Hadoop-2 build is broken with JDK6
 --

 Key: HIVE-8041
 URL: https://issues.apache.org/jira/browse/HIVE-8041
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
Assignee: Navis
 Fix For: 0.14.0

 Attachments: HIVE-8041.1.patch.txt


 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
 on project hive-exec: Compilation failure
 [ERROR] 
 /home/xzhang/apache/hive7/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIf.java:[81,1]
  illegal start of expression
 {code}





[jira] [Updated] (HIVE-7704) Create tez task for fast file merging

2014-09-12 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7704:
-
Attachment: HIVE-7704.11.patch

Rebased this patch against trunk.

 Create tez task for fast file merging
 -

 Key: HIVE-7704
 URL: https://issues.apache.org/jira/browse/HIVE-7704
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7704.1.patch, HIVE-7704.10.patch, 
 HIVE-7704.11.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, 
 HIVE-7704.4.patch, HIVE-7704.5.patch, HIVE-7704.6.patch, HIVE-7704.7.patch, 
 HIVE-7704.8.patch, HIVE-7704.9.patch


 Currently Tez falls back to an MR task for the merge file task. It would be 
 beneficial to convert the merge file tasks to Tez tasks to make use of the 
 performance gains from Tez. 





[jira] [Commented] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups

2014-09-12 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131229#comment-14131229
 ] 

Prasanth J commented on HIVE-7156:
--

Thanks [~gopalv] for looking into it. I will check what is going on and will 
report back.

 Group-By operator stat-annotation only uses distinct approx to generate 
 rollups
 ---

 Key: HIVE-7156
 URL: https://issues.apache.org/jira/browse/HIVE-7156
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Prasanth J
 Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch, 
 HIVE-7156.4.patch




[jira] [Commented] (HIVE-7733) Ambiguous column reference error on query

2014-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131232#comment-14131232
 ] 

Hive QA commented on HIVE-7733:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668234/HIVE-7733.4.patch.txt

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6198 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.testStatsAfterCompactionPartTbl
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/755/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/755/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-755/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668234

 Ambiguous column reference error on query
 -

 Key: HIVE-7733
 URL: https://issues.apache.org/jira/browse/HIVE-7733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Jason Dere
Assignee: Navis
 Attachments: HIVE-7733.1.patch.txt, HIVE-7733.2.patch.txt, 
 HIVE-7733.3.patch.txt, HIVE-7733.4.patch.txt


 {noformat}
 CREATE TABLE agg1 
   ( 
      col0 INT, 
      col1 STRING, 
      col2 DOUBLE 
   ); 
 explain SELECT single_use_subq11.a1 AS a1, 
        single_use_subq11.a2 AS a2 
 FROM   (SELECT Sum(agg1.col2) AS a1 
         FROM   agg1 
         GROUP  BY agg1.col0) single_use_subq12 
        JOIN (SELECT alias.a2 AS a0, 
                     alias.a1 AS a1, 
                     alias.a1 AS a2 
              FROM   (SELECT agg1.col1 AS a0, 
                             '42'      AS a1, 
                             agg1.col0 AS a2 
                      FROM   agg1 
                      UNION ALL 
                      SELECT agg1.col1 AS a0, 
                             '41'      AS a1, 
                             agg1.col0 AS a2 
                      FROM   agg1) alias 
              GROUP  BY alias.a2, 
                        alias.a1) single_use_subq11 
          ON ( single_use_subq11.a0 = single_use_subq11.a0 );
 {noformat}
 Gets the following error:
 FAILED: SemanticException [Error 10007]: Ambiguous column reference a2
 Looks like this query had been working in 0.12 but started failing with this 
 error in 0.13





[jira] [Commented] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131233#comment-14131233
 ] 

Hive QA commented on HIVE-7946:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668238/HIVE-7946.6.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/756/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/756/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-756/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-756/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'ql/src/test/results/clientnegative/ambiguous_col.q.out'
Reverted 'ql/src/test/results/clientnegative/ambiguous_col0.q.out'
Reverted 'ql/src/test/results/clientnegative/ambiguous_col1.q.out'
Reverted 'ql/src/test/results/clientnegative/ambiguous_col2.q.out'
Reverted 'ql/src/test/results/clientpositive/ambiguous_col.q.out'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target itests/custom-serde/target itests/util/target 
hcatalog/target hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target 
accumulo-handler/target hwi/target common/target common/src/gen service/target 
contrib/target serde/target beeline/target odbc/target cli/target 
ql/dependency-reduced-pom.xml ql/target 
ql/src/test/results/clientpositive/complex_alias.q.out 
ql/src/test/queries/clientpositive/complex_alias.q
+ svn update
U    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIf.java

Fetching external item into 'hcatalog/src/test/e2e/harness'
Updated external to revision 1624472.

Updated to revision 1624472.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668238

 CBO: Merge CBO changes to Trunk
 ---

 Key: HIVE-7946
 URL: https://issues.apache.org/jira/browse/HIVE-7946
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
 

[jira] [Commented] (HIVE-8036) PTest SSH Options

2014-09-12 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131245#comment-14131245
 ] 

Lefty Leverenz commented on HIVE-8036:
--

Should these options be documented in the wiki?

 PTest SSH Options
 -

 Key: HIVE-8036
 URL: https://issues.apache.org/jira/browse/HIVE-8036
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-8036.patch


 I'd like to be able to specify the following options:
 {noformat}
 StrictHostKeyChecking no
 ConnectionAttempts 3
 ServerAliveInterval 1
 {noformat}
 as a config param in the ptest config file as opposed to depending on them 
 set in the env.
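
As an illustration of the request, the sketch below turns a block of option lines like the one above into `-o Name=value` arguments for an `ssh` command line. `SshCommandBuilder`, the way the config value is passed in, and the host and command in the usage note are all hypothetical; this is not the ptest implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SshCommandBuilder {
    /** Parses lines such as "StrictHostKeyChecking no" into option/value pairs, keeping order. */
    static Map<String, String> parseOptions(String configValue) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String line : configValue.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Missing value for option: " + trimmed);
            }
            options.put(parts[0], parts[1]);
        }
        return options;
    }

    /** Builds an argument vector: ssh -o Name=value ... user@host command. */
    static List<String> build(Map<String, String> options, String userAtHost, String remoteCommand) {
        List<String> argv = new ArrayList<>();
        argv.add("ssh");
        for (Map.Entry<String, String> option : options.entrySet()) {
            argv.add("-o");
            argv.add(option.getKey() + "=" + option.getValue());
        }
        argv.add(userAtHost);
        argv.add(remoteCommand);
        return argv;
    }
}
```

For the three options listed above and a made-up target `hiveptest@host1`, the resulting command would be `ssh -o StrictHostKeyChecking=no -o ConnectionAttempts=3 -o ServerAliveInterval=1 hiveptest@host1 <command>`.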





[jira] [Commented] (HIVE-7733) Ambiguous column reference error on query

2014-09-12 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131246#comment-14131246
 ] 

Navis commented on HIVE-7733:
-

[~ashutoshc] The uniqueness of columns name should be mandatory for sub query. 
testCliDriver_ambiguous_col should be failed in this assumption but it's 
succeeded in exceptional way (select-star in top-level query). Should we allow 
this?
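
The uniqueness rule proposed in this comment could be checked along the following lines. This is only a sketch of the rule, not of Hive's SemanticAnalyzer or RowResolver: `AliasCheck` is a hypothetical helper, and comparing aliases case-insensitively is an assumption of the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AliasCheck {
    /** Returns the output aliases of a subquery that occur more than once (lower-cased). */
    static List<String> duplicates(List<String> outputAliases) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        for (String alias : outputAliases) {
            String key = alias.toLowerCase(Locale.ROOT);
            if (!seen.add(key)) {
                repeated.add(key); // second or later occurrence of this alias
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

A subquery whose select list yields `a0, a1, a2` passes such a check, while one yielding `a1, a1, a2` would be reported as ambiguous on `a1`.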

 Ambiguous column reference error on query
 -

 Key: HIVE-7733
 URL: https://issues.apache.org/jira/browse/HIVE-7733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Jason Dere
Assignee: Navis
 Attachments: HIVE-7733.1.patch.txt, HIVE-7733.2.patch.txt, 
 HIVE-7733.3.patch.txt, HIVE-7733.4.patch.txt


 {noformat}
 CREATE TABLE agg1 
   ( 
  col0 INT, 
  col1 STRING, 
  col2 DOUBLE 
   ); 
 explain SELECT single_use_subq11.a1 AS a1, 
single_use_subq11.a2 AS a2 
 FROM   (SELECT Sum(agg1.col2) AS a1 
 FROM   agg1 
 GROUP  BY agg1.col0) single_use_subq12 
JOIN (SELECT alias.a2 AS a0, 
 alias.a1 AS a1, 
 alias.a1 AS a2 
  FROM   (SELECT agg1.col1 AS a0, 
 '42'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1 
  UNION ALL 
  SELECT agg1.col1 AS a0, 
 '41'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1) alias 
  GROUP  BY alias.a2, 
alias.a1) single_use_subq11 
  ON ( single_use_subq11.a0 = single_use_subq11.a0 );
 {noformat}
 Gets the following error:
 FAILED: SemanticException [Error 10007]: Ambiguous column reference a2
 Looks like this query had been working in 0.12 but started failing with this 
 error in 0.13



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-7733) Ambiguous column reference error on query

2014-09-12 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131246#comment-14131246
 ] 

Navis edited comment on HIVE-7733 at 9/12/14 8:11 AM:
--

[~ashutoshc] Column names should be required to be unique within a subquery. 
Under this assumption testCliDriver_ambiguous_col should fail, but it 
succeeds as an exceptional case (select-star in the top-level query). Should we 
allow this?


was (Author: navis):
[~ashutoshc] The uniqueness of columns name should be mandatory for sub query. 
testCliDriver_ambiguous_col should be failed in this assumption but it's 
succeeded in exceptional way (select-star in top-level query). Should we allow 
this?

 Ambiguous column reference error on query
 -

 Key: HIVE-7733
 URL: https://issues.apache.org/jira/browse/HIVE-7733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Jason Dere
Assignee: Navis
 Attachments: HIVE-7733.1.patch.txt, HIVE-7733.2.patch.txt, 
 HIVE-7733.3.patch.txt, HIVE-7733.4.patch.txt


 {noformat}
 CREATE TABLE agg1 
   ( 
  col0 INT, 
  col1 STRING, 
  col2 DOUBLE 
   ); 
 explain SELECT single_use_subq11.a1 AS a1, 
single_use_subq11.a2 AS a2 
 FROM   (SELECT Sum(agg1.col2) AS a1 
 FROM   agg1 
 GROUP  BY agg1.col0) single_use_subq12 
JOIN (SELECT alias.a2 AS a0, 
 alias.a1 AS a1, 
 alias.a1 AS a2 
  FROM   (SELECT agg1.col1 AS a0, 
 '42'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1 
  UNION ALL 
  SELECT agg1.col1 AS a0, 
 '41'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1) alias 
  GROUP  BY alias.a2, 
alias.a1) single_use_subq11 
  ON ( single_use_subq11.a0 = single_use_subq11.a0 );
 {noformat}
 Gets the following error:
 FAILED: SemanticException [Error 10007]: Ambiguous column reference a2
 Looks like this query had been working in 0.12 but started failing with this 
 error in 0.13



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8042) Optionally allow move tasks to run in parallel

2014-09-12 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131261#comment-14131261
 ] 

Lefty Leverenz commented on HIVE-8042:
--

Should the description of *hive.exec.parallel* be revised in HiveConf.java, 
since its behavior is changing in 0.14.0?  (Sorry I didn't chime in earlier.)

Alternatively, the wiki could provide information about this change:

* [hive.exec.parallel | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.parallel]

 Optionally allow move tasks to run in parallel
 --

 Key: HIVE-8042
 URL: https://issues.apache.org/jira/browse/HIVE-8042
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.14.0

 Attachments: HIVE-8042.1.patch, HIVE-8042.2.patch, HIVE-8042.3.patch


 hive.exec.parallel allows one to run different stages of a query in parallel. 
 However that applies only to map-reduce tasks. When using large multi insert 
 queries there are many MoveTasks that are all executed in sequence on the 
 client. There's no real reason for that - they could be run in parallel as 
 well (i.e.: the stage graph captures the dependencies and knows which tasks 
 can happen in parallel).
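The idea in the description above (run every task whose dependencies are already complete, instead of running move tasks strictly in sequence) can be sketched as follows. This is only an illustration, not Hive's implementation: the class name ParallelStagesSketch, the string task names, and the use of CompletableFuture are all assumptions, and the dependency graph is assumed to be acyclic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: tasks are plain names, and "running" a task only
// records its name. A real driver would execute the task body instead.
final class ParallelStagesSketch {

    // parents maps each task to the tasks it depends on (assumed acyclic).
    // Returns the order in which tasks actually completed.
    static List<String> run(Map<String, List<String>> parents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<Void>> futures = new HashMap<>();
        try {
            for (String task : parents.keySet()) {
                schedule(task, parents, futures, order, pool);
            }
            // Wait until every scheduled task has finished.
            CompletableFuture
                .allOf(futures.values().toArray(new CompletableFuture[0]))
                .join();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(order);
    }

    // Creates (once) the future for a task; it starts only after all of
    // its parents' futures have completed.
    private static CompletableFuture<Void> schedule(
            String task,
            Map<String, List<String>> parents,
            Map<String, CompletableFuture<Void>> futures,
            List<String> order,
            ExecutorService pool) {
        CompletableFuture<Void> existing = futures.get(task);
        if (existing != null) {
            return existing;
        }
        List<CompletableFuture<Void>> deps = new ArrayList<>();
        for (String parent : parents.getOrDefault(task, Collections.emptyList())) {
            deps.add(schedule(parent, parents, futures, order, pool));
        }
        CompletableFuture<Void> future = CompletableFuture
            .allOf(deps.toArray(new CompletableFuture[0]))
            .thenRunAsync(() -> order.add(task), pool);
        futures.put(task, future);
        return future;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("mapred", Collections.emptyList());
        graph.put("move1", Arrays.asList("mapred"));
        graph.put("move2", Arrays.asList("mapred"));
        graph.put("stats", Arrays.asList("move1", "move2"));
        System.out.println(run(graph, 4));
    }
}
```

In this sketch move1 and move2 may run concurrently because neither depends on the other, while stats still waits for both.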



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7859) Tune zlib compression in ORC to account for the encoding strategy

2014-09-12 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7859:
-
Labels: TODOC14  (was: )

 Tune zlib compression in ORC to account for the encoding strategy
 -

 Key: HIVE-7859
 URL: https://issues.apache.org/jira/browse/HIVE-7859
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Gopal V
Assignee: Gopal V
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7859.1.patch, HIVE-7859.2.patch, HIVE-7859.3.patch


 Currently ORC Zlib is slow because several compression strategies ZLib uses 
 are already done by ORC itself (dictionary, RLE, bit-packing).
 We need to pick between Z_FILTERED, Z_HUFFMAN_ONLY, Z_RLE, Z_FIXED and 
 Z_DEFAULT_STRATEGY according to column stream type.
 For instance an RLE_V2 stream could use Z_FILTERED compression without 
 invoking the rest of the strategies.
 The string streams can use Z_FIXED compression strategies and so on.
 The core limitation is to retain compatibility with the default 
 decompressor, so that these are automatically backward compatible.
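As a rough illustration of the compatibility point in the description: a zlib strategy only changes how the compressor searches and encodes, so the output should still be readable by a default decompressor. The sketch below uses the JDK's java.util.zip classes rather than ORC's code. StreamKind and strategyFor are hypothetical names, and because java.util.zip.Deflater exposes only DEFAULT_STRATEGY, FILTERED and HUFFMAN_ONLY, the Z_RLE and Z_FIXED cases mentioned above are not shown.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative only: not the ORC writer's types or the HIVE-7859 patch.
final class ZlibStrategySketch {

    // Hypothetical stream categories for the purpose of this sketch.
    enum StreamKind { INTEGER_RLE, STRING_DATA, OTHER }

    // Pick a zlib strategy per stream kind. Kinds that would want Z_RLE or
    // Z_FIXED fall back to the default, since the JDK does not expose them.
    static int strategyFor(StreamKind kind) {
        switch (kind) {
            case INTEGER_RLE:
                return Deflater.FILTERED;
            default:
                return Deflater.DEFAULT_STRATEGY;
        }
    }

    // Compress the whole input with the given strategy.
    static byte[] compress(byte[] input, int strategy) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setStrategy(strategy);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress with a default Inflater, which needs no strategy setting.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported zlib stream");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 7);
        }
        byte[] packed = compress(data, strategyFor(StreamKind.INTEGER_RLE));
        System.out.println(Arrays.equals(data, decompress(packed)));
    }
}
```

The decompress method never receives the strategy, which is the property the description relies on for backward compatibility.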



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7859) Tune zlib compression in ORC to account for the encoding strategy

2014-09-12 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131271#comment-14131271
 ] 

Lefty Leverenz commented on HIVE-7859:
--

Doc note:  This adds configuration parameter 
*hive.exec.orc.compression.strategy* to HiveConf.java, so it needs to be 
documented in the wiki:

* [Configuration Properties -- ORC File Format | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat]

 Tune zlib compression in ORC to account for the encoding strategy
 -

 Key: HIVE-7859
 URL: https://issues.apache.org/jira/browse/HIVE-7859
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Gopal V
Assignee: Gopal V
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7859.1.patch, HIVE-7859.2.patch, HIVE-7859.3.patch


 Currently ORC Zlib is slow because several compression strategies ZLib uses 
 are already done by ORC itself (dictionary, RLE, bit-packing).
 We need to pick between Z_FILTERED, Z_HUFFMAN_ONLY, Z_RLE, Z_FIXED and 
 Z_DEFAULT_STRATEGY according to column stream type.
 For instance an RLE_V2 stream could use Z_FILTERED compression without 
 invoking the rest of the strategies.
 The string streams can use Z_FIXED compression strategies and so on.
 The core limitation is to retain compatibility with the default 
 decompressor, so that these are automatically backward compatible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7615) Beeline should have an option for user to see the query progress

2014-09-12 Thread Dong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Chen updated HIVE-7615:

Attachment: HIVE-7615.patch

[~cartershanklin], [~brocknoland], [~thejas] Thanks very much for looking at 
this jira and giving ideas on it. I have uploaded a patch. Since HIVE-4629 is in 
trunk, I tried using that API to fetch logs and it works fine.

Review board: https://reviews.apache.org/r/25575/

The patch mainly contains:
1. Add a JDBC-layer API, which uses the Thrift method FetchResult(), to get 
operation (query) logs. Also add a QueryStatus to keep the query state in JDBC, 
since the execute() method is blocking and the get-log method needs to stay in 
sync with it from another thread.

2. Beeline uses the added JDBC API to fetch logs and show them in the console.
It uses the existing 'silent' option to choose whether to show logs, since 
Beeline seems to have many options already. If a separate option like 
showProgress is preferable, please let me know.

 Beeline should have an option for user to see the query progress
 

 Key: HIVE-7615
 URL: https://issues.apache.org/jira/browse/HIVE-7615
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Reporter: Dong Chen
Assignee: Dong Chen
 Attachments: HIVE-7615.patch


 When executing a query in Beeline, the user should have an option to see the 
 progress through the outputs.
 Beeline could use the API introduced in HIVE-4629 to get and display the logs 
 to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7977) Avoid creating serde for partitions if possible in FetchTask

2014-09-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7977:

Attachment: HIVE-7977.3.patch.txt

Fixed NPE in TestJdbcDriver2, but cannot reproduce the failure of 
file_with_header_footer.q

 Avoid creating serde for partitions if possible in FetchTask
 

 Key: HIVE-7977
 URL: https://issues.apache.org/jira/browse/HIVE-7977
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-7977.1.patch.txt, HIVE-7977.2.patch.txt, 
 HIVE-7977.3.patch.txt


 Currently, FetchTask creates a SerDe instance three times for each partition, 
 which can be avoided if it is the same as the table SerDe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7615) Beeline should have an option for user to see the query progress

2014-09-12 Thread Dong Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131309#comment-14131309
 ] 

Dong Chen commented on HIVE-7615:
-

Another thing I planned to put in this patch was Beeline filtering the fetched 
logs and showing only a simple percentage progress. When filtering, a log4j 
pattern is used to derive the needed class and message info. (MR job progress 
info is mainly in the classes ql.Driver and ql.exec.Task.)

This needs some work: Beeline does not know the pattern HS2 is using, and 
Beeline would gain two jar dependencies, "chainsaw" and "apache-log4j-extras", 
to parse the fetched log.

Any thoughts on it? Do you think it is enough for the user to have an option to 
show the progress or not, or is it preferred to make this filter work?

 Beeline should have an option for user to see the query progress
 

 Key: HIVE-7615
 URL: https://issues.apache.org/jira/browse/HIVE-7615
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Reporter: Dong Chen
Assignee: Dong Chen
 Attachments: HIVE-7615.patch


 When executing a query in Beeline, the user should have an option to see the 
 progress through the outputs.
 Beeline could use the API introduced in HIVE-4629 to get and display the logs 
 to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8040) Commit for HIVE-7925 breaks hadoop-1 build

2014-09-12 Thread Satish Mittal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131330#comment-14131330
 ] 

Satish Mittal commented on HIVE-8040:
-

Applied 2nd patch and ran 'mvn clean install -DskipTests -Phadoop-1'. Now it 
failed at:
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) 
on project hive-exec: Compilation failure: Compilation failure:
[ERROR] 
/home/satish/work/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionEdge.java:[29,27]
 cannot find symbol
[ERROR] symbol  : class DataInputByteBuffer
[ERROR] location: package org.apache.hadoop.io
[ERROR] 
/home/satish/work/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionEdge.java:[73,4]
 cannot find symbol
[ERROR] symbol  : class DataInputByteBuffer
[ERROR] location: class org.apache.hadoop.hive.ql.exec.tez.CustomPartitionEdge
[ERROR] 
/home/satish/work/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionEdge.java:[73,35]
 cannot find symbol
[ERROR] symbol  : class DataInputByteBuffer
[ERROR] location: class org.apache.hadoop.hive.ql.exec.tez.CustomPartitionEdge
{noformat}

 Commit for HIVE-7925 breaks hadoop-1 build
 --

 Key: HIVE-8040
 URL: https://issues.apache.org/jira/browse/HIVE-8040
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
 Attachments: HIVE-8040.1.patch.txt, HIVE-8040.2.patch.txt


 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
 on project hive-metastore: Compilation failure
 [ERROR] 
 /home/xzhang/apache/hive7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:[45,37]
  package org.apache.commons.math3.stat does not exist
 [ERROR] - [Help 1]
 {code}
 Missing pom file changes maybe?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7777) add CSV support for Serde

2014-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131445#comment-14131445
 ] 

Hive QA commented on HIVE-7777:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668248/HIVE-7777.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6201 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/758/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/758/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-758/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668248

 add CSV support for Serde
 -

 Key: HIVE-7777
 URL: https://issues.apache.org/jira/browse/HIVE-7777
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-7777.1.patch, HIVE-7777.2.patch, HIVE-7777.patch, 
 csv-serde-master.zip


 Hive has no official CSV SerDe, although there is an open source project on 
 GitHub (https://github.com/ogrodnek/csv-serde). CSV is a very commonly used 
 data format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-712) Cleanup build scripts for Hive

2014-09-12 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke resolved HIVE-712.
---
Resolution: Fixed
  Assignee: (was: Ashish Thusoo)

Fixed as part of the Mavenization

 Cleanup build scripts for Hive
 --

 Key: HIVE-712
 URL: https://issues.apache.org/jira/browse/HIVE-712
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Affects Versions: 0.3.0
Reporter: Ashish Thusoo
Priority: Minor

 The build scripts for hive have a lot of duplication in build-common.xml and 
 the individual build.xml for the different modules. We need to simplify this 
 as well as remove any duplication.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-721) Integration with HadoopDB

2014-09-12 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131455#comment-14131455
 ] 

Lars Francke commented on HIVE-721:
---

There's not much development on HadoopDB and there's Tez and Spark now. Do you 
plan to work on this? Otherwise I suggest closing it.

 Integration with HadoopDB
 -

 Key: HIVE-721
 URL: https://issues.apache.org/jira/browse/HIVE-721
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Azza Abouzeid
Priority: Minor
   Original Estimate: 2h
  Remaining Estimate: 2h

 The HadoopDB project integrates Hadoop with single node databases, which 
 provide a high performance data layer for analytical queries over structured 
 data. HadoopDB's SMS (SQL-to-MapReduce-to-SQL) component uses Hive's 
 SemanticAnalyzer to convert SQL to MapReduce plans. After plan generation, we 
 recreate SQL from the lower plan operators and push the SQL into database 
 layer maintaining the upper layers of the plan, that can't be pushed into the 
 single node databases, intact. For more information on this process, please 
 read the HadoopDB paper (http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf) and 
 browse the source code if you feel like it (more specifically the 
 SQLQueryGenerator class) at http://sourceforge.net/projects/hadoopdb/. 
 HadoopDB is a natural system level extension of Hive's goal of providing a 
 simple SQL interface for large-scale data processing.
 A simple patch that integrates Hive with HadoopDB's SMS could be found here: 
 http://hadoopdb.svn.sourceforge.net/viewvc/hadoopdb/trunk/Patches/hive-sms.patch?view=log
 In addition to the semantic analyzer post-processing, we modified certain 
 areas to allow paths to be associated with databases to allow the recreation 
 of the operator tree from the map.input.file configuration. Instead of 
 FileInputSplit --- we set up an interface Pathable, to allow any inputsplit 
 that implements pathable to return a dummy path equivalent to the 
 map.input.file path.
 Instead of the post semantic analysis function call to the SQLQueryGenerator 
 class, you could also use hooks. One such suggestion provided by a HadoopDB 
 user is found here 
 http://sourceforge.net/tracker/index.php?func=detail&aid=2829253&group_id=269559&atid=1146689.
 We would really appreciate your help in better integrating Hive and HadoopDB. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-2438) add trademark attributions to Hive homepage

2014-09-12 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131458#comment-14131458
 ] 

Lars Francke commented on HIVE-2438:


Looking at the homepage and the linked document I think this is fixed and this 
issue can be closed.

 add trademark attributions to Hive homepage
 ---

 Key: HIVE-2438
 URL: https://issues.apache.org/jira/browse/HIVE-2438
 Project: Hive
  Issue Type: Sub-task
Reporter: John Sichi
Assignee: Carl Steinbach

 http://www.apache.org/foundation/marks/pmcs.html#attributions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-2437) update project website navigation links

2014-09-12 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131459#comment-14131459
 ] 

Lars Francke commented on HIVE-2437:


I think this has been fixed and the issue can be closed unless I'm missing 
something.

 update project website navigation links
 ---

 Key: HIVE-2437
 URL: https://issues.apache.org/jira/browse/HIVE-2437
 Project: Hive
  Issue Type: Sub-task
Reporter: John Sichi
Assignee: Carl Steinbach

 http://www.apache.org/foundation/marks/pmcs.html#navigation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25245: Support dynamic service discovery for HiveServer2

2014-09-12 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25245/
---

(Updated Sept. 12, 2014, 12:30 p.m.)


Review request for hive, Alan Gates, Navis Ryu, Szehon Ho, and Thejas Nair.


Bugs: HIVE-7935
https://issues.apache.org/jira/browse/HIVE-7935


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-7935


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5d2e6b0 
  itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
ae128a9 
  jdbc/pom.xml 1ad13a7 
  jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7 
  jdbc/src/java/org/apache/hive/jdbc/HiveDriver.java 6e248d6 
  jdbc/src/java/org/apache/hive/jdbc/JdbcUriParseException.java PRE-CREATION 
  jdbc/src/java/org/apache/hive/jdbc/Utils.java 58339bf 
  jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientException.java 
PRE-CREATION 
  jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientHelper.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
 0919d2f 
  ql/src/java/org/apache/hadoop/hive/ql/util/ZooKeeperHiveHelper.java 
PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java
 59294b1 
  service/src/java/org/apache/hive/service/cli/CLIService.java a0bc905 
  service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
f5a8f27 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
b0bb8be 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
11d25cc 
  
service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 
2b80adc 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
443c371 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 
4067106 
  service/src/java/org/apache/hive/service/server/HiveServer2.java 124996c 
  
service/src/test/org/apache/hive/service/cli/session/TestSessionGlobalInitFile.java
 66fc1fc 

Diff: https://reviews.apache.org/r/25245/diff/


Testing
---

Manual testing.


Thanks,

Vaibhav Gumashta



[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-12 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7946:
-
Status: Open  (was: Patch Available)

 CBO: Merge CBO changes to Trunk
 ---

 Key: HIVE-7946
 URL: https://issues.apache.org/jira/browse/HIVE-7946
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
 HIVE-7946.4.patch, HIVE-7946.5.patch, HIVE-7946.6.patch, HIVE-7946.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-12 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7946:
-
Status: Patch Available  (was: Open)

 CBO: Merge CBO changes to Trunk
 ---

 Key: HIVE-7946
 URL: https://issues.apache.org/jira/browse/HIVE-7946
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
 HIVE-7946.4.patch, HIVE-7946.5.patch, HIVE-7946.6.patch, HIVE-7946.7.patch, 
 HIVE-7946.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-12 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7946:
-
Attachment: HIVE-7946.7.patch

 CBO: Merge CBO changes to Trunk
 ---

 Key: HIVE-7946
 URL: https://issues.apache.org/jira/browse/HIVE-7946
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
 HIVE-7946.4.patch, HIVE-7946.5.patch, HIVE-7946.6.patch, HIVE-7946.7.patch, 
 HIVE-7946.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8061) improve the partition col stats update speed

2014-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131510#comment-14131510
 ] 

Hive QA commented on HIVE-8061:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668256/HIVE-8061.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6197 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_partlvl_dp
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/759/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/759/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-759/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668256

 improve the partition col stats update speed
 

 Key: HIVE-8061
 URL: https://issues.apache.org/jira/browse/HIVE-8061
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
 Attachments: HIVE-8061.1.patch, HIVE-8061.2.patch


 We previously worked on faster column stats updates for a partition of a 
 table in HIVE-7736 and HIVE-7876.
 Although there was some improvement, the result is only correct on the first 
 run; later runs produce duplicate column stats (thanks to Eugene Koifman for 
 pointing this out).
 We fixed this in HIVE-7944 by reverting the patch.
 This JIRA ticket is another attempt to improve the speed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7788) Generate plans for insert, update, and delete

2014-09-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7788:
-
Status: Open  (was: Patch Available)

Canceling patch as I need to fix the failing tests.

 Generate plans for insert, update, and delete
 -

 Key: HIVE-7788
 URL: https://issues.apache.org/jira/browse/HIVE-7788
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7788.2.patch, HIVE-7788.3.patch, 
 HIVE-7788.WIP.patch, HIVE-7788.patch


 Insert plans need to be generated differently for ACID tables, plus we need 
 to be able to generate plans in the semantic analyzer for update and delete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7788) Generate plans for insert, update, and delete

2014-09-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7788:
-
Attachment: HIVE-7788.4.patch

Backed out the change where I used isAssignableFrom instead of equals to check 
whether an output format was an acid output format, as it incorrectly said all 
output formats were acid.

 Generate plans for insert, update, and delete
 -

 Key: HIVE-7788
 URL: https://issues.apache.org/jira/browse/HIVE-7788
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7788.2.patch, HIVE-7788.3.patch, HIVE-7788.4.patch, 
 HIVE-7788.WIP.patch, HIVE-7788.patch


 Insert plans need to be generated differently for ACID tables, plus we need 
 to be able to generate plans in the semantic analyzer for update and delete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7788) Generate plans for insert, update, and delete

2014-09-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7788:
-
Status: Patch Available  (was: Open)

 Generate plans for insert, update, and delete
 -

 Key: HIVE-7788
 URL: https://issues.apache.org/jira/browse/HIVE-7788
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7788.2.patch, HIVE-7788.3.patch, HIVE-7788.4.patch, 
 HIVE-7788.WIP.patch, HIVE-7788.patch


 Insert plans need to be generated differently for ACID tables, plus we need 
 to be able to generate plans in the semantic analyzer for update and delete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131603#comment-14131603
 ] 

Xuefu Zhang commented on HIVE-8017:
---

{quote}
do you think we need a JIRA to track this difference so we can find the cause 
when we have time
{quote}

Yes, please.

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
 HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch


 HiveKey should be used as the key type because it holds the hash code for 
 partitioning. While BytesWritable serves partitioning well for simple cases, 
 we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, 
 bucketed table, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete

2014-09-12 Thread Alan Gates


 On Sept. 9, 2014, 9:33 p.m., Eugene Koifman wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 
  11872
  https://reviews.apache.org/r/25414/diff/1/?file=682029#file682029line11872
 
  does this work if of implements a subclass of AcidOutputFormat?  
  Perhaps Class.isAssignableFrom() is a safer choice
 
 Alan Gates wrote:
 Changed.

Actually, I had to back this out.  Making this change made it so that it said 
all output formats were acid.
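For readers following the equals versus Class.isAssignableFrom() discussion, here is a small sketch of how the two checks differ and how the direction of the call matters. The thread does not say what caused the "all output formats were acid" result, so this is not a diagnosis of the patch. All type names below (AcidFormat, OrcLike, SubOrcLike, Plain) are hypothetical stand-ins, not Hive classes.

```java
// Hypothetical stand-ins for illustration; not Hive's real classes.
final class AcidCheckSketch {

    interface AcidFormat {}
    static class Plain {}
    static class OrcLike implements AcidFormat {}
    static class SubOrcLike extends OrcLike {}

    // Exact class comparison: misses subclasses of the expected class.
    static boolean exactMatch(Class<?> candidate) {
        return candidate.equals(OrcLike.class);
    }

    // "Is candidate an AcidFormat?": the supertype is the receiver and the
    // candidate is the argument.
    static boolean isAcid(Class<?> candidate) {
        return AcidFormat.class.isAssignableFrom(candidate);
    }

    // Receiver and argument swapped: this asks whether candidate is a
    // supertype of AcidFormat, which is a different question.
    static boolean reversed(Class<?> candidate) {
        return candidate.isAssignableFrom(AcidFormat.class);
    }

    public static void main(String[] args) {
        System.out.println(exactMatch(SubOrcLike.class));
        System.out.println(isAcid(SubOrcLike.class));
        System.out.println(isAcid(Plain.class));
        System.out.println(reversed(Object.class));
    }
}
```

In this sketch, isAcid accepts SubOrcLike where exactMatch rejects it, and reversed accepts any sufficiently general type such as Object, so the receiver and argument of isAssignableFrom are worth double-checking whenever the check returns true too often.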


- Alan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25414/#review52655
---


On Sept. 11, 2014, 2:17 p.m., Alan Gates wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25414/
 ---
 
 (Updated Sept. 11, 2014, 2:17 p.m.)
 
 
 Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
 Thejas Nair.
 
 
 Bugs: HIVE-7788
 https://issues.apache.org/jira/browse/HIVE-7788
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 This patch adds plan generation and modifies some of the exec operators to 
 make insert/value, update, and delete work. The patch is large, but about 2/3 
 of it is tests.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5d2e6b0 
   data/conf/tez/hive-site.xml 0b3877c 
   
 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java
  1a84024 
   
 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
  9807497 
   itests/src/test/resources/testconfiguration.properties 99049ca 
   metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 
 f1697bb 
   ql/src/java/org/apache/hadoop/hive/ql/Context.java 7fcbe3c 
   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 9953919 
   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 4246d68 
   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 7477199 
   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java f018ca0 
   ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java e3bc3b1 
   ql/src/java/org/apache/hadoop/hive/ql/hooks/WriteEntity.java 7f1d71b 
   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java b1c4441 
   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 264052f 
   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 8354ad9 
   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 32d2f7a 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2b1a345 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 4acafba 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java
  96a5d78 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
  5c711cf 
   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
 5195748 
   ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 911ac8a 
   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 496f6a6 
   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 
 3e3926e 
   ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java ad91b0f 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 2dbf1c8 
   ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 6dce30c 
   ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 5695f35 
   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 5164b16 
   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 789c780 
   ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 63ecb8d 
   
 ql/src/test/org/apache/hadoop/hive/ql/parse/TestUpdateDeleteSemanticAnalyzer.java
  PRE-CREATION 
   ql/src/test/queries/clientnegative/acid_overwrite.q PRE-CREATION 
   ql/src/test/queries/clientnegative/delete_not_acid.q PRE-CREATION 
   ql/src/test/queries/clientnegative/update_not_acid.q PRE-CREATION 
   ql/src/test/queries/clientnegative/update_partition_col.q PRE-CREATION 
   ql/src/test/queries/clientpositive/delete_all_non_partitioned.q 
 PRE-CREATION 
   ql/src/test/queries/clientpositive/delete_all_partitioned.q PRE-CREATION 
   ql/src/test/queries/clientpositive/delete_orig_table.q PRE-CREATION 
   ql/src/test/queries/clientpositive/delete_tmp_table.q PRE-CREATION 
   ql/src/test/queries/clientpositive/delete_where_no_match.q PRE-CREATION 
   ql/src/test/queries/clientpositive/delete_where_non_partitioned.q 
 PRE-CREATION 
   ql/src/test/queries/clientpositive/delete_where_partitioned.q PRE-CREATION 
   

[jira] [Comment Edited] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131603#comment-14131603
 ] 

Xuefu Zhang edited comment on HIVE-8017 at 9/12/14 2:30 PM:


{quote}
do you think we need a JIRA to track this difference so we can find the cause 
when we have time
{quote}

Yes, please.

I will commit this patch shortly.


was (Author: xuefuz):
{quote}
do you think we need a JIRA to track this difference so we can find the cause 
when we have time
{quote}

Yes, please.

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
 HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch


 HiveKey should be used as the key type because it holds the hash code for 
 partitioning. While BytesWritable serves partitioning well for simple cases, 
 we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, 
 bucketed table, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8069) CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR

2014-09-12 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-8069:

Attachment: HIVE-8069.1.patch

previous patch file was wrong

 CBO: RowResolver after SubQuery predicate handling should be reset to outer 
 query block RR
 --

 Key: HIVE-8069
 URL: https://issues.apache.org/jira/browse/HIVE-8069
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-8069.1.patch, HIVE-8069.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8069) CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR

2014-09-12 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-8069:

Fix Version/s: 0.14.0

 CBO: RowResolver after SubQuery predicate handling should be reset to outer 
 query block RR
 --

 Key: HIVE-8069
 URL: https://issues.apache.org/jira/browse/HIVE-8069
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Harish Butani
Assignee: Harish Butani
 Fix For: 0.14.0

 Attachments: HIVE-8069.1.patch, HIVE-8069.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8069) CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR

2014-09-12 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani resolved HIVE-8069.
-
Resolution: Fixed

Committed to CBO branch
[~jpullokkaran] thanks for reviewing

 CBO: RowResolver after SubQuery predicate handling should be reset to outer 
 query block RR
 --

 Key: HIVE-8069
 URL: https://issues.apache.org/jira/browse/HIVE-8069
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-8069.1.patch, HIVE-8069.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7325) Support non-constant expressions for MAP type indices.

2014-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131614#comment-14131614
 ] 

Hive QA commented on HIVE-7325:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668272/HIVE-7325.3.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6195 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_map_index
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/760/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/760/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-760/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668272

 Support non-constant expressions for MAP type indices.
 --

 Key: HIVE-7325
 URL: https://issues.apache.org/jira/browse/HIVE-7325
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Mala Chikka Kempanna
Assignee: Navis
 Fix For: 0.14.0

 Attachments: HIVE-7325.1.patch.txt, HIVE-7325.2.patch.txt, 
 HIVE-7325.3.patch.txt


 Here is my sample:
 {code}
 CREATE TABLE RECORD(RecordID string, BatchDate string, Country string) 
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,D:BatchDate,D:Country") 
 TBLPROPERTIES ("hbase.table.name" = "RECORD"); 
 CREATE TABLE KEY_RECORD(KeyValue String, RecordId map<string,string>) 
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, K:") 
 TBLPROPERTIES ("hbase.table.name" = "KEY_RECORD"); 
 {code}
 The following join statement doesn't work. 
 {code}
 SELECT a.*, b.* from KEY_RECORD a join RECORD b 
 WHERE a.RecordId[b.RecordID] is not null;
 {code}
 FAILED: SemanticException 2:16 Non-constant expression for map indexes not 
 supported. Error encountered near token 'RecordID' 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation

2014-09-12 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131648#comment-14131648
 ] 

Pankit Thapar commented on HIVE-8038:
-

Hi,

Can you please take a look at this cr : https://reviews.apache.org/r/25521/

Thanks,
Pankit

 Decouple ORC files split calculation logic from Filesystem's get file 
 location implementation
 -

 Key: HIVE-8038
 URL: https://issues.apache.org/jira/browse/HIVE-8038
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.13.1
Reporter: Pankit Thapar
 Fix For: 0.14.0

 Attachments: HIVE-8038.patch


 What is the Current Logic
 ==
 1. Get the file blocks from FileSystem.getFileBlockLocations(), which returns 
 an array of BlockLocation.
 2. In SplitGenerator.createSplit(), check whether the split spans one block or 
 multiple blocks.
 3. If the split spans just one block, use the array index (index = 
 offset/blockSize) to get the corresponding host holding the BlockLocation.
 4. If the split spans multiple blocks, get all hosts that have at least 
 80% of the max of total data in the split hosted by any host.
 5. Add the split to a list of splits.
 Issue with Current Logic
 =
 Dependency on the FileSystem API’s logic for block location calculations. It 
 returns an array, and we need to rely on FileSystem to make all blocks the 
 same size if we want to directly access a block from the array.
  
 What is the Fix
 =
 1a. Get the file blocks from FileSystem.getFileBlockLocations(), which returns 
 an array of BlockLocation.
 1b. Convert the array into a TreeMap<offset, BlockLocation> and return it 
 through getLocationsWithOffSet().
 2. In SplitGenerator.createSplit(), check whether the split spans one block or 
 multiple blocks.
 3. If the split spans just one block, use TreeMap.floorEntry(key) to get the 
 entry with the greatest key less than or equal to the split's offset, and get 
 the corresponding host.
 4a. If the split spans multiple blocks, get a submap containing all 
 entries whose BlockLocations fall between offset and offset + length.
 4b. Get all hosts that have at least 80% of the max of total data in the split 
 hosted by any host.
 5. Add the split to a list of splits.
 What are the major changes in logic
 ==
 1. Store BlockLocations in a Map instead of an array.
 2. Call SHIMS.getLocationsWithOffSet() instead of getLocations().
 3. The one-block case is checked by if(offset + length <= start.getOffset() + 
 start.getLength()) instead of if((offset % blockSize) + length <= 
 blockSize).
 What is the effect on Complexity (Big O)
 =
 1. We add an O(n) loop to build a TreeMap from an array, but it's a one-time 
 cost and would not be incurred for each split.
 2. In the one-block case, we can get the block in O(log n) worst case, 
 which was O(1) before.
 3. Getting the submap is O(log n).
 4. In the multiple-block case, building the list of hosts is O(m), which 
 was O(n), with m < n: previously we were iterating over all the block 
 locations, but now we iterate only over the blocks that belong to the range 
 of offsets that we need.
 What are the benefits of the change
 ==
 1. With this fix, we do not depend on the blockLocations returned by 
 FileSystem to figure out the block corresponding to the offset and blockSize.
 2. Also, block lengths are not necessarily the same for all blocks on 
 all FileSystems.
 3. Previously we were using blockSize for the one-block case and block.length 
 for the multiple-block case, which is not the case now. We figure out the 
 block depending upon the actual length and offset of the block.
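
 The lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the patch: the Block class is a hypothetical stand-in for Hadoop's BlockLocation, and the method names index, blockAt, fitsInOneBlock, and blocksInRange are invented for the example. Only the java.util.TreeMap calls (floorEntry, floorKey, subMap) are real APIs.

```java
import java.util.Collection;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class BlockLookup {
    // Hypothetical stand-in for a block location: start offset, length, one host.
    static final class Block {
        final long offset;
        final long length;
        final String host;

        Block(long offset, long length, String host) {
            this.offset = offset;
            this.length = length;
            this.host = host;
        }
    }

    // One pass over the array, done once per file rather than once per split.
    static NavigableMap<Long, Block> index(Block[] blocks) {
        NavigableMap<Long, Block> byOffset = new TreeMap<>();
        for (Block b : blocks) {
            byOffset.put(b.offset, b);
        }
        return byOffset;
    }

    // Block containing 'offset': the entry with the greatest key <= offset.
    static Block blockAt(NavigableMap<Long, Block> byOffset, long offset) {
        Map.Entry<Long, Block> entry = byOffset.floorEntry(offset);
        return entry == null ? null : entry.getValue();
    }

    // One-block case: the split ends no later than its starting block does.
    static boolean fitsInOneBlock(NavigableMap<Long, Block> byOffset, long offset, long length) {
        Block start = blockAt(byOffset, offset);
        return start != null && offset + length <= start.offset + start.length;
    }

    // Multi-block case: only the blocks overlapping [offset, offset + length).
    static Collection<Block> blocksInRange(NavigableMap<Long, Block> byOffset, long offset, long length) {
        Long first = byOffset.floorKey(offset);
        long from = (first == null) ? offset : first;
        return byOffset.subMap(from, true, offset + length, false).values();
    }

    public static void main(String[] args) {
        // Blocks of unequal size, which an offset/blockSize array index cannot handle.
        Block[] blocks = {
            new Block(0, 100, "h1"), new Block(100, 50, "h2"), new Block(150, 200, "h3")
        };
        NavigableMap<Long, Block> byOffset = index(blocks);
        System.out.println(blockAt(byOffset, 160).host);
        System.out.println(fitsInOneBlock(byOffset, 110, 40));
        System.out.println(blocksInRange(byOffset, 120, 100).size());
    }
}
```

 With the unequal block sizes in main, offset 160 resolves to the third block, whereas index = offset/blockSize with a block size of 100 would have picked the second.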



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8062) Stats collection for columns fails on a partitioned table with null values in partitioning column

2014-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131701#comment-14131701
 ] 

Hive QA commented on HIVE-8062:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668275/HIVE-8062.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6197 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.parse.TestParse.testParse_union
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/761/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/761/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-761/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668275

 Stats collection for columns fails on a partitioned table with null values in 
 partitioning column
 -

 Key: HIVE-8062
 URL: https://issues.apache.org/jira/browse/HIVE-8062
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.14.0
Reporter: Deepesh Khandelwal
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8062.patch


 Steps to reproduce:
 1. Create a data file abc.txt with the following contents:
 {noformat}
 a,1
 b,
 {noformat}
 2. Use the Hive CLI to create and load the partitioned table:
 {noformat}
 hive> create table abc(a string, b int);
 OK
 Time taken: 0.272 seconds
 hive> load data local inpath 'abc.txt' into table abc;
 Loading data to table default.abc
 Table default.abc stats: [numFiles=1, numRows=0, totalSize=7, rawDataSize=0]
 OK
 Time taken: 0.463 seconds
 hive> create table abc1(a string) partitioned by (b int);
 OK
 Time taken: 0.098 seconds
 hive> set hive.exec.dynamic.partition.mode=nonstrict;
 hive> insert overwrite table abc1 partition (b) select a, b from abc;
 Query ID = hrt_qa_20140911210909_1200fae7-1e18-4e0d-b74f-040453c27cff
 Total jobs = 1
 Launching Job 1 out of 1
 Status: Running (application id: Executing on YARN cluster with App id 
 application_1410457588978_0063)
 Map 1: -/-Reducer 2: 0/1
 Map 1: 0/1Reducer 2: 0/1
 Map 1: 0(+1)/1Reducer 2: 0/1
 Map 1: 1/1Reducer 2: 0(+1)/1
 Map 1: 1/1Reducer 2: 0/1
 Map 1: 1/1Reducer 2: 1/1
 Status: Finished successfully
 Loading data to table default.abc1 partition (b=null)
   Loading partition {b=__HIVE_DEFAULT_PARTITION__}
 Partition default.abc1{b=__HIVE_DEFAULT_PARTITION__} stats: [numFiles=1, 
 numRows=2, totalSize=7, rawDataSize=5]
 OK
 Time taken: 7.49 seconds
 {noformat}
 3. Now run the analyze statistics command for columns:
 {noformat}
 hive> analyze table abc1 partition (b) compute statistics for columns;
 Query ID = hrt_qa_20140911211010_440bdb4a-6a0d-496b-9d2e-5fc84db3d0ee
 Total jobs = 1
 Launching Job 1 out of 1
 Status: Running (application id: Executing on YARN cluster with App id 
 application_1410457588978_0063)
 Map 1: 0(+1)/1Reducer 2: 0/1
 Map 1: 1/1Reducer 2: 0(+1)/1
 Map 1: 1/1Reducer 2: 1/1
 Status: Finished successfully
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask
 {noformat}
 The analyze statistics for columns fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-12 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8017:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Patch committed to spark branch. Thanks to Rui for the contribution.

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch

 Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
 HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch


 HiveKey should be used as the key type because it holds the hash code for 
 partitioning. While BytesWritable serves partitioning well for simple cases, 
 we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, 
 bucketed table, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-12 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8017:
--
Labels: Spark-M1  (was: )

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
  Labels: Spark-M1
 Fix For: spark-branch

 Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
 HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch


 HiveKey should be used as the key type because it holds the hash code for 
 partitioning. While BytesWritable serves partitioning well for simple cases, 
 we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, 
 bucketed table, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8042) Optionally allow move tasks to run in parallel

2014-09-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131721#comment-14131721
 ] 

Xuefu Zhang commented on HIVE-8042:
---

HiveConf.java actually doesn't say anything about task types. It uses the 
general term "job", which still seems accurate after this patch. The wiki can 
of course supply more info.

 Optionally allow move tasks to run in parallel
 --

 Key: HIVE-8042
 URL: https://issues.apache.org/jira/browse/HIVE-8042
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.14.0

 Attachments: HIVE-8042.1.patch, HIVE-8042.2.patch, HIVE-8042.3.patch


 hive.exec.parallel allows one to run different stages of a query in parallel. 
 However that applies only to map-reduce tasks. When using large multi insert 
 queries there are many MoveTasks that are all executed in sequence on the 
 client. There's no real reason for that - they could be run in parallel as 
 well (i.e.: the stage graph captures the dependencies and knows which tasks 
 can happen in parallel).
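
 A minimal sketch of the idea, assuming the stage graph is available as a map from each task to the tasks it depends on. The class StageGraphRunner and its methods are hypothetical and unrelated to Hive's Driver or TaskRunner; the sketch only shows that tasks with no mutual dependency (such as several move tasks that follow the same stage) can be submitted concurrently once their prerequisites finish.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class StageGraphRunner {
    // 'deps' maps each task name to the tasks it must wait for (assumed acyclic).
    // Returns the task names in the order in which they completed.
    static List<String> run(Map<String, List<String>> deps) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> finished = Collections.synchronizedList(new ArrayList<String>());
        Map<String, CompletableFuture<Void>> futures = new HashMap<>();
        try {
            for (String task : deps.keySet()) {
                schedule(task, deps, futures, finished, pool);
            }
            // Wait for every task in the graph.
            CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(finished);
    }

    // Creates (once) a future that runs 'task' after all of its prerequisites.
    private static CompletableFuture<Void> schedule(
            String task,
            Map<String, List<String>> deps,
            Map<String, CompletableFuture<Void>> futures,
            List<String> finished,
            ExecutorService pool) {
        CompletableFuture<Void> known = futures.get(task);
        if (known != null) {
            return known;
        }
        List<CompletableFuture<Void>> upstream = new ArrayList<>();
        for (String parent : deps.getOrDefault(task, Collections.<String>emptyList())) {
            upstream.add(schedule(parent, deps, futures, finished, pool));
        }
        // The body here only records completion; a real task would do its work instead.
        CompletableFuture<Void> future = CompletableFuture
                .allOf(upstream.toArray(new CompletableFuture[0]))
                .thenRunAsync(() -> finished.add(task), pool);
        futures.put(task, future);
        return future;
    }
}
```

 Tasks that share a prerequisite become eligible at the same time, so the order between them is not fixed, while the prerequisite always completes first.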



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8072) TesParse_union is failing on trunk

2014-09-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8072:
---
Status: Patch Available  (was: Open)

 TesParse_union is failing on trunk
 --

 Key: HIVE-8072
 URL: https://issues.apache.org/jira/browse/HIVE-8072
 Project: Hive
  Issue Type: Task
  Components: Tests
Affects Versions: 0.14.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8072.patch


 Needs golden file update



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8072) TesParse_union is failing on trunk

2014-09-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8072:
---
Attachment: HIVE-8072.patch

 TesParse_union is failing on trunk
 --

 Key: HIVE-8072
 URL: https://issues.apache.org/jira/browse/HIVE-8072
 Project: Hive
  Issue Type: Task
  Components: Tests
Affects Versions: 0.14.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8072.patch


 Needs golden file update



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8072) TesParse_union is failing on trunk

2014-09-12 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-8072:
--

 Summary: TesParse_union is failing on trunk
 Key: HIVE-8072
 URL: https://issues.apache.org/jira/browse/HIVE-8072
 Project: Hive
  Issue Type: Task
  Components: Tests
Affects Versions: 0.14.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8072.patch

Needs golden file update



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used

2014-09-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-7812:

   Resolution: Fixed
Fix Version/s: 0.14.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 Disable CombineHiveInputFormat when ACID format is used
 ---

 Key: HIVE-7812
 URL: https://issues.apache.org/jira/browse/HIVE-7812
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.14.0

 Attachments: HIVE-7812.patch, HIVE-7812.patch, HIVE-7812.patch


 Currently CombineHiveInputFormat complains when called on an ACID 
 directory. Modify CombineHiveInputFormat so that HiveInputFormat is used 
 instead if the directory is in ACID format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-2438) add trademark attributions to Hive homepage

2014-09-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland resolved HIVE-2438.

Resolution: Fixed

Good call, I fixed this when creating the new site.

 add trademark attributions to Hive homepage
 ---

 Key: HIVE-2438
 URL: https://issues.apache.org/jira/browse/HIVE-2438
 Project: Hive
  Issue Type: Sub-task
Reporter: John Sichi
Assignee: Carl Steinbach

 http://www.apache.org/foundation/marks/pmcs.html#attributions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8036) PTest SSH Options

2014-09-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131761#comment-14131761
 ] 

Brock Noland commented on HIVE-8036:


Not today. The docs we have today are not about ptest but about using the infra 
ptest creates. I need to go ahead and create some basic user docs for ptest, and 
then we can start documenting this stuff. 

 PTest SSH Options
 -

 Key: HIVE-8036
 URL: https://issues.apache.org/jira/browse/HIVE-8036
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-8036.patch


 I'd like to be able to specify the following options:
 {noformat}
 StrictHostKeyChecking no
 ConnectionAttempts 3
 ServerAliveInterval 1
 {noformat}
 as a config param in the ptest config file, as opposed to depending on them 
 being set in the env.
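
 A minimal sketch of how such configured options could be turned into command-line arguments, assuming they are already parsed into name/value pairs. The class SshOptions and the method toArgs are hypothetical and not part of ptest; the only external fact relied on is that the ssh client accepts options in the form -o Name=Value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SshOptions {
    // Turns option name/value pairs into "-o Name=Value" arguments for ssh.
    // Iteration order follows the map, so pass an ordered map to keep it stable.
    static List<String> toArgs(Map<String, String> options) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> option : options.entrySet()) {
            args.add("-o");
            args.add(option.getKey() + "=" + option.getValue());
        }
        return args;
    }
}
```

 The resulting list can be placed between the ssh executable and the host when building the command.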



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8073) Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch]

2014-09-12 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-8073:
-

 Summary: Go thru all operator plan optimizations and disable those 
that are not suitable for Spark [Spark Branch]
 Key: HIVE-8073
 URL: https://issues.apache.org/jira/browse/HIVE-8073
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Xuefu Zhang


I have seen some optimizations done in the logical plan that are not applicable 
to Spark, such as in HIVE-8054. We should go through all those optimizations to 
identify any others.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8062) Stats collection for columns fails on a partitioned table with null values in partitioning column

2014-09-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8062:
---
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

TestParse_union is failing on trunk too. Created HIVE-8072 for it. 
Committed this one to trunk. 

 Stats collection for columns fails on a partitioned table with null values in 
 partitioning column
 -

 Key: HIVE-8062
 URL: https://issues.apache.org/jira/browse/HIVE-8062
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.14.0
Reporter: Deepesh Khandelwal
Assignee: Ashutosh Chauhan
 Fix For: 0.14.0

 Attachments: HIVE-8062.patch


 Steps to reproduce:
 1. Create a data file abc.txt with the following contents:
 {noformat}
 a,1
 b,
 {noformat}
 2. Use the Hive CLI to create and load the partitioned table:
 {noformat}
 hive> create table abc(a string, b int);
 OK
 Time taken: 0.272 seconds
 hive> load data local inpath 'abc.txt' into table abc;
 Loading data to table default.abc
 Table default.abc stats: [numFiles=1, numRows=0, totalSize=7, rawDataSize=0]
 OK
 Time taken: 0.463 seconds
 hive> create table abc1(a string) partitioned by (b int);
 OK
 Time taken: 0.098 seconds
 hive> set hive.exec.dynamic.partition.mode=nonstrict;
 hive> insert overwrite table abc1 partition (b) select a, b from abc;
 Query ID = hrt_qa_20140911210909_1200fae7-1e18-4e0d-b74f-040453c27cff
 Total jobs = 1
 Launching Job 1 out of 1
 Status: Running (application id: Executing on YARN cluster with App id 
 application_1410457588978_0063)
 Map 1: -/-Reducer 2: 0/1
 Map 1: 0/1Reducer 2: 0/1
 Map 1: 0(+1)/1Reducer 2: 0/1
 Map 1: 1/1Reducer 2: 0(+1)/1
 Map 1: 1/1Reducer 2: 0/1
 Map 1: 1/1Reducer 2: 1/1
 Status: Finished successfully
 Loading data to table default.abc1 partition (b=null)
   Loading partition {b=__HIVE_DEFAULT_PARTITION__}
 Partition default.abc1{b=__HIVE_DEFAULT_PARTITION__} stats: [numFiles=1, 
 numRows=2, totalSize=7, rawDataSize=5]
 OK
 Time taken: 7.49 seconds
 {noformat}
 3. Now run the analyze statistics command for columns:
 {noformat}
 hive> analyze table abc1 partition (b) compute statistics for columns;
 Query ID = hrt_qa_20140911211010_440bdb4a-6a0d-496b-9d2e-5fc84db3d0ee
 Total jobs = 1
 Launching Job 1 out of 1
 Status: Running (application id: Executing on YARN cluster with App id 
 application_1410457588978_0063)
 Map 1: 0(+1)/1Reducer 2: 0/1
 Map 1: 1/1Reducer 2: 0(+1)/1
 Map 1: 1/1Reducer 2: 1/1
 Status: Finished successfully
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask
 {noformat}
 The analyze statistics for columns fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-8040) Commit for HIVE-7925 breaks hadoop-1 build

2014-09-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reopened HIVE-8040:
-

 Commit for HIVE-7925 breaks hadoop-1 build
 --

 Key: HIVE-8040
 URL: https://issues.apache.org/jira/browse/HIVE-8040
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
 Attachments: HIVE-8040.1.patch.txt, HIVE-8040.2.patch.txt


 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
 on project hive-metastore: Compilation failure
 [ERROR] 
 /home/xzhang/apache/hive7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:[45,37]
  package org.apache.commons.math3.stat does not exist
 [ERROR] - [Help 1]
 {code}
 Missing pom file changes maybe?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8040) Commit for HIVE-7925 breaks hadoop-1 build

2014-09-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-8040:

Attachment: HIVE-8040.patch

With this patch on trunk, the hadoop-1 profile compiles.

 Commit for HIVE-7925 breaks hadoop-1 build
 --

 Key: HIVE-8040
 URL: https://issues.apache.org/jira/browse/HIVE-8040
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
 Attachments: HIVE-8040.1.patch.txt, HIVE-8040.2.patch.txt, 
 HIVE-8040.patch


 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
 on project hive-metastore: Compilation failure
 [ERROR] 
 /home/xzhang/apache/hive7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:[45,37]
  package org.apache.commons.math3.stat does not exist
 [ERROR] - [Help 1]
 {code}
 Missing pom file changes maybe?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7981) alias of compound aggregation functions fails in having clause

2014-09-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131802#comment-14131802
 ] 

Hive QA commented on HIVE-7981:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668287/HIVE-7981.2.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6198 tests executed
*Failed tests:*
{noformat}
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/762/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/762/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-762/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668287

 alias of compound aggregation functions fails in having clause
 --

 Key: HIVE-7981
 URL: https://issues.apache.org/jira/browse/HIVE-7981
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: eyal gruss
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7981.1.patch.txt, HIVE-7981.2.patch.txt


 hive> select max(time)-min(time) as span from mytable group by name having 
 span>0;
 FAILED: SemanticException [Error 10025]: Line 1:92 Expression not in GROUP BY 
 key '0'
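
A minimal sketch of the two query forms involved, run against SQLite through Python's built-in sqlite3 module rather than Hive (an assumption made so the example is self-contained; the sample rows are invented). The first query refers to the select-list alias in HAVING, which is the form reported as failing above; the second repeats the aggregate expression in HAVING, which is a common way to avoid depending on alias resolution. Whether the second form sidesteps this particular Hive bug is not confirmed by the report.

```python
import sqlite3

# In-memory database with a table shaped like the one in the report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, time INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("a", 10), ("a", 25), ("b", 7)],  # invented sample rows
)

# Form 1: HAVING refers to the select-list alias `span`.
alias_form = conn.execute(
    "SELECT max(time) - min(time) AS span "
    "FROM mytable GROUP BY name HAVING span > 0"
).fetchall()

# Form 2: HAVING repeats the aggregate expression instead of the alias.
expanded_form = conn.execute(
    "SELECT max(time) - min(time) AS span "
    "FROM mytable GROUP BY name HAVING max(time) - min(time) > 0"
).fetchall()

print(alias_form, expanded_form)
```

In SQLite both forms return the same rows, which is the behaviour the reporter expects from the alias form.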



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8040) Commit for HIVE-7925 breaks hadoop-1 build

2014-09-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8040:
---
Status: Patch Available  (was: Reopened)

LGTM. +1 
Copying the class from Hadoop to Hive seems OK since that class has no 
dependencies. cc: [~hagleitn]


 Commit for HIVE-7925 breaks hadoop-1 build
 --

 Key: HIVE-8040
 URL: https://issues.apache.org/jira/browse/HIVE-8040
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
 Attachments: HIVE-8040.1.patch.txt, HIVE-8040.2.patch.txt, 
 HIVE-8040.patch


 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
 on project hive-metastore: Compilation failure
 [ERROR] 
 /home/xzhang/apache/hive7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:[45,37]
  package org.apache.commons.math3.stat does not exist
 [ERROR] - [Help 1]
 {code}
 Missing pom file changes maybe?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7981) alias of compound aggregation functions fails in having clause

2014-09-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131812#comment-14131812
 ] 

Ashutosh Chauhan commented on HIVE-7981:


[~navis] Can you create RB entry for this?

 alias of compound aggregation functions fails in having clause
 --

 Key: HIVE-7981
 URL: https://issues.apache.org/jira/browse/HIVE-7981
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: eyal gruss
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7981.1.patch.txt, HIVE-7981.2.patch.txt


 hive> select max(time)-min(time) as span from mytable group by name having 
 span>0;
 FAILED: SemanticException [Error 10025]: Line 1:92 Expression not in GROUP BY 
 key '0'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups

2014-09-12 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7156:
-
Status: Open  (was: Patch Available)

 Group-By operator stat-annotation only uses distinct approx to generate 
 rollups
 ---

 Key: HIVE-7156
 URL: https://issues.apache.org/jira/browse/HIVE-7156
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Prasanth J
 Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch, 
 HIVE-7156.4.patch


 The stats annotation for a group-by only annotates the reduce-side row-count 
 with the distinct values.
 The map-side gets the row-count as the rows output instead of distinct * 
 parallelism, while the reducer side gets the correct parallelism.
 {code}
 hive> explain select distinct L_SHIPDATE from lineitem;
   Vertices:
     Map 1 
         Map Operator Tree:
             TableScan
               alias: lineitem
               Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
               Select Operator
                 expressions: l_shipdate (type: string)
                 outputColumnNames: l_shipdate
                 Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
                 Group By Operator
                   keys: l_shipdate (type: string)
                   mode: hash
                   outputColumnNames: _col0
                   Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
                   Reduce Output Operator
                     key expressions: _col0 (type: string)
                     sort order: +
                     Map-reduce partition columns: _col0 (type: string)
                     Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
         Execution mode: vectorized
     Reducer 2 
         Reduce Operator Tree:
           Group By Operator
             keys: KEY._col0 (type: string)
             mode: mergepartial
             outputColumnNames: _col0
             Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
             Select Operator
               expressions: _col0 (type: string)
               outputColumnNames: _col0
               Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
 {code}
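
A rough sketch of the estimate described above, in Python. It only illustrates the arithmetic and is not Hive's stats-annotation code. The function name, the parallelism values, and the cap at the input row count are assumptions made for illustration; the row count (589709) and distinct count (1955) are taken from the plan.

```python
def map_side_groupby_rows(input_rows: int, distinct_values: int, parallelism: int) -> int:
    """Estimate rows leaving a map-side hash group-by.

    Each of the `parallelism` map tasks can emit at most one row per distinct
    key, so the estimate is distinct_values * parallelism. Capping it at
    input_rows is an added assumption: a group-by cannot emit more rows than
    it reads.
    """
    return min(input_rows, distinct_values * parallelism)


input_rows = 589709      # "Num rows" on the map side of the plan
distinct_values = 1955   # "Num rows" on the reducer side of the plan

for parallelism in (1, 10, 1000):  # hypothetical map-task counts
    estimate = map_side_groupby_rows(input_rows, distinct_values, parallelism)
    print(parallelism, estimate)
```

With 10 map tasks the estimate is 19550 rows rather than the 589709 shown for the map-side Group By Operator; with a very large task count it falls back to the input row count.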



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8040) Commit for HIVE-7925 breaks hadoop-1 build

2014-09-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-8040:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Given that this is fixing the build, with Ashutosh's review I've committed it.

 Commit for HIVE-7925 breaks hadoop-1 build
 --

 Key: HIVE-8040
 URL: https://issues.apache.org/jira/browse/HIVE-8040
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
 Attachments: HIVE-8040.1.patch.txt, HIVE-8040.2.patch.txt, 
 HIVE-8040.patch


 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
 on project hive-metastore: Compilation failure
 [ERROR] 
 /home/xzhang/apache/hive7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:[45,37]
  package org.apache.commons.math3.stat does not exist
 [ERROR] - [Help 1]
 {code}
 Missing pom file changes maybe?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7704) Create tez task for fast file merging

2014-09-12 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7704:
-
Attachment: HIVE-7704.12.patch

HIVE-7859 changes ORC file sizes which causes diffs in 2 test files. Updated 
them in this patch.

 Create tez task for fast file merging
 -

 Key: HIVE-7704
 URL: https://issues.apache.org/jira/browse/HIVE-7704
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7704.1.patch, HIVE-7704.10.patch, 
 HIVE-7704.11.patch, HIVE-7704.12.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, 
 HIVE-7704.4.patch, HIVE-7704.4.patch, HIVE-7704.5.patch, HIVE-7704.6.patch, 
 HIVE-7704.7.patch, HIVE-7704.8.patch, HIVE-7704.9.patch


 Currently Tez falls back to an MR task for the merge file task. It would be 
 beneficial to convert the merge file tasks to Tez tasks to make use of the 
 performance gains from Tez. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7788) Generate plans for insert, update, and delete

2014-09-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131874#comment-14131874
 ] 

Thejas M Nair commented on HIVE-7788:
-

Alan, can you please upload the updated patch to review board ?

 Generate plans for insert, update, and delete
 -

 Key: HIVE-7788
 URL: https://issues.apache.org/jira/browse/HIVE-7788
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7788.2.patch, HIVE-7788.3.patch, HIVE-7788.4.patch, 
 HIVE-7788.WIP.patch, HIVE-7788.patch


 Insert plans need to be generated differently for ACID tables, plus we need 
 to be able to generate plans in the semantic analyzer for update and delete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8074) Merge spark into trunk 9/12/2014

2014-09-12 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8074:
--

 Summary: Merge spark into trunk 9/12/2014
 Key: HIVE-8074
 URL: https://issues.apache.org/jira/browse/HIVE-8074
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8076) Test Failure input23

2014-09-12 Thread Laljo John Pullokkaran (JIRA)
Laljo John Pullokkaran created HIVE-8076:


 Summary: Test Failure input23
 Key: HIVE-8076
 URL: https://issues.apache.org/jira/browse/HIVE-8076
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8075) Test limit_pushdown failure

2014-09-12 Thread Laljo John Pullokkaran (JIRA)
Laljo John Pullokkaran created HIVE-8075:


 Summary: Test limit_pushdown failure
 Key: HIVE-8075
 URL: https://issues.apache.org/jira/browse/HIVE-8075
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8076) CBO Trunk Merge: Test Failure input23

2014-09-12 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-8076:
-
Summary: CBO Trunk Merge: Test Failure input23  (was: Test Failure input23)

 CBO Trunk Merge: Test Failure input23
 -

 Key: HIVE-8076
 URL: https://issues.apache.org/jira/browse/HIVE-8076
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8075) CBO Trunk Merge: Test limit_pushdown failure

2014-09-12 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-8075:
-
Summary: CBO Trunk Merge: Test limit_pushdown failure  (was: Test 
limit_pushdown failure)

 CBO Trunk Merge: Test limit_pushdown failure
 

 Key: HIVE-8075
 URL: https://issues.apache.org/jira/browse/HIVE-8075
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8077) Test Failure vectorization_7

2014-09-12 Thread Laljo John Pullokkaran (JIRA)
Laljo John Pullokkaran created HIVE-8077:


 Summary: Test Failure vectorization_7
 Key: HIVE-8077
 URL: https://issues.apache.org/jira/browse/HIVE-8077
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8077) CBO Trunk Merge: Test Failure vectorization_7

2014-09-12 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-8077:
-
Summary: CBO Trunk Merge: Test Failure vectorization_7  (was: Test Failure 
vectorization_7)

 CBO Trunk Merge: Test Failure vectorization_7
 -

 Key: HIVE-8077
 URL: https://issues.apache.org/jira/browse/HIVE-8077
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8077) CBO Trunk Merge: Test Failure vectorization_7

2014-09-12 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran reassigned HIVE-8077:


Assignee: Laljo John Pullokkaran

 CBO Trunk Merge: Test Failure vectorization_7
 -

 Key: HIVE-8077
 URL: https://issues.apache.org/jira/browse/HIVE-8077
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7325) Support non-constant expressions for MAP type indices.

2014-09-12 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131911#comment-14131911
 ] 

Jason Dere commented on HIVE-7325:
--

This looks good. +1 if you remove invalid_map_index.q since it looks like that 
test is no longer valid with your changes to map index.

 Support non-constant expressions for MAP type indices.
 --

 Key: HIVE-7325
 URL: https://issues.apache.org/jira/browse/HIVE-7325
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Mala Chikka Kempanna
Assignee: Navis
 Fix For: 0.14.0

 Attachments: HIVE-7325.1.patch.txt, HIVE-7325.2.patch.txt, 
 HIVE-7325.3.patch.txt


 Here is my sample:
 {code}
 CREATE TABLE RECORD(RecordID string, BatchDate string, Country string) 
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,D:BatchDate,D:Country") 
 TBLPROPERTIES ("hbase.table.name" = "RECORD"); 
 CREATE TABLE KEY_RECORD(KeyValue String, RecordId map<string,string>) 
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, K:") 
 TBLPROPERTIES ("hbase.table.name" = "KEY_RECORD"); 
 {code}
 The following join statement doesn't work. 
 {code}
 SELECT a.*, b.* from KEY_RECORD a join RECORD b 
 WHERE a.RecordId[b.RecordID] is not null;
 {code}
 FAILED: SemanticException 2:16 Non-constant expression for map indexes not 
 supported. Error encountered near token 'RecordID' 
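
A small Python sketch of what the join above is asking for. It models the semantics only and is not Hive code. The sample rows and the helper name join_on_map_lookup are invented for illustration. The point is that the map key comes from the other table's row, so it cannot be a constant known at compile time.

```python
# Rows of KEY_RECORD: RecordId is a map<string,string>.
key_record = [
    {"KeyValue": "k1", "RecordId": {"r1": "x", "r2": "y"}},
    {"KeyValue": "k2", "RecordId": {"r3": "z"}},
]

# Rows of RECORD.
record = [
    {"RecordID": "r1", "BatchDate": "2014-09-01", "Country": "US"},
    {"RecordID": "r3", "BatchDate": "2014-09-02", "Country": "IN"},
    {"RecordID": "r9", "BatchDate": "2014-09-03", "Country": "KR"},
]


def join_on_map_lookup(left, right):
    """Keep (a, b) pairs where a.RecordId[b.RecordID] is not null.

    The map index is b["RecordID"], a per-row value rather than a constant.
    """
    return [
        (a, b)
        for a in left
        for b in right
        if a["RecordId"].get(b["RecordID"]) is not None
    ]


matches = join_on_map_lookup(key_record, record)
pairs = [(a["KeyValue"], b["RecordID"]) for a, b in matches]
print(pairs)
```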



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8061) improve the partition col stats update speed

2014-09-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8061:
--
Status: Open  (was: Patch Available)

 improve the partition col stats update speed
 

 Key: HIVE-8061
 URL: https://issues.apache.org/jira/browse/HIVE-8061
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
 Attachments: HIVE-8061.1.patch, HIVE-8061.2.patch, HIVE-8061.3.patch


 We previously worked on speeding up column stats updates for the partitions 
 of a table in HIVE-7736 and HIVE-7876.
 Although those changes brought some improvement, the result was only correct 
 on the first run; later runs produced duplicate column stats (thanks to 
 Eugene Koifman for pointing this out).
 We fixed this in HIVE-7944 by reverting the patch.
 This JIRA ticket is another attempt to improve the speed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25557: improve the speed of col stats update speed

2014-09-12 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25557/
---

(Updated Sept. 12, 2014, 6:47 p.m.)


Review request for hive.


Changes
---

Address the case-insensitivity problem.


Repository: hive-git


Description
---

Major improvements:
(1) All partition stats updates/inserts are now done in one transaction.
(2) Rather than using one query per column per partition (total queries = 
#col * #part), we now use one query to delete everything and then one query 
to insert everything. The transaction makes sure that this happens in ACID 
mode.
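
The delete-then-insert pattern described above can be sketched as follows. This is a simplified illustration using Python's built-in sqlite3 module, not the metastore code in this patch. The table name part_col_stats, its columns, and the function name are hypothetical, and executemany stands in for the single batched insert.

```python
import sqlite3


def replace_partition_col_stats(conn, table, stats_rows):
    """Replace column stats for the partitions named in stats_rows.

    stats_rows is a list of (partition_name, column_name, num_nulls) tuples.
    One DELETE and one batched INSERT run inside a single transaction, so
    re-running the update cannot leave duplicate rows behind.
    """
    if not stats_rows:
        return
    partitions = sorted({part for part, _, _ in stats_rows})
    placeholders = ",".join("?" for _ in partitions)
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            f"DELETE FROM part_col_stats WHERE tbl = ? AND part IN ({placeholders})",
            [table, *partitions],
        )
        conn.executemany(
            "INSERT INTO part_col_stats (tbl, part, col, num_nulls) VALUES (?, ?, ?, ?)",
            [(table, part, col, nulls) for part, col, nulls in stats_rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part_col_stats (tbl TEXT, part TEXT, col TEXT, num_nulls INTEGER)"
)
rows = [("ds=1", "a", 0), ("ds=1", "b", 3), ("ds=2", "a", 1), ("ds=2", "b", 0)]

# Running the update twice must not duplicate the stats rows.
replace_partition_col_stats(conn, "t", rows)
replace_partition_col_stats(conn, "t", rows)
count = conn.execute("SELECT count(*) FROM part_col_stats").fetchone()[0]
print(count)
```

For 2 partitions and 2 columns, a per-column, per-partition approach would issue 4 update statements, while this sketch issues one delete and one batched insert regardless of the counts.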


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9df6656 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
33745e4 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
5a8591a 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 637a39a 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 5c5ed7f 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 5905efe 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 88b0791 
  ql/src/test/queries/clientpositive/analyze_tbl_part.q 9040bd4 
  ql/src/test/results/clientpositive/analyze_tbl_part.q.out 40b926c 

Diff: https://reviews.apache.org/r/25557/diff/


Testing
---


Thanks,

pengcheng xiong


