[jira] [Commented] (HIVE-7110) TestHCatPartitionPublish test failure: No FileSystem or scheme: pfile

2014-06-06 Thread David Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019784#comment-14019784
 ] 

David Chen commented on HIVE-7110:
--

Interesting. Still, something is causing this test to fail when building on OS 
X 10.9.3, and I have reproduced the failure on two different machines, which I 
think does indicate that something strange is going on in the build script and 
should be fixed. I will see if this reproduces on my Ubuntu 12.04 VM and RHEL 
6.4 dev box. If it does not, then we can de-prioritize/postpone this issue.

 TestHCatPartitionPublish test failure: No FileSystem or scheme: pfile
 -

 Key: HIVE-7110
 URL: https://issues.apache.org/jira/browse/HIVE-7110
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7110.1.patch, HIVE-7110.2.patch, HIVE-7110.3.patch, 
 HIVE-7110.4.patch


 I got the following TestHCatPartitionPublish test failure when running all 
 unit tests against Hadoop 1. This also appears when testing against Hadoop 2.
 {code}
  Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 26.06 sec <<< 
  FAILURE! - in org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish
 testPartitionPublish(org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish)
   Time elapsed: 1.361 sec  <<< ERROR!
 org.apache.hive.hcatalog.common.HCatException: 
 org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output 
 information. Cause : java.io.IOException: No FileSystem for scheme: pfile
 at 
 org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1443)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
 at 
 org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:212)
 at 
 org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70)
 at 
 org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.runMRCreateFail(TestHCatPartitionPublish.java:191)
 at 
 org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.testPartitionPublish(TestHCatPartitionPublish.java:155)
 {code}
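
For context, this error means no FileSystem implementation is registered for the test-only pfile scheme. A minimal sketch of the mapping that makes pfile paths resolvable, assuming (as Hive's test configuration does elsewhere) that pfile should map to Hive's ProxyLocalFileSystem wrapper over the local filesystem:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PfileSchemeCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumption: map the test-only "pfile" scheme to Hive's proxy over the
    // local filesystem, the way Hive's test hive-site.xml does.
    conf.set("fs.pfile.impl", "org.apache.hadoop.fs.ProxyLocalFileSystem");
    // With the mapping present, resolving a pfile:// path no longer throws
    // "No FileSystem for scheme: pfile".
    FileSystem fs = new Path("pfile:///tmp/warehouse").getFileSystem(conf);
    System.out.println(fs.getUri());
  }
}
{code}

If the failure only shows up on some machines, the question is why that mapping is missing from the configuration the test ends up loading there.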



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

2014-06-06 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut resolved HIVE-1019.


Resolution: Won't Fix

Hiveserver2 doesn't suffer from this.

 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 

 Key: HIVE-1019
 URL: https://issues.apache.org/jira/browse/HIVE-1019
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.6.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Priority: Minor
 Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, 
 HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019-6.patch, HIVE-1019-7.patch, 
 HIVE-1019-8.patch, HIVE-1019.patch, stacktrace2.txt


 I keep getting errors like this:
 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 and :
 java.io.IOException: cannot find dir = 
 hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in 
 partToPartitionInfo!
 when running multiple threads with roughly similar queries.
 I have a patch for this which works for me.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7175) Provide password file option to beeline

2014-06-06 Thread Dr. Wendell Urth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dr. Wendell Urth updated HIVE-7175:
---

Attachment: HIVE-7175.patch

I've added a patch that provides this ability, akin to Sqoop's mechanism (minus 
the encrypted/obfuscated file loader options, as those could be better handled 
by Larry's proposal).

This would be useful in the immediate future, until Larry's proposal is 
completed upstream and can be added to Hive in a compatible way.

Please review.
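
Since the patch itself is only described here, a minimal sketch of the core behavior, reading the password from a permission-protected file. The class and method names below are made up for illustration; the actual BeeLine option wiring is in the attached patch.

{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper, not the code from HIVE-7175.patch: read the first line of
// a permission-protected file and use it as the connection password instead of
// passing it on the command line with -p.
public final class PasswordFileReader {
  public static String read(String path) throws IOException {
    try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
      String line = reader.readLine();
      return line == null ? "" : line.trim();
    }
  }
}
{code}

Usage would then look roughly like {{beeline -u <jdbc-url> -n <user> -w /path/to/password-file}}, with the file readable only by its owner.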

 Provide password file option to beeline
 ---

 Key: HIVE-7175
 URL: https://issues.apache.org/jira/browse/HIVE-7175
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Clients
Affects Versions: 0.13.0
Reporter: Robert Justice
  Labels: features, security
 Attachments: HIVE-7175.patch


 For people connecting to Hive Server 2 with LDAP authentication enabled, in 
 order to batch run commands, we currently have to provide the password openly 
 in the command line.   They could use some expect scripting, but I think a 
 valid improvement would be to provide a password file option similar to other 
 CLI commands in hadoop (e.g. sqoop) to be more secure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7175) Provide password file option to beeline

2014-06-06 Thread Dr. Wendell Urth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dr. Wendell Urth updated HIVE-7175:
---

Release Note: Added a --password-file (or -w) option to the BeeLine CLI, to 
read a password from a permission-protected file instead of supplying it in 
plaintext form as part of the command (-p).
  Status: Patch Available  (was: Open)

 Provide password file option to beeline
 ---

 Key: HIVE-7175
 URL: https://issues.apache.org/jira/browse/HIVE-7175
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Clients
Affects Versions: 0.13.0
Reporter: Robert Justice
  Labels: features, security
 Attachments: HIVE-7175.patch


 For people connecting to Hive Server 2 with LDAP authentication enabled, in 
 order to batch run commands, we currently have to provide the password openly 
 in the command line.   They could use some expect scripting, but I think a 
 valid improvement would be to provide a password file option similar to other 
 CLI commands in hadoop (e.g. sqoop) to be more secure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7186) Unable to perform join on table

2014-06-06 Thread Alex Nastetsky (JIRA)
Alex Nastetsky created HIVE-7186:


 Summary: Unable to perform join on table
 Key: HIVE-7186
 URL: https://issues.apache.org/jira/browse/HIVE-7186
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
 Environment: Hortonworks Data Platform 2.0
Reporter: Alex Nastetsky


Occasionally, a table will start exhibiting behavior that will prevent it from 
being used in a JOIN. 

When doing a map join, it will just stall at "Starting to launch local task to 
process map join;".
When doing a regular join, it will make progress but then error out with an 
IndexOutOfBoundsException:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IndexOutOfBoundsException
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:365)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
... 9 more
Caused by: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:532)
at java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:131)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1153)
at 
org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:586)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:372)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:334)
... 15 more

Doing simple selects against this table works fine and does not show any apparent 
problems with the data.

Assume that the table in question is called tableA and was created by queryA.

Doing either of the following has helped resolve the issue in the past.

1) create table tableB as select * from tableA;

  Then just use tableB instead in the JOIN.

2) regenerate tableA using queryA

  Then use tableA in the JOIN again. It usually works the second time.
  

When doing a describe formatted on the tables, the totalSize will be 
different between the original tableA and tableB, and sometimes (but not 
always) between the original tableA and the regenerated tableA. The numRows 
will be the same across all versions of the tables.

This problem cannot be reproduced consistently, but the issue always happens 
when we try to use an affected table in a JOIN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7136:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Sumit!

 Allow Hive to read hive scripts from any of the supported file systems in 
 hadoop eco-system
 ---

 Key: HIVE-7136
 URL: https://issues.apache.org/jira/browse/HIVE-7136
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.13.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7136.01.patch, HIVE-7136.patch


 The current Hive CLI assumes that the source file (Hive script) is always on 
 the local file system. This patch implements support for reading source files 
 from other file systems in the Hadoop ecosystem (HDFS, S3, etc.) as well, 
 keeping the default behavior intact: when no scheme is provided in the URL 
 for the source file, it is read from the default (local) filesystem.
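
As a rough illustration of the behavior described above (not the committed patch): resolving the script path through Hadoop's FileSystem API makes hdfs:// and s3:// URLs work. The helper class below is made up for the example, and exactly how the local-filesystem default is chosen when no scheme is given is an assumption.

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SourceFileReader {
  // Open a Hive script from whatever filesystem its URL points at.
  // With no scheme in the URL, Path.getFileSystem resolves against the
  // configured default filesystem; the committed patch may handle the
  // local-file default differently.
  public static BufferedReader open(String url, Configuration conf) throws IOException {
    Path script = new Path(url);
    FileSystem fs = script.getFileSystem(conf);
    return new BufferedReader(new InputStreamReader(fs.open(script), StandardCharsets.UTF_8));
  }
}
{code}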



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7135) Fix test fail of TestTezTask.testSubmit

2014-06-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7135:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

 Fix test fail of TestTezTask.testSubmit
 ---

 Key: HIVE-7135
 URL: https://issues.apache.org/jira/browse/HIVE-7135
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Navis
 Fix For: 0.14.0

 Attachments: HIVE-7135.1.patch, HIVE-7135.2.patch.txt


 HIVE-7043 broke a tez test case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7176) FileInputStream is not closed in Commands#properties()

2014-06-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7176:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
 Assignee: Navis
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

 FileInputStream is not closed in Commands#properties()
 --

 Key: HIVE-7176
 URL: https://issues.apache.org/jira/browse/HIVE-7176
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Navis
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7176.1.patch.txt


 NO PRECOMMIT TESTS
 In beeline.Commands, around line 834:
 {code}
   props.load(new FileInputStream(parts[i]));
 {code}
 The FileInputStream is not closed upon return from the method.
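
A minimal sketch of one way to fix this, using try-with-resources so the stream is closed on both normal return and exception; the attached patch may structure it differently.

{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class PropertiesLoader {
  public static Properties load(String path) throws IOException {
    Properties props = new Properties();
    // try-with-resources guarantees the FileInputStream is closed even if
    // Properties.load throws.
    try (InputStream in = new FileInputStream(path)) {
      props.load(in);
    }
    return props;
  }
}
{code}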



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22170: analyze table T compute statistics for columns; will now compute stats for all columns.

2014-06-06 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22170/
---

(Updated June 6, 2014, 4:30 p.m.)


Review request for hive and Prasanth_J.


Changes
---

Fixed last failing test.


Bugs: HIVE-7168
https://issues.apache.org/jira/browse/HIVE-7168


Repository: hive-git


Description
---

analyze table T compute statistics for columns; will now compute stats for all 
columns.


Diffs (updated)
-

  
metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java
 1245d80 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
5b77e6f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 6d958fd 
  ql/src/test/queries/clientpositive/columnstats_partlvl.q 9dfe8ff 
  ql/src/test/queries/clientpositive/columnstats_tbllvl.q 170fbc5 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out d91be8d 
  ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 3d3d0e2 
  ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out 03b536f 

Diff: https://reviews.apache.org/r/22170/diff/


Testing
---

Added new tests.


Thanks,

Ashutosh Chauhan



[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns

2014-06-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7168:
---

Attachment: HIVE-7168.2.patch

Fixed last failing test.

 Don't require to name all columns in analyze statements if stats collection 
 is for all columns
 --

 Key: HIVE-7168
 URL: https://issues.apache.org/jira/browse/HIVE-7168
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns

2014-06-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7168:
---

Status: Open  (was: Patch Available)

 Don't require to name all columns in analyze statements if stats collection 
 is for all columns
 --

 Key: HIVE-7168
 URL: https://issues.apache.org/jira/browse/HIVE-7168
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns

2014-06-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7168:
---

Status: Patch Available  (was: Open)

 Don't require to name all columns in analyze statements if stats collection 
 is for all columns
 --

 Key: HIVE-7168
 URL: https://issues.apache.org/jira/browse/HIVE-7168
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7040) TCP KeepAlive for HiveServer2

2014-06-06 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020023#comment-14020023
 ] 

Vaibhav Gumashta commented on HIVE-7040:


Thanks for the patch [~nicothieb]! There is another jira: HIVE-6679, which 
looks at doing this for binary mode (with and without SSL). Is it possible to 
handle the SSL case as well in this jira? 

 TCP KeepAlive for HiveServer2
 -

 Key: HIVE-7040
 URL: https://issues.apache.org/jira/browse/HIVE-7040
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, Server Infrastructure
Reporter: Nicolas Thiébaud
 Attachments: HIVE-7040.patch, HIVE-7040.patch.2


 Implement TCP KeepAlive for HiveServer2 to avoid half-open connections.
 A setting could be added
 {code}
 <property>
   <name>hive.server2.tcp.keepalive</name>
   <value>true</value>
   <description>Whether to enable TCP keepalive for Hive Server 2</description>
 </property>
 {code}
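
As a rough sketch of how such a setting might be honored on the server side: read the flag from the configuration and enable SO_KEEPALIVE on accepted connections. The property name comes from the snippet above; the accept-loop shape and the use of plain java.net sockets are assumptions for illustration, not the attached patch.

{code}
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

import org.apache.hadoop.conf.Configuration;

public class KeepAliveAcceptLoop {
  // If hive.server2.tcp.keepalive is true, turn on SO_KEEPALIVE for every
  // accepted connection so half-open clients are eventually detected.
  public static void serve(ServerSocket server, Configuration conf) throws IOException {
    boolean keepAlive = conf.getBoolean("hive.server2.tcp.keepalive", false);
    while (true) {
      Socket client = server.accept();
      if (keepAlive) {
        client.setKeepAlive(true);
      }
      // hand the socket off to the Thrift processing layer here
    }
  }
}
{code}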



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7143) Add Streaming support in Windowing mode for more UDAFs (min/max, lead/lag, fval/lval)

2014-06-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020035#comment-14020035
 ] 

Ashutosh Chauhan commented on HIVE-7143:


+1

 Add Streaming support in Windowing mode for more UDAFs (min/max, lead/lag, 
 fval/lval)
 -

 Key: HIVE-7143
 URL: https://issues.apache.org/jira/browse/HIVE-7143
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7143.1.patch, HIVE-7143.3.patch


 Provided Streaming implementations for the above functions.
 Min/Max is based on the algorithm by Daniel Lemire: 
 http://www.archipel.uqam.ca/309/1/webmaximinalgo.pdf
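
For reference, the core of the streaming max technique in Lemire's paper is a monotonic deque: candidate indices are kept with decreasing values, so the head is always the current window maximum and each element is pushed and popped at most once. A generic, self-contained illustration (not the Hive UDAF code):

{code}
import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingWindowMax {
  // Maximum over each window of `width` consecutive values, in O(n) total time.
  // Assumes values.length >= width.
  public static long[] maxima(long[] values, int width) {
    long[] out = new long[values.length - width + 1];
    Deque<Integer> candidates = new ArrayDeque<>(); // indices, values decreasing
    for (int i = 0; i < values.length; i++) {
      // drop candidates that can never be a maximum again
      while (!candidates.isEmpty() && values[candidates.peekLast()] <= values[i]) {
        candidates.pollLast();
      }
      candidates.addLast(i);
      // drop the head once it falls out of the current window
      if (candidates.peekFirst() <= i - width) {
        candidates.pollFirst();
      }
      if (i >= width - 1) {
        out[i - width + 1] = values[candidates.peekFirst()];
      }
    }
    return out;
  }
}
{code}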



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7186) Unable to perform join on table

2014-06-06 Thread Alex Nastetsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Nastetsky updated HIVE-7186:
-

Environment: Hortonworks Data Platform 2.0.6.0  (was: Hortonworks Data 
Platform 2.0)

 Unable to perform join on table
 ---

 Key: HIVE-7186
 URL: https://issues.apache.org/jira/browse/HIVE-7186
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
 Environment: Hortonworks Data Platform 2.0.6.0
Reporter: Alex Nastetsky

 Occasionally, a table will start exhibiting behavior that will prevent it 
 from being used in a JOIN. 
 When doing a map join, it will just stall at "Starting to launch local task 
 to process map join;".
 When doing a regular join, it will make progress but then error out with an 
 IndexOutOfBoundsException:
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IndexOutOfBoundsException
 at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:365)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
 at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
 ... 9 more
 Caused by: java.lang.IndexOutOfBoundsException
 at java.nio.Buffer.checkIndex(Buffer.java:532)
 at 
 java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:131)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1153)
 at 
 org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:586)
 at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:372)
 at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:334)
 ... 15 more
 
 Doing simple selects against this table works fine and does not show any 
 apparent problems with the data.
 Assume that the table in question is called tableA and was created by queryA.
 Doing either of the following has helped resolve the issue in the past.
 1) create table tableB as select * from tableA;
   Then just use tableB instead in the JOIN.
 2) regenerate tableA using queryA
   Then use tableA in the JOIN again. It usually works the second time.
   
 When doing a describe formatted on the tables, the totalSize will be 
 different between the original tableA and tableB, and sometimes (but not 
 always) between the original tableA and the regenerated tableA. The numRows 
 will be the same across all versions of the tables.
 This problem cannot be reproduced consistently, but the issue always happens 
 when we try to use an affected table in a JOIN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7040) TCP KeepAlive for HiveServer2

2014-06-06 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020049#comment-14020049
 ] 

Vaibhav Gumashta commented on HIVE-7040:


Actually, HIVE-6679 looks like it is focused just on timeouts, so please ignore 
that jira.

 TCP KeepAlive for HiveServer2
 -

 Key: HIVE-7040
 URL: https://issues.apache.org/jira/browse/HIVE-7040
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, Server Infrastructure
Reporter: Nicolas Thiébaud
 Attachments: HIVE-7040.patch, HIVE-7040.patch.2


 Implement TCP KeepAlive for HiveServer2 to avoid half-open connections.
 A setting could be added
 {code}
 <property>
   <name>hive.server2.tcp.keepalive</name>
   <value>true</value>
   <description>Whether to enable TCP keepalive for Hive Server 2</description>
 </property>
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7117) Partitions not inheriting table permissions after alter rename partition

2014-06-06 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7117:
--

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

 Partitions not inheriting table permissions after alter rename partition
 

 Key: HIVE-7117
 URL: https://issues.apache.org/jira/browse/HIVE-7117
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Fix For: 0.14.0

 Attachments: HIVE-7117.2.patch, HIVE-7117.3.patch, HIVE-7117.4.patch, 
 HIVE-7117.5.patch, HIVE-7117.6.patch, HIVE-7117.7.patch, HIVE-7117.8.patch, 
 HIVE-7117.patch


 On altering/renaming a partition, it must inherit the permissions of the parent 
 directory if the flag hive.warehouse.subdir.inherit.perms is set.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7117) Partitions not inheriting table permissions after alter rename partition

2014-06-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020145#comment-14020145
 ] 

Xuefu Zhang commented on HIVE-7117:
---

Patch committed to trunk. Thanks to Ashish for the contribution.

 Partitions not inheriting table permissions after alter rename partition
 

 Key: HIVE-7117
 URL: https://issues.apache.org/jira/browse/HIVE-7117
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Fix For: 0.14.0

 Attachments: HIVE-7117.2.patch, HIVE-7117.3.patch, HIVE-7117.4.patch, 
 HIVE-7117.5.patch, HIVE-7117.6.patch, HIVE-7117.7.patch, HIVE-7117.8.patch, 
 HIVE-7117.patch


 On altering/renaming a partition, it must inherit the permissions of the parent 
 directory if the flag hive.warehouse.subdir.inherit.perms is set.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7187) Reconcile jetty versions in hive

2014-06-06 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-7187:
--

 Summary: Reconcile jetty versions in hive
 Key: HIVE-7187
 URL: https://issues.apache.org/jira/browse/HIVE-7187
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Web UI, WebHCat
Reporter: Vaibhav Gumashta


Hive root pom has 3 parameters for specifying jetty dependency versions:
{code}
<jetty.version>6.1.26</jetty.version>
<jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version>
<jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version>
{code}
1st one is used by HWI, 2nd by WebHCat and 3rd by HiveServer2 (in http mode). 
We should probably use the same jetty version for all hive components. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7187) Reconcile jetty versions in hive

2014-06-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020170#comment-14020170
 ] 

Eugene Koifman commented on HIVE-7187:
--

Also, the current release of Jetty is 9.x.


 Reconcile jetty versions in hive
 

 Key: HIVE-7187
 URL: https://issues.apache.org/jira/browse/HIVE-7187
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Web UI, WebHCat
Reporter: Vaibhav Gumashta

 Hive root pom has 3 parameters for specifying jetty dependency versions:
 {code}
 <jetty.version>6.1.26</jetty.version>
 <jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version>
 <jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version>
 {code}
 1st one is used by HWI, 2nd by WebHCat and 3rd by HiveServer2 (in http mode). 
 We should probably use the same jetty version for all hive components. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7063) Optimize for the Top N within a Group use case

2014-06-06 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-7063:


Attachment: HIVE-7063.1.patch

preliminary patch: this adds code to WdwTabFn to react to a rank limit.

 Optimize for the Top N within a Group use case
 --

 Key: HIVE-7063
 URL: https://issues.apache.org/jira/browse/HIVE-7063
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7063.1.patch


 It is common to rank within a Group/Partition and then only return the Top N 
 entries within each Group.
 With Streaming mode for Windowing, we should push the post filter on the rank 
 into the Windowing processing as a Limit expression.
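
As a generic illustration of the idea (not the attached patch): once the rank filter is pushed into windowing as a limit, the operator can stop forwarding a partition's rows after N of them, instead of ranking everything and filtering afterwards. A real rank() limit also has to keep emitting rows that tie with the Nth value; the sketch below ignores ties.

{code}
import java.util.List;
import java.util.function.Consumer;

public class RankLimitEmitter {
  // Rows of one partition, already ordered by the window's ORDER BY clause.
  // Emit at most `limit` rows and stop, instead of ranking the whole partition
  // and filtering on rank afterwards.
  public static <R> void emitTopN(List<R> orderedPartition, int limit, Consumer<R> out) {
    int emitted = 0;
    for (R row : orderedPartition) {
      if (emitted >= limit) {
        break; // the pushed-down rank limit lets us short-circuit the partition
      }
      out.accept(row);
      emitted++;
    }
  }
}
{code}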



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: TestAvroSerdeUtils#determineSchemaCanReadSchemaFromHDFS fails on hive-13 for hadoop-2

2014-06-06 Thread Szehon Ho
This is passing in the builds, and also for me. Looks like some
environment issue. Are you running in Eclipse or Maven?

Thanks
Szehon


On Thu, Jun 5, 2014 at 5:51 PM, pankit thapar thapar.pan...@gmail.com
wrote:

 Hi,

 I am trying to build hive on my local desktop.
 I am facing an issue with test case  :
 TestAvroSerdeUtils#determineSchemaCanReadSchemaFromHDFS

 The issue is only with hadoop-2 and not with hadoop-1

 Has anyone been able to run this test case?

 Trace :
 org.apache.hadoop.ipc.RemoteException: File /path/to/schema/schema.avsc
 could only be replicated to 0 nodes instead of minReplication (=1).  There
 are 1 datanode(s) running and no node(s) are excluded in this operation.
 at

 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1406)
 at

 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2596)
 at

 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:563)
 at

 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:407)
 at

 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at

 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:592)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at

 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)

 at org.apache.hadoop.ipc.Client.call(Client.java:1406)
 at org.apache.hadoop.ipc.Client.call(Client.java:1359)
 at

 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:211)
 at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at

 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
 at

 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
 at

 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:348)
 at

 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1275)
 at

 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1123)
 at

 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:527)


 Thanks,
 Pankit



[jira] [Commented] (HIVE-7175) Provide password file option to beeline

2014-06-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020290#comment-14020290
 ] 

Hive QA commented on HIVE-7175:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648646/HIVE-7175.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 5511 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/399/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/399/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-399/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648646

 Provide password file option to beeline
 ---

 Key: HIVE-7175
 URL: https://issues.apache.org/jira/browse/HIVE-7175
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Clients
Affects Versions: 0.13.0
Reporter: Robert Justice
  Labels: features, security
 Attachments: HIVE-7175.patch


 For people connecting to Hive Server 2 with LDAP authentication enabled, in 
 order to batch run commands, we currently have to provide the password openly 
 in the command line.   They could use some expect scripting, but I think a 
 valid improvement would be to provide a password file option similar to other 
 CLI commands in hadoop (e.g. sqoop) to be more secure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-538) make hive_jdbc.jar self-containing

2014-06-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-538:
--

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Nick!

 make hive_jdbc.jar self-containing
 --

 Key: HIVE-538
 URL: https://issues.apache.org/jira/browse/HIVE-538
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.3.0, 0.4.0, 0.6.0, 0.13.0
Reporter: Raghotham Murthy
Assignee: Nick White
 Fix For: 0.14.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.2.patch, HIVE-538.patch


 Currently, most jars in hive/build/dist/lib and the hadoop-*-core.jar are 
 required in the classpath to run JDBC applications on Hive. We need to do 
 at least the following to get rid of most unnecessary dependencies:
 1. get rid of the dynamic SerDe and use a standard serialization format, maybe 
 tab-separated, JSON, or Avro
 2. don't use Hadoop configuration parameters
 3. repackage Thrift and fb303 classes into hive_jdbc.jar



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization

2014-06-06 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7188:


Attachment: hike-vector-sum-bug.tgz

 sum(if()) returns wrong results with vectorization
 --

 Key: HIVE-7188
 URL: https://issues.apache.org/jira/browse/HIVE-7188
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: hike-vector-sum-bug.tgz


 1. The tgz file containing the setup is attached.
 2. Run the following query
 select
 sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
 from hike_error.ttr_day0;
 returns 0 with vectorization turned on, whereas it returns 131 with 
 vectorization turned off.
 hive> source insert.sql
  ;
 OK
 Time taken: 0.359 seconds
 OK
 Time taken: 0.015 seconds
 OK
 Time taken: 0.069 seconds
 OK
 Time taken: 0.176 seconds
 Loading data to table hike_error.ttr_day0
 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, 
 rawDataSize=0]
 OK
 Time taken: 0.33 seconds
 hive> select
  sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
  from hike_error.ttr_day0;
 Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed
 Total jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=<number>
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=<number>
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=<number>
 Execution log at: 
 /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log
 Job running in-process (local Hadoop)
 Hadoop job information for null: number of mappers: 0; number of reducers: 0
 2014-06-06 13:47:02,043 null map = 0%,  reduce = 100%
 Ended Job = job_local773704964_0001
 Execution completed successfully
 MapredLocal task succeeded
 OK
 131
 Time taken: 5.325 seconds, Fetched: 1 row(s)
 hive> set hive.vectorized.execution.enabled=true; 

 hive> select
  sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
  from hike_error.ttr_day0;
 Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38
 Total jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=<number>
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=<number>
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=<number>
 Execution log at: 
 /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log
 Job running in-process (local Hadoop)
 Hadoop job information for null: number of mappers: 0; number of reducers: 0
 2014-06-06 13:47:18,604 null map = 0%,  reduce = 100%
 Ended Job = job_local701415676_0001
 Execution completed successfully
 MapredLocal task succeeded
 OK
 0
 Time taken: 5.52 seconds, Fetched: 1 row(s)
 hive> explain select
  sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
  from hike_error.ttr_day0;
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Map Operator Tree:
   TableScan
 alias: ttr_day0
 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE 
 Column stats: NONE
 Select Operator
   expressions: is_returning (type: boolean), is_free (type: 
 boolean)
   outputColumnNames: is_returning, is_free
   Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE 
 Column stats: NONE
   Group By Operator
 aggregations: sum(if(((is_returning = true) and (is_free = 
 false)), 1, 0))
 mode: hash
 outputColumnNames: _col0
 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
 Column stats: NONE
 Reduce Output Operator
   sort order: 
   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
 Column stats: NONE
   value expressions: _col0 (type: bigint)
   Execution mode: vectorized
   Reduce Operator Tree:
 Group By Operator
   aggregations: sum(VALUE._col0)
   mode: mergepartial
   outputColumnNames: _col0
 

[jira] [Created] (HIVE-7188) sum(if()) returns wrong results with vectorization

2014-06-06 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-7188:
---

 Summary: sum(if()) returns wrong results with vectorization
 Key: HIVE-7188
 URL: https://issues.apache.org/jira/browse/HIVE-7188
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: hike-vector-sum-bug.tgz

1. The tgz file containing the setup is attached.
2. Run the following query
select
sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
from hike_error.ttr_day0;

returns 0 with vectorization turned on, whereas it returns 131 with 
vectorization turned off.



hive> source insert.sql
 ;
OK
Time taken: 0.359 seconds
OK
Time taken: 0.015 seconds
OK
Time taken: 0.069 seconds
OK
Time taken: 0.176 seconds
Loading data to table hike_error.ttr_day0
Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, 
rawDataSize=0]
OK
Time taken: 0.33 seconds
hive> select
 sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
 from hike_error.ttr_day0;
Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: 
/var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-06-06 13:47:02,043 null map = 0%,  reduce = 100%
Ended Job = job_local773704964_0001
Execution completed successfully
MapredLocal task succeeded
OK
131
Time taken: 5.325 seconds, Fetched: 1 row(s)
hive> set hive.vectorized.execution.enabled=true;
 
hive> select
 sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
 from hike_error.ttr_day0;
Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: 
/var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-06-06 13:47:18,604 null map = 0%,  reduce = 100%
Ended Job = job_local701415676_0001
Execution completed successfully
MapredLocal task succeeded
OK
0
Time taken: 5.52 seconds, Fetched: 1 row(s)
hive> explain select
 sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
 from hike_error.ttr_day0;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: ttr_day0
Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE 
Column stats: NONE
Select Operator
  expressions: is_returning (type: boolean), is_free (type: boolean)
  outputColumnNames: is_returning, is_free
  Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE 
Column stats: NONE
  Group By Operator
aggregations: sum(if(((is_returning = true) and (is_free = 
false)), 1, 0))
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
Reduce Output Operator
  sort order: 
  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
  value expressions: _col0 (type: bigint)
  Execution mode: vectorized
  Reduce Operator Tree:
Group By Operator
  aggregations: sum(VALUE._col0)
  mode: mergepartial
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: _col0 (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column 
stats: NONE
File 

[jira] [Created] (HIVE-7189) Hive does not store column names in ORC

2014-06-06 Thread Chris Drome (JIRA)
Chris Drome created HIVE-7189:
-

 Summary: Hive does not store column names in ORC
 Key: HIVE-7189
 URL: https://issues.apache.org/jira/browse/HIVE-7189
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.13.0, 0.12.0
Reporter: Chris Drome


We uncovered the following discrepancy between writing ORC files through Pig 
and Hive:

The ORCFile header contains the names of the columns. When storing through Pig 
(ORCStorage or HCatStorer), the column names are stored fine. But when stored 
through Hive they are stored as _col0, _col1, ..., _col99, and Hive uses the 
partition schema to map the column names. Reading the same file through Pig 
then has problems, as the user has to map the columns manually.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run

2014-06-06 Thread Ivan Mitic (JIRA)
Ivan Mitic created HIVE-7190:


 Summary: WebHCat launcher task failure can cause two concurrent 
user jobs to run
 Key: HIVE-7190
 URL: https://issues.apache.org/jira/browse/HIVE-7190
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Ivan Mitic


Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs are 
1-map jobs (a single task jobs) which kick off the actual user job and monitor 
it until it finishes. Given that the launcher is a task, like any other MR 
task, it has a retry policy in case it fails (due to a task crash, 
tasktracker/nodemanager crash, machine level outage, etc.). Further, when 
launcher task is retried, it will again launch the same user job, *however* the 
previous attempt user job is already running. What this means is that we can 
have two identical user jobs running in parallel. 

In case of MRv2, there will be an MRAppMaster and the launcher task, which are 
subject to failure. In case any of the two fails, another instance of a user 
job will be launched again in parallel. 

Above situation is already a bug.

Now going further to RM HA, what RM does on failover/restart is that it kills 
all containers, and it restarts all applications. This means that if our 
customer had 10 jobs on the cluster (this is 10 launcher jobs and 10 user 
jobs), on RM failover, all 20 jobs will be restarted, and launcher jobs will 
queue user jobs again. There are two issues with this design:
1. There are *possible* chances for corruption of job outputs (it would be 
useful to analyze this scenario more and confirm this statement).
2. Cluster resources are spent on jobs redundantly

To address the issue at least on Yarn (Hadoop 2.0) clusters, webhcat should do 
the same thing Oozie does in this scenario, and that is to tag all its child 
jobs with an id, and kill those jobs on task restart before they are kicked off 
again.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run

2014-06-06 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020358#comment-14020358
 ] 

Ivan Mitic commented on HIVE-7190:
--

Will attach a patch in a bit, feel free to assign the Jira to me as I don't 
have the right to do so yet.

 WebHCat launcher task failure can cause two concurrent user jobs to run
 --

 Key: HIVE-7190
 URL: https://issues.apache.org/jira/browse/HIVE-7190
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Ivan Mitic

 Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs 
 are 1-map jobs (a single task jobs) which kick off the actual user job and 
 monitor it until it finishes. Given that the launcher is a task, like any 
 other MR task, it has a retry policy in case it fails (due to a task crash, 
 tasktracker/nodemanager crash, machine level outage, etc.). Further, when 
 launcher task is retried, it will again launch the same user job, *however* 
 the previous attempt user job is already running. What this means is that we 
 can have two identical user jobs running in parallel. 
 In case of MRv2, there will be an MRAppMaster and the launcher task, which 
 are subject to failure. In case any of the two fails, another instance of a 
 user job will be launched again in parallel. 
 Above situation is already a bug.
 Now going further to RM HA, what RM does on failover/restart is that it kills 
 all containers, and it restarts all applications. This means that if our 
 customer had 10 jobs on the cluster (this is 10 launcher jobs and 10 user 
 jobs), on RM failover, all 20 jobs will be restarted, and launcher jobs will 
 queue user jobs again. There are two issues with this design:
 1. There are *possible* chances for corruption of job outputs (it would be 
 useful to analyze this scenario more and confirm this statement).
 2. Cluster resources are spent on jobs redundantly
 To address the issue at least on Yarn (Hadoop 2.0) clusters, webhcat should 
 do the same thing Oozie does in this scenario, and that is to tag all its 
 child jobs with an id, and kill those jobs on task restart before they are 
 kicked off again.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup

2014-06-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7065:


   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks for the contribution Eugene!


 Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
 -

 Key: HIVE-7065
 URL: https://issues.apache.org/jira/browse/HIVE-7065
 Project: Hive
  Issue Type: Bug
  Components: Tez, WebHCat
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.14.0

 Attachments: HIVE-7065.1.patch, HIVE-7065.patch


 WebHCat config has templeton.hive.properties to specify Hive config 
 properties that need to be passed to Hive client on node executing a job 
 submitted through WebHCat (hive query, for example).
 this should include hive.execution.engine
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7191) optimized map join hash table has a bug when it reaches 2Gb

2014-06-06 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-7191:
--

 Summary: optimized map join hash table has a bug when it reaches 
2Gb
 Key: HIVE-7191
 URL: https://issues.apache.org/jira/browse/HIVE-7191
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


Via [~t3rmin4t0r]:

{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: -204

at java.util.ArrayList.elementData(ArrayList.java:371)

at java.util.ArrayList.get(ArrayList.java:384)

at 
org.apache.hadoop.hive.serde2.WriteBuffers.setReadPoint(WriteBuffers.java:95)

at 
org.apache.hadoop.hive.serde2.WriteBuffers.hashCode(WriteBuffers.java:100)

at 
org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:203)

at 
org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:266)

at 
org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:124)

... 16 more
{noformat}




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7191) optimized map join hash table has a bug when it reaches 2Gb

2014-06-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-7191:
---

Attachment: HIVE-7191.patch

Some casts are in order
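
"Some casts are in order" refers to int arithmetic overflowing once the write buffers grow past 2 GB. A self-contained illustration of the failure mode and the long-arithmetic fix (the numbers are made up for the example, not taken from WriteBuffers):

{code}
public class OffsetOverflowDemo {
  public static void main(String[] args) {
    long offset = 3_000_000_000L;      // a write position past 2 GB
    int bufferSize = 8 * 1024 * 1024;  // per-buffer size

    // Casting to int before the division overflows and yields a negative
    // buffer index -- the ArrayIndexOutOfBoundsException seen in the trace.
    int broken = (int) offset / bufferSize;

    // Keeping the arithmetic in long (and casting only the final, small result)
    // gives the correct index.
    int fixed = (int) (offset / bufferSize);

    System.out.println("broken index = " + broken + ", fixed index = " + fixed);
  }
}
{code}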

 optimized map join hash table has a bug when it reaches 2Gb
 ---

 Key: HIVE-7191
 URL: https://issues.apache.org/jira/browse/HIVE-7191
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-7191.patch


 Via [~t3rmin4t0r]:
 {noformat}
 Caused by: java.lang.ArrayIndexOutOfBoundsException: -204
 at java.util.ArrayList.elementData(ArrayList.java:371)
 at java.util.ArrayList.get(ArrayList.java:384)
 at 
 org.apache.hadoop.hive.serde2.WriteBuffers.setReadPoint(WriteBuffers.java:95)
 at 
 org.apache.hadoop.hive.serde2.WriteBuffers.hashCode(WriteBuffers.java:100)
 at 
 org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:203)
 at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:266)
 at 
 org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:124)
 ... 16 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7191) optimized map join hash table has a bug when it reaches 2Gb

2014-06-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7191:
---

Status: Patch Available  (was: Open)

+1

 optimized map join hash table has a bug when it reaches 2Gb
 ---

 Key: HIVE-7191
 URL: https://issues.apache.org/jira/browse/HIVE-7191
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-7191.patch


 Via [~t3rmin4t0r]:
 {noformat}
 Caused by: java.lang.ArrayIndexOutOfBoundsException: -204
 at java.util.ArrayList.elementData(ArrayList.java:371)
 at java.util.ArrayList.get(ArrayList.java:384)
 at 
 org.apache.hadoop.hive.serde2.WriteBuffers.setReadPoint(WriteBuffers.java:95)
 at 
 org.apache.hadoop.hive.serde2.WriteBuffers.hashCode(WriteBuffers.java:100)
 at 
 org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:203)
 at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:266)
 at 
 org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:124)
 ... 16 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run

2014-06-06 Thread Ivan Mitic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Mitic updated HIVE-7190:
-

Attachment: HIVE-7190.patch

Attaching the initial patch.

Approach in the patch is similar to what Oozie does to handle this situation. 
Specifically, all child map jobs get tagged with the launcher MR job id. On 
launcher task restart, launcher queries RM for the list of jobs that have the 
tag and kills them. After that it moves on to start the same child job again. 
Again, similarly to what Oozie does, a new {{templeton.job.launch.time}} 
property is introduced that captures the launcher job submit timestamp and is 
later used to reduce the search window when the RM is queried.
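
A minimal sketch of the tagging half of this approach, assuming the standard mapreduce.job.tags property is what carries the tag; the RM query-and-kill half, and where templeton.job.launch.time is actually recorded, are left to the patch.

{code}
import org.apache.hadoop.conf.Configuration;

public class ChildJobTagging {
  // Tag the child job with the launcher's job id so a retried launcher task can
  // find (and kill) any still-running previous attempt before resubmitting.
  // Assumption: "mapreduce.job.tags" is honored by the cluster (Hadoop 2.x YARN).
  public static void tagChildJob(Configuration childJobConf, String launcherJobId, long launchTimeMs) {
    childJobConf.set("mapreduce.job.tags", launcherJobId);
    // Mirrors the templeton.job.launch.time idea from the comment above: record
    // when the launcher was submitted so the later RM query can be bounded to a
    // small time window.
    childJobConf.setLong("templeton.job.launch.time", launchTimeMs);
  }
}
{code}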

I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have 
also validated that previous child jobs do get killed on RM failover/task 
failure.

To validate the patch, you will need to add webhcat shim jars to 
templeton.libjars as now webhcat launcher also has a dependency on hadoop 
shims. 

I have noticed that, in the case of the SqoopDelegator, webhcat currently does 
not set the MR delegation token when the optionsFile flag is used. This also 
causes problems in this scenario, and looks like something that should be 
handled via a separate JIRA.

 WebHCat launcher task failure can cause two concurrent user jobs to run
 --

 Key: HIVE-7190
 URL: https://issues.apache.org/jira/browse/HIVE-7190
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Ivan Mitic
 Attachments: HIVE-7190.patch


 Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs 
 are 1-map jobs (a single task jobs) which kick off the actual user job and 
 monitor it until it finishes. Given that the launcher is a task, like any 
 other MR task, it has a retry policy in case it fails (due to a task crash, 
 tasktracker/nodemanager crash, machine level outage, etc.). Further, when 
 launcher task is retried, it will again launch the same user job, *however* 
 the previous attempt user job is already running. What this means is that we 
 can have two identical user jobs running in parallel. 
 In case of MRv2, there will be an MRAppMaster and the launcher task, which 
 are subject to failure. In case any of the two fails, another instance of a 
 user job will be launched again in parallel. 
 Above situation is already a bug.
 Now going further to RM HA, what RM does on failover/restart is that it kills 
 all containers, and it restarts all applications. This means that if our 
 customer had 10 jobs on the cluster (this is 10 launcher jobs and 10 user 
 jobs), on RM failover, all 20 jobs will be restarted, and launcher jobs will 
 queue user jobs again. There are two issues with this design:
 1. There are *possible* chances for corruption of job outputs (it would be 
 useful to analyze this scenario more and confirm this statement).
 2. Cluster resources are spent on jobs redundantly
 To address the issue at least on Yarn (Hadoop 2.0) clusters, webhcat should 
 do the same thing Oozie does in this scenario, and that is to tag all its 
 child jobs with an id, and kill those jobs on task restart before they are 
 kicked off again.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run

2014-06-06 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-7190:
-

Affects Version/s: 0.13.0

 WebHCat launcher task failure can cause two concurent user jobs to run
 --

 Key: HIVE-7190
 URL: https://issues.apache.org/jira/browse/HIVE-7190
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: Ivan Mitic
 Attachments: HIVE-7190.patch


 Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs 
 are 1-map jobs (a single task jobs) which kick off the actual user job and 
 monitor it until it finishes. Given that the launcher is a task, like any 
 other MR task, it has a retry policy in case it fails (due to a task crash, 
 tasktracker/nodemanager crash, machine level outage, etc.). Further, when 
 launcher task is retried, it will again launch the same user job, *however* 
 the previous attempt user job is already running. What this means is that we 
 can have two identical user jobs running in parallel. 
 In case of MRv2, there will be an MRAppMaster and the launcher task, which 
 are subject to failure. In case any of the two fails, another instance of a 
 user job will be launched again in parallel. 
 Above situation is already a bug.
 Now going further to RM HA, what RM does on failover/restart is that it kills 
 all containers, and it restarts all applications. This means that if our 
 customer had 10 jobs on the cluster (this is 10 launcher jobs and 10 user 
 jobs), on RM failover, all 20 jobs will be restarted, and launcher jobs will 
 queue user jobs again. There are two issues with this design:
 1. There are *possible* chances for corruption of job outputs (it would be 
 useful to analyze this scenario more and confirm this statement).
 2. Cluster resources are spent on jobs redundantly
 To address the issue at least on Yarn (Hadoop 2.0) clusters, webhcat should 
 do the same thing Oozie does in this scenario, and that is to tag all its 
 child jobs with an id, and kill those jobs on task restart before they are 
 kicked off again.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation

2014-06-06 Thread Roshan Naik (JIRA)
Roshan Naik created HIVE-7192:
-

 Summary: Hive Streaming - Some required settings are not mentioned 
in the documentation
 Key: HIVE-7192
 URL: https://issues.apache.org/jira/browse/HIVE-7192
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: Roshan Naik
Assignee: Roshan Naik


Specifically:
 - hive.support.concurrency on metastore
 - hive.vectorized.execution.enabled for query client





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7167) Hive Metastore fails to start with SQLServerException

2014-06-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020409#comment-14020409
 ] 

Sergey Shelukhin commented on HIVE-7167:


1) Can you post the SQLServerException you are getting?
2) Why these 3 methods, of all methods?
3) It seems like a hacky way to solve the problem. It can still fail again, 
right?

 Hive Metastore fails to start with SQLServerException
 -

 Key: HIVE-7167
 URL: https://issues.apache.org/jira/browse/HIVE-7167
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
  Labels: patch, test
 Fix For: 0.13.0

 Attachments: HIVE-7167.1.patch


 In the case that hiveserver2 uses embedded metastore and hiveserver uses 
 remote metastore, this exception comes up when hiveserver2 and hiveserver are 
 started simultaneously.
 The metastore service status is running, but when I launch the Hive CLI, I get 
 the following metastore connection error:
 C:\apps\dist\hive-0.13.0.2.1.2.0-1660\bin>hive.cmd
 14/05/09 17:40:03 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
 Logging initialized using configuration in file:/C:/apps/dist/hive-0.13.0.2.1.2.0-1660/conf/hive-log4j.properties
 Exception in thread main java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
         at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:347)
         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:601)
         at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
         at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1413)
         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
         at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)
         at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2456)
         at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:341)
         ... 7 more
 Caused by: java.lang.reflect.InvocationTargetException
         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
         at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1411)
         ... 12 more
 Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
         at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:336)
         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
         at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1411)
         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
         at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)
         at 

[jira] [Updated] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation

2014-06-06 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-7192:
--

Attachment: HIVE-7192.patch

uploading patch

 Hive Streaming - Some required settings are not mentioned in the documentation
 --

 Key: HIVE-7192
 URL: https://issues.apache.org/jira/browse/HIVE-7192
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: Roshan Naik
Assignee: Roshan Naik
  Labels: Streaming
 Attachments: HIVE-7192.patch


 Specifically:
  - hive.support.concurrency on metastore
  - hive.vectorized.execution.enabled for query client



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation

2014-06-06 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-7192:
--

Status: Patch Available  (was: Open)

 Hive Streaming - Some required settings are not mentioned in the documentation
 --

 Key: HIVE-7192
 URL: https://issues.apache.org/jira/browse/HIVE-7192
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: Roshan Naik
Assignee: Roshan Naik
  Labels: Streaming
 Attachments: HIVE-7192.patch


 Specifically:
  - hive.support.concurrency on metastore
  - hive.vectorized.execution.enabled for query client



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7187) Reconcile jetty versions in hive

2014-06-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7187:
---

Assignee: Ashutosh Chauhan
  Status: Patch Available  (was: Open)

 Reconcile jetty versions in hive
 

 Key: HIVE-7187
 URL: https://issues.apache.org/jira/browse/HIVE-7187
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Web UI, WebHCat
Reporter: Vaibhav Gumashta
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7187.patch


 Hive root pom has 3 parameters for specifying jetty dependency versions:
 {code}
 <jetty.version>6.1.26</jetty.version>
 <jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version>
 <jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version>
 {code}
 1st one is used by HWI, 2nd by WebHCat and 3rd by HiveServer2 (in http mode). 
 We should probably use the same jetty version for all hive components. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7187) Reconcile jetty versions in hive

2014-06-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7187:
---

Attachment: HIVE-7187.patch

 Reconcile jetty versions in hive
 

 Key: HIVE-7187
 URL: https://issues.apache.org/jira/browse/HIVE-7187
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Web UI, WebHCat
Reporter: Vaibhav Gumashta
 Attachments: HIVE-7187.patch


 Hive root pom has 3 parameters for specifying jetty dependency versions:
 {code}
 <jetty.version>6.1.26</jetty.version>
 <jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version>
 <jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version>
 {code}
 1st one is used by HWI, 2nd by WebHCat and 3rd by HiveServer2 (in http mode). 
 We should probably use the same jetty version for all hive components. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22170: analyze table T compute statistics for columns; will now compute stats for all columns.

2014-06-06 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22170/#review44974
---

Ship it!


Ship It!

- Prasanth_J


On June 6, 2014, 4:30 p.m., Ashutosh Chauhan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/22170/
 ---
 
 (Updated June 6, 2014, 4:30 p.m.)
 
 
 Review request for hive and Prasanth_J.
 
 
 Bugs: HIVE-7168
 https://issues.apache.org/jira/browse/HIVE-7168
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 analyze table T compute statistics for columns; will now compute stats for 
 all columns.
 
 
 Diffs
 -
 
   
 metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java
  1245d80 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
 5b77e6f 
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 6d958fd 
   ql/src/test/queries/clientpositive/columnstats_partlvl.q 9dfe8ff 
   ql/src/test/queries/clientpositive/columnstats_tbllvl.q 170fbc5 
   ql/src/test/results/clientpositive/columnstats_partlvl.q.out d91be8d 
   ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 3d3d0e2 
   ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out 03b536f 
 
 Diff: https://reviews.apache.org/r/22170/diff/
 
 
 Testing
 ---
 
 Added new tests.
 
 
 Thanks,
 
 Ashutosh Chauhan
 




Review Request 22328: Make hive use one jetty version.

2014-06-06 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22328/
---

Review request for hive, Eugene Koifman and Vaibhav Gumashta.


Bugs: HIVE-7187
https://issues.apache.org/jira/browse/HIVE-7187


Repository: hive


Description
---

Make hive use one jetty version.


Diffs
-

  trunk/hcatalog/webhcat/svr/pom.xml 1600966 
  trunk/hwi/pom.xml 1600966 
  trunk/pom.xml 1600992 
  trunk/service/pom.xml 1600966 
  trunk/shims/0.20/pom.xml 1600966 
  trunk/shims/0.20S/pom.xml 1600966 
  trunk/shims/0.23/pom.xml 1600966 

Diff: https://reviews.apache.org/r/22328/diff/


Testing
---

Manually built and ran few tests.


Thanks,

Ashutosh Chauhan



[jira] [Commented] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns

2014-06-06 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020423#comment-14020423
 ] 

Prasanth J commented on HIVE-7168:
--

+1

 Don't require to name all columns in analyze statements if stats collection 
 is for all columns
 --

 Key: HIVE-7168
 URL: https://issues.apache.org/jira/browse/HIVE-7168
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurent user jobs to run

2014-06-06 Thread Ivan Mitic

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22329/
---

Review request for hive.


Repository: hive-git


Description
---

Approach in the patch is similar to what Oozie does to handle this situation. 
Specifically, all child map jobs get tagged with the launcher MR job id. On 
launcher task restart, launcher queries RM for the list of jobs that have the 
tag and kills them. After that it moves on to start the same child job again. 
Again, similarly to what Oozie does, a new templeton.job.launch.time property 
is introduced that captures the launcher job submit timestamp and later used to 
reduce the search window when RM is queried. 

To validate the patch, you will need to add webhcat shim jars to 
templeton.libjars as now webhcat launcher also has a dependency on hadoop 
shims. 

I have noticed that in case of the SqoopDelegator webhcat currently does not 
set the MR delegation token when optionsFile flag is used. This also creates 
the problem in this scenario. This looks like something that should be handled 
via a separate Jira.


Diffs
-

  
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java
 23b1c4f 
  
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java
 41b1dc5 
  
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java
 04a5c6f 
  
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java
 04e061d 
  
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java
 adcd917 
  
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java
 a6355a6 
  
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java
 556ee62 
  shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java 
d3552c1 
  shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 
5a728b2 
  shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
299e918 

Diff: https://reviews.apache.org/r/22329/diff/


Testing
---

I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have 
also validated that previous child jobs do get killed on RM failover/task 
failure.


Thanks,

Ivan Mitic



[jira] [Commented] (HIVE-7190) WebHCat launcher task failure can cause two concurent user jobs to run

2014-06-06 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020435#comment-14020435
 ] 

Ivan Mitic commented on HIVE-7190:
--

Review board: https://reviews.apache.org/r/22329/ 

 WebHCat launcher task failure can cause two concurent user jobs to run
 --

 Key: HIVE-7190
 URL: https://issues.apache.org/jira/browse/HIVE-7190
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: Ivan Mitic
 Attachments: HIVE-7190.2.patch, HIVE-7190.patch


 Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs 
 are 1-map jobs (a single task jobs) which kick off the actual user job and 
 monitor it until it finishes. Given that the launcher is a task, like any 
 other MR task, it has a retry policy in case it fails (due to a task crash, 
 tasktracker/nodemanager crash, machine level outage, etc.). Further, when 
 launcher task is retried, it will again launch the same user job, *however* 
 the previous attempt user job is already running. What this means is that we 
 can have two identical user jobs running in parallel. 
 In case of MRv2, there will be an MRAppMaster and the launcher task, which 
 are subject to failure. In case any of the two fails, another instance of a 
 user job will be launched again in parallel. 
 Above situation is already a bug.
 Now going further to RM HA, what RM does on failover/restart is that it kills 
 all containers, and it restarts all applications. This means that if our 
 customer had 10 jobs on the cluster (this is 10 launcher jobs and 10 user 
 jobs), on RM failover, all 20 jobs will be restarted, and launcher jobs will 
 queue user jobs again. There are two issues with this design:
 1. There are *possible* chances for corruption of job outputs (it would be 
 useful to analyze this scenario more and confirm this statement).
 2. Cluster resources are spent on jobs redundantly
 To address the issue at least on Yarn (Hadoop 2.0) clusters, webhcat should 
 do the same thing Oozie does in this scenario, and that is to tag all its 
 child jobs with an id, and kill those jobs on task restart before they are 
 kicked off again.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7190) WebHCat launcher task failure can cause two concurent user jobs to run

2014-06-06 Thread Ivan Mitic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Mitic updated HIVE-7190:
-

Attachment: HIVE-7190.2.patch

Rebasing patch against latest hive trunk.

 WebHCat launcher task failure can cause two concurent user jobs to run
 --

 Key: HIVE-7190
 URL: https://issues.apache.org/jira/browse/HIVE-7190
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: Ivan Mitic
 Attachments: HIVE-7190.2.patch, HIVE-7190.patch


 Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs 
 are 1-map jobs (a single task jobs) which kick off the actual user job and 
 monitor it until it finishes. Given that the launcher is a task, like any 
 other MR task, it has a retry policy in case it fails (due to a task crash, 
 tasktracker/nodemanager crash, machine level outage, etc.). Further, when 
 launcher task is retried, it will again launch the same user job, *however* 
 the previous attempt user job is already running. What this means is that we 
 can have two identical user jobs running in parallel. 
 In case of MRv2, there will be an MRAppMaster and the launcher task, which 
 are subject to failure. In case any of the two fails, another instance of a 
 user job will be launched again in parallel. 
 Above situation is already a bug.
 Now going further to RM HA, what RM does on failover/restart is that it kills 
 all containers, and it restarts all applications. This means that if our 
 customer had 10 jobs on the cluster (this is 10 launcher jobs and 10 user 
 jobs), on RM failover, all 20 jobs will be restarted, and launcher jobs will 
 queue user jobs again. There are two issues with this design:
 1. There are *possible* chances for corruption of job outputs (it would be 
 useful to analyze this scenario more and confirm this statement).
 2. Cluster resources are spent on jobs redundantly
 To address the issue at least on Yarn (Hadoop 2.0) clusters, webhcat should 
 do the same thing Oozie does in this scenario, and that is to tag all its 
 child jobs with an id, and kill those jobs on task restart before they are 
 kicked off again.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5687) Streaming support in Hive

2014-06-06 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-5687:
--

Attachment: (was: Hive Streaming Ingest API for v4 patch.pdf)

 Streaming support in Hive
 -

 Key: HIVE-5687
 URL: https://issues.apache.org/jira/browse/HIVE-5687
 Project: Hive
  Issue Type: Sub-task
Reporter: Roshan Naik
Assignee: Roshan Naik
  Labels: ACID, Streaming
 Fix For: 0.13.0

 Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 
 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, 
 HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, 
 HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, 
 HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 
 patch.pdf, package.html


 Implement support for Streaming data into HIVE.
 - Provide a client streaming API (a rough usage sketch follows after this list)
 - Transaction support: Clients should be able to periodically commit a batch 
 of records atomically
 - Immediate visibility: Records should be immediately visible to queries on 
 commit
 - Should not overload HDFS with too many small files
 Use Cases:
  - Streaming logs into HIVE via Flume
  - Streaming results of computations from Storm
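
A rough usage sketch of such a client API is below. The class and method names 
(HiveEndPoint, DelimitedInputWriter, TransactionBatch) follow the attached API 
spec drafts; exact signatures may differ in the committed patch, so treat this as 
orientation only.

{code}
import java.util.Arrays;
import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

// Rough client-side sketch: open an endpoint, write a small batch of records
// inside one transaction, and commit so they become immediately visible.
public class StreamingClientSketch {
  public static void main(String[] args) throws Exception {
    HiveEndPoint endPoint = new HiveEndPoint("thrift://metastore:9083",
        "default", "alerts", Arrays.asList("2014", "06"));     // partition values
    StreamingConnection conn = endPoint.newConnection(true);    // create partition if missing
    DelimitedInputWriter writer =
        new DelimitedInputWriter(new String[]{"id", "msg"}, ",", endPoint);

    TransactionBatch txnBatch = conn.fetchTransactionBatch(10, writer);
    txnBatch.beginNextTransaction();
    txnBatch.write("1,hello".getBytes());
    txnBatch.write("2,world".getBytes());
    txnBatch.commit();               // records become visible to queries on commit
    txnBatch.close();
    conn.close();
  }
}
{code}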



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5687) Streaming support in Hive

2014-06-06 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-5687:
--

Attachment: Hive Streaming Ingest API for v4 patch.pdf

Updating 'Hive Streaming Ingest API for v4 patch.pdf' document with requirements.

 Streaming support in Hive
 -

 Key: HIVE-5687
 URL: https://issues.apache.org/jira/browse/HIVE-5687
 Project: Hive
  Issue Type: Sub-task
Reporter: Roshan Naik
Assignee: Roshan Naik
  Labels: ACID, Streaming
 Fix For: 0.13.0

 Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 
 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, 
 HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, 
 HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, 
 HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 
 patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html


 Implement support for Streaming data into HIVE.
 - Provide a client streaming API 
 - Transaction support: Clients should be able to periodically commit a batch 
 of records atomically
 - Immediate visibility: Records should be immediately visible to queries on 
 commit
 - Should not overload HDFS with too many small files
 Use Cases:
  - Streaming logs into HIVE via Flume
  - Streaming results of computations from Storm



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7138) add row index dump capability to ORC file dump

2014-06-06 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020445#comment-14020445
 ] 

Owen O'Malley commented on HIVE-7138:
-

+1, but I'd like to use --rowindex instead of -rowindex

 add row index dump capability to ORC file dump
 --

 Key: HIVE-7138
 URL: https://issues.apache.org/jira/browse/HIVE-7138
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-7138.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns

2014-06-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020474#comment-14020474
 ] 

Hive QA commented on HIVE-7168:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648654/HIVE-7168.2.patch

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 5585 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/400/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/400/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-400/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648654

 Don't require to name all columns in analyze statements if stats collection 
 is for all columns
 --

 Key: HIVE-7168
 URL: https://issues.apache.org/jira/browse/HIVE-7168
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns

2014-06-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7168:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 Don't require to name all columns in analyze statements if stats collection 
 is for all columns
 --

 Key: HIVE-7168
 URL: https://issues.apache.org/jira/browse/HIVE-7168
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.14.0

 Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7193) Hive should support additional LDAP authentication parameters

2014-06-06 Thread Mala Chikka Kempanna (JIRA)
Mala Chikka Kempanna created HIVE-7193:
--

 Summary: Hive should support additional LDAP authentication 
parameters
 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna


Currently Hive has only the following authentication parameters for LDAP 
authentication for HiveServer2:
<property> 
<name>hive.server2.authentication</name> 
<value>LDAP</value> 
</property> 
<property> 
<name>hive.server2.authentication.ldap.url</name> 
<value>ldap://our_ldap_address</value> 
</property> 

We need to include other LDAP properties as part of Hive LDAP authentication, 
like the ones below:
a group search base - dc=domain,dc=com 
a group search filter - member={0} 
a user search base - dc=domain,dc=com 
a user search filter - sAMAAccountName={0} 
a list of valid user groups - group1,group2,group3 





--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22328: Make hive use one jetty version.

2014-06-06 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22328/#review44984
---

Ship it!


Ship It!

- Vaibhav Gumashta


On June 6, 2014, 9:54 p.m., Ashutosh Chauhan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/22328/
 ---
 
 (Updated June 6, 2014, 9:54 p.m.)
 
 
 Review request for hive, Eugene Koifman and Vaibhav Gumashta.
 
 
 Bugs: HIVE-7187
 https://issues.apache.org/jira/browse/HIVE-7187
 
 
 Repository: hive
 
 
 Description
 ---
 
 Make hive use one jetty version.
 
 
 Diffs
 -
 
   trunk/hcatalog/webhcat/svr/pom.xml 1600966 
   trunk/hwi/pom.xml 1600966 
   trunk/pom.xml 1600992 
   trunk/service/pom.xml 1600966 
   trunk/shims/0.20/pom.xml 1600966 
   trunk/shims/0.20S/pom.xml 1600966 
   trunk/shims/0.23/pom.xml 1600966 
 
 Diff: https://reviews.apache.org/r/22328/diff/
 
 
 Testing
 ---
 
 Manually built and ran few tests.
 
 
 Thanks,
 
 Ashutosh Chauhan
 




[jira] [Commented] (HIVE-7187) Reconcile jetty versions in hive

2014-06-06 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020531#comment-14020531
 ] 

Vaibhav Gumashta commented on HIVE-7187:


+1 (pending tests). 

[~ekoifman] How about we handle the upgrade to new jetty version in a new jira?

 Reconcile jetty versions in hive
 

 Key: HIVE-7187
 URL: https://issues.apache.org/jira/browse/HIVE-7187
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Web UI, WebHCat
Reporter: Vaibhav Gumashta
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7187.patch


 Hive root pom has 3 parameters for specifying jetty dependency versions:
 {code}
 <jetty.version>6.1.26</jetty.version>
 <jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version>
 <jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version>
 {code}
 1st one is used by HWI, 2nd by WebHCat and 3rd by HiveServer2 (in http mode). 
 We should probably use the same jetty version for all hive components. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6394) Implement Timestmap in ParquetSerde

2014-06-06 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-6394:


Attachment: HIVE-6394.5.patch

 Implement Timestmap in ParquetSerde
 ---

 Key: HIVE-6394
 URL: https://issues.apache.org/jira/browse/HIVE-6394
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Reporter: Jarek Jarcec Cecho
Assignee: Szehon Ho
  Labels: Parquet
 Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, 
 HIVE-6394.5.patch, HIVE-6394.5.patch, HIVE-6394.patch


 This JIRA is to implement timestamp support in Parquet SerDe.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6394) Implement Timestmap in ParquetSerde

2014-06-06 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-6394:


Attachment: (was: HIVE-6394.5.patch)

 Implement Timestmap in ParquetSerde
 ---

 Key: HIVE-6394
 URL: https://issues.apache.org/jira/browse/HIVE-6394
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Reporter: Jarek Jarcec Cecho
Assignee: Szehon Ho
  Labels: Parquet
 Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, 
 HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.patch


 This JIRA is to implement timestamp support in Parquet SerDe.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6394) Implement Timestmap in ParquetSerde

2014-06-06 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-6394:


Attachment: HIVE-6394.6.patch

 Implement Timestmap in ParquetSerde
 ---

 Key: HIVE-6394
 URL: https://issues.apache.org/jira/browse/HIVE-6394
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Reporter: Jarek Jarcec Cecho
Assignee: Szehon Ho
  Labels: Parquet
 Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, 
 HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.patch


 This JIRA is to implement timestamp support in Parquet SerDe.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6394) Implement Timestmap in ParquetSerde

2014-06-06 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020564#comment-14020564
 ] 

Szehon Ho commented on HIVE-6394:
-

Attaching another patch.  Was using a parquet-example class, now explicitly 
adding that logic in the serde layer.

 Implement Timestmap in ParquetSerde
 ---

 Key: HIVE-6394
 URL: https://issues.apache.org/jira/browse/HIVE-6394
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Reporter: Jarek Jarcec Cecho
Assignee: Szehon Ho
  Labels: Parquet
 Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, 
 HIVE-6394.5.patch, HIVE-6394.5.patch, HIVE-6394.patch


 This JIRA is to implement timestamp support in Parquet SerDe.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22174: HIVE-6394 Implement Timestmap in ParquetSerde

2014-06-06 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22174/
---

(Updated June 7, 2014, 12:06 a.m.)


Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.


Changes
---

One more change: adding the 'NanoTime' class in Hive, as it was an example 
class in parquet.

Let's go with using un-annotated INT96 for parquet, since that's what other 
consuming applications have been doing.  When the annotation does come, we'll 
move to that.


Bugs: HIVE-6394
https://issues.apache.org/jira/browse/HIVE-6394


Repository: hive-git


Description
---

This uses the Jodd library to convert the java.sql.Timestamp type used by Hive 
into the {julian-day:nanos} format expected by parquet, and vice versa.
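
As a back-of-the-envelope illustration of that mapping (not the Jodd-based code 
in the patch), the conversion boils down to a Julian day number plus nanoseconds 
within the day. The sketch below assumes UTC and post-1970 values and ignores 
time-zone handling.

{code}
import java.sql.Timestamp;

// Rough sketch of the Timestamp -> {julian-day : nanos-of-day} mapping used by
// Parquet INT96 timestamps. The real patch delegates to Jodd via NanoTimeUtils.
public class JulianNanoSketch {
  static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;
  static final int JULIAN_DAY_OF_EPOCH = 2440588; // Julian day number of 1970-01-01

  static int julianDay(Timestamp ts) {
    return (int) (ts.getTime() / MILLIS_PER_DAY) + JULIAN_DAY_OF_EPOCH;
  }

  static long nanosOfDay(Timestamp ts) {
    long millisIntoDay = ts.getTime() % MILLIS_PER_DAY;
    // getTime() already contains the whole milliseconds of the fraction;
    // getNanos() % 1_000_000 adds the sub-millisecond part.
    return millisIntoDay * 1_000_000L + (ts.getNanos() % 1_000_000);
  }
}
{code}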


Diffs (updated)
-

  data/files/parquet_types.txt 0be390b 
  pom.xml 4bb8880 
  ql/pom.xml 13c477a 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
4da0d30 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java
 29f7e11 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java
 57161d8 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
fb2f5a8 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
3490061 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
 PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_types.q 5d6333c 
  ql/src/test/results/clientpositive/parquet_types.q.out c23f7f1 

Diff: https://reviews.apache.org/r/22174/diff/


Testing
---

Unit tests the new libraries, and also added timestamp data in the 
parquet_types q-test.


Thanks,

Szehon Ho



[jira] [Updated] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer

2014-06-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-7094:
-

Component/s: HCatalog

 Separate out static/dynamic partitioning code in FileRecordWriterContainer
 --

 Key: HIVE-7094
 URL: https://issues.apache.org/jira/browse/HIVE-7094
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7094.1.patch


 There are two major places in FileRecordWriterContainer that have the {{if 
 (dynamicPartitioning)}} condition: the constructor and write().
 This is the approach that I am taking:
 # Move the DP and SP code into two subclasses: 
 DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer.
 # Make FileRecordWriterContainer an abstract class that contains the common 
 code for both implementations. For write(), FileRecordWriterContainer will 
 call an abstract method that will provide the local RecordWriter, 
 ObjectInspector, SerDe, and OutputJobInfo (a rough skeleton of this split follows below).
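
The skeleton below is only a shape sketch of that split; the names inside 
LocalWriterState are placeholders standing in for the real HCatalog types, not 
the actual implementation.

{code}
// Template-method sketch: shared write() path in the abstract base, with the
// partitioning-specific writer lookup pushed into two subclasses.
abstract class FileRecordWriterContainerSketch {

  /** Per-record writer state the subclasses hand back to the shared write() path. */
  static class LocalWriterState {
    Object recordWriter;    // RecordWriter in the real code
    Object objectInspector; // ObjectInspector in the real code
    Object serDe;           // SerDe in the real code
    Object jobInfo;         // OutputJobInfo in the real code
  }

  /** Common write() logic; only the state lookup differs per subclass. */
  public void write(Object value) throws java.io.IOException {
    LocalWriterState state = getLocalWriterState(value);
    // ... serialize value with state.serDe / state.objectInspector and
    // hand it to state.recordWriter (identical for both subclasses) ...
  }

  protected abstract LocalWriterState getLocalWriterState(Object value)
      throws java.io.IOException;
}

/** Static partitioning: a single writer for the whole partition. */
class StaticFileRecordWriterContainerSketch extends FileRecordWriterContainerSketch {
  private final LocalWriterState single = new LocalWriterState();
  protected LocalWriterState getLocalWriterState(Object value) {
    return single;
  }
}

/** Dynamic partitioning: one writer per dynamic partition value, created lazily. */
class DynamicFileRecordWriterContainerSketch extends FileRecordWriterContainerSketch {
  private final java.util.Map<String, LocalWriterState> writers =
      new java.util.HashMap<String, LocalWriterState>();
  protected LocalWriterState getLocalWriterState(Object value) {
    String partKey = String.valueOf(value); // real code derives the partition values
    LocalWriterState state = writers.get(partKey);
    if (state == null) {
      state = new LocalWriterState();
      writers.put(partKey, state);
    }
    return state;
  }
}
{code}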



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurent user jobs to run

2014-06-06 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22329/#review44992
---


1. I think webhcat-default.xml should be modified to include the jars that are 
now required in templeton.libjars to minimize out-of-the-box config for end 
users.
2. Is there any test (e2e) that can be added for this? (with reasonable amount 
of effort)
3. When you tested that Pig/Hive jobs get properly tagged, you mean you tested 
that MR jobs that are generated by Pig/Hive are tagged, correct?


hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java
https://reviews.apache.org/r/22329/#comment79625

I think it would be useful to add a more detailed description of these 
props.  Something like what is in the JIRA ticket.  I would have added the 
ticket number to the comment, but Hive prohibits that.



hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java
https://reviews.apache.org/r/22329/#comment79632

Which user will this use?  Is it the user running WebHCat or the value of 
'doAs' parameter?



shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java
https://reviews.apache.org/r/22329/#comment79613

Is LOG.info() the right log level?  Seems like it will pollute the log file.



shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java
https://reviews.apache.org/r/22329/#comment79615

Is LOG.info() the right level?



shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java
https://reviews.apache.org/r/22329/#comment79631

log level


- Eugene Koifman


On June 6, 2014, 10:02 p.m., Ivan Mitic wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/22329/
 ---
 
 (Updated June 6, 2014, 10:02 p.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Approach in the patch is similar to what Oozie does to handle this situation. 
 Specifically, all child map jobs get tagged with the launcher MR job id. On 
 launcher task restart, launcher queries RM for the list of jobs that have the 
 tag and kills them. After that it moves on to start the same child job again. 
 Again, similarly to what Oozie does, a new templeton.job.launch.time property 
 is introduced that captures the launcher job submit timestamp and later used 
 to reduce the search window when RM is queried. 
 
 To validate the patch, you will need to add webhcat shim jars to 
 templeton.libjars as now webhcat launcher also has a dependency on hadoop 
 shims. 
 
 I have noticed that in case of the SqoopDelegator webhcat currently does not 
 set the MR delegation token when optionsFile flag is used. This also creates 
 the problem in this scenario. This looks like something that should be 
 handled via a separate Jira.
 
 
 Diffs
 -
 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java
  23b1c4f 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java
  41b1dc5 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java
  04a5c6f 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java
  04e061d 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java
  adcd917 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java
  a6355a6 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java
  556ee62 
   shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java 
 d3552c1 
   shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 
 5a728b2 
   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
 299e918 
 
 Diff: https://reviews.apache.org/r/22329/diff/
 
 
 Testing
 ---
 
 I have validated that MR, Pig and Hive jobs do get tagged appropriately. I 
 have also validated that previous child jobs do get killed on RM 
 failover/task failure.
 
 
 Thanks,
 
 Ivan Mitic
 




[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-06-06 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6473:
---

Fix Version/s: 0.14.0

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-06-06 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6473:
---

Attachment: HIVE-6473.6.patch

Rebased onto trunk again. Removed enabling of hbase_bulk.m; it mostly passes 
but is flakey for me. Will address it in a follow-on ticket.

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7155) WebHCat controller job exceeds container memory limit

2014-06-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020636#comment-14020636
 ] 

Eugene Koifman commented on HIVE-7155:
--

+1

 WebHCat controller job exceeds container memory limit
 -

 Key: HIVE-7155
 URL: https://issues.apache.org/jira/browse/HIVE-7155
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HIVE-7155.1.patch, HIVE-7155.patch


 Submitting a Hive query on a large table via WebHCat results in failure because 
 the WebHCat controller job is killed by Yarn since it exceeds the memory 
 limit (set by mapreduce.map.memory.mb, which defaults to 1GB):
 {code}
  INSERT OVERWRITE TABLE Temp_InjusticeEvents_2014_03_01_00_00 SELECT * from 
 Stage_InjusticeEvents where LogTimestamp > '2014-03-01 00:00:00' and 
 LogTimestamp <= '2014-03-01 01:00:00';
 {code}
 We could increase mapreduce.map.memory.mb to solve this problem, but that way 
 we would be changing the setting system-wide.
 We need to provide a WebHCat configuration to override 
 mapreduce.map.memory.mb when submitting the controller job.
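
A minimal sketch of such an override is below; the templeton.mapper.memory.mb key 
is a hypothetical name used only for illustration, not necessarily what the patch 
introduces.

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: copy a WebHCat-level override onto the controller (launcher) job's
// map task memory, falling back to the cluster default when unset.
// "templeton.mapper.memory.mb" is a hypothetical key chosen for illustration.
public class ControllerMemoryOverrideSketch {
  static void applyOverride(Configuration appConf, Configuration controllerJobConf) {
    String mb = appConf.get("templeton.mapper.memory.mb");
    if (mb != null && !mb.isEmpty()) {
      controllerJobConf.set("mapreduce.map.memory.mb", mb);
    }
  }
}
{code}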



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-2365) SQL support for bulk load into HBase

2014-06-06 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-2365:
---

Attachment: HIVE-2365.3.patch

Rebased onto HIVE-6473 patch v6.

 SQL support for bulk load into HBase
 

 Key: HIVE-2365
 URL: https://issues.apache.org/jira/browse/HIVE-2365
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: John Sichi
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-2365.2.patch.txt, HIVE-2365.3.patch, 
 HIVE-2365.3.patch, HIVE-2365.WIP.00.patch, HIVE-2365.WIP.01.patch, 
 HIVE-2365.WIP.01.patch


 Support the as simple as this SQL for bulk load from Hive into HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurent user jobs to run

2014-06-06 Thread Eugene Koifman


 On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
  1. I think webhcat-default.xml should be modified to include the jars that 
  are now required in templeton.libjars to minimize out-of-the-box config for 
  end users.
  2. Is there any test (e2e) that can be added for this? (with reasonable 
  amount of effort)
  3. When you tested that Pig/Hive jobs get properly tagged, you mean you 
  tested that MR jobs that are generated by Pig/Hive are tagged, correct?

4. Actually, instead of doing 1, could WebHCat dynamically figure out which 
hadoop version it's talking to and add only the necessary shim jar, rather than 
shipping all of them?  It reduces the amount of config needed.  It would also 
be better if we can only ship the minimal set of jars.


- Eugene


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22329/#review44992
---


On June 6, 2014, 10:02 p.m., Ivan Mitic wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/22329/
 ---
 
 (Updated June 6, 2014, 10:02 p.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Approach in the patch is similar to what Oozie does to handle this situation. 
 Specifically, all child map jobs get tagged with the launcher MR job id. On 
 launcher task restart, launcher queries RM for the list of jobs that have the 
 tag and kills them. After that it moves on to start the same child job again. 
 Again, similarly to what Oozie does, a new templeton.job.launch.time property 
 is introduced that captures the launcher job submit timestamp and later used 
 to reduce the search window when RM is queried. 
 
 To validate the patch, you will need to add webhcat shim jars to 
 templeton.libjars as now webhcat launcher also has a dependency on hadoop 
 shims. 
 
 I have noticed that in case of the SqoopDelegator webhcat currently does not 
 set the MR delegation token when optionsFile flag is used. This also creates 
 the problem in this scenario. This looks like something that should be 
 handled via a separate Jira.
 
 
 Diffs
 -
 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java
  23b1c4f 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java
  41b1dc5 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java
  04a5c6f 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java
  04e061d 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java
  adcd917 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java
  a6355a6 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java
  556ee62 
   shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java 
 d3552c1 
   shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 
 5a728b2 
   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
 299e918 
 
 Diff: https://reviews.apache.org/r/22329/diff/
 
 
 Testing
 ---
 
 I have validated that MR, Pig and Hive jobs do get tagged appropriately. I 
 have also validated that previous child jobs do get killed on RM 
 failover/task failure.
 
 
 Thanks,
 
 Ivan Mitic
 




[jira] [Commented] (HIVE-7175) Provide password file option to beeline

2014-06-06 Thread Dr. Wendell Urth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020642#comment-14020642
 ] 

Dr. Wendell Urth commented on HIVE-7175:


Hi [~hiveqa], none of the failed tests appear related to the small additive 
change specific to BeeLine done here. These tests appear to be generally 
failing on trunk, and are not caused by this patch. Let me know if I am wrong.

 Provide password file option to beeline
 ---

 Key: HIVE-7175
 URL: https://issues.apache.org/jira/browse/HIVE-7175
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Clients
Affects Versions: 0.13.0
Reporter: Robert Justice
  Labels: features, security
 Attachments: HIVE-7175.patch


 For people connecting to Hive Server 2 with LDAP authentication enabled, in 
 order to batch run commands, we currently have to provide the password openly 
 in the command line.   They could use some expect scripting, but I think a 
 valid improvement would be to provide a password file option similar to other 
 CLI commands in hadoop (e.g. sqoop) to be more secure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7191) optimized map join hash table has a bug when it reaches 2Gb

2014-06-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020660#comment-14020660
 ] 

Hive QA commented on HIVE-7191:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648727/HIVE-7191.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5510 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/401/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/401/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-401/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648727

 optimized map join hash table has a bug when it reaches 2Gb
 ---

 Key: HIVE-7191
 URL: https://issues.apache.org/jira/browse/HIVE-7191
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-7191.patch


 Via [~t3rmin4t0r]:
 {noformat}
 Caused by: java.lang.ArrayIndexOutOfBoundsException: -204
 at java.util.ArrayList.elementData(ArrayList.java:371)
 at java.util.ArrayList.get(ArrayList.java:384)
 at 
 org.apache.hadoop.hive.serde2.WriteBuffers.setReadPoint(WriteBuffers.java:95)
 at 
 org.apache.hadoop.hive.serde2.WriteBuffers.hashCode(WriteBuffers.java:100)
 at 
 org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:203)
 at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:266)
 at 
 org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:124)
 ... 16 more
 {noformat}
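
 The negative index in the trace is consistent with a 32-bit offset overflowing 
 once the buffered data passes 2GB; the toy snippet below (not the actual 
 WriteBuffers code) shows how such a negative value can appear.
 {code}
 // Toy illustration: a byte offset held in (or cast to) an int wraps negative
 // once the total buffered size exceeds Integer.MAX_VALUE (~2GB), which then
 // surfaces as a negative list index like the -204 in the stack trace.
 public class OffsetOverflowDemo {
   public static void main(String[] args) {
     long offset = 2L * 1024 * 1024 * 1024 + 1024;   // just past 2GB
     int wrapped = (int) offset;                      // overflows to a negative value
     int bufferIndex = wrapped / (8 * 1024 * 1024);   // e.g. index into a list of 8MB buffers
     System.out.println(wrapped + " -> buffer index " + bufferIndex); // both negative
   }
 }
 {code}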



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation

2014-06-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020695#comment-14020695
 ] 

Hive QA commented on HIVE-7192:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648732/HIVE-7192.patch

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 5585 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_part
org.apache.hadoop.hive.metastore.TestMetastoreVersion.testDefaults
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/402/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/402/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-402/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648732

 Hive Streaming - Some required settings are not mentioned in the documentation
 --

 Key: HIVE-7192
 URL: https://issues.apache.org/jira/browse/HIVE-7192
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: Roshan Naik
Assignee: Roshan Naik
  Labels: Streaming
 Attachments: HIVE-7192.patch


 Specifically:
  - hive.support.concurrency on metastore
  - hive.vectorized.execution.enabled for query client



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7117) Partitions not inheriting table permissions after alter rename partition

2014-06-06 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020720#comment-14020720
 ] 

Ashish Kumar Singh commented on HIVE-7117:
--

Thanks [~szehon], [~xuefuz] and [~swarnim] for reviewing.

 Partitions not inheriting table permissions after alter rename partition
 

 Key: HIVE-7117
 URL: https://issues.apache.org/jira/browse/HIVE-7117
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Fix For: 0.14.0

 Attachments: HIVE-7117.2.patch, HIVE-7117.3.patch, HIVE-7117.4.patch, 
 HIVE-7117.5.patch, HIVE-7117.6.patch, HIVE-7117.7.patch, HIVE-7117.8.patch, 
 HIVE-7117.patch


 On altering/renaming a partition it must inherit permission of the parent 
 directory, if the flag hive.warehouse.subdir.inherit.perms is set.
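
 Conceptually the expected behavior is just to copy the parent directory's 
 permission onto the renamed partition directory; a simplified sketch (not the 
 patch itself) is below.
 {code}
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 // Simplified sketch: after the partition directory is renamed, re-apply the
 // parent directory's permission when hive.warehouse.subdir.inherit.perms is set.
 public class InheritPermsSketch {
   static void inheritParentPerms(FileSystem fs, Path newPartPath) throws java.io.IOException {
     FileStatus parent = fs.getFileStatus(newPartPath.getParent());
     fs.setPermission(newPartPath, parent.getPermission());
   }
 }
 {code}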



--
This message was sent by Atlassian JIRA
(v6.2#6252)