[jira] [Commented] (HIVE-7110) TestHCatPartitionPublish test failure: No FileSystem for scheme: pfile
[ https://issues.apache.org/jira/browse/HIVE-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019784#comment-14019784 ] David Chen commented on HIVE-7110: -- Interesting. However, something is causing this test to fail when building on OS X 10.9.3, and I have reproduced the failure on two different machines, which I think does indicate that something strange is going on in the build script and should be fixed. I will see if this reproduces on my Ubuntu 12.04 VM and RHEL 6.4 dev box. If it does not, then we can de-prioritize/postpone this issue. TestHCatPartitionPublish test failure: No FileSystem for scheme: pfile - Key: HIVE-7110 URL: https://issues.apache.org/jira/browse/HIVE-7110 Project: Hive Issue Type: Bug Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7110.1.patch, HIVE-7110.2.patch, HIVE-7110.3.patch, HIVE-7110.4.patch I got the following TestHCatPartitionPublish test failure when running all unit tests against Hadoop 1. This also appears when testing against Hadoop 2. {code} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 26.06 sec FAILURE! - in org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish testPartitionPublish(org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish) Time elapsed: 1.361 sec ERROR! org.apache.hive.hcatalog.common.HCatException: org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output information.
Cause : java.io.IOException: No FileSystem for scheme: pfile at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1443) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:212) at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70) at org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.runMRCreateFail(TestHCatPartitionPublish.java:191) at org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.testPartitionPublish(TestHCatPartitionPublish.java:155) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
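For context on the failure mode: Hadoop picks a FileSystem implementation by looking up the URI scheme in its configuration (keys of the form fs.&lt;scheme&gt;.impl), and the exception above means nothing is registered for the test-only pfile scheme. A minimal, illustrative sketch of that lookup, using a plain map rather than Hadoop's actual Configuration machinery (the class and method names below are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for Hadoop's scheme -> FileSystem-class lookup
// (real Hadoop reads fs.<scheme>.impl from its Configuration).
class FsRegistry {
    private final Map<String, String> impls = new HashMap<>();

    // Mirrors setting fs.<scheme>.impl in the configuration.
    void register(String scheme, String implClassName) {
        impls.put(scheme, implClassName);
    }

    // Fails the same way FileSystem.createFileSystem does when a scheme
    // such as "pfile" has no registered implementation.
    String resolve(String scheme) {
        String impl = impls.get(scheme);
        if (impl == null) {
            throw new IllegalStateException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }
}
```

Under this model, the fix direction is presumably to make sure the pfile implementation class is on the test classpath and registered in the configuration the test builds.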
[jira] [Resolved] (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
[ https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bennie Schut resolved HIVE-1019. Resolution: Won't Fix Hiveserver2 doesn't suffer from this. java.io.FileNotFoundException: HIVE_PLAN (No such file or directory) Key: HIVE-1019 URL: https://issues.apache.org/jira/browse/HIVE-1019 Project: Hive Issue Type: Bug Components: Server Infrastructure Affects Versions: 0.6.0 Reporter: Bennie Schut Assignee: Bennie Schut Priority: Minor Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019-6.patch, HIVE-1019-7.patch, HIVE-1019-8.patch, HIVE-1019.patch, stacktrace2.txt I keep getting errors like this: java.io.FileNotFoundException: HIVE_PLAN (No such file or directory) and : java.io.IOException: cannot find dir = hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in partToPartitionInfo! when running multiple threads with roughly similar queries. I have a patch for this which works for me. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dr. Wendell Urth updated HIVE-7175: --- Attachment: HIVE-7175.patch I've added a patch that provides this ability akin to Sqoop's mechanism (minus the encrypted/obfuscated file loader options, as those could be better handled by Larry's proposal). This would be useful in the immediate future, until what Larry proposes is completed upstream and can be compatibly added to Hive. Please review. Provide password file option to beeline --- Key: HIVE-7175 URL: https://issues.apache.org/jira/browse/HIVE-7175 Project: Hive Issue Type: Improvement Components: CLI, Clients Affects Versions: 0.13.0 Reporter: Robert Justice Labels: features, security Attachments: HIVE-7175.patch For people connecting to Hive Server 2 with LDAP authentication enabled, in order to batch run commands, we currently have to provide the password openly in the command line. They could use some expect scripting, but I think a valid improvement would be to provide a password file option similar to other CLI commands in hadoop (e.g. sqoop) to be more secure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dr. Wendell Urth updated HIVE-7175: --- Release Note: Added an --password-file (or, -w) option to BeeLine CLI, to read a password from a permission-protected file instead of supplying it in plaintext form as part of the command (-p). Status: Patch Available (was: Open) Provide password file option to beeline --- Key: HIVE-7175 URL: https://issues.apache.org/jira/browse/HIVE-7175 Project: Hive Issue Type: Improvement Components: CLI, Clients Affects Versions: 0.13.0 Reporter: Robert Justice Labels: features, security Attachments: HIVE-7175.patch For people connecting to Hive Server 2 with LDAP authentication enabled, in order to batch run commands, we currently have to provide the password openly in the command line. They could use some expect scripting, but I think a valid improvement would be to provide a password file option similar to other CLI commands in hadoop (e.g. sqoop) to be more secure. -- This message was sent by Atlassian JIRA (v6.2#6252)
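The release note's password-file behavior can be sketched as reading the first line of a permission-protected file, so a trailing newline never ends up in the password. This helper is hypothetical, not BeeLine's actual implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PasswordFile {
    // Reads the password as the file's first line; readAllLines strips
    // the line terminator, so "secret\n" on disk yields "secret".
    static String read(Path file) throws IOException {
        return Files.readAllLines(file).get(0);
    }
}
```

The file itself would be created with restrictive permissions (e.g. chmod 600) so only the invoking user can read it, which is the point of the option.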
[jira] [Created] (HIVE-7186) Unable to perform join on table
Alex Nastetsky created HIVE-7186: Summary: Unable to perform join on table Key: HIVE-7186 URL: https://issues.apache.org/jira/browse/HIVE-7186 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Environment: Hortonworks Data Platform 2.0 Reporter: Alex Nastetsky Occasionally, a table will start exhibiting behavior that will prevent it from being used in a JOIN. When doing a map join, it will just stall at "Starting to launch local task to process map join". When doing a regular join, it will make progress but then error out with an IndexOutOfBoundsException: Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:365) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) ... 9 more Caused by: java.lang.IndexOutOfBoundsException at java.nio.Buffer.checkIndex(Buffer.java:532) at java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:131) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1153) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:586) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:372) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:334) ... 15 more Doing simple selects against this table works fine and does not show any apparent problems with the data. Assume that the table in question is called tableA and was created by queryA. Doing either of the following has helped resolve the issue in the past.
1) create table tableB as select * from tableA; then just use tableB in the JOIN instead.
2) Regenerate tableA using queryA, then use tableA in the JOIN again. It usually works the second time.
When doing a describe formatted on the tables, the totalSize will be different between the original tableA and tableB, and sometimes (but not always) between the original tableA and the regenerated tableA. The numRows will be the same across all versions of the tables. This problem cannot be reproduced consistently, but the issue always happens when we try to use an affected table in a JOIN. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7136: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Sumit! Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system --- Key: HIVE-7136 URL: https://issues.apache.org/jira/browse/HIVE-7136 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.13.0 Reporter: Sumit Kumar Assignee: Sumit Kumar Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7136.01.patch, HIVE-7136.patch Current hive cli assumes that the source file (hive script) is always on the local file system. This patch implements support for reading source files from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping the default behavior intact to be reading from default filesystem (local) in case scheme is not provided in the url for the source file. -- This message was sent by Atlassian JIRA (v6.2#6252)
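The behavior described above (fall back to the default local filesystem when the script URL carries no scheme) boils down to inspecting the URI; `schemeOf` below is a hypothetical helper for illustration, not Hive's actual code:

```java
import java.net.URI;

class ScriptSource {
    // Returns the filesystem scheme for a hive script path, defaulting
    // to the local filesystem ("file") when the path has no scheme,
    // matching the default behavior the patch description keeps intact.
    static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return (scheme == null) ? "file" : scheme;
    }
}
```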
[jira] [Updated] (HIVE-7135) Fix test fail of TestTezTask.testSubmit
[ https://issues.apache.org/jira/browse/HIVE-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7135: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Fix test fail of TestTezTask.testSubmit --- Key: HIVE-7135 URL: https://issues.apache.org/jira/browse/HIVE-7135 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7135.1.patch, HIVE-7135.2.patch.txt HIVE-7043 broke a tez test case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7176) FileInputStream is not closed in Commands#properties()
[ https://issues.apache.org/jira/browse/HIVE-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7176: --- Resolution: Fixed Fix Version/s: 0.14.0 Assignee: Navis Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! FileInputStream is not closed in Commands#properties() -- Key: HIVE-7176 URL: https://issues.apache.org/jira/browse/HIVE-7176 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: Navis Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7176.1.patch.txt NO PRECOMMIT TESTS In beeline.Commands, around line 834: {code} props.load(new FileInputStream(parts[i])); {code} The FileInputStream is not closed upon return from the method. -- This message was sent by Atlassian JIRA (v6.2#6252)
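The standard fix for a leaked stream like this is try-with-resources; a minimal sketch of the pattern (the surrounding Commands#properties logic is elided):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

class PropsLoader {
    // try-with-resources guarantees the FileInputStream is closed even
    // when props.load(...) throws, unlike the original one-liner.
    static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }
}
```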
Re: Review Request 22170: analyze table T compute statistics for columns; will now compute stats for all columns.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22170/ --- (Updated June 6, 2014, 4:30 p.m.) Review request for hive and Prasanth_J. Changes --- Fixed last failing test. Bugs: HIVE-7168 https://issues.apache.org/jira/browse/HIVE-7168 Repository: hive-git Description --- analyze table T compute statistics for columns; will now compute stats for all columns. Diffs (updated) - metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1245d80 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 5b77e6f ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 6d958fd ql/src/test/queries/clientpositive/columnstats_partlvl.q 9dfe8ff ql/src/test/queries/clientpositive/columnstats_tbllvl.q 170fbc5 ql/src/test/results/clientpositive/columnstats_partlvl.q.out d91be8d ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 3d3d0e2 ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out 03b536f Diff: https://reviews.apache.org/r/22170/diff/ Testing --- Added new tests. Thanks, Ashutosh Chauhan
[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7168: --- Attachment: HIVE-7168.2.patch Fixed last failing test. Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7168: --- Status: Open (was: Patch Available) Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7168: --- Status: Patch Available (was: Open) Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7040) TCP KeepAlive for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020023#comment-14020023 ] Vaibhav Gumashta commented on HIVE-7040: Thanks for the patch [~nicothieb]! There is another jira: HIVE-6679, which looks at doing this for binary mode (with and without SSL). Is it possible to handle the SSL case as well in this jira? TCP KeepAlive for HiveServer2 - Key: HIVE-7040 URL: https://issues.apache.org/jira/browse/HIVE-7040 Project: Hive Issue Type: Improvement Components: HiveServer2, Server Infrastructure Reporter: Nicolas Thiébaud Attachments: HIVE-7040.patch, HIVE-7040.patch.2 Implement TCP KeepAlive for HiveServer2 to avoid half-open connections. A setting could be added:
{code}
<property>
  <name>hive.server2.tcp.keepalive</name>
  <value>true</value>
  <description>Whether to enable TCP keepalive for Hive Server 2</description>
</property>
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7143) Add Streaming support in Windowing mode for more UDAFs (min/max, lead/lag, fval/lval)
[ https://issues.apache.org/jira/browse/HIVE-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020035#comment-14020035 ] Ashutosh Chauhan commented on HIVE-7143: +1 Add Streaming support in Windowing mode for more UDAFs (min/max, lead/lag, fval/lval) - Key: HIVE-7143 URL: https://issues.apache.org/jira/browse/HIVE-7143 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7143.1.patch, HIVE-7143.3.patch Provided streaming implementations for the above functions. Min/Max is based on the algorithm by Daniel Lemire: http://www.archipel.uqam.ca/309/1/webmaximinalgo.pdf -- This message was sent by Atlassian JIRA (v6.2#6252)
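The Lemire paper referenced above computes a sliding-window min/max in amortized O(1) per row by keeping a monotonic deque of candidate indices. A generic, self-contained sketch of the max case over a fixed window of size w (not Hive's actual UDAF wiring):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SlidingMax {
    // Returns the max of each length-w window of a. The deque holds
    // indices whose values are decreasing, so the front is always the
    // current window's max; each index enters and leaves at most once.
    static int[] maxes(int[] a, int w) {
        int[] out = new int[a.length - w + 1];
        Deque<Integer> dq = new ArrayDeque<>();
        for (int i = 0; i < a.length; i++) {
            // Drop candidates dominated by the incoming value.
            while (!dq.isEmpty() && a[dq.peekLast()] <= a[i]) dq.pollLast();
            dq.addLast(i);
            // Drop the front if it slid out of the window.
            if (dq.peekFirst() <= i - w) dq.pollFirst();
            if (i >= w - 1) out[i - w + 1] = a[dq.peekFirst()];
        }
        return out;
    }
}
```

Min is symmetric (flip the comparison), which is presumably why the jira treats min/max together.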
[jira] [Updated] (HIVE-7186) Unable to perform join on table
[ https://issues.apache.org/jira/browse/HIVE-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Nastetsky updated HIVE-7186: - Environment: Hortonworks Data Platform 2.0.6.0 (was: Hortonworks Data Platform 2.0) Unable to perform join on table --- Key: HIVE-7186 URL: https://issues.apache.org/jira/browse/HIVE-7186 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Environment: Hortonworks Data Platform 2.0.6.0 Reporter: Alex Nastetsky Occasionally, a table will start exhibiting behavior that will prevent it from being used in a JOIN. When doing a map join, it will just stall at "Starting to launch local task to process map join". When doing a regular join, it will make progress but then error out with an IndexOutOfBoundsException: Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:365) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) ... 9 more Caused by: java.lang.IndexOutOfBoundsException at java.nio.Buffer.checkIndex(Buffer.java:532) at java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:131) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1153) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:586) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:372) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:334) ... 15 more Doing simple selects against this table works fine and does not show any apparent problems with the data. Assume that the table in question is called tableA and was created by queryA. Doing either of the following has helped resolve the issue in the past.
1) create table tableB as select * from tableA; then just use tableB in the JOIN instead.
2) Regenerate tableA using queryA, then use tableA in the JOIN again. It usually works the second time.
When doing a describe formatted on the tables, the totalSize will be different between the original tableA and tableB, and sometimes (but not always) between the original tableA and the regenerated tableA. The numRows will be the same across all versions of the tables. This problem cannot be reproduced consistently, but the issue always happens when we try to use an affected table in a JOIN. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7040) TCP KeepAlive for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020049#comment-14020049 ] Vaibhav Gumashta commented on HIVE-7040: Actually HIVE-6679 appears to focus just on timeouts, so please ignore that jira. TCP KeepAlive for HiveServer2 - Key: HIVE-7040 URL: https://issues.apache.org/jira/browse/HIVE-7040 Project: Hive Issue Type: Improvement Components: HiveServer2, Server Infrastructure Reporter: Nicolas Thiébaud Attachments: HIVE-7040.patch, HIVE-7040.patch.2 Implement TCP KeepAlive for HiveServer2 to avoid half-open connections. A setting could be added:
{code}
<property>
  <name>hive.server2.tcp.keepalive</name>
  <value>true</value>
  <description>Whether to enable TCP keepalive for Hive Server 2</description>
</property>
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
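At the socket level, the proposed flag maps onto the SO_KEEPALIVE option. A self-contained sketch of applying a hive.server2.tcp.keepalive-style setting to an accepted socket (illustrative only, not the actual HiveServer2 patch):

```java
import java.io.IOException;
import java.net.Socket;

class KeepAliveDemo {
    // Enables OS-level TCP keepalive probes on the connection, so the
    // server eventually notices half-open (silently dropped) peers.
    static Socket configure(Socket s, boolean keepAlive) throws IOException {
        s.setKeepAlive(keepAlive);
        return s;
    }
}
```

Probe interval and count are kernel settings (e.g. net.ipv4.tcp_keepalive_time on Linux), which is why the Hive-side option is just a boolean.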
[jira] [Updated] (HIVE-7117) Partitions not inheriting table permissions after alter rename partition
[ https://issues.apache.org/jira/browse/HIVE-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7117: -- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Partitions not inheriting table permissions after alter rename partition Key: HIVE-7117 URL: https://issues.apache.org/jira/browse/HIVE-7117 Project: Hive Issue Type: Bug Components: Security Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Fix For: 0.14.0 Attachments: HIVE-7117.2.patch, HIVE-7117.3.patch, HIVE-7117.4.patch, HIVE-7117.5.patch, HIVE-7117.6.patch, HIVE-7117.7.patch, HIVE-7117.8.patch, HIVE-7117.patch On altering/renaming a partition it must inherit permission of the parent directory, if the flag hive.warehouse.subdir.inherit.perms is set. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7117) Partitions not inheriting table permissions after alter rename partition
[ https://issues.apache.org/jira/browse/HIVE-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020145#comment-14020145 ] Xuefu Zhang commented on HIVE-7117: --- Patch committed to trunk. Thanks to Ashish for the contribution. Partitions not inheriting table permissions after alter rename partition Key: HIVE-7117 URL: https://issues.apache.org/jira/browse/HIVE-7117 Project: Hive Issue Type: Bug Components: Security Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Fix For: 0.14.0 Attachments: HIVE-7117.2.patch, HIVE-7117.3.patch, HIVE-7117.4.patch, HIVE-7117.5.patch, HIVE-7117.6.patch, HIVE-7117.7.patch, HIVE-7117.8.patch, HIVE-7117.patch On altering/renaming a partition it must inherit permission of the parent directory, if the flag hive.warehouse.subdir.inherit.perms is set. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7187) Reconcile jetty versions in hive
Vaibhav Gumashta created HIVE-7187: -- Summary: Reconcile jetty versions in hive Key: HIVE-7187 URL: https://issues.apache.org/jira/browse/HIVE-7187 Project: Hive Issue Type: Bug Components: HiveServer2, Web UI, WebHCat Reporter: Vaibhav Gumashta Hive root pom has 3 parameters for specifying jetty dependency versions:
{code}
<jetty.version>6.1.26</jetty.version>
<jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version>
<jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version>
{code}
The first is used by HWI, the second by WebHCat, and the third by HiveServer2 (in http mode). We should probably use the same jetty version for all hive components. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7187) Reconcile jetty versions in hive
[ https://issues.apache.org/jira/browse/HIVE-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020170#comment-14020170 ] Eugene Koifman commented on HIVE-7187: -- Also, the current release of Jetty is 9.x. Reconcile jetty versions in hive Key: HIVE-7187 URL: https://issues.apache.org/jira/browse/HIVE-7187 Project: Hive Issue Type: Bug Components: HiveServer2, Web UI, WebHCat Reporter: Vaibhav Gumashta Hive root pom has 3 parameters for specifying jetty dependency versions:
{code}
<jetty.version>6.1.26</jetty.version>
<jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version>
<jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version>
{code}
The first is used by HWI, the second by WebHCat, and the third by HiveServer2 (in http mode). We should probably use the same jetty version for all hive components. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-7063: Attachment: HIVE-7063.1.patch preliminary patch: this adds code to WdwTabFn to react to a rank limit. Optimize for the Top N within a Group use case -- Key: HIVE-7063 URL: https://issues.apache.org/jira/browse/HIVE-7063 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7063.1.patch It is common to rank within a Group/Partition and then only return the Top N entries within each Group. With Streaming mode for Windowing, we should push the post filter on the rank into the Windowing processing as a Limit expression. -- This message was sent by Atlassian JIRA (v6.2#6252)
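The optimization described amounts to cutting off rows once rank() exceeds N inside each streamed partition, instead of ranking everything and filtering afterwards. A toy sketch over pre-sorted (group, value) rows; this illustrates the rank-limit idea only, not the actual WindowingTableFunction change:

```java
import java.util.ArrayList;
import java.util.List;

class TopNPerGroup {
    // rows arrive pre-sorted by group key then by the ordering column,
    // as the windowing operator receives them; once rank > n within a
    // group, remaining rows of that group are skipped (the "Limit"
    // pushed into windowing) rather than emitted and filtered later.
    static List<int[]> topN(List<int[]> rows, int n) {
        List<int[]> out = new ArrayList<>();
        int rank = 0;
        Integer currentGroup = null;
        for (int[] row : rows) {                  // row = {groupKey, value}
            if (currentGroup == null || row[0] != currentGroup) {
                currentGroup = row[0];            // new partition: reset rank
                rank = 0;
            }
            rank++;
            if (rank <= n) out.add(row);
        }
        return out;
    }
}
```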
Re: TestAvroSerdeUtils#determineSchemaCanReadSchemaFromHDFS fails on hive-13 for hadoop-2
This is passing in the builds, and also for me. Looks like some environment issue. Are you running in eclipse or maven? Thanks Szehon On Thu, Jun 5, 2014 at 5:51 PM, pankit thapar thapar.pan...@gmail.com wrote: Hi, I am trying to build hive on my local desktop. I am facing an issue with test case : TestAvroSerdeUtils#determineSchemaCanReadSchemaFromHDFS The issue is only with hadoop-2 and not with hadoop-1 Has anyone been able to run this test case? Trace : org.apache.hadoop.ipc.RemoteException: File /path/to/schema/schema.avsc could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1406) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2596) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:563) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:407) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:592) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) at org.apache.hadoop.ipc.Client.call(Client.java:1406) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:211) at com.sun.proxy.$Proxy14.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy14.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:348) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1275) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1123) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:527) Thanks, Pankit
[jira] [Commented] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020290#comment-14020290 ] Hive QA commented on HIVE-7175: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648646/HIVE-7175.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 5511 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/399/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/399/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-399/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated.
ATTACHMENT ID: 12648646 Provide password file option to beeline --- Key: HIVE-7175 URL: https://issues.apache.org/jira/browse/HIVE-7175 Project: Hive Issue Type: Improvement Components: CLI, Clients Affects Versions: 0.13.0 Reporter: Robert Justice Labels: features, security Attachments: HIVE-7175.patch For people connecting to Hive Server 2 with LDAP authentication enabled, in order to batch run commands, we currently have to provide the password openly in the command line. They could use some expect scripting, but I think a valid improvement would be to provide a password file option similar to other CLI commands in hadoop (e.g. sqoop) to be more secure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-538) make hive_jdbc.jar self-containing
[ https://issues.apache.org/jira/browse/HIVE-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-538: -- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Nick! make hive_jdbc.jar self-containing -- Key: HIVE-538 URL: https://issues.apache.org/jira/browse/HIVE-538 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.3.0, 0.4.0, 0.6.0, 0.13.0 Reporter: Raghotham Murthy Assignee: Nick White Fix For: 0.14.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.2.patch, HIVE-538.patch Currently, most jars in hive/build/dist/lib and the hadoop-*-core.jar are required in the classpath to run jdbc applications on hive. We need to do at least the following to get rid of most unnecessary dependencies: 1. get rid of dynamic serde and use a standard serialization format, maybe tab-separated, JSON, or Avro 2. don't use hadoop configuration parameters 3. repackage thrift and fb303 classes into hive_jdbc.jar -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7188: Attachment: hike-vector-sum-bug.tgz

sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: hike-vector-sum-bug.tgz

1. The tgz file containing the setup is attached.
2. Run the following query:
{code}
select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0;
{code}
It returns 0 with vectorization turned on, whereas it returns 131 with vectorization turned off.
{code}
hive> source insert.sql;
OK
Time taken: 0.359 seconds
OK
Time taken: 0.015 seconds
OK
Time taken: 0.069 seconds
OK
Time taken: 0.176 seconds
Loading data to table hike_error.ttr_day0
Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0]
OK
Time taken: 0.33 seconds
hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0;
Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-06-06 13:47:02,043 null map = 0%, reduce = 100%
Ended Job = job_local773704964_0001
Execution completed successfully
MapredLocal task succeeded
OK
131
Time taken: 5.325 seconds, Fetched: 1 row(s)
hive> set hive.vectorized.execution.enabled=true;
hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0;
Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-06-06 13:47:18,604 null map = 0%, reduce = 100%
Ended Job = job_local701415676_0001
Execution completed successfully
MapredLocal task succeeded
OK
0
Time taken: 5.52 seconds, Fetched: 1 row(s)
hive> explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: ttr_day0
            Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: is_returning (type: boolean), is_free (type: boolean)
              outputColumnNames: is_returning, is_free
              Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0))
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order:
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: bigint)
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
{code}
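For reference, the semantics both execution paths must agree on can be modeled outside Hive. This is a minimal standalone sketch (hypothetical class and method names, not Hive code): `sum(if(p, 1, 0))` is just a count of rows where the predicate holds, whether evaluated row-at-a-time or by first materializing the IF output column over a batch, as the vectorized path does.

```java
// Hypothetical standalone model of sum(if(p, 1, 0)) -- counts rows where p holds.
// Both Hive execution modes must reduce to this same fold.
public class SumIfModel {
    // Row-at-a-time evaluation: add 1 when the predicate holds, else 0.
    static long sumIf(boolean[] isReturning, boolean[] isFree) {
        long acc = 0;
        for (int i = 0; i < isReturning.length; i++) {
            acc += (isReturning[i] && !isFree[i]) ? 1 : 0;
        }
        return acc;
    }

    // "Vectorized" evaluation over the same batch: materialize the IF output
    // column first, then sum that column. Must equal the row-mode result.
    static long sumIfVectorized(boolean[] isReturning, boolean[] isFree) {
        int n = isReturning.length;
        long[] ifColumn = new long[n];
        for (int i = 0; i < n; i++) {
            ifColumn[i] = (isReturning[i] && !isFree[i]) ? 1L : 0L;
        }
        long acc = 0;
        for (long v : ifColumn) acc += v;
        return acc;
    }

    public static void main(String[] args) {
        boolean[] ret  = {true, true, false, true};
        boolean[] free = {false, true, false, false};
        // Rows 0 and 3 are returning and not free.
        if (sumIf(ret, free) != 2) throw new AssertionError();
        if (sumIf(ret, free) != sumIfVectorized(ret, free)) throw new AssertionError();
    }
}
```

The bug report above is exactly a divergence between these two folds: 131 in row mode, 0 once `hive.vectorized.execution.enabled=true`.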
[jira] [Created] (HIVE-7188) sum(if()) returns wrong results with vectorization
Hari Sankar Sivarama Subramaniyan created HIVE-7188: --- Summary: sum(if()) returns wrong results with vectorization Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: hike-vector-sum-bug.tgz
[jira] [Created] (HIVE-7189) Hive does not store column names in ORC
Chris Drome created HIVE-7189: - Summary: Hive does not store column names in ORC Key: HIVE-7189 URL: https://issues.apache.org/jira/browse/HIVE-7189 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.0, 0.12.0 Reporter: Chris Drome We uncovered the following discrepancy between writing ORC files through Pig and Hive: the ORC file header contains the names of the columns. When stored through Pig (ORCStorage or HCatStorer), the column names are stored fine. But when stored through Hive, they are stored as _col0, _col1, ..., _col99, and Hive uses the partition schema to map the column names. Reading the same file through Pig is then problematic, as the user has to map the columns manually. -- This message was sent by Atlassian JIRA (v6.2#6252)
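The remapping a non-Hive reader is forced to do can be sketched as follows. This is an illustration only (the class, method, and the positional-name pattern applied here are assumptions, not ORC or HCatalog API): positional `_colN` names from the file footer are replaced, by position, with the real names from the table schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of the remapping a reader must do when an ORC file
// written by Hive carries positional names (_col0, _col1, ...) instead of the
// real column names from the table schema.
public class OrcColumnRemap {
    private static final Pattern POSITIONAL = Pattern.compile("_col\\d+");

    // Replace positional names with schema names, matched by position.
    static List<String> remap(List<String> fileColumns, List<String> schemaColumns) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < fileColumns.size(); i++) {
            String name = fileColumns.get(i);
            if (POSITIONAL.matcher(name).matches() && i < schemaColumns.size()) {
                out.add(schemaColumns.get(i)); // fall back to the table schema
            } else {
                out.add(name); // already a real name (e.g. written via Pig)
            }
        }
        return out;
    }
}
```

For example, a file written by Hive with columns `[_col0, _col1]` against a schema `[id, name]` would be read back as `[id, name]`, while a Pig-written file keeps its stored names untouched.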
[jira] [Created] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run
Ivan Mitic created HIVE-7190: Summary: WebHCat launcher task failure can cause two concurrent user jobs to run Key: HIVE-7190 URL: https://issues.apache.org/jira/browse/HIVE-7190 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Ivan Mitic Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs are 1-map (single-task) jobs which kick off the actual user job and monitor it until it finishes. Given that the launcher is a task like any other MR task, it has a retry policy in case it fails (due to a task crash, tasktracker/nodemanager crash, machine-level outage, etc.). Further, when the launcher task is retried, it will again launch the same user job, *however* the previous attempt's user job is already running. What this means is that we can have two identical user jobs running in parallel. In the case of MRv2, there will be an MRAppMaster and the launcher task, both of which are subject to failure. If either of the two fails, another instance of the user job will be launched in parallel. The above situation is already a bug. Going further to RM HA, what the RM does on failover/restart is kill all containers and restart all applications. This means that if a customer had 10 jobs on the cluster (that is, 10 launcher jobs and 10 user jobs), on RM failover all 20 jobs will be restarted, and the launcher jobs will queue the user jobs again. There are two issues with this design: 1. There are *possible* chances of corruption of job outputs (it would be useful to analyze this scenario more and confirm this statement). 2. Cluster resources are spent on jobs redundantly. To address the issue at least on YARN (Hadoop 2.0) clusters, WebHCat should do the same thing Oozie does in this scenario: tag all its child jobs with an id, and kill those jobs on task restart before they are kicked off again.
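The tag-and-kill recovery proposed above can be sketched in a few lines. This is a hypothetical model, not the WebHCat/shim API: the `Job` record and the cluster-query/kill calls stand in for the real RM interactions, and the time filter plays the role the submit-timestamp window plays in the eventual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of tag-and-kill recovery: child jobs are tagged with the
// launcher's job id; on launcher-task restart, earlier children carrying that
// tag are killed before the user job is resubmitted, so only one copy runs.
public class LauncherRecovery {
    static class Job {
        final String id; final String tag; final long submitTime; boolean killed;
        Job(String id, String tag, long submitTime) {
            this.id = id; this.tag = tag; this.submitTime = submitTime;
        }
    }

    // Find child jobs carrying our tag, submitted within the search window,
    // and mark them killed. Returns the jobs that were killed.
    static List<Job> killTaggedChildren(List<Job> clusterJobs, String launcherId,
                                        long launchTime) {
        List<Job> killed = new ArrayList<>();
        for (Job j : clusterJobs) {
            // launchTime narrows the search window, as a submit-timestamp
            // property would in the real implementation.
            if (launcherId.equals(j.tag) && j.submitTime >= launchTime && !j.killed) {
                j.killed = true;
                killed.add(j);
            }
        }
        return killed;
    }
}
```

The design point is that the tag makes orphaned children discoverable after the launcher loses all in-memory state, which is exactly the RM failover/task retry case described above.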
[jira] [Commented] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run
[ https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020358#comment-14020358 ] Ivan Mitic commented on HIVE-7190: -- Will attach a patch in a bit; feel free to assign the Jira to me, as I don't have the rights to do so yet.
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7065: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the contribution, Eugene! Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a hive query, for example). This should include hive.execution.engine. NO PRECOMMIT TESTS
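As an illustration of the fix's shape, a webhcat-site.xml fragment of the kind described might look like the following. The metastore host and the specific properties listed in the value are assumptions for illustration, not taken from the patch; `templeton.hive.properties` itself takes a comma-separated list of key=value pairs.

```xml
<property>
  <name>templeton.hive.properties</name>
  <!-- Comma-separated key=value pairs handed to the Hive client on the node
       running the launcher; hive.execution.engine is what selects mr vs. tez
       for Hive jobs submitted through WebHCat. Host name is a placeholder. -->
  <value>hive.metastore.uris=thrift://metastore-host:9083,hive.execution.engine=tez</value>
</property>
```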
[jira] [Created] (HIVE-7191) optimized map join hash table has a bug when it reaches 2Gb
Sergey Shelukhin created HIVE-7191: -- Summary: optimized map join hash table has a bug when it reaches 2Gb Key: HIVE-7191 URL: https://issues.apache.org/jira/browse/HIVE-7191 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Via [~t3rmin4t0r]:
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: -204
	at java.util.ArrayList.elementData(ArrayList.java:371)
	at java.util.ArrayList.get(ArrayList.java:384)
	at org.apache.hadoop.hive.serde2.WriteBuffers.setReadPoint(WriteBuffers.java:95)
	at org.apache.hadoop.hive.serde2.WriteBuffers.hashCode(WriteBuffers.java:100)
	at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:203)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:266)
	at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:124)
	... 16 more
{noformat}
[jira] [Updated] (HIVE-7191) optimized map join hash table has a bug when it reaches 2Gb
[ https://issues.apache.org/jira/browse/HIVE-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-7191: --- Attachment: HIVE-7191.patch Some casts are in order optimized map join hash table has a bug when it reaches 2Gb --- Key: HIVE-7191 URL: https://issues.apache.org/jira/browse/HIVE-7191 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7191.patch
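"Some casts are in order" and the negative index (-204) in the trace point at a classic 2 GB failure mode: a byte offset truncated to 32 bits wraps negative once it exceeds Integer.MAX_VALUE. The sketch below is not the actual WriteBuffers code, just the arithmetic: dividing after an int cast goes negative past 2 GB, while doing the arithmetic in long and casting only the small final result stays correct.

```java
// Hypothetical sketch of the 2 GB failure mode: a byte offset cast through an
// int wraps negative once it exceeds Integer.MAX_VALUE, yielding a negative
// buffer index like the -204 in the stack trace above.
public class OffsetOverflow {
    static final int BUFFER_SIZE = 1 << 20; // 1 MB per buffer segment (illustrative)

    // Buggy: truncates the offset to 32 bits before dividing, so any offset
    // at or past 2 GB produces a negative segment index.
    static int bufferIndexBuggy(long offset) {
        return ((int) offset) / BUFFER_SIZE;
    }

    // Fixed: divide in long, cast only the (small) final result.
    static int bufferIndexFixed(long offset) {
        return (int) (offset / BUFFER_SIZE);
    }

    public static void main(String[] args) {
        long offset = 2L * 1024 * 1024 * 1024 + 123; // just past 2 GB
        if (bufferIndexBuggy(offset) >= 0) throw new AssertionError("expected negative index");
        if (bufferIndexFixed(offset) != 2048) throw new AssertionError();
    }
}
```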
[jira] [Updated] (HIVE-7191) optimized map join hash table has a bug when it reaches 2Gb
[ https://issues.apache.org/jira/browse/HIVE-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7191: --- Status: Patch Available (was: Open) +1 optimized map join hash table has a bug when it reaches 2Gb --- Key: HIVE-7191 URL: https://issues.apache.org/jira/browse/HIVE-7191 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7191.patch
[jira] [Updated] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run
[ https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated HIVE-7190: - Attachment: HIVE-7190.patch Attaching the initial patch. The approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. On launcher task restart, the launcher queries the RM for the list of jobs that have the tag and kills them. After that it moves on to start the same child job again. Again, similarly to what Oozie does, a new {{templeton.job.launch.time}} property is introduced that captures the launcher job submit timestamp and is later used to reduce the search window when the RM is queried. I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure. To validate the patch, you will need to add the webhcat shim jars to templeton.libjars, as the webhcat launcher now also has a dependency on hadoop shims. I have noticed that in the case of the SqoopDelegator, webhcat currently does not set the MR delegation token when the optionsFile flag is used. This also creates the problem in this scenario; it looks like something that should be handled via a separate Jira. WebHCat launcher task failure can cause two concurrent user jobs to run -- Key: HIVE-7190 URL: https://issues.apache.org/jira/browse/HIVE-7190 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Ivan Mitic Attachments: HIVE-7190.patch
[jira] [Updated] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run
[ https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7190: - Affects Version/s: 0.13.0 WebHCat launcher task failure can cause two concurrent user jobs to run -- Key: HIVE-7190 URL: https://issues.apache.org/jira/browse/HIVE-7190 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: Ivan Mitic Attachments: HIVE-7190.patch
[jira] [Created] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation
Roshan Naik created HIVE-7192: - Summary: Hive Streaming - Some required settings are not mentioned in the documentation Key: HIVE-7192 URL: https://issues.apache.org/jira/browse/HIVE-7192 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.0 Reporter: Roshan Naik Assignee: Roshan Naik Specifically: - hive.support.concurrency on metastore - hive.vectorized.execution.enabled for query client -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7167) Hive Metastore fails to start with SQLServerException
[ https://issues.apache.org/jira/browse/HIVE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020409#comment-14020409 ] Sergey Shelukhin commented on HIVE-7167: 1) Can you post the SQLServerException you are getting? 2) Why these 3 methods of all methods? 3) It seems like a hacky way to solve the problem. It can still fail again, right? Hive Metastore fails to start with SQLServerException - Key: HIVE-7167 URL: https://issues.apache.org/jira/browse/HIVE-7167 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Labels: patch, test Fix For: 0.13.0 Attachments: HIVE-7167.1.patch In the case that hiveserver2 uses an embedded metastore and hiveserver uses a remote metastore, this exception comes up when hiveserver2 and hiveserver are started simultaneously. The metastore service status is running, but when I launch the hive cli, I get the following metastore connection error:
{noformat}
C:\apps\dist\hive-0.13.0.2.1.2.0-1660\bin>hive.cmd
14/05/09 17:40:03 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/C:/apps/dist/hive-0.13.0.2.1.2.0-1660/conf/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:347)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1413)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2456)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:341)
	... 7 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1411)
	... 12 more
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
	at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:336)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1411)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)
	at
{noformat}
[jira] [Updated] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation
[ https://issues.apache.org/jira/browse/HIVE-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-7192: -- Attachment: HIVE-7192.patch uploading patch Hive Streaming - Some required settings are not mentioned in the documentation -- Key: HIVE-7192 URL: https://issues.apache.org/jira/browse/HIVE-7192 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: Streaming Attachments: HIVE-7192.patch Specifically: - hive.support.concurrency on metastore - hive.vectorized.execution.enabled for query client -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation
[ https://issues.apache.org/jira/browse/HIVE-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-7192: -- Status: Patch Available (was: Open) Hive Streaming - Some required settings are not mentioned in the documentation -- Key: HIVE-7192 URL: https://issues.apache.org/jira/browse/HIVE-7192 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: Streaming Attachments: HIVE-7192.patch
[jira] [Updated] (HIVE-7187) Reconcile jetty versions in hive
[ https://issues.apache.org/jira/browse/HIVE-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7187: --- Assignee: Ashutosh Chauhan Status: Patch Available (was: Open) Reconcile jetty versions in hive Key: HIVE-7187 URL: https://issues.apache.org/jira/browse/HIVE-7187 Project: Hive Issue Type: Bug Components: HiveServer2, Web UI, WebHCat Reporter: Vaibhav Gumashta Assignee: Ashutosh Chauhan Attachments: HIVE-7187.patch Hive's root pom has 3 parameters for specifying jetty dependency versions:
{code}
<jetty.version>6.1.26</jetty.version>
<jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version>
<jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version>
{code}
The 1st is used by HWI, the 2nd by WebHCat, and the 3rd by HiveServer2 (in http mode). We should probably use the same jetty version for all hive components.
[jira] [Updated] (HIVE-7187) Reconcile jetty versions in hive
[ https://issues.apache.org/jira/browse/HIVE-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7187: --- Attachment: HIVE-7187.patch Reconcile jetty versions in hive Key: HIVE-7187 URL: https://issues.apache.org/jira/browse/HIVE-7187 Project: Hive Issue Type: Bug Components: HiveServer2, Web UI, WebHCat Reporter: Vaibhav Gumashta Attachments: HIVE-7187.patch
Re: Review Request 22170: analyze table T compute statistics for columns; will now compute stats for all columns.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22170/#review44974 --- Ship it! Ship It! - Prasanth_J On June 6, 2014, 4:30 p.m., Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22170/ --- (Updated June 6, 2014, 4:30 p.m.) Review request for hive and Prasanth_J. Bugs: HIVE-7168 https://issues.apache.org/jira/browse/HIVE-7168 Repository: hive-git Description --- analyze table T compute statistics for columns; will now compute stats for all columns. Diffs - metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1245d80 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 5b77e6f ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 6d958fd ql/src/test/queries/clientpositive/columnstats_partlvl.q 9dfe8ff ql/src/test/queries/clientpositive/columnstats_tbllvl.q 170fbc5 ql/src/test/results/clientpositive/columnstats_partlvl.q.out d91be8d ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 3d3d0e2 ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out 03b536f Diff: https://reviews.apache.org/r/22170/diff/ Testing --- Added new tests. Thanks, Ashutosh Chauhan
Review Request 22328: Make hive use one jetty version.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22328/ --- Review request for hive, Eugene Koifman and Vaibhav Gumashta. Bugs: HIVE-7187 https://issues.apache.org/jira/browse/HIVE-7187 Repository: hive Description --- Make hive use one jetty version. Diffs - trunk/hcatalog/webhcat/svr/pom.xml 1600966 trunk/hwi/pom.xml 1600966 trunk/pom.xml 1600992 trunk/service/pom.xml 1600966 trunk/shims/0.20/pom.xml 1600966 trunk/shims/0.20S/pom.xml 1600966 trunk/shims/0.23/pom.xml 1600966 Diff: https://reviews.apache.org/r/22328/diff/ Testing --- Manually built and ran a few tests. Thanks, Ashutosh Chauhan
[jira] [Commented] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020423#comment-14020423 ] Prasanth J commented on HIVE-7168: -- +1 Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurent user jobs to run
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/ --- Review request for hive. Repository: hive-git Description --- The approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. On launcher task restart, the launcher queries the RM for the list of jobs that have the tag and kills them. After that, it moves on to start the same child job again. Again, similarly to what Oozie does, a new templeton.job.launch.time property is introduced that captures the launcher job submit timestamp and is later used to reduce the search window when the RM is queried. To validate the patch, you will need to add the webhcat shim jars to templeton.libjars, as the webhcat launcher now also has a dependency on hadoop shims. I have noticed that in the case of the SqoopDelegator, webhcat currently does not set the MR delegation token when the optionsFile flag is used. This also creates a problem in this scenario. This looks like something that should be handled via a separate Jira.
Diffs - hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java 23b1c4f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java 41b1dc5 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java 04a5c6f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java 04e061d hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java adcd917 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java a6355a6 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java 556ee62 shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java d3552c1 shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 5a728b2 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 299e918 Diff: https://reviews.apache.org/r/22329/diff/ Testing --- I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure. Thanks, Ivan Mitic
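The recovery flow described above — tag child jobs with the launcher MR job id, then on launcher restart query the RM within the templeton.job.launch.time window and kill the matches before resubmitting — can be sketched with plain data structures. This is an illustrative model, not the YARN client API; the job-record fields are assumptions:

```python
def kill_orphaned_children(apps, launcher_tag, launch_time_ms, kill):
    """Kill child jobs left over from a previous launcher attempt.

    apps            -- stand-in for the RM's application list; each entry is a
                       dict with "id", "tags" (set of strings), "submit_time"
    launcher_tag    -- the launcher MR job id used to tag all child jobs
    launch_time_ms  -- launcher submit timestamp; narrows the search window
    kill            -- callback invoked with each orphaned application id
    """
    killed = []
    for app in apps:
        # Only jobs carrying our tag, submitted no earlier than the launcher
        # itself, can be children of a previous attempt.
        if launcher_tag in app["tags"] and app["submit_time"] >= launch_time_ms:
            kill(app["id"])
            killed.append(app["id"])
    return killed
```

In the real patch the tag lookup and kill go through the Hadoop shims (WebHCatJTShim23), since only YARN-based clusters expose tag-filtered application queries.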
[jira] [Commented] (HIVE-7190) WebHCat launcher task failure can cause two concurent user jobs to run
[ https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020435#comment-14020435 ] Ivan Mitic commented on HIVE-7190: -- Review board: https://reviews.apache.org/r/22329/ WebHCat launcher task failure can cause two concurent user jobs to run -- Key: HIVE-7190 URL: https://issues.apache.org/jira/browse/HIVE-7190 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: Ivan Mitic Attachments: HIVE-7190.2.patch, HIVE-7190.patch Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs are 1-map jobs (single-task jobs) which kick off the actual user job and monitor it until it finishes. Given that the launcher is a task like any other MR task, it has a retry policy in case it fails (due to a task crash, tasktracker/nodemanager crash, machine-level outage, etc.). Further, when the launcher task is retried, it will again launch the same user job; *however*, the previous attempt's user job is already running. What this means is that we can have two identical user jobs running in parallel. In the case of MRv2, there will be an MRAppMaster and the launcher task, both of which are subject to failure. If either of the two fails, another instance of the user job will be launched again in parallel. The above situation is already a bug. Going further to RM HA, what the RM does on failover/restart is kill all containers and restart all applications. This means that if a customer had 10 jobs on the cluster (that is, 10 launcher jobs and 10 user jobs), then on RM failover all 20 jobs will be restarted and the launcher jobs will queue the user jobs again. There are two issues with this design: 1. There are *possible* chances for corruption of job outputs (it would be useful to analyze this scenario more and confirm this statement). 2. 
Cluster resources are spent on jobs redundantly. To address the issue, at least on Yarn (Hadoop 2.0) clusters, webhcat should do the same thing Oozie does in this scenario, and that is to tag all its child jobs with an id and kill those jobs on task restart before they are kicked off again. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7190) WebHCat launcher task failure can cause two concurent user jobs to run
[ https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated HIVE-7190: - Attachment: HIVE-7190.2.patch Rebasing the patch against the latest hive trunk. WebHCat launcher task failure can cause two concurent user jobs to run -- Key: HIVE-7190 URL: https://issues.apache.org/jira/browse/HIVE-7190 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: Ivan Mitic Attachments: HIVE-7190.2.patch, HIVE-7190.patch Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs are 1-map jobs (single-task jobs) which kick off the actual user job and monitor it until it finishes. Given that the launcher is a task like any other MR task, it has a retry policy in case it fails (due to a task crash, tasktracker/nodemanager crash, machine-level outage, etc.). Further, when the launcher task is retried, it will again launch the same user job; *however*, the previous attempt's user job is already running. What this means is that we can have two identical user jobs running in parallel. In the case of MRv2, there will be an MRAppMaster and the launcher task, both of which are subject to failure. If either of the two fails, another instance of the user job will be launched again in parallel. The above situation is already a bug. Going further to RM HA, what the RM does on failover/restart is kill all containers and restart all applications. This means that if a customer had 10 jobs on the cluster (that is, 10 launcher jobs and 10 user jobs), then on RM failover all 20 jobs will be restarted and the launcher jobs will queue the user jobs again. There are two issues with this design: 1. There are *possible* chances for corruption of job outputs (it would be useful to analyze this scenario more and confirm this statement). 2. 
Cluster resources are spent on jobs redundantly. To address the issue, at least on Yarn (Hadoop 2.0) clusters, webhcat should do the same thing Oozie does in this scenario, and that is to tag all its child jobs with an id and kill those jobs on task restart before they are kicked off again. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: (was: Hive Streaming Ingest API for v4 patch.pdf) Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.13.0 Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 patch.pdf, package.html Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: Hive Streaming Ingest API for v4 patch.pdf updating 'Hive Streaming Ingest API for v4 patch.pdf' document with requirements Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.13.0 Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7138) add row index dump capability to ORC file dump
[ https://issues.apache.org/jira/browse/HIVE-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020445#comment-14020445 ] Owen O'Malley commented on HIVE-7138: - +1, but I'd like to use --rowindex instead of -rowindex add row index dump capability to ORC file dump -- Key: HIVE-7138 URL: https://issues.apache.org/jira/browse/HIVE-7138 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7138.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020474#comment-14020474 ] Hive QA commented on HIVE-7168: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648654/HIVE-7168.2.patch {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 5585 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/400/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/400/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-400/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12648654 Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7168: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7193) Hive should support additional LDAP authentication parameters
Mala Chikka Kempanna created HIVE-7193: -- Summary: Hive should support additional LDAP authentication parameters Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Currently hive has only the following authentication parameters for LDAP authentication for hiveserver2: <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> We need to include other LDAP properties as part of hive-LDAP authentication, like the ones below: a group search base - dc=domain,dc=com; a group search filter - member={0}; a user search base - dc=domain,dc=com; a user search filter - sAMAccountName={0}; a list of valid user groups - group1,group2,group3 -- This message was sent by Atlassian JIRA (v6.2#6252)
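Restated as a hive-site.xml fragment — the property names below are illustrative placeholders for the proposed settings, not final Hive configuration keys:

```xml
<!-- Hypothetical names for the additional LDAP settings proposed in HIVE-7193 -->
<property>
  <name>hive.server2.authentication.ldap.groupSearchBase</name>
  <value>dc=domain,dc=com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.groupSearchFilter</name>
  <value>member={0}</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.userSearchBase</name>
  <value>dc=domain,dc=com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.userSearchFilter</name>
  <value>sAMAccountName={0}</value>
</property>
```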
Re: Review Request 22328: Make hive use one jetty version.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22328/#review44984 --- Ship it! Ship It! - Vaibhav Gumashta On June 6, 2014, 9:54 p.m., Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22328/ --- (Updated June 6, 2014, 9:54 p.m.) Review request for hive, Eugene Koifman and Vaibhav Gumashta. Bugs: HIVE-7187 https://issues.apache.org/jira/browse/HIVE-7187 Repository: hive Description --- Make hive use one jetty version. Diffs - trunk/hcatalog/webhcat/svr/pom.xml 1600966 trunk/hwi/pom.xml 1600966 trunk/pom.xml 1600992 trunk/service/pom.xml 1600966 trunk/shims/0.20/pom.xml 1600966 trunk/shims/0.20S/pom.xml 1600966 trunk/shims/0.23/pom.xml 1600966 Diff: https://reviews.apache.org/r/22328/diff/ Testing --- Manually built and ran few tests. Thanks, Ashutosh Chauhan
[jira] [Commented] (HIVE-7187) Reconcile jetty versions in hive
[ https://issues.apache.org/jira/browse/HIVE-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020531#comment-14020531 ] Vaibhav Gumashta commented on HIVE-7187: +1 (pending tests). [~ekoifman] How about we handle the upgrade to the new jetty version in a new jira? Reconcile jetty versions in hive Key: HIVE-7187 URL: https://issues.apache.org/jira/browse/HIVE-7187 Project: Hive Issue Type: Bug Components: HiveServer2, Web UI, WebHCat Reporter: Vaibhav Gumashta Assignee: Ashutosh Chauhan Attachments: HIVE-7187.patch Hive's root pom has 3 parameters for specifying jetty dependency versions: {code} <jetty.version>6.1.26</jetty.version> <jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version> <jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version> {code} The 1st is used by HWI, the 2nd by WebHCat, and the 3rd by HiveServer2 (in http mode). We should probably use the same jetty version for all hive components. -- This message was sent by Atlassian JIRA (v6.2#6252)
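One way to reconcile the three properties is to collapse them into a single pom property consumed by all components; the version picked below is illustrative, not necessarily what the patch settles on:

```xml
<!-- Sketch: one jetty version shared by HWI, WebHCat and HiveServer2 -->
<jetty.version>7.6.0.v20120127</jetty.version>
```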
[jira] [Updated] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6394: Attachment: HIVE-6394.5.patch Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.5.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6394: Attachment: (was: HIVE-6394.5.patch) Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6394: Attachment: HIVE-6394.6.patch Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020564#comment-14020564 ] Szehon Ho commented on HIVE-6394: - Attaching another patch. Was using a parquet-example class, now explicitly adding that logic in the serde layer. Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.5.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22174: HIVE-6394 Implement Timestmap in ParquetSerde
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22174/ --- (Updated June 7, 2014, 12:06 a.m.) Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang. Changes --- One more change, adding the 'NanoTime' class in Hive, as it was an example class in parquet. Let's go with using un-annotated INT96 for parquet, that's what other consuming applications have been doing. When the annotation does come, we'll move to that. Bugs: HIVE-6394 https://issues.apache.org/jira/browse/HIVE-6394 Repository: hive-git Description --- This uses the Jodd library to convert java.sql.Timestamp type used by Hive into the {julian-day:nanos} format expected by parquet, and vice-versa. Diffs (updated) - data/files/parquet_types.txt 0be390b pom.xml 4bb8880 ql/pom.xml 13c477a ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 4da0d30 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 29f7e11 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 57161d8 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java fb2f5a8 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 3490061 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java PRE-CREATION ql/src/test/queries/clientpositive/parquet_types.q 5d6333c ql/src/test/results/clientpositive/parquet_types.q.out c23f7f1 Diff: https://reviews.apache.org/r/22174/diff/ Testing --- Unit tests the new libraries, and also added timestamp data in the parquet_types q-test. Thanks, Szehon Ho
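The {julian-day:nanos} conversion described above — done in Hive's Java NanoTimeUtils for Parquet's INT96 timestamps — can be modeled compactly. This is an illustrative Python sketch assuming tz-aware UTC datetimes, not Hive's actual code; 2440588 is the Julian Day Number conventionally paired with the 1970-01-01 epoch in the Parquet INT96 layout:

```python
from datetime import datetime, timezone

JULIAN_EPOCH_DAY = 2440588       # Julian Day Number of 1970-01-01 (UTC)
SECONDS_PER_DAY = 86400

def to_nanotime(ts):
    """tz-aware UTC datetime -> (julian_day, nanos_of_day)."""
    epoch_seconds = int(ts.replace(microsecond=0).timestamp())
    days, secs_in_day = divmod(epoch_seconds, SECONDS_PER_DAY)
    nanos_of_day = secs_in_day * 10**9 + ts.microsecond * 1000
    return JULIAN_EPOCH_DAY + days, nanos_of_day

def from_nanotime(julian_day, nanos_of_day):
    """(julian_day, nanos_of_day) -> tz-aware UTC datetime (microsecond precision)."""
    secs, rem_nanos = divmod(nanos_of_day, 10**9)
    epoch_seconds = (julian_day - JULIAN_EPOCH_DAY) * SECONDS_PER_DAY + secs
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).replace(
        microsecond=rem_nanos // 1000)
```

The Java implementation works from java.sql.Timestamp and keeps full nanosecond precision; this sketch rounds to microseconds, which is all Python's datetime carries.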
[jira] [Updated] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-7094: - Component/s: HCatalog Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7094.1.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
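The refactoring plan above is the template-method pattern: the abstract base owns the common write() path and defers the writer choice to subclasses. A minimal sketch of the shape (method and field names here are hypothetical, not the actual HCatalog API):

```python
from abc import ABC, abstractmethod

class FileRecordWriterContainer(ABC):
    """Common write() path; subclasses decide how the local writer is chosen."""

    def write(self, record):
        # Template method: the dynamic/static split lives in writer_for().
        self.writer_for(record).write(record)

    @abstractmethod
    def writer_for(self, record):
        ...

class StaticFileRecordWriterContainer(FileRecordWriterContainer):
    """Static partitioning: one writer, fixed up front."""
    def __init__(self, writer):
        self._writer = writer

    def writer_for(self, record):
        return self._writer

class DynamicFileRecordWriterContainer(FileRecordWriterContainer):
    """Dynamic partitioning: lazily create one writer per partition value."""
    def __init__(self, writer_factory):
        self._factory = writer_factory
        self._writers = {}

    def writer_for(self, record):
        part = record["partition"]
        if part not in self._writers:
            self._writers[part] = self._factory(part)
        return self._writers[part]
```

This removes every `if (dynamicPartitioning)` branch from the shared code path, which is exactly the goal stated in the ticket.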
Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurent user jobs to run
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/#review44992 --- 1. I think webhcat-default.xml should be modified to include the jars that are now required in templeton.libjars to minimize out-of-the-box config for end users. 2. Is there any test (e2e) that can be added for this? (with reasonable amount of effort) 3. When you tested that Pig/Hive jobs get properly tagged, you mean you tested that MR jobs that are generated by Pig/Hive are tagged, correct? hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java https://reviews.apache.org/r/22329/#comment79625 I think it would be useful to add a more detailed description of these props. Something like what is in the JIRA ticket. I would have added the ticket number to the comment, but Hive prohibits that. hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java https://reviews.apache.org/r/22329/#comment79632 Which user will this use? Is it the user running WebHCat or the value of 'doAs' parameter? shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java https://reviews.apache.org/r/22329/#comment79613 Is LOG.info() the right log level? Seems like it will pollute the log file. shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java https://reviews.apache.org/r/22329/#comment79615 Is LOG.info() the right level? shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java https://reviews.apache.org/r/22329/#comment79631 log level - Eugene Koifman On June 6, 2014, 10:02 p.m., Ivan Mitic wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/ --- (Updated June 6, 2014, 10:02 p.m.) Review request for hive. Repository: hive-git Description --- Approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. 
On launcher task restart, launcher queries RM for the list of jobs that have the tag and kills them. After that it moves on to start the same child job again. Again, similarly to what Oozie does, a new templeton.job.launch.time property is introduced that captures the launcher job submit timestamp and later used to reduce the search window when RM is queried. To validate the patch, you will need to add webhcat shim jars to templeton.libjars as now webhcat launcher also has a dependency on hadoop shims. I have noticed that in case of the SqoopDelegator webhcat currently does not set the MR delegation token when optionsFile flag is used. This also creates the problem in this scenario. This looks like something that should be handled via a separate Jira. Diffs - hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java 23b1c4f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java 41b1dc5 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java 04a5c6f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java 04e061d hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java adcd917 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java a6355a6 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java 556ee62 shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java d3552c1 shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 5a728b2 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 299e918 Diff: https://reviews.apache.org/r/22329/diff/ Testing --- I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure. Thanks, Ivan Mitic
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6473: --- Fix Version/s: 0.14.0 Allow writing HFiles via HBaseStorageHandler table -- Key: HIVE-6473 URL: https://issues.apache.org/jira/browse/HIVE-6473 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6473: --- Attachment: HIVE-6473.6.patch Rebased onto trunk again. Removed enabling of hbase_bulk.m; it mostly passes but is flakey for me. Will address it in a follow-on ticket. Allow writing HFiles via HBaseStorageHandler table -- Key: HIVE-6473 URL: https://issues.apache.org/jira/browse/HIVE-6473 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7155) WebHCat controller job exceeds container memory limit
[ https://issues.apache.org/jira/browse/HIVE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020636#comment-14020636 ] Eugene Koifman commented on HIVE-7155: -- +1 WebHCat controller job exceeds container memory limit - Key: HIVE-7155 URL: https://issues.apache.org/jira/browse/HIVE-7155 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: shanyu zhao Assignee: shanyu zhao Attachments: HIVE-7155.1.patch, HIVE-7155.patch Submitting a Hive query on a large table via WebHCat results in failure because the WebHCat controller job is killed by Yarn when it exceeds the memory limit (set by mapreduce.map.memory.mb, which defaults to 1GB): {code} INSERT OVERWRITE TABLE Temp_InjusticeEvents_2014_03_01_00_00 SELECT * from Stage_InjusticeEvents where LogTimestamp > '2014-03-01 00:00:00' and LogTimestamp <= '2014-03-01 01:00:00'; {code} We could increase mapreduce.map.memory.mb to solve this problem, but that changes the setting system-wide. We need to provide a WebHCat configuration to override mapreduce.map.memory.mb when submitting the controller job. -- This message was sent by Atlassian JIRA (v6.2#6252)
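The proposed fix — a WebHCat-level memory override for the controller job — amounts to a config lookup with fallback. A sketch; the property name templeton.mapper.memory.mb is an assumption for illustration, not necessarily the key the patch introduces:

```python
def controller_map_memory_mb(conf):
    """Resolve map-task memory for the WebHCat controller job.

    Prefer the WebHCat-specific override (hypothetical key) when present;
    otherwise fall back to the cluster-wide MapReduce default of 1024 MB.
    """
    return conf.get("templeton.mapper.memory.mb") or \
           conf.get("mapreduce.map.memory.mb", "1024")
```

The point of the extra key is scoping: only the small controller/launcher map task gets the lower (or different) limit, while user jobs keep the cluster-wide mapreduce.map.memory.mb.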
[jira] [Updated] (HIVE-2365) SQL support for bulk load into HBase
[ https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-2365: --- Attachment: HIVE-2365.3.patch Rebased onto HIVE-6473 patch v6. SQL support for bulk load into HBase Key: HIVE-2365 URL: https://issues.apache.org/jira/browse/HIVE-2365 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: John Sichi Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-2365.2.patch.txt, HIVE-2365.3.patch, HIVE-2365.3.patch, HIVE-2365.WIP.00.patch, HIVE-2365.WIP.01.patch, HIVE-2365.WIP.01.patch Support SQL as simple as this for bulk load from Hive into HBase. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurent user jobs to run
On June 7, 2014, 1:05 a.m., Eugene Koifman wrote: 1. I think webhcat-default.xml should be modified to include the jars that are now required in templeton.libjars to minimize out-of-the-box config for end users. 2. Is there any test (e2e) that can be added for this? (with reasonable amount of effort) 3. When you tested that Pig/Hive jobs get properly tagged, you mean you tested that MR jobs that are generated by Pig/Hive are tagged, correct? 4. Actually, instead of doing 1, could WebHCat dynamically figure out which hadoop version it's talking to and add only the necessary shim jar, rather than shipping all of them? It reduces the amount of config needed. It would also be better if we can only ship the minimal set of jars. - Eugene --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/#review44992 --- On June 6, 2014, 10:02 p.m., Ivan Mitic wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/ --- (Updated June 6, 2014, 10:02 p.m.) Review request for hive. Repository: hive-git Description --- Approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. On launcher task restart, launcher queries RM for the list of jobs that have the tag and kills them. After that it moves on to start the same child job again. Again, similarly to what Oozie does, a new templeton.job.launch.time property is introduced that captures the launcher job submit timestamp and later used to reduce the search window when RM is queried. To validate the patch, you will need to add webhcat shim jars to templeton.libjars as now webhcat launcher also has a dependency on hadoop shims. I have noticed that in case of the SqoopDelegator webhcat currently does not set the MR delegation token when optionsFile flag is used. This also creates the problem in this scenario. 
This looks like something that should be handled via a separate JIRA.

Diffs
---
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java 23b1c4f
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java 41b1dc5
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java 04a5c6f
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java 04e061d
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java adcd917
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java a6355a6
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java 556ee62
shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java d3552c1
shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 5a728b2
shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 299e918

Diff: https://reviews.apache.org/r/22329/diff/

Testing
---
I have validated that MR, Pig, and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure.

Thanks,
Ivan Mitic
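The tag-and-kill recovery described in the patch can be sketched as follows. This is a self-contained simulation with hypothetical types (JobRecord and both method names are made up; the real patch tags child jobs through the MR job configuration and queries the ResourceManager): on restart, the launcher selects jobs carrying its tag that were submitted no earlier than templeton.job.launch.time, and kills them before resubmitting the child job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Minimal simulation of WebHCat's orphan-child cleanup on launcher restart
// (hypothetical types; not the actual WebHCat or YARN client API).
public class TagKillSketch {
    // Stand-in for an application record as the RM would report it.
    public static class JobRecord {
        public final String id;
        public final Set<String> tags;
        public final long submitTimeMs;
        public boolean killed = false;
        public JobRecord(String id, Set<String> tags, long submitTimeMs) {
            this.id = id;
            this.tags = tags;
            this.submitTimeMs = submitTimeMs;
        }
    }

    // Select child jobs tagged with the launcher's id and submitted at or
    // after the launcher itself; templeton.job.launch.time narrows the window
    // so unrelated older jobs are never considered.
    public static List<JobRecord> findOrphans(List<JobRecord> all,
                                              String launcherTag,
                                              long launchTimeMs) {
        List<JobRecord> orphans = new ArrayList<>();
        for (JobRecord j : all) {
            if (j.tags.contains(launcherTag) && j.submitTimeMs >= launchTimeMs) {
                orphans.add(j);
            }
        }
        return orphans;
    }

    // Kill every orphan before the restarted launcher resubmits the child
    // job, so two copies of the same user job never run concurrently.
    public static int killOrphans(List<JobRecord> orphans) {
        int killed = 0;
        for (JobRecord j : orphans) {
            if (!j.killed) {
                j.killed = true;
                killed++;
            }
        }
        return killed;
    }
}
```

The time filter is the reason for the new templeton.job.launch.time property: without it, every tagged job the RM has ever seen would have to be scanned.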
[jira] [Commented] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020642#comment-14020642 ]

Dr. Wendell Urth commented on HIVE-7175:
---
Hi [~hiveqa], none of the failed tests appear related to the small, additive change to BeeLine made here. These tests appear to be failing on trunk generally and are not caused by this patch. Let me know if I am wrong.

Provide password file option to beeline
---
Key: HIVE-7175
URL: https://issues.apache.org/jira/browse/HIVE-7175
Project: Hive
Issue Type: Improvement
Components: CLI, Clients
Affects Versions: 0.13.0
Reporter: Robert Justice
Labels: features, security
Attachments: HIVE-7175.patch

For people connecting to HiveServer2 with LDAP authentication enabled, batch-running commands currently requires providing the password openly on the command line. They could use some expect scripting, but I think a valid improvement would be to provide a password file option, similar to other Hadoop-ecosystem CLI commands (e.g. Sqoop), to be more secure.

-- This message was sent by Atlassian JIRA (v6.2#6252)
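The core of such a password-file option is just reading the secret from a file and stripping the trailing newline that most editors append. A minimal sketch under that assumption (the helper class and method names are hypothetical, not the actual BeeLine patch):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating a password-file option: read the password
// from a file instead of taking it openly on the command line.
public class PasswordFileSketch {
    // Strip one trailing newline (and optional carriage return) that editors
    // typically append; interior whitespace is preserved as-is.
    public static String stripTrailingNewline(String raw) {
        String s = raw;
        if (s.endsWith("\n")) s = s.substring(0, s.length() - 1);
        if (s.endsWith("\r")) s = s.substring(0, s.length() - 1);
        return s;
    }

    public static String readPassword(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return stripTrailingNewline(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

For this to be an actual security improvement, the file itself needs restrictive permissions (e.g. chmod 400/600), which is how Sqoop's analogous option is documented to be used.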
[jira] [Commented] (HIVE-7191) optimized map join hash table has a bug when it reaches 2Gb
[ https://issues.apache.org/jira/browse/HIVE-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020660#comment-14020660 ]

Hive QA commented on HIVE-7191:
---
{color:red}Overall{color}: -1, at least one test failed

Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648727/HIVE-7191.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5510 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/401/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/401/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-401/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12648727

optimized map join hash table has a bug when it reaches 2Gb
---
Key: HIVE-7191
URL: https://issues.apache.org/jira/browse/HIVE-7191
Project: Hive
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HIVE-7191.patch

Via [~t3rmin4t0r]:
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: -204
    at java.util.ArrayList.elementData(ArrayList.java:371)
    at java.util.ArrayList.get(ArrayList.java:384)
    at org.apache.hadoop.hive.serde2.WriteBuffers.setReadPoint(WriteBuffers.java:95)
    at org.apache.hadoop.hive.serde2.WriteBuffers.hashCode(WriteBuffers.java:100)
    at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:203)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:266)
    at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:124)
    ... 16 more
{noformat}
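The negative array index in the trace above is the classic signature of 32-bit overflow: once the hash table's write buffers grow past 2GB, a byte offset forced through an int wraps negative, and any buffer index derived from it goes negative too. A minimal illustration of the failure mode and the usual fix, with made-up constants (this is not the actual WriteBuffers code):

```java
// Illustrates the 2GB overflow class of bug: a byte offset past
// Integer.MAX_VALUE wraps negative when narrowed to int, so the derived
// buffer index is negative as well. (Made-up constants; not the actual
// org.apache.hadoop.hive.serde2.WriteBuffers implementation.)
public class OverflowSketch {
    static final int WB_SIZE = 1 << 20; // assume 1MB per write buffer

    // Buggy variant: narrows the offset to int before computing the index.
    public static int bufferIndexBuggy(long offset) {
        int narrowed = (int) offset; // wraps negative for offsets >= 2^31
        return narrowed / WB_SIZE;
    }

    // Fixed variant: keeps the arithmetic in long and narrows only the
    // final (small) index.
    public static int bufferIndexFixed(long offset) {
        return (int) (offset / WB_SIZE);
    }
}
```

With these assumptions, an offset of 2GB + 5MB should land in buffer 2053; the buggy variant instead produces a negative index, which is exactly what ArrayList.get then rejects with ArrayIndexOutOfBoundsException.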
[jira] [Commented] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation
[ https://issues.apache.org/jira/browse/HIVE-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020695#comment-14020695 ]

Hive QA commented on HIVE-7192:
---
{color:red}Overall{color}: -1, at least one test failed

Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648732/HIVE-7192.patch

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 5585 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_part
org.apache.hadoop.hive.metastore.TestMetastoreVersion.testDefaults
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/402/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/402/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-402/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648732

Hive Streaming - Some required settings are not mentioned in the documentation
---
Key: HIVE-7192
URL: https://issues.apache.org/jira/browse/HIVE-7192
Project: Hive
Issue Type: Bug
Components: HCatalog
Affects Versions: 0.13.0
Reporter: Roshan Naik
Assignee: Roshan Naik
Labels: Streaming
Attachments: HIVE-7192.patch

Specifically:
- hive.support.concurrency on the metastore
- hive.vectorized.execution.enabled for the query client
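The two settings called out above would be expressed in hive-site.xml roughly as follows. This is a sketch of the shape only: the values (true for concurrency on the metastore side, false for vectorization on the query client) are my reading of what the issue implies for Hive 0.13 streaming, not values stated in it, and a full streaming setup involves further transaction-related settings the issue does not list.

```xml
<!-- Metastore side: concurrency support is required for the lock/transaction
     machinery that Hive Streaming ingestion relies on (assumed value). -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>

<!-- Query client side: vectorized reads of streaming (ACID) tables were not
     supported at the time, so this is disabled when querying (assumed value). -->
<property>
  <name>hive.vectorized.execution.enabled</name>
  <value>false</value>
</property>
```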
[jira] [Commented] (HIVE-7117) Partitions not inheriting table permissions after alter rename partition
[ https://issues.apache.org/jira/browse/HIVE-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020720#comment-14020720 ]

Ashish Kumar Singh commented on HIVE-7117:
---
Thanks [~szehon], [~xuefuz] and [~swarnim] for reviewing.

Partitions not inheriting table permissions after alter rename partition
---
Key: HIVE-7117
URL: https://issues.apache.org/jira/browse/HIVE-7117
Project: Hive
Issue Type: Bug
Components: Security
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
Fix For: 0.14.0
Attachments: HIVE-7117.2.patch, HIVE-7117.3.patch, HIVE-7117.4.patch, HIVE-7117.5.patch, HIVE-7117.6.patch, HIVE-7117.7.patch, HIVE-7117.8.patch, HIVE-7117.patch

On altering/renaming a partition, the partition should inherit the permissions of the parent directory if the flag hive.warehouse.subdir.inherit.perms is set.
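The scenario being fixed can be reproduced with a rename like the following (the table and partition names are hypothetical, chosen only to illustrate the operation the issue describes):

```sql
-- Assumes hive.warehouse.subdir.inherit.perms=true in the Hive configuration.
-- Before HIVE-7117, the renamed partition's new directory did not pick up
-- the permissions of its parent (the table) directory.
ALTER TABLE sales PARTITION (ds='2014-06-01')
  RENAME TO PARTITION (ds='2014-06-02');
```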