[jira] [Updated] (HIVE-5538) Turn on vectorization by default.

2014-04-17 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5538:
---

Status: Open  (was: Patch Available)

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5538.1.patch


   Vectorization should be turned on by default, so that users don't have to 
 specifically enable vectorization. 
   Vectorization code validates and ensures that a query falls back to row 
 mode if it is not supported on the vectorized code path. 
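For reference, the behavior being defaulted here is controlled by a single session-level flag, so the change amounts to flipping its default value; a sketch of the toggle, assuming the property name documented for current Hive releases:

```sql
-- enable (or, once this patch lands, disable) vectorized execution per session;
-- property name assumed from the Hive configuration documentation
set hive.vectorized.execution.enabled=true;
```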



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5538) Turn on vectorization by default.

2014-04-17 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5538:
---

Attachment: HIVE-5538.2.patch

Good point! Uploading the patch, rebased against the latest trunk.

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch


   Vectorization should be turned on by default, so that users don't have to 
 specifically enable vectorization. 
   Vectorization code validates and ensures that a query falls back to row 
 mode if it is not supported on the vectorized code path. 





[jira] [Updated] (HIVE-5538) Turn on vectorization by default.

2014-04-17 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5538:
---

Status: Patch Available  (was: Open)

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch


   Vectorization should be turned on by default, so that users don't have to 
 specifically enable vectorization. 
   Vectorization code validates and ensures that a query falls back to row 
 mode if it is not supported on the vectorized code path. 





[jira] [Commented] (HIVE-6318) Document SSL support added to HiveServer2

2014-04-17 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972361#comment-13972361
 ] 

Vaibhav Gumashta commented on HIVE-6318:


[~leftylev] Sorry for the late reply. After the missing-docs list you shared in 
the hive 13 release thread, I gave the description to [~rhbutani], who has 
submitted those as a cumulative patch. Thanks a lot for the nudge!

 Document SSL support added to HiveServer2
 -

 Key: HIVE-6318
 URL: https://issues.apache.org/jira/browse/HIVE-6318
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2, JDBC
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.13.0


 SSL support is/will be added to HiveServer2 running in both binary and http 
 mode, in unsecured auth modes. Need to document the usage and setup.
 Linking relevant jiras.





[jira] [Commented] (HIVE-6466) Add support for pluggable authentication modules (PAM) in Hive

2014-04-17 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972362#comment-13972362
 ] 

Vaibhav Gumashta commented on HIVE-6466:


[~leftylev] Thanks a lot for the edits!

 Add support for pluggable authentication modules (PAM) in Hive
 --

 Key: HIVE-6466
 URL: https://issues.apache.org/jira/browse/HIVE-6466
 Project: Hive
  Issue Type: New Feature
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.13.0

 Attachments: HIVE-6466.1.patch, HIVE-6466.2.patch


 More on PAM in these articles:
 http://www.tuxradar.com/content/how-pam-works
 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Managing_Smart_Cards/Pluggable_Authentication_Modules.html
 Usage from JPAM api: http://jpam.sourceforge.net/JPamUserGuide.html#id.s7.1





[jira] [Commented] (HIVE-6468) HS2 out of memory error when curl sends a get request

2014-04-17 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972363#comment-13972363
 ] 

Vaibhav Gumashta commented on HIVE-6468:


Thanks a lot for the edits & corrections [~leftylev]! The doc looks good.

 HS2 out of memory error when curl sends a get request
 -

 Key: HIVE-6468
 URL: https://issues.apache.org/jira/browse/HIVE-6468
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
 Environment: Centos 6.3, hive 12, hadoop-2.2
Reporter: Abin Shahab
Assignee: Navis
 Attachments: HIVE-6468.1.patch.txt


 We see an out of memory error when we run simple beeline calls.
 (The hive.server2.transport.mode is binary)
 curl localhost:1
 Exception in thread "pool-2-thread-8" java.lang.OutOfMemoryError: Java heap 
 space
   at 
 org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181)
   at 
 org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
   at 
 org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
   at 
 org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
   at 
 org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
   at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
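The stack trace is consistent with a known failure mode: bytes from a non-Thrift client (here, a plain HTTP GET) get read as a length prefix, so the server tries to allocate a buffer of roughly a gigabyte. A minimal sketch of the arithmetic (the exact SASL framing differs slightly; the four-byte offset here is illustrative):

```python
import struct

# The first four bytes of an HTTP request, read as a big-endian
# 32-bit length prefix, look like a request for a ~1.1 GB buffer.
bogus_len = struct.unpack(">i", b"GET ")[0]
print(bogus_len)  # 1195725856
```

Allocating that buffer up front is what blows the default heap.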





[jira] [Updated] (HIVE-6427) Hive Server2 should reopen Metastore client in case of any Thrift exceptions

2014-04-17 Thread Andrey Stepachev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Stepachev updated HIVE-6427:
---

Attachment: (was: HIVE-6427-2.patch)

 Hive Server2 should reopen Metastore client in case of any Thrift exceptions
 

 Key: HIVE-6427
 URL: https://issues.apache.org/jira/browse/HIVE-6427
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Andrey Stepachev
Assignee: Andrey Stepachev
Priority: Critical
 Attachments: HIVE-6427.patch


 In case of a metastore restart, the Hive server doesn't reopen its connection 
 to the metastore. Any command gives broken-pipe or similar exceptions.
 http://paste.ubuntu.com/6926215/
 Any subsequent command doesn't reestablish the connection and tries to use the 
 stale (closed) connection.
 It looks like we shouldn't blindly convert every MetaException to 
 HiveSQLException, but should distinguish between fatal exceptions and logical 
 exceptions.
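Hive's actual mechanism for this lives in RetryingMetaStoreClient (a reflective proxy); a language-agnostic sketch of the reconnect-and-retry idea, with hypothetical names that are not Hive's API:

```python
# Hypothetical sketch of reconnect-on-failure. FlakyClient stands in for a
# Thrift client whose first call after a metastore restart fails.
class FlakyClient:
    def __init__(self):
        self.healthy = False
    def reconnect(self):
        self.healthy = True
    def get_table(self, name):
        if not self.healthy:
            raise ConnectionError("broken pipe")
        return f"table:{name}"

def call_with_retry(client, method, *args, retries=1):
    for attempt in range(retries + 1):
        try:
            return getattr(client, method)(*args)
        except ConnectionError:
            if attempt == retries:
                raise               # fatal: propagate instead of swallowing
            client.reconnect()      # reopen the stale connection, then retry

client = FlakyClient()
print(call_with_retry(client, "get_table", "t1"))  # table:t1
```

The point of the last branch is the one made above: a transport failure should trigger a reconnect, not be converted into a generic logical exception.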





[jira] [Updated] (HIVE-6427) Hive Server2 should reopen Metastore client in case of any Thrift exceptions

2014-04-17 Thread Andrey Stepachev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Stepachev updated HIVE-6427:
---

Attachment: HIVE-6427.patch

 Hive Server2 should reopen Metastore client in case of any Thrift exceptions
 

 Key: HIVE-6427
 URL: https://issues.apache.org/jira/browse/HIVE-6427
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
 Environment: cloudera cdh5 beta2
Reporter: Andrey Stepachev
Assignee: Andrey Stepachev
Priority: Critical
 Attachments: HIVE-6427.patch


 In case of a metastore restart, the Hive server doesn't reopen its connection 
 to the metastore. Any command gives broken-pipe or similar exceptions.
 http://paste.ubuntu.com/6926215/
 Any subsequent command doesn't reestablish the connection and tries to use the 
 stale (closed) connection.
 It looks like we shouldn't blindly convert every MetaException to 
 HiveSQLException, but should distinguish between fatal exceptions and logical 
 exceptions.





[jira] [Updated] (HIVE-6427) Hive Server2 should reopen Metastore client in case of any Thrift exceptions

2014-04-17 Thread Andrey Stepachev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Stepachev updated HIVE-6427:
---

Environment: (was: cloudera cdh5 beta2)

 Hive Server2 should reopen Metastore client in case of any Thrift exceptions
 

 Key: HIVE-6427
 URL: https://issues.apache.org/jira/browse/HIVE-6427
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Andrey Stepachev
Assignee: Andrey Stepachev
Priority: Critical
 Attachments: HIVE-6427.patch


 In case of a metastore restart, the Hive server doesn't reopen its connection 
 to the metastore. Any command gives broken-pipe or similar exceptions.
 http://paste.ubuntu.com/6926215/
 Any subsequent command doesn't reestablish the connection and tries to use the 
 stale (closed) connection.
 It looks like we shouldn't blindly convert every MetaException to 
 HiveSQLException, but should distinguish between fatal exceptions and logical 
 exceptions.





Review Request 20444: Hive Server2 should reopen Metastore client connection in case of any Thrift exceptions

2014-04-17 Thread Andrey Stepachev

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20444/
---

Review request for hive.


Bugs: HIVE-6427
https://issues.apache.org/jira/browse/HIVE-6427


Repository: hive-git


Description
---

Connection to metastore should be reestablished. TExceptions should not be 
swallowed.


Diffs
-

  
hcatalog/core/src/test/java/org/apache/hive/hcatalog/cli/TestSemanticAnalysis.java
 3cc548e 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatClientHMSImpl.java
 c4b5971 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 
664dccd 
  
metastore/src/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java
 5410b45 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
a9d5902 

Diff: https://reviews.apache.org/r/20444/diff/


Testing
---

Used in our production for more than 1 month.


Thanks,

Andrey Stepachev



[jira] [Commented] (HIVE-6427) Hive Server2 should reopen Metastore client in case of any Thrift exceptions

2014-04-17 Thread Andrey Stepachev (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972527#comment-13972527
 ] 

Andrey Stepachev commented on HIVE-6427:


https://reviews.apache.org/r/20444/

 Hive Server2 should reopen Metastore client in case of any Thrift exceptions
 

 Key: HIVE-6427
 URL: https://issues.apache.org/jira/browse/HIVE-6427
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Andrey Stepachev
Assignee: Andrey Stepachev
Priority: Critical
 Attachments: HIVE-6427.patch


 In case of a metastore restart, the Hive server doesn't reopen its connection 
 to the metastore. Any command gives broken-pipe or similar exceptions.
 http://paste.ubuntu.com/6926215/
 Any subsequent command doesn't reestablish the connection and tries to use the 
 stale (closed) connection.
 It looks like we shouldn't blindly convert every MetaException to 
 HiveSQLException, but should distinguish between fatal exceptions and logical 
 exceptions.





[jira] [Commented] (HIVE-6920) Parquet Serde Simplification

2014-04-17 Thread Justin Coffey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972790#comment-13972790
 ] 

Justin Coffey commented on HIVE-6920:
-

cc: [~brocknoland] [~xuefuz]

 Parquet Serde Simplification
 

 Key: HIVE-6920
 URL: https://issues.apache.org/jira/browse/HIVE-6920
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-6920.patch


 Various fixes and code simplification in the ParquetHiveSerde (with minor 
 optimizations)





[jira] [Updated] (HIVE-538) make hive_jdbc.jar self-containing

2014-04-17 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated HIVE-538:


Attachment: HIVE-538.patch

I've attached a patch that builds a self-containing jar -

 make hive_jdbc.jar self-containing
 --

 Key: HIVE-538
 URL: https://issues.apache.org/jira/browse/HIVE-538
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.3.0, 0.4.0, 0.6.0, 0.13.0
Reporter: Raghotham Murthy
Assignee: Nick White
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.2.patch, HIVE-538.patch


 Currently, most jars in hive/build/dist/lib and the hadoop-*-core.jar are 
 required in the classpath to run jdbc applications on hive. We need to do 
 at least the following to get rid of most unnecessary dependencies:
 1. get rid of dynamic serde and use a standard serialization format, maybe 
 tab separated, json or avro
 2. don't use hadoop configuration parameters
 3. repackage thrift and fb303 classes into hive_jdbc.jar
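For step 3, one present-day option (an assumption on my part, not necessarily what the attached patch does) is the Maven Shade plugin, which can both bundle dependencies into a single jar and relocate the Thrift/fb303 packages to avoid classpath clashes; a sketch:

{code:xml}<!-- sketch: bundle dependencies into the JDBC jar and relocate Thrift
     packages; the shadedPattern below is illustrative -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.thrift</pattern>
            <shadedPattern>org.apache.hive.shaded.thrift</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>{code}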





[jira] [Updated] (HIVE-6923) Use slf4j For Logging Everywhere

2014-04-17 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated HIVE-6923:
-

Assignee: Nick White
  Status: Patch Available  (was: Open)

 Use slf4j For Logging Everywhere
 

 Key: HIVE-6923
 URL: https://issues.apache.org/jira/browse/HIVE-6923
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Nick White
Assignee: Nick White
 Fix For: 0.13.0

 Attachments: HIVE-6923.patch


 Hive uses a mixture of slf4j (backed by log4j) and commons-logging. I've 
 attached a patch to tidy this up, by just using slf4j for all loggers. This 
 means that applications using the JDBC driver can make Hive log through their 
 own slf4j implementation consistently.





[jira] [Updated] (HIVE-6923) Use slf4j For Logging Everywhere

2014-04-17 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated HIVE-6923:
-

Attachment: HIVE-6923.patch

 Use slf4j For Logging Everywhere
 

 Key: HIVE-6923
 URL: https://issues.apache.org/jira/browse/HIVE-6923
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Nick White
 Fix For: 0.13.0

 Attachments: HIVE-6923.patch


 Hive uses a mixture of slf4j (backed by log4j) and commons-logging. I've 
 attached a patch to tidy this up, by just using slf4j for all loggers. This 
 means that applications using the JDBC driver can make Hive log through their 
 own slf4j implementation consistently.





[jira] [Created] (HIVE-6923) Use slf4j For Logging Everywhere

2014-04-17 Thread Nick White (JIRA)
Nick White created HIVE-6923:


 Summary: Use slf4j For Logging Everywhere
 Key: HIVE-6923
 URL: https://issues.apache.org/jira/browse/HIVE-6923
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Nick White
 Fix For: 0.13.0
 Attachments: HIVE-6923.patch

Hive uses a mixture of slf4j (backed by log4j) and commons-logging. I've 
attached a patch to tidy this up, by just using slf4j for all loggers. This 
means that applications using the JDBC driver can make Hive log through their 
own slf4j implementation consistently.





[jira] [Updated] (HIVE-6912) HWI not working - HTTP ERROR 500

2014-04-17 Thread sunil ranjan khuntia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunil ranjan khuntia updated HIVE-6912:
---

Priority: Critical  (was: Major)

 HWI not working - HTTP ERROR 500
 

 Key: HIVE-6912
 URL: https://issues.apache.org/jira/browse/HIVE-6912
 Project: Hive
  Issue Type: Bug
Reporter: sunil ranjan khuntia
Priority: Critical

 I tried to use Hive HWI to write hive queries in a UI.
 As per the steps mentioned here 
 https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface
 I set up Ant and ran the hive hwi service,
 but in the browser when I hit http://localhost:/hwi I got the error below:
 HTTP ERROR 500
 Problem accessing /hwi/. Reason:
 Unable to find a javac compiler;
 com.sun.tools.javac.Main is not on the classpath.
 Perhaps JAVA_HOME does not point to the JDK.
 It is currently set to /usr/java/jdk1.6.0_32/jre
 Caused by:
 Unable to find a javac compiler;
 com.sun.tools.javac.Main is not on the classpath.
 Perhaps JAVA_HOME does not point to the JDK.
 It is currently set to /usr/java/jdk1.6.0_32/jre
   at 
 org.apache.tools.ant.taskdefs.compilers.CompilerAdapterFactory.getCompiler(CompilerAdapterFactory.java:129)
 I have checked and changed JAVA_HOME, but it's still the same.
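The error message itself points at the likely fix: JAVA_HOME ends in /jre, and a JRE ships no javac. A tiny sketch of the path correction (the jdk1.6.0_32 path is taken from the error above):

```python
import os.path

java_home = "/usr/java/jdk1.6.0_32/jre"   # value from the error message
if os.path.basename(java_home) == "jre":
    # javac lives under the JDK root, not under its bundled JRE
    java_home = os.path.dirname(java_home)
print(java_home)  # /usr/java/jdk1.6.0_32
```

After pointing JAVA_HOME at the JDK root, com.sun.tools.javac.Main should be resolvable.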





[jira] [Updated] (HIVE-6903) Change default value of hive.metastore.execute.setugi to true

2014-04-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6903:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Thejas for review!

 Change default value of hive.metastore.execute.setugi to true
 -

 Key: HIVE-6903
 URL: https://issues.apache.org/jira/browse/HIVE-6903
 Project: Hive
  Issue Type: Task
  Components: Metastore
Affects Versions: 0.10.0, 0.11.0, 0.12.0, 0.13.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.14.0

 Attachments: HIVE-6903.1.patch, HIVE-6903.patch


 Since its introduction in HIVE-2616 I haven't seen any bug reported for it, 
 only grief from users who expect the system to work as if this were true by 
 default. 





[jira] [Commented] (HIVE-6913) Hive unable to find the hashtable file during complex multi-staged map join

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973072#comment-13973072
 ] 

Ashutosh Chauhan commented on HIVE-6913:


+1

 Hive unable to find the hashtable file during complex multi-staged map join
 ---

 Key: HIVE-6913
 URL: https://issues.apache.org/jira/browse/HIVE-6913
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-6913.patch, HIVE-6913.patch


 If a query has multiple mapjoins and one of the tables to be mapjoined is 
 empty, the query can fail with a no such file or directory error when looking 
 for the hashtable.
 This is because when we generate a dummy hash table, we do not close the 
 TableScan (TS) operator for that table. Additionally, HashTableSinkOperator 
 (HTSO) outputs its hash tables in the closeOp method. However, when close is 
 called on HTSO we check to ensure that all parents are closed: 
 https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java#L333
 which is not true in this case, because the TS operator for the empty table 
 was never closed.
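The interaction can be modeled in a few lines; a toy sketch (not Hive code) of the all-parents-closed guard and why the never-closed TableScan starves it:

```python
# Miniature model of operator close propagation; names are illustrative.
class Op:
    def __init__(self, parents=()):
        self.parents = list(parents)
        self.closed = False
    def close(self):
        # refuse to run closeOp() until every parent has closed,
        # mirroring the guard in Operator.java cited above
        if all(p.closed for p in self.parents):
            self.closed = True

ts = Op()                 # TableScan over the empty table, never closed
htso = Op(parents=[ts])   # HashTableSinkOperator
htso.close()
print(htso.closed)        # False: the hash table file is never written
ts.close()                # the missing step: close the dummy table's TS
htso.close()
print(htso.closed)        # True
```

With the parent left open, closeOp never fires and downstream tasks look for a hashtable file that was never written.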





[jira] [Updated] (HIVE-1608) use sequencefile as the default for storing intermediate results

2014-04-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-1608:
---

Status: Open  (was: Patch Available)

[~appodictic] Do you know what the factor usually is? How large is a sequence 
file compared to a text file in the usual scenario?

[~brocknoland] It would be good to enlist the benefits we will get by switching 
over to sequence files.

 use sequencefile as the default for storing intermediate results
 

 Key: HIVE-1608
 URL: https://issues.apache.org/jira/browse/HIVE-1608
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Namit Jain
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-1608.patch


 The only argument for having a text file for storing intermediate results 
 seems to be better debuggability.
 But tailing a sequence file is possible, and it should be more 
 space-efficient.





[jira] [Updated] (HIVE-6923) Use slf4j For Logging Everywhere

2014-04-17 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated HIVE-6923:
-

Attachment: (was: HIVE-6923.patch)

 Use slf4j For Logging Everywhere
 

 Key: HIVE-6923
 URL: https://issues.apache.org/jira/browse/HIVE-6923
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Nick White
Assignee: Nick White
 Fix For: 0.13.0

 Attachments: HIVE-6923.patch


 Hive uses a mixture of slf4j (backed by log4j) and commons-logging. I've 
 attached a patch to tidy this up, by just using slf4j for all loggers. This 
 means that applications using the JDBC driver can make Hive log through their 
 own slf4j implementation consistently.





[jira] [Updated] (HIVE-6923) Use slf4j For Logging Everywhere

2014-04-17 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated HIVE-6923:
-

Attachment: HIVE-6923.patch

 Use slf4j For Logging Everywhere
 

 Key: HIVE-6923
 URL: https://issues.apache.org/jira/browse/HIVE-6923
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Nick White
Assignee: Nick White
 Fix For: 0.13.0

 Attachments: HIVE-6923.patch


 Hive uses a mixture of slf4j (backed by log4j) and commons-logging. I've 
 attached a patch to tidy this up, by just using slf4j for all loggers. This 
 means that applications using the JDBC driver can make Hive log through their 
 own slf4j implementation consistently.





[jira] [Commented] (HIVE-4576) templeton.hive.properties does not allow values with commas

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973089#comment-13973089
 ] 

Ashutosh Chauhan commented on HIVE-4576:


It seems like blindly replacing \ after the split may run into problems if \ is 
used in a non-escaping context, e.g. a Windows path like D:\hive\hive-site.xml, 
or maybe I am misreading the patch.

 templeton.hive.properties does not allow values with commas
 ---

 Key: HIVE-4576
 URL: https://issues.apache.org/jira/browse/HIVE-4576
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.5.0
Reporter: Vitaliy Fuks
Assignee: Eugene Koifman
Priority: Minor
 Attachments: HIVE-4576.patch


 templeton.hive.properties accepts a comma-separated list of key=value 
 property pairs that will be passed to Hive.
 However, this makes it impossible to use any value that itself has a comma 
 in it.
 For example:
 {code:xml}<property>
   <name>templeton.hive.properties</name>
   <value>hive.metastore.sasl.enabled=false,hive.metastore.uris=thrift://foo1.example.com:9083,foo2.example.com:9083</value>
 </property>{code}
 {noformat}templeton: starting [/usr/bin/hive, --service, cli, --hiveconf, 
 hive.metastore.sasl.enabled=false, --hiveconf, 
 hive.metastore.uris=thrift://foo1.example.com:9083, --hiveconf, 
 foo2.example.com:9083 etc..{noformat}
 because the value is parsed using the standard 
 org.apache.hadoop.conf.Configuration.getStrings() call, which simply splits on 
 commas, from here:
 {code:java}for (String prop : 
 appConf.getStrings(AppConfig.HIVE_PROPS_NAME)){code}
 This is problematic for any hive property that itself has multiple values, 
 such as hive.metastore.uris above or hive.aux.jars.path.
 There should be some way to escape commas or a different delimiter should 
 be used.
 NO PRECOMMIT TESTS
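The failure is easy to reproduce with a plain comma split; a sketch (property string copied from the example above):

```python
# getStrings()-style splitting breaks the two-host URI list into three tokens,
# turning the second metastore host into a stray "property".
props = ("hive.metastore.sasl.enabled=false,"
         "hive.metastore.uris=thrift://foo1.example.com:9083,"
         "foo2.example.com:9083")
parts = props.split(",")
print(len(parts))   # 3
print(parts[2])     # foo2.example.com:9083 -- no longer part of the URI list
```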





[jira] [Commented] (HIVE-6361) Un-fork Sqlline

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973115#comment-13973115
 ] 

Ashutosh Chauhan commented on HIVE-6361:


[~julianhyde] If you are working on this, this may be the right time to get this 
work into Hive. We are just getting started after doing a release, so this seems 
like the right time to absorb the code churn we may have here.

 Un-fork Sqlline
 ---

 Key: HIVE-6361
 URL: https://issues.apache.org/jira/browse/HIVE-6361
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Julian Hyde

 I propose to merge the two development forks of sqlline: Hive's beeline 
 module, and the fork at https://github.com/julianhyde/sqlline.
 How did the forks come about? Hive’s SQL command-line interface Beeline was 
 created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was 
 a useful but low-activity project languishing on SourceForge without an 
 active owner. Around the same time, Julian Hyde independently started a 
 github repo based on the same code base. Now several projects are using 
 Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading 
 Lingual and Optiq.
 Merging these two forks will allow us to pool our resources. (Case in point: 
 Drill issue DRILL-327 had already been fixed in a later version of sqlline; 
 it still exists in beeline.)
 I propose the following steps:
 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline.
 2. Port fixes to hive-beeline into hive-sqlline.
 3. Make hive-beeline depend on hive-sqlline, and remove code that is 
 identical. What remains in the hive-beeline module is Beeline.java (a derived 
 class of Sqlline.java) and Hive-specific extensions.
 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline.
 This achieves continuity for Hive’s users, gives the users of the non-Hive 
 sqlline a version with minimal dependencies, unifies the two code lines, and 
 brings everything under the Apache roof.





[jira] [Commented] (HIVE-6908) TestThriftBinaryCLIService.testExecuteStatementAsync has intermittent failures

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973119#comment-13973119
 ] 

Ashutosh Chauhan commented on HIVE-6908:


I am not sure what the original author of the test had in mind for this. Perhaps 
[~vgumashta] may know more.

 TestThriftBinaryCLIService.testExecuteStatementAsync has intermittent failures
 --

 Key: HIVE-6908
 URL: https://issues.apache.org/jira/browse/HIVE-6908
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.0
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-6908.patch


 This has failed sometimes in the pre-commit tests.
 ThriftCLIServiceTest.testExecuteStatementAsync runs two statements.  They are 
 given a 100-second timeout in total; I'm not sure if that's intentional.  As 
 the first is a select query, it will take a majority of the time.  The second 
 statement (create table) should be quicker, but it sometimes fails because the 
 timeout is already mostly used up.
 The timeout should probably be reset after the first statement.  If the 
 operation finishes before the timeout, the reset won't have any effect, as 
 the loop will break out early.
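The two timeout policies are easy to contrast in miniature; a toy model (the 95s/10s durations below are illustrative, not measured):

```python
TIMEOUT = 100  # seconds, as in the test

def shared_budget(durations):
    """Current behaviour: one budget drained across all statements."""
    remaining = TIMEOUT
    for d in durations:
        remaining -= d
        if remaining < 0:
            return "timed out"
    return "ok"

def per_statement_budget(durations):
    """Proposed fix: the budget resets before each statement."""
    return "ok" if all(d <= TIMEOUT for d in durations) else "timed out"

# a 95s select followed by a 10s create table:
print(shared_budget([95, 10]))         # timed out (current flakiness)
print(per_statement_budget([95, 10]))  # ok (after resetting the timeout)
```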





[jira] [Updated] (HIVE-5870) Move TestJDBCDriver2.testNewConnectionConfiguration to TestJDBCWithMiniHS2

2014-04-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5870:
---

Status: Open  (was: Patch Available)

This one seems to have fallen through the cracks. Let's get this in. [~szehon] 
The patch needs a rebase.

 Move TestJDBCDriver2.testNewConnectionConfiguration to TestJDBCWithMiniHS2
 --

 Key: HIVE-5870
 URL: https://issues.apache.org/jira/browse/HIVE-5870
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.0
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-5870.patch


 TestJDBCDriver2.testNewConnectionConfiguration() attempts to start a 
 HiveServer2 instance in the test.
 This can cause issues, as creating HiveServer2 needs the correct 
 environment/path.  This test should be moved to TestJdbcWithMiniHS2, which 
 uses MiniHS2.  MiniHS2 exists for this purpose (setting up the environment 
 properly before starting a HiveServer2 instance).





[jira] [Commented] (HIVE-5538) Turn on vectorization by default.

2014-04-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973129#comment-13973129
 ] 

Hive QA commented on HIVE-5538:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12640595/HIVE-5538.2.patch

{color:red}ERROR:{color} -1 due to 34 failed/errored test(s), 5405 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_numeric
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_diff_part_cols
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_diff_part_cols2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_create
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quote1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_expressions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_decimal_date
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_context
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/precommit-hive/17/testReport
Console output: http://bigtop01.cloudera.org:8080/job/precommit-hive/17/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 34 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12640595

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch


   Vectorization should be turned on by default, so that users don't have to 
 specifically enable vectorization. 
   Vectorization code validates and ensures that a query falls back to row 
 mode if it is not supported on the vectorized code path. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema

2014-04-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973141#comment-13973141
 ] 

Xuefu Zhang commented on HIVE-6835:
---

Just curious. If the avro serde is initialized with the table schema (which is 
the latest), is there a problem for it to read the old data, that is, data that 
conforms to the partition-level metadata? I have seen so many JIRAs about 
schema evolution, and am not quite sure what is possible and what is not.

The example given here is adding a new column at the beginning. What about 
other cases, such as adding one at the end, changing a data type, etc.?

 Reading of partitioned Avro data fails if partition schema does not match 
 table schema
 --

 Key: HIVE-6835
 URL: https://issues.apache.org/jira/browse/HIVE-6835
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Anthony Hsu
Assignee: Anthony Hsu
 Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch


 To reproduce:
 {code}
 create table testarray (a array<string>);
 load data local inpath '/home/ahsu/test/array.txt' into table testarray;
 # create partitioned Avro table with one array column
 create table avroarray partitioned by (y string) row format serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties 
 ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type": 
 "record", "fields": [ { "name":"a", "type":{"type":"array","items":"string"} 
 } ] }')  STORED as INPUTFORMAT  
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  OUTPUTFORMAT  
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
 insert into table avroarray partition(y=1) select * from testarray;
 # add an int column with a default value of 0
 alter table avroarray set serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with 
 serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
  "record", "fields": [ {"name":"intfield","type":"int","default":0},{ 
 "name":"a", "type":{"type":"array","items":"string"} } ] }');
 # fails with ClassCastException
 select * from avroarray;
 {code}
 The select * fails with:
 {code}
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
 cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5538) Turn on vectorization by default.

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973160#comment-13973160
 ] 

Ashutosh Chauhan commented on HIVE-5538:


I think it's a good idea to turn vectorization on by default. Let's triage 
these failures.

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch


   Vectorization should be turned on by default, so that users don't have to 
 specifically enable vectorization. 
   Vectorization code validates and ensures that a query falls back to row 
 mode if it is not supported on the vectorized code path. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973166#comment-13973166
 ] 

Ashutosh Chauhan commented on HIVE-6835:


I would also like to know the answers to Xuefu's questions. It would be good to 
document what kinds of schema evolution are supported by the Avro SerDe and, 
more importantly, what kinds are *not* supported.

 Reading of partitioned Avro data fails if partition schema does not match 
 table schema
 --

 Key: HIVE-6835
 URL: https://issues.apache.org/jira/browse/HIVE-6835
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Anthony Hsu
Assignee: Anthony Hsu
 Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch


 To reproduce:
 {code}
 create table testarray (a array<string>);
 load data local inpath '/home/ahsu/test/array.txt' into table testarray;
 # create partitioned Avro table with one array column
 create table avroarray partitioned by (y string) row format serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties 
 ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type": 
 "record", "fields": [ { "name":"a", "type":{"type":"array","items":"string"} 
 } ] }')  STORED as INPUTFORMAT  
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  OUTPUTFORMAT  
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
 insert into table avroarray partition(y=1) select * from testarray;
 # add an int column with a default value of 0
 alter table avroarray set serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with 
 serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
  "record", "fields": [ {"name":"intfield","type":"int","default":0},{ 
 "name":"a", "type":{"type":"array","items":"string"} } ] }');
 # fails with ClassCastException
 select * from avroarray;
 {code}
 The select * fails with:
 {code}
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
 cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash

2014-04-17 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-6924:
--

 Summary: MapJoinKeyBytes::hashCode() should use Murmur hash
 Key: HIVE-6924
 URL: https://issues.apache.org/jira/browse/HIVE-6924
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


Existing hashCode is bad, causes HashMap to cluster



--
This message was sent by Atlassian JIRA
(v6.2#6252)
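For context on the HIVE-6924 proposal above: java.util.HashMap buckets entries by the low bits of hashCode(), so a hash whose low bits barely change across keys makes entries cluster into a few buckets. A Murmur3-style finalizer fixes this by avalanching every input bit into every output bit. The sketch below is illustrative only (the class and method names are not Hive's):

```java
// Illustrative sketch: the Murmur3 32-bit finalizer ("fmix32"), the kind of
// mixing step the ticket alludes to, applied before HashMap bucketing.
public class MurmurMixDemo {
    // Avalanche step from Murmur3. Each sub-step (xorshift, odd-constant
    // multiply) is invertible, so the whole function is a bijection:
    // distinct inputs always map to distinct outputs.
    public static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // Sequential keys land in sequential buckets without mixing;
        // fmix32 scatters their low bits across the table.
        for (int i = 1; i <= 4; i++) {
            System.out.printf("%d -> bucket %d (mixed: bucket %d)%n",
                    i, i & 15, fmix32(i) & 15);
        }
    }
}
```

Because the finalizer is a bijection, it cannot introduce new collisions; it only redistributes the ones the raw hash already had.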


[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead

2014-04-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973180#comment-13973180
 ] 

Sergey Shelukhin commented on HIVE-6430:


We should probably do the same in the actual codebase... I'll file a JIRA.

 MapJoin hash table has large memory overhead
 

 Key: HIVE-6430
 URL: https://issues.apache.org/jira/browse/HIVE-6430
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, 
 HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, 
 HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.patch


 Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 
 for row) can take several hundred bytes, which is ridiculous. I am reducing 
 the size of MJKey and MJRowContainer in other jiras, but in general we don't 
 need to have java hash table there.  We can either use primitive-friendly 
 hashtable like the one from HPPC (Apache-licensed), or some variation, to map 
 primitive keys to single row storage structure without an object per row 
 (similar to vectorization).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-538) make hive_jdbc.jar self-containing

2014-04-17 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated HIVE-538:


Attachment: (was: HIVE-538.patch)

 make hive_jdbc.jar self-containing
 --

 Key: HIVE-538
 URL: https://issues.apache.org/jira/browse/HIVE-538
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.3.0, 0.4.0, 0.6.0, 0.13.0
Reporter: Raghotham Murthy
Assignee: Nick White
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.2.patch, HIVE-538.patch


 Currently, most jars in hive/build/dist/lib and the hadoop-*-core.jar are 
 required in the classpath to run jdbc applications on hive. We need to do 
 at least the following to get rid of most unnecessary dependencies:
 1. get rid of dynamic serde and use a standard serialization format, maybe 
 tab-separated, JSON, or Avro
 2. don't use hadoop configuration parameters
 3. repackage thrift and fb303 classes into hive_jdbc.jar



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-538) make hive_jdbc.jar self-containing

2014-04-17 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated HIVE-538:


Attachment: HIVE-538.patch

 make hive_jdbc.jar self-containing
 --

 Key: HIVE-538
 URL: https://issues.apache.org/jira/browse/HIVE-538
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.3.0, 0.4.0, 0.6.0, 0.13.0
Reporter: Raghotham Murthy
Assignee: Nick White
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.2.patch, HIVE-538.patch


 Currently, most jars in hive/build/dist/lib and the hadoop-*-core.jar are 
 required in the classpath to run jdbc applications on hive. We need to do 
 at least the following to get rid of most unnecessary dependencies:
 1. get rid of dynamic serde and use a standard serialization format, maybe 
 tab-separated, JSON, or Avro
 2. don't use hadoop configuration parameters
 3. repackage thrift and fb303 classes into hive_jdbc.jar



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-538) make hive_jdbc.jar self-containing

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973198#comment-13973198
 ] 

Ashutosh Chauhan commented on HIVE-538:
---

[~njw45] Can you take a look at HIVE-6593 to see if it satisfies your needs?

 make hive_jdbc.jar self-containing
 --

 Key: HIVE-538
 URL: https://issues.apache.org/jira/browse/HIVE-538
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.3.0, 0.4.0, 0.6.0, 0.13.0
Reporter: Raghotham Murthy
Assignee: Nick White
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.2.patch, HIVE-538.patch


 Currently, most jars in hive/build/dist/lib and the hadoop-*-core.jar are 
 required in the classpath to run jdbc applications on hive. We need to do 
 at least the following to get rid of most unnecessary dependencies:
 1. get rid of dynamic serde and use a standard serialization format, maybe 
 tab-separated, JSON, or Avro
 2. don't use hadoop configuration parameters
 3. repackage thrift and fb303 classes into hive_jdbc.jar



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6925) show query progress in Beeline

2014-04-17 Thread Gwen Shapira (JIRA)
Gwen Shapira created HIVE-6925:
--

 Summary: show query progress in Beeline
 Key: HIVE-6925
 URL: https://issues.apache.org/jira/browse/HIVE-6925
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Gwen Shapira


In the old Hive CLI, the MR output was written to the screen, making it easy to 
watch the progress: map and reduce % done.

In Beeline, there is no output until the query is done (or fails). Showing some 
kind of progress indicator would be nice.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results

2014-04-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973210#comment-13973210
 ] 

Edward Capriolo commented on HIVE-1608:
---

It is not much. SequenceFile + none (codec) only adds some block information 
around text. I still think SequenceFile by default is a good idea. It makes it 
easier to add compression later without sacrificing splittability. 

 use sequencefile as the default for storing intermediate results
 

 Key: HIVE-1608
 URL: https://issues.apache.org/jira/browse/HIVE-1608
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Namit Jain
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-1608.patch


 The only argument for having a text file for storing intermediate results 
 seems to be better debuggability.
 But, tailing a sequence file is possible, and it should be more 
 space-efficient.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Improving test coverage for HiveServer2

2014-04-17 Thread Ashutosh Chauhan
First, I would like to thank Prasad Mujumdar for his recent contributions
of MiniHS2 and MiniKDC. Those are awesome test infra components to make
testing easier for HS2 and kerberos. Thanks, Prasad!

With that checked in now in our repo, we can start making use of those to
improve our test coverage. There are a variety of new features which have
landed recently in trunk for HS2, like HTTP transport, pluggable
authentication, and a new authorization model, to name a few. There are test
cases for these in isolation, but not in combination with other features, e.g.,
HS2 running on HTTP transport with the new auth model and kerberos
authentication. Or HS2 running in binary mode with LDAP authentication. And so
on. I don't have a good sense of which combinations we want to support and
thus test. But at least for those which we do want to support, it seems
possible to write tests using MiniHS2 & MiniKDC. I think we can take our
existing tests in TestJdbcDriver2 (possibly with a little refactoring) and run
them against MiniHS2 in various server configurations.

Also, we have TestBeelineDriver, which is currently turned off by default.
Shall we turn it on? I think it can run tests at various levels of concurrency.
Maybe to begin with we can set the concurrency level at 1 and, if things look
good, bump that number higher.

Thoughts?

Thanks,
Ashutosh


Remove HCat cli

2014-04-17 Thread Ashutosh Chauhan
As far as I can see, all the functionality it provides can be provided by
hive cli with some configuration. The functionality it has that hive cli
lacks, like the -g and -p options, can be added to hive cli if there are users
for it. So it seems we can get rid of HCatCli.java and its friends, as well as
bin/hcat. If dev folks think positively about this, we can ask on the user
list to see how users feel about it.

Thanks,
Ashutosh


Remove HiveServer1

2014-04-17 Thread Ashutosh Chauhan
HiveServer2 was introduced in Hive 0.10; since then we have had 3 releases:
0.11, 0.12 & soon to be 0.13. I think it's high time we remove HS1 from our
trunk. Thoughts?

Thanks,
Ashutosh


[jira] [Commented] (HIVE-1643) support range scans and non-key columns in HBase filter pushdown

2014-04-17 Thread Sandy Pratt (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973241#comment-13973241
 ] 

Sandy Pratt commented on HIVE-1643:
---

Craig, I've been running my patch for this issue in production for at least a 
year now, and it seems to work well enough.  I have an item on my plate to 
contribute the source, but it will have to wait until I have an opening in my 
schedule.  Because the HBase handler is a pluggable SerDe, and my 
implementation strays a bit from the one in Hive, I'll probably stick it on 
Github or something and post a pointer here.

 support range scans and non-key columns in HBase filter pushdown
 

 Key: HIVE-1643
 URL: https://issues.apache.org/jira/browse/HIVE-1643
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.9.0
Reporter: John Sichi
Assignee: bharath v
  Labels: patch
 Attachments: HIVE-1643.patch, Hive-1643.2.patch, hbase_handler.patch


 HIVE-1226 added support for WHERE rowkey=3.  We would like to support WHERE 
 rowkey BETWEEN 10 and 20, as well as predicates on non-rowkeys (plus 
 conjunctions etc).  Non-rowkey conditions can't be used to filter out entire 
 ranges, but they can be used to push the per-row filter processing as far 
 down as possible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
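The HIVE-1643 description above splits predicates into two kinds: rowkey comparisons, which can bound the scan range, and everything else, which can only be applied per row. That split can be sketched independently of the HBase API; all names below (Pred, Plan, the planner itself) are illustrative, not Hive's pushdown code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy pushdown planner: rowkey comparisons tighten the scan range; other
// predicates become residual per-row filters, as the ticket describes.
public class PushdownDemo {
    // A predicate over an integer column: column name, operator, literal.
    static final class Pred {
        final String col, op; final int val;
        Pred(String col, String op, int val) { this.col = col; this.op = op; this.val = val; }
    }

    // Scan plan: optional start/stop keys plus leftover per-row filters.
    static final class Plan {
        Integer startKey, stopKey;
        final List<Pred> residual = new ArrayList<>();
    }

    static Plan plan(List<Pred> preds) {
        Plan plan = new Plan();
        for (Pred p : preds) {
            if (p.col.equals("rowkey") && p.op.equals(">=")) {
                plan.startKey = plan.startKey == null ? p.val : Math.max(plan.startKey, p.val);
            } else if (p.col.equals("rowkey") && p.op.equals("<=")) {
                plan.stopKey = plan.stopKey == null ? p.val : Math.min(plan.stopKey, p.val);
            } else {
                plan.residual.add(p); // non-rowkey: cannot prune ranges, filter per row
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Plan p = plan(Arrays.asList(
                new Pred("rowkey", ">=", 10),
                new Pred("rowkey", "<=", 20),
                new Pred("c", "=", 5)));
        System.out.println("scan [" + p.startKey + ", " + p.stopKey + "], "
                + p.residual.size() + " residual filter(s)");
    }
}
```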


[jira] [Updated] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash

2014-04-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6924:
---

Attachment: HIVE-6924.patch

 MapJoinKeyBytes::hashCode() should use Murmur hash
 --

 Key: HIVE-6924
 URL: https://issues.apache.org/jira/browse/HIVE-6924
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6924.patch


 Existing hashCode is bad, causes HashMap to cluster



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash

2014-04-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6924:
---

Status: Patch Available  (was: Open)

 MapJoinKeyBytes::hashCode() should use Murmur hash
 --

 Key: HIVE-6924
 URL: https://issues.apache.org/jira/browse/HIVE-6924
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6924.patch


 Existing hashCode is bad, causes HashMap to cluster



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash

2014-04-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973244#comment-13973244
 ] 

Sergey Shelukhin commented on HIVE-6924:


[~t3rmin4t0r] fyi
[~ashutoshc] can you please +1? :)

 MapJoinKeyBytes::hashCode() should use Murmur hash
 --

 Key: HIVE-6924
 URL: https://issues.apache.org/jira/browse/HIVE-6924
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6924.patch


 Existing hashCode is bad, causes HashMap to cluster



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Remove HiveServer1

2014-04-17 Thread Vaibhav Gumashta
I am +1 on it. I'd also add that we removed JDBC-1 which was supposed to
work with HiveServer1.

Thanks,
--Vaibhav


On Thu, Apr 17, 2014 at 11:26 AM, Ashutosh Chauhan hashut...@apache.org wrote:

 HiveServer2 was introduced in Hive 0.10; since then we have had 3 releases:
 0.11, 0.12 & soon to be 0.13. I think it's high time we remove HS1 from our
 trunk. Thoughts?

 Thanks,
 Ashutosh


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


[jira] [Updated] (HIVE-6756) alter table set fileformat should set serde too

2014-04-17 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-6756:
---

Attachment: HIVE-6756.patch

In "alter table set fileformat", the ORC and RC file formats already set the 
corresponding serdes; the remaining file formats do not set the corresponding 
serde.

In "create table", if we do not specify a serde, file formats other than ORC 
and RC are set to LazySimpleSerDe. This patch makes "alter table set 
fileformat" behave like "create table" in that respect.

 alter table set fileformat should set serde too
 ---

 Key: HIVE-6756
 URL: https://issues.apache.org/jira/browse/HIVE-6756
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Chinna Rao Lalam
 Attachments: HIVE-6756.patch


 Currently doing alter table set fileformat doesn't change the serde. This is 
 unexpected by customers because the serdes are largely file format specific.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6756) alter table set fileformat should set serde too

2014-04-17 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-6756:
---

Status: Patch Available  (was: Open)

 alter table set fileformat should set serde too
 ---

 Key: HIVE-6756
 URL: https://issues.apache.org/jira/browse/HIVE-6756
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Chinna Rao Lalam
 Attachments: HIVE-6756.patch


 Currently doing alter table set fileformat doesn't change the serde. This is 
 unexpected by customers because the serdes are largely file format specific.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-6912) HWI not working - HTTP ERROR 500

2014-04-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-6912.


   Resolution: Duplicate
Fix Version/s: 0.13.0

This has been fixed via HIVE-5132

 HWI not working - HTTP ERROR 500
 

 Key: HIVE-6912
 URL: https://issues.apache.org/jira/browse/HIVE-6912
 Project: Hive
  Issue Type: Bug
Reporter: sunil ranjan khuntia
Priority: Critical
 Fix For: 0.13.0


 I tried to use Hive HWI to write hive queries in a UI.
 As per the steps mentioned here: 
 https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface
 I set up Ant and ran the hive hwi service.
 But in the browser when I hit http://localhost:/hwi I got the below error:
 HTTP ERROR 500
 Problem accessing /hwi/. Reason:
 Unable to find a javac compiler;
 com.sun.tools.javac.Main is not on the classpath.
 Perhaps JAVA_HOME does not point to the JDK.
 It is currently set to /usr/java/jdk1.6.0_32/jre
 Caused by:
 Unable to find a javac compiler;
 com.sun.tools.javac.Main is not on the classpath.
 Perhaps JAVA_HOME does not point to the JDK.
 It is currently set to /usr/java/jdk1.6.0_32/jre
   at 
 org.apache.tools.ant.taskdefs.compilers.CompilerAdapterFactory.getCompiler(CompilerAdapterFactory.java:129)
 I have checked and changed JAVA_HOME, but it's still the same.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973253#comment-13973253
 ] 

Ashutosh Chauhan commented on HIVE-6924:


Not sure, but I hear Cuckoo hashing is even better. We have an internal 
implementation of it in ql/exec/vector/expressions/CuckooSetBytes.java. Shall 
we use that?
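For readers unfamiliar with the reference: cuckoo hashing keeps two tables with independent hash functions; a colliding insert evicts the occupant, which is re-inserted into its alternate table, giving at most two probes per lookup. The following is a toy sketch under those assumptions, not Hive's CuckooSetBytes (all names are illustrative, and a real implementation would rehash or grow when the eviction loop cycles):

```java
// Toy cuckoo-hashing set of ints: two tables, two hash functions,
// displacement ("kicking out") on collision. Lookup probes at most 2 slots.
public class CuckooSetDemo {
    private static final int CAP = 16;        // power of two, illustrative
    private static final int MAX_KICKS = 32;  // eviction-loop cutoff
    private final Integer[] t1 = new Integer[CAP];
    private final Integer[] t2 = new Integer[CAP];

    private int h1(int x) { return x & (CAP - 1); }
    // Second, independent hash: multiply by an odd constant, take top bits.
    private int h2(int x) { return (x * 0x9E3779B9 >>> 27) & (CAP - 1); }

    public boolean contains(int x) {
        Integer a = t1[h1(x)], b = t2[h2(x)];
        return (a != null && a == x) || (b != null && b == x);
    }

    public boolean insert(int x) {
        if (contains(x)) return true;
        Integer cur = x;
        for (int i = 0; i < MAX_KICKS; i++) {
            // Try table 1; evict the occupant if the slot is taken.
            Integer tmp = t1[h1(cur)];
            t1[h1(cur)] = cur;
            if (tmp == null) return true;
            cur = tmp;
            // The evicted key goes to its slot in table 2.
            tmp = t2[h2(cur)];
            t2[h2(cur)] = cur;
            if (tmp == null) return true;
            cur = tmp;
        }
        return false; // likely a cycle; a real implementation would rehash/grow
    }

    public static void main(String[] args) {
        CuckooSetDemo s = new CuckooSetDemo();
        for (int v : new int[]{3, 19, 35, 7, 100}) s.insert(v);
        System.out.println(s.contains(19) + " " + s.contains(4));
    }
}
```

The worst-case two-probe lookup is what makes the structure attractive for hot-path membership tests like vectorized IN filters.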

 MapJoinKeyBytes::hashCode() should use Murmur hash
 --

 Key: HIVE-6924
 URL: https://issues.apache.org/jira/browse/HIVE-6924
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6924.patch


 Existing hashCode is bad, causes HashMap to cluster



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6926) HiveServer2 should use tcp instead of binary as the name of the transport mode

2014-04-17 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-6926:
--

 Summary: HiveServer2 should use tcp instead of binary as the name 
of the transport mode
 Key: HIVE-6926
 URL: https://issues.apache.org/jira/browse/HIVE-6926
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0


I think the name "binary" really doesn't convey anything. I'll make the change 
in a backward-compatible way. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6862) add DB schema DDL and upgrade 12to13 scripts for MS SQL Server

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973268#comment-13973268
 ] 

Ashutosh Chauhan commented on HIVE-6862:


+1

 add DB schema DDL and upgrade 12to13 scripts for MS SQL Server
 --

 Key: HIVE-6862
 URL: https://issues.apache.org/jira/browse/HIVE-6862
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-6862.2.patch, HIVE-6862.3.patch, HIVE-6862.patch


 need to add a unified 0.13 script and a separate script for ACID support
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6768) remove hcatalog/webhcat/svr/src/main/config/override-container-log4j.properties

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973271#comment-13973271
 ] 

Ashutosh Chauhan commented on HIVE-6768:


In addition to this file, I assume we also need to revert the changes 
introduced in HIVE-5511. [~ekoifman], would you like to attach a patch for this?

 remove 
 hcatalog/webhcat/svr/src/main/config/override-container-log4j.properties
 ---

 Key: HIVE-6768
 URL: https://issues.apache.org/jira/browse/HIVE-6768
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman

 now that MAPREDUCE-5806 is fixed we can remove 
 override-container-log4j.properties and all the logic around it which 
 was introduced in HIVE-5511 to work around MAPREDUCE-5806



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema

2014-04-17 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973279#comment-13973279
 ] 

Anthony Hsu commented on HIVE-6835:
---

The AvroSerDe handles schema evolution as described in 
http://avro.apache.org/docs/current/spec.html#Schema+Resolution.  However, in 
the Hive code, the AvroSerDe needs to always be initialized with the latest 
schema so that ObjectInspectorConverters.getConvertedOI() (in 
FetchOperator:getRecordReader()) will work.  When the AvroSerDe actually reads 
the Avro file, it will then compare the latest schema to the actual schema 
stored in the Avro file and do schema resolution/evolution.
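The resolution Anthony describes (reader schema = latest table schema, writer schema = schema embedded in the file, defaults filled for reader fields the writer lacks) can be modeled without the Avro library. The field names below come from this issue's example; everything else is an illustrative sketch, not the AvroSerDe's actual code:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of Avro-style schema resolution: read a record written with an
// old schema using a newer reader schema that added a defaulted field.
public class SchemaResolutionDemo {
    // Resolve one record: keep reader-schema fields, pulling values from the
    // writer record when present and falling back to reader defaults.
    static Map<String, Object> resolve(Map<String, Object> writerRecord,
                                       List<String> readerFields,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String f : readerFields) {
            if (writerRecord.containsKey(f)) {
                out.put(f, writerRecord.get(f));     // value present in the file
            } else if (readerDefaults.containsKey(f)) {
                out.put(f, readerDefaults.get(f));   // new field: fill the default
            } else {
                throw new IllegalStateException("no value or default for " + f);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Record written before "intfield" existed in the table schema.
        Map<String, Object> oldRecord = new LinkedHashMap<>();
        oldRecord.put("a", Arrays.asList("x", "y"));
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("intfield", 0);
        System.out.println(
                resolve(oldRecord, Arrays.asList("intfield", "a"), defaults));
        // prints {intfield=0, a=[x, y]}
    }
}
```

This is why initializing the serde with the latest schema matters: the reader side of the resolution must be the table's current view, or the converters built from it will not line up with the fetched rows.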

 Reading of partitioned Avro data fails if partition schema does not match 
 table schema
 --

 Key: HIVE-6835
 URL: https://issues.apache.org/jira/browse/HIVE-6835
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Anthony Hsu
Assignee: Anthony Hsu
 Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch


 To reproduce:
 {code}
 create table testarray (a array<string>);
 load data local inpath '/home/ahsu/test/array.txt' into table testarray;
 # create partitioned Avro table with one array column
 create table avroarray partitioned by (y string) row format serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties 
 ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type": 
 "record", "fields": [ { "name":"a", "type":{"type":"array","items":"string"} 
 } ] }')  STORED as INPUTFORMAT  
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  OUTPUTFORMAT  
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
 insert into table avroarray partition(y=1) select * from testarray;
 # add an int column with a default value of 0
 alter table avroarray set serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with 
 serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
  "record", "fields": [ {"name":"intfield","type":"int","default":0},{ 
 "name":"a", "type":{"type":"array","items":"string"} } ] }');
 # fails with ClassCastException
 select * from avroarray;
 {code}
 The select * fails with:
 {code}
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
 cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6756) alter table set fileformat should set serde too

2014-04-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6756:
---

Status: Open  (was: Patch Available)

I think instead of always defaulting to LazySimpleSerDe, it is better to set
LazySimpleSerDe for TextFile and SequenceFile formats only and to throw an
exception in cases where the serde is not specified. We can't assume other
file formats use LazySimpleSerDe.

 alter table set fileformat should set serde too
 ---

 Key: HIVE-6756
 URL: https://issues.apache.org/jira/browse/HIVE-6756
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Chinna Rao Lalam
 Attachments: HIVE-6756.patch


 Currently doing alter table set fileformat doesn't change the serde. This is 
 unexpected by customers because the serdes are largely file format specific.





Re: Remove HiveServer1

2014-04-17 Thread Xuefu Zhang
+1 removing server1 and related. However, +1 on keeping Hive CLI.


On Thu, Apr 17, 2014 at 11:34 AM, Vaibhav Gumashta 
vgumas...@hortonworks.com wrote:

 I am +1 on it. I'd also add that we removed JDBC-1 which was supposed to
 work with HiveServer1.

 Thanks,
 --Vaibhav


 On Thu, Apr 17, 2014 at 11:26 AM, Ashutosh Chauhan hashut...@apache.org
 wrote:

  HiveServer2 was introduced in Hive 0.10; since then we have had 3 releases:
  0.11, 0.12 & soon to be 0.13. I think it's high time we remove HS1 from our
  trunk. Thoughts?
 
  Thanks,
  Ashutosh
 

 --
 CONFIDENTIALITY NOTICE
 NOTICE: This message is intended for the use of the individual or entity to
 which it is addressed and may contain information that is confidential,
 privileged and exempt from disclosure under applicable law. If the reader
 of this message is not the intended recipient, you are hereby notified that
 any printing, copying, dissemination, distribution, disclosure or
 forwarding of this communication is strictly prohibited. If you have
 received this communication in error, please contact the sender immediately
 and delete it from your system. Thank You.



[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results

2014-04-17 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973285#comment-13973285
 ] 

Brock Noland commented on HIVE-1608:


The big win here is that columns with newlines don't get screwed up by
default. That is, they work out of the box.

 use sequencefile as the default for storing intermediate results
 

 Key: HIVE-1608
 URL: https://issues.apache.org/jira/browse/HIVE-1608
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Namit Jain
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-1608.patch


 The only argument for having a text file for storing intermediate results 
 seems to be better debuggability.
 But, tailing a sequence file is possible, and it should be more space 
 efficient





Re: Remove HCat cli

2014-04-17 Thread Lefty Leverenz
The HCat wikidoc lists some of the differences between Hive and HCat CLIs
here:  Hive 
CLIhttps://cwiki.apache.org/confluence/display/Hive/HCatalog+CLI#HCatalogCLI-HiveCLI
.

-- Lefty


On Thu, Apr 17, 2014 at 2:24 PM, Ashutosh Chauhan hashut...@apache.orgwrote:

 As far as I can see, all the functionality it provides can be provided by
 hive cli with some configuration. There is functionality like the -g and -p
 options which it has; if there are users of those, they can be added to hive
 cli. So, it seems we can get rid of HCatCli.java and its friends as well as
 bin/hcat. If dev folks think positively about this, we can ask on the user
 list to see how users feel about it.

 Thanks,
 Ashutosh



[jira] [Created] (HIVE-6927) Add support for MSSQL in schematool

2014-04-17 Thread Deepesh Khandelwal (JIRA)
Deepesh Khandelwal created HIVE-6927:


 Summary: Add support for MSSQL in schematool
 Key: HIVE-6927
 URL: https://issues.apache.org/jira/browse/HIVE-6927
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal


Schematool is the preferred way of initializing the schema for Hive. Since
HIVE-6862 provided the script for MSSQL, it would be nice to add support for
it in schematool.





[jira] [Updated] (HIVE-6927) Add support for MSSQL in schematool

2014-04-17 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-6927:
-

Attachment: HIVE-6927.patch

Attaching the patch for review.

 Add support for MSSQL in schematool
 ---

 Key: HIVE-6927
 URL: https://issues.apache.org/jira/browse/HIVE-6927
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Attachments: HIVE-6927.patch


 Schematool is the preferred way of initializing the schema for Hive. Since
 HIVE-6862 provided the script for MSSQL, it would be nice to add support
 for it in schematool.





[jira] [Resolved] (HIVE-6919) hive sql std auth select query fails on partitioned tables

2014-04-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-6919.


   Resolution: Fixed
Fix Version/s: 0.14.0

Committed to trunk. Thanks, Thejas!

 hive sql std auth select query fails on partitioned tables
 --

 Key: HIVE-6919
 URL: https://issues.apache.org/jira/browse/HIVE-6919
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 0.13.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-6919.1.patch


 {code}
 analyze table studentparttab30k partition (ds) compute statistics;
 Error: Error while compiling statement: FAILED: HiveAccessControlException 
 Permission denied. Principal [name=hadoopqa, type=USER] does not have 
 following privileges on Object [type=PARTITION, name=null] : [SELECT] 
 (state=42000,code=4)
 {code}
 Sql std auth is supposed to ignore partition level objects for privilege 
 checks, but that is not working as intended.





[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema

2014-04-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973327#comment-13973327
 ] 

Xuefu Zhang commented on HIVE-6835:
---

{quote}
 in the Hive code, the AvroSerDe needs to always be initialized with the latest 
schema so that ObjectInspectorConverters.getConvertedOI() (in 
FetchOperator:getRecordReader()) will work.
{quote}

[~erwaman] I guess I don't quite follow this. The exception stack shows that
the casting error happens when reading old data with the partition schema,
which is the old schema. If the schema matches the data, I'm not sure why we'd
have this casting error. On the other hand, if we use the new schema to read
old data, would it be possible that an error might arise?

Anyway, I'm not sure I fully understand the real cause of the problem or how
the change will address all other possible scenarios.

 Reading of partitioned Avro data fails if partition schema does not match 
 table schema
 --

 Key: HIVE-6835
 URL: https://issues.apache.org/jira/browse/HIVE-6835
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Anthony Hsu
Assignee: Anthony Hsu
 Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch


 To reproduce:
 {code}
 create table testarray (a array<string>);
 load data local inpath '/home/ahsu/test/array.txt' into table testarray;
 # create partitioned Avro table with one array column
 create table avroarray partitioned by (y string) row format serde
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties
 ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
 "record", "fields": [ { "name":"a", "type":{"type":"array","items":"string"}
 } ] }') STORED as INPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
 insert into table avroarray partition(y=1) select * from testarray;
 # add an int column with a default value of 0
 alter table avroarray set serde
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with
 serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
 "record", "fields": [ {"name":"intfield","type":"int","default":0},{
 "name":"a", "type":{"type":"array","items":"string"} } ] }');
 # fails with ClassCastException
 select * from avroarray;
 {code}
 The select * fails with:
 {code}
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
 cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
 {code}





[jira] [Created] (HIVE-6928) Beeline should not chop off describe extended results by default

2014-04-17 Thread Szehon Ho (JIRA)
Szehon Ho created HIVE-6928:
---

 Summary: Beeline should not chop off describe extended results 
by default
 Key: HIVE-6928
 URL: https://issues.apache.org/jira/browse/HIVE-6928
 Project: Hive
  Issue Type: Bug
  Components: CLI
Reporter: Szehon Ho


By default, beeline truncates long results based on the console width like:

+-+--+
|  col_name   | 
 |
+-+--+
| pat_id  | string  
 |
| score   | float   
 |
| acutes  | float   
 |
| | 
 |
| Detailed Table Information  | Table(tableName:refills, dbName:default, 
owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto |
+-+--+
5 rows selected (0.4 seconds)

This can be changed with !outputformat, but the default should behave better
to give the first-time beeline user a better experience.








[jira] [Created] (HIVE-6929) hcatalog packaging is not fully integrated with hive

2014-04-17 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-6929:
--

 Summary: hcatalog packaging is not fully integrated with hive
 Key: HIVE-6929
 URL: https://issues.apache.org/jira/browse/HIVE-6929
 Project: Hive
  Issue Type: Task
  Components: HCatalog, WebHCat
Affects Versions: 0.13.0
Reporter: Ashutosh Chauhan


Currently, if you run {{mvn package}}, hcatalog jars are in
{{hcatalog/share/hcatalog}}, and similarly for webhcat jars. All other hive
jars are in lib/, and that's where hcatalog jars should also be. The story is
similar for webhcat. To reduce confusion, it's better that hcatalog follow
hive's dir structure.





[jira] [Commented] (HIVE-6929) hcatalog packaging is not fully integrated with hive

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973340#comment-13973340
 ] 

Ashutosh Chauhan commented on HIVE-6929:


The practical problem this leads to is that hcatalog jars are not available by
default in the hive classpath. So, if you want to make use of hcatalog
functionality, you need to somehow get those into hive's classpath. This makes
for a bad user experience.
cc: [~susanths] [~ekoifman]

 hcatalog packaging is not fully integrated with hive
 

 Key: HIVE-6929
 URL: https://issues.apache.org/jira/browse/HIVE-6929
 Project: Hive
  Issue Type: Task
  Components: HCatalog, WebHCat
Affects Versions: 0.13.0
Reporter: Ashutosh Chauhan

 Currently, if you run {{mvn package}}, hcatalog jars are in
 {{hcatalog/share/hcatalog}}, and similarly for webhcat jars. All other hive
 jars are in lib/, and that's where hcatalog jars should also be. The story is
 similar for webhcat. To reduce confusion, it's better that hcatalog follow
 hive's dir structure.





[jira] [Commented] (HIVE-6929) hcatalog packaging is not fully integrated with hive

2014-04-17 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973369#comment-13973369
 ] 

Sushanth Sowmyan commented on HIVE-6929:


Agreed. We should streamline jar locations for hcat to be in hive standard 
locations for 0.14.

 hcatalog packaging is not fully integrated with hive
 

 Key: HIVE-6929
 URL: https://issues.apache.org/jira/browse/HIVE-6929
 Project: Hive
  Issue Type: Task
  Components: HCatalog, WebHCat
Affects Versions: 0.13.0
Reporter: Ashutosh Chauhan

 Currently, if you run {{mvn package}}, hcatalog jars are in
 {{hcatalog/share/hcatalog}}, and similarly for webhcat jars. All other hive
 jars are in lib/, and that's where hcatalog jars should also be. The story is
 similar for webhcat. To reduce confusion, it's better that hcatalog follow
 hive's dir structure.





[jira] [Resolved] (HIVE-2302) Allow grant privileges on granting privileges.

2014-04-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-2302.


   Resolution: Fixed
Fix Version/s: 0.13.0

This is now possible via SQL standard-based authorization, introduced in
HIVE-5837, which will be available in Hive 0.13.

 Allow grant privileges on granting privileges.
 --

 Key: HIVE-2302
 URL: https://issues.apache.org/jira/browse/HIVE-2302
 Project: Hive
  Issue Type: Improvement
  Components: Authorization, Security
Affects Versions: 0.9.0, 0.10.0, 0.11.0
Reporter: Guy Doulberg
Assignee: Mohammad Kamrul Islam
 Fix For: 0.13.0


 Today any user can grant himself and any other user privileges on schemas
 and tables.
 This way the administrator cannot be sure that the rules he has applied are
 fulfilled.





[jira] [Commented] (HIVE-6922) NullPointerException in collect_set() UDAF

2014-04-17 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973438#comment-13973438
 ] 

Jason Dere commented on HIVE-6922:
--

Would you be able to add a testcase for this bug?

 NullPointerException in collect_set() UDAF
 --

 Key: HIVE-6922
 URL: https://issues.apache.org/jira/browse/HIVE-6922
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Sun Rui
Assignee: Sun Rui
 Attachments: HIVE-6922.patch


 Steps to reproduce the bug:
 {noformat}
 create table temp(key int, value string);
 -- leave the table empty
 select collect_set(key) from temp where key=0;
 Error: java.lang.RuntimeException: Hive Runtime Error while closing 
 operators: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:326)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:471)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:318)
   ... 7 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator.merge(GenericUDAFMkCollectionEvaluator.java:140)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:186)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
   ... 9 more
 {noformat}
 The root cause is that in GenericUDAFMkCollectionEvaluator.merge() 
 partialResult could be null but is not validated before it is used.
 {code}
 List<Object> partialResult = (ArrayList<Object>)
 internalMergeOI.getList(partial);
 for (Object i : partialResult) {
   putIntoCollection(i, myagg);
 }
 {code}
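As the description notes, the missing piece is a null check before iterating. A standalone sketch of the guarded merge pattern (hypothetical class and method names, not the actual patch):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NullGuardDemo {
    // Mimics the shape of GenericUDAFMkCollectionEvaluator.merge(): when the
    // input table is empty, the deserialized partial aggregation can be null,
    // so it must be validated before iterating.
    public static Set<Object> merge(Set<Object> myagg, List<Object> partialResult) {
        if (partialResult != null) {   // the missing validation from this issue
            for (Object i : partialResult) {
                myagg.add(i);
            }
        }
        return myagg;
    }

    public static void main(String[] args) {
        Set<Object> agg = new HashSet<>();
        merge(agg, null);                    // empty input: no NPE
        merge(agg, Arrays.asList(1, 2, 2));  // duplicates collapse, set-style
        System.out.println(agg);
    }
}
```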





[jira] [Updated] (HIVE-6862) add DB schema DDL and upgrade 12to13 scripts for MS SQL Server

2014-04-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6862:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Eugene!
Lefty, I edited that line while committing.

 add DB schema DDL and upgrade 12to13 scripts for MS SQL Server
 --

 Key: HIVE-6862
 URL: https://issues.apache.org/jira/browse/HIVE-6862
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.14.0

 Attachments: HIVE-6862.2.patch, HIVE-6862.3.patch, HIVE-6862.patch


 need to add a unified 0.13 script and a separate script for ACID support
 NO PRECOMMIT TESTS





[jira] [Commented] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash

2014-04-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973455#comment-13973455
 ] 

Sergey Shelukhin commented on HIVE-6924:


Hmm.. cuckoo hashing is a method for conflict resolution, right? This is the 
hash function itself. 
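For context, Murmur-style hashing improves on a naive hashCode by avalanching the bits, so nearby keys land in distant HashMap buckets. A minimal Java sketch of the Murmur3 32-bit finalization mix (illustrative only, not Hive's actual implementation):

```java
public class MurmurMixDemo {
    // Murmur3 32-bit finalization mix (public-domain algorithm by Austin Appleby).
    // Each xor-shift/odd-multiply step is invertible, and together they spread
    // input bits across the whole word, which avoids bucket clustering.
    public static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // Sequential keys map to adjacent buckets under an identity hash;
        // after mixing, their low bits differ widely.
        for (int key = 1; key <= 4; key++) {
            System.out.println(key + " -> bucket " + (fmix32(key) & 15));
        }
    }
}
```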

 MapJoinKeyBytes::hashCode() should use Murmur hash
 --

 Key: HIVE-6924
 URL: https://issues.apache.org/jira/browse/HIVE-6924
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6924.patch


 Existing hashCode is bad, causes HashMap to cluster





Minimum supported versions for DB backing Metastore

2014-04-17 Thread Ashutosh Chauhan
I don't think we have documented anywhere what versions of mysql / postgres
/ oracle / ms-sql we are supporting. It will be good to document those. I
propose the following versions:

* Derby - 10.10.1.1 - defined in hive's pom, so all unit tests run with it.

* MySQL - 5.6.17 - minimum supported version by mysql community

* Postgres - 9.1.13 - has support for "create table if not exists", which is good to have

* Oracle - 11g - oldest oracle version available to download from their site

* MSSQL server - 2008 R2 - the one which is currently tested against.


Thoughts?

Ashutosh


Re: Minimum supported versions for DB backing Metastore

2014-04-17 Thread Brock Noland
Do we have ms-sql scripts these days? Last time I checked we did not. I
think we need them to claim ms-sql support.
On Apr 17, 2014 4:59 PM, Ashutosh Chauhan hashut...@apache.org wrote:

 I don't think we have documented anywhere what versions of mysql / postgres
 / oracle / ms-sql we are supporting. It will be good to document those. I
 propose the following versions:

 * Derby - 10.10.1.1 - defined in hive's pom, so all unit tests run with it.

 * MySQL - 5.6.17 - minimum supported version by mysql community

 * Postgres - 9.1.13 - has support for "create table if not exists", which is good to have

 * Oracle - 11g - oldest oracle version available to download from their site

 * MSSQL server - 2008 R2 - the one which is currently tested against.


 Thoughts?

 Ashutosh



[jira] [Commented] (HIVE-6361) Un-fork Sqlline

2014-04-17 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973500#comment-13973500
 ] 

Julian Hyde commented on HIVE-6361:
---

Agreed. Expect a patch in about a week.

 Un-fork Sqlline
 ---

 Key: HIVE-6361
 URL: https://issues.apache.org/jira/browse/HIVE-6361
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Julian Hyde

 I propose to merge the two development forks of sqlline: Hive's beeline 
 module, and the fork at https://github.com/julianhyde/sqlline.
 How did the forks come about? Hive’s SQL command-line interface Beeline was
 created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was a
 useful but low-activity project languishing on SourceForge without an active
 owner. Around the same time, Julian Hyde independently started a
 github repo based on the same code base. Now several projects are using 
 Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading 
 Lingual and Optiq.
 Merging these two forks will allow us to pool our resources. (Case in point: 
 Drill issue DRILL-327 had already been fixed in a later version of sqlline; 
 it still exists in beeline.)
 I propose the following steps:
 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline.
 2. Port fixes to hive-beeline into hive-sqlline.
 3. Make hive-beeline depend on hive-sqlline, and remove code that is 
 identical. What remains in the hive-beeline module is Beeline.java (a derived 
 class of Sqlline.java) and Hive-specific extensions.
 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline.
 This achieves continuity for Hive’s users, gives the users of the non-Hive 
 sqlline a version with minimal dependencies, unifies the two code lines, and 
 brings everything under the Apache roof.





Re: Minimum supported versions for DB backing Metastore

2014-04-17 Thread Eugene Koifman
we do: https://issues.apache.org/jira/browse/HIVE-6862


On Thu, Apr 17, 2014 at 3:12 PM, Brock Noland br...@cloudera.com wrote:

 Do we have ms-sql scripts these days? Last time I checked we did not. I
 think we need them to claim ms-sql support.
 On Apr 17, 2014 4:59 PM, Ashutosh Chauhan hashut...@apache.org wrote:

  I don't think we have documented anywhere what versions of mysql / postgres
  / oracle / ms-sql we are supporting. It will be good to document those. I
  propose the following versions:

  * Derby - 10.10.1.1 - defined in hive's pom, so all unit tests run with it.

  * MySQL - 5.6.17 - minimum supported version by mysql community

  * Postgres - 9.1.13 - has support for "create table if not exists", which is good to have

  * Oracle - 11g - oldest oracle version available to download from their site

  * MSSQL server - 2008 R2 - the one which is currently tested against.
 
 
  Thoughts?
 
  Ashutosh
 




[jira] [Commented] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973531#comment-13973531
 ] 

Ashutosh Chauhan commented on HIVE-6924:


I see. That's correct. +1

 MapJoinKeyBytes::hashCode() should use Murmur hash
 --

 Key: HIVE-6924
 URL: https://issues.apache.org/jira/browse/HIVE-6924
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6924.patch


 Existing hashCode is bad, causes HashMap to cluster





[jira] [Created] (HIVE-6930) Beeline should nicely format timestamps when displaying results

2014-04-17 Thread Gwen Shapira (JIRA)
Gwen Shapira created HIVE-6930:
--

 Summary: Beeline should nicely format timestamps when displaying 
results
 Key: HIVE-6930
 URL: https://issues.apache.org/jira/browse/HIVE-6930
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Gwen Shapira


When I have a timestamp column in my query, I get the results back as a
bigint with the number of seconds since the epoch. Not very user friendly or
readable. This means that all my queries need to include stuff like:
select from_unixtime(cast(round(transaction_ts/1000) as bigint))...
which is not too readable either :)

Other SQL query tools automatically convert timestamps to some standard 
readable date format. They even let users specify the default formatting by 
setting a parameter (for example NLS_DATE_FORMAT for Oracle).

I'd love to see something like that in beeline.





[jira] [Commented] (HIVE-6843) INSTR for UTF-8 returns incorrect position

2014-04-17 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973566#comment-13973566
 ] 

Jason Dere commented on HIVE-6843:
--

Should this also work for unicode characters which require more than one Java 
character? If you add these checks to TestGenericUDFUtils, the 2nd check fails:
{code}
Assert.assertEquals(3, GenericUDFUtils.findText(new
Text("123\uD801\uDC00456"), new Text("\uD801\uDC00"), 0));
Assert.assertEquals(4, GenericUDFUtils.findText(new
Text("123\uD801\uDC00456"), new Text("4"), 0));
{code}

This would require using String.codePointCount() on the indexOf() result.
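A sketch of that adjustment on plain Strings (hypothetical standalone helper, simplified from the Text-based findText API):

```java
public class CodePointDemo {
    // Returns the position of needle in haystack counted in Unicode code
    // points rather than Java chars, so a surrogate pair counts as one
    // character. Returns -1 when needle is absent.
    public static int findText(String haystack, String needle) {
        int idx = haystack.indexOf(needle);
        return idx < 0 ? -1 : haystack.codePointCount(0, idx);
    }

    public static void main(String[] args) {
        String s = "123\uD801\uDC00456";  // U+10400 occupies two Java chars
        System.out.println(findText(s, "\uD801\uDC00")); // prints 3
        System.out.println(findText(s, "4"));            // prints 4, not 5
    }
}
```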

 INSTR for UTF-8 returns incorrect position
 --

 Key: HIVE-6843
 URL: https://issues.apache.org/jira/browse/HIVE-6843
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.11.0, 0.12.0
Reporter: Clif Kranish
Assignee: Szehon Ho
Priority: Minor
 Attachments: HIVE-6843.patch








[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema

2014-04-17 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973615#comment-13973615
 ] 

Anthony Hsu commented on HIVE-6835:
---

What happens is Hive tries to build ObjectInspectorConverters from the 
partition schema to the table schema.  If the partition schema is different 
from the table schema, you may get a ClassCastException like above.

When you add new columns at the end, this is not a problem because these new 
columns are chopped off.  See ObjectInspectorConverters:StructConverter:
{code}
int minFields = Math.min(inputFields.size(), outputFields.size());
fieldConverters = new ArrayList<Converter>(minFields);
{code}
It's only when you insert new columns at the beginning or in the middle that 
you might run into ClassCastExceptions.

For the AvroSerDe, if it always uses the latest schema (which should be the 
table-level schema), Hive will not get confused when constructing its 
ObjectInspectorConverters.  Then, later, when the AvroSerDe actually goes to 
read the Avro files, it can compare the latest schema with the (possibly old) 
schemas stored in the Avro data files themselves, and do the proper schema 
resolution, omitting fields or substituting default values, following the 
[schema resolution 
rules|http://avro.apache.org/docs/current/spec.html#Schema+Resolution].
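The positional nature of the converter construction can be illustrated with a toy model (hypothetical names, not Hive code) showing why appended columns are harmless while prepended ones misalign types:

```java
import java.util.Arrays;
import java.util.List;

public class SchemaAlignDemo {
    // Position-wise type comparison over the first min(n, m) fields,
    // mirroring the minFields loop in ObjectInspectorConverters.
    public static boolean aligned(List<String> partitionTypes, List<String> tableTypes) {
        int minFields = Math.min(partitionTypes.size(), tableTypes.size());
        for (int i = 0; i < minFields; i++) {
            if (!partitionTypes.get(i).equals(tableTypes.get(i))) {
                return false;   // e.g. a list OI paired with a primitive OI
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("array<string>");
        // Appending a column: the extra table field is simply chopped off.
        System.out.println(aligned(partition, Arrays.asList("array<string>", "int")));
        // Prepending a column: position 0 now pairs array<string> with int.
        System.out.println(aligned(partition, Arrays.asList("int", "array<string>")));
    }
}
```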

 Reading of partitioned Avro data fails if partition schema does not match 
 table schema
 --

 Key: HIVE-6835
 URL: https://issues.apache.org/jira/browse/HIVE-6835
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Anthony Hsu
Assignee: Anthony Hsu
 Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch


 To reproduce:
 {code}
 create table testarray (a array<string>);
 load data local inpath '/home/ahsu/test/array.txt' into table testarray;
 # create partitioned Avro table with one array column
 create table avroarray partitioned by (y string) row format serde
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties
 ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
 "record", "fields": [ { "name":"a", "type":{"type":"array","items":"string"}
 } ] }') STORED as INPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
 insert into table avroarray partition(y=1) select * from testarray;
 # add an int column with a default value of 0
 alter table avroarray set serde
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with
 serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
 "record", "fields": [ {"name":"intfield","type":"int","default":0},{
 "name":"a", "type":{"type":"array","items":"string"} } ] }');
 # fails with ClassCastException
 select * from avroarray;
 {code}
 The select * fails with:
 {code}
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
 cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
 {code}





[jira] [Updated] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema

2014-04-17 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-6835:
--

Status: Patch Available  (was: Open)

 Reading of partitioned Avro data fails if partition schema does not match 
 table schema
 --

 Key: HIVE-6835
 URL: https://issues.apache.org/jira/browse/HIVE-6835
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Anthony Hsu
Assignee: Anthony Hsu
 Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch


 To reproduce:
 {code}
 create table testarray (a arraystring);
 load data local inpath '/home/ahsu/test/array.txt' into table testarray;
 # create partitioned Avro table with one array column
 create table avroarray partitioned by (y string) row format serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties 
 ('avro.schema.literal'='{namespace:test,name:avroarray,type: 
 record, fields: [ { name:a, type:{type:array,items:string} 
 } ] }')  STORED as INPUTFORMAT  
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  OUTPUTFORMAT  
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
 insert into table avroarray partition(y=1) select * from testarray;
 # add an int column with a default value of 0
 alter table avroarray set serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with 
 serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
  "record", "fields": [ {"name":"intfield","type":"int","default":0},{ 
 "name":"a", "type":{"type":"array","items":"string"} } ] }');
 # fails with ClassCastException
 select * from avroarray;
 {code}
 The select * fails with:
 {code}
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
 cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6916) Export/import inherit permissions from parent directory

2014-04-17 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973623#comment-13973623
 ] 

Szehon Ho commented on HIVE-6916:
-

[~xuefuz] can you please help review this?

 Export/import inherit permissions from parent directory
 ---

 Key: HIVE-6916
 URL: https://issues.apache.org/jira/browse/HIVE-6916
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-6916.patch


 Export table into an external location and importing into hive, should set 
 the table to have the permission of the parent directory, if the flag 
 hive.warehouse.subdir.inherit.perms is set.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6430) MapJoin hash table has large memory overhead

2014-04-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6430:
---

Attachment: HIVE-6430.08.patch

Fixed bugs, improved tests; TPCDS q27 can now run on the cluster I have access 
to (it previously failed with OOM even with 8Gb containers). Profiling results are 
actually much better now, with little self time spent in the hashmap.

 MapJoin hash table has large memory overhead
 

 Key: HIVE-6430
 URL: https://issues.apache.org/jira/browse/HIVE-6430
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, 
 HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, 
 HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.patch


 Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 
 for row) can take several hundred bytes, which is ridiculous. I am reducing 
 the size of MJKey and MJRowContainer in other jiras, but in general we don't 
 need to have a Java hash table there.  We can either use a primitive-friendly 
 hashtable like the one from HPPC (Apache-licensed), or some variation, to map 
 primitive keys to a single row storage structure without an object per row 
 (similar to vectorization).
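
The primitive-keyed, object-free idea described above can be illustrated with a minimal open-addressing map from long keys to int offsets into an external flat row buffer. This is a sketch of the general technique only, not Hive's actual implementation; the class and method names are made up for illustration, the capacity must be a power of two, and resizing is omitted.

```java
/** Minimal open-addressing long->int map: no boxed keys, no per-entry objects. */
final class LongToOffsetMap {
    private final long[] keys;
    private final int[] offsets;     // offset into an external flat row buffer
    private final boolean[] used;
    private int size;

    /** capacity must be a power of two and larger than the number of entries. */
    LongToOffsetMap(int capacity) {
        keys = new long[capacity];
        offsets = new int[capacity];
        used = new boolean[capacity];
    }

    void put(long key, int offset) {
        int i = slot(key);
        if (!used[i]) { used[i] = true; keys[i] = key; size++; }
        offsets[i] = offset;          // insert or overwrite
    }

    /** Returns the stored offset, or -1 if the key is absent. */
    int get(long key) {
        int i = slot(key);
        return (used[i] && keys[i] == key) ? offsets[i] : -1;
    }

    // Linear probing; the bit mixer spreads key bits so buckets don't cluster.
    private int slot(long key) {
        int mask = keys.length - 1;
        int i = (int) (mix(key) & mask);
        while (used[i] && keys[i] != key) i = (i + 1) & mask;
        return i;
    }

    private static long mix(long h) {
        h ^= h >>> 33; h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33; h *= 0xc4ceb9fe1a85ec53L;
        return h ^ (h >>> 33);
    }
}
```

Rows would live contiguously in a byte buffer keyed by the stored offsets, so the per-row cost is just one table slot rather than a key object plus a row-container object.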



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 18936: HIVE-6430 MapJoin hash table has large memory overhead

2014-04-17 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18936/
---

(Updated April 18, 2014, 1 a.m.)


Review request for hive, Gopal V and Gunther Hagleitner.


Changes
---

Another iteration


Repository: hive-git


Description
---

See JIRA


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e0e1339 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 5fe35a5 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
 142bfd8 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java bf9d4c1 
  ql/src/java/org/apache/hadoop/hive/ql/debug/Utils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 2b1438d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 1104a2b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java
 8854b19 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java 
9df425b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKey.java 
64f0be2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinPersistableTableContainer.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java 
008a8db 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java
 988959f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java
 55b7415 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java e392592 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 
eef7656 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedColumnarSerDe.java 
d4be78d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 118b339 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestBytesBytesMultiHashMap.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinEqualityTableContainer.java
 65e3779 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java
 755d783 
  ql/src/test/queries/clientpositive/mapjoin_decimal.q b65a7be 
  ql/src/test/queries/clientpositive/mapjoin_mapjoin.q 1eb95f6 
  ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out 8350670 
  ql/src/test/results/clientpositive/tez/mapjoin_decimal.q.out 3c55b5c 
  ql/src/test/results/clientpositive/tez/mapjoin_mapjoin.q.out 284cc03 
  serde/src/java/org/apache/hadoop/hive/serde2/ByteStream.java 73d9b29 
  serde/src/java/org/apache/hadoop/hive/serde2/WriteBuffers.java PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java 
5870884 
  
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java
 bab505e 
  serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDe.java 
6f344bb 
  serde/src/java/org/apache/hadoop/hive/serde2/io/DateWritable.java 1f4ccdd 
  serde/src/java/org/apache/hadoop/hive/serde2/io/HiveDecimalWritable.java 
a99c7b4 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java 
435d6c6 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java 
82c1263 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
b188c3f 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryStruct.java 
caf3517 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
6c14081 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
 06d5c5e 
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyPrimitive.java 
868dd4c 
  
serde/src/test/org/apache/hadoop/hive/serde2/thrift_test/CreateSequenceFile.java
 1fb49e5 

Diff: https://reviews.apache.org/r/18936/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema

2014-04-17 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973654#comment-13973654
 ] 

Anthony Hsu commented on HIVE-6835:
---

On a side note: If you create an Avro table and store the schema in the 
TBLPROPERTIES -
{code}
CREATE TABLE ... TBLPROPERTIES ('avro.schema.literal'='...');
{code}
\- everything works fine with partitions because TBLPROPERTIES are NOT copied 
to the partition, so the partition will end up using the TBLPROPERTIES for 
initializing the Avro SerDe.

It's only when you store the schema in the SERDEPROPERTIES -
{code}
CREATE TABLE ... WITH SERDEPROPERTIES ('avro.schema.literal'='...');
{code}
\- that problems arise.  SERDEPROPERTIES DO get copied to the partitions, so if 
you then end up changing the SERDEPROPERTIES stored at the table level, the 
SERDEPROPERTIES in the table and the partitions get out of sync and this 
sometimes leads to ClassCastExceptions with the AvroSerDe.
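
Following the behavior described above, the safe pattern is to declare the schema only in TBLPROPERTIES, which is never copied into partition metadata. The DDL below is a hedged sketch of the reproduction case rewritten that way (table and column names taken from the example in this thread; exact syntax may vary by Hive version):

```sql
-- Keep avro.schema.literal out of SERDEPROPERTIES so it is never copied
-- into partition metadata; partitions always read the table-level copy.
CREATE TABLE avroarray PARTITIONED BY (y string)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS
    INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES ('avro.schema.literal'=
    '{"namespace":"test","name":"avroarray","type":"record",
      "fields":[{"name":"a","type":{"type":"array","items":"string"}}]}');

-- Evolving the schema later touches only the table object, so existing
-- partitions cannot fall out of sync with it.
ALTER TABLE avroarray SET TBLPROPERTIES ('avro.schema.literal'=
  '{"namespace":"test","name":"avroarray","type":"record",
    "fields":[{"name":"intfield","type":"int","default":0},
              {"name":"a","type":{"type":"array","items":"string"}}]}');
```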

 Reading of partitioned Avro data fails if partition schema does not match 
 table schema
 --

 Key: HIVE-6835
 URL: https://issues.apache.org/jira/browse/HIVE-6835
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Anthony Hsu
Assignee: Anthony Hsu
 Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch


 To reproduce:
 {code}
 create table testarray (a array&lt;string&gt;);
 load data local inpath '/home/ahsu/test/array.txt' into table testarray;
 # create partitioned Avro table with one array column
 create table avroarray partitioned by (y string) row format serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties 
 ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type": 
 "record", "fields": [ { "name":"a", "type":{"type":"array","items":"string"} 
 } ] }')  STORED as INPUTFORMAT  
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  OUTPUTFORMAT  
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
 insert into table avroarray partition(y=1) select * from testarray;
 # add an int column with a default value of 0
 alter table avroarray set serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with 
 serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
  "record", "fields": [ {"name":"intfield","type":"int","default":0},{ 
 "name":"a", "type":{"type":"array","items":"string"} } ] }');
 # fails with ClassCastException
 select * from avroarray;
 {code}
 The select * fails with:
 {code}
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
 cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash

2014-04-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973661#comment-13973661
 ] 

Sergey Shelukhin commented on HIVE-6924:


Will commit tomorrow

 MapJoinKeyBytes::hashCode() should use Murmur hash
 --

 Key: HIVE-6924
 URL: https://issues.apache.org/jira/browse/HIVE-6924
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6924.patch


 Existing hashCode is bad, causes HashMap to cluster



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead

2014-04-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973660#comment-13973660
 ] 

Sergey Shelukhin commented on HIVE-6430:


er, 72

 MapJoin hash table has large memory overhead
 

 Key: HIVE-6430
 URL: https://issues.apache.org/jira/browse/HIVE-6430
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, 
 HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, 
 HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.patch


 Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 
 for row) can take several hundred bytes, which is ridiculous. I am reducing 
 the size of MJKey and MJRowContainer in other jiras, but in general we don't 
 need to have a Java hash table there.  We can either use a primitive-friendly 
 hashtable like the one from HPPC (Apache-licensed), or some variation, to map 
 primitive keys to a single row storage structure without an object per row 
 (similar to vectorization).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6931) Windows unit test fixes

2014-04-17 Thread Jason Dere (JIRA)
Jason Dere created HIVE-6931:


 Summary: Windows unit test fixes
 Key: HIVE-6931
 URL: https://issues.apache.org/jira/browse/HIVE-6931
 Project: Hive
  Issue Type: Bug
  Components: Tests, Windows
Reporter: Jason Dere
Assignee: Jason Dere


A few misc fixes for some of the unit tests on Windows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Minimum supported versions for DB backing Metastore

2014-04-17 Thread Thejas Nair
+1 Sounds good to me.

On Thu, Apr 17, 2014 at 2:58 PM, Ashutosh Chauhan hashut...@apache.org wrote:
 I don't think we have documented anywhere what versions of mysql / postgres
 / oracle / ms-sql we are supporting. It would be good to document those. I
 propose the following versions:

 * Derby -   10.10.1.1  - defined in hive's pom, so all unit
 tests run with it.

 * MySQL   -   5.6.17   - minimum supported version by mysql
 community

 * Postgres -  9.1.13- has support for create table if not
 exists which is good to have

 * Oracle -  11g   - oldest oracle version available to
 download from their site

 * MSSQL server -  2008 R2 - one which is currently tested against.


 Thoughts?

 Ashutosh

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


[jira] [Updated] (HIVE-6931) Windows unit test fixes

2014-04-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-6931:
-

Attachment: HIVE-6931.1.patch

Patch v1:
- Remove setAuxJars() call which was breaking Minimr tests
- Refactor common code between QTestUtil/WindowsPathUtil
- TestExecDriver should initialize tmpdir after converting Windows paths
- Fix a couple of q file tests

 Windows unit test fixes
 ---

 Key: HIVE-6931
 URL: https://issues.apache.org/jira/browse/HIVE-6931
 Project: Hive
  Issue Type: Bug
  Components: Tests, Windows
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-6931.1.patch


 A few misc fixes for some of the unit tests on Windows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 20472: HIVE-6931 Windows unit test fixes

2014-04-17 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20472/
---

Review request for hive and Thejas Nair.


Bugs: HIVE-6931
https://issues.apache.org/jira/browse/HIVE-6931


Repository: hive-git


Description
---

Remove setAuxJars() call which was breaking Minimr tests
Refactor common code between QTestUtil/WindowsPathUtil
TestExecDriver should initialize tmpdir after converting Windows paths
Fix a couple of q file tests


Diffs
-

  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java d6e33f8 
  pom.xml 426dca8 
  ql/src/test/org/apache/hadoop/hive/ql/WindowsPathUtil.java 131260b 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java b548672 
  ql/src/test/queries/clientpositive/scriptfile1_win.q 0008ae5 
  ql/src/test/queries/clientpositive/tez_insert_overwrite_local_directory_1.q 
d7a652f 
  ql/src/test/results/clientpositive/scriptfile1_win.q.out dfaa057 

Diff: https://reviews.apache.org/r/20472/diff/


Testing
---


Thanks,

Jason Dere



[jira] [Updated] (HIVE-6931) Windows unit test fixes

2014-04-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-6931:
-

Status: Patch Available  (was: Open)

 Windows unit test fixes
 ---

 Key: HIVE-6931
 URL: https://issues.apache.org/jira/browse/HIVE-6931
 Project: Hive
  Issue Type: Bug
  Components: Tests, Windows
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-6931.1.patch


 A few misc fixes for some of the unit tests on Windows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6922) NullPointerException in collect_set() UDAF

2014-04-17 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973698#comment-13973698
 ] 

Sun Rui commented on HIVE-6922:
---

[~jdere] I thought the bug was trivial, just an accidentally omitted null 
pointer check, so a testcase for it would also be trivial. However, if you still 
prefer a testcase, I can add it.

 NullPointerException in collect_set() UDAF
 --

 Key: HIVE-6922
 URL: https://issues.apache.org/jira/browse/HIVE-6922
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Sun Rui
Assignee: Sun Rui
 Attachments: HIVE-6922.patch


 Steps to reproduce the bug:
 {noformat}
 create table temp(key int, value string);
 -- leave the table empty
 select collect_set(key) from temp where key=0;
 Error: java.lang.RuntimeException: Hive Runtime Error while closing 
 operators: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:326)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:471)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:318)
   ... 7 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator.merge(GenericUDAFMkCollectionEvaluator.java:140)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:186)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
   ... 9 more
 {noformat}
 The root cause is that in GenericUDAFMkCollectionEvaluator.merge() 
 partialResult could be null but is not validated before it is used.
 {code}
 List&lt;Object&gt; partialResult = (ArrayList&lt;Object&gt;) 
 internalMergeOI.getList(partial);
 for(Object i : partialResult) {
   putIntoCollection(i, myagg);
 }
 {code}
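
The guard needed here can be sketched independently of Hive's classes: a merge that tolerates a null partial aggregation, as the GenericUDAFEvaluator contract allows when a task saw no input rows. The class and field names below are illustrative only, not Hive's actual code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy collect_set-style aggregation buffer with a null-safe merge. */
final class CollectSetBuffer {
    final Set<Object> container = new HashSet<>();

    /** Merge a partial result; `partial` may be null when there was no input. */
    void merge(List<Object> partial) {
        if (partial == null) {   // guard: empty input produces no partial result
            return;
        }
        for (Object o : partial) {
            container.add(o);    // set semantics: duplicates collapse
        }
    }
}
```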



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6932) hive README needs update

2014-04-17 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973707#comment-13973707
 ] 

Thejas M Nair commented on HIVE-6932:
-

Also needing update is the requirements section. We should include Java 1.7.



 hive README needs update
 

 Key: HIVE-6932
 URL: https://issues.apache.org/jira/browse/HIVE-6932
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Thejas M Nair

 It needs to be updated to include Tez as a runtime. Also, it talks about 
 average latency being in minutes, which is very misleading.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6932) hive README needs update

2014-04-17 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-6932:
---

 Summary: hive README needs update
 Key: HIVE-6932
 URL: https://issues.apache.org/jira/browse/HIVE-6932
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Thejas M Nair


It needs to be updated to include Tez as a runtime. Also, it talks about 
average latency being in minutes, which is very misleading.




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6932) hive README needs update

2014-04-17 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973708#comment-13973708
 ] 

Thejas M Nair commented on HIVE-6932:
-

Also add MS SQL in databases supported (for 0.14) release.


 hive README needs update
 

 Key: HIVE-6932
 URL: https://issues.apache.org/jira/browse/HIVE-6932
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Thejas M Nair

 It needs to be updated to include Tez as a runtime. Also, it talks about 
 average latency being in minutes, which is very misleading.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HIVE-6932) hive README needs update

2014-04-17 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973708#comment-13973708
 ] 

Thejas M Nair edited comment on HIVE-6932 at 4/18/14 2:09 AM:
--

Also add Microsoft SQL Server in databases supported (for 0.14) release.



was (Author: thejas):
Also add MS SQL in databases supported (for 0.14) release.


 hive README needs update
 

 Key: HIVE-6932
 URL: https://issues.apache.org/jira/browse/HIVE-6932
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Thejas M Nair

 It needs to be updated to include Tez as a runtime. Also, it talks about 
 average latency being in minutes, which is very misleading.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6922) NullPointerException in collect_set() UDAF

2014-04-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973710#comment-13973710
 ] 

Xuefu Zhang commented on HIVE-6922:
---

Yes, adding the null check is trivial, but I guess it's more important to know 
why the variable might be null. Otherwise, the null check might just hide another bug.

 NullPointerException in collect_set() UDAF
 --

 Key: HIVE-6922
 URL: https://issues.apache.org/jira/browse/HIVE-6922
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Sun Rui
Assignee: Sun Rui
 Attachments: HIVE-6922.patch


 Steps to reproduce the bug:
 {noformat}
 create table temp(key int, value string);
 -- leave the table empty
 select collect_set(key) from temp where key=0;
 Error: java.lang.RuntimeException: Hive Runtime Error while closing 
 operators: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:326)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:471)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:318)
   ... 7 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator.merge(GenericUDAFMkCollectionEvaluator.java:140)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:186)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
   ... 9 more
 {noformat}
 The root cause is that in GenericUDAFMkCollectionEvaluator.merge() 
 partialResult could be null but is not validated before it is used.
 {code}
 List&lt;Object&gt; partialResult = (ArrayList&lt;Object&gt;) 
 internalMergeOI.getList(partial);
 for(Object i : partialResult) {
   putIntoCollection(i, myagg);
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [VOTE] Apache Hive 0.13.0 Release Candidate 2

2014-04-17 Thread Thejas Nair
+1
- Verified the md5 checksums and gpg keys
- Checked LICENSE, README.txt , NOTICE, RELEASE_NOTES.txt files
- Build src tar.gz
- Ran local mode queries with new build.

I had run unit test suite with rc1 and they looked good.


On Tue, Apr 15, 2014 at 2:06 PM, Harish Butani rhbut...@apache.org wrote:
 Apache Hive 0.13.0 Release Candidate 2 is available here:

 http://people.apache.org/~rhbutani/hive-0.13.0-candidate-2

 Maven artifacts are available here:

 https://repository.apache.org/content/repositories/orgapachehive-1011

 Source tag for RCN is at:
 https://svn.apache.org/repos/asf/hive/tags/release-0.13.0-rc2/

 Voting will conclude in 72 hours.

 Hive PMC Members: Please test and vote.

 Thanks.



[jira] [Updated] (HIVE-6913) Hive unable to find the hashtable file during complex multi-staged map join

2014-04-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-6913:
--

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks Brock for the fix.

 Hive unable to find the hashtable file during complex multi-staged map join
 ---

 Key: HIVE-6913
 URL: https://issues.apache.org/jira/browse/HIVE-6913
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-6913.patch, HIVE-6913.patch


 If a query has multiple mapjoins and one of the tables to be mapjoined is 
 empty, the query can fail with a "no such file or directory" error when looking for 
 the hashtable.
 This is because when we generate a dummy hash table, we do not close the 
 TableScan (TS) operator for that table. Additionally, HashTableSinkOperator 
 (HTSO) outputs its hash tables in the closeOp method. However, when close is 
 called on HTSO we check to ensure that all parents are closed: 
 https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java#L333
 which is not true in this case, because the TS operator for the empty table 
 was never closed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema

2014-04-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973722#comment-13973722
 ] 

Xuefu Zhang commented on HIVE-6835:
---

[~erwaman] Thanks for the explanation. Now I see where the problem is. 
SERDEPROPERTIES and TBLPROPERTIES serve different purposes. I'm curious why a 
user would put avro.schema.literal in the serde properties, as this is table 
specific and should be put in TBLPROPERTIES. SERDEPROPERTIES, on the other 
hand, is used to control serde behavior (plugin level instead of table level), 
such as a field delimiter, which doesn't necessarily vary from table to table. If 
you check the AvroSerDe documentation, the schema is specified in TBLPROPERTIES: 
https://cwiki.apache.org/confluence/display/Hive/AvroSerDe. Thus, it seems that 
this fix is for an invalid use case. What's your thought on this?

 Reading of partitioned Avro data fails if partition schema does not match 
 table schema
 --

 Key: HIVE-6835
 URL: https://issues.apache.org/jira/browse/HIVE-6835
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Anthony Hsu
Assignee: Anthony Hsu
 Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch


 To reproduce:
 {code}
 create table testarray (a array&lt;string&gt;);
 load data local inpath '/home/ahsu/test/array.txt' into table testarray;
 # create partitioned Avro table with one array column
 create table avroarray partitioned by (y string) row format serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties 
 ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type": 
 "record", "fields": [ { "name":"a", "type":{"type":"array","items":"string"} 
 } ] }')  STORED as INPUTFORMAT  
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  OUTPUTFORMAT  
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
 insert into table avroarray partition(y=1) select * from testarray;
 # add an int column with a default value of 0
 alter table avroarray set serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with 
 serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
  "record", "fields": [ {"name":"intfield","type":"int","default":0},{ 
 "name":"a", "type":{"type":"array","items":"string"} } ] }');
 # fails with ClassCastException
 select * from avroarray;
 {code}
 The select * fails with:
 {code}
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
 cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6922) NullPointerException in collect_set() UDAF

2014-04-17 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973727#comment-13973727
 ] 

Sun Rui commented on HIVE-6922:
---

[~xuefuz] The reason for the variable being null is that the table is empty and 
thus there is no input data. 
{code}
  /**
   * Merge with partial aggregation result. NOTE: null might be passed in case
   * there is no input data.
   * 
   * @param partial
   *  The partial aggregation result.
   */
  public abstract void merge(AggregationBuffer agg, Object partial) throws 
HiveException;
{code}
The description for merge() in GenericUDAFEvaluator states that null might be 
passed when there is no input data.
I found existing examples of checking whether partial is null; 
GenericUDAFComputeStats is one:
{code}
@Override
public void merge(AggregationBuffer agg, Object partial) throws 
HiveException {
  if (partial != null) {
...
  }
}
{code}

 NullPointerException in collect_set() UDAF
 --

 Key: HIVE-6922
 URL: https://issues.apache.org/jira/browse/HIVE-6922
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Sun Rui
Assignee: Sun Rui
 Attachments: HIVE-6922.patch


 Steps to reproduce the bug:
 {noformat}
 create table temp(key int, value string);
 -- leave the table empty
 select collect_set(key) from temp where key=0;
 Error: java.lang.RuntimeException: Hive Runtime Error while closing 
 operators: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:326)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:471)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:318)
   ... 7 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator.merge(GenericUDAFMkCollectionEvaluator.java:140)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:186)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
   ... 9 more
 {noformat}
 The root cause is that in GenericUDAFMkCollectionEvaluator.merge() 
 partialResult could be null but is not validated before it is used.
 {code}
 List<Object> partialResult = (ArrayList<Object>) 
 internalMergeOI.getList(partial);
 for(Object i : partialResult) {
   putIntoCollection(i, myagg);
 }
 {code}





[jira] [Commented] (HIVE-6843) INSTR for UTF-8 returns incorrect position

2014-04-17 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973730#comment-13973730
 ] 

Szehon Ho commented on HIVE-6843:
-

Thanks for the review.  As I understand, you are passing a string literal to 
the Text constructor, so it is not interpreting \uD801 as one char; there are 
actually 6 chars there: '\', 'u', 'D', '8', '0', '1'.

I tried the following test and it seemed to work:

char[] chararray = new char[] {'1', '2', '3', '\uD801', '\uDC00', '4', '5', 
'6'};
String str = new String(chararray);
Assert.assertEquals(5, GenericUDFUtils.findText(new Text(str), new 
Text("4"), 0));

I guess the second check was supposed to be 5, not 4.
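The surrogate-pair behavior is easy to confirm with plain JDK calls. This 
sketch (no Hive dependency; it does not call findText itself) shows why the 
char index of '4' is 5 while only 4 code points precede it:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // '\uD801' '\uDC00' together encode one supplementary code point
        // (U+10400) as a surrogate pair, i.e. two Java chars.
        char[] chars = {'1', '2', '3', '\uD801', '\uDC00', '4', '5', '6'};
        String str = new String(chars);

        int charIndex = str.indexOf("4");                  // counts UTF-16 chars
        int codePoints = str.codePointCount(0, charIndex); // counts characters

        System.out.println(charIndex);  // 5: the pair occupies two chars
        System.out.println(codePoints); // 4: the pair counts as one character
    }
}
```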

 INSTR for UTF-8 returns incorrect position
 --

 Key: HIVE-6843
 URL: https://issues.apache.org/jira/browse/HIVE-6843
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.11.0, 0.12.0
Reporter: Clif Kranish
Assignee: Szehon Ho
Priority: Minor
 Attachments: HIVE-6843.patch







