[jira] [Commented] (HIVE-784) Support uncorrelated subqueries in the WHERE clause

2013-07-11 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705581#comment-13705581
 ] 

Navis commented on HIVE-784:


Added some comments

 Support uncorrelated subqueries in the WHERE clause
 ---

 Key: HIVE-784
 URL: https://issues.apache.org/jira/browse/HIVE-784
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Matthew Weaver
 Attachments: HIVE-784.1.patch.txt


 Hive currently only supports views in the FROM clause; some Facebook use cases 
 suggest that Hive should support subqueries, such as those connected by 
 IN/EXISTS, in the WHERE clause. 
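As a sketch, the requested feature covers query shapes like `SELECT ... WHERE col IN (SELECT col FROM other)`. Since the subquery is uncorrelated, an engine can materialize the inner result once and filter the outer rows against it. The Python below models that evaluation; table and column names are hypothetical, not from the issue:

```python
# Hypothetical data; table and column names are illustrative only.
orders = [
    {"order_id": 1, "cust_id": 10},
    {"order_id": 2, "cust_id": 20},
    {"order_id": 3, "cust_id": 30},
]
vip_customers = [{"cust_id": 10}, {"cust_id": 30}]

# HiveQL shape being requested:
#   SELECT * FROM orders
#   WHERE cust_id IN (SELECT cust_id FROM vip_customers)
# The subquery is uncorrelated, so it can be evaluated exactly once:
vip_ids = {row["cust_id"] for row in vip_customers}

result = [row for row in orders if row["cust_id"] in vip_ids]
print([row["order_id"] for row in result])  # → [1, 3]
```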

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3691) TestDynamicSerDe failed with IBM JDK

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705679#comment-13705679
 ] 

Hudson commented on HIVE-3691:
--

Integrated in Hive-trunk-hadoop2 #282 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/282/])
HIVE-3691 : TestDynamicSerDe failed with IBM JDK (Bing Li & Renata Ghisloti 
via Ashutosh Chauhan) (Revision 1501687)

 Result = ABORTED
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501687
Files : 
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/dynamic_type/TestDynamicSerDe.java


 TestDynamicSerDe failed with IBM JDK
 

 Key: HIVE-3691
 URL: https://issues.apache.org/jira/browse/HIVE-3691
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.1, 0.8.0, 0.9.0
 Environment: ant-1.8.2, IBM JDK 1.6
Reporter: Bing Li
Assignee: Bing Li
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-3691.1.patch-trunk.txt, HIVE-3691.1.patch.txt


 The order of the output in the golden file differs between JDKs.
 The root cause is the implementation of HashMap in the JDK.
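One common remedy for this class of failure (a sketch, not the attached patch) is to make the golden-file comparison independent of hash-map iteration order, which is implementation-defined:

```python
def normalize(lines):
    # Sort serialized map entries so the comparison does not depend on
    # the JDK's HashMap iteration order.
    return sorted(line.rstrip("\n") for line in lines)

golden = ["b=2", "a=1"]  # entry order produced by one JDK
actual = ["a=1", "b=2"]  # same entries, ordered differently by another JDK

print(golden == actual)                        # → False
print(normalize(golden) == normalize(actual))  # → True
```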



[jira] [Commented] (HIVE-4807) Hive metastore hangs

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705678#comment-13705678
 ] 

Hudson commented on HIVE-4807:
--

Integrated in Hive-trunk-hadoop2 #282 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/282/])
HIVE-4807 : Hive metastore hangs (Sarvesh Sakalanaga via Ashutosh Chauhan) 
(Revision 1501675)

 Result = ABORTED
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501675
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ivy/libraries.properties
* /hive/trunk/jdbc/build.xml
* /hive/trunk/metastore/ivy.xml


 Hive metastore hangs
 

 Key: HIVE-4807
 URL: https://issues.apache.org/jira/browse/HIVE-4807
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga
 Fix For: 0.12.0

 Attachments: Hive-4807.0.patch, Hive-4807.1.patch, Hive-4807.2.patch


  Hive metastore hangs (does not accept any new connections) due to a bug in 
  DBCP. The root cause analysis is here: 
  https://issues.apache.org/jira/browse/DBCP-398. The fix is to change the Hive 
  connection pool to BoneCP, which is natively supported by DataNucleus.  
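For context, DataNucleus selects its connection pool implementation via the `datanucleus.connectionPoolingType` property, so switching the metastore off DBCP amounts to a setting like the following (a sketch; the exact property value and casing should be verified against the DataNucleus version bundled with Hive):

```properties
datanucleus.connectionPoolingType=BoneCP
```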



[jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-11 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4838:
---

Attachment: HIVE-4838.patch

Running tests on the attached patch.

 Refactor MapJoin HashMap code to improve testability and readability
 

 Key: HIVE-4838
 URL: https://issues.apache.org/jira/browse/HIVE-4838
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-4838.patch


 MapJoin is an essential component for high-performance joins in Hive, and the 
 current code has done great service for many years. However, the code is 
 showing its age and currently suffers from the following issues:
 * Uses static state via the MapJoinMetaData class to pass serialization 
 metadata to the Key and Row classes.
 * The API of a logical table container is not defined, so it's unclear which 
 APIs HashMapWrapper needs to publicize. Additionally, HashMapWrapper has many 
 unused public methods.
 * HashMapWrapper contains logic to serialize, test memory bounds, and 
 implement the table container. Ideally these logical units would be separated.
 * HashTableSinkObjectCtx has unused fields and unused methods.
 * CommonJoinOperator and its children use ArrayList on the left-hand side when 
 only List is required.
 * There are unused classes (MRU, DCLLItem) and classes which duplicate 
 functionality (MapJoinSingleKey and MapJoinDoubleKeys).
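The first bullet (static state via MapJoinMetaData) is the part most amenable to a small illustration. The sketch below, with hypothetical names rather than Hive's actual classes, shows the direction of such a refactoring: metadata injected through a context object instead of looked up from a global registry, so the key/row classes become testable in isolation:

```python
class SerializationContext:
    """Carries serialization metadata explicitly instead of via a
    static registry, so dependent classes can be unit-tested alone."""
    def __init__(self, metadata):
        self.metadata = metadata

class MapJoinKey:
    def __init__(self, ctx, fields):
        self.ctx = ctx          # injected dependency, no global lookup
        self.fields = fields

    def key_types(self):
        return self.ctx.metadata["key_types"]

ctx = SerializationContext({"key_types": ["int", "string"]})
key = MapJoinKey(ctx, [1, "x"])
print(key.key_types())  # → ['int', 'string']
```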



[jira] [Updated] (HIVE-2991) Integrate Clover with Hive

2013-07-11 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated HIVE-2991:
-

Attachment: HIVE-clover-trunk--N1.patch
HIVE-clover-branch-0.11--N1.patch
HIVE-clover-branch-0.10--N1.patch

The attached patches HIVE-clover-xxx.patch are somewhat updated versions of 
the Clover integration. We used them in parallel builds.

Besides the Clover integration itself, the patches introduce the following changes:
1) The .q-file test generator was changed to split the generated test classes 
into groups of tests (10 test cases per class is the default). This avoids huge 
test classes, which is needed for parallelized and distributed builds.
2) Added a test-lightweight target that allows running a batch of tests without 
re-generation/re-compilation. This is badly needed in parallelized and 
distributed builds.
3) Introduced a testcase-list parameter that allows passing several test class 
names to execute. The names are passed as a comma-separated list, with each 
name in the form **/a/b/c/TestFoo.*. The trailing asterisk is needed because 
the main project accepts .class names, while HCatalog accepts .java names.
4) Several more improvements related to Clover instrumentation, reporting, 
etc. 

 Integrate Clover with Hive
 --

 Key: HIVE-2991
 URL: https://issues.apache.org/jira/browse/HIVE-2991
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Affects Versions: 0.9.0
Reporter: Ashutosh Chauhan
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, 
 hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, 
 hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, 
 hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, 
 hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, 
 hive.2991.3.trunk.patch, hive.2991.4.branch-0.10.patch, 
 hive.2991.4.branch-0.9.patch, hive.2991.4.trunk.patch, 
 HIVE-clover-branch-0.10--N1.patch, HIVE-clover-branch-0.11--N1.patch, 
 HIVE-clover-trunk--N1.patch, hive-trunk-clover-html-report.zip


 Atlassian has donated a license for their code coverage tool Clover to the 
 ASF. Let's make use of it to generate a code coverage report to figure out 
 which areas of Hive are well tested and which ones are not. More information 
 about the license can be found in Hadoop JIRA HADOOP-1718. 



[jira] [Commented] (HIVE-4675) Create new parallel unit test environment

2013-07-11 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705943#comment-13705943
 ] 

Vikram Dixit K commented on HIVE-4675:
--

I used this framework to run tests on Hive on a single node. It took about half 
the time it normally takes, which is great. However, I am unable to figure out 
the failing tests. I got a message that goes:

TestOrcHCatLoader has one or more failing tests... Also, it doesn't seem like 
the output is integrated with the ant testreport target. It would be great to 
see a summary of failing tests. Could you please elaborate on how to get an 
idea of the failing tests?

Thanks!

 Create new parallel unit test environment
 -

 Key: HIVE-4675
 URL: https://issues.apache.org/jira/browse/HIVE-4675
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.12.0

 Attachments: HIVE-4675.patch


 The current ptest tool is great, but it has the following limitations:
 - Requires an NFS filer
 - Unless the NFS filer is dedicated, ptests can easily become IO-bound
 - Investigating failures is troublesome because the source directory for the 
 failure is not saved
 - Ignoring or isolating tests is not supported
 - No unit tests for the ptest framework exist
 It'd be great to have a ptest tool that addresses these limitations.



[jira] [Commented] (HIVE-4675) Create new parallel unit test environment

2013-07-11 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705947#comment-13705947
 ] 

Brock Noland commented on HIVE-4675:


Hi,

Great to hear! The TEST-*.xml files should be in the logs directory in the 
working dir. Typically we run this via Jenkins, and then in the Jenkins build 
script we copy the TEST-*.xml files into a directory for Jenkins to parse.
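A sketch of pulling a failure summary out of those files, using the common JUnit-style report attributes; the directory layout and suite name below are fabricated for the demo:

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET

def summarize(report_dir):
    """Summarize JUnit-style TEST-*.xml reports: suite name -> failure count."""
    failing = {}
    for path in glob.glob(os.path.join(report_dir, "TEST-*.xml")):
        suite = ET.parse(path).getroot()
        bad = int(suite.get("failures", "0")) + int(suite.get("errors", "0"))
        if bad:
            failing[suite.get("name")] = bad
    return failing

# Tiny demo with a fabricated report file.
d = tempfile.mkdtemp()
with open(os.path.join(d, "TEST-TestOrcHCatLoader.xml"), "w") as f:
    f.write('<testsuite name="TestOrcHCatLoader" failures="1" errors="0"/>')
print(summarize(d))  # → {'TestOrcHCatLoader': 1}
```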

I think we could generate some kind of report as well; did you want to create 
an enhancement request describing what you'd like?

Brock

 Create new parallel unit test environment
 -

 Key: HIVE-4675
 URL: https://issues.apache.org/jira/browse/HIVE-4675
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.12.0

 Attachments: HIVE-4675.patch


 The current ptest tool is great, but it has the following limitations:
 - Requires an NFS filer
 - Unless the NFS filer is dedicated, ptests can easily become IO-bound
 - Investigating failures is troublesome because the source directory for the 
 failure is not saved
 - Ignoring or isolating tests is not supported
 - No unit tests for the ptest framework exist
 It'd be great to have a ptest tool that addresses these limitations.



[jira] [Assigned] (HIVE-2991) Integrate Clover with Hive

2013-07-11 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky reassigned HIVE-2991:


Assignee: Ivan A. Veselovsky

 Integrate Clover with Hive
 --

 Key: HIVE-2991
 URL: https://issues.apache.org/jira/browse/HIVE-2991
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Affects Versions: 0.9.0
Reporter: Ashutosh Chauhan
Assignee: Ivan A. Veselovsky
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, 
 hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, 
 hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, 
 hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, 
 hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, 
 hive.2991.3.trunk.patch, hive.2991.4.branch-0.10.patch, 
 hive.2991.4.branch-0.9.patch, hive.2991.4.trunk.patch, 
 HIVE-clover-branch-0.10--N1.patch, HIVE-clover-branch-0.11--N1.patch, 
 HIVE-clover-trunk--N1.patch, hive-trunk-clover-html-report.zip


 Atlassian has donated a license for their code coverage tool Clover to the 
 ASF. Let's make use of it to generate a code coverage report to figure out 
 which areas of Hive are well tested and which ones are not. More information 
 about the license can be found in Hadoop JIRA HADOOP-1718. 



[jira] [Commented] (HIVE-4160) Vectorized Query Execution in Hive

2013-07-11 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706089#comment-13706089
 ] 

Jitendra Nath Pandey commented on HIVE-4160:


Dmitry, Vinod,
  There is a significant amount of vectorization work in expression evaluation, 
for example arithmetic expressions, logical expressions, aggregations, etc. 
Many of these expressions are pretty generic, and different systems are likely 
to have similar semantics for them. It should be possible to re-use this code 
with little change in Pig or other systems. Re-using these expressions would 
require using the same vectorized representation of data in the processing 
engine, but that part of the code is also generic and re-usable. I think that 
could be a good starting point.
  However, a bunch of the vectorization work is in operator code, where we have 
vectorized versions of the Hive operators. These operators are closely tied to 
Hive semantics and implementation. Therefore, it will need some restructuring 
in the Hive code base as well to generalize these operators for re-use in other 
projects. Also, at this point we should be thinking more generally about a 
common physical layer shared between Pig and Hive. These languages can continue 
to have different logical plans, but it would be desirable for them to share a 
common physical plan structure, because they both use the same map-reduce 
runtime.

 Vectorized Query Execution in Hive
 --

 Key: HIVE-4160
 URL: https://issues.apache.org/jira/browse/HIVE-4160
 Project: Hive
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: Hive-Vectorized-Query-Execution-Design.docx, 
 Hive-Vectorized-Query-Execution-Design-rev2.docx, 
 Hive-Vectorized-Query-Execution-Design-rev3.docx, 
 Hive-Vectorized-Query-Execution-Design-rev3.docx, 
 Hive-Vectorized-Query-Execution-Design-rev3.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev4.docx, 
 Hive-Vectorized-Query-Execution-Design-rev4.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev5.docx, 
 Hive-Vectorized-Query-Execution-Design-rev5.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev6.docx, 
 Hive-Vectorized-Query-Execution-Design-rev6.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev7.docx, 
 Hive-Vectorized-Query-Execution-Design-rev8.docx, 
 Hive-Vectorized-Query-Execution-Design-rev8.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev9.docx, 
 Hive-Vectorized-Query-Execution-Design-rev9.pdf


 The Hive query execution engine currently processes one row at a time. A 
 single row of data goes through all the operators before the next row can be 
 processed. This mode of processing is very inefficient in terms of CPU usage. 
 Research has demonstrated that this yields very low instructions per cycle 
 [MonetDB X100]. Also currently Hive heavily relies on lazy deserialization 
 and data columns go through a layer of object inspectors that identify column 
 type, deserialize data and determine appropriate expression routines in the 
 inner loop. These layers of virtual method calls further slow down the 
 processing. 
 This work will add support for vectorized query execution to Hive, where, 
 instead of individual rows, batches of about a thousand rows at a time are 
 processed. Each column in the batch is represented as a vector of a primitive 
 data type. The inner loop of execution scans these vectors very fast, 
 avoiding method calls, deserialization, unnecessary if-then-else, etc. This 
 substantially reduces CPU time used, and gives excellent instructions per 
 cycle (i.e. improved processor pipeline utilization). See the attached design 
 specification for more details.
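A toy model of the difference described above, using plain Python lists as stand-ins for the primitive-typed column vectors (the real design, of course, does this in Java over typed arrays):

```python
BATCH_SIZE = 1024  # the design batches roughly a thousand rows at a time

def row_at_a_time(rows):
    # Current model: each row traverses the whole operator pipeline
    # (object inspectors, virtual calls) before the next row starts.
    out = []
    for row in rows:
        out.append(row["a"] + row["b"])
    return out

def vectorized(col_a, col_b):
    # Vectorized model: each column is a contiguous vector of a primitive
    # type, and the inner loop touches only the two arrays.
    return [a + b for a, b in zip(col_a, col_b)]

rows = [{"a": i, "b": 2 * i} for i in range(4)]
print(row_at_a_time(rows))                     # → [0, 3, 6, 9]
print(vectorized([0, 1, 2, 3], [0, 2, 4, 6]))  # → [0, 3, 6, 9]
```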



[jira] [Commented] (HIVE-2991) Integrate Clover with Hive

2013-07-11 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706097#comment-13706097
 ] 

Ashutosh Chauhan commented on HIVE-2991:


[~iveselovsky] Seems like you have expanded the scope of this jira quite a bit. 
Your other changes (introducing targets in the build system) are quite useful, 
but they are orthogonal to Clover integration (as far as I understand). I would 
suggest splitting the patch into three parts: one for Clover integration, a 
second for improvements in test infrastructure, and a third for improvements in 
build infra.

 Integrate Clover with Hive
 --

 Key: HIVE-2991
 URL: https://issues.apache.org/jira/browse/HIVE-2991
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Affects Versions: 0.9.0
Reporter: Ashutosh Chauhan
Assignee: Ivan A. Veselovsky
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, 
 hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, 
 hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, 
 hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, 
 hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, 
 hive.2991.3.trunk.patch, hive.2991.4.branch-0.10.patch, 
 hive.2991.4.branch-0.9.patch, hive.2991.4.trunk.patch, 
 HIVE-clover-branch-0.10--N1.patch, HIVE-clover-branch-0.11--N1.patch, 
 HIVE-clover-trunk--N1.patch, hive-trunk-clover-html-report.zip


 Atlassian has donated a license for their code coverage tool Clover to the 
 ASF. Let's make use of it to generate a code coverage report to figure out 
 which areas of Hive are well tested and which ones are not. More information 
 about the license can be found in Hadoop JIRA HADOOP-1718. 



[jira] [Commented] (HIVE-4160) Vectorized Query Execution in Hive

2013-07-11 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706190#comment-13706190
 ] 

Dmitriy V. Ryaboy commented on HIVE-4160:
-

Jitendra,
I believe physical plan primitives for both Hive and Pig (and potentially 
others) are going to come in via Tez, as both Pig and Hive want to get off 
strict MR in the long term.

I'll take a crack at extracting what's extractable. Right now Hive's UDAF 
reaches fairly deeply into this code, as you noted, but I think with a little 
restructuring this can be factored out.

 Vectorized Query Execution in Hive
 --

 Key: HIVE-4160
 URL: https://issues.apache.org/jira/browse/HIVE-4160
 Project: Hive
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: Hive-Vectorized-Query-Execution-Design.docx, 
 Hive-Vectorized-Query-Execution-Design-rev2.docx, 
 Hive-Vectorized-Query-Execution-Design-rev3.docx, 
 Hive-Vectorized-Query-Execution-Design-rev3.docx, 
 Hive-Vectorized-Query-Execution-Design-rev3.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev4.docx, 
 Hive-Vectorized-Query-Execution-Design-rev4.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev5.docx, 
 Hive-Vectorized-Query-Execution-Design-rev5.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev6.docx, 
 Hive-Vectorized-Query-Execution-Design-rev6.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev7.docx, 
 Hive-Vectorized-Query-Execution-Design-rev8.docx, 
 Hive-Vectorized-Query-Execution-Design-rev8.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev9.docx, 
 Hive-Vectorized-Query-Execution-Design-rev9.pdf


 The Hive query execution engine currently processes one row at a time. A 
 single row of data goes through all the operators before the next row can be 
 processed. This mode of processing is very inefficient in terms of CPU usage. 
 Research has demonstrated that this yields very low instructions per cycle 
 [MonetDB X100]. Also currently Hive heavily relies on lazy deserialization 
 and data columns go through a layer of object inspectors that identify column 
 type, deserialize data and determine appropriate expression routines in the 
 inner loop. These layers of virtual method calls further slow down the 
 processing. 
 This work will add support for vectorized query execution to Hive, where, 
 instead of individual rows, batches of about a thousand rows at a time are 
 processed. Each column in the batch is represented as a vector of a primitive 
 data type. The inner loop of execution scans these vectors very fast, 
 avoiding method calls, deserialization, unnecessary if-then-else, etc. This 
 substantially reduces CPU time used, and gives excellent instructions per 
 cycle (i.e. improved processor pipeline utilization). See the attached design 
 specification for more details.



[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-4732:


Summary: Reduce or eliminate the expensive Schema equals() check for 
AvroSerde  (was: Speed up AvroSerde by checking hashcodes instead of equality)

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.
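The idea can be sketched as a cached-hash fast path: a hash computed once per schema cheaply rules out most unequal pairs, while a full field comparison is still needed on a hash match, since equal hashcodes do not imply equal schemas. Class and field names below are illustrative, not the AvroSerde API:

```python
class Schema:
    def __init__(self, fields):
        self.fields = tuple(fields)
        self._hash = hash(self.fields)  # computed once, then reused

    def fast_equals(self, other):
        if self._hash != other._hash:
            return False                    # cheap rejection, no deep walk
        return self.fields == other.fields  # confirm on hash match

a = Schema(["id:int", "name:string"])
b = Schema(["id:int", "name:string"])
c = Schema(["id:int"])
print(a.fast_equals(b), a.fast_equals(c))  # → True False
```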



[jira] [Commented] (HIVE-4675) Create new parallel unit test environment

2013-07-11 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706236#comment-13706236
 ] 

Vikram Dixit K commented on HIVE-4675:
--

[~brocknoland] I have raised HIVE-4842 for the same. 

Thanks!

 Create new parallel unit test environment
 -

 Key: HIVE-4675
 URL: https://issues.apache.org/jira/browse/HIVE-4675
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.12.0

 Attachments: HIVE-4675.patch


 The current ptest tool is great, but it has the following limitations:
 - Requires an NFS filer
 - Unless the NFS filer is dedicated, ptests can easily become IO-bound
 - Investigating failures is troublesome because the source directory for the 
 failure is not saved
 - Ignoring or isolating tests is not supported
 - No unit tests for the ptest framework exist
 It'd be great to have a ptest tool that addresses these limitations.



[jira] [Created] (HIVE-4842) Hive parallel test framework 2 needs to summarize failures

2013-07-11 Thread Vikram Dixit K (JIRA)
Vikram Dixit K created HIVE-4842:


 Summary: Hive parallel test framework 2 needs to summarize failures
 Key: HIVE-4842
 URL: https://issues.apache.org/jira/browse/HIVE-4842
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Affects Versions: 0.12.0
Reporter: Vikram Dixit K
Assignee: Brock Noland
Priority: Minor
 Fix For: 0.12.0


Currently, when unit tests are run, there are multiple simple ways to consume 
the results, in particular the ant testreport target, which generates an HTML 
file for easily locating failures. The ptest2 framework coming from HIVE-4675 
is great for running the tests in parallel, but it is not very easy to figure 
out the failing tests. It would be great to have output similar to that of the 
testreport target for easy consumption.



[jira] [Commented] (HIVE-3756) LOAD DATA does not honor permission inheritence

2013-07-11 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706269#comment-13706269
 ] 

Sushanth Sowmyan commented on HIVE-3756:


I have a few more thoughts on this. Let's walk through an example:

Let's say Parent Dir d1 has permission/group combination A.
Let's say directory d2 inside Parent Dir has permission/group combination B.

In the case of non-partitioned tables, d1 will be the database/warehouse dir, 
and d2 the table dir.
In the case of partitioned tables, d1 will be the table directory and d2 the 
appropriate partition directories.

If the flag to inherit permissions is not on, then whatever data is loaded, be 
it files inside d2 (as during a load operation) or a replacement of d2 and 
everything in it (as during an insert overwrite operation), will have yet 
another permission/group combination C, which is a function of the user's 
current umask and the user's default group.

The purpose behind the subdir inherit permissions flag is to make this 
behaviour go away, and to be able to use the parent dir's permissions/group 
when possible. So far, so good.

Let's say, for purposes of this entire discussion from now onwards, the flag to 
inherit permissions is on.

Now, if we load data into d2, without using overwrite, files inside d2 get 
permission B.
If we load data into d2, using overwrite, we now overwrite d2, and thus, d2 
takes on d1's permissions, and so do the files inside, thus resulting in d2 and 
files inside d2 having permissions/group combination A.

--

While this behaviour is consistent, I find that from a user's perspective, if 
they create a table (say, unpartitioned), chmod/chgrp it to B, and then load 
data into it using an insert overwrite, they still expect that they're only 
overwriting data inside the table dir, and that the table still has 
permissions/group combination B. They don't want it to be replaced by A, the 
parent db dir's permissions/group, and they don't want C, the 
umask/current-user-default-group combination.

Now, whether this requires a new flag that overrides 
hive.warehouse.subdir.inherit.perms, or whether users want 
hive.warehouse.subdir.inherit.perms itself to work this way, is still up for 
discussion, but there is now an additional requirement:

If the directory being moved in already exists, and will be deleted so that 
the new one can be placed, then instead of taking the parent's permissions, it 
should take the previous dir's permissions.
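The requirement stated above can be modelled as a tiny rule (a sketch, not Hive code; the permission tuples are illustrative):

```python
def perms_for_replacement(parent_perms, previous_perms):
    # Proposed rule: if the directory being moved in replaces one that
    # already existed, keep the previous directory's permissions/group;
    # only a genuinely new directory inherits from its parent.
    return previous_perms if previous_perms is not None else parent_perms

A = ("rwxr-x---", "grpA")  # parent dir d1
B = ("rwxrwx---", "grpB")  # pre-existing dir d2, chmod/chgrp'd by the user

print(perms_for_replacement(A, B))     # overwrite of existing d2 → keeps B
print(perms_for_replacement(A, None))  # brand-new dir → inherits A
```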

Thoughts?

This can be a separate jira if people feel like it should be, but I think it's 
also a minor modification of this current jira.

 LOAD DATA does not honor permission inheritence
 -

 Key: HIVE-3756
 URL: https://issues.apache.org/jira/browse/HIVE-3756
 Project: Hive
  Issue Type: Bug
  Components: Authorization, Security
Affects Versions: 0.9.0
Reporter: Johndee Burks
Assignee: Chaoyu Tang
 Attachments: HIVE-3756_1.patch, HIVE-3756.patch


 When a LOAD DATA operation is performed, the resulting data in HDFS for the 
 table does not maintain permission inheritance. This remains true even with 
 hive.warehouse.subdir.inherit.perms set to true.
 The issue is easily reproducible by creating a table and loading some data 
 into it. After the load is complete, just do a dfs -ls -R on the warehouse 
 directory and you will see that permission inheritance worked for the table 
 directory but not for the data. 



[jira] [Updated] (HIVE-4055) add Date data type

2013-07-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-4055:
-

Status: Patch Available  (was: Open)

 add Date data type
 --

 Key: HIVE-4055
 URL: https://issues.apache.org/jira/browse/HIVE-4055
 Project: Hive
  Issue Type: Sub-task
  Components: JDBC, Query Processor, Serializers/Deserializers, UDF
Reporter: Sun Rui
 Attachments: Date.pdf, HIVE-4055.1.patch.txt, HIVE-4055.2.patch.txt, 
 HIVE-4055.D11547.1.patch


 Add Date data type, a new primitive data type which supports the standard SQL 
 date type.
 Basically, the implementation can take HIVE-2272 and HIVE-2957 as references.



[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706299#comment-13706299
 ] 

Mohammad Kamrul Islam commented on HIVE-4732:
-

Thanks Edward for the comments.
We are now trying to take a different approach to address the same issue.
A new patch is coming soon.

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.



[jira] [Created] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-11 Thread Vikram Dixit K (JIRA)
Vikram Dixit K created HIVE-4843:


 Summary: Refactoring MapRedTask and ExecDriver for better 
re-usability (for tez) and readability
 Key: HIVE-4843
 URL: https://issues.apache.org/jira/browse/HIVE-4843
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4843.1.patch

Currently, there are static APIs in multiple locations in ExecDriver and 
MapRedTask that could be better leveraged if moved into the existing utility 
class in the exec package. This would help make the code more maintainable, 
readable, and reusable by other runtime infrastructure such as Tez.



[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-11 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-4843:
-

Attachment: HIVE-4843.1.patch

 Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
 readability
 ---

 Key: HIVE-4843
 URL: https://issues.apache.org/jira/browse/HIVE-4843
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4843.1.patch


 Currently, there are static APIs in multiple locations in ExecDriver and 
 MapRedTask that could be better leveraged if moved into the existing utility 
 class in the exec package. This would help make the code more maintainable, 
 readable, and reusable by other runtime infrastructure such as Tez.



Review Request 12480: HIVE-4732 Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Islam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12480/
---

Review request for hive, Ashutosh Chauhan and Jakob Homan.


Bugs: HIVE-4732
https://issues.apache.org/jira/browse/HIVE-4732


Repository: hive-git


Description
---

From our performance analysis, we found that AvroSerde's schema.equals() call 
consumed a substantial amount (nearly 40%) of time. This patch minimizes the 
number of schema.equals() calls by pushing the check as late as possible and 
performing it as few times as possible.

First, we added a unique ID to each record reader, which is then included in 
every AvroGenericRecordWritable. Then we introduced two new data structures 
(a HashSet and a HashMap) to store intermediate data and avoid duplicate 
checks. The HashSet contains the IDs of all record readers that don't need any 
re-encoding, while the HashMap contains the already-used re-encoders. It works 
as a cache and allows re-encoder reuse. With this change, our tests show a 
nearly 40% reduction in Avro record reading time.
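A rough sketch of the data structures described above, under the assumption 
that plain strings stand in for Avro schemas and re-encoders (the real patch 
works with org.apache.avro.Schema objects; ReEncoderCache and reEncoderFor 
are invented names, not the patch's actual classes):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch: cache re-encoding decisions per record-reader ID so the
// expensive schema.equals() check runs at most once per reader, not per record.
public class ReEncoderCache {
    // Readers whose file schema already matches the reader schema: no re-encoding.
    private final Set<UUID> noReEncodingNeeded = new HashSet<>();
    // Readers that do need re-encoding, mapped to a reusable re-encoder.
    private final Map<UUID, String> reEncoders = new HashMap<>();

    /** Returns a re-encoder for the reader, or null if none is needed. */
    public String reEncoderFor(UUID readerId, String fileSchema, String readerSchema) {
        if (noReEncodingNeeded.contains(readerId)) {
            return null;                 // fast path: equality already established
        }
        String cached = reEncoders.get(readerId);
        if (cached != null) {
            return cached;               // fast path: reuse the existing re-encoder
        }
        // Slow path: the expensive equality check, done once per reader.
        if (fileSchema.equals(readerSchema)) {
            noReEncodingNeeded.add(readerId);
            return null;
        }
        String reEncoder = "reencode:" + fileSchema + "->" + readerSchema;
        reEncoders.put(readerId, reEncoder);
        return reEncoder;
    }
}
```

With this shape, every record after the first per reader hits one of the two 
fast paths, which is where the claimed reduction in reading time would come from.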
 
   


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java 
dbc999f 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 
c85ef15 
  
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
 66f0348 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 
9af751b 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb 

Diff: https://reviews.apache.org/r/12480/diff/


Testing
---


Thanks,

Mohammad Islam



[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-4732:


Attachment: HIVE-4732.v1.patch

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.



[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706360#comment-13706360
 ] 

Mohammad Kamrul Islam commented on HIVE-4732:
-

New patch is uploaded in RB: https://reviews.apache.org/r/12480/

Description copied from RB:
From our performance analysis, we found that AvroSerde's schema.equals() call 
consumed a substantial amount (nearly 40%) of time. This patch minimizes the 
number of schema.equals() calls by pushing the check as late as possible and 
performing it as few times as possible.

First, we added a unique ID to each record reader, which is then included in 
every AvroGenericRecordWritable. Then we introduced two new data structures 
(a HashSet and a HashMap) to store intermediate data and avoid duplicate 
checks. The HashSet contains the IDs of all record readers that don't need any 
re-encoding, while the HashMap contains the already-used re-encoders. It works 
as a cache and allows re-encoder reuse. With this change, our tests show a 
nearly 40% reduction in Avro record reading time.

   

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.



[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-4732:


Status: Patch Available  (was: Open)

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.



[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-11 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706419#comment-13706419
 ] 

Gunther Hagleitner commented on HIVE-4843:
--

can you create a review on rb or phabricator please?

 Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
 readability
 ---

 Key: HIVE-4843
 URL: https://issues.apache.org/jira/browse/HIVE-4843
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4843.1.patch


 Currently, there are static APIs in multiple locations in ExecDriver and 
 MapRedTask that could be better leveraged if moved into the existing utility 
 class in the exec package. This would help make the code more maintainable, 
 readable, and reusable by other runtime infrastructure such as Tez.



[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-07-11 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706460#comment-13706460
 ] 

Rajesh Balamohan commented on HIVE-4331:


This will be extremely beneficial for many use cases involving Hive, HBase, 
HCatalog, and Pig. In particular, one can host frequently changing data in 
HBase and access it from Hive/Pig/MapReduce via HCatalog.


 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will continue to function, but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in a future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in HCat's storage handler so that systems such 
 as Pig and MapReduce can use Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without breaking backward 
 compatibility (except known issues as described in the design document).
 5) Replace all instances in the HCat source code which point to 
 HCatStorageHandler to use the HiveStorageHandler, including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this JIRA.



[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-11 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706462#comment-13706462
 ] 

Vikram Dixit K commented on HIVE-4843:
--

https://reviews.apache.org/r/12476/

 Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
 readability
 ---

 Key: HIVE-4843
 URL: https://issues.apache.org/jira/browse/HIVE-4843
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4843.1.patch


 Currently, there are static APIs in multiple locations in ExecDriver and 
 MapRedTask that could be better leveraged if moved into the existing utility 
 class in the exec package. This would help make the code more maintainable, 
 readable, and reusable by other runtime infrastructure such as Tez.



[jira] [Created] (HIVE-4844) Add char/varchar data types

2013-07-11 Thread Jason Dere (JIRA)
Jason Dere created HIVE-4844:


 Summary: Add char/varchar data types
 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere


Add new char/varchar data types that support more SQL-compliant behavior, 
such as SQL string comparison semantics, maximum length, etc.



[jira] [Commented] (HIVE-3745) Hive does improper = based string comparisons for strings with trailing whitespaces

2013-07-11 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706519#comment-13706519
 ] 

Jason Dere commented on HIVE-3745:
--

Would it make more sense to support the SQL comparison semantics using new char 
data types, so that we don't break existing behavior for strings? I've created 
HIVE-4844.

 Hive does improper = based string comparisons for strings with trailing 
 whitespaces
 -

 Key: HIVE-3745
 URL: https://issues.apache.org/jira/browse/HIVE-3745
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.9.0
Reporter: Harsh J
Assignee: Gang Tim Liu

 Compared to other systems such as DB2, MySQL, etc., which disregard trailing 
 whitespace when comparing two strings with the {{=}} relational operator, 
 Hive does not do this.
 For example, note the following line from the MySQL manual: 
 http://dev.mysql.com/doc/refman/5.1/en/char.html
 {quote}
 All MySQL collations are of type PADSPACE. This means that all CHAR and 
 VARCHAR values in MySQL are compared without regard to any trailing spaces. 
 {quote}
 Hive is still whitespace-sensitive and treats trailing spaces of a string as 
 significant when comparing. Ideally {{LIKE}} should remain sensitive to 
 trailing spaces, but {{=}} should not.
 Is there a specific reason behind this difference of implementation in Hive's 
 SQL?
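To illustrate, the PADSPACE behavior quoted above amounts to stripping 
trailing spaces (only) from both operands before the equality check. A minimal 
sketch assuming plain Java strings, not Hive's internal Text/ObjectInspector 
machinery:

```java
// Hypothetical sketch of PADSPACE-style comparison: ignore trailing spaces
// when comparing two CHAR/VARCHAR values, as MySQL/DB2 do for '='.
public class PadSpaceCompare {
    /** Strip trailing spaces only; leading spaces remain significant. */
    static String rtrim(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    /** '=' under PADSPACE semantics: "ab" equals "ab  ". */
    static boolean padSpaceEquals(String a, String b) {
        return rtrim(a).equals(rtrim(b));
    }
}
```

Under these semantics padSpaceEquals("ab", "ab  ") is true, while leading 
spaces still distinguish values, which matches the PADSPACE description.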



[jira] [Updated] (HIVE-4844) Add char/varchar data types

2013-07-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-4844:
-

Assignee: Jason Dere

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere

 Add new char/varchar data types that support more SQL-compliant behavior, 
 such as SQL string comparison semantics, maximum length, etc.



[jira] [Updated] (HIVE-4841) Add partition level hook to HiveMetaHook

2013-07-11 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4841:
--

Attachment: HIVE-4841.D11673.1.patch

navis requested code review of HIVE-4841 [jira] Add partition level hook to 
HiveMetaHook.

Reviewers: JIRA

HIVE-4841 Add partition level hook to HiveMetaHook

The current HiveMetaHook provides hooks for tables only. With a partition-level 
hook, external storage handlers could also be revised to exploit PPR 
(partition pruning).

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D11673

AFFECTED FILES
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
  
hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseHCatStorageHandler.java
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java


To: JIRA, navis


 Add partition level hook to HiveMetaHook
 

 Key: HIVE-4841
 URL: https://issues.apache.org/jira/browse/HIVE-4841
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4841.D11673.1.patch


 The current HiveMetaHook provides hooks for tables only. With a partition-level 
 hook, external storage handlers could also be revised to exploit PPR 
 (partition pruning).



[jira] [Commented] (HIVE-4841) Add partition level hook to HiveMetaHook

2013-07-11 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706557#comment-13706557
 ] 

Navis commented on HIVE-4841:
-

I've consolidated the various methods

add_partition_with_environment_context()
append_partition_with_environment_context()
append_partition_by_name_with_environment_context()

into a single entry point,

add_partition_with_environment_context()

and all tests passed.
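The consolidation pattern might look roughly like the sketch below, where the 
older append variants become thin wrappers over the single entry point. All 
signatures here are illustrative only, not the actual Thrift-generated 
metastore API:

```java
import java.util.List;

// Hypothetical sketch: former partition-append variants delegate to one entry
// point that takes an optional environment context. Names mirror the Thrift
// methods above; the classes and signatures are invented for illustration.
public class MetastorePartitionApi {
    static class Partition {
        String tableName;
        List<String> values;
    }
    static class EnvironmentContext { }

    /** Single entry point; a null context means "no environment context". */
    Partition addPartitionWithEnvironmentContext(Partition p, EnvironmentContext ctx) {
        // ... validate, fire pre/post events, persist to the metastore ...
        return p;
    }

    /** Former variant, now a thin wrapper over the single entry point. */
    Partition appendPartition(String db, String table, List<String> values) {
        Partition p = new Partition();
        p.tableName = table;
        p.values = values;
        return addPartitionWithEnvironmentContext(p, null);
    }
}
```

Funneling the variants through one method means partition-level hooks only 
need to be fired in one place.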

 Add partition level hook to HiveMetaHook
 

 Key: HIVE-4841
 URL: https://issues.apache.org/jira/browse/HIVE-4841
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4841.D11673.1.patch


 The current HiveMetaHook provides hooks for tables only. With a partition-level 
 hook, external storage handlers could also be revised to exploit PPR 
 (partition pruning).



[jira] [Updated] (HIVE-4841) Add partition level hook to HiveMetaHook

2013-07-11 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4841:


Status: Patch Available  (was: Open)

 Add partition level hook to HiveMetaHook
 

 Key: HIVE-4841
 URL: https://issues.apache.org/jira/browse/HIVE-4841
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4841.D11673.1.patch


 The current HiveMetaHook provides hooks for tables only. With a partition-level 
 hook, external storage handlers could also be revised to exploit PPR 
 (partition pruning).



[jira] [Updated] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-07-11 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated HIVE-4331:
-

Attachment: HIVE_4331.patch

Initial patch; will put it on Review Board.

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE_4331.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will continue to function, but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in a future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in HCat's storage handler so that systems such 
 as Pig and MapReduce can use Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without breaking backward 
 compatibility (except known issues as described in the design document).
 5) Replace all instances in the HCat source code which point to 
 HCatStorageHandler to use the HiveStorageHandler, including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this JIRA.



[jira] [Commented] (HIVE-4658) Make KW_OUTER optional in outer joins

2013-07-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706581#comment-13706581
 ] 

Edward Capriolo commented on HIVE-4658:
---

Can we go +1?

 Make KW_OUTER optional in outer joins
 -

 Key: HIVE-4658
 URL: https://issues.apache.org/jira/browse/HIVE-4658
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Edward Capriolo
Priority: Trivial
 Attachments: hive-4658.2.patch.txt, HIVE-4658.D11091.1.patch


 For a really trivial migration issue.



[jira] [Commented] (HIVE-3404) UDF to obtain the quarter of an year if a date or timestamp is given .

2013-07-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706586#comment-13706586
 ] 

Edward Capriolo commented on HIVE-3404:
---

+1 

 UDF to obtain the quarter of an year if a date or timestamp is given .
 --

 Key: HIVE-3404
 URL: https://issues.apache.org/jira/browse/HIVE-3404
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Sanam Naz
 Attachments: HIVE-3404.1.patch.txt


 Current Hive releases lack a function that returns the quarter of a year 
 for a given date or timestamp. The function QUARTER(date) would return the 
 quarter from a date/timestamp. It can be used in HiveQL and will be 
 useful for different domains such as retail, finance, etc.
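The quarter computation itself reduces to simple month arithmetic. A minimal 
sketch of the idea (the class and method names are illustrative, not the 
attached patch's UDF plumbing):

```java
import java.time.LocalDate;

// Hypothetical sketch of what a QUARTER(date) UDF computes; the actual patch's
// class name, GenericUDF plumbing, and null/timestamp handling will differ.
public class Quarter {
    /** Quarter of the year, 1-4: Jan-Mar -> 1, ..., Oct-Dec -> 4. */
    static int quarter(LocalDate d) {
        return (d.getMonthValue() - 1) / 3 + 1;
    }
}
```

For example, a date in July such as 2013-07-11 falls in quarter 3.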



[jira] [Commented] (HIVE-3404) UDF to obtain the quarter of an year if a date or timestamp is given .

2013-07-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706593#comment-13706593
 ] 

Edward Capriolo commented on HIVE-3404:
---

You also need to update show_functions.q

 UDF to obtain the quarter of an year if a date or timestamp is given .
 --

 Key: HIVE-3404
 URL: https://issues.apache.org/jira/browse/HIVE-3404
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Sanam Naz
 Attachments: HIVE-3404.1.patch.txt


 Current Hive releases lack a function that returns the quarter of a year 
 for a given date or timestamp. The function QUARTER(date) would return the 
 quarter from a date/timestamp. It can be used in HiveQL and will be 
 useful for different domains such as retail, finance, etc.



[jira] [Resolved] (HIVE-1446) Move Hive Documentation from the wiki to version control

2013-07-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-1446.
---

Resolution: Fixed

 Move Hive Documentation from the wiki to version control
 

 Key: HIVE-1446
 URL: https://issues.apache.org/jira/browse/HIVE-1446
 Project: Hive
  Issue Type: Task
  Components: Documentation
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: hive-1446.diff, hive-1446-part-1.diff, hive-logo-wide.png


 Move the Hive Language Manual (and possibly some other documents) from the 
 Hive wiki to version control. This work needs to be coordinated with the 
 hive-dev and hive-user community in order to avoid missing any edits as well 
 as to avoid or limit unavailability of the docs.



[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2013-07-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706597#comment-13706597
 ] 

Edward Capriolo commented on HIVE-2989:
---

Did we ditch this idea? Should we close up shop?

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.10.patch.txt, HIVE-2989.1.patch.txt, 
 HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, 
 HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt, HIVE-2989.9.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h

 This will add Table Links to Hive. This will be an alternate mechanism for a 
 user to access tables and data in a database different from the one he is 
 associated with. This feature can be used to provide access control (if 
 access to "databasename.tablename" in queries and "use database X" are turned 
 off in conjunction).
 If db X wants to access one or more partitions from table T in db Y, the user 
 will issue:
 CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')
 New partitions added to T will automatically be added to the link as well and 
 become available to X. However, if the link is specified to be static, that 
 will not be the case. The X user will then have to explicitly import each 
 partition of T that he needs. The command above will not actually make any 
 existing partitions of T available to X. Instead, we provide the following 
 command to add an existing partition to a link:
 ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')
 The user will need to execute the above for each existing partition that 
 needs to be imported. For future partitions, Hive will take care of this. An 
 imported partition can be dropped from a link using a similar command; we 
 just specify "DROP" instead of "ADD". For querying the linked table, the X 
 user will refer to it as T@Y. Link tables will only have read access and not 
 be writable. The entire table link, along with all its imported partitions, 
 can be dropped as follows:
 DROP LINK TO T@Y
 The above commands are purely metastore operations. The implementation will 
 rely on replicating the entire partition metadata when a partition is added 
 to a link. For every link that is created, we will add a new row to table 
 TBLS. The TBL_TYPE column will have a new kind of value, LINK_TABLE (or 
 STATIC_LINK_TABLE if the link has been specified as static). A new column 
 LINK_TBL_ID will be added which will contain the id of the imported table. It 
 will be NULL for all other table types, including regular managed tables. 
 When a partition is added to a link, the new row in the table PARTITIONS will 
 point to the LINK_TABLE in the same database and not the master table in the 
 other database. We will replicate all the metadata for this partition from 
 the master database. The advantage of this approach is that fewer changes 
 will be needed in query processing and DDL for LINK_TABLEs. Also, commands 
 like SHOW TABLES and SHOW PARTITIONS will work as expected for 
 LINK_TABLEs too. Of course, even though the metadata is not shared, the 
 underlying data on disk is still shared. Hive still needs to know that when 
 dropping a partition which belongs to a LINK_TABLE, it should not drop the 
 underlying data from HDFS. Views and external tables cannot be imported from 
 one database to another.
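The proposed TBLS change can be sketched as the shape of a single metadata 
row. This is a hypothetical illustration of the design described above, not 
actual metastore code; only the column names mirror the description:

```java
// Hypothetical sketch of a TBLS row under the Table Links design: a LINK_TABLE
// row carries LINK_TBL_ID pointing at the imported master table; all other
// table types leave it null. Class and field names are illustrative.
public class TblsRow {
    enum TblType { MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW,
                   LINK_TABLE, STATIC_LINK_TABLE }

    final long tblId;
    final TblType tblType;
    final Long linkTblId;  // id of the imported (master) table; null otherwise

    TblsRow(long tblId, TblType tblType, Long linkTblId) {
        this.tblId = tblId;
        this.tblType = tblType;
        this.linkTblId = linkTblId;
    }

    /** Dropping a link-table partition must never delete the shared HDFS data. */
    boolean ownsUnderlyingData() {
        return tblType != TblType.LINK_TABLE
            && tblType != TblType.STATIC_LINK_TABLE;
    }
}
```

Keeping the link as its own replicated row is what lets SHOW TABLES and SHOW 
PARTITIONS work unchanged, while ownsUnderlyingData-style checks guard the 
shared files on drop.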
  



[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2013-07-11 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706598#comment-13706598
 ] 

Bhushan Mandhani commented on HIVE-2989:


Hi, Bhushan Mandhani is no longer at Facebook so this email address is no 
longer being monitored. If you need assistance, please contact another person 
who is currently at the company.


 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.10.patch.txt, HIVE-2989.1.patch.txt, 
 HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, 
 HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt, HIVE-2989.9.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h

 This will add Table Links to Hive. This will be an alternate mechanism for a 
 user to access tables and data in a database that is different from the one 
 he is associated with. This feature can be used to provide access control (if 
 access to databasename.tablename in queries and use database X is turned 
 off in conjunction).
 If db X wants to access one or more partitions from table T in db Y, the user 
 will issue:
 CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')
 New partitions added to T will automatically be added to the link as well and 
 become available to X. However, if the link is specified to be static, that 
 will not be the case. The X user will then have to explicitly import each 
 partition of T that he needs. The command above will not actually make any 
 existing partitions of T available to X. Instead, we provide the following 
 command to add an existing partition to a link:
 ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')
 The user will need to execute the above for each existing partition that 
 needs to be imported. For future partitions, Hive will take care of this. An 
 imported partition can be dropped from a link using a similar command; we 
 just specify "DROP" instead of "ADD". For querying the linked table, the X 
 user will refer to it as T@Y. Link Tables will only have read access and not 
 be writable. The entire Table Link along with all its imported partitions can 
 be dropped as follows:
 DROP LINK TO T@Y
 The above commands are purely MetaStore operations. The implementation will 
 rely on replicating the entire partition metadata when a partition is added 
 to a link. For every link that is created, we will add a new row to table 
 TBLS. The TBL_TYPE column will have a new kind of value, "LINK_TABLE" (or 
 "STATIC_LINK_TABLE" if the link has been specified as static). A new column 
 LINK_TBL_ID will be added which will contain the id of the imported table. It 
 will be NULL for all other table types including the regular managed tables. 
 When a partition is added to a link, the new row in the table PARTITIONS will 
 point to the LINK_TABLE in the same database, and not the master table in the 
 other database. We will replicate all the metadata for this partition from 
 the master database. The advantage of this approach is that fewer changes 
 will be needed in query processing and DDL for LINK_TABLEs. Also, commands 
 like "SHOW TABLES" and "SHOW PARTITIONS" will work as expected for 
 LINK_TABLEs too. Of course, even though the metadata is not shared, the 
 underlying data on disk is still shared. Hive still needs to know that when 
 dropping a partition which belongs to a LINK_TABLE, it should not drop the 
 underlying data from HDFS. Views and external tables cannot be imported from 
 one database to another.
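Pulling the proposed DDL together, a minimal end-to-end sketch in HiveQL (illustrative only — the retention number and partition value are hypothetical, and none of this syntax exists in released Hive):

```sql
-- In database X, create a dynamic link to table T in database Y;
-- new partitions of Y.T become visible through the link automatically.
CREATE LINK TO T@Y LINKPROPERTIES ('RETENTION'='7');

-- Import one pre-existing partition of T into the link.
ALTER LINK T@Y ADD PARTITION (ds='2012-04-27');

-- Query the linked table (read-only) under its T@Y name.
SELECT COUNT(*) FROM T@Y WHERE ds = '2012-04-27';

-- Drop one imported partition, then the whole link. Neither command
-- deletes the underlying HDFS data, which still belongs to Y.T.
ALTER LINK T@Y DROP PARTITION (ds='2012-04-27');
DROP LINK TO T@Y;
```

As the description notes, these are pure metastore operations: the link's partition metadata is replicated, while the data files on HDFS stay shared with the master table.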
  



[jira] [Resolved] (HIVE-2591) Hive 0.7.1 fails with Exception in thread main java.lang.NoSuchFieldError: type

2013-07-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-2591.
---

Resolution: Won't Fix

 Hive 0.7.1 fails with Exception in thread main java.lang.NoSuchFieldError: 
 type
 ---

 Key: HIVE-2591
 URL: https://issues.apache.org/jira/browse/HIVE-2591
 Project: Hive
  Issue Type: Bug
  Components: CLI, JDBC, SQL
Affects Versions: 0.7.1
 Environment: Intel Core2 Quad CPU Q8400 @2.66GHz
 4 GB RAM
 Ubuntu 10.10 32 bit
 JDK 6.0_27
 Apache Ant 1.8.0
 Apache Hive 0.7.1
 Apache Hadoop 0.20.203.0
Reporter: Prashanth
Priority: Blocker
  Labels: hive

 Hi,
 When I try to invoke hive and type "SHOW TABLES" at the CLI in the 
 environment described above, I get "Exception in thread main 
 java.lang.NoSuchFieldError: type" and am not able to use Hive at all.
 Is there any temporary fix for this? Please let me know if I am making any 
 mistake here.
 I have downloaded Hive 0.7.1 from the download link as mentioned in the Hive 
 Wiki. The download url is http://hive.apache.org/releases.html.
 /opt/hive-0.7.1$ hive
 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
 org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
 Hive history file=/tmp/hadoop/hive_job_log_hduser_20190121_764439225.txt
 hive> SHOW TABLES;
 Exception in thread "main" java.lang.NoSuchFieldError: type
 at 
 org.apache.hadoop.hive.ql.parse.HiveLexer.mKW_SHOW(HiveLexer.java:1234)
 at 
 org.apache.hadoop.hive.ql.parse.HiveLexer.mTokens(HiveLexer.java:5942)
 at org.antlr.runtime.Lexer.nextToken(Lexer.java:89)
 at 
 org.antlr.runtime.BufferedTokenStream.fetch(BufferedTokenStream.java:133)
 at 
 org.antlr.runtime.BufferedTokenStream.sync(BufferedTokenStream.java:127)
 at 
 org.antlr.runtime.CommonTokenStream.setup(CommonTokenStream.java:127)
 at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:91)
 at 
 org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:521)
 at 
 org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:436)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:327)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 I am not sure what the actual issue is here, or how to fix it.
 Can you please let me know if there is any workaround for this?
 Alternatively, I tried building Hive from the SVN source repo, but I am not 
 able to build it either. I get the following error.
 [datanucleusenhancer]   D:\hive\build\ivy\lib\default\zookeeper-3.3.1.jar
 [datanucleusenhancer] Exception in thread main java.lang.VerifyError: 
 Expecting a stackmap frame at branch target 76 in method 
 org.apache.hadoop.hive.metastore.model.MDatabase.jdoCopyField(Lorg/apache/hadoop/hive/metastore/model/MDatabase;I)V
  at offset 1
 [datanucleusenhancer]   at java.lang.Class.getDeclaredFields0(Native Method)
 [datanucleusenhancer]   at 
 java.lang.Class.privateGetDeclaredFields(Class.java:2308)
 [datanucleusenhancer]   at java.lang.Class.getDeclaredFields(Class.java:1760)
 [datanucleusenhancer]   at 
 org.datanucleus.metadata.ClassMetaData.addMetaDataForMembersNotInMetaData(ClassMetaData.java:358)
 [datanucleusenhancer]   at 
 org.datanucleus.metadata.ClassMetaData.populate(ClassMetaData.java:199)
 [datanucleusenhancer]   at 
 org.datanucleus.metadata.MetaDataManager$1.run(MetaDataManager.java:2394)
 [datanucleusenhancer]   at java.security.AccessController.doPrivileged(Native 
 Method)
 [datanucleusenhancer]   at 
 org.datanucleus.metadata.MetaDataManager.populateAbstractClassMetaData(MetaDataManager.java:2388)
 [datanucleusenhancer]   at 
 org.datanucleus.metadata.MetaDataManager.populateFileMetaData(MetaDataManager.java:2225)
 [datanucleusenhancer]   at 
 org.datanucleus.metadata.MetaDataManager.initialiseFileMetaDataForUse(MetaDataManager.java:925)
 [datanucleusenhancer]   at 
 org.datanucleus.metadata.MetaDataManager.loadMetadataFiles(MetaDataManager.java:399)
 [datanucleusenhancer]   at 
 

[jira] [Resolved] (HIVE-2989) Adding Table Links to Hive

2013-07-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-2989.
---

Resolution: Won't Fix

 Adding Table Links to Hive
 --

 Key: HIVE-2989
 URL: https://issues.apache.org/jira/browse/HIVE-2989
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor, Security
Affects Versions: 0.10.0
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
 Attachments: HIVE-2989.10.patch.txt, HIVE-2989.1.patch.txt, 
 HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, 
 HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt, HIVE-2989.9.patch.txt

   Original Estimate: 672h
  Remaining Estimate: 672h




[jira] [Updated] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW

2013-07-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-2608:
--

Status: Open  (was: Patch Available)

Patch needs to be rebased.

 Do not require AS a,b,c part in LATERAL VIEW
 

 Key: HIVE-2608
 URL: https://issues.apache.org/jira/browse/HIVE-2608
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, UDF
Affects Versions: 0.10.0
Reporter: Igor Kabiljo
Assignee: Navis
Priority: Minor
 Attachments: HIVE-2608.D4317.5.patch


 Currently, it is required to state column names when LATERAL VIEW is used.
 That shouldn't be necessary, since UDTF returns struct which contains column 
 names - and they should be used by default.
 For example, it would be great if this was possible:
 SELECT t.*, t.key1 + t.key4
 FROM some_table
 LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key4') t;
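For contrast, a sketch of today's mandatory-alias syntax next to the proposed relaxation (table and column names are illustrative):

```sql
-- Today: the AS clause must list an alias per output column.
SELECT t.k1, t.k2
FROM some_table
LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2') t AS k1, k2;

-- Proposed: the AS clause is omitted, and the aliases default to
-- whatever field names the UDTF's output struct declares.
SELECT t.*
FROM some_table
LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2') t;
```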



Hiveserver2 JDBC Client -SQL Select exceptions

2013-07-11 Thread Varunkumar Manohar
I am trying to execute a Hive query from a JDBC client. I am using
HiveServer2 currently.
A very basic query throws an SQLException only from the JDBC client and not
from the CLI. The queries shown below execute successfully on the CLI.

From the JDBC client, "select * from tableA" works fine, whereas if I try
to provide column names and execute the query from the JDBC client I land
in errors:

select col1, col2 from tableA

throws the following SQLException. Is anyone facing the same issue?

Exception in thread "main" java.sql.SQLException: Error while processing
statement: FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MapRedTask
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:159)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:147)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:182)
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:246)

Is there a fix for the issue?

Thanks,
Varun

-- 
_
Regards,
Varun


[jira] [Commented] (HIVE-3488) Issue trying to use the thick client (embedded) from windows.

2013-07-11 Thread Kanwaljit Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706668#comment-13706668
 ] 

Kanwaljit Singh commented on HIVE-3488:
---

We are getting a similar error after dropping all partitions:

java.io.IOException: cannot find dir = 
hdfs://HVEname:9000/tmp/hive-admin/hive_2013-07-12_05-31-36_471_3980214249511966905/-mr-10002/1/emptyFile 
in pathToPartitionInfo: 
[hdfs://192.168.156.229:9000/tmp/hive-admin/hive_2013-07-12_05-31-36_471_3980214249511966905/-mr-10002/1]
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:260)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.&lt;init&gt;(CombineHiveInputFormat.java:104)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:407)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)

 Issue trying to use the thick client (embedded) from windows.
 -

 Key: HIVE-3488
 URL: https://issues.apache.org/jira/browse/HIVE-3488
 Project: Hive
  Issue Type: Bug
  Components: Windows
Affects Versions: 0.8.1
Reporter: Rémy DUBOIS
Priority: Critical

 I'm trying to execute a very simple SELECT query against my remote Hive 
 server.
 If I'm doing a "SELECT * from table", everything works well. If I'm trying to 
 execute a "SELECT name from table", this error appears:
 {code:java}
 Job Submission failed with exception 'java.io.IOException(cannot find dir = 
 /user/hive/warehouse/test/city=paris/out.csv in pathToPartitionInfo: 
 [hdfs://cdh-four:8020/user/hive/warehouse/test/city=paris])'
 12/09/19 17:18:44 ERROR exec.Task: Job Submission failed with exception 
 'java.io.IOException(cannot find dir = 
 /user/hive/warehouse/test/city=paris/out.csv in pathToPartitionInfo: 
 [hdfs://cdh-four:8020/user/hive/warehouse/test/city=paris])'
 java.io.IOException: cannot find dir = 
 /user/hive/warehouse/test/city=paris/out.csv in pathToPartitionInfo: 
 [hdfs://cdh-four:8020/user/hive/warehouse/test/city=paris]
   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:290)
   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:257)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.&lt;init&gt;(CombineHiveInputFormat.java:104)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:407)
   at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
   at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
   at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:891)
   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Unknown Source)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:818)
   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
   at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
   at 
 org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:191)
   at 
 org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:187)
 {code}
 Indeed, this dir (/user/hive/warehouse/test/city=paris/out.csv) can't be 
 found since it deals with my data file, and not a directory.
 Could you please help me?
