[jira] [Commented] (HIVE-784) Support uncorrelated subqueries in the WHERE clause
[ https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705581#comment-13705581 ] Navis commented on HIVE-784: Added some comments

Support uncorrelated subqueries in the WHERE clause --- Key: HIVE-784 URL: https://issues.apache.org/jira/browse/HIVE-784 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Ning Zhang Assignee: Matthew Weaver Attachments: HIVE-784.1.patch.txt

Hive currently supports views only in the FROM clause; some Facebook use cases suggest that Hive should also support subqueries, such as those connected by IN/EXISTS, in the WHERE clause.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3691) TestDynamicSerDe failed with IBM JDK
[ https://issues.apache.org/jira/browse/HIVE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705679#comment-13705679 ] Hudson commented on HIVE-3691: -- Integrated in Hive-trunk-hadoop2 #282 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/282/]) HIVE-3691 : TestDynamicSerDe failed with IBM JDK (Bing Li and Renata Ghisloti via Ashutosh Chauhan) (Revision 1501687) Result = ABORTED hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501687 Files : * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/dynamic_type/TestDynamicSerDe.java

TestDynamicSerDe failed with IBM JDK Key: HIVE-3691 URL: https://issues.apache.org/jira/browse/HIVE-3691 Project: Hive Issue Type: Bug Affects Versions: 0.7.1, 0.8.0, 0.9.0 Environment: ant-1.8.2, IBM JDK 1.6 Reporter: Bing Li Assignee: Bing Li Priority: Minor Fix For: 0.12.0 Attachments: HIVE-3691.1.patch-trunk.txt, HIVE-3691.1.patch.txt

The order of the output in the golden file differs between JDKs; the root cause is the HashMap implementation in the JDK.
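The failure mode described above is common for golden-file tests: HashMap iteration order is a JDK implementation detail, so output recorded on one JDK need not match another. A minimal sketch (not the actual TestDynamicSerDe fix, just an illustration of the deterministic-ordering idea) is to sort the keys before serializing:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class GoldenFileOrder {
    // HashMap iteration order varies across JDK implementations (e.g. IBM
    // vs. Oracle), so a golden file recorded on one JDK may not match the
    // output produced on another. Copying into a TreeMap sorts the keys,
    // yielding a layout that is stable on every JDK.
    static String canonical(Map<String, Integer> m) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : new TreeMap<>(m).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("b", 2);
        m.put("a", 1);
        // Deterministic regardless of HashMap internals:
        System.out.print(canonical(m));
    }
}
```

The alternative fix, also common, is to keep the serializer untouched and sort (or set-compare) the actual and expected outputs inside the test itself.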
[jira] [Commented] (HIVE-4807) Hive metastore hangs
[ https://issues.apache.org/jira/browse/HIVE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705678#comment-13705678 ] Hudson commented on HIVE-4807: -- Integrated in Hive-trunk-hadoop2 #282 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/282/]) HIVE-4807 : Hive metastore hangs (Sarvesh Sakalanaga via Ashutosh Chauhan) (Revision 1501675) Result = ABORTED hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501675 Files : * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ivy/libraries.properties * /hive/trunk/jdbc/build.xml * /hive/trunk/metastore/ivy.xml

Hive metastore hangs Key: HIVE-4807 URL: https://issues.apache.org/jira/browse/HIVE-4807 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0 Reporter: Sarvesh Sakalanaga Assignee: Sarvesh Sakalanaga Fix For: 0.12.0 Attachments: Hive-4807.0.patch, Hive-4807.1.patch, Hive-4807.2.patch

The Hive metastore hangs (does not accept any new connections) due to a bug in DBCP. The root cause analysis is here: https://issues.apache.org/jira/browse/DBCP-398. The fix is to change the Hive connection pool to BoneCP, which is natively supported by DataNucleus.
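Switching the pool is a configuration-level change on the DataNucleus side. As a hedged sketch (the exact property name and accepted value should be verified against your Hive/DataNucleus version), the selection could look like this in hive-site.xml:

```xml
<!-- Hypothetical hive-site.xml fragment: select BoneCP as the DataNucleus
     connection pool implementation for the metastore, instead of DBCP.
     Verify the property name and value against your Hive release notes. -->
<property>
  <name>datanucleus.connectionPoolingType</name>
  <value>BoneCP</value>
</property>
```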
[jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability
[ https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4838: --- Attachment: HIVE-4838.patch Running tests on the attached patch.

Refactor MapJoin HashMap code to improve testability and readability Key: HIVE-4838 URL: https://issues.apache.org/jira/browse/HIVE-4838 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4838.patch

MapJoin is an essential component for high-performance joins in Hive, and the current code has done great service for many years. However, the code is showing its age and currently suffers from the following issues:
* Uses static state via the MapJoinMetaData class to pass serialization metadata to the Key and Row classes.
* The API of a logical table container is not defined, and therefore it's unclear which APIs HashMapWrapper needs to publicize. Additionally, HashMapWrapper has many unused public methods.
* HashMapWrapper contains logic to serialize, test memory bounds, and implement the table container. Ideally these logical units would be separated.
* HashTableSinkObjectCtx has unused fields and unused methods.
* CommonJoinOperator and children use ArrayList on the left-hand side when only List is required.
* There are unused classes (MRU, DCLLItem) and classes which duplicate functionality (MapJoinSingleKey and MapJoinDoubleKeys).
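The "ArrayList on the left-hand side" item is a standard program-to-the-interface cleanup. A tiny illustrative sketch (the method name is hypothetical, not from the Hive code):

```java
import java.util.ArrayList;
import java.util.List;

public class ProgramToInterface {
    // Declaring the variable and return type as List (the interface) rather
    // than ArrayList (the implementation) keeps callers decoupled from the
    // concrete class, so the backing structure can later be swapped (e.g.
    // for a specialized row container) without touching any call sites.
    static List<String> makeRow() {
        List<String> row = new ArrayList<>(); // was: ArrayList<String> row = ...
        row.add("key");
        row.add("value");
        return row;
    }

    public static void main(String[] args) {
        System.out.println(makeRow().size());
    }
}
```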
[jira] [Updated] (HIVE-2991) Integrate Clover with Hive
[ https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan A. Veselovsky updated HIVE-2991: - Attachment: HIVE-clover-trunk--N1.patch HIVE-clover-branch-0.11--N1.patch HIVE-clover-branch-0.10--N1.patch

The attached HIVE-clover-xxx.patch files are somewhat updated versions of the clovering; we used them in parallel builds. Besides the clovering itself, the patches introduce the following changes:
1) The .q-file test generator is changed to split the generated test classes into groups of tests (10 test cases per class is the default). This avoids huge test classes, which is needed for parallelized and distributed builds.
2) Added a test-lightweight target that runs a batch of tests without re-generation/re-compilation. This is badly needed in parallelized and distributed builds.
3) Introduced a testcase-list parameter that allows passing several test class names to execute. The names are passed as a comma-separated list, with each name in the form **/a/b/c/TestFoo.*. The trailing asterisk is needed because the main project accepts .class names while HCatalog accepts .java names.
4) Several more improvements related to Clover instrumentation, reporting, etc.
Integrate Clover with Hive -- Key: HIVE-2991 URL: https://issues.apache.org/jira/browse/HIVE-2991 Project: Hive Issue Type: Test Components: Testing Infrastructure Affects Versions: 0.9.0 Reporter: Ashutosh Chauhan Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, hive.2991.3.trunk.patch, hive.2991.4.branch-0.10.patch, hive.2991.4.branch-0.9.patch, hive.2991.4.trunk.patch, HIVE-clover-branch-0.10--N1.patch, HIVE-clover-branch-0.11--N1.patch, HIVE-clover-trunk--N1.patch, hive-trunk-clover-html-report.zip

Atlassian has donated a license for their code coverage tool Clover to the ASF. Let's make use of it to generate code coverage reports to figure out which areas of Hive are well tested and which are not. More information about the license can be found in Hadoop jira HADOOP-1718
[jira] [Commented] (HIVE-4675) Create new parallel unit test environment
[ https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705943#comment-13705943 ] Vikram Dixit K commented on HIVE-4675: -- I used this framework to run tests on hive on a single node. It took about half the time that it normally takes, which is great. However, I am unable to figure out which tests are failing. I got a message that goes: TestOrcHCatLoader has one or more failing tests... Also, it doesn't seem like the output is integrated with the ant testreport target. It would be great to see a summary of failing tests. Could you please elaborate on how to get an idea of the failing tests? Thanks!

Create new parallel unit test environment - Key: HIVE-4675 URL: https://issues.apache.org/jira/browse/HIVE-4675 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.12.0 Attachments: HIVE-4675.patch

The current ptest tool is great, but it has the following limitations:
- Requires an NFS filer
- Unless the NFS filer is dedicated, ptests can become IO bound easily
- Investigating failures is troublesome because the source directory for the failure is not saved
- Ignoring or isolating tests is not supported
- No unit tests for the ptest framework exist
It'd be great to have a ptest tool that addresses these limitations.
[jira] [Commented] (HIVE-4675) Create new parallel unit test environment
[ https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705947#comment-13705947 ] Brock Noland commented on HIVE-4675: Hi, Great to hear! The TEST-*.xml files should be in the logs directory in the working dir. Typically we run this via jenkins, and then in the jenkins build script copy the TEST-*.xml files into a directory for jenkins to parse. I think we could generate some kind of report as well; do you want to create an enhancement request describing what you'd like? Brock

Create new parallel unit test environment - Key: HIVE-4675 URL: https://issues.apache.org/jira/browse/HIVE-4675 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.12.0 Attachments: HIVE-4675.patch

The current ptest tool is great, but it has the following limitations:
- Requires an NFS filer
- Unless the NFS filer is dedicated, ptests can become IO bound easily
- Investigating failures is troublesome because the source directory for the failure is not saved
- Ignoring or isolating tests is not supported
- No unit tests for the ptest framework exist
It'd be great to have a ptest tool that addresses these limitations.
[jira] [Assigned] (HIVE-2991) Integrate Clover with Hive
[ https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan A. Veselovsky reassigned HIVE-2991: Assignee: Ivan A. Veselovsky

Integrate Clover with Hive -- Key: HIVE-2991 URL: https://issues.apache.org/jira/browse/HIVE-2991 Project: Hive Issue Type: Test Components: Testing Infrastructure Affects Versions: 0.9.0 Reporter: Ashutosh Chauhan Assignee: Ivan A. Veselovsky Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, hive.2991.3.trunk.patch, hive.2991.4.branch-0.10.patch, hive.2991.4.branch-0.9.patch, hive.2991.4.trunk.patch, HIVE-clover-branch-0.10--N1.patch, HIVE-clover-branch-0.11--N1.patch, HIVE-clover-trunk--N1.patch, hive-trunk-clover-html-report.zip

Atlassian has donated a license for their code coverage tool Clover to the ASF. Let's make use of it to generate code coverage reports to figure out which areas of Hive are well tested and which are not. More information about the license can be found in Hadoop jira HADOOP-1718
[jira] [Commented] (HIVE-4160) Vectorized Query Execution in Hive
[ https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706089#comment-13706089 ] Jitendra Nath Pandey commented on HIVE-4160: Dmitry, Vinod,

There is a significant amount of vectorization work in expression evaluation, for example arithmetic expressions, logical expressions, aggregations, etc. Many of these expressions are pretty generic, and different systems are likely to have similar semantics for them. It should be possible to re-use this code with little change in pig or other systems. Re-using these expressions will require using the same vectorized representation of data in the processing engine, but that part of the code is also generic and re-usable. I think that could be a good starting point.

However, a bunch of the vectorization work is in operator code, where we have vectorized versions of the hive operators. These operators are closely tied to hive semantics and implementation. Therefore, it will need some restructuring in the hive code base as well to generalize these operators for re-use in other projects.

Also, at this point we should be thinking more generally about a common physical layer shared between pig and hive. These languages can continue to have different logical plans, but it would be desirable for them to share a common physical plan structure because they both use the same map-reduce runtime.
Vectorized Query Execution in Hive -- Key: HIVE-4160 URL: https://issues.apache.org/jira/browse/HIVE-4160 Project: Hive Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Hive-Vectorized-Query-Execution-Design.docx, Hive-Vectorized-Query-Execution-Design-rev2.docx, Hive-Vectorized-Query-Execution-Design-rev3.docx, Hive-Vectorized-Query-Execution-Design-rev3.docx, Hive-Vectorized-Query-Execution-Design-rev3.pdf, Hive-Vectorized-Query-Execution-Design-rev4.docx, Hive-Vectorized-Query-Execution-Design-rev4.pdf, Hive-Vectorized-Query-Execution-Design-rev5.docx, Hive-Vectorized-Query-Execution-Design-rev5.pdf, Hive-Vectorized-Query-Execution-Design-rev6.docx, Hive-Vectorized-Query-Execution-Design-rev6.pdf, Hive-Vectorized-Query-Execution-Design-rev7.docx, Hive-Vectorized-Query-Execution-Design-rev8.docx, Hive-Vectorized-Query-Execution-Design-rev8.pdf, Hive-Vectorized-Query-Execution-Design-rev9.docx, Hive-Vectorized-Query-Execution-Design-rev9.pdf The Hive query execution engine currently processes one row at a time. A single row of data goes through all the operators before the next row can be processed. This mode of processing is very inefficient in terms of CPU usage. Research has demonstrated that this yields very low instructions per cycle [MonetDB X100]. Also currently Hive heavily relies on lazy deserialization and data columns go through a layer of object inspectors that identify column type, deserialize data and determine appropriate expression routines in the inner loop. These layers of virtual method calls further slow down the processing. This work will add support for vectorized query execution to Hive, where, instead of individual rows, batches of about a thousand rows at a time are processed. Each column in the batch is represented as a vector of a primitive data type. The inner loop of execution scans these vectors very fast, avoiding method calls, deserialization, unnecessary if-then-else, etc. 
This substantially reduces CPU time used, and gives excellent instructions per cycle (i.e. improved processor pipeline utilization). See the attached design specification for more details.
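The description above can be sketched in a few lines: a "column" is a primitive array, a batch holds roughly a thousand rows, and the inner loop touches only primitives. This is an illustrative sketch of the idea, not Hive's actual VectorizedRowBatch classes:

```java
public class VectorizedAddDemo {
    static final int BATCH_SIZE = 1024; // roughly "a thousand rows at a time"

    // Each column is a vector of a primitive type. The inner loop runs with
    // no virtual method calls, no per-row object allocation, and no type
    // dispatch, which is what yields the high instructions-per-cycle.
    static void addColumns(long[] a, long[] b, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        long[] a = new long[BATCH_SIZE];
        long[] b = new long[BATCH_SIZE];
        long[] out = new long[BATCH_SIZE];
        for (int i = 0; i < BATCH_SIZE; i++) {
            a[i] = i;
            b[i] = 2L * i;
        }
        addColumns(a, b, out, BATCH_SIZE);
        System.out.println(out[10]); // 10 + 20
    }
}
```

Contrast this with row-at-a-time execution, where each addition would pass through object inspectors and boxed values; here the JIT can keep the whole loop in registers and even auto-vectorize it.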
[jira] [Commented] (HIVE-2991) Integrate Clover with Hive
[ https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706097#comment-13706097 ] Ashutosh Chauhan commented on HIVE-2991: [~iveselovsky] Seems like you have expanded the scope of this jira quite a bit. Your other changes (introducing targets in the build system) are quite useful, but they are orthogonal to clover integration (as far as I understand). I would suggest splitting the patch into three parts: one for clover integration, a second for improvements in the test infrastructure, and a third for improvements in the build infra.

Integrate Clover with Hive -- Key: HIVE-2991 URL: https://issues.apache.org/jira/browse/HIVE-2991 Project: Hive Issue Type: Test Components: Testing Infrastructure Affects Versions: 0.9.0 Reporter: Ashutosh Chauhan Assignee: Ivan A. Veselovsky Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, hive.2991.3.trunk.patch, hive.2991.4.branch-0.10.patch, hive.2991.4.branch-0.9.patch, hive.2991.4.trunk.patch, HIVE-clover-branch-0.10--N1.patch, HIVE-clover-branch-0.11--N1.patch, HIVE-clover-trunk--N1.patch, hive-trunk-clover-html-report.zip

Atlassian has donated a license for their code coverage tool Clover to the ASF. Let's make use of it to generate code coverage reports to figure out which areas of Hive are well tested and which are not. More information about the license can be found in Hadoop jira HADOOP-1718
[jira] [Commented] (HIVE-4160) Vectorized Query Execution in Hive
[ https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706190#comment-13706190 ] Dmitriy V. Ryaboy commented on HIVE-4160: - Jitendra, I believe physical plan primitives for both Hive and Pig (and potentially others) are going to come in via Tez, as both Pig and Hive want to get off strict MR in the long term. I'll take a crack at extracting what's extractable. Right now Hive's UDAF reaches fairly deeply into this code, as you noted, but I think with a little restructuring this can be factored out.

Vectorized Query Execution in Hive -- Key: HIVE-4160 URL: https://issues.apache.org/jira/browse/HIVE-4160 Project: Hive Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Hive-Vectorized-Query-Execution-Design.docx, Hive-Vectorized-Query-Execution-Design-rev2.docx, Hive-Vectorized-Query-Execution-Design-rev3.docx, Hive-Vectorized-Query-Execution-Design-rev3.docx, Hive-Vectorized-Query-Execution-Design-rev3.pdf, Hive-Vectorized-Query-Execution-Design-rev4.docx, Hive-Vectorized-Query-Execution-Design-rev4.pdf, Hive-Vectorized-Query-Execution-Design-rev5.docx, Hive-Vectorized-Query-Execution-Design-rev5.pdf, Hive-Vectorized-Query-Execution-Design-rev6.docx, Hive-Vectorized-Query-Execution-Design-rev6.pdf, Hive-Vectorized-Query-Execution-Design-rev7.docx, Hive-Vectorized-Query-Execution-Design-rev8.docx, Hive-Vectorized-Query-Execution-Design-rev8.pdf, Hive-Vectorized-Query-Execution-Design-rev9.docx, Hive-Vectorized-Query-Execution-Design-rev9.pdf

The Hive query execution engine currently processes one row at a time. A single row of data goes through all the operators before the next row can be processed. This mode of processing is very inefficient in terms of CPU usage. Research has demonstrated that this yields very low instructions per cycle [MonetDB X100].
Also, Hive currently relies heavily on lazy deserialization, and data columns go through a layer of object inspectors that identify column type, deserialize data, and determine the appropriate expression routines in the inner loop. These layers of virtual method calls further slow down the processing. This work will add support for vectorized query execution to Hive, where, instead of individual rows, batches of about a thousand rows at a time are processed. Each column in the batch is represented as a vector of a primitive data type. The inner loop of execution scans these vectors very fast, avoiding method calls, deserialization, unnecessary if-then-else, etc. This substantially reduces CPU time used, and gives excellent instructions per cycle (i.e. improved processor pipeline utilization). See the attached design specification for more details.
[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-4732: Summary: Reduce or eliminate the expensive Schema equals() check for AvroSerde (was: Speed up AvroSerde by checking hashcodes instead of equality)

Reduce or eliminate the expensive Schema equals() check for AvroSerde - Key: HIVE-4732 URL: https://issues.apache.org/jira/browse/HIVE-4732 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mohammad Kamrul Islam Attachments: HIVE-4732.1.patch

The AvroSerde spends a significant amount of time checking schema equality. Changing to compare hashcodes (which can be computed once then reused) will improve performance.
[jira] [Commented] (HIVE-4675) Create new parallel unit test environment
[ https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706236#comment-13706236 ] Vikram Dixit K commented on HIVE-4675: -- [~brocknoland] I have raised HIVE-4842 for the same. Thanks!

Create new parallel unit test environment - Key: HIVE-4675 URL: https://issues.apache.org/jira/browse/HIVE-4675 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.12.0 Attachments: HIVE-4675.patch

The current ptest tool is great, but it has the following limitations:
- Requires an NFS filer
- Unless the NFS filer is dedicated, ptests can become IO bound easily
- Investigating failures is troublesome because the source directory for the failure is not saved
- Ignoring or isolating tests is not supported
- No unit tests for the ptest framework exist
It'd be great to have a ptest tool that addresses these limitations.
[jira] [Created] (HIVE-4842) Hive parallel test framework 2 needs to summarize failures
Vikram Dixit K created HIVE-4842: Summary: Hive parallel test framework 2 needs to summarize failures Key: HIVE-4842 URL: https://issues.apache.org/jira/browse/HIVE-4842 Project: Hive Issue Type: Improvement Components: Build Infrastructure Affects Versions: 0.12.0 Reporter: Vikram Dixit K Assignee: Brock Noland Priority: Minor Fix For: 0.12.0

Currently, when unit tests are run, there are multiple simple ways to consume the results, particularly the ant testreport target, which generates an html file for easily locating failures. The ptest2 framework coming from HIVE-4675 is great for running the tests in parallel, but it is not very easy to figure out the failing tests. It would be great to have output similar to that of the testreport target for easy consumption.
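Since the earlier comments note that ptest2 emits JUnit-style TEST-*.xml files, a summary like the one requested here can be derived from the `failures` and `errors` attributes on each file's root `<testsuite>` element. A hedged sketch (not the ptest2 implementation, and the directory layout is an assumption):

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class TestReportSummary {
    // JUnit-style TEST-*.xml reports carry failure counts as attributes on
    // the root <testsuite> element; summing them across a directory gives
    // the kind of quick failure summary this issue asks for.
    static int failures(File reportDir) throws Exception {
        int total = 0;
        File[] files = reportDir.listFiles(
            (d, name) -> name.startsWith("TEST-") && name.endsWith(".xml"));
        if (files == null) {
            return 0;
        }
        for (File f : files) {
            Element suite = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(f).getDocumentElement();
            total += Integer.parseInt(suite.getAttribute("failures"))
                   + Integer.parseInt(suite.getAttribute("errors"));
        }
        return total;
    }
}
```

A fuller version would also collect the failing `<testcase>` names per suite, mirroring what ant's testreport target renders as HTML.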
[jira] [Commented] (HIVE-3756) LOAD DATA does not honor permission inheritance
[ https://issues.apache.org/jira/browse/HIVE-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706269#comment-13706269 ] Sushanth Sowmyan commented on HIVE-3756: I have a few more thoughts on this. Let's walk through an example.

Let's say parent dir d1 has permission/group combination A, and directory d2 inside the parent dir has permission/group combination B. In the case of non-partitioned tables, d1 will be the database/warehouse dir and d2 the table dir. In the case of partitioned tables, d1 will be the table directory and d2 the appropriate partition directories.

If we did not have the flag to inherit permissions on, then whatever data is loaded, be it files inside d2 (as during a load operation) or replacing d2 and everything in it (as during an insert overwrite operation), will have yet another permission/group combination C, which is a function of the user's current umask and the user's default group. The purpose behind the subdir-inherit-permissions flag is to make this behaviour go away, and to use the parent dir's permissions/group when possible. So far, so good. Let's say, for purposes of this entire discussion from now onwards, the flag to inherit permissions is on.

Now, if we load data into d2 without using overwrite, files inside d2 get permission B. If we load data into d2 using overwrite, we overwrite d2, and thus d2 takes on d1's permissions, and so do the files inside, resulting in d2 and the files inside d2 having permission/group combination A.

While this behaviour is consistent, I find that from a user's perspective, if they create a table (say unpartitioned), then chmod/chgrp it to B, and then try to load data into it using an insert overwrite, they still expect that they're only overwriting data inside the table dir, and their expectation is that the table still has permission/group combination B.
They don't want it to be replaced by A, the parent db dir's permissions/group, and they don't want C, from the umask and the current user's default group. Whether this requires a new flag that overrides hive.warehouse.subdir.inherit.perms, or whether hive.warehouse.subdir.inherit.perms itself should work this way, is still up for discussion, but there is now an additional requirement, namely: if the directory being moved in already exists and will be deleted so that the new one can be placed, then instead of going with the parent's permissions, it should go with the previous dir's permissions. Thoughts? This can be a separate jira if people feel it should be, but I think it's also a minor modification of this current jira.

LOAD DATA does not honor permission inheritance - Key: HIVE-3756 URL: https://issues.apache.org/jira/browse/HIVE-3756 Project: Hive Issue Type: Bug Components: Authorization, Security Affects Versions: 0.9.0 Reporter: Johndee Burks Assignee: Chaoyu Tang Attachments: HIVE-3756_1.patch, HIVE-3756.patch

When a LOAD DATA operation is performed, the resulting data in HDFS for the table does not maintain permission inheritance. This remains true even with hive.warehouse.subdir.inherit.perms set to true. The issue is easily reproducible by creating a table and loading some data into it. After the load is complete, just do a dfs -ls -R on the warehouse directory and you will see that the inheritance of permissions worked for the table directory but not for the data.
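The "inherit from parent" semantics under discussion can be sketched with java.nio.file against a local filesystem. This is a hypothetical illustration of the behavior, not Hive's actual implementation (which operates on HDFS paths through its own FileSystem API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class InheritPerms {
    // Sketch of "inherit permissions" semantics: after a directory is
    // (re)created under the warehouse, copy the parent directory's POSIX
    // permission bits onto it. The alternative behavior discussed above
    // would instead restore the bits the replaced directory used to have.
    static void inheritFromParent(Path child) throws IOException {
        Set<PosixFilePermission> parentPerms =
            Files.getPosixFilePermissions(child.getParent());
        Files.setPosixFilePermissions(child, parentPerms);
    }

    public static void main(String[] args) throws IOException {
        Path d1 = Files.createTempDirectory("d1");         // parent: combination A
        Path d2 = Files.createDirectory(d1.resolve("d2")); // child: umask-derived C
        inheritFromParent(d2);                             // d2 now carries A
        System.out.println(Files.getPosixFilePermissions(d2)
            .equals(Files.getPosixFilePermissions(d1)));
    }
}
```

The proposal in the comment amounts to snapshotting `getPosixFilePermissions` on the old d2 before the overwrite deletes it, and reapplying that snapshot instead of the parent's bits.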
[jira] [Updated] (HIVE-4055) add Date data type
[ https://issues.apache.org/jira/browse/HIVE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4055: - Status: Patch Available (was: Open)

add Date data type -- Key: HIVE-4055 URL: https://issues.apache.org/jira/browse/HIVE-4055 Project: Hive Issue Type: Sub-task Components: JDBC, Query Processor, Serializers/Deserializers, UDF Reporter: Sun Rui Attachments: Date.pdf, HIVE-4055.1.patch.txt, HIVE-4055.2.patch.txt, HIVE-4055.D11547.1.patch

Add Date data type, a new primitive data type which supports the standard SQL date type. Basically, the implementation can take HIVE-2272 and HIVE-2957 as references.
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706299#comment-13706299 ] Mohammad Kamrul Islam commented on HIVE-4732: - Thanks Edward for the comments. We are now trying a different approach to address the same issue. A new patch is coming soon.

Reduce or eliminate the expensive Schema equals() check for AvroSerde - Key: HIVE-4732 URL: https://issues.apache.org/jira/browse/HIVE-4732 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mohammad Kamrul Islam Attachments: HIVE-4732.1.patch

The AvroSerde spends a significant amount of time checking schema equality. Changing to compare hashcodes (which can be computed once then reused) will improve performance.
[jira] [Created] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
Vikram Dixit K created HIVE-4843: Summary: Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability Key: HIVE-4843 URL: https://issues.apache.org/jira/browse/HIVE-4843 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4843.1.patch

Currently, there are static APIs in multiple locations in ExecDriver and MapRedTask that could be leveraged if put in the already existing utility class in the exec package. This would help make the code more maintainable, readable, and also re-usable by other run-time infra such as tez.
[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4843: - Attachment: HIVE-4843.1.patch

Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability --- Key: HIVE-4843 URL: https://issues.apache.org/jira/browse/HIVE-4843 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4843.1.patch

Currently, there are static APIs in multiple locations in ExecDriver and MapRedTask that could be leveraged if put in the already existing utility class in the exec package. This would help make the code more maintainable, readable, and also re-usable by other run-time infra such as tez.
Review Request 12480: HIVE-4732 Reduce or eliminate the expensive Schema equals() check for AvroSerde
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12480/ --- Review request for hive, Ashutosh Chauhan and Jakob Homan. Bugs: HIVE-4732 https://issues.apache.org/jira/browse/HIVE-4732 Repository: hive-git Description --- From our performance analysis, we found that AvroSerde's schema.equals() call consumed a substantial amount (nearly 40%) of the record-reading time. This patch minimizes the number of schema.equals() calls by pushing the check as late as possible and performing it as few times as possible. First, we added a unique ID for each record reader, which is then included in every AvroGenericRecordWritable. Then we introduced two new data structures (one HashSet and one HashMap) to store intermediate results and avoid duplicate checks. The HashSet contains the IDs of all record readers whose records need no re-encoding. The HashMap contains the re-encoders already in use; it works as a cache and allows re-encoders to be reused. With this change, our tests show nearly a 40% reduction in Avro record-reading time. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java dbc999f serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java c85ef15 serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java 66f0348 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 9af751b serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb Diff: https://reviews.apache.org/r/12480/diff/ Testing --- Thanks, Mohammad Islam
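The caching scheme described in the review request can be sketched as follows. This is a hypothetical illustration under the assumptions stated in the description (a unique reader ID, a HashSet of readers needing no re-encoding, a HashMap caching re-encoders), not the patch's actual classes; all names here are made up.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Sketch of the two data structures from the description: a HashSet of
// record-reader IDs whose records need no re-encoding, and a HashMap caching
// re-encoders per reader ID, so the expensive schema comparison runs at most
// once per reader instead of once per record.
public class ReEncoderCache {

    /** Stand-in for a real schema re-encoder; illustrative, not Hive's API. */
    public static final class ReEncoder {
        final String writerSchema, readerSchema;
        ReEncoder(String w, String r) { writerSchema = w; readerSchema = r; }
        Object reEncode(Object record) { return record; } // no-op in this sketch
    }

    private final Set<UUID> noReEncodingNeeded = new HashSet<>();
    private final Map<UUID, ReEncoder> reEncoders = new HashMap<>();

    /** Returns null when the reader's records can be used as-is. */
    public ReEncoder forReader(UUID readerId, String writerSchema, String readerSchema) {
        if (noReEncodingNeeded.contains(readerId)) {
            return null;                  // fast path: set lookup, no schema compare
        }
        ReEncoder cached = reEncoders.get(readerId);
        if (cached != null) {
            return cached;                // fast path: reuse the cached re-encoder
        }
        // Slow path: the expensive equality check runs once per reader ID.
        if (writerSchema.equals(readerSchema)) {
            noReEncodingNeeded.add(readerId);
            return null;
        }
        ReEncoder fresh = new ReEncoder(writerSchema, readerSchema);
        reEncoders.put(readerId, fresh);
        return fresh;
    }
}
```

After the first call for a given reader ID, every subsequent record from that reader resolves with a single hash lookup rather than a full Schema comparison.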
[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-4732: Attachment: HIVE-4732.v1.patch Reduce or eliminate the expensive Schema equals() check for AvroSerde - Key: HIVE-4732 URL: https://issues.apache.org/jira/browse/HIVE-4732 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mohammad Kamrul Islam Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch The AvroSerde spends a significant amount of time checking schema equality. Changing to compare hashcodes (which can be computed once then reused) will improve performance.
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706360#comment-13706360 ] Mohammad Kamrul Islam commented on HIVE-4732: - New patch is uploaded in RB: https://reviews.apache.org/r/12480/
[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-4732: Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706419#comment-13706419 ] Gunther Hagleitner commented on HIVE-4843: -- Can you create a review on RB or Phabricator, please?
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706460#comment-13706460 ] Rajesh Balamohan commented on HIVE-4331: This will be extremely beneficial for many use cases involving Hive, HBase, HCatalog, and Pig. In particular, one can host frequently changing data in HBase and access it in Hive/Pig/MapReduce via HCatalog. Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: StorageHandlerDesign_HIVE4331.pdf 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. These will continue to function, but internally they will use the DefaultStorageHandler from Hive. They will be removed in a future release of Hive. 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will bypass the HiveOutputFormat. We will use this class in Hive's HBaseStorageHandler instead of the HiveHBaseTableOutputFormat. 3) Write new unit tests in HCat's storagehandler so that systems such as Pig and MapReduce can use Hive's HBaseStorageHandler instead of the HCatHBaseStorageHandler. 4) Make sure all the old and new unit tests pass without backward-compatibility breakage (except known issues as described in the Design Document). 5) Replace all instances in the HCat source code which point to HCatStorageHandler to use the HiveStorageHandler, including the FosterStorageHandler. I have attached the design document for the same and will attach a patch to this Jira.
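The "pass-through" idea in point 2 above can be illustrated with a minimal sketch: a wrapper output format that simply delegates to whatever format the storage handler really uses, with no translation layer in between. The interfaces and names here are stand-ins invented for the example, not Hive's actual OutputFormat API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a pass-through output format: every call is
// forwarded untouched to the handler-specific format it wraps.
public class PassThroughDemo {
    interface RecordWriter { void write(String record); }
    interface OutputFormat { RecordWriter getRecordWriter(List<String> sink); }

    /** Delegates every call to the wrapped, handler-specific format. */
    static final class PassThroughOutputFormat implements OutputFormat {
        private final OutputFormat actual;
        PassThroughOutputFormat(OutputFormat actual) { this.actual = actual; }
        @Override public RecordWriter getRecordWriter(List<String> sink) {
            return actual.getRecordWriter(sink);   // no translation in between
        }
    }

    public static List<String> run() {
        List<String> sink = new ArrayList<>();
        // Stand-in for an HBase-style format supplied by a storage handler.
        OutputFormat hbaseLike = s -> record -> s.add("hbase:" + record);
        OutputFormat wrapped = new PassThroughOutputFormat(hbaseLike);
        wrapped.getRecordWriter(sink).write("row1");
        return sink;
    }
}
```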
[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706462#comment-13706462 ] Vikram Dixit K commented on HIVE-4843: -- https://reviews.apache.org/r/12476/
[jira] [Created] (HIVE-4844) Add char/varchar data types
Jason Dere created HIVE-4844: Summary: Add char/varchar data types Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc.
[jira] [Commented] (HIVE-3745) Hive does improper = based string comparisons for strings with trailing whitespaces
[ https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706519#comment-13706519 ] Jason Dere commented on HIVE-3745: -- Would it make more sense to support the SQL comparison semantics using new char data types, so that we don't break existing behavior for strings? I've created HIVE-4844. Hive does improper = based string comparisons for strings with trailing whitespaces - Key: HIVE-3745 URL: https://issues.apache.org/jira/browse/HIVE-3745 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.9.0 Reporter: Harsh J Assignee: Gang Tim Liu Compared to other systems such as DB2, MySQL, etc., which disregard trailing whitespaces when comparing two strings with the {{=}} relational operator, Hive does not do this. For example, note the following line from the MySQL manual: http://dev.mysql.com/doc/refman/5.1/en/char.html {quote} All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces. {quote} Hive is still whitespace sensitive and regards trailing spaces of a string as significant when comparing. Ideally {{LIKE}} should consider this strongly, but {{=}} should not. Is there a specific reason behind this difference of implementation in Hive's SQL?
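The PADSPACE semantics quoted from the MySQL manual above can be sketched as follows. This is an assumption about how such a comparison could be implemented, for illustration only, not Hive's or MySQL's actual code: equality ignores trailing spaces, while plain Java string equality (Hive's current behavior) stays exact.

```java
// Illustrative PADSPACE-style comparison: '=' disregards trailing spaces.
public class PadSpaceCompare {
    /** Strip only trailing spaces, as PADSPACE collations do before comparing. */
    static String rtrim(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') end--;
        return s.substring(0, end);
    }

    /** Equality under PADSPACE: trailing spaces are not significant. */
    public static boolean padSpaceEquals(String a, String b) {
        return rtrim(a).equals(rtrim(b));
    }
}
```

Note that only trailing spaces are disregarded; leading spaces remain significant, which matches the manual's wording.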
[jira] [Updated] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4844: - Assignee: Jason Dere
[jira] [Updated] (HIVE-4841) Add partition level hook to HiveMetaHook
[ https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4841: -- Attachment: HIVE-4841.D11673.1.patch navis requested code review of HIVE-4841 [jira] Add partition level hook to HiveMetaHook. Reviewers: JIRA HIVE-4841 Add partition level hook to HiveMetaHook The current HiveMetaHook provides hooks for tables only. With a partition-level hook, external storage handlers could also be revised to exploit PPR. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D11673 AFFECTED FILES hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseHCatStorageHandler.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/27615/ To: JIRA, navis Add partition level hook to HiveMetaHook Key: HIVE-4841 URL: https://issues.apache.org/jira/browse/HIVE-4841 Project: Hive Issue Type: Improvement Components: StorageHandler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4841.D11673.1.patch The current HiveMetaHook provides hooks for tables only. With a partition-level hook, external storage handlers could also be revised to exploit PPR.
[jira] [Commented] (HIVE-4841) Add partition level hook to HiveMetaHook
[ https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706557#comment-13706557 ] Navis commented on HIVE-4841: - I've consolidated the various methods add_partition_with_environment_context(), append_partition_with_environment_context(), and append_partition_by_name_with_environment_context() into a single entry point, add_partition_with_environment_context(), and passed all tests.
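A partition-level extension of HiveMetaHook, as described in this issue, could look something like the sketch below, mirroring the existing table-level pre/commit/rollback pattern around metastore operations. The method names and interfaces here are illustrative assumptions, not the patch's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical partition-level hook, mirroring HiveMetaHook's table-level
// pre/commit/rollback shape; all names are invented for this sketch.
public class PartitionHookDemo {
    interface PartitionMetaHook {
        void preAddPartition(String table, String partSpec);
        void commitAddPartition(String table, String partSpec);
        void rollbackAddPartition(String table, String partSpec);
    }

    /** A recording implementation, standing in for an external storage handler. */
    static final class RecordingHook implements PartitionMetaHook {
        final List<String> events = new ArrayList<>();
        public void preAddPartition(String t, String p) { events.add("pre:" + t + "/" + p); }
        public void commitAddPartition(String t, String p) { events.add("commit:" + t + "/" + p); }
        public void rollbackAddPartition(String t, String p) { events.add("rollback:" + t + "/" + p); }
    }

    /** The metastore client would call the hook around its own metadata write. */
    static void addPartition(PartitionMetaHook hook, String table, String partSpec) {
        hook.preAddPartition(table, partSpec);
        try {
            // ... the metastore writes the partition metadata here ...
            hook.commitAddPartition(table, partSpec);
        } catch (RuntimeException e) {
            hook.rollbackAddPartition(table, partSpec);
            throw e;
        }
    }
}
```

With such a hook in place, an external storage handler sees each partition as it is added and can maintain its own per-partition state, which is what makes partition pruning (PPR) exploitable.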
[jira] [Updated] (HIVE-4841) Add partition level hook to HiveMetaHook
[ https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-4841: Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated HIVE-4331: - Attachment: HIVE_4331.patch Initial patch; will put it on Review Board.
[jira] [Commented] (HIVE-4658) Make KW_OUTER optional in outer joins
[ https://issues.apache.org/jira/browse/HIVE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706581#comment-13706581 ] Edward Capriolo commented on HIVE-4658: --- Can we go +1? Make KW_OUTER optional in outer joins - Key: HIVE-4658 URL: https://issues.apache.org/jira/browse/HIVE-4658 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Edward Capriolo Priority: Trivial Attachments: hive-4658.2.patch.txt, HIVE-4658.D11091.1.patch For a really trivial migration issue.
[jira] [Commented] (HIVE-3404) UDF to obtain the quarter of a year if a date or timestamp is given
[ https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706586#comment-13706586 ] Edward Capriolo commented on HIVE-3404: --- +1 UDF to obtain the quarter of a year if a date or timestamp is given -- Key: HIVE-3404 URL: https://issues.apache.org/jira/browse/HIVE-3404 Project: Hive Issue Type: New Feature Components: UDF Reporter: Sanam Naz Attachments: HIVE-3404.1.patch.txt Hive's current releases lack a function which returns the quarter of a year when given a date or timestamp. The function QUARTER(date) would return the quarter from a date/timestamp. This can be used in HiveQL and will be useful for different domains like retail, finance, etc.
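The quarter computation described above reduces to simple month arithmetic. A minimal sketch, assuming the attached patch maps a month to its calendar quarter (this is illustrative, not the patch's actual UDF class):

```java
import java.time.LocalDate;

// Sketch of QUARTER(date): months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
public class QuarterUdf {
    public static int quarter(LocalDate date) {
        return (date.getMonthValue() - 1) / 3 + 1;
    }
}
```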
[jira] [Commented] (HIVE-3404) UDF to obtain the quarter of a year if a date or timestamp is given
[ https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706593#comment-13706593 ] Edward Capriolo commented on HIVE-3404: --- You also need to update show_functions.q
[jira] [Resolved] (HIVE-1446) Move Hive Documentation from the wiki to version control
[ https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo resolved HIVE-1446. --- Resolution: Fixed Move Hive Documentation from the wiki to version control Key: HIVE-1446 URL: https://issues.apache.org/jira/browse/HIVE-1446 Project: Hive Issue Type: Task Components: Documentation Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: hive-1446.diff, hive-1446-part-1.diff, hive-logo-wide.png Move the Hive Language Manual (and possibly some other documents) from the Hive wiki to version control. This work needs to be coordinated with the hive-dev and hive-user community in order to avoid missing any edits as well as to avoid or limit unavailability of the docs.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706597#comment-13706597 ] Edward Capriolo commented on HIVE-2989: --- Did we ditch this idea? Should we close up shop? Adding Table Links to Hive -- Key: HIVE-2989 URL: https://issues.apache.org/jira/browse/HIVE-2989 Project: Hive Issue Type: Improvement Components: Metastore, Query Processor, Security Affects Versions: 0.10.0 Reporter: Bhushan Mandhani Assignee: Bhushan Mandhani Attachments: HIVE-2989.10.patch.txt, HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt, HIVE-2989.9.patch.txt Original Estimate: 672h Remaining Estimate: 672h This will add Table Links to Hive. This will be an alternate mechanism for a user to access tables and data in a database that is different from the one he is associated with. This feature can be used to provide access control (if access to databasename.tablename in queries and use database X is turned off in conjunction). If db X wants to access one or more partitions from table T in db Y, the user will issue: CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N') New partitions added to T will automatically be added to the link as well and become available to X. However, if the link is specified to be static, that will not be the case. The X user will then have to explicitly import each partition of T that he needs. The command above will not actually make any existing partitions of T available to X. Instead, we provide the following command to add an existing partition to a link: ALTER LINK T@Y ADD PARTITION (ds='2012-04-27') The user will need to execute the above for each existing partition that needs to be imported. For future partitions, Hive will take care of this. An imported partition can be dropped from a link using a similar command; we just specify DROP instead of ADD.
For querying the linked table, the X user will refer to it as T@Y. Link Tables will only have read access and not be writable. The entire Table Link along with all its imported partitions can be dropped as follows: DROP LINK TO T@Y The above commands are purely MetaStore operations. The implementation will rely on replicating the entire partition metadata when a partition is added to a link. For every link that is created, we will add a new row to table TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or STATIC_LINK_TABLE if the link has been specified as static). A new column LINK_TBL_ID will be added which will contain the id of the imported table. It will be NULL for all other table types including the regular managed tables. When a partition is added to a link, the new row in the table PARTITIONS will point to the LINK_TABLE in the same database and not the master table in the other database. We will replicate all the metadata for this partition from the master database. The advantage of this approach is that fewer changes will be needed in query processing and DDL for LINK_TABLEs. Also, commands like SHOW TABLES and SHOW PARTITIONS will work as expected for LINK_TABLEs too. Of course, even though the metadata is not shared, the underlying data on disk is still shared. Hive still needs to know that when dropping a partition which belongs to a LINK_TABLE, it should not drop the underlying data from HDFS. Views and external tables cannot be imported from one database to another.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706598#comment-13706598 ] Bhushan Mandhani commented on HIVE-2989: Hi, Bhushan Mandhani is no longer at Facebook so this email address is no longer being monitored. If you need assistance, please contact another person who is currently at the company.
[jira] [Resolved] (HIVE-2591) Hive 0.7.1 fails with Exception in thread main java.lang.NoSuchFieldError: type
[ https://issues.apache.org/jira/browse/HIVE-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo resolved HIVE-2591. --- Resolution: Won't Fix Hive 0.7.1 fails with Exception in thread main java.lang.NoSuchFieldError: type --- Key: HIVE-2591 URL: https://issues.apache.org/jira/browse/HIVE-2591 Project: Hive Issue Type: Bug Components: CLI, JDBC, SQL Affects Versions: 0.7.1 Environment: Intel Core2 Quad CPU Q8400 @2.66GHz 4 GB RAM Ubuntu 10.10 32 bit JDK 6.0_27 Apache Ant 1.8.0 Apache Hive 0.7.1 Apache Hadoop 0.20.203.0 Reporter: Prashanth Priority: Blocker Labels: hive Hi, When I try to invoke hive and type in SHOW TABLES in cli in the environment as explained above, I get Exception in thread main java.lang.NoSuchFieldError: type and I am not able to use it at all. Is there any temporary fix for this? Please let me know, if I am making any mistake here. I have downloaded Hive 0.7.1 from the download link as mentioned in the Hive Wiki. The download url is http://hive.apache.org/releases.html. /opt/hive-0.7.1$ hive WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. 
Hive history file=/tmp/hadoop/hive_job_log_hduser_20190121_764439225.txt hive SHOW TABLES; Exception in thread main java.lang.NoSuchFieldError: type at org.apache.hadoop.hive.ql.parse.HiveLexer.mKW_SHOW(HiveLexer.java:1234) at org.apache.hadoop.hive.ql.parse.HiveLexer.mTokens(HiveLexer.java:5942) at org.antlr.runtime.Lexer.nextToken(Lexer.java:89) at org.antlr.runtime.BufferedTokenStream.fetch(BufferedTokenStream.java:133) at org.antlr.runtime.BufferedTokenStream.sync(BufferedTokenStream.java:127) at org.antlr.runtime.CommonTokenStream.setup(CommonTokenStream.java:127) at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:91) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:521) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:436) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:327) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) I am not sure what is the actual issue here or rather how to fix it. Can you please let me know if there is any workaround for this. Alternatively I tried building hive from the SVN source repo. I am neither able to build hive from SVN. I get the following error. 
[datanucleusenhancer] D:\hive\build\ivy\lib\default\zookeeper-3.3.1.jar [datanucleusenhancer] Exception in thread main java.lang.VerifyError: Expecting a stackmap frame at branch target 76 in method org.apache.hadoop.hive.metastore.model.MDatabase.jdoCopyField(Lorg/apache/hadoop/hive/metastore/model/MDatabase;I)V at offset 1 [datanucleusenhancer] at java.lang.Class.getDeclaredFields0(Native Method) [datanucleusenhancer] at java.lang.Class.privateGetDeclaredFields(Class.java:2308) [datanucleusenhancer] at java.lang.Class.getDeclaredFields(Class.java:1760) [datanucleusenhancer] at org.datanucleus.metadata.ClassMetaData.addMetaDataForMembersNotInMetaData(ClassMetaData.java:358) [datanucleusenhancer] at org.datanucleus.metadata.ClassMetaData.populate(ClassMetaData.java:199) [datanucleusenhancer] at org.datanucleus.metadata.MetaDataManager$1.run(MetaDataManager.java:2394) [datanucleusenhancer] at java.security.AccessController.doPrivileged(Native Method) [datanucleusenhancer] at org.datanucleus.metadata.MetaDataManager.populateAbstractClassMetaData(MetaDataManager.java:2388) [datanucleusenhancer] at org.datanucleus.metadata.MetaDataManager.populateFileMetaData(MetaDataManager.java:2225) [datanucleusenhancer] at org.datanucleus.metadata.MetaDataManager.initialiseFileMetaDataForUse(MetaDataManager.java:925) [datanucleusenhancer] at org.datanucleus.metadata.MetaDataManager.loadMetadataFiles(MetaDataManager.java:399) [datanucleusenhancer] at
[jira] [Resolved] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo resolved HIVE-2989. --- Resolution: Won't Fix Adding Table Links to Hive -- Key: HIVE-2989 URL: https://issues.apache.org/jira/browse/HIVE-2989 Project: Hive Issue Type: Improvement Components: Metastore, Query Processor, Security Affects Versions: 0.10.0 Reporter: Bhushan Mandhani Assignee: Bhushan Mandhani Attachments: HIVE-2989.10.patch.txt, HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt, HIVE-2989.9.patch.txt Original Estimate: 672h Remaining Estimate: 672h This will add Table Links to Hive. This will be an alternate mechanism for a user to access tables and data in a database that is different from the one he is associated with. This feature can be used to provide access control (if, in conjunction, access to databasename.tablename in queries and the use database X command are turned off). If db X wants to access one or more partitions from table T in db Y, the user will issue: CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N') New partitions added to T will automatically be added to the link as well and become available to X. However, if the link is specified to be static, that will not be the case. The X user will then have to explicitly import each partition of T that he needs. The command above will not actually make any existing partitions of T available to X. Instead, we provide the following command to add an existing partition to a link: ALTER LINK T@Y ADD PARTITION (ds='2012-04-27') The user will need to execute the above for each existing partition that needs to be imported. For future partitions, Hive will take care of this. An imported partition can be dropped from a link using a similar command. We just specify DROP instead of ADD. For querying the linked table, the X user will refer to it as T@Y. 
Link Tables will be read-only and not writable. The entire Table Link along with all its imported partitions can be dropped as follows: DROP LINK TO T@Y The above commands are purely MetaStore operations. The implementation will rely on replicating the entire partition metadata when a partition is added to a link. For every link that is created, we will add a new row to table TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or STATIC_LINK_TABLE if the link has been specified as static). A new column LINK_TBL_ID will be added, which will contain the id of the imported table. It will be NULL for all other table types including the regular managed tables. When a partition is added to a link, the new row in the table PARTITIONS will point to the LINK_TABLE in the same database and not the master table in the other database. We will replicate all the metadata for this partition from the master database. The advantage of this approach is that fewer changes will be needed in query processing and DDL for LINK_TABLEs. Also, commands like SHOW TABLES and SHOW PARTITIONS will work as expected for LINK_TABLEs too. Of course, even though the metadata is not shared, the underlying data on disk is still shared. Hive still needs to know that when dropping a partition which belongs to a LINK_TABLE, it should not drop the underlying data from HDFS. Views and external tables cannot be imported from one database to another. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
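The Table Link lifecycle described above can be summarized in one sketch. Note this is the syntax as proposed in this issue only; the feature was resolved Won't Fix, so none of these statements exist in any released version of Hive. T, Y, and the ds partition key are the example names used in the description.

```sql
-- Proposed (never merged) Table Link DDL from the HIVE-2989 description.
-- From database X, create a link to table T in database Y; STATIC would
-- disable automatic pickup of partitions added to T in the future.
CREATE LINK TO T@Y LINKPROPERTIES ('RETENTION'='N');

-- Import an existing partition of T into the link; one statement per
-- existing partition (new partitions flow in automatically for
-- non-static links).
ALTER LINK T@Y ADD PARTITION (ds='2012-04-27');

-- Drop an imported partition from the link (metadata only).
ALTER LINK T@Y DROP PARTITION (ds='2012-04-27');

-- Query the linked table (read-only) from database X.
SELECT * FROM T@Y WHERE ds = '2012-04-27';

-- Drop the link and all its imported partitions; the underlying
-- HDFS data belonging to the master table is untouched.
DROP LINK TO T@Y;
```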
[jira] [Updated] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW
[ https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-2608: -- Status: Open (was: Patch Available) Patch needs to be rebased. Do not require AS a,b,c part in LATERAL VIEW Key: HIVE-2608 URL: https://issues.apache.org/jira/browse/HIVE-2608 Project: Hive Issue Type: Improvement Components: Query Processor, UDF Affects Versions: 0.10.0 Reporter: Igor Kabiljo Assignee: Navis Priority: Minor Attachments: HIVE-2608.D4317.5.patch Currently, it is required to state column names when LATERAL VIEW is used. That shouldn't be necessary, since the UDTF returns a struct which contains column names - and they should be used by default. For example, it would be great if this was possible: SELECT t.*, t.key1 + t.key4 FROM some_table LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key4') t; 
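To make the proposed improvement concrete, here is a sketch contrasting the syntax Hive requires today with the syntax requested above, using the json_tuple example from the description (some_table and the json column are the hypothetical names from the issue):

```sql
-- Today: column aliases are mandatory after the lateral view alias.
SELECT t.key1, t.key2
FROM some_table
LATERAL VIEW json_tuple(json, 'key1', 'key2') t AS key1, key2;

-- Proposed: omit AS and take the column names from the UDTF's output
-- struct. Note json_tuple names its output columns c0, c1, ... by
-- default, so this relies on the UDTF exposing meaningful names.
SELECT t.*, t.key1 + t.key4
FROM some_table
LATERAL VIEW json_tuple(json, 'key1', 'key2', 'key3', 'key4') t;
```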
HiveServer2 JDBC Client - SQL SELECT exceptions
I am trying to execute a Hive query from a JDBC client. I am using HiveServer2 currently. A very basic query throws an SQL exception only from the JDBC client and not from the CLI. *The queries shown below execute successfully on the CLI.* From the JDBC client, *select * from tableA* works fine, whereas if I provide column names and execute the query from the JDBC client I run into errors: *select col1,col2 from tableA* throws the following SQL exception. Is anyone facing the same issue? Exception in thread main java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:159) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:147) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:182) at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:246) Is there a fix for the issue? Thanks, Varun -- Regards, Varun
[jira] [Commented] (HIVE-3488) Issue trying to use the thick client (embedded) from windows.
[ https://issues.apache.org/jira/browse/HIVE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706668#comment-13706668 ] Kanwaljit Singh commented on HIVE-3488: --- We are getting a similar error after dropping all partitions: java.io.IOException: cannot find dir = hdfs://HVEname:9000/tmp/hive-admin/hive_2013-07-12_05-31-36_471_3980214249511966905/-mr-10002/1/emptyFile in pathToPartitionInfo: [hdfs://192.168.156.229:9000/tmp/hive-admin/hive_2013-07-12_05-31-36_471_3980214249511966905/-mr-10002/1] at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:260) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:104) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:407) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929) at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921) Issue trying to use the thick client (embedded) from windows. - Key: HIVE-3488 URL: https://issues.apache.org/jira/browse/HIVE-3488 Project: Hive Issue Type: Bug Components: Windows Affects Versions: 0.8.1 Reporter: Rémy DUBOIS Priority: Critical I'm trying to execute a very simple SELECT query against my remote hive server. If I'm doing a SELECT * from table, everything works well. 
If I'm trying to execute a SELECT name from table, this error appears: {code:java} Job Submission failed with exception 'java.io.IOException(cannot find dir = /user/hive/warehouse/test/city=paris/out.csv in pathToPartitionInfo: [hdfs://cdh-four:8020/user/hive/warehouse/test/city=paris])' 12/09/19 17:18:44 ERROR exec.Task: Job Submission failed with exception 'java.io.IOException(cannot find dir = /user/hive/warehouse/test/city=paris/out.csv in pathToPartitionInfo: [hdfs://cdh-four:8020/user/hive/warehouse/test/city=paris])' java.io.IOException: cannot find dir = /user/hive/warehouse/test/city=paris/out.csv in pathToPartitionInfo: [hdfs://cdh-four:8020/user/hive/warehouse/test/city=paris] at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:290) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:257) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:104) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:407) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989) at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981) at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:891) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Unknown Source) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:818) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452) at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136) at 
org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931) at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:191) at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:187) {code} Indeed, this dir (/user/hive/warehouse/test/city=paris/out.csv) can't be found because it is my data file, not a directory. Could you please help me?