[jira] [Commented] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403290#comment-13403290 ] Jitendra Nath Pandey commented on HIVE-3098: bq. Problem stems from the fact that there is no expiration policy either in fs or ugi cache. We need to design for UGI cache eviction policy. There, when we are expiring stale ugi's from ugi-cache we can do closeAllForUGI for evicting ugi to evict cached FS objects from fs-cache. +1. It may be more tractable to have a cache expiration policy in ugi-cache based on the semantics of this particular use case. In FS-cache it gets trickier because of the general purpose nature of the file system. Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.) - Key: HIVE-3098 URL: https://issues.apache.org/jira/browse/HIVE-3098 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.9.0 Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security turned on. Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-3098.patch The problem manifested from stress-testing HCatalog 0.4.1 (as part of testing the Oracle backend). The HCatalog server ran out of memory (-Xmx2048m) when pounded by 60 threads, in under 24 hours. The heap-dump indicates that hadoop::FileSystem.CACHE had 100 instances of FileSystem, whose combined retained-mem consumed the entire heap. It boiled down to hadoop::UserGroupInformation::equals() being implemented such that the Subject member is compared for identity (==), and not equivalence (.equals()). This causes equivalent UGI instances to compare as unequal, and causes a new FileSystem instance to be created and cached. The UGI.equals() is so implemented, incidentally, as a fix for yet another problem (HADOOP-6670); so it is unlikely that that implementation can be modified. The solution for this is to check for UGI equivalence in HCatalog (i.e. in the Hive metastore), using a cache for UGI instances in the shims. I have a patch to fix this. I'll upload it shortly. I just ran an overnight test to confirm that the memory-leak has been arrested. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
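The failure mode described in HIVE-3098 can be illustrated with a small, self-contained sketch. Everything here is a hypothetical stand-in and not Hadoop's or Hive's API: FakeUgi mimics a key whose equals() compares its subject by identity, FS_CACHE plays the role of FileSystem.CACHE, and UGI_CACHE is an assumed shim-level cache that returns one canonical instance per user name. Eviction, which the comment above argues for, is deliberately not modeled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UgiCacheSketch {
    /** Hypothetical stand-in for a UGI: equals() compares the wrapped subject by identity. */
    static final class FakeUgi {
        final String userName;
        final Object subject = new Object(); // a fresh subject for every instance

        FakeUgi(String userName) { this.userName = userName; }

        @Override public boolean equals(Object o) {
            return o instanceof FakeUgi && ((FakeUgi) o).subject == subject;
        }

        @Override public int hashCode() { return System.identityHashCode(subject); }
    }

    /** Stand-in for the file system cache: one entry per distinct key. */
    static final Map<FakeUgi, Object> FS_CACHE = new HashMap<>();

    /** Assumed shim-level cache: one canonical FakeUgi per user name, never evicted here. */
    static final ConcurrentHashMap<String, FakeUgi> UGI_CACHE = new ConcurrentHashMap<>();

    static FakeUgi canonicalUgi(String userName) {
        return UGI_CACHE.computeIfAbsent(userName, FakeUgi::new);
    }

    static Object getFileSystem(FakeUgi ugi) {
        return FS_CACHE.computeIfAbsent(ugi, k -> new Object());
    }

    public static void main(String[] args) {
        // A new key per request never matches an existing one, so the cache only grows.
        for (int i = 0; i < 1000; i++) {
            getFileSystem(new FakeUgi("alice"));
        }
        System.out.println("fresh UGI per request: " + FS_CACHE.size() + " entries");

        FS_CACHE.clear();
        // Reusing the canonical key lets every request hit the same cache entry.
        for (int i = 0; i < 1000; i++) {
            getFileSystem(canonicalUgi("alice"));
        }
        System.out.println("canonical UGI per user: " + FS_CACHE.size() + " entries");
    }
}
```

The sketch keys the canonical instances by user name only; a real fix would need to decide what makes two UGIs equivalent and when a cached one becomes stale.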
[jira] [Created] (HIVE-4575) In place filtering in Not Filter doesn't handle nulls correctly.
Jitendra Nath Pandey created HIVE-4575: -- Summary: In place filtering in Not Filter doesn't handle nulls correctly. Key: HIVE-4575 URL: https://issues.apache.org/jira/browse/HIVE-4575 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey The FilterNotExpr evaluates the child expression and takes the complement of the selected vector. Since the child expression filters out null values, the complement includes the nulls in the output. This is incorrect because not(null) = null.
[jira] [Commented] (HIVE-4575) In place filtering in Not Filter doesn't handle nulls correctly.
[ https://issues.apache.org/jira/browse/HIVE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661102#comment-13661102 ] Jitendra Nath Pandey commented on HIVE-4575: bq. I think the repro here in our code is that you'd get 1 NULL and 3 NULL returned. Yes. Another point is that the output of select * from t where NOT (a = 2); should be the same as select * from t where (a <> 2); In our current implementation the first query will return rows 1 and 4, while the second will return only row 1. In place filtering in Not Filter doesn't handle nulls correctly. Key: HIVE-4575 URL: https://issues.apache.org/jira/browse/HIVE-4575 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey The FilterNotExpr evaluates the child expression and takes the complement of the selected vector. Since the child expression filters out null values, the complement includes the nulls in the output. This is incorrect because not(null) = null.
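The null problem in HIVE-4575 can be reproduced with plain arrays. The sketch below is an illustration under assumptions, not Hive's FilterNotExpr: the column, the selected-index arrays, and all method names are hypothetical, the sample data is made up rather than taken from the issue's table, and the null-aware variant only covers a child that is a single-column comparison (where the child result is null exactly when the column is null).

```java
import java.util.Arrays;

public class NotFilterSketch {
    /** Child filter (a = c): keeps selected indexes whose value is non-null and equals c. */
    static int[] filterEquals(long[] a, boolean[] isNull, int[] selected, long c) {
        int[] out = new int[selected.length];
        int n = 0;
        for (int idx : selected) {
            if (!isNull[idx] && a[idx] == c) {
                out[n++] = idx;
            }
        }
        return Arrays.copyOf(out, n);
    }

    /**
     * Complement of childSelected within selected. Both arrays must be ascending and
     * childSelected must be a subset of selected; cost is O(selected.length).
     */
    static int[] complement(int[] selected, int[] childSelected) {
        int[] out = new int[selected.length];
        int n = 0;
        int j = 0;
        for (int idx : selected) {
            if (j < childSelected.length && childSelected[j] == idx) {
                j++;                 // row was accepted by the child, so NOT rejects it
            } else {
                out[n++] = idx;      // row was rejected by the child (or was null!)
            }
        }
        return Arrays.copyOf(out, n);
    }

    /** Null-aware NOT for a single-column child: also drop rows whose operand is null. */
    static int[] notNullAware(boolean[] isNull, int[] selected, int[] childSelected) {
        int[] candidates = complement(selected, childSelected);
        int n = 0;
        for (int idx : candidates) {
            if (!isNull[idx]) {
                candidates[n++] = idx;
            }
        }
        return Arrays.copyOf(candidates, n);
    }

    public static void main(String[] args) {
        long[] a = {1, 2, 0, 2};                        // index 2 is NULL; its value is ignored
        boolean[] isNull = {false, false, true, false};
        int[] selected = {0, 1, 2, 3};

        int[] child = filterEquals(a, isNull, selected, 2);
        System.out.println("a = 2              -> " + Arrays.toString(child));
        System.out.println("plain complement   -> " + Arrays.toString(complement(selected, child)));
        System.out.println("null-aware NOT     -> " + Arrays.toString(notNullAware(isNull, selected, child)));
    }
}
```

With this data the plain complement keeps index 2 (the NULL row), while the null-aware version keeps only index 0, which matches what `a <> 2` would select.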
[jira] [Updated] (HIVE-4534) IsNotNull and NotCol incorrectly handle nulls.
[ https://issues.apache.org/jira/browse/HIVE-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4534: --- Attachment: HIVE-4534.2.patch The attached patch has additional unit tests. I could not create a review board entry because this patch is on top of the HIVE-4472 patch. IsNotNull and NotCol incorrectly handle nulls. -- Key: HIVE-4534 URL: https://issues.apache.org/jira/browse/HIVE-4534 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Attachments: HIVE-4534.1.patch, HIVE-4534.2.patch See file IsNotNull.java in package org.apache.hadoop.hive.ql.exec.vector.expressions. It never looks at the noNulls flag on the input vector, but accesses the isNull[] array anyway. This can yield incorrect results. isRepeating and noNulls are not set in the output, which can also cause wrong results.
[jira] [Updated] (HIVE-4534) IsNotNull and NotCol incorrectly handle nulls.
[ https://issues.apache.org/jira/browse/HIVE-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4534: --- Affects Version/s: vectorization-branch Status: Patch Available (was: Open) IsNotNull and NotCol incorrectly handle nulls. -- Key: HIVE-4534 URL: https://issues.apache.org/jira/browse/HIVE-4534 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Attachments: HIVE-4534.1.patch, HIVE-4534.2.patch See file IsNotNull.java in package org.apache.hadoop.hive.ql.exec.vector.expressions. It never looks at the noNulls flag on the input vector, but accesses the isNull[] array anyway. This can yield incorrect results. isRepeating and noNulls are not set in the output, which can also cause wrong results.
[jira] [Updated] (HIVE-4534) IsNotNull and NotCol incorrectly handle nulls.
[ https://issues.apache.org/jira/browse/HIVE-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4534: --- Affects Version/s: (was: vectorization-branch) IsNotNull and NotCol incorrectly handle nulls. -- Key: HIVE-4534 URL: https://issues.apache.org/jira/browse/HIVE-4534 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Attachments: HIVE-4534.1.patch, HIVE-4534.2.patch See file IsNotNull.java in package org.apache.hadoop.hive.ql.exec.vector.expressions. It never looks at the noNulls flag on the input vector, but accesses the isNull[] array anyway. This can yield incorrect results. isRepeating and noNulls are not set in the output, which can also cause wrong results.
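What HIVE-4534 asks for can be sketched as follows. This is a minimal illustration and not the code in IsNotNull.java: the Col class is a hypothetical, stripped-down column vector with only the four fields the issue mentions, the output is encoded as 1/0 in a long array by assumption, and the batch's selected-index array is ignored for brevity.

```java
public class IsNotNullSketch {
    /** Hypothetical minimal column vector; field names follow the issue text. */
    static final class Col {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true;
        boolean isRepeating = false;

        Col(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    /** Writes 1 where the input is not null and 0 where it is, for the first n rows. */
    static void isNotNull(Col in, Col out, int n) {
        // IS NOT NULL never produces NULL, so the output flag is set on every call
        // instead of inheriting whatever a previous use of the batch left behind.
        out.noNulls = true;

        if (in.noNulls) {
            // isNull[] may contain stale entries when noNulls is true; do not read it.
            out.vector[0] = 1;
            out.isRepeating = true;
        } else if (in.isRepeating) {
            // A repeating input is fully described by its first entry.
            out.vector[0] = in.isNull[0] ? 0 : 1;
            out.isRepeating = true;
        } else {
            for (int i = 0; i < n; i++) {
                out.vector[i] = in.isNull[i] ? 0 : 1;
            }
            out.isRepeating = false;
        }
    }

    public static void main(String[] args) {
        Col in = new Col(4);
        in.isNull[2] = true;   // stale entry: noNulls is still true
        Col out = new Col(4);
        out.noNulls = false;   // stale flag from an earlier use of the batch

        isNotNull(in, out, 4);
        System.out.println("noNulls=" + out.noNulls + " isRepeating=" + out.isRepeating
                + " value=" + out.vector[0]);
    }
}
```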
[jira] [Updated] (HIVE-4472) OR, NOT Filter logic can lose an array, and always takes time O(VectorizedRowBatch.DEFAULT_SIZE)
[ https://issues.apache.org/jira/browse/HIVE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4472: --- Status: Patch Available (was: Open) OR, NOT Filter logic can lose an array, and always takes time O(VectorizedRowBatch.DEFAULT_SIZE) Key: HIVE-4472 URL: https://issues.apache.org/jira/browse/HIVE-4472 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Attachments: HIVE-4472.1.patch, HIVE-4472.2.patch, HIVE-4472.3.patch, HIVE-4472.4.patch The issue is in files FilterExprOrExpr.java and FilterNotExpr.java. I posted a review for you at https://reviews.apache.org/r/10752/ I think there is a bug related to sharing of an array of integers. Also, one algorithm step takes O(DEFAULT_BATCH_SIZE) time always. If n < DEFAULT_BATCH_SIZE then this is a performance issue.
[jira] [Updated] (HIVE-4537) select * fails on orc table when vectorization is enabled
[ https://issues.apache.org/jira/browse/HIVE-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4537: --- Status: Patch Available (was: Open) select * fails on orc table when vectorization is enabled -- Key: HIVE-4537 URL: https://issues.apache.org/jira/browse/HIVE-4537 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Tony Murphy Assignee: Sarvesh Sakalanaga Attachments: Hive-4537.0.patch hive> select * from intdataorc; OK Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating cint0 Time taken: 0.213 seconds
[jira] [Updated] (HIVE-4472) OR, NOT Filter logic can lose an array, and always takes time O(VectorizedRowBatch.DEFAULT_SIZE)
[ https://issues.apache.org/jira/browse/HIVE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4472: --- Attachment: HIVE-4472.5.patch Same patch as the previous one, except that the fix to TestConstantVectorExpression is removed because that is taken care of by HIVE-4553. OR, NOT Filter logic can lose an array, and always takes time O(VectorizedRowBatch.DEFAULT_SIZE) Key: HIVE-4472 URL: https://issues.apache.org/jira/browse/HIVE-4472 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Attachments: HIVE-4472.1.patch, HIVE-4472.2.patch, HIVE-4472.3.patch, HIVE-4472.4.patch, HIVE-4472.5.patch The issue is in files FilterExprOrExpr.java and FilterNotExpr.java. I posted a review for you at https://reviews.apache.org/r/10752/ I think there is a bug related to sharing of an array of integers. Also, one algorithm step takes O(DEFAULT_BATCH_SIZE) time always. If n < DEFAULT_BATCH_SIZE then this is a performance issue.
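The two concerns raised in HIVE-4472 (an aliased integer array and work proportional to the batch capacity rather than to n) can be illustrated with a hedged sketch of an OR filter over a selected-index array. This is not the code from FilterExprOrExpr.java or from the attached patches: the method names are hypothetical, child filters are modeled as predicates on row indexes, the selection is assumed to be ascending, and scratch arrays are allocated per call for clarity where a real operator would more likely reuse buffers.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

public class OrFilterSketch {
    /** Filters sel[0..n) in place, the way a child filter overwrites the batch's selection. */
    static int filterInPlace(int[] sel, int n, IntPredicate pred) {
        int k = 0;
        for (int i = 0; i < n; i++) {
            if (pred.test(sel[i])) {
                sel[k++] = sel[i];
            }
        }
        return k;
    }

    /** Leaves in sel[0..result) the rows accepted by left OR right; every step is O(n). */
    static int or(int[] sel, int n, IntPredicate left, IntPredicate right) {
        // Copy the first n entries. Keeping only a reference would alias the array that
        // the left child is about to overwrite, and the initial selection would be lost.
        int[] initial = Arrays.copyOf(sel, n);

        int leftCount = filterInPlace(sel, n, left);
        int[] leftSel = Arrays.copyOf(sel, leftCount);

        // Rows the left child rejected, found with a two-pointer walk over n entries.
        int[] rest = new int[n - leftCount];
        int r = 0;
        int j = 0;
        for (int i = 0; i < n; i++) {
            if (j < leftCount && leftSel[j] == initial[i]) {
                j++;
            } else {
                rest[r++] = initial[i];
            }
        }

        // Only the rejected rows need to be shown to the right child.
        int rightCount = filterInPlace(rest, r, right);

        // Merge the two ascending lists back into the batch's selection array.
        int a = 0;
        int b = 0;
        int k = 0;
        while (a < leftCount && b < rightCount) {
            sel[k++] = leftSel[a] < rest[b] ? leftSel[a++] : rest[b++];
        }
        while (a < leftCount) {
            sel[k++] = leftSel[a++];
        }
        while (b < rightCount) {
            sel[k++] = rest[b++];
        }
        return k;
    }

    public static void main(String[] args) {
        int[] sel = {0, 1, 2, 3, 4, 5, 0, 0}; // capacity 8, but only n = 6 rows are live
        int k = or(sel, 6, i -> i % 2 == 0, i -> i >= 4);
        System.out.println(Arrays.toString(Arrays.copyOf(sel, k)));
    }
}
```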
[jira] [Commented] (HIVE-4592) ColumnArithmeticColumn.txt template never sets output isNull to true; can give wrong results
[ https://issues.apache.org/jira/browse/HIVE-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664817#comment-13664817 ] Jitendra Nath Pandey commented on HIVE-4592: The same issue exists in many other templates. I think we should fix them too in the same jira. Also, most of these templates assume that noNulls=false and isRepeating=true means all values are null. ColumnArithmeticColumn.txt template never sets output isNull to true; can give wrong results Key: HIVE-4592 URL: https://issues.apache.org/jira/browse/HIVE-4592 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch ColumnArithmeticColumn.txt should set the output column's noNulls flag to true if neither input column has nulls, but it does not do that. This can lead to wrong results if noNulls was set to false in a previous use of the batch.
[jira] [Commented] (HIVE-4599) VectorGroupByOperator steals the non-vectorized children and crashes query if vectorization fails
[ https://issues.apache.org/jira/browse/HIVE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665549#comment-13665549 ] Jitendra Nath Pandey commented on HIVE-4599: I would recommend putting in VectorReduceSinkOperator, because that will make the vectorized map side and the non-vectorized reduce side work together for non-GBy queries too. VectorGroupByOperator steals the non-vectorized children and crashes query if vectorization fails - Key: HIVE-4599 URL: https://issues.apache.org/jira/browse/HIVE-4599 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Remus Rusanu Assignee: Remus Rusanu Have the VGBy clone its own row mode children or implement vector mode output (including VectorReduceSinkOperator)
[jira] [Created] (HIVE-4603) VectorSelectOperator projections change the index of columns for subsequent operators.
Jitendra Nath Pandey created HIVE-4603: -- Summary: VectorSelectOperator projections change the index of columns for subsequent operators. Key: HIVE-4603 URL: https://issues.apache.org/jira/browse/HIVE-4603 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey
[jira] [Updated] (HIVE-4603) VectorSelectOperator projections change the index of columns for subsequent operators.
[ https://issues.apache.org/jira/browse/HIVE-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4603: --- Attachment: HIVE-4603.1.patch Initial patch. The unit test still needs to be fixed; I will upload another patch. VectorSelectOperator projections change the index of columns for subsequent operators. -- Key: HIVE-4603 URL: https://issues.apache.org/jira/browse/HIVE-4603 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4603.1.patch
[jira] [Commented] (HIVE-4592) ColumnArithmeticColumn.txt template never sets output isNull to true; can give wrong results
[ https://issues.apache.org/jira/browse/HIVE-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666031#comment-13666031 ] Jitendra Nath Pandey commented on HIVE-4592: Long-long division is handled specially, as it is cast to double division. These expressions are no longer generated using templates. Please add the fix to those too. They are located in: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ ColumnArithmeticColumn.txt template never sets output isNull to true; can give wrong results Key: HIVE-4592 URL: https://issues.apache.org/jira/browse/HIVE-4592 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4592.1.patch, HIVE-4592.3.patch ColumnArithmeticColumn.txt should set the output column's noNulls flag to true if neither input column has nulls, but it does not do that. This can lead to wrong results if noNulls was set to false in a previous use of the batch.
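The flag handling that HIVE-4592 describes can be sketched for a column-plus-column addition. This is an illustration only and not the generated code from ColumnArithmeticColumn.txt: LongCol is a hypothetical stand-in with the fields named in the issue, and repeating inputs and the selected-index array are left out to keep the null propagation visible.

```java
public class AddColColSketch {
    /** Hypothetical minimal long column vector. */
    static final class LongCol {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true;
        boolean isRepeating = false;

        LongCol(long... values) {
            vector = values;
            isNull = new boolean[values.length];
        }
    }

    /** out = a + b for the first n rows, with null state written on every call. */
    static void add(LongCol a, LongCol b, LongCol out, int n) {
        out.isRepeating = false;
        // Assign the flag in both directions. Only ever setting it to false would let a
        // stale false from a previous use of the batch survive.
        out.noNulls = a.noNulls && b.noNulls;

        for (int i = 0; i < n; i++) {
            // An input's isNull[] is only meaningful when its noNulls flag is false.
            out.isNull[i] = (!a.noNulls && a.isNull[i]) || (!b.noNulls && b.isNull[i]);
            out.vector[i] = a.vector[i] + b.vector[i];
        }
    }

    public static void main(String[] args) {
        LongCol a = new LongCol(1, 2, 3);
        LongCol b = new LongCol(10, 20, 30);
        LongCol out = new LongCol(0, 0, 0);
        out.noNulls = false;     // left over from an earlier batch
        out.isNull[1] = true;

        add(a, b, out, 3);
        System.out.println("noNulls=" + out.noNulls + " out[1]=" + out.vector[1]
                + " isNull[1]=" + out.isNull[1]);
    }
}
```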
[jira] [Updated] (HIVE-4603) VectorSelectOperator projections change the index of columns for subsequent operators.
[ https://issues.apache.org/jira/browse/HIVE-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4603: --- Attachment: HIVE-4603.2.patch VectorSelectOperator projections change the index of columns for subsequent operators. -- Key: HIVE-4603 URL: https://issues.apache.org/jira/browse/HIVE-4603 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4603.1.patch, HIVE-4603.2.patch
[jira] [Commented] (HIVE-4603) VectorSelectOperator projections change the index of columns for subsequent operators.
[ https://issues.apache.org/jira/browse/HIVE-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666832#comment-13666832 ] Jitendra Nath Pandey commented on HIVE-4603: New patch with unit test fixed. VectorSelectOperator projections change the index of columns for subsequent operators. -- Key: HIVE-4603 URL: https://issues.apache.org/jira/browse/HIVE-4603 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4603.1.patch, HIVE-4603.2.patch
[jira] [Updated] (HIVE-4603) VectorSelectOperator projections change the index of columns for subsequent operators.
[ https://issues.apache.org/jira/browse/HIVE-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4603: --- Description: VectorSelectOperator projections change the index of columns for subsequent operators. VectorSelectOperator projections change the index of columns for subsequent operators. -- Key: HIVE-4603 URL: https://issues.apache.org/jira/browse/HIVE-4603 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4603.1.patch, HIVE-4603.2.patch VectorSelectOperator projections change the index of columns for subsequent operators.
[jira] [Updated] (HIVE-4640) CommonOrcInputFormat should be the default input format for Orc files.
[ https://issues.apache.org/jira/browse/HIVE-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4640: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-4160 CommonOrcInputFormat should be the default input format for Orc files. -- Key: HIVE-4640 URL: https://issues.apache.org/jira/browse/HIVE-4640 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey CommonOrcInputFormat should be the default input format for Orc files, so that default orc format tables work with both the vectorized and non-vectorized paths.
[jira] [Created] (HIVE-4640) CommonOrcInputFormat should be the default input format for Orc files.
Jitendra Nath Pandey created HIVE-4640: -- Summary: CommonOrcInputFormat should be the default input format for Orc files. Key: HIVE-4640 URL: https://issues.apache.org/jira/browse/HIVE-4640 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey CommonOrcInputFormat should be the default input format for Orc files, so that default orc format tables work with both the vectorized and non-vectorized paths.
[jira] [Updated] (HIVE-4640) CommonOrcInputFormat should be the default input format for Orc tables.
[ https://issues.apache.org/jira/browse/HIVE-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4640: --- Summary: CommonOrcInputFormat should be the default input format for Orc tables. (was: CommonOrcInputFormat should be the default input format for Orc files.) CommonOrcInputFormat should be the default input format for Orc tables. --- Key: HIVE-4640 URL: https://issues.apache.org/jira/browse/HIVE-4640 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey CommonOrcInputFormat should be the default input format for Orc files, so that default orc format tables work with both the vectorized and non-vectorized paths.
[jira] [Created] (HIVE-4649) Unit test failure in TestColumnScalarOperationVectorExpressionEvaluation
Jitendra Nath Pandey created HIVE-4649: -- Summary: Unit test failure in TestColumnScalarOperationVectorExpressionEvaluation Key: HIVE-4649 URL: https://issues.apache.org/jira/browse/HIVE-4649 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey The test fails due to a bug in ColumnCompareScalar.txt.
[jira] [Updated] (HIVE-4649) Unit test failure in TestColumnScalarOperationVectorExpressionEvaluation
[ https://issues.apache.org/jira/browse/HIVE-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4649: --- Attachment: HIVE-4649.1.patch Attached patch fixes the issue. Unit test failure in TestColumnScalarOperationVectorExpressionEvaluation - Key: HIVE-4649 URL: https://issues.apache.org/jira/browse/HIVE-4649 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4649.1.patch The test fails due to a bug in ColumnCompareScalar.txt.
[jira] [Created] (HIVE-4655) Vectorization not working with negative constants, hive doesn't fold constants.
Jitendra Nath Pandey created HIVE-4655: -- Summary: Vectorization not working with negative constants, hive doesn't fold constants. Key: HIVE-4655 URL: https://issues.apache.org/jira/browse/HIVE-4655 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey The Hive optimizer doesn't fold constants, but the vectorized code path assumes that constants have been folded. This should be fixed in the Hive optimizer. In this jira we just fix the vectorization path to handle folding for negative constants. This is needed because the Hive plan treats a negative constant as a unary-minus expression on a constant, so these expressions also need constant folding. This fix will become redundant once constant folding is appropriately implemented in the Hive optimizer. (HIVE-746)
[jira] [Updated] (HIVE-4655) Vectorization not working with negative constants, hive doesn't fold constants.
[ https://issues.apache.org/jira/browse/HIVE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4655: --- Attachment: HIVE-4655.1.patch Vectorization not working with negative constants, hive doesn't fold constants. --- Key: HIVE-4655 URL: https://issues.apache.org/jira/browse/HIVE-4655 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4655.1.patch The Hive optimizer doesn't fold constants, but the vectorized code path assumes that constants have been folded. This should be fixed in the Hive optimizer. In this jira we just fix the vectorization path to handle folding for negative constants. This is needed because the Hive plan treats a negative constant as a unary-minus expression on a constant, so these expressions also need constant folding. This fix will become redundant once constant folding is appropriately implemented in the Hive optimizer. (HIVE-746)
[jira] [Commented] (HIVE-4655) Vectorization not working with negative constants, hive doesn't fold constants.
[ https://issues.apache.org/jira/browse/HIVE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675103#comment-13675103 ] Jitendra Nath Pandey commented on HIVE-4655: Review board entry: https://reviews.apache.org/r/11634/ Vectorization not working with negative constants, hive doesn't fold constants. --- Key: HIVE-4655 URL: https://issues.apache.org/jira/browse/HIVE-4655 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4655.1.patch The Hive optimizer doesn't fold constants, but the vectorized code path assumes that constants have been folded. This should be fixed in the Hive optimizer. In this jira we just fix the vectorization path to handle folding for negative constants. This is needed because the Hive plan treats a negative constant as a unary-minus expression on a constant, so these expressions also need constant folding. This fix will become redundant once constant folding is appropriately implemented in the Hive optimizer. (HIVE-746)
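The folding step described in HIVE-4655 can be sketched on a toy expression tree. The classes below are hypothetical and are not Hive's expression descriptors or the code in HIVE-4655.1.patch; the sketch only shows the idea that a unary minus applied to a constant is rewritten into a single negative constant before a vectorized expression is chosen.

```java
public class NegConstFoldSketch {
    /** Hypothetical expression node types. */
    interface Expr {}

    static final class Const implements Expr {
        final long value;
        Const(long value) { this.value = value; }
    }

    static final class Column implements Expr {
        final String name;
        Column(String name) { this.name = name; }
    }

    static final class Neg implements Expr {
        final Expr child;
        Neg(Expr child) { this.child = child; }
    }

    /** Rewrites Neg(Const c) into Const(-c), bottom-up; everything else is left alone. */
    static Expr fold(Expr e) {
        if (e instanceof Neg) {
            Expr child = fold(((Neg) e).child);
            if (child instanceof Const) {
                return new Const(-((Const) child).value);
            }
            return new Neg(child);
        }
        return e;
    }

    public static void main(String[] args) {
        // A plan for "-5" arrives as unary minus over the constant 5.
        Expr folded = fold(new Neg(new Const(5)));
        System.out.println(folded instanceof Const
                ? "folded to constant " + ((Const) folded).value
                : "not folded");
    }
}
```

Folding only the unary-minus case mirrors the limited scope stated in the issue; general constant folding is left to the optimizer.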
[jira] [Commented] (HIVE-4665) error at VectorExecMapper.close in group-by-agg query over ORC, vectorized
[ https://issues.apache.org/jira/browse/HIVE-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676826#comment-13676826 ] Jitendra Nath Pandey commented on HIVE-4665: We should use Writables from org.apache.hadoop.hive.serde2.io.* as much as possible. Writables from hadoop.io should be used only when an implementation in hive is not available. Also, the strings should use Text instead of BytesWritable. error at VectorExecMapper.close in group-by-agg query over ORC, vectorized -- Key: HIVE-4665 URL: https://issues.apache.org/jira/browse/HIVE-4665 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Jitendra Nath Pandey CREATE EXTERNAL TABLE FactSqlEngineAM4712( dAppVersionBuild int, dAppVersionBuildUNMAPPED32449 int, dAppVersionMajor int, dAppVersionMinor32447 int, dAverageCols23083 int, dDatabaseSize23090 int, dDate string, dIsInternalMSFT16431 int, dLockEscalationDisabled23323 int, dLockEscalationEnabled23324 int, dMachineID int, dNumberTables23008 int, dNumCompressionPagePartitions23088 int, dNumCompressionRowPartitions23089 int, dNumIndexFragmentation23084 int, dNumPartitionedTables23098 int, dNumPartitions23099 int, dNumTablesClusterIndex23010 int, dNumTablesHeap23100 int, dSessionType5618 int, dSqlEdition8213 int, dTempDbSize23103 int, mNumColumnStoreIndexesVar48171 bigint, mOccurrences int, mRowFlag int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/user/ehans/SQM'; create table FactSqlEngineAM_vec_ORC ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' stored as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.CommonOrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' AS select * from FactSqlEngineAM4712; hive> select ddate, max(dnumbertables23008) from factsqlengineam_vec_orc group by ddate; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. 
Estimated from input data size: 3 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Validating if vectorized execution is applicable Going down the vectorization path java.lang.InstantiationException: org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... java.lang.InstantiationException: org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator Continuing ... java.lang.Exception: XMLEncoder: discarding statement ArrayList.add(VectorGroupByOperator); Continuing ... Starting Job = job_201306041757_0016, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201306041757_0016 Kill Command = c:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job -kill job_201306041757_0016 Hadoop job information for Stage-1: number of mappers: 8; number of reducers: 3 2013-06-05 10:03:06,022 Stage-1 map = 0%, reduce = 0% 2013-06-05 10:03:51,142 Stage-1 map = 100%, reduce = 100% Ended Job = job_201306041757_0016 with errors Error during job, obtaining debugging information... 
Job Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201306041757_0016 Examining task ID: task_201306041757_0016_m_09 (and more) from job job_201306041757_0016 Task with the most failures(4): - Task ID: task_201306041757_0016_m_00 URL: http://localhost:50030/taskdetails.jsp?jobid=job_201306041757_0016&tipid=task_201306041757_0016_m_00 - Diagnostic Messages for this Task: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:229) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:271) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135) at org.apache.hadoop.mapred.Child.main(Child.java:265) Caused by: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.Text at
[jira] [Updated] (HIVE-4665) error at VectorExecMapper.close in group-by-agg query over ORC, vectorized
[ https://issues.apache.org/jira/browse/HIVE-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-4665:
    Attachment: HIVE-4665.1.patch

error at VectorExecMapper.close in group-by-agg query over ORC, vectorized
    Key: HIVE-4665
    URL: https://issues.apache.org/jira/browse/HIVE-4665
    Project: Hive
    Issue Type: Sub-task
    Components: Query Processor
    Affects Versions: vectorization-branch
    Reporter: Eric Hanson
    Assignee: Jitendra Nath Pandey
    Attachments: HIVE-4665.1.patch

CREATE EXTERNAL TABLE FactSqlEngineAM4712(
    dAppVersionBuild int,
    dAppVersionBuildUNMAPPED32449 int,
    dAppVersionMajor int,
    dAppVersionMinor32447 int,
    dAverageCols23083 int,
    dDatabaseSize23090 int,
    dDate string,
    dIsInternalMSFT16431 int,
    dLockEscalationDisabled23323 int,
    dLockEscalationEnabled23324 int,
    dMachineID int,
    dNumberTables23008 int,
    dNumCompressionPagePartitions23088 int,
    dNumCompressionRowPartitions23089 int,
    dNumIndexFragmentation23084 int,
    dNumPartitionedTables23098 int,
    dNumPartitions23099 int,
    dNumTablesClusterIndex23010 int,
    dNumTablesHeap23100 int,
    dSessionType5618 int,
    dSqlEdition8213 int,
    dTempDbSize23103 int,
    mNumColumnStoreIndexesVar48171 bigint,
    mOccurrences int,
    mRowFlag int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/ehans/SQM';

create table FactSqlEngineAM_vec_ORC
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
stored as
    INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.CommonOrcInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
AS select * from FactSqlEngineAM4712;

hive> select ddate, max(dnumbertables23008) from factsqlengineam_vec_orc group by ddate;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Validating if vectorized execution is applicable
Going down the vectorization path
java.lang.InstantiationException: org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector
Continuing ...
java.lang.RuntimeException: failed to evaluate: unbound=Class.new();
Continuing ...
java.lang.InstantiationException: org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator
Continuing ...
java.lang.Exception: XMLEncoder: discarding statement ArrayList.add(VectorGroupByOperator);
Continuing ...
Starting Job = job_201306041757_0016, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201306041757_0016
Kill Command = c:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job -kill job_201306041757_0016
Hadoop job information for Stage-1: number of mappers: 8; number of reducers: 3
2013-06-05 10:03:06,022 Stage-1 map = 0%, reduce = 0%
2013-06-05 10:03:51,142 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201306041757_0016 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201306041757_0016
Examining task ID: task_201306041757_0016_m_09 (and more) from job job_201306041757_0016

Task with the most failures(4):
-----
Task ID: task_201306041757_0016_m_00
URL: http://localhost:50030/taskdetails.jsp?jobid=job_201306041757_0016&tipid=task_201306041757_0016_m_00
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:229)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
    at org.apache.hadoop.mapred.Child.main(Child.java:265)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:40)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:481)
    at
[jira] [Resolved] (HIVE-4653) Favor serde2.io Writable classes over hadoop.io ones
[ https://issues.apache.org/jira/browse/HIVE-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey resolved HIVE-4653.
    Resolution: Duplicate

HIVE-4665 will fix this.

Favor serde2.io Writable classes over hadoop.io ones
    Key: HIVE-4653
    URL: https://issues.apache.org/jira/browse/HIVE-4653
    Project: Hive
    Issue Type: Sub-task
    Components: Query Processor
    Affects Versions: vectorization-branch
    Reporter: Remus Rusanu
    Assignee: Remus Rusanu
    Priority: Minor

The Writables are originally from org.apache.hadoop.io. I tend to assume that they have been re-defined in Hive if the original implementation was not considered good enough. However, I don't understand why some are defined twice in Hive itself. I noticed that ByteWritable in o.a.h.hive.ql.exec is not being used anywhere. The ByteWritable in serde2.io is referred to in a bunch of places. Therefore, I would suggest just using the one in serde2.io.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4654) Remove unused org.apache.hadoop.hive.ql.exec Writables
[ https://issues.apache.org/jira/browse/HIVE-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676842#comment-13676842 ]

Jitendra Nath Pandey commented on HIVE-4654:

I think this is more general than the vectorization effort. We should generally remove unused classes. I would suggest removing it from the subtasks of HIVE-4160 and making it a top-level bug.

Remove unused org.apache.hadoop.hive.ql.exec Writables
    Key: HIVE-4654
    URL: https://issues.apache.org/jira/browse/HIVE-4654
    Project: Hive
    Issue Type: Sub-task
    Components: Query Processor
    Reporter: Remus Rusanu
    Priority: Minor

The Writables are originally from org.apache.hadoop.io. I tend to assume that they have been re-defined in Hive if the original implementation was not considered good enough. However, I don't understand why some are defined twice in Hive itself. I noticed that ByteWritable in o.a.h.hive.ql.exec is not being used anywhere. The ByteWritable in serde2.io is referred to in a bunch of places. Therefore, I would suggest just using the one in serde2.io.
[jira] [Commented] (HIVE-4665) error at VectorExecMapper.close in group-by-agg query over ORC, vectorized
[ https://issues.apache.org/jira/browse/HIVE-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676851#comment-13676851 ]

Jitendra Nath Pandey commented on HIVE-4665:

Patch uploaded. Review board: https://reviews.apache.org/r/11666/
[jira] [Created] (HIVE-4673) Use VectorExpessionWriter to write column vectors into Writables.
Jitendra Nath Pandey created HIVE-4673:

    Summary: Use VectorExpessionWriter to write column vectors into Writables.
    Key: HIVE-4673
    URL: https://issues.apache.org/jira/browse/HIVE-4673
    Project: Hive
    Issue Type: Sub-task
    Reporter: Jitendra Nath Pandey
    Assignee: Jitendra Nath Pandey

VectorExpressionWriter interface should be used to write column vectors into Writables. VectorExpressionWriter supports all primitive datatypes and this will make vector select operator and vector group by operators consistent.
[jira] [Updated] (HIVE-4673) Use VectorExpessionWriter to write column vectors into Writables.
[ https://issues.apache.org/jira/browse/HIVE-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-4673:
    Attachment: HIVE-4673.1.patch
[jira] [Updated] (HIVE-4673) Use VectorExpessionWriter to write column vectors into Writables.
[ https://issues.apache.org/jira/browse/HIVE-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-4673:
    Description: VectorExpressionWriter interface should be used to write column vectors into Writables. VectorExpressionWriter supports all primitive datatypes and this will make vector select operator and vector group by operators consistent.
    (was: VectorExpressionWriter interface should be used to write column vectors into Writables. VectorExpressionWriter supports all primitive datatypes and this will make vector select operator and vector group by operators consistent.)
    Attachments: HIVE-4673.1.patch
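As an illustration of the idea behind HIVE-4673, here is a minimal sketch, not Hive's actual code: one writer per primitive type converts a column-vector entry into a row-mode value, and every operator that emits rows goes through the same writers. The names ColumnWriterSketch, ColumnWriter and writeRow are hypothetical, and boxed Java values (Long, Double, String) stand in for Hadoop Writables so that the example runs without Hive or Hadoop on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ColumnWriterSketch {

    /** Hypothetical interface: converts entry `row` of a column into a row-mode value. */
    interface ColumnWriter {
        Object write(Object column, int row);
    }

    // One writer per primitive type. Boxed values stand in for Hadoop Writables (assumption).
    static final ColumnWriter LONG_WRITER =
            (column, row) -> Long.valueOf(((long[]) column)[row]);
    static final ColumnWriter DOUBLE_WRITER =
            (column, row) -> Double.valueOf(((double[]) column)[row]);
    static final ColumnWriter STRING_WRITER =
            (column, row) -> new String(((byte[][]) column)[row], StandardCharsets.UTF_8);

    /** Shared by any operator that emits rows, so all of them produce the same value types. */
    static List<Object> writeRow(Object[] columns, ColumnWriter[] writers, int row) {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            out.add(writers[i].write(columns[i], row));
        }
        return out;
    }

    public static void main(String[] args) {
        Object[] columns = {
            new long[] {3L, 7L},
            new byte[][] {
                "a".getBytes(StandardCharsets.UTF_8),
                "b".getBytes(StandardCharsets.UTF_8)
            }
        };
        ColumnWriter[] writers = {LONG_WRITER, STRING_WRITER};
        // Materialize row 1 of the batch through the shared writers.
        System.out.println(writeRow(columns, writers, 1));
    }
}
```

Routing both the select and the group-by output through one set of writers is what keeps the value type for a given column type the same in every operator, which is the consistency the issue description asks for.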
[jira] [Updated] (HIVE-4665) error at VectorExecMapper.close in group-by-agg query over ORC, vectorized
[ https://issues.apache.org/jira/browse/HIVE-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-4665:
    Attachment: HIVE-4665.2.patch
    Attachments: HIVE-4665.1.patch, HIVE-4665.2.patch

Another patch uploaded to fix the object inspector for VectorUDAFMinMaxString. For text we should use WritableStringObjectInspector. I have verified, that with this change select min(stringCol) from table also works. Without this fix, it would fail.
[jira] [Assigned] (HIVE-4599) VectorGroupByOperator steals the non-vectorized children and crashes query if vectorization fails
[ https://issues.apache.org/jira/browse/HIVE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey reassigned HIVE-4599:
    Assignee: Jitendra Nath Pandey (was: Remus Rusanu)

VectorGroupByOperator steals the non-vectorized children and crashes query if vectorization fails
    Key: HIVE-4599
    URL: https://issues.apache.org/jira/browse/HIVE-4599
    Project: Hive
    Issue Type: Sub-task
    Components: Query Processor
    Affects Versions: vectorization-branch
    Reporter: Remus Rusanu
    Assignee: Jitendra Nath Pandey

Have the VGBy clone its own row-mode children or implement vector-mode output (including VectorReduceSinkOperator).
[jira] [Updated] (HIVE-4599) VectorGroupByOperator steals the non-vectorized children and crashes query if vectorization fails
[ https://issues.apache.org/jira/browse/HIVE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-4599:
    Attachment: HIVE-4599.1.patch

Patch uploaded. The non-vectorized children are cloned.
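The cloning approach described for HIVE-4599 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the contents of the patch: OperatorNode, deepCopy and vectorize are hypothetical names, and a bare tree node stands in for Hive's operator classes. The point it shows is that the vectorized group-by receives copies of the row-mode children, so the original row-mode plan keeps its own children and remains usable if vectorization fails.

```java
import java.util.ArrayList;
import java.util.List;

class CloneChildrenSketch {

    /** Hypothetical operator node; a stand-in for illustration, not Hive's Operator class. */
    static class OperatorNode {
        final String name;
        final List<OperatorNode> children = new ArrayList<>();

        OperatorNode(String name) {
            this.name = name;
        }

        /** Deep-copies this node and its whole subtree. */
        OperatorNode deepCopy() {
            OperatorNode copy = new OperatorNode(name);
            for (OperatorNode child : children) {
                copy.children.add(child.deepCopy());
            }
            return copy;
        }
    }

    /** Builds a vectorized group-by that owns copies of the row-mode children. */
    static OperatorNode vectorize(OperatorNode rowModeGroupBy) {
        OperatorNode vectorized = new OperatorNode("Vector" + rowModeGroupBy.name);
        for (OperatorNode child : rowModeGroupBy.children) {
            vectorized.children.add(child.deepCopy());
        }
        return vectorized;
    }

    public static void main(String[] args) {
        OperatorNode groupBy = new OperatorNode("GroupBy");
        groupBy.children.add(new OperatorNode("ReduceSink"));

        OperatorNode vectorized = vectorize(groupBy);

        // The row-mode plan still has its own child, so falling back to it is safe.
        System.out.println("row-mode children: " + groupBy.children.size());
        System.out.println("children shared: "
                + (vectorized.children.get(0) == groupBy.children.get(0)));
    }
}
```

The alternative named in the issue, implementing vector-mode output all the way down (including a vectorized reduce sink), would avoid the copy but requires vectorized versions of every downstream operator.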
[jira] [Commented] (HIVE-4685) query using LIKE does not vectorize, then crashes
[ https://issues.apache.org/jira/browse/HIVE-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678206#comment-13678206 ]

Jitendra Nath Pandey commented on HIVE-4685:

I suspect this is same as HIVE-4599. I have a patch on HIVE-4599, which should hopefully fix the issue of query crashing.

query using LIKE does not vectorize, then crashes
    Key: HIVE-4685
    URL: https://issues.apache.org/jira/browse/HIVE-4685
    Project: Hive
    Issue Type: Sub-task
    Components: Query Processor
    Affects Versions: vectorization-branch
    Reporter: Eric Hanson

The query

select count(ddate) from factsqlengineam_vec_orc where ddate like "2013%";

starts up but does not run in vectorization mode. Then during non-vectorized execution it crashes.

Expected result: Query runs vectorized and runs successfully.

Actual result:

hive> select count(ddate) from factsqlengineam_vec_orc where ddate like "2013%";
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Validating if vectorized execution is applicable
Cannot vectorize the plan: org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFBridge, is not supported
java.lang.InstantiationException: org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator
Continuing ...
java.lang.Exception: XMLEncoder: discarding statement ArrayList.add(VectorGroupByOperator);
Continuing ...
Starting Job = job_201306061504_0041, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201306061504_0041
Kill Command = c:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job -kill job_201306061504_0041
Hadoop job information for Stage-1: number of mappers: 8; number of reducers: 1
2013-06-07 10:41:31,544 Stage-1 map = 0%, reduce = 0%
2013-06-07 10:42:01,677 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201306061504_0041 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201306061504_0041
Examining task ID: task_201306061504_0041_m_09 (and more) from job job_201306061504_0041
Examining task ID: task_201306061504_0041_m_02 (and more) from job job_201306061504_0041
Examining task ID: task_201306061504_0041_m_00 (and more) from job job_201306061504_0041
Examining task ID: task_201306061504_0041_m_04 (and more) from job job_201306061504_0041

Task with the most failures(4):
-----
Task ID: task_201306061504_0041_m_06
URL: http://localhost:50030/taskdetails.jsp?jobid=job_201306061504_0041&tipid=task_201306061504_0041_m_06
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
    at org.apache.hadoop.mapred.Child.main(Child.java:265)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
[jira] [Commented] (HIVE-4685) query using LIKE does not vectorize
[ https://issues.apache.org/jira/browse/HIVE-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678410#comment-13678410 ]

Jitendra Nath Pandey commented on HIVE-4685:

Those messages are now logged in debug mode.
[jira] [Commented] (HIVE-4599) VectorGroupByOperator steals the non-vectorized children and crashes query if vectorization fails
[ https://issues.apache.org/jira/browse/HIVE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678411#comment-13678411 ]

Jitendra Nath Pandey commented on HIVE-4599:

Those messages are logged at debug level.
[jira] [Commented] (HIVE-4678) second clause of AND filter not applied for vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678489#comment-13678489 ]

Jitendra Nath Pandey commented on HIVE-4678:

This issue is the same as HIVE-4680; I will fix it in the same JIRA.

second clause of AND filter not applied for vectorized execution
    Key: HIVE-4678
    URL: https://issues.apache.org/jira/browse/HIVE-4678
    Project: Hive
    Issue Type: Sub-task
    Components: Query Processor
    Affects Versions: vectorization-branch
    Reporter: Eric Hanson
    Assignee: Jitendra Nath Pandey

Query

select ddate, dnumbertables23008 from factsqlengineam_vec_orc where ddate = "2013-01-08 00:00:00" and dnumbertables23008 = 1052436;

returns rows where dnumbertables23008 != 1052436.

Actual results: 636087 rows
Sample:
...
2013-01-08 00:00:00    0
2013-01-08 00:00:00    0
2013-01-08 00:00:00    108
2013-01-08 00:00:00    0
2013-01-08 00:00:00    0
2013-01-08 00:00:00    1625
2013-01-08 00:00:00    210
2013-01-08 00:00:00    0
2013-01-08 00:00:00    209
2013-01-08 00:00:00    0
...

Expected results: Either no rows returned, or all rows have 1052436 in the second column.
[jira] [Assigned] (HIVE-4680) second clause of OR filter not applied in vectorized query execution
[ https://issues.apache.org/jira/browse/HIVE-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HIVE-4680: -- Assignee: Jitendra Nath Pandey second clause of OR filter not applied in vectorized query execution Key: HIVE-4680 URL: https://issues.apache.org/jira/browse/HIVE-4680 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Jitendra Nath Pandey query: select ddate, count(*) from factsqlengineam_vec_orc where ddate = 2012-05-19 00:00:00 OR ddate = 2012-05-20 00:00:00 group by ddate; Actual result: OK 2012-05-19 00:00:00 528741 Expected result: There should be two rows, one for each day in the OR clause in the query. The following query does return a row, so there is data for 2012-05-20: select ddate, count(*) from factsqlengineam_vec_orc where ddate = 2012-05-20 00:00:00 group by ddate;
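For a disjunction, rows rejected by the first clause still need to be offered to the second clause, and the two result sets are then merged. The sketch below illustrates that expectation only; OrFilterSketch and its methods are hypothetical names, the date is encoded as a long for brevity, and none of this is taken from the Hive code base.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;
import java.util.stream.IntStream;

class OrFilterSketch {
    /** Returns, in order, the row indices whose column value satisfies the predicate. */
    static int[] filter(long[] col, int[] rows, LongPredicate p) {
        return Arrays.stream(rows).filter(r -> p.test(col[r])).toArray();
    }

    /** OR: rows passing the left clause, plus left-rejected rows passing the right clause. */
    static int[] or(long[] col, int[] rows, LongPredicate left, LongPredicate right) {
        int[] passedLeft = filter(col, rows, left);
        int[] rejectedByLeft = filter(col, rows, left.negate());
        int[] passedRight = filter(col, rejectedByLeft, right);
        // Merge both groups and restore ascending row order.
        return IntStream.concat(Arrays.stream(passedLeft), Arrays.stream(passedRight))
                        .sorted()
                        .toArray();
    }

    public static void main(String[] args) {
        long[] ddate = {20120519L, 20120520L, 20120521L, 20120520L}; // hypothetical encoding
        int[] all = {0, 1, 2, 3};
        int[] kept = or(ddate, all, v -> v == 20120519L, v -> v == 20120520L);
        System.out.println(Arrays.toString(kept));
    }
}
```

Dropping the passedRight group would leave only the 2012-05-19 rows, which is the single-row result described in the report.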
[jira] [Assigned] (HIVE-4678) second clause of AND filter not applied for vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HIVE-4678: -- Assignee: Jitendra Nath Pandey second clause of AND filter not applied for vectorized execution Key: HIVE-4678 URL: https://issues.apache.org/jira/browse/HIVE-4678 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Query select ddate, dnumbertables23008 from factsqlengineam_vec_orc where ddate = 2013-01-08 00:00:00 and dnumbertables23008 = 1052436; returns rows where dnumbertables23008 != 1052436. Actual results: 636087 rows Sample: ... 2013-01-08 00:00:00 0 2013-01-08 00:00:00 0 2013-01-08 00:00:00 108 2013-01-08 00:00:00 0 2013-01-08 00:00:00 0 2013-01-08 00:00:00 1625 2013-01-08 00:00:00 210 2013-01-08 00:00:00 0 2013-01-08 00:00:00 209 2013-01-08 00:00:00 0 ... Expected results: Either no rows returned, or all rows have 1052436 in second column.
[jira] [Resolved] (HIVE-4680) second clause of OR filter not applied in vectorized query execution
[ https://issues.apache.org/jira/browse/HIVE-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey resolved HIVE-4680. Resolution: Duplicate It is the same issue as HIVE-4678. second clause of OR filter not applied in vectorized query execution Key: HIVE-4680 URL: https://issues.apache.org/jira/browse/HIVE-4680 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Jitendra Nath Pandey query: select ddate, count(*) from factsqlengineam_vec_orc where ddate = 2012-05-19 00:00:00 OR ddate = 2012-05-20 00:00:00 group by ddate; Actual result: OK 2012-05-19 00:00:00 528741 Expected result: There should be two rows, one for each day in the OR clause in the query. The following query does return a row, so there is data for 2012-05-20: select ddate, count(*) from factsqlengineam_vec_orc where ddate = 2012-05-20 00:00:00 group by ddate;
[jira] [Updated] (HIVE-4678) second clause of AND, OR filter not applied for vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4678: --- Summary: second clause of AND, OR filter not applied for vectorized execution (was: second clause of AND filter not applied for vectorized execution) second clause of AND, OR filter not applied for vectorized execution Key: HIVE-4678 URL: https://issues.apache.org/jira/browse/HIVE-4678 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Query select ddate, dnumbertables23008 from factsqlengineam_vec_orc where ddate = 2013-01-08 00:00:00 and dnumbertables23008 = 1052436; returns rows where dnumbertables23008 != 1052436. Actual results: 636087 rows Sample: ... 2013-01-08 00:00:00 0 2013-01-08 00:00:00 0 2013-01-08 00:00:00 108 2013-01-08 00:00:00 0 2013-01-08 00:00:00 0 2013-01-08 00:00:00 1625 2013-01-08 00:00:00 210 2013-01-08 00:00:00 0 2013-01-08 00:00:00 209 2013-01-08 00:00:00 0 ... Expected results: Either no rows returned, or all rows have 1052436 in second column.
[jira] [Assigned] (HIVE-4667) tpch query 1 fails with java.lang.ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HIVE-4667: -- Assignee: Jitendra Nath Pandey tpch query 1 fails with java.lang.ClassCastException Key: HIVE-4667 URL: https://issues.apache.org/jira/browse/HIVE-4667 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch {noformat} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.DoubleColSubtractLongScalar.evaluate(DoubleColSubtractLongScalar.java:46) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:69) at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.DoubleColMultiplyDoubleColumn.evaluate(DoubleColMultiplyDoubleColumn.java:41) at org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.gen.VectorUDAFSumDouble.aggregateInputSelection(VectorUDAFSumDouble.java:98) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.processAggregators(VectorGroupByOperator.java:174) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.processOp(VectorGroupByOperator.java:151) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:104) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.processOp(VectorFilterOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at 
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:717) ... 9 more {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4667) tpch query 1 fails with java.lang.ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4667: --- Attachment: HIVE-4667.1.patch tpch query 1 fails with java.lang.ClassCastException Key: HIVE-4667 URL: https://issues.apache.org/jira/browse/HIVE-4667 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4667.1.patch {noformat} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.DoubleColSubtractLongScalar.evaluate(DoubleColSubtractLongScalar.java:46) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:69) at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.DoubleColMultiplyDoubleColumn.evaluate(DoubleColMultiplyDoubleColumn.java:41) at org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.gen.VectorUDAFSumDouble.aggregateInputSelection(VectorUDAFSumDouble.java:98) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.processAggregators(VectorGroupByOperator.java:174) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.processOp(VectorGroupByOperator.java:151) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:104) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.processOp(VectorFilterOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at 
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:717) ... 9 more {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4667) tpch query 1 fails with java.lang.ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13678558#comment-13678558 ] Jitendra Nath Pandey commented on HIVE-4667: Patch uploaded. The patch includes the fix for HIVE-4678 as well. tpch query 1 fails with java.lang.ClassCastException Key: HIVE-4667 URL: https://issues.apache.org/jira/browse/HIVE-4667 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4667.1.patch {noformat} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.DoubleColSubtractLongScalar.evaluate(DoubleColSubtractLongScalar.java:46) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:69) at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.DoubleColMultiplyDoubleColumn.evaluate(DoubleColMultiplyDoubleColumn.java:41) at org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.gen.VectorUDAFSumDouble.aggregateInputSelection(VectorUDAFSumDouble.java:98) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.processAggregators(VectorGroupByOperator.java:174) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.processOp(VectorGroupByOperator.java:151) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:104) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.processOp(VectorFilterOperator.java:91) at 
org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:717) ... 9 more {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4688) NPE in writing null values.
Jitendra Nath Pandey created HIVE-4688: -- Summary: NPE in writing null values. Key: HIVE-4688 URL: https://issues.apache.org/jira/browse/HIVE-4688 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey VectorExpressionWriter throws NPE when writing null values.
[jira] [Resolved] (HIVE-4668) wrong results for query with modulo (%) in WHERE clause filter
[ https://issues.apache.org/jira/browse/HIVE-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey resolved HIVE-4668. Resolution: Fixed wrong results for query with modulo (%) in WHERE clause filter -- Key: HIVE-4668 URL: https://issues.apache.org/jira/browse/HIVE-4668 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Sarvesh Sakalanaga select disinternalmsft16431, count(disinternalmsft16431) from factsqlengineam_vec_orc where ddate = 2012-12 and ddate 2013-02 and disinternalmsft16431 % 5 = 0 group by disinternalmsft16431 Expected result: 0 3160232 5 33039254 Actual result: 0 8697033 6 2706407 5 94709959 There should be no result row for 6 because 6 % 5 != 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4688) NPE in writing null values.
[ https://issues.apache.org/jira/browse/HIVE-4688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4688: --- Attachment: HIVE-4688.1.patch Patch uploaded. For null values, we should return NullWritable instead of null. NPE in writing null values. --- Key: HIVE-4688 URL: https://issues.apache.org/jira/browse/HIVE-4688 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4688.1.patch VectorExpressionWriter throws NPE when writing null values.
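The idea of returning a null marker object instead of a bare null can be sketched without Hadoop on the classpath. In the sketch below, NullMarker is a hypothetical stand-in for Hadoop's NullWritable singleton (obtained there via NullWritable.get()), and NullSafeWriterSketch is an illustrative name, not the VectorExpressionWriter from the patch.

```java
/** Hypothetical stand-in for Hadoop's NullWritable singleton; not the real class. */
final class NullMarker {
    private static final NullMarker INSTANCE = new NullMarker();

    private NullMarker() {
    }

    static NullMarker get() {
        return INSTANCE;
    }

    @Override
    public String toString() {
        return "(null)";
    }
}

class NullSafeWriterSketch {
    /** Boxes one row of a long column, substituting the marker for null entries. */
    static Object writeValue(long[] vector, boolean[] isNull, int row) {
        if (isNull[row]) {
            return NullMarker.get(); // never hand a bare null to downstream code
        }
        return Long.valueOf(vector[row]);
    }

    public static void main(String[] args) {
        long[] vector = {7L, 0L};
        boolean[] isNull = {false, true};
        for (int row = 0; row < vector.length; row++) {
            // Dereferencing the result is safe even for the null row.
            System.out.println(writeValue(vector, isNull, row).toString());
        }
    }
}
```

Because every caller receives a real object, code that dereferences the written value no longer needs a null check to avoid a NullPointerException.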
[jira] [Resolved] (HIVE-4678) second clause of AND, OR filter not applied for vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey resolved HIVE-4678. Resolution: Fixed The fix for this was included in the patch for HIVE-4667. second clause of AND, OR filter not applied for vectorized execution Key: HIVE-4678 URL: https://issues.apache.org/jira/browse/HIVE-4678 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Query select ddate, dnumbertables23008 from factsqlengineam_vec_orc where ddate = 2013-01-08 00:00:00 and dnumbertables23008 = 1052436; returns rows where dnumbertables23008 != 1052436. Actual results: 636087 rows Sample: ... 2013-01-08 00:00:00 0 2013-01-08 00:00:00 0 2013-01-08 00:00:00 108 2013-01-08 00:00:00 0 2013-01-08 00:00:00 0 2013-01-08 00:00:00 1625 2013-01-08 00:00:00 210 2013-01-08 00:00:00 0 2013-01-08 00:00:00 209 2013-01-08 00:00:00 0 ... Expected results: Either no rows returned, or all rows have 1052436 in second column.
[jira] [Updated] (HIVE-4688) NPE in writing null values.
[ https://issues.apache.org/jira/browse/HIVE-4688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4688: --- Attachment: HIVE-4688.2.patch NPE in writing null values. --- Key: HIVE-4688 URL: https://issues.apache.org/jira/browse/HIVE-4688 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4688.1.patch, HIVE-4688.2.patch VectorExpressionWriter throws NPE when writing null values.
[jira] [Updated] (HIVE-4688) NPE in writing null values.
[ https://issues.apache.org/jira/browse/HIVE-4688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4688: --- Attachment: HIVE-4688.3.patch NPE in writing null values. --- Key: HIVE-4688 URL: https://issues.apache.org/jira/browse/HIVE-4688 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4688.1.patch, HIVE-4688.2.patch, HIVE-4688.3.patch VectorExpressionWriter throws NPE when writing null values.
[jira] [Assigned] (HIVE-4695) Unit test failure in TestColumnColumnOperationVectorExpressionEvaluation
[ https://issues.apache.org/jira/browse/HIVE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HIVE-4695: -- Assignee: Eric Hanson (was: Jitendra Nath Pandey) Unit test failure in TestColumnColumnOperationVectorExpressionEvaluation Key: HIVE-4695 URL: https://issues.apache.org/jira/browse/HIVE-4695 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson failure message=Output column vector repeating state does not match operand columns expected:<true> but was:<false> type=junit.framework.AssertionFailedError junit.framework.AssertionFailedError: Output column vector repeating state does not match operand columns expected:<true> but was:<false> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.TestColumnColumnOperationVectorExpressionEvaluation.testDoubleColModuloDoubleColumnOutNullsRepeatsC1NullsRepeats(TestColumnColumnOperationVectorExpressionEvaluation.java:5396) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at
org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4695) Unit test failure in TestColumnColumnOperationVectorExpressionEvaluation
Jitendra Nath Pandey created HIVE-4695: -- Summary: Unit test failure in TestColumnColumnOperationVectorExpressionEvaluation Key: HIVE-4695 URL: https://issues.apache.org/jira/browse/HIVE-4695 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey failure message=Output column vector repeating state does not match operand columns expected:<true> but was:<false> type=junit.framework.AssertionFailedError junit.framework.AssertionFailedError: Output column vector repeating state does not match operand columns expected:<true> but was:<false> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.TestColumnColumnOperationVectorExpressionEvaluation.testDoubleColModuloDoubleColumnOutNullsRepeatsC1NullsRepeats(TestColumnColumnOperationVectorExpressionEvaluation.java:5396) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4694) Fix ORC TestVectorizedORCReader testcase for Timestamps
[ https://issues.apache.org/jira/browse/HIVE-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4694: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-4160 Fix ORC TestVectorizedORCReader testcase for Timestamps --- Key: HIVE-4694 URL: https://issues.apache.org/jira/browse/HIVE-4694 Project: Hive Issue Type: Sub-task Components: Tests Affects Versions: vectorization-branch Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: vectorization-branch Attachments: HIVE-4694.patch ORC vectorized tests were not testing timestamps correctly. java.sql.Timestamp is a confusing API because getTime() and getNanos() are easy to mix up. Although they look like they return independent values, getTime() already includes the millisecond part of the value held in getNanos(). The implementation shows the overlap: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/sql/Timestamp.java#Timestamp.getTime%28%29 The fix in HIVE-4681 caused test failures, so the test needs to be fixed.
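The overlap between getTime() and getNanos() is easy to demonstrate with the JDK alone. The sketch below is illustrative and is not the test code from the patch; TimestampParts, naiveEpochNanos and toEpochNanos are hypothetical names. It assumes Java 8 or later for Math.floorDiv.

```java
import java.sql.Timestamp;

class TimestampParts {
    /** Incorrect: counts the millisecond part of the fractional second twice. */
    static long naiveEpochNanos(Timestamp ts) {
        return ts.getTime() * 1_000_000L + ts.getNanos();
    }

    /** Nanoseconds since the epoch, counting the fractional second only once. */
    static long toEpochNanos(Timestamp ts) {
        // getTime() is in milliseconds and already contains the millisecond part of
        // the fractional second, so reduce it to whole seconds before adding getNanos().
        long wholeSeconds = Math.floorDiv(ts.getTime(), 1000L);
        return wholeSeconds * 1_000_000_000L + ts.getNanos();
    }

    public static void main(String[] args) {
        Timestamp ts = new Timestamp(1_000L); // exactly one second after the epoch
        ts.setNanos(123_456_789);             // fractional second of 0.123456789 s
        System.out.println("getTime()       = " + ts.getTime());
        System.out.println("getNanos()      = " + ts.getNanos());
        System.out.println("naive combine   = " + naiveEpochNanos(ts));
        System.out.println("correct combine = " + toEpochNanos(ts));
    }
}
```

Here getTime() reports 1123 rather than 1000, because the 123 ms held in the nanos field are folded into it. A test that compares timestamps by combining the two getters therefore has to strip the sub-second part from getTime() first.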
[jira] [Assigned] (HIVE-4702) Unit test failure TestVectorSelectOperator
[ https://issues.apache.org/jira/browse/HIVE-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HIVE-4702: -- Assignee: Jitendra Nath Pandey Unit test failure TestVectorSelectOperator -- Key: HIVE-4702 URL: https://issues.apache.org/jira/browse/HIVE-4702 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Jitendra Nath Pandey TestCase TestVectorSelectOperator Name Status Type Time(s) testSelectOperator Error N/A java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator$ValidatorVectorSelectOperator.forward(TestVectorSelectOperator.java:52) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:124) at org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator.testSelectOperator(TestVectorSelectOperator.java:87) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4702) Unit test failure TestVectorSelectOperator
[ https://issues.apache.org/jira/browse/HIVE-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4702: --- Attachment: HIVE-4702.1.patch Unit test failure TestVectorSelectOperator -- Key: HIVE-4702 URL: https://issues.apache.org/jira/browse/HIVE-4702 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Attachments: HIVE-4702.1.patch TestCase TestVectorSelectOperator Name Status Type Time(s) testSelectOperator Error N/A java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator$ValidatorVectorSelectOperator.forward(TestVectorSelectOperator.java:52) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:124) at org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator.testSelectOperator(TestVectorSelectOperator.java:87) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4702) Unit test failure TestVectorSelectOperator
[ https://issues.apache.org/jira/browse/HIVE-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4702: --- Status: Patch Available (was: Open) Unit test failure TestVectorSelectOperator -- Key: HIVE-4702 URL: https://issues.apache.org/jira/browse/HIVE-4702 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Attachments: HIVE-4702.1.patch TestCase TestVectorSelectOperator Name Status Type Time(s) testSelectOperator Error N/A java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator$ValidatorVectorSelectOperator.forward(TestVectorSelectOperator.java:52) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:124) at org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator.testSelectOperator(TestVectorSelectOperator.java:87) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-4596) Fix serialization exceptions in VectorGroupByOperator
[ https://issues.apache.org/jira/browse/HIVE-4596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey resolved HIVE-4596. Resolution: Fixed Release Note: This was fixed with HIVE-4599. The exception occurred because the non-vector operators were not being cloned properly, which corrupted the original tree. Fix serialization exceptions in VectorGroupByOperator - Key: HIVE-4596 URL: https://issues.apache.org/jira/browse/HIVE-4596 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Minor Going down the vectorization path java.lang.InstantiationException: org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... java.lang.InstantiationException: org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator Continuing ... java.lang.Exception: XMLEncoder: discarding statement ArrayList.add(VectorGroupByOperator); Continuing ...
[jira] [Updated] (HIVE-4596) Fix serialization exceptions in VectorGroupByOperator
[ https://issues.apache.org/jira/browse/HIVE-4596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4596: --- Release Note: (was: This was fixed with HIVE-4599. The exception occurred because the non-vector operators were not being cloned properly, which corrupted the original tree.) Fix serialization exceptions in VectorGroupByOperator - Key: HIVE-4596 URL: https://issues.apache.org/jira/browse/HIVE-4596 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Minor Going down the vectorization path java.lang.InstantiationException: org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... java.lang.InstantiationException: org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator Continuing ... java.lang.Exception: XMLEncoder: discarding statement ArrayList.add(VectorGroupByOperator); Continuing ...
[jira] [Commented] (HIVE-4596) Fix serialization exceptions in VectorGroupByOperator
[ https://issues.apache.org/jira/browse/HIVE-4596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680226#comment-13680226 ] Jitendra Nath Pandey commented on HIVE-4596: This was fixed with HIVE-4599. The exception occurred because the non-vector operators were not being cloned properly, which corrupted the original tree. Fix serialization exceptions in VectorGroupByOperator - Key: HIVE-4596 URL: https://issues.apache.org/jira/browse/HIVE-4596 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Minor Going down the vectorization path java.lang.InstantiationException: org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... java.lang.InstantiationException: org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator Continuing ... java.lang.Exception: XMLEncoder: discarding statement ArrayList.add(VectorGroupByOperator); Continuing ...
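The comment on HIVE-4596 attributes the failure to operators not being cloned properly while the plan was rewritten. As a rough illustration of that general failure mode only, here is a minimal Java sketch. It is not Hive code: `Op`, `shallowCopy`, and `deepCopy` are hypothetical names invented for this example, and the actual fix is whatever went in under HIVE-4599. The sketch shows that a shallow copy shares child nodes with the original tree, so rewriting the copy also changes the original, while a deep copy leaves the original intact.

```java
import java.util.ArrayList;
import java.util.List;

public class OperatorCloneSketch {
    /** Hypothetical stand-in for a plan operator; not a Hive class. */
    static final class Op {
        String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) {
            this.name = name;
        }

        /** Copies only this node; the child nodes are shared with the original. */
        Op shallowCopy() {
            Op copy = new Op(name);
            copy.children.addAll(children);
            return copy;
        }

        /** Recursively copies the whole subtree, so nothing is shared. */
        Op deepCopy() {
            Op copy = new Op(name);
            for (Op child : children) {
                copy.children.add(child.deepCopy());
            }
            return copy;
        }
    }

    /** Builds a tiny two-node tree: TableScan -> Select. */
    static Op sampleTree() {
        Op root = new Op("TableScan");
        root.children.add(new Op("Select"));
        return root;
    }

    /** Rewrites a shallow copy and returns the ORIGINAL tree's child name. */
    public static String originalChildAfterShallowRewrite() {
        Op original = sampleTree();
        Op copy = original.shallowCopy();
        copy.children.get(0).name = "VectorSelect"; // mutates the shared child
        return original.children.get(0).name;
    }

    /** Rewrites a deep copy and returns the ORIGINAL tree's child name. */
    public static String originalChildAfterDeepRewrite() {
        Op original = sampleTree();
        Op copy = original.deepCopy();
        copy.children.get(0).name = "VectorSelect"; // touches only the copy
        return original.children.get(0).name;
    }

    public static void main(String[] args) {
        System.out.println(originalChildAfterShallowRewrite()); // VectorSelect
        System.out.println(originalChildAfterDeepRewrite());    // Select
    }
}
```

In the shallow case the original tree ends up holding the rewritten node, which is the kind of corruption the comment describes in general terms.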
[jira] [Assigned] (HIVE-4714) Vectorized Sum of scalar subtract column returns negative result when positive expected
[ https://issues.apache.org/jira/browse/HIVE-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HIVE-4714: -- Assignee: Jitendra Nath Pandey Vectorized Sum of scalar subtract column returns negative result when positive expected -- Key: HIVE-4714 URL: https://issues.apache.org/jira/browse/HIVE-4714 Project: Hive Issue Type: Sub-task Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Attachments: sum_data.zip Actual: -5701157.669591231 Expected: 5701157.663489044 {noformat} drop table LINEITEM_ORC; create external table LINEITEM_ORC(L_DISCOUNT float ) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.CommonOrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'; {noformat} {noformat} SELECT Sum(1 - l_discount) FROM Lineitem_orc {noformat}
[jira] [Updated] (HIVE-4714) Vectorized Sum of scalar subtract column returns negative result when positive expected
[ https://issues.apache.org/jira/browse/HIVE-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4714: --- Status: Patch Available (was: Open) Vectorized Sum of scalar subtract column returns negative result when positive expected -- Key: HIVE-4714 URL: https://issues.apache.org/jira/browse/HIVE-4714 Project: Hive Issue Type: Sub-task Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Attachments: HIVE-4714.1.patch, sum_data.zip Actual: -5701157.669591231 Expected: 5701157.663489044 {noformat} drop table LINEITEM_ORC; create external table LINEITEM_ORC(L_DISCOUNT float ) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.CommonOrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'; {noformat} {noformat} SELECT Sum(1 - l_discount) FROM Lineitem_orc {noformat}
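The actual and expected values reported in HIVE-4714 have nearly the same magnitude and opposite signs. The notifications here do not state the root cause, so the following is only an illustration of one hypothesis that would produce a sign flip of this kind: evaluating a non-commutative "scalar minus column" expression with its operands swapped. The class and method names below are hypothetical and are not taken from the Hive code base or from HIVE-4714.1.patch.

```java
public class ScalarSubtractSketch {
    /** Sums (scalar - col[i]) over the column, the intended semantics of Sum(1 - l_discount). */
    public static double sumScalarMinusColumn(double scalar, double[] col) {
        double sum = 0.0;
        for (double v : col) {
            sum += scalar - v;
        }
        return sum;
    }

    /** Sums (col[i] - scalar): the same operands, applied in the wrong order. */
    public static double sumColumnMinusScalar(double[] col, double scalar) {
        double sum = 0.0;
        for (double v : col) {
            sum += v - scalar;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up discounts, chosen to be exactly representable in binary floating point.
        double[] discounts = {0.25, 0.5, 0.0};
        System.out.println(sumScalarMinusColumn(1.0, discounts)); // 2.25
        System.out.println(sumColumnMinusScalar(discounts, 1.0)); // -2.25
    }
}
```

With swapped operands the result has the same magnitude and the opposite sign, which matches the overall shape of the report. This sketch does not account for the small difference in the low-order digits of the two reported values.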
[jira] [Created] (HIVE-4722) MIN on timestamp column gives incorrect result.
Jitendra Nath Pandey created HIVE-4722: -- Summary: MIN on timestamp column gives incorrect result. Key: HIVE-4722 URL: https://issues.apache.org/jira/browse/HIVE-4722 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Gopal V MIN on timestamp column gives incorrect result.
[jira] [Assigned] (HIVE-4718) array out of bounds exception near VectorHashKeyWrapper.getBytes() with 2 column GROUP BY
[ https://issues.apache.org/jira/browse/HIVE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HIVE-4718: -- Assignee: Remus Rusanu array out of bounds exception near VectorHashKeyWrapper.getBytes() with 2 column GROUP BY - Key: HIVE-4718 URL: https://issues.apache.org/jira/browse/HIVE-4718 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Remus Rusanu select ddate, disinternalmsft16431, count\(\*\) from factsqlengineam_vec_orc where (ddate = '2012-05-19 00:00:00' or ddate = '2012-05-20 00:00:00') and (disinternalmsft16431 = 0 or disinternalmsft16431 = 5) group by ddate, disinternalmsft16431; - Diagnostic Messages for this Task: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:229) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:271) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135) at org.apache.hadoop.mapred.Child.main(Child.java:265) Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapper.getBytes(VectorHashKeyWrapper.java:226) at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.getWritableKeyValue(VectorHashKeyWrapperBatch.java:528) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.flush(VectorGroupByOperator.java:293) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.closeOp(VectorGroupByOperator.java:423) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at 
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:196) ... 8 more FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
[jira] [Assigned] (HIVE-4716) Classcast exception with two group by keys of types string and tinyint.
[ https://issues.apache.org/jira/browse/HIVE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HIVE-4716: -- Assignee: Remus Rusanu Classcast exception with two group by keys of types string and tinyint. --- Key: HIVE-4716 URL: https://issues.apache.org/jira/browse/HIVE-4716 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Remus Rusanu Query: select t,sum(i),s from orcsmall where s aaa group by t, s; t : tinyint i : int s : string Exception: Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.map(VectorExecMapper.java:164) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:752) at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.map(VectorExecMapper.java:146) ... 
4 more Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.evaluateBatch(VectorHashKeyWrapperBatch.java:151) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.processOp(VectorGroupByOperator.java:145) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:120) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.processOp(VectorFilterOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:745) ... 5 more -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-4744) Unary Minus Expression Throwing java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HIVE-4744: -- Assignee: Jitendra Nath Pandey Unary Minus Expression Throwing java.lang.NullPointerException -- Key: HIVE-4744 URL: https://issues.apache.org/jira/browse/HIVE-4744 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch {noformat} SELECT L_QUANTITY, L_RETURNFLAG, (L_QUANTITY * -2), (L_QUANTITY % L_SUPPKEY), (-(L_TAX)) FROM lineitem_orc WHERE((L_QUANTITY L_TAX) OR (L_TAX L_ORDERKEY)) ORDER BY L_QUANTITY; {noformat} Executed over tcpch lineitem generated at a scale factor of 1gb {noformat} 13/06/15 03:27:21 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore. Logging initialized using configuration in file:/C:/Hadoop/hive-0.9.0/conf/hive-log4j.properties Hive history file=c:\hadoop\hive-0.9.0\logs\history/hive_job_log_jenkinsuser_4280@SLAVE23-WIN_201306150327_1960387810.txt Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getUnaryMinusExpression(VectorizationContext.java:327) at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:440) at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:397) at 
org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:248) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:73) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407) at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.initializeOp(VectorFilterOperator.java:76) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:187) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.ExecDriver.validateVectorOperator(ExecDriver.java:580) at org.apache.hadoop.hive.ql.exec.ExecDriver.validateVectorPath(ExecDriver.java:568) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:287) at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:145) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1355) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1139) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446) at 
org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:712) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597)
[jira] [Assigned] (HIVE-4745) java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoo
[ https://issues.apache.org/jira/browse/HIVE-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HIVE-4745: -- Assignee: Remus Rusanu java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable - Key: HIVE-4745 URL: https://issues.apache.org/jira/browse/HIVE-4745 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Remus Rusanu Fix For: vectorization-branch {noformat} SELECT SUM(L_QUANTITY), (SUM(L_QUANTITY) + -1.3000E+000), (-2.2002E+000 % (SUM(L_QUANTITY) + -1.3000E+000)), MIN(L_EXTENDEDPRICE) FROM lineitem_orc WHERE ((L_EXTENDEDPRICE = L_LINENUMBER) OR (L_TAX L_EXTENDEDPRICE)); {noformat} executed over tpch line item with scale factor 1gb {noformat} 13/06/15 11:19:17 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore. 
Logging initialized using configuration in file:/C:/Hadoop/hive-0.9.0/conf/hive-log4j.properties Hive history file=c:\hadoop\hive-0.9.0\logs\history/hive_job_log_jenkinsuser_5292@SLAVE23-WIN_201306151119_1652846565.txt Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Starting Job = job_201306142329_0098, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201306142329_0098 Kill Command = c:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job -kill job_201306142329_0098 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2013-06-15 11:19:47,490 Stage-1 map = 0%, reduce = 0% 2013-06-15 11:20:29,801 Stage-1 map = 76%, reduce = 0% 2013-06-15 11:20:32,849 Stage-1 map = 0%, reduce = 0% 2013-06-15 11:20:35,880 Stage-1 map = 100%, reduce = 100% Ended Job = job_201306142329_0098 with errors Error during job, obtaining debugging information... 
Job Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201306142329_0098 Examining task ID: task_201306142329_0098_m_02 (and more) from job job_201306142329_0098 Task with the most failures(4): - Task ID: task_201306142329_0098_m_00 URL: http://localhost:50030/taskdetails.jsp?jobid=job_201306142329_0098tipid=task_201306142329_0098_m_00 - Diagnostic Messages for this Task: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:229) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:271) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135) at org.apache.hadoop.mapred.Child.main(Child.java:265) Caused by: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDoubleObjectInspector.get(WritableDoubleObjectInspector.java:35) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:340) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:257) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:204) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:245) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.flush(VectorGroupByOperator.java:281) at
[jira] [Created] (HIVE-4754) OrcInputFormat should be enhanced to provide vectorized input.
Jitendra Nath Pandey created HIVE-4754: -- Summary: OrcInputFormat should be enhanced to provide vectorized input. Key: HIVE-4754 URL: https://issues.apache.org/jira/browse/HIVE-4754 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey This change will make CommonOrcInputFormat redundant. OrcInputFormat will again become the default input format for ORC files. The reason for this change is to allow existing ORC files to work with the vectorized code path.
[jira] [Commented] (HIVE-4718) array out of bounds exception near VectorHashKeyWrapper.getBytes() with 2 column GROUP BY
[ https://issues.apache.org/jira/browse/HIVE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687477#comment-13687477 ] Jitendra Nath Pandey commented on HIVE-4718: I found another instance of this issue: Query : select s, i, max(b) from orctabwithnulls group by s, i; Table: CREATE TABLE orctabwithnulls ( t tinyint, si smallint, i int , b bigint , f float , d double , bo boolean , s string ) STORED AS ORC Exception: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:229) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapper.getBytes(VectorHashKeyWrapper.java:226) at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.getWritableKeyValue(VectorHashKeyWrapperBatch.java:528) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.flush(VectorGroupByOperator.java:293) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.closeOp(VectorGroupByOperator.java:423) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:196) ... 
8 more array out of bounds exception near VectorHashKeyWrapper.getBytes() with 2 column GROUP BY - Key: HIVE-4718 URL: https://issues.apache.org/jira/browse/HIVE-4718 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Remus Rusanu select ddate, disinternalmsft16431, count\(\*\) from factsqlengineam_vec_orc where (ddate = '2012-05-19 00:00:00' or ddate = '2012-05-20 00:00:00') and (disinternalmsft16431 = 0 or disinternalmsft16431 = 5) group by ddate, disinternalmsft16431; - Diagnostic Messages for this Task: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:229) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:271) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135) at org.apache.hadoop.mapred.Child.main(Child.java:265) Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapper.getBytes(VectorHashKeyWrapper.java:226) at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.getWritableKeyValue(VectorHashKeyWrapperBatch.java:528) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.flush(VectorGroupByOperator.java:293) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.closeOp(VectorGroupByOperator.java:423) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at 
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:196) ... 8 more FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
[jira] [Reopened] (HIVE-4718) array out of bounds exception near VectorHashKeyWrapper.getBytes() with 2 column GROUP BY
[ https://issues.apache.org/jira/browse/HIVE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reopened HIVE-4718: array out of bounds exception near VectorHashKeyWrapper.getBytes() with 2 column GROUP BY - Key: HIVE-4718 URL: https://issues.apache.org/jira/browse/HIVE-4718 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Remus Rusanu select ddate, disinternalmsft16431, count\(\*\) from factsqlengineam_vec_orc where (ddate = '2012-05-19 00:00:00' or ddate = '2012-05-20 00:00:00') and (disinternalmsft16431 = 0 or disinternalmsft16431 = 5) group by ddate, disinternalmsft16431; - Diagnostic Messages for this Task: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:229) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:271) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135) at org.apache.hadoop.mapred.Child.main(Child.java:265) Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapper.getBytes(VectorHashKeyWrapper.java:226) at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.getWritableKeyValue(VectorHashKeyWrapperBatch.java:528) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.flush(VectorGroupByOperator.java:293) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.closeOp(VectorGroupByOperator.java:423) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at 
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:196) ... 8 more FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
[jira] [Updated] (HIVE-4754) OrcInputFormat should be enhanced to provide vectorized input.
[ https://issues.apache.org/jira/browse/HIVE-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4754: --- Attachment: HIVE-4754.1.patch OrcInputFormat should be enhanced to provide vectorized input. -- Key: HIVE-4754 URL: https://issues.apache.org/jira/browse/HIVE-4754 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4754.1.patch This change will make CommonOrcInputFormat redundant. OrcInputFormat will again become the default input format for ORC files. The reason for this change is to allow existing ORC files to work with the vectorized code path.
[jira] [Updated] (HIVE-4744) Unary Minus Expression Throwing java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4744: --- Status: Patch Available (was: Open) Unary Minus Expression Throwing java.lang.NullPointerException -- Key: HIVE-4744 URL: https://issues.apache.org/jira/browse/HIVE-4744 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4744.1.patch {noformat} SELECT L_QUANTITY, L_RETURNFLAG, (L_QUANTITY * -2), (L_QUANTITY % L_SUPPKEY), (-(L_TAX)) FROM lineitem_orc WHERE((L_QUANTITY L_TAX) OR (L_TAX L_ORDERKEY)) ORDER BY L_QUANTITY; {noformat} Executed over tcpch lineitem generated at a scale factor of 1gb {noformat} 13/06/15 03:27:21 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore. 
Logging initialized using configuration in file:/C:/Hadoop/hive-0.9.0/conf/hive-log4j.properties
Hive history file=c:\hadoop\hive-0.9.0\logs\history/hive_job_log_jenkinsuser_4280@SLAVE23-WIN_201306150327_1960387810.txt
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=number
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=number
In order to set a constant number of reducers:
  set mapred.reduce.tasks=number
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getUnaryMinusExpression(VectorizationContext.java:327)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:440)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:397)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:248)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:73)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
    at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.initializeOp(VectorFilterOperator.java:76)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:187)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.validateVectorOperator(ExecDriver.java:580)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.validateVectorPath(ExecDriver.java:568)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:287)
    at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:145)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1355)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1139)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:712)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at
[jira] [Updated] (HIVE-4754) OrcInputFormat should be enhanced to provide vectorized input.
[ https://issues.apache.org/jira/browse/HIVE-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4754: --- Status: Patch Available (was: Open) OrcInputFormat should be enhanced to provide vectorized input. -- Key: HIVE-4754 URL: https://issues.apache.org/jira/browse/HIVE-4754 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4754.1.patch
[jira] [Updated] (HIVE-4744) Unary Minus Expression Throwing java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4744: --- Attachment: HIVE-4744.1.patch Unary Minus Expression Throwing java.lang.NullPointerException -- Key: HIVE-4744 URL: https://issues.apache.org/jira/browse/HIVE-4744 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4744.1.patch
[jira] [Updated] (HIVE-4758) NULLs and record separators broken with vectorization branch intermediate outputs
[ https://issues.apache.org/jira/browse/HIVE-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4758: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-4160 NULLs and record separators broken with vectorization branch intermediate outputs - Key: HIVE-4758 URL: https://issues.apache.org/jira/browse/HIVE-4758 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: vectorization-branch Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-4758-001.patch Queries on timestamp columns of partitioned tables return NULL for every row of the column if the first row in that column is NULL. This was tracked down to the timestamp columns failing to parse the map output properly, because the vectorized code's output format differs from the unvectorized code's. The output file for vectorized code says {code} (null)^A 2013-02-12 21:05:29^A {code} whereas the unvectorized code outputs {code} \N 2013-02-12 21:05:29 {code} The vectorized code passes the (null) string on to the LazyTimestamp parser, which fails to parse it and returns NULL, but only after being slowed down massively by the IllegalArgumentException. In addition, the extraneous ^A prevents the actual timestamp values from being parsed into valid timestamps.
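The cost difference described in HIVE-4758 between the two NULL encodings can be illustrated with a small standalone sketch. This is not Hive's LazyTimestamp code: the class `NullMarkerSketch` and the method `parseField` are hypothetical names, and only the JDK's `java.sql.Timestamp.valueOf` is used. The assumption is that a reader which recognizes the `\N` marker can skip parsing entirely, while a `(null)` string falls through to the parser and is rejected by exception.

```java
import java.sql.Timestamp;

public class NullMarkerSketch {
    // NULL marker emitted by the unvectorized path, per the issue description.
    static final String NULL_MARKER = "\\N";

    /**
     * Hypothetical field parser (not Hive's implementation).
     * Returns null for the NULL marker or for text that is not a timestamp.
     */
    static Timestamp parseField(String field) {
        if (NULL_MARKER.equals(field)) {
            return null; // cheap path: recognized marker, no exception raised
        }
        try {
            return Timestamp.valueOf(field);
        } catch (IllegalArgumentException e) {
            return null; // costly path: one exception per unparseable row
        }
    }

    public static void main(String[] args) {
        System.out.println(parseField("\\N"));                 // marker check
        System.out.println(parseField("(null)"));              // exception path
        System.out.println(parseField("2013-02-12 21:05:29")); // normal parse
    }
}
```

Both NULL spellings end up as a null value here, but only the `(null)` spelling pays for constructing and catching an exception on every affected row, which is consistent with the slowdown the reporter describes.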
[jira] [Updated] (HIVE-4744) Unary Minus Expression Throwing java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4744: --- Attachment: HIVE-4744.2.patch Updated patch with unit tests. Unary Minus Expression Throwing java.lang.NullPointerException -- Key: HIVE-4744 URL: https://issues.apache.org/jira/browse/HIVE-4744 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4744.1.patch, HIVE-4744.2.patch
[jira] [Updated] (HIVE-4744) Unary Minus Expression Throwing java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4744: --- Attachment: HIVE-4744.3.patch Unary Minus Expression Throwing java.lang.NullPointerException -- Key: HIVE-4744 URL: https://issues.apache.org/jira/browse/HIVE-4744 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4744.1.patch, HIVE-4744.2.patch, HIVE-4744.3.patch
[jira] [Commented] (HIVE-4744) Unary Minus Expression Throwing java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688550#comment-13688550 ] Jitendra Nath Pandey commented on HIVE-4744: Updated the patch to remove commented-out code. Unary Minus Expression Throwing java.lang.NullPointerException -- Key: HIVE-4744 URL: https://issues.apache.org/jira/browse/HIVE-4744 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4744.1.patch, HIVE-4744.2.patch, HIVE-4744.3.patch
[jira] [Updated] (HIVE-4754) OrcInputFormat should be enhanced to provide vectorized input.
[ https://issues.apache.org/jira/browse/HIVE-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4754: --- Attachment: HIVE-4754.2.patch OrcInputFormat should be enhanced to provide vectorized input. -- Key: HIVE-4754 URL: https://issues.apache.org/jira/browse/HIVE-4754 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4754.1.patch, HIVE-4754.2.patch
[jira] [Commented] (HIVE-4754) OrcInputFormat should be enhanced to provide vectorized input.
[ https://issues.apache.org/jira/browse/HIVE-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688574#comment-13688574 ] Jitendra Nath Pandey commented on HIVE-4754: Removed CommonOrcInputFormat in the latest patch. OrcInputFormat should be enhanced to provide vectorized input. -- Key: HIVE-4754 URL: https://issues.apache.org/jira/browse/HIVE-4754 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4754.1.patch, HIVE-4754.2.patch
[jira] [Updated] (HIVE-4745) java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop
[ https://issues.apache.org/jira/browse/HIVE-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4745: --- Attachment: HIVE-4745.2.patch java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable - Key: HIVE-4745 URL: https://issues.apache.org/jira/browse/HIVE-4745 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4745.2.patch {noformat} SELECT SUM(L_QUANTITY), (SUM(L_QUANTITY) + -1.3000E+000), (-2.2002E+000 % (SUM(L_QUANTITY) + -1.3000E+000)), MIN(L_EXTENDEDPRICE) FROM lineitem_orc WHERE ((L_EXTENDEDPRICE = L_LINENUMBER) OR (L_TAX L_EXTENDEDPRICE)); {noformat} executed over tpch line item with scale factor 1gb {noformat} 13/06/15 11:19:17 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore. 
Logging initialized using configuration in file:/C:/Hadoop/hive-0.9.0/conf/hive-log4j.properties
Hive history file=c:\hadoop\hive-0.9.0\logs\history/hive_job_log_jenkinsuser_5292@SLAVE23-WIN_201306151119_1652846565.txt
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=number
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=number
In order to set a constant number of reducers:
  set mapred.reduce.tasks=number
Starting Job = job_201306142329_0098, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201306142329_0098
Kill Command = c:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job -kill job_201306142329_0098
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-06-15 11:19:47,490 Stage-1 map = 0%, reduce = 0%
2013-06-15 11:20:29,801 Stage-1 map = 76%, reduce = 0%
2013-06-15 11:20:32,849 Stage-1 map = 0%, reduce = 0%
2013-06-15 11:20:35,880 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201306142329_0098 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201306142329_0098
Examining task ID: task_201306142329_0098_m_02 (and more) from job job_201306142329_0098
Task with the most failures(4):
-
Task ID: task_201306142329_0098_m_00
URL: http://localhost:50030/taskdetails.jsp?jobid=job_201306142329_0098tipid=task_201306142329_0098_m_00
-
Diagnostic Messages for this Task:
java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:229)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
    at org.apache.hadoop.mapred.Child.main(Child.java:265)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDoubleObjectInspector.get(WritableDoubleObjectInspector.java:35)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:340)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:257)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:204)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:245)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
    at
[jira] [Updated] (HIVE-4745) java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop
[ https://issues.apache.org/jira/browse/HIVE-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4745: --- Attachment: (was: HIVE-4754.2.patch) java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable - Key: HIVE-4745 URL: https://issues.apache.org/jira/browse/HIVE-4745 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4745.2.patch
[jira] [Assigned] (HIVE-4745) java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoo
[ https://issues.apache.org/jira/browse/HIVE-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HIVE-4745: -- Assignee: Jitendra Nath Pandey (was: Remus Rusanu) java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable - Key: HIVE-4745 URL: https://issues.apache.org/jira/browse/HIVE-4745 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4745.2.patch
[jira] [Updated] (HIVE-4745) java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop
[ https://issues.apache.org/jira/browse/HIVE-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4745: --- Attachment: HIVE-4754.2.patch java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable - Key: HIVE-4745 URL: https://issues.apache.org/jira/browse/HIVE-4745 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4745.2.patch
[jira] [Commented] (HIVE-4745) java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hado
[ https://issues.apache.org/jira/browse/HIVE-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688720#comment-13688720 ] Jitendra Nath Pandey commented on HIVE-4745: This patch effectively reverts the HIVE-4688 change. The NPE is fixed in VectorizedRowBatch by HIVE-4758. java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable - Key: HIVE-4745 URL: https://issues.apache.org/jira/browse/HIVE-4745 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4745.2.patch
[jira] [Commented] (HIVE-4770) java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
[ https://issues.apache.org/jira/browse/HIVE-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690562#comment-13690562 ] Jitendra Nath Pandey commented on HIVE-4770: From the exception trace, it seems that the query didn't go down the vectorization code path. I think that is because the LIKE expression support is still not committed. Does the query fail when vectorization is disabled too? If not, then we have a bug in validation for vectorization. java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row -- Key: HIVE-4770 URL: https://issues.apache.org/jira/browse/HIVE-4770 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Fix For: vectorization-branch Attachments: output.txt, tableAndData.zip Table and data attached. {noformat} SELECT cfloat, csmallint, cint, ctimestamp, (cfloat + 10), STDDEV_SAMP(cfloat), (-((cfloat + 10))), (cint / cfloat), MAX(cint), (-(cint)), (cint * STDDEV_SAMP(cfloat)), STDDEV_SAMP(cint), VAR_SAMP(cint), (-(MAX(cint))), ((-(MAX(cint))) / 0.E+000) FROM alltypes_orc WHERE(((1 = cfloat) OR (cstring2 LIKE '%b')) OR ((cint = csmallint) OR (cstring2 LIKE '%ss'))) GROUP BY cfloat, csmallint, cint, ctimestamp ORDER BY cint, cfloat; {noformat} {noformat} java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {ctinyint:null,csmallint:-3806,cint:-66533315,cbigint:null,cdouble:null,cfloat:152.95706,cstring1:null,cstring2:null,ctimestamp:9131-01-01 16:52:03.53,cboolean:null} at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:271) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135) at org.apache.hadoop.mapred.Child.main(Child.java:265) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {ctinyint:null,csmallint:-3806,cint:-66533315,cbigint:null,cdouble:null,cfloat:152.95706,cstring1:null,cstring2:null,ctimestamp:9131-01-01 16:52:03.53,cboolean:null} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:796) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:136) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:652) ... 
9 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.GroupByOperator.shouldBeFlushed(GroupByOperator.java:941) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:836) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:723) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:791) ... 21 more {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your
[jira] [Commented] (HIVE-4684) Query with filter constant on left of = and column expression on right does not vectorize
[ https://issues.apache.org/jira/browse/HIVE-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697207#comment-13697207 ] Jitendra Nath Pandey commented on HIVE-4684: All the expressions generated in getVectorBinaryComparisonFilterExpression are filter expressions. We don't need to check for the opType in this method. The boolean expressions outside the WHERE clause, e.g. in projections, are not being handled right now. That should be addressed separately in a different jira. Query with filter constant on left of = and column expression on right does not vectorize --- Key: HIVE-4684 URL: https://issues.apache.org/jira/browse/HIVE-4684 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Eric Hanson Assignee: Sarvesh Sakalanaga Attachments: Hive-4684.0.patch select dmachineid from factsqlengineam_vec_orc where 1073 = dmachineid + 1; Does not go down the vectorization path. Output: hive> select dmachineid from factsqlengineam_vec_orc where 1073 = dmachineid + 1; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Validating if vectorized execution is applicable Cannot vectorize the plan: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterLongScalarEqualLongColumn Starting Job = job_201306061504_0038, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201306061504_0038 Kill Command = c:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job -kill job_201306061504_0038 Hadoop job information for Stage-1: number of mappers: 8; number of reducers: 0 2013-06-07 10:25:30,932 Stage-1 map = 0%, reduce = 0% 2013-06-07 10:25:39,953 Stage-1 map = 25%, reduce = 0% 2013-06-07 10:25:42,959 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 8.172 sec 2013-06-07 10:25:43,962 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 8.172 sec ... 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
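For readers unfamiliar with the distinction drawn in the HIVE-4684 comment, the following is a rough sketch of what a "scalar = column" filter expression might do over a batch. It assumes that a filter narrows the batch in place by rewriting a selection array, whereas a boolean expression in a projection would have to write its result to an output column. The names BatchSketch and ScalarEqualColumnFilterSketch are hypothetical and are not the Hive classes named in the report.

```java
// Hypothetical stand-in for a row batch holding a single long column.
// It is NOT Hive's batch class; field names are assumptions for illustration.
final class BatchSketch {
    final long[] column;     // one value per row
    final int[] selected;    // indices of rows that are still "alive"
    boolean selectedInUse;   // false means rows 0..size-1 are all alive
    int size;                // number of alive rows

    BatchSketch(long[] column) {
        this.column = column;
        this.selected = new int[column.length];
        this.size = column.length;
    }
}

// Sketch of a filter for "scalar = column", e.g. 1073 = (dmachineid + 1)
// after the arithmetic has already been evaluated into `column`.
final class ScalarEqualColumnFilterSketch {
    static void filter(long scalar, BatchSketch batch) {
        int kept = 0;
        if (batch.selectedInUse) {
            // Only revisit rows that earlier filters left alive.
            for (int j = 0; j < batch.size; j++) {
                int row = batch.selected[j];
                if (scalar == batch.column[row]) {
                    batch.selected[kept++] = row;
                }
            }
        } else {
            for (int row = 0; row < batch.size; row++) {
                if (scalar == batch.column[row]) {
                    batch.selected[kept++] = row;
                }
            }
            batch.selectedInUse = true;
        }
        // No output column is produced; the batch itself shrinks.
        batch.size = kept;
    }

    public static void main(String[] args) {
        BatchSketch batch = new BatchSketch(new long[] {1073, 6, 1073, 8});
        filter(1073, batch);
        System.out.println(batch.size + " rows kept");
    }
}
```

Because equality is symmetric, a filter with the constant on the left selects the same rows as one with the constant on the right; whether the actual patch relies on that is not stated in the thread.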
[jira] [Commented] (HIVE-4160) Vectorized Query Execution in Hive
[ https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706089#comment-13706089 ] Jitendra Nath Pandey commented on HIVE-4160: Dmitry, Vinod: There is a significant amount of vectorization work in expression evaluation, for example arithmetic expressions, logical expressions, and aggregations. Many of these expressions are pretty generic, and different systems are likely to have similar semantics for them. It should be possible to re-use this code with little change in pig or other systems. Re-using these expressions requires the processing engine to use the same vectorized representation of data, but that part of the code is also generic and re-usable. I think that could be a good starting point. However, a bunch of the vectorization work is in operator code, where we have vectorized versions of the hive operators. These operators are closely tied to hive semantics and implementation. Therefore, some restructuring of the hive code base will also be needed to generalize these operators for re-use in other projects. Also, at this point we should be thinking more generally about a common physical layer shared between pig and hive. These languages can continue to have different logical plans, but it would be desirable for them to share a common physical plan structure because they both use the same map-reduce runtime. 
Vectorized Query Execution in Hive -- Key: HIVE-4160 URL: https://issues.apache.org/jira/browse/HIVE-4160 Project: Hive Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Hive-Vectorized-Query-Execution-Design.docx, Hive-Vectorized-Query-Execution-Design-rev2.docx, Hive-Vectorized-Query-Execution-Design-rev3.docx, Hive-Vectorized-Query-Execution-Design-rev3.docx, Hive-Vectorized-Query-Execution-Design-rev3.pdf, Hive-Vectorized-Query-Execution-Design-rev4.docx, Hive-Vectorized-Query-Execution-Design-rev4.pdf, Hive-Vectorized-Query-Execution-Design-rev5.docx, Hive-Vectorized-Query-Execution-Design-rev5.pdf, Hive-Vectorized-Query-Execution-Design-rev6.docx, Hive-Vectorized-Query-Execution-Design-rev6.pdf, Hive-Vectorized-Query-Execution-Design-rev7.docx, Hive-Vectorized-Query-Execution-Design-rev8.docx, Hive-Vectorized-Query-Execution-Design-rev8.pdf, Hive-Vectorized-Query-Execution-Design-rev9.docx, Hive-Vectorized-Query-Execution-Design-rev9.pdf The Hive query execution engine currently processes one row at a time. A single row of data goes through all the operators before the next row can be processed. This mode of processing is very inefficient in terms of CPU usage. Research has demonstrated that this yields very low instructions per cycle [MonetDB X100]. Also currently Hive heavily relies on lazy deserialization and data columns go through a layer of object inspectors that identify column type, deserialize data and determine appropriate expression routines in the inner loop. These layers of virtual method calls further slow down the processing. This work will add support for vectorized query execution to Hive, where, instead of individual rows, batches of about a thousand rows at a time are processed. Each column in the batch is represented as a vector of a primitive data type. The inner loop of execution scans these vectors very fast, avoiding method calls, deserialization, unnecessary if-then-else, etc. 
This substantially reduces CPU time used, and gives excellent instructions per cycle (i.e. improved processor pipeline utilization). See the attached design specification for more details.
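As a rough illustration of the batch-at-a-time idea in the HIVE-4160 description, here is a minimal sketch of adding two long columns over one batch. The class names LongColumnSketch and VectorAddSketch and the batch size of 1024 are assumptions chosen for illustration; they are not taken from the design document or the Hive code base.

```java
// Hypothetical column holder: one primitive array per column.
final class LongColumnSketch {
    final long[] values;

    LongColumnSketch(int capacity) {
        this.values = new long[capacity];
    }
}

final class VectorAddSketch {
    // "About a thousand rows" per batch; the exact figure is an assumption.
    static final int BATCH_SIZE = 1024;

    // Adds two columns element-wise for the first `size` rows of a batch.
    // The loop body touches only primitive arrays: no per-row method calls,
    // no object inspection, no deserialization.
    static void addColumns(LongColumnSketch left, LongColumnSketch right,
                           LongColumnSketch out, int size) {
        long[] a = left.values;
        long[] b = right.values;
        long[] r = out.values;
        for (int i = 0; i < size; i++) {
            r[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        LongColumnSketch quantity = new LongColumnSketch(BATCH_SIZE);
        LongColumnSketch bonus = new LongColumnSketch(BATCH_SIZE);
        LongColumnSketch total = new LongColumnSketch(BATCH_SIZE);
        for (int i = 0; i < BATCH_SIZE; i++) {
            quantity.values[i] = i;
            bonus.values[i] = 10;
        }
        addColumns(quantity, bonus, total, BATCH_SIZE);
        System.out.println("total[5] = " + total.values[5]);
    }
}
```

A row-at-a-time engine would instead make one expression-evaluation call per row and per column, which is the overhead the description attributes to the current execution model.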
[jira] [Created] (HIVE-4859) String column comparison classes should be renamed.
Jitendra Nath Pandey created HIVE-4859: -- Summary: String column comparison classes should be renamed. Key: HIVE-4859 URL: https://issues.apache.org/jira/browse/HIVE-4859 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey FilterStringColEqualStringCol should be renamed to FilterStringColEqualStringColumn. Similarly, all string comparison classes should be renamed.
[jira] [Updated] (HIVE-4859) String column comparison classes should be renamed.
[ https://issues.apache.org/jira/browse/HIVE-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4859: --- Attachment: HIVE-4859.1.patch String column comparison classes should be renamed. --- Key: HIVE-4859 URL: https://issues.apache.org/jira/browse/HIVE-4859 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4859.1.patch
[jira] [Updated] (HIVE-4859) String column comparison classes should be renamed.
[ https://issues.apache.org/jira/browse/HIVE-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4859: --- Status: Patch Available (was: Open) String column comparison classes should be renamed. --- Key: HIVE-4859 URL: https://issues.apache.org/jira/browse/HIVE-4859 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4859.1.patch
[jira] [Commented] (HIVE-4859) String column comparison classes should be renamed.
[ https://issues.apache.org/jira/browse/HIVE-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708996#comment-13708996 ] Jitendra Nath Pandey commented on HIVE-4859: Patch uploaded. https://reviews.apache.org/r/12560/ String column comparison classes should be renamed. --- Key: HIVE-4859 URL: https://issues.apache.org/jira/browse/HIVE-4859 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4859.1.patch
[jira] [Updated] (HIVE-4684) Query with filter constant on left of = and column expression on right does not vectorize
[ https://issues.apache.org/jira/browse/HIVE-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4684: --- Attachment: HIVE-4684.1.patch Query with filter constant on left of = and column expression on right does not vectorize --- Key: HIVE-4684 URL: https://issues.apache.org/jira/browse/HIVE-4684 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Eric Hanson Assignee: Sarvesh Sakalanaga Attachments: Hive-4684.0.patch, Hive-4684.1.patch, HIVE-4684.1.patch
[jira] [Assigned] (HIVE-4684) Query with filter constant on left of = and column expression on right does not vectorize
[ https://issues.apache.org/jira/browse/HIVE-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HIVE-4684: -- Assignee: Jitendra Nath Pandey (was: Sarvesh Sakalanaga) Query with filter constant on left of = and column expression on right does not vectorize --- Key: HIVE-4684 URL: https://issues.apache.org/jira/browse/HIVE-4684 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Attachments: Hive-4684.0.patch, Hive-4684.1.patch, HIVE-4684.1.patch, HIVE-4684.2.patch
[jira] [Updated] (HIVE-4684) Query with filter constant on left of = and column expression on right does not vectorize
[ https://issues.apache.org/jira/browse/HIVE-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4684: --- Attachment: HIVE-4684.2.patch Query with filter constant on left of = and column expression on right does not vectorize --- Key: HIVE-4684 URL: https://issues.apache.org/jira/browse/HIVE-4684 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Attachments: Hive-4684.0.patch, Hive-4684.1.patch, HIVE-4684.1.patch, HIVE-4684.2.patch select dmachineid from factsqlengineam_vec_orc where 1073 = dmachineid + 1; Does not go down the vectorization path. Output: hive> select dmachineid from factsqlengineam_vec_orc where 1073 = dmachineid + 1; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Validating if vectorized execution is applicable Cannot vectorize the plan: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterLongScalarEqualLongColumn Starting Job = job_201306061504_0038, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201306061504_0038 Kill Command = c:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job -kill job_201306061504_0038 Hadoop job information for Stage-1: number of mappers: 8; number of reducers: 0 2013-06-07 10:25:30,932 Stage-1 map = 0%, reduce = 0% 2013-06-07 10:25:39,953 Stage-1 map = 25%, reduce = 0% 2013-06-07 10:25:42,959 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 8.172 sec 2013-06-07 10:25:43,962 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 8.172 sec ...
[jira] [Commented] (HIVE-4684) Query with filter constant on left of = and column expression on right does not vectorize
[ https://issues.apache.org/jira/browse/HIVE-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712999#comment-13712999 ] Jitendra Nath Pandey commented on HIVE-4684: There are two issues here: 1) If the left expression is a constant and the right expression is a generic function, the query does not vectorize because the corresponding vector expressions are missing. 2) If the left expression is a constant and the right is a column expression, the query vectorizes to an incorrect expression with the column on the left, which does not work for non-commutative expressions. The latest patch adds the missing expressions, which addresses (1), and includes a one-line fix in VectorizationContext that fixes (2). Query with filter constant on left of = and column expression on right does not vectorize --- Key: HIVE-4684 URL: https://issues.apache.org/jira/browse/HIVE-4684 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Attachments: Hive-4684.0.patch, Hive-4684.1.patch, HIVE-4684.1.patch, HIVE-4684.2.patch select dmachineid from factsqlengineam_vec_orc where 1073 = dmachineid + 1; Does not go down the vectorization path.
Output: hive> select dmachineid from factsqlengineam_vec_orc where 1073 = dmachineid + 1; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Validating if vectorized execution is applicable Cannot vectorize the plan: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterLongScalarEqualLongColumn Starting Job = job_201306061504_0038, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201306061504_0038 Kill Command = c:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job -kill job_201306061504_0038 Hadoop job information for Stage-1: number of mappers: 8; number of reducers: 0 2013-06-07 10:25:30,932 Stage-1 map = 0%, reduce = 0% 2013-06-07 10:25:39,953 Stage-1 map = 25%, reduce = 0% 2013-06-07 10:25:42,959 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 8.172 sec 2013-06-07 10:25:43,962 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 8.172 sec ...
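Issue (2) in the comment above, the operand-order problem, can be illustrated with a small sketch. This is not Hive's code: the class and method names below are hypothetical, and a plain int[] of selected row indices stands in for whatever batch structure the real filter expressions use. It only shows why a filter with the constant on the left cannot be evaluated by a column-on-the-left expression when the operator is not commutative.

```java
// Hypothetical illustration only; these names are not Hive classes.
final class FilterOperandOrderSketch {

    /** Constant on the left: keeps row indices i where scalar < column[i]. */
    static int filterScalarLessColumn(long scalar, long[] column, int[] selected) {
        int kept = 0;
        for (int i = 0; i < column.length; i++) {
            if (scalar < column[i]) {
                selected[kept++] = i;
            }
        }
        return kept; // number of rows that passed the filter
    }

    /** Column on the left: keeps row indices i where column[i] < scalar. */
    static int filterColumnLessScalar(long[] column, long scalar, int[] selected) {
        int kept = 0;
        for (int i = 0; i < column.length; i++) {
            if (column[i] < scalar) {
                selected[kept++] = i;
            }
        }
        return kept;
    }
}
```

For a column holding 5, 10, 15, 20 and the constant 10, the first method selects the rows holding 15 and 20, while the second selects only the row holding 5. For = the two orders agree, which is why the swap is harmless for the query in this issue but not for comparisons such as < or for arithmetic such as subtraction.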
[jira] [Updated] (HIVE-4684) Query with filter constant on left of = and column expression on right does not vectorize
[ https://issues.apache.org/jira/browse/HIVE-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4684: --- Attachment: HIVE-4684.3.patch Query with filter constant on left of = and column expression on right does not vectorize --- Key: HIVE-4684 URL: https://issues.apache.org/jira/browse/HIVE-4684 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Attachments: Hive-4684.0.patch, Hive-4684.1.patch, HIVE-4684.1.patch, HIVE-4684.2.patch, HIVE-4684.3.patch select dmachineid from factsqlengineam_vec_orc where 1073 = dmachineid + 1; Does not go down the vectorization path. Output: hive> select dmachineid from factsqlengineam_vec_orc where 1073 = dmachineid + 1; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Validating if vectorized execution is applicable Cannot vectorize the plan: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterLongScalarEqualLongColumn Starting Job = job_201306061504_0038, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201306061504_0038 Kill Command = c:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job -kill job_201306061504_0038 Hadoop job information for Stage-1: number of mappers: 8; number of reducers: 0 2013-06-07 10:25:30,932 Stage-1 map = 0%, reduce = 0% 2013-06-07 10:25:39,953 Stage-1 map = 25%, reduce = 0% 2013-06-07 10:25:42,959 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 8.172 sec 2013-06-07 10:25:43,962 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 8.172 sec ...
[jira] [Commented] (HIVE-4822) implement vectorized math functions
[ https://issues.apache.org/jira/browse/HIVE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718932#comment-13718932 ] Jitendra Nath Pandey commented on HIVE-4822: bq. How does explain work with the vectorization engine? 'explain' continues to work as before and returns the same plan as in non-vector mode. Vectorization executes exactly the same query plan; only the implementation of the operators and expressions has been changed to run in vectorized fashion. However, we do plan to enhance 'explain' to also show which operators will be executed in vectorized mode. We will start working on it very soon and file a jira. In the current implementation, we don't need the 'explain' annotations on vectorized UDFs, because the vectorized UDFs are used only at run time. In the query planning stage only row-mode UDFs are used; at query execution time, if vectorization is possible, we switch to the corresponding vectorized UDFs. We adopted this approach to avoid any changes to the query planner for vectorization. bq. Could we somehow hybrid some of our existing UDFS to work from both engines? We will surely have to support the hybrid approach, as you are suggesting, for UDFs that users have implemented, even though we will recommend that users re-implement their UDFs in vectorized fashion. However, for built-in Hive UDFs, it will almost always be better to have a vectorized implementation for performance. Eventually, we do want to have vectorized implementations for all built-in UDFs. bq. Are we sure that functions that operate on doubles and floats are going to round exactly the same way? We use the same underlying Java libraries, so our results should match. In our testing we compare the results with non-vector results to make sure. bq. Do we have a wiki page or something where we are keeping track of what is currently supported using vectorization? That's a good idea. I agree we should track this so that the community is aware. It will also help and encourage folks to identify areas to contribute.
implement vectorized math functions --- Key: HIVE-4822 URL: https://issues.apache.org/jira/browse/HIVE-4822 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4822.1.patch, HIVE-4822.4.patch, HIVE-4822.5-vectorization.patch Implement vectorized support for all the built-in math functions. This includes implementing the vectorized operation, and tying it all together in VectorizationContext so it runs end-to-end. These functions include: round(Col) Round(Col, N) Floor(Col) Ceil(Col) Rand(), Rand(seed) Exp(Col) Ln(Col) Log10(Col) Log2(Col) Log(base, Col) Pow(col, p), Power(col, p) Sqrt(Col) Bin(Col) Hex(Col) Unhex(Col) Conv(Col, from_base, to_base) Abs(Col) Pmod(arg1, arg2) Sin(Col) Asin(Col) Cos(Col) ACos(Col) Atan(Col) Degrees(Col) Radians(Col) Positive(Col) Negative(Col) Sign(Col) E() Pi() To reduce the total code volume, do an implicit type cast from non-double input types to double. Also, POSITIVE and NEGATIVE are syntactic sugar for unary + and unary -, so reuse code for those as appropriate. Try to call the function directly in the inner loop and avoid new() or expensive operations, as appropriate. Templatize the code where appropriate, e.g. all the unary functions of the form DOUBLE func(DOUBLE) can probably be done with a template.
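The shape of a vectorized unary DOUBLE func(DOUBLE) evaluation, as described in the issue text above, might look roughly like the following sketch. It is an assumption-laden illustration, not the HIVE-4822 patch: the class, interface, and method names are hypothetical, plain arrays stand in for column vectors, and a small interface replaces the code-generation templates the issue has in mind. Because the interface adds an indirect call, a real implementation following the issue's advice would more likely generate one class per function so that the math call sits directly in the inner loop.

```java
// Hypothetical sketch; names and array-based "columns" are illustrative assumptions.
final class UnaryDoubleFuncSketch {

    /** Stand-in for a templated DOUBLE func(DOUBLE) operation. */
    interface DoubleFunc {
        double apply(double x);
    }

    /** Applies f to the first n values; output is preallocated, so the loop allocates nothing. */
    static void evaluate(DoubleFunc f, double[] input, double[] output, int n) {
        for (int i = 0; i < n; i++) {
            output[i] = f.apply(input[i]);
        }
    }

    /** Same loop for a long column, with the implicit cast to double the issue suggests. */
    static void evaluateLong(DoubleFunc f, long[] input, double[] output, int n) {
        for (int i = 0; i < n; i++) {
            output[i] = f.apply((double) input[i]);
        }
    }
}
```

A call such as UnaryDoubleFuncSketch.evaluate(Math::sqrt, in, out, in.length) routes every value through the same java.lang.Math method that a row-at-a-time evaluation would call, which is the reasoning behind the comment's expectation that vectorized and non-vectorized results round identically.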