[jira] [Commented] (HIVE-633) ADD FILE command does not accept quoted filenames
[ https://issues.apache.org/jira/browse/HIVE-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566331#comment-13566331 ]

kiran sreekumar commented on HIVE-633:
--------------------------------------

is this issue still relevant, as i would like to work on this.

ADD FILE command does not accept quoted filenames
-------------------------------------------------

Key: HIVE-633
URL: https://issues.apache.org/jira/browse/HIVE-633
Project: Hive
Issue Type: Bug
Affects Versions: 0.3.0
Environment: Ubuntu Linux (intrepid)
Reporter: Saurabh Nanda
Priority: Minor

The following command says the file does not exist. Removing the quotes around the filename makes it work.

hive> add file '/tmp/testing.jar';

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
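The reported behavior is consistent with the quote characters being passed through to the filesystem existence check. A minimal sketch of the likely fix, in Python for illustration (the helper names here are hypothetical, not Hive's actual CLI code): strip one pair of matching quotes from the path token before checking that the file exists.

```python
import os

def parse_add_file_arg(token: str) -> str:
    """Strip one pair of matching single or double quotes from a path token.

    Hypothetical helper illustrating the likely fix: without it, the literal
    string "'/tmp/testing.jar'" (quotes included) is handed to the filesystem
    check, which then reports "file does not exist".
    """
    if len(token) >= 2 and token[0] == token[-1] and token[0] in ("'", '"'):
        return token[1:-1]
    return token

def add_file_exists(token: str) -> bool:
    # Existence check on the *unquoted* path, as ADD FILE should do.
    return os.path.exists(parse_add_file_arg(token))
```

Unquoted arguments pass through unchanged, so the fix would be backward compatible with the working form of the command.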
[jira] [Created] (HIVE-3962) number of distinct values are in column statistics
Amareshwari Sriramadasu created HIVE-3962:
------------------------------------------

Summary: number of distinct values are in column statistics
Key: HIVE-3962
URL: https://issues.apache.org/jira/browse/HIVE-3962
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.10.0
Reporter: Amareshwari Sriramadasu

When we run the query on the hive ql src table:

select count(distinct(key)), count(distinct(value)) from src;
309	309

After running the following analyze query, the stats in the metastore seem wrong:

analyze table src compute statistics for columns key, value;

--- stats in metastore ---
mysql> select * from TAB_COL_STATS where TABLE_NAME='src';

| CS_ID | DB_NAME | TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | LONG_LOW_VALUE | LONG_HIGH_VALUE | DOUBLE_HIGH_VALUE | DOUBLE_LOW_VALUE | BIG_DECIMAL_LOW_VALUE | BIG_DECIMAL_HIGH_VALUE | NUM_NULLS | NUM_DISTINCTS | AVG_COL_LEN | MAX_COL_LEN | NUM_TRUES | NUM_FALSES | LAST_ANALYZED |
| 5 | default | src | key | int | 11 | 0 | 498 | 0. | 0. | NULL | NULL | 0 | 291 | 0. | 0 | 0 | 0 | 1359539181 |
| 6 | default | src | value | string | 11 | 0 | 0 | 0. | 0. | NULL | NULL | 0 | 112 | 6.8120 | 7 | 0 | 0 | 1359539181 |
[jira] [Updated] (HIVE-3962) Number of distinct values are wrong in column statistics
[ https://issues.apache.org/jira/browse/HIVE-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-3962:
------------------------------------------

Summary: Number of distinct values are wrong in column statistics  (was: number of distinct values are in column statistics)

Number of distinct values are wrong in column statistics
--------------------------------------------------------

Key: HIVE-3962
URL: https://issues.apache.org/jira/browse/HIVE-3962
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.10.0
Reporter: Amareshwari Sriramadasu

When we run the query on the hive ql src table:

select count(distinct(key)), count(distinct(value)) from src;
309	309

After running the following analyze query, the stats in the metastore seem wrong:

analyze table src compute statistics for columns key, value;

--- stats in metastore ---
mysql> select * from TAB_COL_STATS where TABLE_NAME='src';

| CS_ID | DB_NAME | TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | LONG_LOW_VALUE | LONG_HIGH_VALUE | DOUBLE_HIGH_VALUE | DOUBLE_LOW_VALUE | BIG_DECIMAL_LOW_VALUE | BIG_DECIMAL_HIGH_VALUE | NUM_NULLS | NUM_DISTINCTS | AVG_COL_LEN | MAX_COL_LEN | NUM_TRUES | NUM_FALSES | LAST_ANALYZED |
| 5 | default | src | key | int | 11 | 0 | 498 | 0. | 0. | NULL | NULL | 0 | 291 | 0. | 0 | 0 | 0 | 1359539181 |
| 6 | default | src | value | string | 11 | 0 | 0 | 0. | 0. | NULL | NULL | 0 | 112 | 6.8120 | 7 | 0 | 0 | 1359539181 |
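For context on why NUM_DISTINCTS can legitimately differ a little from an exact count(distinct ...): column statistics are usually gathered with a probabilistic estimator rather than an exact count. Below is a minimal Flajolet-Martin style sketch in Python (an assumption for illustration, not Hive's actual NumDistinctValueEstimator code). A small deviation such as 291 vs. 309 for `key` is the kind of error such a sketch produces; 112 for `value` is far outside the expected error band, which is what makes the reported stats look wrong.

```python
import hashlib

def fm_estimate_ndv(values, num_bitvectors=16):
    """Minimal Flajolet-Martin style distinct-value estimator.

    Illustrative sketch only: records, per hashed bit vector, which
    trailing-one positions have been seen, then estimates NDV from the
    lowest unset bit averaged over all vectors.
    """
    bitvectors = [0] * num_bitvectors
    for v in values:
        for i in range(num_bitvectors):
            h = int.from_bytes(
                hashlib.sha1(f"{i}:{v}".encode()).digest()[:8], "big")
            if h == 0:
                continue
            # Position of the lowest set bit of the hash.
            r = (h & -h).bit_length() - 1
            bitvectors[i] |= 1 << r
    total_r = 0
    for bv in bitvectors:
        r = 0
        while bv & (1 << r):  # lowest unset bit position
            r += 1
        total_r += r
    # Standard FM correction factor phi ~ 0.77351.
    return int(2 ** (total_r / num_bitvectors) / 0.77351)
```

Such an estimate should land within a small constant factor of the true count; a deterministic exact count over a 500-row table like src should not be off by a factor of ~3, so the bug is in the stats computation, not in expected sketch error.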
[jira] [Commented] (HIVE-3785) Core hive changes for HiveServer2 implementation
[ https://issues.apache.org/jira/browse/HIVE-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566399#comment-13566399 ]

Namit Jain commented on HIVE-3785:
----------------------------------

I am sorry for the delay on my part. Can you refresh? I will definitely review this time.

Core hive changes for HiveServer2 implementation
------------------------------------------------

Key: HIVE-3785
URL: https://issues.apache.org/jira/browse/HIVE-3785
Project: Hive
Issue Type: Sub-task
Components: Authentication, Build Infrastructure, Configuration, Thrift API
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
Attachments: HS2-changed-files-only.patch

The subtask to track changes in the core hive components for the HiveServer2 implementation.
[jira] [Commented] (HIVE-3785) Core hive changes for HiveServer2 implementation
[ https://issues.apache.org/jira/browse/HIVE-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566400#comment-13566400 ]

Namit Jain commented on HIVE-3785:
----------------------------------

cc [~mgrover], [~prasadm]

Core hive changes for HiveServer2 implementation
------------------------------------------------

Key: HIVE-3785
URL: https://issues.apache.org/jira/browse/HIVE-3785
Project: Hive
Issue Type: Sub-task
Components: Authentication, Build Infrastructure, Configuration, Thrift API
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
Attachments: HS2-changed-files-only.patch

The subtask to track changes in the core hive components for the HiveServer2 implementation.
[jira] [Commented] (HIVE-3950) Remove code for merging files via MR job
[ https://issues.apache.org/jira/browse/HIVE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566401#comment-13566401 ]

Hudson commented on HIVE-3950:
------------------------------

Integrated in Hive-trunk-hadoop2 #97 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/97/])
HIVE-3950 : Remove code for merging files via MR job (Ashutosh Chauhan, Reviewed by Namit Jain) (Revision 1440238)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440238
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/dyn_part_merge.q
* /hive/trunk/ql/src/test/results/clientnegative/dyn_part_merge.q.out

Remove code for merging files via MR job
----------------------------------------

Key: HIVE-3950
URL: https://issues.apache.org/jira/browse/HIVE-3950
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Fix For: 0.11.0
Attachments: hive-3950_1.patch, hive-3950_2.patch, hive-3950.patch

Hive can merge files either via an MR job or via a map-only job. Doing it via a map-only job is more efficient, but the option of doing it via an MR job existed because CombineFileInputFormat is available only in hadoop-0.20 and later. Since we no longer support hadoop versions earlier than 0.20, all of that is now dead code and we should get rid of it.
[jira] [Commented] (HIVE-933) Infer bucketing/sorting properties
[ https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566402#comment-13566402 ]

Hudson commented on HIVE-933:
-----------------------------

Integrated in Hive-trunk-hadoop2 #97 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/97/])
HIVE-933 Infer bucketing/sorting properties (Kevin Wilfong via namit) (Revision 1440271)

Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440271
Files :
* /hive/trunk/build-common.xml
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lib/RuleExactMatch.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingCtx.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
* /hive/trunk/ql/src/test/queries/clientnegative/merge_negative_3.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_bucketed_table.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_convert_join.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_dyn_part.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_grouping_operators.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_list_bucket.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_map_operators.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_merge.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_multi_insert.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_num_buckets.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_reducers_power_two.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
* /hive/trunk/ql/src/test/results/clientnegative/merge_negative_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ctas.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_bucketed_table.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_convert_join.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_dyn_part.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_list_bucket.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_map_operators.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_merge.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_multi_insert.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_num_buckets.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_reducers_power_two.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/cast1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input20.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input7.q.xml
*
[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3403:
-----------------------------

Attachment: hive.3403.19.patch

user should not specify mapjoin to perform sort-merge bucketed join
-------------------------------------------------------------------

Key: HIVE-3403
URL: https://issues.apache.org/jira/browse/HIVE-3403
Project: Hive
Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch

Currently, in order to perform a sort-merge bucketed join, the user needs to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the mapjoin hint. The user should not specify any hints.
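To see why no mapjoin hint should be needed: when both tables are bucketed and sorted on the join key, each pair of corresponding buckets can be joined by a single forward merge pass, with no in-memory hash table. A minimal sketch in Python (illustrative only, not Hive's actual SMB join operator):

```python
def sort_merge_join(left, right):
    """Join two lists of (key, value) pairs that are already sorted by key,
    mirroring what a sort-merge bucketed join does with pre-sorted buckets:
    one forward pass over each side, emitting the cross product of
    equal-key runs."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the extent of the equal-key run on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lk, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out
```

Because the metadata (bucketing and sort columns) already tells the compiler when this plan is valid, the optimizer can choose it automatically instead of relying on a user-supplied hint.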
[jira] [Commented] (HIVE-3950) Remove code for merging files via MR job
[ https://issues.apache.org/jira/browse/HIVE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566422#comment-13566422 ]

Hudson commented on HIVE-3950:
------------------------------

Integrated in Hive-trunk-h0.21 #1946 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1946/])
HIVE-3950 : Remove code for merging files via MR job (Ashutosh Chauhan, Reviewed by Namit Jain) (Revision 1440238)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440238
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/dyn_part_merge.q
* /hive/trunk/ql/src/test/results/clientnegative/dyn_part_merge.q.out

Remove code for merging files via MR job
----------------------------------------

Key: HIVE-3950
URL: https://issues.apache.org/jira/browse/HIVE-3950
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Fix For: 0.11.0
Attachments: hive-3950_1.patch, hive-3950_2.patch, hive-3950.patch

Hive can merge files either via an MR job or via a map-only job. Doing it via a map-only job is more efficient, but the option of doing it via an MR job existed because CombineFileInputFormat is available only in hadoop-0.20 and later. Since we no longer support hadoop versions earlier than 0.20, all of that is now dead code and we should get rid of it.
Hive-trunk-h0.21 - Build # 1946 - Failure
Changes for Build #1944
[namit] HIVE-3873 lot of tests failing for hadoop 23 (Gang Tim Liu via namit)

Changes for Build #1945
[hashutosh] Missed deleting empty file GenMRRedSink4.java while committing 3784
[hashutosh] HIVE-3784 de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan)

Changes for Build #1946
[hashutosh] HIVE-3950 : Remove code for merging files via MR job (Ashutosh Chauhan, Reviewed by Namit Jain)

1 tests failed.
FAILED: org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_1

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
	at net.sf.antcontrib.logic.ForTask.doSequentialIteration(ForTask.java:259)
	at net.sf.antcontrib.logic.ForTask.doToken(ForTask.java:268)
	at net.sf.antcontrib.logic.ForTask.doTheTasks(ForTask.java:324)
	at net.sf.antcontrib.logic.ForTask.execute(ForTask.java:244)

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1946)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1946/ to view the results.
[jira] [Commented] (HIVE-933) Infer bucketing/sorting properties
[ https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566726#comment-13566726 ]

Hudson commented on HIVE-933:
-----------------------------

Integrated in Hive-trunk-h0.21 #1947 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1947/])
HIVE-933 Infer bucketing/sorting properties (Kevin Wilfong via namit) (Revision 1440271)

Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440271
Files :
* /hive/trunk/build-common.xml
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lib/RuleExactMatch.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingCtx.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
* /hive/trunk/ql/src/test/queries/clientnegative/merge_negative_3.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_bucketed_table.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_convert_join.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_dyn_part.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_grouping_operators.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_list_bucket.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_map_operators.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_merge.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_multi_insert.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_num_buckets.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_reducers_power_two.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
* /hive/trunk/ql/src/test/results/clientnegative/merge_negative_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ctas.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_bucketed_table.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_convert_join.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_dyn_part.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_list_bucket.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_map_operators.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_merge.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_multi_insert.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_num_buckets.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_reducers_power_two.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/cast1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input20.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input7.q.xml
*
Hive-trunk-h0.21 - Build # 1947 - Fixed
Changes for Build #1944
[namit] HIVE-3873 lot of tests failing for hadoop 23 (Gang Tim Liu via namit)

Changes for Build #1945
[hashutosh] Missed deleting empty file GenMRRedSink4.java while committing 3784
[hashutosh] HIVE-3784 de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan)

Changes for Build #1946
[hashutosh] HIVE-3950 : Remove code for merging files via MR job (Ashutosh Chauhan, Reviewed by Namit Jain)

Changes for Build #1947
[namit] HIVE-933 Infer bucketing/sorting properties (Kevin Wilfong via namit)

All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1947)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1947/ to view the results.
[jira] [Updated] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3874:
-----------------------------

Attachment: hive.3874.2.patch

Create a new Optimized Row Columnar file format for Hive
--------------------------------------------------------

Key: HIVE-3874
URL: https://issues.apache.org/jira/browse/HIVE-3874
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz

There are several limitations of the current RC File format that I'd like to address by creating a new format:
* each column value is stored as a binary blob, which means:
** the entire column value must be read, decompressed, and deserialized
** the file format can't use smarter type-specific compression
** push down filters can't be evaluated
* the start of each row group needs to be found by scanning
* user metadata can only be added to the file when the file is created
* the file doesn't store the number of rows per a file or row group
* there is no mechanism for seeking to a particular row number, which is required for external indexes.
* there is no mechanism for storing light weight indexes within the file to enable push-down filters to skip entire row groups.
* the type of the rows aren't stored in the file
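The light-weight-index point can be made concrete with a toy columnar layout. The sketch below (illustrative only, not the proposed ORC layout) keeps per-row-group min/max statistics so that a push-down predicate can skip entire row groups without reading, decompressing, or deserializing them:

```python
from dataclasses import dataclass

@dataclass
class RowGroupStats:
    min_val: int
    max_val: int
    num_rows: int

def write_column(values, rows_per_group=3):
    """Split one column into row groups and record min/max/count per group,
    the kind of light-weight index the proposal describes."""
    groups, stats = [], []
    for i in range(0, len(values), rows_per_group):
        chunk = values[i:i + rows_per_group]
        groups.append(chunk)
        stats.append(RowGroupStats(min(chunk), max(chunk), len(chunk)))
    return groups, stats

def scan_greater_than(groups, stats, threshold):
    """Predicate pushdown for `value > threshold`: only row groups whose
    max can satisfy the predicate are actually read."""
    out = []
    for chunk, st in zip(groups, stats):
        if st.max_val <= threshold:
            continue  # entire row group skipped via its stats alone
        out.extend(v for v in chunk if v > threshold)
    return out
```

The same per-group stats also give row counts for free, addressing the "number of rows per file or row group" limitation in the list above.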
[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566727#comment-13566727 ]

Namit Jain commented on HIVE-3874:
----------------------------------

I took a stab at it. I am attaching it just in case - feel free to ignore it. I was not able to get the protocol buffer file auto-generated from ant, so I manually generated it for the purpose of this patch.

Create a new Optimized Row Columnar file format for Hive
--------------------------------------------------------

Key: HIVE-3874
URL: https://issues.apache.org/jira/browse/HIVE-3874
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz

There are several limitations of the current RC File format that I'd like to address by creating a new format:
* each column value is stored as a binary blob, which means:
** the entire column value must be read, decompressed, and deserialized
** the file format can't use smarter type-specific compression
** push down filters can't be evaluated
* the start of each row group needs to be found by scanning
* user metadata can only be added to the file when the file is created
* the file doesn't store the number of rows per a file or row group
* there is no mechanism for seeking to a particular row number, which is required for external indexes.
* there is no mechanism for storing light weight indexes within the file to enable push-down filters to skip entire row groups.
* the type of the rows aren't stored in the file
[jira] [Updated] (HIVE-3940) Track columns accessed in each table in a query
[ https://issues.apache.org/jira/browse/HIVE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samuel Yuan updated HIVE-3940:
------------------------------

Attachment: HIVE-3940.3.patch.txt

Updated.

Track columns accessed in each table in a query
-----------------------------------------------

Key: HIVE-3940
URL: https://issues.apache.org/jira/browse/HIVE-3940
Project: Hive
Issue Type: Task
Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Minor
Attachments: HIVE-3940.1.patch.txt, HIVE-3940.2.patch.txt, HIVE-3940.3.patch.txt

Similar to partition access logs, we need to have column access logs, so that later we can build tools/reports to inform users if there are wasted columns in a table to be trimmed.
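A minimal sketch of the idea (hypothetical names, not the patch's actual classes): the analyzer records which columns each query touches per table, and unused columns can later be reported by diffing the aggregated accesses against the table schema.

```python
from collections import defaultdict

class ColumnAccessTracker:
    """Hypothetical sketch of column-access logging: record, per table,
    which columns queries actually read, so that columns never accessed
    can be reported as candidates for trimming."""

    def __init__(self):
        self.accessed = defaultdict(set)

    def record(self, table, column):
        # Called wherever the query plan resolves a column reference.
        self.accessed[table].add(column)

    def unused_columns(self, table, schema):
        # Diff the schema against everything ever accessed.
        return sorted(set(schema) - self.accessed[table])
```

In Hive itself the recording point would be the semantic analyzer, and the output would go to the query logs rather than stay in memory.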
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566810#comment-13566810 ]

Gunther Hagleitner commented on HIVE-2340:
------------------------------------------

FYI: Ran all unit tests on patch .9. Failing tests are: groupby_distinct_samekey.q, join31.q, reduce_deduplicate_extended.q (TestCliDriver). Failures look like outdated golden files (explain output changed). Uploaded testclidriver.txt for reference.

optimize orderby followed by a groupby
--------------------------------------

Key: HIVE-2340
URL: https://issues.apache.org/jira/browse/HIVE-2340
Project: Hive
Issue Type: Sub-task
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
Labels: perfomance
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch

Before implementing the optimizer for JOIN-GBY, try to implement the RS-GBY optimizer (cluster-by following group-by).
[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-2340:
-------------------------------------

Attachment: testclidriver.txt

just the diff of the latest unit test run.

optimize orderby followed by a groupby
--------------------------------------

Key: HIVE-2340
URL: https://issues.apache.org/jira/browse/HIVE-2340
Project: Hive
Issue Type: Sub-task
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
Labels: perfomance
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt

Before implementing the optimizer for JOIN-GBY, try to implement the RS-GBY optimizer (cluster-by following group-by).
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566898#comment-13566898 ]

Phabricator commented on HIVE-2340:
-----------------------------------

hagleitn has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby".

Partial review

INLINE COMMENTS

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:521
Not sure why this is needed or why this defaults to 4. From the comment below it seems this is just to avoid the single-reducer order-by case for performance reasons, is that correct?

ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787
Is this required or extra protection? The comment at the top of the file says mapjoin optimization happens before this (and probably should, for performance reasons). Also, if I understand it correctly, joinAndSort might be a better name than fixed. You're basically saying that if an optimization wants to change the join after this, it needs to make sure the ordering of the keys is preserved, right?

ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java:136
seems orthogonal to this patch.

ql/src/test/queries/clientpositive/reduce_deduplicate.q:7
There are not a lot of tests for min.reducer=1. No order-by case, for instance. Maybe reduce_deduplicate_extended.q should run with both the default and min.reducer=1.

REVISION DETAIL
https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn

optimize orderby followed by a groupby
--------------------------------------

Key: HIVE-2340
URL: https://issues.apache.org/jira/browse/HIVE-2340
Project: Hive
Issue Type: Sub-task
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
Labels: perfomance
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt

Before implementing the optimizer for JOIN-GBY, try to implement the RS-GBY optimizer (cluster-by following group-by).
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566902#comment-13566902 ] Gunther Hagleitner commented on HIVE-2340: -- Partial review on phabricator. Biggest question is around hive.optimize.reducededuplication.min.reducer. That basically disables the orderby followed by groupby optimization which was the original motivation for the jira. Navis, can you explain this some more? Might be another ticket, but would it be possible to optimize group by/sort by as well with this? optimize orderby followed by a groupby -- Key: HIVE-2340 URL: https://issues.apache.org/jira/browse/HIVE-2340 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Labels: perfomance Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt Before implementing optimizer for JOIN-GBY, try to implement RS-GBY optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
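The pattern under discussion can be illustrated with a short HiveQL sketch (a hypothetical query against the standard src sample table; the config name is the one quoted above):

```sql
-- A group-by whose output is then ordered on the same key normally
-- compiles into two MR jobs (a GBY job plus an ORDER BY job with a
-- single reducer). The ReduceSinkDeDuplication optimizer can merge the
-- two reduce sinks into one job when the key ordering is preserved.
SELECT key, count(value)
FROM src
GROUP BY key
ORDER BY key;

-- hive.optimize.reducededuplication.min.reducer guards this: per the
-- review comments above, its default (4) skips the merge when the
-- combined job would run with too few reducers, e.g. the
-- single-reducer ORDER BY case.
SET hive.optimize.reducededuplication.min.reducer=1;
```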
[jira] [Updated] (HIVE-3917) Support noscan operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3917: --- Attachment: HIVE-3917.patch.2 Support noscan operation for analyze command Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2 hive supports analyze command to gather statistics from existing tables/partition https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of Rows 2. Number of files 3. Size in Bytes If table/partition is big, the operation would take time since it will open all files and scan all data. It would be nice to support fast operation to gather statistics which doesn't require to open all files: 1. Number of files 2. Size in Bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics without scan can be retrieved via this optional parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
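The proposed syntax can be sketched concretely (table and partition names are hypothetical):

```sql
-- Full scan: opens every file; collects number of rows, number of
-- files, and size in bytes.
ANALYZE TABLE page_view PARTITION(ds='2013-01-30') COMPUTE STATISTICS;

-- Proposed noscan variant: reads only file metadata, so it collects
-- number of files and size in bytes without opening any file.
ANALYZE TABLE page_view PARTITION(ds='2013-01-30') COMPUTE STATISTICS noscan;
```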
[jira] [Work started] (HIVE-3917) Support noscan operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3917 started by Gang Tim Liu. Support noscan operation for analyze command Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2 hive supports analyze command to gather statistics from existing tables/partition https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of Rows 2. Number of files 3. Size in Bytes If table/partition is big, the operation would take time since it will open all files and scan all data. It would be nice to support fast operation to gather statistics which doesn't require to open all files: 1. Number of files 2. Size in Bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics without scan can be retrieved via this optional parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3917) Support noscan operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3917: --- Status: Patch Available (was: In Progress) patch is available. Support noscan operation for analyze command Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2 hive supports analyze command to gather statistics from existing tables/partition https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of Rows 2. Number of files 3. Size in Bytes If table/partition is big, the operation would take time since it will open all files and scan all data. It would be nice to support fast operation to gather statistics which doesn't require to open all files: 1. Number of files 2. Size in Bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics without scan can be retrieved via this optional parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Jenkins build is back to normal : Hive-0.10.0-SNAPSHOT-h0.20.1 #50
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/50/
[jira] [Updated] (HIVE-3940) Track columns accessed in each table in a query
[ https://issues.apache.org/jira/browse/HIVE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-3940: Resolution: Fixed Fix Version/s: 0.11.0 Status: Resolved (was: Patch Available) Committed, thanks Samuel. Track columns accessed in each table in a query --- Key: HIVE-3940 URL: https://issues.apache.org/jira/browse/HIVE-3940 Project: Hive Issue Type: Task Components: Query Processor Reporter: Samuel Yuan Assignee: Samuel Yuan Priority: Minor Fix For: 0.11.0 Attachments: HIVE-3940.1.patch.txt, HIVE-3940.2.patch.txt, HIVE-3940.3.patch.txt Similar to partition access logs, we need to have columns access logs, so later we can build tools/reports to inform users if there are wasted columns in a table to be trimmed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566946#comment-13566946 ] Ashutosh Chauhan commented on HIVE-896: --- PTFDesc only contains a serialized string for PTFDef. I think we should just merge these two classes. Rename the existing PTFDef to PTFDesc and removing the existing PTFDef. And than make sure that PTFDesc is serializable. Does that sound right? Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive. --- Key: HIVE-896 URL: https://issues.apache.org/jira/browse/HIVE-896 Project: Hive Issue Type: New Feature Components: OLAP, UDF Reporter: Amr Awadallah Priority: Minor Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, Hive-896.2.patch.txt Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics. More details at: http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032 -- amr -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566952#comment-13566952 ] Ashutosh Chauhan commented on HIVE-896: --- Also need to make sure that ASTNode and other antlr datastructures referenced (directly or via contained fields) in this new PTFDesc are not required in PTFOperator and are thus not serialized, thereby eliminating antlr runtime dependency. Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive. --- Key: HIVE-896 URL: https://issues.apache.org/jira/browse/HIVE-896 Project: Hive Issue Type: New Feature Components: OLAP, UDF Reporter: Amr Awadallah Priority: Minor Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, Hive-896.2.patch.txt Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics. More details at: http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032 -- amr -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3963) Allow Hive to get connect to RDBMS
Maxime LANCIAUX created HIVE-3963: - Summary: Allow Hive to get connect to RDBMS Key: HIVE-3963 URL: https://issues.apache.org/jira/browse/HIVE-3963 Project: Hive Issue Type: New Feature Reporter: Maxime LANCIAUX -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3917) Support noscan operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3917: --- Attachment: (was: HIVE-3917.patch.2) Support noscan operation for analyze command Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1 hive supports analyze command to gather statistics from existing tables/partition https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of Rows 2. Number of files 3. Size in Bytes If table/partition is big, the operation would take time since it will open all files and scan all data. It would be nice to support fast operation to gather statistics which doesn't require to open all files: 1. Number of files 2. Size in Bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics without scan can be retrieved via this optional parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3778: --- Attachment: HIVE-3778.patch.8 Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3778 started by Gang Tim Liu. Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567023#comment-13567023 ] Gang Tim Liu commented on HIVE-3778: patch is attached to the jira also. Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3778: --- Status: Patch Available (was: In Progress) patch is available https://reviews.facebook.net/D8259 Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3917) Support noscan operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567024#comment-13567024 ] Gang Tim Liu commented on HIVE-3917: patch is in both https://reviews.facebook.net/D8235 and attachment. thanks Support noscan operation for analyze command Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2 hive supports analyze command to gather statistics from existing tables/partition https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of Rows 2. Number of files 3. Size in Bytes If table/partition is big, the operation would take time since it will open all files and scan all data. It would be nice to support fast operation to gather statistics which doesn't require to open all files: 1. Number of files 2. Size in Bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics without scan can be retrieved via this optional parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Jenkins build is back to normal : Hive-0.9.1-SNAPSHOT-h0.21 #277
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/277/
[jira] [Commented] (HIVE-3940) Track columns accessed in each table in a query
[ https://issues.apache.org/jira/browse/HIVE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567149#comment-13567149 ] Hudson commented on HIVE-3940: -- Integrated in hive-trunk-hadoop1 #60 (See [https://builds.apache.org/job/hive-trunk-hadoop1/60/]) HIVE-3940. Track columns accessed in each table in a query. (Samuel Yuan via kevinwilfong) (Revision 1440695) Result = ABORTED kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1440695 Files : * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/conf/hive-default.xml.template * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessAnalyzer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessInfo.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/CheckColumnAccessHook.java * /hive/trunk/ql/src/test/queries/clientpositive/column_access_stats.q * /hive/trunk/ql/src/test/results/clientpositive/column_access_stats.q.out Track columns accessed in each table in a query --- Key: HIVE-3940 URL: https://issues.apache.org/jira/browse/HIVE-3940 Project: Hive Issue Type: Task Components: Query Processor Reporter: Samuel Yuan Assignee: Samuel Yuan Priority: Minor Fix For: 0.11.0 Attachments: HIVE-3940.1.patch.txt, HIVE-3940.2.patch.txt, HIVE-3940.3.patch.txt Similar to partition access logs, we need to have columns access logs, so later we can build tools/reports to inform users if there are wasted columns in a table to be trimmed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3964) Add upgrade script for Oracle backend to metastore.
Mithun Radhakrishnan created HIVE-3964: -- Summary: Add upgrade script for Oracle backend to metastore. Key: HIVE-3964 URL: https://issues.apache.org/jira/browse/HIVE-3964 Project: Hive Issue Type: Bug Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan upgrade-0.9.0-0.10.0.oracle.sql isn't available in metastore/scripts/upgrade/oracle. This warrants testing as well. My concern is that SDS::IS_STOREDASSUBDIRECTORIES is a new, non-nullable column. Existing rows in SDS might need updating with a default value (0) before the constraint is applied. I'll post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
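The backfill concern described above can be sketched as an upgrade-script fragment (a sketch only, assuming Oracle syntax and the column name quoted above; the actual patch may differ):

```sql
-- Add the new column as nullable first, backfill existing SDS rows
-- with the default (0), and only then apply the NOT NULL constraint,
-- so that pre-existing rows do not violate it.
ALTER TABLE SDS ADD (IS_STOREDASSUBDIRECTORIES NUMBER(1) NULL);
UPDATE SDS SET IS_STOREDASSUBDIRECTORIES = 0;
ALTER TABLE SDS MODIFY (IS_STOREDASSUBDIRECTORIES NOT NULL);
```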
[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567271#comment-13567271 ] Harish Butani commented on HIVE-896: Yes, exactly. Will start to introduce the new Spec classes as noted in the DataStruct attachment, and refactor the Def classes to remove the antlr dependency. But before doing this had to handle the following issue. So the plan we generate has the form ... - ReduceSink - Extract - PTF Op - ... The Reduce Sink RowResolver contains the Virtual Columns from its input Operators. During translation we set the RowResolver of the Extract Op to be the same as the Reduce Sink RR; and this same RR was used to setup the ExprNodeDescs in PTF translation. But at runtime the Extract Op doesn't contain the Virtual Columns and so the internal column names can be different. For e.g. in our testJoinWithLeadLag testCase, which is a self join on part and also has a Windowing expression. The RR of the RS op at translation time looks something like this: (_co1,_col2,..,_col7, _col8(vc=true),_col9(vc=true),_col10,_col11,.._col15(vc=true),_col16(vc=true),..) At runtime the Virtual columns are removed and all the columns after _col7 are shifted 1 or 2 positions. So in child Operators ColumnExprNodeDescs are no longer referring to the right columns. We were handling this issue by recreating the ExprNodeDescs from the ASTNodes at runtime. So to avoid carrying forward the ASTNodes we now build a new RR for the Extract Op, with the Virtual Columns removed. We hand this to the PTFTranslator as the starting RR to use to translate a PTF Chain. With the above change, now it should be possible to use the ExprNodeDescs created during translation in the execution of the PTF Op. So will now start a sequence of steps to move to the new data structures and avoid recreation of ExprNodeDescs at runtime. I apologize if I am not being clear. This is a little hard to explain w/o walking through an example. 
Happy to go over this in detail offline. Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive. --- Key: HIVE-896 URL: https://issues.apache.org/jira/browse/HIVE-896 Project: Hive Issue Type: New Feature Components: OLAP, UDF Reporter: Amr Awadallah Priority: Minor Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, Hive-896.2.patch.txt Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics. More details at: http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032 -- amr -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3958) support partial scan for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhuoluo (Clark) Yang updated HIVE-3958: --- Description: analyze commands allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but good and fast enough for use case. HIVE-3917 addresses it 2. collect stats via partial file scan. It doesn't scan all content of files but part of it to get file metadata. some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of Hbase This jira is targeted to address the #2 was: analyze commands allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but good and fast enough for use case. Hive-3917 addresses it 2. collect stats via partial file scan. It doesn't scan all content of files but part of it to get file metadata. some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of Hbase This jira is targeted to address the #2 support partial scan for analyze command Key: HIVE-3958 URL: https://issues.apache.org/jira/browse/HIVE-3958 Project: Hive Issue Type: Improvement Reporter: Gang Tim Liu Assignee: Gang Tim Liu analyze commands allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but good and fast enough for use case. HIVE-3917 addresses it 2. collect stats via partial file scan. It doesn't scan all content of files but part of it to get file metadata. 
some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of Hbase This jira is targeted to address the #2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567318#comment-13567318 ] Ashutosh Chauhan commented on HIVE-3778: Gang cool idea to address the concern. I think we should extend its usage for all the different booleans we have in explain of other *Desc classes. That probably will update lot more .q.out files so probably should be done in a separate ticket. Can you open a follow-up jira for that? Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567331#comment-13567331 ] Gang Tim Liu commented on HIVE-3778: [~ashutoshc]glad you like it. yes, here is the follow-up jira HIVE-3965. Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3965) Reduce output of explain plan by printing boolean value only if it is true
Gang Tim Liu created HIVE-3965: -- Summary: Reduce output of explain plan by printing boolean value only if it is true Key: HIVE-3965 URL: https://issues.apache.org/jira/browse/HIVE-3965 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Leverage the design in HIVE-3778 to reduce output of explain plan by printing boolean value only if it is true. That probably will update lot more .q.out files so probably should be done in a separate ticket than 3778. so it ends up here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567336#comment-13567336 ] Namit Jain commented on HIVE-3833: -- [~jakobhoman], this was definitely not intentional. Unfortunately, there was no test case, so I missed this. Can you provide me a complete testcase ? I will take a look. object inspectors should be initialized based on partition metadata --- Key: HIVE-3833 URL: https://issues.apache.org/jira/browse/HIVE-3833 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.11.0 Attachments: hive.3833.10.patch, hive.3833.11.patch, hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, hive.3833.16.path, hive.3833.17.patch, hive.3833.18.patch, hive.3833.19.patch, hive.3833.1.patch, hive.3833.20.patch, hive.3833.21.patch, hive.3833.22.patch, hive.3833.23.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, hive.3833.9.patch Currently, different partitions can be picked up for the same input split based on the serdes' etc. And, we dont allow to change the schema for LazyColumnarBinarySerDe. Instead of that, different partitions should be part of the same split, only if the partition schemas exactly match. The operator tree object inspectors should be based on the partition schema. That would give greater flexibility and also help using binary serde with rcfile -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3953) Reading of partitioned Avro data fails because of missing properties
[ https://issues.apache.org/jira/browse/HIVE-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567337#comment-13567337 ] Namit Jain commented on HIVE-3953: -- Copying from HIVE-3833. Can you provide me a complete testcase ? I will take a look. Reading of partitioned Avro data fails because of missing properties Key: HIVE-3953 URL: https://issues.apache.org/jira/browse/HIVE-3953 Project: Hive Issue Type: Bug Reporter: Mark Wagner After HIVE-3833, reading partitioned Avro data fails due to missing properties. The avro.schema.(url|literal) properties are not making it all the way to the SerDe. Non-partitioned data can still be read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
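A minimal repro sketch of the reported failure (hypothetical table and schema; assumes the standard Avro SerDe and container input/output format classes):

```sql
-- Avro-backed table whose column schema comes from avro.schema.literal
-- rather than an explicit column list.
CREATE EXTERNAL TABLE avro_part
PARTITIONED BY (ds STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{
  "type": "record", "name": "rec",
  "fields": [{"name": "id", "type": "long"}]
}');

-- After HIVE-3833, a read from a partition fails because
-- avro.schema.(url|literal) no longer reaches the SerDe; the same
-- table without partitioning still reads fine.
SELECT id FROM avro_part WHERE ds = '2013-01-30';
```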
reduced unit test timings
I have noticed that the time taken to run the unit tests has dropped considerably (it has nearly halved) over the last week or so. Just wondering if anyone else has noticed this too. If yes, does anyone know the root cause of this speedup? Thanks, -namit
[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567345#comment-13567345 ] Namit Jain commented on HIVE-3778: -- +1 Running tests Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: Request to review the change.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9171/ --- Review request for hive. Description --- Patch for issue https://issues.apache.org/jira/browse/HIVE-3850. The patch has been accepted by the person who raised the issue. Please review. This addresses bug https://issues.apache.org/jira/browse/HIVE-3850. Diffs - ql/src/java/org/apache/hadoop/hive/ql/udf/UDFHour.java 85b514a Diff: https://reviews.apache.org/r/9171/diff/ Testing --- The change was tested. Thanks, Arun A K
[jira] [Commented] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567355#comment-13567355 ] Mark Grover commented on HIVE-3850: --- For completeness, review is at: https://reviews.apache.org/r/9171/ hour() function returns 12 hour clock value when using timestamp datatype - Key: HIVE-3850 URL: https://issues.apache.org/jira/browse/HIVE-3850 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.9.0 Reporter: Pieterjan Vriends Attachments: HIVE-3850.patch.txt Apparently UDFHour.java has two evaluate() functions: one that accepts a Text object as a parameter and one that takes a TimestampWritable object as a parameter. The first function returns the value of Calendar.HOUR_OF_DAY and the second one of Calendar.HOUR. In the documentation I couldn't find any information on the overloads of the evaluate function. I did spend quite some time finding out why my statement didn't return a 24-hour clock value. Shouldn't both functions return the same?
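The distinction at the heart of this bug can be reproduced directly with `java.util.Calendar`. The sketch below is illustrative only (the class and field names are the JDK's, not Hive's): `Calendar.HOUR` is the hour on a 12-hour clock, while `Calendar.HOUR_OF_DAY` is the hour on a 24-hour clock, so the two diverge for any time after noon.

```java
import java.util.Calendar;

public class HourDemo {
    public static void main(String[] args) {
        Calendar cal = Calendar.getInstance();
        // 5 PM on an arbitrary date.
        cal.set(2013, Calendar.JANUARY, 30, 17, 0, 0);

        // HOUR is the hour within the AM/PM half-day (0-11): prints 5.
        System.out.println(cal.get(Calendar.HOUR));
        // HOUR_OF_DAY is the 24-hour clock value (0-23): prints 17.
        System.out.println(cal.get(Calendar.HOUR_OF_DAY));
    }
}
```

This is why a timestamp-typed argument fed to the `Calendar.HOUR`-based overload silently yields 12-hour values.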
[jira] [Commented] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567356#comment-13567356 ] Mark Grover commented on HIVE-3850: --- Patch looks good to me. Usually, I would ask for unit tests to be added with any change but given that it's a trivial change, I would be ok without new tests. We should, however, make sure we update the existing unit tests if needed. Did you get a chance to run the unit tests (at least the ones that use the hour UDF) and make sure no changes are required in their output?
[jira] [Reopened] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Grover reopened HIVE-3850: --- The change wasn't committed, re-opening the JIRA. Affects Versions: 0.9.0, 0.10.0 Fix For: 0.11.0 Attachments: HIVE-3850.patch.txt
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3778: --- Attachment: HIVE-3778.patch.9
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3778: - Status: Open (was: Patch Available) comments
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3778: --- Attachment: HIVE-3778.patch.10
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3778: --- Attachment: HIVE-3778.patch.10
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3778: --- Status: Patch Available (was: Open) patch is available.
[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3403: - Attachment: hive.3403.21.patch user should not specify mapjoin to perform sort-merge bucketed join --- Key: HIVE-3403 URL: https://issues.apache.org/jira/browse/HIVE-3403 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, hive.3403.21.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch Currently, in order to perform a sort merge bucketed join, the user needs to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the mapjoin hint. The user should not specify any hints.
Re: reduced unit test timings
I am not sure about half, but https://issues.apache.org/jira/browse/HIVE-3947 has certainly helped. Both MiniMRCliDriver and NegativeMiniMRCliDriver used to remain in a hung state for ~10 minutes after all tests had run, while the minicluster was tearing down. That patch has saved at least ~15 mins per test run in my environment. Thanks to Navis for that! Ashutosh On Wed, Jan 30, 2013 at 8:23 PM, Namit Jain nj...@fb.com wrote: I have noticed that the time taken to run the unit tests has reduced considerably (it has become nearly half) from the last week or so. Just wondering, if anyone else has noticed this too. If yes, does anyone know the root cause of this speedup ? Thanks, -namit
Re: reduced unit test timings
I run tests on a parallel cluster (8 machines). For that, the test time has gone down from 2:15 hours to approx. 1:15. On 1/31/13 11:55 AM, Ashutosh Chauhan hashut...@apache.org wrote: I am not sure about half, but https://issues.apache.org/jira/browse/HIVE-3947 has certainly helped. Both MiniMRCliDriver and NegativeMiniMRCliDriver used to remain in hung state for ~10 minutes after all tests have run and minicluster is tearing down. That patch has saved atleast ~15 mins for test runs in my environment. Thanks to Navis for that! Ashutosh On Wed, Jan 30, 2013 at 8:23 PM, Namit Jain nj...@fb.com wrote: I have noticed that the time taken to run the unit tests has reduced considerably (it has become nearly half) from the last week or so. Just wondering, if anyone else has noticed this too. If yes, does anyone know the root cause of this speedup ? Thanks, -namit
[jira] [Updated] (HIVE-3917) Support noscan operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3917: --- Attachment: HIVE-3917.patch.3 Support noscan operation for analyze command Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2, HIVE-3917.patch.3 Hive supports the analyze command to gather statistics from existing tables/partitions: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of rows 2. Number of files 3. Size in bytes If the table/partition is big, the operation takes time, since it opens all files and scans all data. It would be nice to support a fast operation that gathers the statistics which don't require opening all files: 1. Number of files 2. Size in bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics that can be computed without a scan can be retrieved via this optional parameter.
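The proposed syntax could be exercised as follows. This is a sketch based only on the syntax quoted in the issue; the table and partition names are hypothetical, and the noscan option is exactly what this patch would add:

```sql
-- Full statistics: opens and scans every file (row count, file count, size).
ANALYZE TABLE page_views PARTITION (dt='2013-01-30') COMPUTE STATISTICS;

-- Proposed fast path: file count and total size only, no data scan.
ANALYZE TABLE page_views PARTITION (dt='2013-01-30') COMPUTE STATISTICS noscan;
```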
[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3403: - Attachment: hive.3403.22.patch
[jira] [Commented] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567412#comment-13567412 ] Namit Jain commented on HIVE-3403: -- To help in review, the class hierarchy is: AbstractBucketJoinProc AbstractSMBJoinProc SortedMergeBucketMapjoinProc SortedMergeJoinProc BucketMapjoinOptProc The contexts needed are: BucketJoinOptProcCtx SortBucketJoinOptProcCtx Most of the code in AbstractBucketJoinProc and AbstractSMBJoinProc is old code moved. BucketMapjoinOptProc is also old code, but there has been a little refactoring to break it up into a context. As such, the only new code is SortedMergeJoinProc. Due to the refactoring, I am able to re-use a lot of code between map-join and join processing.
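The hierarchy named in the comment above can be sketched as empty Java class declarations. The class names come from the comment itself, but the inheritance arrangement and the empty bodies are assumptions for illustration, not Hive's actual code:

```java
// Shared bucket-join detection logic (old code, moved here).
abstract class AbstractBucketJoinProc { }

// Old code, refactored so its state lives in a context object.
class BucketMapjoinOptProc extends AbstractBucketJoinProc { }

// Shared sort-merge-bucket logic layered on the bucket-join base.
abstract class AbstractSMBJoinProc extends AbstractBucketJoinProc { }
class SortedMergeBucketMapjoinProc extends AbstractSMBJoinProc { }
class SortedMergeJoinProc extends AbstractSMBJoinProc { } // the only new code

// Contexts threaded through the processors.
class BucketJoinOptProcCtx { }
class SortBucketJoinOptProcCtx extends BucketJoinOptProcCtx { }
```

Laying the classes out this way shows why the refactoring pays off: anything placed in the two abstract bases is shared between the map-join and join paths.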