Re: [ANNOUNCE] New Hive Committer - Amareshwari Sriramadasu
Congrats amareshwari! Yongqiang On Nov 8, 2010, at 6:00 PM, Namit Jain nj...@facebook.com wrote: Hi Folks, The Hive PMC has passed the vote to make Amareshwari Sriramadasu a new committer on the Apache Hive project. Following is a list of the contributions that Amareshwari has made to the project: http://bit.ly/c3z0ty Congratulations Amareshwari. Please send over your CLA to Apache. Thanks, Namit
Confusing in hive-default.xml
Hi, Maybe it is a typo, but I'm not sure. Excerpt from hive-default.xml (trunk):
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
  <description>The default input format, if it is not specified, the system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can always overwrite it - if there is a bug in CombinedHiveInputFormat, it can always be manually set to HiveInputFormat. </description>
</property>
The 'CombinedHiveInputFormat' does not exist. It should be 'CombineHiveInputFormat', so the property's value is 'org.apache.hadoop.hive.ql.io.CombineHiveInputFormat'. I don't know whether it's intentional or a typo. Thanks. - Youngwoo
[jira] Resolved: (HIVE-1766) Dynamic partition is not working as expected.
[ https://issues.apache.org/jira/browse/HIVE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Saravanan resolved HIVE-1766.
Resolution: Not A Problem
Release Note: The data format was wrong. I used CSV format with quotes, which is not supported in Hive.

Dynamic partition is not working as expected.
Key: HIVE-1766
URL: https://issues.apache.org/jira/browse/HIVE-1766
Project: Hive
Issue Type: Bug
Components: CLI
Affects Versions: 0.6.0, 0.7.0
Environment: Linux, Got the latest code from hive trunk and also tested in 0.6. hadoop version 0.20
Reporter: Saravanan
Fix For: 0.7.0, 0.6.0

Create source table
CREATE EXTERNAL TABLE testmove ( a string, b string ) PARTITIONED BY (cust string, dt string);
Data has been kept in /usr/hive/warehouse/testmove/cust=a/dt=20100102/a.txt
a.txt has 1 row; the value is a, b

Create destination table
CREATE EXTERNAL TABLE testmove1 ( a string, b string ) PARTITIONED BY (cust string, dt string);

Run the query for dynamic partition insert
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
FROM testmove t INSERT OVERWRITE TABLE testmove1 PARTITION (cust, dt) SELECT t.a, t.b, 'a', '20100102';

Output
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/root/root_20101103170404_9e869676-7bb5-4655-b027-5bcb4b7fa2cb.log
Job running in-process (local Hadoop)
2010-11-03 17:04:06,818 null map = 100%, reduce = 0%
Ended Job = job_local_0001
Ended Job = -64572, job is filtered out (removed at runtime).
Moving data to: file:/tmp/hive-root/hive_2010-11-03_17-03-59_979_5901061386316364507/-ext-1
Loading data to table testmove1 partition (cust=null, dt=null)
[Warning] could not update stats.
OK

If I run it with a static partition, the data is inserted into the destination table.
FROM testmove t INSERT OVERWRITE TABLE testmove1 PARTITION (cust='a', dt='20100102') SELECT t.a, t.b;
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [ANNOUNCE] New Hive Committer - Amareshwari Sriramadasu
Welcome! On Tue, Nov 9, 2010 at 3:28 AM, Yongqiang He heyongqiang...@gmail.com wrote: Congrats amareshwari! Yongqiang On Nov 8, 2010, at 6:00 PM, Namit Jain nj...@facebook.com wrote: Hi Folks, The Hive PMC has passed the vote to make Amareshwari Sriramadasu a new committer on the Apache Hive project. Following is a list of the contributions that Amareshwari has made to the project: http://bit.ly/c3z0ty Congratulations Amareshwari. Please send over your CLA to Apache. Thanks, Namit
[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join
[ https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HIVE-1754: - Attachment: hive-1754_4.patch Resolved all the output conflicts in this patch Remove JDBM component from Map Join --- Key: HIVE-1754 URL: https://issues.apache.org/jira/browse/HIVE-1754 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0, 0.7.0 Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.7.0 Attachments: Hive-1754.patch, Hive-1754_2.patch, Hive-1754_3.patch, hive-1754_4.patch Right now, JDBM is the major performance bottleneck. As the small table grows, the PUT and GET operations take most of the execution time. Map Join is designed to load the data of the small table into memory. If the data is too large to hold in memory, then there is no need to use the map join strategy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
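Editor's note: the map join idea described in this issue (load the small table into memory, then stream the big table past it) can be sketched in plain Java. This is only an illustration built on java.util.HashMap. The class and method names (MapJoinSketch, buildHashTable, probe) and the row layout are assumptions made for the example, not Hive code and not the contents of the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a map-side (hash) join; not Hive's implementation.
final class MapJoinSketch {

    // Build phase: load every row of the small table into an in-memory hash table,
    // keyed by the join key (column 0 of each row in this sketch).
    static Map<String, List<String[]>> buildHashTable(List<String[]> smallTable) {
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] row : smallTable) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    // Probe phase: stream the big table and look up each join key in memory.
    static List<String> probe(Map<String, List<String[]>> table, List<String[]> bigTable) {
        List<String> joined = new ArrayList<>();
        for (String[] bigRow : bigTable) {
            List<String[]> matches = table.get(bigRow[0]);
            if (matches == null) {
                continue; // inner join: rows without a match are dropped
            }
            for (String[] smallRow : matches) {
                joined.add(bigRow[0] + "," + bigRow[1] + "," + smallRow[1]);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> small = Arrays.asList(
            new String[] {"1", "red"}, new String[] {"2", "blue"});
        List<String[]> big = Arrays.asList(
            new String[] {"1", "apple"}, new String[] {"3", "pear"}, new String[] {"2", "plum"});
        System.out.println(probe(buildHashTable(small), big));
    }
}
```

The sketch also shows why the strategy only pays off for small tables: the whole build side has to fit in the heap, which matches the issue's remark that map join is not worth using when the data is too large to hold in memory.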
RE: Confusing in hive-default.xml
Yes, good catch, that is just a typo. It should be CombineHiveInputFormat. If you want to fix it, could you file a JIRA for this? Cheers, Paul
-Original Message- From: 김영우 [mailto:warwit...@gmail.com] Sent: Tuesday, November 09, 2010 3:12 AM To: dev@hive.apache.org Subject: Confusing in hive-default.xml
Hi, Maybe it is a typo, but I'm not sure. Excerpt from hive-default.xml (trunk):
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
  <description>The default input format, if it is not specified, the system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can always overwrite it - if there is a bug in CombinedHiveInputFormat, it can always be manually set to HiveInputFormat. </description>
</property>
The 'CombinedHiveInputFormat' does not exist. It should be 'CombineHiveInputFormat', so the property's value is 'org.apache.hadoop.hive.ql.io.CombineHiveInputFormat'. I don't know whether it's intentional or a typo. Thanks. - Youngwoo
Re: [ANNOUNCE] New Hive Committer - Amareshwari Sriramadasu
Congratulations Amareshwari! Carl On Tue, Nov 9, 2010 at 8:33 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Welcome! On Tue, Nov 9, 2010 at 3:28 AM, Yongqiang He heyongqiang...@gmail.com wrote: Congrats amareshwari! Yongqiang On Nov 8, 2010, at 6:00 PM, Namit Jain nj...@facebook.com wrote: Hi Folks, The Hive PMC has passed the vote to make Amareshwari Sriramadasu a new committer on the Apache Hive project. Following is a list of the contributions that Amareshwari has made to the project: http://bit.ly/c3z0ty Congratulations Amareshwari. Please send over your CLA to Apache. Thanks, Namit
[jira] Resolved: (HIVE-1775) Assertation on inputObjInspectors.length in Groupy operator
[ https://issues.apache.org/jira/browse/HIVE-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang resolved HIVE-1775. -- Resolution: Fixed Release Note: This bug is fixed in Hive-1754 Assertation on inputObjInspectors.length in Groupy operator --- Key: HIVE-1775 URL: https://issues.apache.org/jira/browse/HIVE-1775 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0, 0.7.0 Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.7.0 In the Groupby Operator: Line 188: assert (inputObjInspectors.length == 1); But this assertion may not necessarily be true. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1754) Remove JDBM component from Map Join
[ https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930291#action_12930291 ] He Yongqiang commented on HIVE-1754: will take a close look. Remove JDBM component from Map Join --- Key: HIVE-1754 URL: https://issues.apache.org/jira/browse/HIVE-1754 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0, 0.7.0 Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.7.0 Attachments: Hive-1754.patch, Hive-1754_2.patch, Hive-1754_3.patch, hive-1754_4.patch, hive-1754_5.patch Right now, JDBM is the major performance bottleneck. As the small table grows, the PUT and GET operations take most of the execution time. Map Join is designed to load the data of the small table into memory. If the data is too large to hold in memory, then there is no need to use the map join strategy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: Confusing in hive-default.xml
It is a typo. Can you file a jira and fix it? Thanks, -namit
-Original Message- From: 김영우 [mailto:warwit...@gmail.com] Sent: Tuesday, November 09, 2010 3:12 AM To: dev@hive.apache.org Subject: Confusing in hive-default.xml
Hi, Maybe it is a typo, but I'm not sure. Excerpt from hive-default.xml (trunk):
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
  <description>The default input format, if it is not specified, the system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can always overwrite it - if there is a bug in CombinedHiveInputFormat, it can always be manually set to HiveInputFormat. </description>
</property>
The 'CombinedHiveInputFormat' does not exist. It should be 'CombineHiveInputFormat', so the property's value is 'org.apache.hadoop.hive.ql.io.CombineHiveInputFormat'. I don't know whether it's intentional or a typo. Thanks. - Youngwoo
Re: Review Request: HIVE-1771: ROUND(infinity) chokes
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/53/#review31 --- Ship it! +1 Looks good. Will test/commit. - Paul On 2010-11-08 18:36:25, John Sichi wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/53/ --- (Updated 2010-11-08 18:36:25) Review request for hive. Summary --- Review request from jvs. This addresses bug HIVE-1771. https://issues.apache.org/jira/browse/HIVE-1771 Diffs - http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRound.java 1032795 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/udf_round.q 1032795 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/udf_round.q.out 1032795 Diff: https://reviews.apache.org/r/53/diff Testing --- Thanks, John
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930326#action_12930326 ] Paul Butler commented on HIVE-1648: --- I get a bunch of tests failing when I build the latest trunk, even without applying my patch. I'm trying to figure out what's wrong with those first. Automatically gathering stats when reading a table/partition Key: HIVE-1648 URL: https://issues.apache.org/jira/browse/HIVE-1648 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Paul Butler Attachments: HIVE-1648.2.patch, HIVE-1648.patch HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to gather stats. This requires an additional scan of the data. Stats gathering can be piggy-backed on TableScanOperator whenever a table/partition is scanned (given no LIMIT operator). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930321#action_12930321 ] Namit Jain commented on HIVE-1648: -- Paul, any updates on the unit tests? Automatically gathering stats when reading a table/partition Key: HIVE-1648 URL: https://issues.apache.org/jira/browse/HIVE-1648 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Paul Butler Attachments: HIVE-1648.2.patch, HIVE-1648.patch HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to gather stats. This requires an additional scan of the data. Stats gathering can be piggy-backed on TableScanOperator whenever a table/partition is scanned (given no LIMIT operator). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
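Editor's note: the piggy-backing idea in this issue (collect stats during a scan that is happening anyway, instead of paying for a separate ANALYZE pass) can be sketched roughly as follows. This is a minimal illustration, assuming rows are plain strings and the only statistic is a row count. ScanStatsSketch and scanAndCount are hypothetical names and do not reflect how TableScanOperator is actually written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: gather a row count as a side effect of a normal scan.
final class ScanStatsSketch {

    // Forward every scanned row to the downstream consumer and count it on the way.
    // The count is only a valid table/partition statistic if the scan runs to the
    // end, which is presumably why the issue excludes queries with a LIMIT.
    static long scanAndCount(Iterator<String> rows, Consumer<String> downstream) {
        long rowCount = 0;
        while (rows.hasNext()) {
            String row = rows.next();
            rowCount++;
            downstream.accept(row);
        }
        return rowCount;
    }

    public static void main(String[] args) {
        List<String> forwarded = new ArrayList<>();
        long rowCount = scanAndCount(Arrays.asList("r1", "r2", "r3").iterator(), forwarded::add);
        System.out.println("rows scanned: " + rowCount + ", rows forwarded: " + forwarded.size());
    }
}
```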
jmx metrics for metastore server
Hi all, We were looking at monitoring requirements from within Howl, which essentially translates to monitoring requirements for the Metastore server and would like to add in volume and latency jmx counters to metastore server calls. I notice an old and somewhat inactive jira : https://issues.apache.org/jira/browse/HIVE-551 which seems to mention jmx metrics for hive, but from an overall point of view, where each query running on hive, and the cli would need tracking. If we started work on just metastore monitoring for now, is there interest in such a thing? Is there any other jira or plan to do anything similar? If not, I can open up a jira on it and can work on it. -Sushanth
Re: jmx metrics for metastore server
On Tue, Nov 9, 2010 at 5:38 PM, Sushanth Sowmyan khorg...@gmail.com wrote: Hi all, We were looking at monitoring requirements from within Howl, which essentially translates to monitoring requirements for the Metastore server and would like to add in volume and latency jmx counters to metastore server calls. I notice an old and somewhat inactive jira : https://issues.apache.org/jira/browse/HIVE-551 which seems to mention jmx metrics for hive, but from an overall point of view, where each query running on hive, and the cli would need tracking. If we started work on just metastore monitoring for now, is there interest in such a thing? Is there any other jira or plan to do anything similar? If not, I can open up a jira on it and can work on it. -Sushanth You are correct I have that ticket open and the Metrics are query centric. I think metastore counters makes sense however I wonder what % of people are using a Metastore server as opposed to just using a LOCAL (JDBC) metastore? Edward
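Editor's note: for readers unfamiliar with JMX, the volume and latency counters proposed in this thread could look something like the sketch below. It uses only the standard javax.management API from the JDK. Everything else is an assumption made for illustration: the class names, the ObjectName domain "example.metastore", and the idea of one counter object per metastore call. It is not code from Hive, Howl, or HIVE-551.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: per-call volume and latency counters exposed over JMX.
final class MetastoreMetricsSketch {

    // Standard MBean convention: the management interface is named <Class>MBean
    // and must be public. Each getter becomes a read-only JMX attribute.
    public interface CallStatsMBean {
        long getCallCount();
        long getTotalLatencyNanos();
    }

    public static final class CallStats implements CallStatsMBean {
        private final AtomicLong callCount = new AtomicLong();
        private final AtomicLong totalLatencyNanos = new AtomicLong();

        // Record one completed call and how long it took.
        public void record(long latencyNanos) {
            callCount.incrementAndGet();
            totalLatencyNanos.addAndGet(latencyNanos);
        }

        @Override public long getCallCount() { return callCount.get(); }
        @Override public long getTotalLatencyNanos() { return totalLatencyNanos.get(); }
    }

    // Wrap a call so that its latency is recorded even if it throws.
    static <T> T timed(CallStats stats, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            stats.record(System.nanoTime() - start);
        }
    }

    public static void main(String[] args) throws Exception {
        CallStats getTableStats = new CallStats();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // The domain and keys below are made up for the example.
        ObjectName name = new ObjectName("example.metastore:type=CallStats,call=get_table");
        server.registerMBean(getTableStats, name);

        timed(getTableStats, () -> "pretend this is a get_table result");
        timed(getTableStats, () -> "and another one");

        System.out.println("calls = " + server.getAttribute(name, "CallCount"));
    }
}
```

Keeping the counters in AtomicLong fields means concurrent server threads can update them without locking, and a JMX console reads them through the getters whenever it polls.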
[jira] Updated: (HIVE-1776) parallel execution and auto-local mode combine to place plan file in wrong file system
[ https://issues.apache.org/jira/browse/HIVE-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1776: Attachment: HIVE-1776.1.patch the problem is that tasks are trying to modify the shared hive configuration object and trampling each other. fix is to clone the configuration object before modifying it in the Task. parallel execution and auto-local mode combine to place plan file in wrong file system -- Key: HIVE-1776 URL: https://issues.apache.org/jira/browse/HIVE-1776 Project: Hive Issue Type: Bug Reporter: Joydeep Sen Sarma Attachments: HIVE-1776.1.patch A query (that i can't reproduce verbatim) submits a job to a MR cluster with a plan file that is resident on the local file system. This job obviously fails. This seems to result from an interaction between the parallel execution (which is trying to run one local and one remote job at the same time). Turning off either the parallel execution mode or the auto-local mode seems to fix the problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Updated: (HIVE-1776) parallel execution and auto-local mode combine to place plan file in wrong file system
This line is removed from MapRedTask.java: ctx.setOriginalTracker(conf.getVar(HiveConf.ConfVars.HADOOPJT)); I assume this is intentional. On Tue, Nov 9, 2010 at 3:28 PM, Joydeep Sen Sarma (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/HIVE-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1776: Attachment: HIVE-1776.1.patch the problem is that tasks are trying to modify the shared hive configuration object and trampling each other. fix is to clone the configuration object before modifying it in the Task. parallel execution and auto-local mode combine to place plan file in wrong file system -- Key: HIVE-1776 URL: https://issues.apache.org/jira/browse/HIVE-1776 Project: Hive Issue Type: Bug Reporter: Joydeep Sen Sarma Attachments: HIVE-1776.1.patch A query (that i can't reproduce verbatim) submits a job to a MR cluster with a plan file that is resident on the local file system. This job obviously fails. This seems to result from an interaction between the parallel execution (which is trying to run one local and one remote job at the same time). Turning off either the parallel execution mode or the auto-local mode seems to fix the problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1776) parallel execution and auto-local mode combine to place plan file in wrong file system
[ https://issues.apache.org/jira/browse/HIVE-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930359#action_12930359 ] Ted Yu commented on HIVE-1776: -- This line is removed from MapRedTask.java: ctx.setOriginalTracker(conf.getVar(HiveConf.ConfVars.HADOOPJT)); I assume this is intentional. parallel execution and auto-local mode combine to place plan file in wrong file system -- Key: HIVE-1776 URL: https://issues.apache.org/jira/browse/HIVE-1776 Project: Hive Issue Type: Bug Reporter: Joydeep Sen Sarma Assignee: Joydeep Sen Sarma Attachments: HIVE-1776.1.patch A query (that i can't reproduce verbatim) submits a job to a MR cluster with a plan file that is resident on the local file system. This job obviously fails. This seems to result from an interaction between the parallel execution (which is trying to run one local and one remote job at the same time). Turning off either the parallel execution mode or the auto-local mode seems to fix the problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1776) parallel execution and auto-local mode combine to place plan file in wrong file system
[ https://issues.apache.org/jira/browse/HIVE-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930368#action_12930368 ] Joydeep Sen Sarma commented on HIVE-1776: - yeah it was - but shoot - i forgot to take out the corresponding call in the finally block to restore the tracker. will upload new patch. these are no longer necessary because we are using a cloned configuration object that is discarded once the task completes. parallel execution and auto-local mode combine to place plan file in wrong file system -- Key: HIVE-1776 URL: https://issues.apache.org/jira/browse/HIVE-1776 Project: Hive Issue Type: Bug Reporter: Joydeep Sen Sarma Assignee: Joydeep Sen Sarma Attachments: HIVE-1776.1.patch A query (that i can't reproduce verbatim) submits a job to a MR cluster with a plan file that is resident on the local file system. This job obviously fails. This seems to result from an interaction between the parallel execution (which is trying to run one local and one remote job at the same time). Turning off either the parallel execution mode or the auto-local mode seems to fix the problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
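Editor's note: the fix described in this thread (clone the configuration before a task modifies it, so nothing has to be restored afterwards) can be illustrated with a small sketch. java.util.Properties stands in for the shared Hive configuration object here, and the key name "job.tracker" and both method names are hypothetical. The actual patch is not shown in the thread.

```java
import java.util.Properties;

// Hypothetical sketch of the "clone before modifying" fix; Properties stands in
// for the shared Hive configuration object.
final class CloneConfSketch {

    // Made-up key name, used only in this example.
    static final String TRACKER_KEY = "job.tracker";

    // Unsafe variant: the task overwrites a value on the shared object, so a task
    // running in parallel can observe the other task's setting.
    static void runLocalTaskUnsafe(Properties shared) {
        shared.setProperty(TRACKER_KEY, "local");
    }

    // Safe variant: the task works on a private copy that is discarded when the
    // task completes, so there is nothing to restore in a finally block.
    static Properties runLocalTaskSafe(Properties shared) {
        Properties taskConf = new Properties();
        taskConf.putAll(shared);
        taskConf.setProperty(TRACKER_KEY, "local");
        return taskConf;
    }

    public static void main(String[] args) {
        Properties shared = new Properties();
        shared.setProperty(TRACKER_KEY, "cluster:8021");

        Properties taskConf = runLocalTaskSafe(shared);
        System.out.println("task sees:   " + taskConf.getProperty(TRACKER_KEY));
        System.out.println("shared sees: " + shared.getProperty(TRACKER_KEY));
    }
}
```

With the copy in place, the save-and-restore pair that Ted Yu asked about becomes unnecessary, which is consistent with Joydeep's reply above.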
[jira] Created: (HIVE-1779) Implement GenericUDF str_to_map
Implement GenericUDF str_to_map --- Key: HIVE-1779 URL: https://issues.apache.org/jira/browse/HIVE-1779 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Siying Dong Priority: Minor People need a way to load their data into a map. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1779) Implement GenericUDF str_to_map
[ https://issues.apache.org/jira/browse/HIVE-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-1779: -- Attachment: HIVE-1779.1.patch Implement GenericUDF str_to_map --- Key: HIVE-1779 URL: https://issues.apache.org/jira/browse/HIVE-1779 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-1779.1.patch People need a way to load their data into a map. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1779) Implement GenericUDF str_to_map
[ https://issues.apache.org/jira/browse/HIVE-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-1779: -- Status: Patch Available (was: Open) Implement GenericUDF str_to_map --- Key: HIVE-1779 URL: https://issues.apache.org/jira/browse/HIVE-1779 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-1779.1.patch People need a way to load their data into a map. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
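Editor's note: the issue does not describe the function's arguments, so the following is only a guess at the kind of conversion a str_to_map function performs. It assumes the input is a delimited string such as "a:1,b:2", with one delimiter between pairs and another between key and value. The class name, the method signature, and the handling of a missing value are all assumptions and are not taken from the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of a string-to-map conversion; not the GenericUDF itself.
final class StrToMapSketch {

    // Split text into pairs on pairDelim, then split each pair into key and value
    // on the first occurrence of kvDelim. Delimiters are treated as literal text.
    static Map<String, String> strToMap(String text, String pairDelim, String kvDelim) {
        Map<String, String> result = new LinkedHashMap<>();
        if (text == null || text.isEmpty()) {
            return result;
        }
        for (String pair : text.split(Pattern.quote(pairDelim))) {
            String[] kv = pair.split(Pattern.quote(kvDelim), 2);
            // A pair without a key/value delimiter maps the key to null in this sketch.
            result.put(kv[0], kv.length == 2 ? kv[1] : null);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(strToMap("a:1,b:2", ",", ":"));
    }
}
```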
[jira] Commented: (HIVE-1754) Remove JDBM component from Map Join
[ https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930405#action_12930405 ]
He Yongqiang commented on HIVE-1754:

1. Code style: a new file always needs an Apache license header. For example:
{noformat}
public class PathUtil {
public static String suffix=".hashtable";
public static String generatePath(String baseURI,Byte tag,String bigBucketFileName){
String path = new String(baseURI+Path.SEPARATOR+"-"+tag+"-"+bigBucketFileName+suffix);
return path;
}
public static String generateFileName(Byte tag,String bigBucketFileName){
String fileName = new String("-"+tag+"-"+bigBucketFileName+suffix);
return fileName;
}
public static String generateTmpURI(String baseURI,String id){
String tmpFileURI = new String(baseURI+Path.SEPARATOR+"HashTable-"+id);
return tmpFileURI;
}
}
{noformat}
should be formatted to:
{noformat}
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.hive.ql.util;

import org.apache.hadoop.fs.Path;

public class PathUtil {
  public static String suffix = ".hashtable";

  public static String generatePath(String baseURI, Byte tag, String bigBucketFileName) {
    String path = new String(baseURI + Path.SEPARATOR + "-" + tag + "-" + bigBucketFileName + suffix);
    return path;
  }

  public static String generateFileName(Byte tag, String bigBucketFileName) {
    String fileName = new String("-" + tag + "-" + bigBucketFileName + suffix);
    return fileName;
  }

  public static String generateTmpURI(String baseURI, String id) {
    String tmpFileURI = new String(baseURI + Path.SEPARATOR + "HashTable-" + id);
    return tmpFileURI;
  }
}
{noformat}
2. Let's put PathUtil.java and TimeUtil.java into a HiveUtil class, like Utilities in exec (or create a new one and put it in the exec package or the common package).
3. In ExecDriver.java
-//Qualify the path against the filesystem. The user configured path might contain default port which is skipped
-//in the file status. This makes sure that all paths which goes into PathToPartitionInfo are always listed status
-//filepath.
-newPath = fs.makeQualified(newPath);
Why was this code removed? It should be there.
4. Revert the changes in ExecMapper. Keep it clean.
5. Code style in HashTableDummyOperator: add a default serialize id. Do not use 2 blank lines inside a method. Keep at least one blank line between 2 method definitions.
6. Remove some never-read vars from HashTableSinkOperator.
{noformat}
protected transient Map<Byte, List<ObjectInspector>> rowContainerStandardObjectInspectors;
{noformat}
should be in one line.
generateMapMetaData(); can be put into init().
MapJoinRowContainer res = null; should be parameterized.
int bucketSize = HiveConf.getIntVar(hconf, HiveConf.ConfVars.HIVEMAPJOINBUCKETCACHESIZE); should be put into init(). bucketSize can be a class field.
res.add(value); is duplicated in if () {} else {}. Put it after the if else.
In close(), if abort is true, do we need to do the dump?
{noformat}
String bigBucketFileName = this.getExecContext().getCurrentBigBucketFile();
if(bigBucketFileName == null ||bigBucketFileName.length()==0) {
bigBucketFileName="-";
}
{noformat}
I guess if we run it locally, the bigBucketFileName is always null. Is that true? If yes, how does this patch handle the bucket map join?
7. Revert the changes of MapRedTask.
8. AbstractRowContainer/MapJoinDoubleKeys/MapJoinRowContainer/MapJoinSingleKey are missing the Apache header. Please make sure to clean up the code.

Remove JDBM component from Map Join --- Key: HIVE-1754 URL: https://issues.apache.org/jira/browse/HIVE-1754 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0, 0.7.0 Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.7.0 Attachments: Hive-1754.patch, Hive-1754_2.patch, Hive-1754_3.patch, hive-1754_4.patch, hive-1754_5.patch Right now, JDBM is the major performance
[jira] Created: (HIVE-1780) Typo in hive-default.xml
Typo in hive-default.xml Key: HIVE-1780 URL: https://issues.apache.org/jira/browse/HIVE-1780 Project: Hive Issue Type: Bug Components: Configuration Reporter: YoungWoo Kim Priority: Trivial Fix For: 0.7.0 'CombineHiveInputFormat' is spelt incorrectly in the hive-default.xml: It should be 'CombineHiveInputFormat' instead of 'CombinedHiveInputFormat'. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1780) Typo in hive-default.xml
[ https://issues.apache.org/jira/browse/HIVE-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YoungWoo Kim updated HIVE-1780: --- Attachment: HIVE-1780.patch A patch for fixing typos Typo in hive-default.xml Key: HIVE-1780 URL: https://issues.apache.org/jira/browse/HIVE-1780 Project: Hive Issue Type: Bug Components: Configuration Reporter: YoungWoo Kim Priority: Trivial Fix For: 0.7.0 Attachments: HIVE-1780.patch 'CombineHiveInputFormat' is spelt incorrectly in the hive-default.xml: It should be 'CombineHiveInputFormat' instead of 'CombinedHiveInputFormat'. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Confusing in hive-default.xml
Hi, I filed a jira for this and attached a patch. https://issues.apache.org/jira/browse/HIVE-1780 - Youngwoo
2010/11/10 Namit Jain nj...@facebook.com It is a typo. Can you file a jira and fix it? Thanks, -namit
-Original Message- From: 김영우 [mailto:warwit...@gmail.com] Sent: Tuesday, November 09, 2010 3:12 AM To: dev@hive.apache.org Subject: Confusing in hive-default.xml
Hi, Maybe it is a typo, but I'm not sure. Excerpt from hive-default.xml (trunk):
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
  <description>The default input format, if it is not specified, the system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can always overwrite it - if there is a bug in CombinedHiveInputFormat, it can always be manually set to HiveInputFormat. </description>
</property>
The 'CombinedHiveInputFormat' does not exist. It should be 'CombineHiveInputFormat', so the property's value is 'org.apache.hadoop.hive.ql.io.CombineHiveInputFormat'. I don't know whether it's intentional or a typo. Thanks. - Youngwoo
[jira] Updated: (HIVE-1696) Add delegation token support to metastore
[ https://issues.apache.org/jira/browse/HIVE-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1696: - Component/s: Server Infrastructure Security Add delegation token support to metastore - Key: HIVE-1696 URL: https://issues.apache.org/jira/browse/HIVE-1696 Project: Hive Issue Type: Sub-task Components: Metastore, Security, Server Infrastructure Reporter: Todd Lipcon As discussed in HIVE-842, kerberos authentication is only sufficient for authentication of a hive user client to the metastore. There are other cases where thrift calls need to be authenticated when the caller is running in an environment without kerberos credentials. For example, an MR task running as part of a hive job may want to report statistics to the metastore, or a job may be running within the context of Oozie or Hive Server. This JIRA is to implement support of delegation tokens for the metastore. The concept of a delegation token is borrowed from the Hadoop security design - the quick summary is that a kerberos-authenticated client may retrieve a binary token from the server. This token can then be passed to other clients which can use it to achieve authentication as the original user in lieu of a kerberos ticket. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
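Editor's note: the delegation-token concept summarized above (an authenticated client obtains a binary token, and whoever later presents that token is treated as the original user) can be illustrated with a generic HMAC-based sketch. This is an assumption-heavy toy, not the Hadoop or Hive token format: the class name, the choice of HMAC-SHA256 over the user name, and the absence of expiry, renewal, and cancellation are all simplifications made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of a server that issues and verifies opaque tokens.
final class DelegationTokenSketch {
    private final byte[] serverSecret;

    DelegationTokenSketch(byte[] serverSecret) {
        this.serverSecret = serverSecret.clone();
    }

    // Called only after the caller has authenticated by other means (for example
    // Kerberos). The returned bytes can be handed to a process without credentials.
    byte[] issue(String user) {
        return sign(user);
    }

    // A later caller claiming to act for 'user' proves it by presenting the token.
    boolean verify(String user, byte[] token) {
        return MessageDigest.isEqual(sign(user), token);
    }

    // Only the server knows the secret, so only the server can mint a valid token.
    private byte[] sign(String user) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(serverSecret, "HmacSHA256"));
            return mac.doFinal(user.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 should always be available", e);
        }
    }

    public static void main(String[] args) {
        DelegationTokenSketch server =
            new DelegationTokenSketch("demo-secret".getBytes(StandardCharsets.UTF_8));
        byte[] token = server.issue("alice");
        System.out.println("alice accepted: " + server.verify("alice", token));
        System.out.println("bob accepted:   " + server.verify("bob", token));
    }
}
```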
[jira] Commented: (HIVE-1712) Migrating metadata from derby to mysql thrown NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930443#action_12930443 ]
Paul Yang commented on HIVE-1712:
+1 looks good, will test/commit

Migrating metadata from derby to mysql thrown NullPointerException
Key: HIVE-1712
URL: https://issues.apache.org/jira/browse/HIVE-1712
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.5.0, 0.6.0
Reporter: Jake Farrell
Fix For: 0.7.0
Attachments: hive-1712.patch, hive-1712_rebase.patch

Exported derby data to csv, loaded data into mysql and ran hive query which worked in derby and got the following exception
2010-10-16 08:57:29,080 INFO metastore.ObjectStore (ObjectStore.java:setConf(106)) - Initialized ObjectStore
2010-10-16 08:57:29,552 INFO metastore.HiveMetaStore (HiveMetaStore.java:logStartFunction(171)) - 0: get_table : db=default tbl=testimport
2010-10-16 08:57:30,140 ERROR metadata.Hive (Hive.java:getTable(395)) - java.lang.NullPointerException
at java.util.Hashtable.put(Hashtable.java:394)
at java.util.Hashtable.putAll(Hashtable.java:466)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:520)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:489)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:381)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:333)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:683)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5200)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1743) Group-by to determine equals of Keys in reverse order
[ https://issues.apache.org/jira/browse/HIVE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1743:

Status: Patch Available (was: Open)

Group-by to determine equals of Keys in reverse order

Key: HIVE-1743
URL: https://issues.apache.org/jira/browse/HIVE-1743
Project: Hive
Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
Attachments: HIVE-1743.1.patch

When processing a group-by on the reduce side, keys arrive in sorted order. Comparing two keys for equality can be more efficient in reverse order.
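A note for readers following HIVE-1743: the sketch below is only an illustration of the idea in the description, not the code in HIVE-1743.1.patch. It assumes keys are tuples of columns and that the reducer sees them in sorted order, so neighbouring keys that differ tend to share their leading columns and differ in a trailing one. The function names `keys_equal_forward` and `keys_equal_reverse` are hypothetical; each returns the result together with the number of column comparisons it needed.

```python
from typing import Sequence, Tuple


def keys_equal_forward(a: Sequence, b: Sequence) -> Tuple[bool, int]:
    """Compare two keys column by column, first column to last.

    Returns (equal, number_of_column_comparisons).
    """
    if len(a) != len(b):
        return False, 0
    comparisons = 0
    for x, y in zip(a, b):
        comparisons += 1
        if x != y:
            return False, comparisons
    return True, comparisons


def keys_equal_reverse(a: Sequence, b: Sequence) -> Tuple[bool, int]:
    """Compare two keys column by column, last column to first.

    Returns (equal, number_of_column_comparisons).
    """
    if len(a) != len(b):
        return False, 0
    comparisons = 0
    for i in range(len(a) - 1, -1, -1):
        comparisons += 1
        if a[i] != b[i]:
            return False, comparisons
    return True, comparisons


def total_comparisons(sorted_keys, equals) -> int:
    """Sum the comparison counts over each pair of adjacent keys."""
    return sum(equals(p, q)[1] for p, q in zip(sorted_keys, sorted_keys[1:]))


if __name__ == "__main__":
    # Made-up keys, already sorted as a reducer would see them.
    keys = [("us", "ca", 1), ("us", "ca", 2), ("us", "ny", 1), ("us", "ny", 1)]
    print("forward:", total_comparisons(keys, keys_equal_forward))
    print("reverse:", total_comparisons(keys, keys_equal_reverse))
```

Both directions do the same work when two keys really are equal; the saving comes only from detecting a mismatch earlier, so the benefit depends on how the data is distributed.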
[jira] Updated: (HIVE-1746) Support for using ALTER to set IDXPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marquis Wang updated HIVE-1746:

Attachment: HIVE-1746.3.patch

Support for using ALTER to set IDXPROPERTIES

Key: HIVE-1746
URL: https://issues.apache.org/jira/browse/HIVE-1746
Project: Hive
Issue Type: Improvement
Components: Indexing
Affects Versions: 0.7.0
Reporter: Marquis Wang
Assignee: Marquis Wang
Fix For: 0.7.0
Attachments: 1746.prelim.patch, HIVE-1746.2.patch, HIVE-1746.3.patch

Hive-1498 has support for IDXPROPERTIES on index creation, so now we want to support ALTERing those properties.
[jira] Commented: (HIVE-1746) Support for using ALTER to set IDXPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930483#action_12930483 ]

Marquis Wang commented on HIVE-1746:

New patch. Eliminates println calls, adds a private updateModifiedParameters method, and passes the database name into AlterIndexDesc. Otherwise the same.

Support for using ALTER to set IDXPROPERTIES

Key: HIVE-1746
URL: https://issues.apache.org/jira/browse/HIVE-1746
Project: Hive
Issue Type: Improvement
Components: Indexing
Affects Versions: 0.7.0
Reporter: Marquis Wang
Assignee: Marquis Wang
Fix For: 0.7.0
Attachments: 1746.prelim.patch, HIVE-1746.2.patch, HIVE-1746.3.patch

Hive-1498 has support for IDXPROPERTIES on index creation, so now we want to support ALTERing those properties.
[jira] Assigned: (HIVE-1496) enhance CREATE INDEX to support immediate index build
[ https://issues.apache.org/jira/browse/HIVE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marquis Wang reassigned HIVE-1496:

Assignee: Marquis Wang (was: Russell Melick)

enhance CREATE INDEX to support immediate index build

Key: HIVE-1496
URL: https://issues.apache.org/jira/browse/HIVE-1496
Project: Hive
Issue Type: Improvement
Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Marquis Wang
Fix For: 0.7.0

Currently we only support WITH DEFERRED REBUILD.
[jira] Updated: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks
[ https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Skye Berghel updated HIVE-1501:

Status: Patch Available (was: Open)

when generating reentrant INSERT for index rebuild, quote identifiers using backticks

Key: HIVE-1501
URL: https://issues.apache.org/jira/browse/HIVE-1501
Project: Hive
Issue Type: Bug
Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Skye Berghel
Fix For: 0.7.0
Attachments: 1501.patch, 1501_new_tests.patch, 1501_with_tests.patch, HIVE-1501.4.patch, HIVE-1501.5.patch, HIVE-1501.6.patch

Yongqiang, you mentioned that you weren't able to do this due to SORT BY not accepting them. The SORT BY is gone now as of HIVE-1494 (and SORT BY needs to be fixed anyway).
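For context on what HIVE-1501 asks for, here is a minimal sketch of quoting every identifier with backticks when generating an INSERT statement. It is not taken from any of the attached patches: the helper names `quote_identifier` and `build_rebuild_insert` are hypothetical, and the statement shape is a simplified stand-in for whatever the index rebuild actually generates. Because no escaping rule for a backtick inside a name is given in this thread, the sketch rejects such names rather than guessing one.

```python
from typing import Iterable


def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks so it is parsed as a name, not a keyword."""
    if not name or "`" in name:
        # Assumption: refuse empty names and embedded backticks instead of
        # inventing an escaping rule.
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def build_rebuild_insert(index_table: str, base_table: str,
                         columns: Iterable[str]) -> str:
    """Build a simplified, hypothetical INSERT with every identifier quoted."""
    cols = ", ".join(quote_identifier(c) for c in columns)
    return (
        f"INSERT OVERWRITE TABLE {quote_identifier(index_table)} "
        f"SELECT {cols} FROM {quote_identifier(base_table)}"
    )


if __name__ == "__main__":
    # Made-up table and column names; "date" stands in for a column whose
    # name could otherwise clash with a keyword.
    print(build_rebuild_insert("src_idx", "src", ["key", "date"]))
```

Quoting unconditionally keeps the generator simple: it does not need to know which names collide with keywords.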
[jira] Updated: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks
[ https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Skye Berghel updated HIVE-1501:

Attachment: HIVE-1501.6.patch

Added a new patch with formatting fixes.

when generating reentrant INSERT for index rebuild, quote identifiers using backticks

Key: HIVE-1501
URL: https://issues.apache.org/jira/browse/HIVE-1501
Project: Hive
Issue Type: Bug
Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Skye Berghel
Fix For: 0.7.0
Attachments: 1501.patch, 1501_new_tests.patch, 1501_with_tests.patch, HIVE-1501.4.patch, HIVE-1501.5.patch, HIVE-1501.6.patch

Yongqiang, you mentioned that you weren't able to do this due to SORT BY not accepting them. The SORT BY is gone now as of HIVE-1494 (and SORT BY needs to be fixed anyway).