[jira] [Updated] (HIVE-3218) When big table has two or more partitions on SMBJoin it fails at runtime
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3218:
------------------------
    Status: Open  (was: Patch Available)

This happens with bucket mapjoin, too. It would be better to fix it along with SMB join.

When big table has two or more partitions on SMBJoin it fails at runtime
------------------------------------------------------------------------

                Key: HIVE-3218
                URL: https://issues.apache.org/jira/browse/HIVE-3218
            Project: Hive
         Issue Type: Bug
         Components: Query Processor
   Affects Versions: 0.10.0
           Reporter: Navis
           Assignee: Navis
           Priority: Minor
        Attachments: HIVE-3218.1.patch.txt

{noformat}
drop table hive_test_smb_bucket1;
drop table hive_test_smb_bucket2;

create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;

set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;

insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;

set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
{noformat}

which makes the following bucket join context:
{noformat}
Alias Bucket Output File Name Mapping:
hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/00_0  0
hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/01_0  1
hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/00_0  0
hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/01_0  1
{noformat}

and fails with the exception

{noformat}
java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.01_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/01_0
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
	... 8 more
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
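The failure can be illustrated apart from Hive: the "Alias Bucket Output File Name Mapping" above keys on the bare bucket file name (00_0, 01_0), so the two partitions of the big table claim the same name and the second temp-file rename collides. A minimal standalone sketch (hypothetical paths and data structure, not Hive's actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a mapping keyed only by the bare bucket file name collides as soon
// as the big table contributes two partitions, since each partition has its
// own 00_0/01_0 bucket files.
public class BucketNameCollision {
    static String bareName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String[] bucketFiles = {
            "/warehouse/hive_test_smb_bucket1/ds=2010-10-14/00_0",
            "/warehouse/hive_test_smb_bucket1/ds=2010-10-14/01_0",
            "/warehouse/hive_test_smb_bucket1/ds=2010-10-15/00_0",
            "/warehouse/hive_test_smb_bucket1/ds=2010-10-15/01_0",
        };
        Map<String, String> outputByBareName = new HashMap<>();
        for (String path : bucketFiles) {
            String previous = outputByBareName.put(bareName(path), path);
            if (previous != null) {
                // Both partitions claim the same output name -> the second
                // commit/rename fails at runtime, as in the stack trace above.
                System.out.println("collision on " + bareName(path)
                        + ": " + previous + " vs " + path);
            }
        }
        // Keying on the full path (partition directory included) would keep
        // all four entries distinct.
        System.out.println("distinct bare names: " + outputByBareName.size());
    }
}
```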
[jira] [Updated] (HIVE-3221) HiveConf.getPositionFromInternalName does not support more than single digit column numbers
[ https://issues.apache.org/jira/browse/HIVE-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-3221:
-----------------------------------
    Status: Patch Available  (was: Open)

HiveConf.getPositionFromInternalName does not support more than single digit column numbers
-------------------------------------------------------------------------------------------

                Key: HIVE-3221
                URL: https://issues.apache.org/jira/browse/HIVE-3221
            Project: Hive
         Issue Type: Bug
           Reporter: Sushanth Sowmyan
           Assignee: Sushanth Sowmyan
        Attachments: HIVE-3221.patch

For positions above 9, HiveConf.getPositionFromInternalName only looks at the last digit and thus causes collisions.
[jira] [Updated] (HIVE-3221) HiveConf.getPositionFromInternalName does not support more than single digit column numbers
[ https://issues.apache.org/jira/browse/HIVE-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-3221:
-----------------------------------
    Attachment: HIVE-3221.patch

HiveConf.getPositionFromInternalName does not support more than single digit column numbers
-------------------------------------------------------------------------------------------

                Key: HIVE-3221
                URL: https://issues.apache.org/jira/browse/HIVE-3221
            Project: Hive
         Issue Type: Bug
           Reporter: Sushanth Sowmyan
           Assignee: Sushanth Sowmyan
        Attachments: HIVE-3221.patch

For positions above 9, HiveConf.getPositionFromInternalName only looks at the last digit and thus causes collisions.
[jira] [Commented] (HIVE-3221) HiveConf.getPositionFromInternalName does not support more than single digit column numbers
[ https://issues.apache.org/jira/browse/HIVE-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404728#comment-13404728 ]

Sushanth Sowmyan commented on HIVE-3221:
----------------------------------------

Now getting a new arc error on starting anew:
--
tundra:hive sush$ arc diff --jira HIVE-3221
PHP Fatal error:  Call to undefined method ArcanistGitAPI::amendGitHeadCommit() in /Users/sush/dev/hive.git/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php on line 169
Fatal error: Call to undefined method ArcanistGitAPI::amendGitHeadCommit() in /Users/sush/dev/hive.git/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php on line 169
--
Still, the attached patch has a unit test and is fairly straightforward. Hopefully it can be reviewed easily.

HiveConf.getPositionFromInternalName does not support more than single digit column numbers
-------------------------------------------------------------------------------------------

                Key: HIVE-3221
                URL: https://issues.apache.org/jira/browse/HIVE-3221
            Project: Hive
         Issue Type: Bug
           Reporter: Sushanth Sowmyan
           Assignee: Sushanth Sowmyan
        Attachments: HIVE-3221.patch

For positions above 9, HiveConf.getPositionFromInternalName only looks at the last digit and thus causes collisions.
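The failure mode, and a multi-digit-safe parse, can be sketched as follows. This assumes internal column names of the form `_col<N>` (e.g. `_col0`, `_col12`); it is an illustration of the bug described above, not the attached patch:

```java
// Sketch of the bug and a multi-digit-safe fix. Assumes internal column names
// of the form "_col<N>"; illustrative only, not the actual HIVE-3221.patch.
public class InternalNamePosition {
    // Buggy behavior: only the last character is considered, so "_col12"
    // yields 2 and collides with "_col2".
    static int lastDigitOnly(String internalName) {
        return Character.getNumericValue(internalName.charAt(internalName.length() - 1));
    }

    // Multi-digit-safe: parse everything after the "_col" prefix.
    static int positionFromInternalName(String internalName) {
        return Integer.parseInt(internalName.substring("_col".length()));
    }

    public static void main(String[] args) {
        System.out.println(lastDigitOnly("_col12"));            // 2 -- collides with _col2
        System.out.println(positionFromInternalName("_col12")); // 12
    }
}
```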
[jira] [Commented] (HIVE-942) use bucketing for group by
[ https://issues.apache.org/jira/browse/HIVE-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404730#comment-13404730 ]

Lianhui Wang commented on HIVE-942:
-----------------------------------

I think in HIVE-931 the group-by keys must be the same as the sort keys. But in the case where the group-by keys contain the sort keys, it may be possible to complete the aggregation using the hash table on the mapper. For example: t is a bucketed table, sorted by c1, c2. SQL: select t.c1, t.c2, t.c3, sum(t.c4) from t group by t.c1, t.c2, t.c3. I think generally only the hash table on the mapper is needed, so nothing has to be done on the reducer.

use bucketing for group by
--------------------------

                Key: HIVE-942
                URL: https://issues.apache.org/jira/browse/HIVE-942
            Project: Hive
         Issue Type: New Feature
         Components: Query Processor
           Reporter: Namit Jain

Group by on a bucketed column can be completely performed on the mapper if the split can be adjusted to span the key boundary.
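The map-side idea in the comment above can be sketched as streaming aggregation: because rows in a sorted bucket arrive ordered by the group key, a group is complete as soon as the key changes, so no reducer-side merge is needed. A standalone sketch with made-up rows (not Hive's operator code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: sort-based (streaming) aggregation on the mapper. A group can be
// emitted the moment the group key changes, since sorted input guarantees
// that key will never appear again. Row data and types are made up.
public class MapSideGroupBy {
    static List<String> sumByKey(String[][] sortedRows) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        long sum = 0;
        for (String[] row : sortedRows) {            // row = {groupKey, value}
            if (currentKey != null && !currentKey.equals(row[0])) {
                out.add(currentKey + "=" + sum);     // key changed: group is complete
                sum = 0;
            }
            currentKey = row[0];
            sum += Long.parseLong(row[1]);
        }
        if (currentKey != null) out.add(currentKey + "=" + sum);
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = {{"a", "1"}, {"a", "2"}, {"b", "5"}};
        System.out.println(sumByKey(rows));          // [a=3, b=5]
    }
}
```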
Connect Squirrel SQL Client to access Hive tables
I am new to Hive, MapReduce and Hadoop, so I am a newbie at this. I am using PuTTY to connect to a Hive table and access records in the tables. What I did is: I opened PuTTY, typed `ares-ingest.vip.name.com` in the host name field, and clicked `Open`. Then I entered my username and password, and then a few commands to get to the Hive shell. Below is what I did:

$ bash
bash-3.00$ hive
Hive history file=/tmp/rjamal/hive_job_log_rjamal_201207010451_1212680168.txt
hive> set mapred.job.queue.name=hdmi-technology;
hive> select * from table LIMIT 1;

So my question is: **Is there any other way I can do the same thing in a SQL client like SQL Developer or Squirrel SQL Client instead of doing it from the command prompt? And if there is, what is the step-by-step process, considering that in my example I am logging in to `ares-ingest.vip.name.com` from PuTTY?** Similarly, if I need to do it through a JDBC program on my Windows machine, how can I do that? That is, using a JDBC program, how can I access Hive tables and get the results back? I know how to do this with Oracle tables. The only confusion I have is that I am using the hostname `ares-ingest.vip.name.com` to log in via PuTTY. I hope the question is clear. Any suggestion will be appreciated. **In short: can I do the same thing in any SQL client instead of logging in from PuTTY?** -Raihan Jamal
[jira] [Updated] (HIVE-3172) Remove the duplicate JAR entries from the (“test.classpath”) to avoid command line exceeding char limit on windows
[ https://issues.apache.org/jira/browse/HIVE-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-3172:
-----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Kanna!

Remove the duplicate JAR entries from the (“test.classpath”) to avoid command line exceeding char limit on windows
------------------------------------------------------------------------------------------------------------------

                Key: HIVE-3172
                URL: https://issues.apache.org/jira/browse/HIVE-3172
            Project: Hive
         Issue Type: Sub-task
         Components: Tests, Windows
   Affects Versions: 0.10.0
        Environment: Windows
           Reporter: Kanna Karanam
           Assignee: Kanna Karanam
             Labels: Windows
            Fix For: 0.10.0
        Attachments: HIVE-3172.1.patch.txt, HIVE-3172.2.patch.txt, HIVE-3172.3.patch.txt

The maximum length of the DOS command string is 8191 characters (in recent Windows versions; see http://support.microsoft.com/kb/830473). The following entries in build-common.xml add a lot of duplicate JAR entries to “test.classpath”, and it exceeds the max character limit on Windows very easily.

<!-- Include build/dist/lib on the classpath before Ivy and exclude hive jars from Ivy to make sure we get the local changes when we test Hive -->
<fileset dir="${build.dir.hive}/dist/lib" includes="*.jar" erroronmissingdir="false" excludes="**/hive_contrib*.jar,**/hive-contrib*.jar,**/lib*.jar"/>
<fileset dir="${hive.root}/build/ivy/lib/test" includes="*.jar" erroronmissingdir="false" excludes="**/hive_*.jar,**/hive-*.jar"/>
<fileset dir="${hive.root}/build/ivy/lib/default" includes="*.jar" erroronmissingdir="false" excludes="**/hive_*.jar,**/hive-*.jar"/>
<fileset dir="${hive.root}/testlibs" includes="*.jar"/>

Proposed solution (workaround):
1) Include all JARs from dist\lib, excluding **/hive_contrib*.jar, **/hive-contrib*.jar, **/lib*.jar
2) Select the specific (missing) jars from test/other folders (that includes the Hadoop-*.jar files)

Thanks
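The effect of the workaround can be sketched as deduplicating classpath entries while preserving first-seen order; the jar paths below are hypothetical, and this is not the actual build-common.xml change:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Sketch: collapsing duplicate jar entries keeps the classpath string under
// the ~8191-character Windows command-line limit. Paths are made up.
public class ClasspathDedup {
    static String dedup(String classpath, String separator) {
        Set<String> unique = new LinkedHashSet<>();  // preserves first-seen order
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) unique.add(entry);
        }
        return String.join(separator, unique);
    }

    public static void main(String[] args) {
        String cp = "dist/lib/a.jar;dist/lib/b.jar;dist/lib/a.jar;testlibs/junit.jar";
        System.out.println(dedup(cp, ";"));  // duplicate a.jar appears once
    }
}
```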
[jira] [Assigned] (HIVE-3145) Lock support for Metastore calls
[ https://issues.apache.org/jira/browse/HIVE-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Chalfant reassigned HIVE-3145:
-------------------------------------
    Assignee: Andrew Chalfant

Lock support for Metastore calls
--------------------------------

                Key: HIVE-3145
                URL: https://issues.apache.org/jira/browse/HIVE-3145
            Project: Hive
         Issue Type: Improvement
         Components: Locking, Metastore
   Affects Versions: 0.10.0
           Reporter: David Goode
           Assignee: Andrew Chalfant
           Priority: Minor
        Attachments: HIVE3145_lock.diff
  Original Estimate: 168h
 Remaining Estimate: 168h

Added locking to the metastore calls. Currently failing some unit tests, I think due to improper configuration; this needs to be resolved and new unit tests added. Some code cleanup may also be wanted.
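The attached HIVE3145_lock.diff is not shown here; the general pattern ("take a lock around each metastore call") might be sketched like this, with a counter standing in for real metastore state:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hedged sketch of lock-wrapped calls, not the actual HIVE-3145 change:
// serialize access to shared state by bracketing each call with
// lock()/unlock(), releasing in finally so exceptions cannot leak the lock.
public class LockedMetastoreClient {
    private final ReentrantLock lock = new ReentrantLock();
    private int tableCount = 0;   // stands in for real metastore state

    public void createTable(String name) {
        lock.lock();
        try {
            tableCount++;         // mutate shared state only while holding the lock
        } finally {
            lock.unlock();        // always release, even if the call throws
        }
    }

    public int getTableCount() {
        lock.lock();
        try {
            return tableCount;
        } finally {
            lock.unlock();
        }
    }
}
```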
Re: Review Request: Allow to download resources from any external File Systems to local machine.
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5687/#review8776
---

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
https://reviews.apache.org/r/5687/#comment18552

    Instead of regex, it might be better to use URI to parse the string.

    String scheme = new Path(value).toURI().getScheme();
    return (scheme != null) && !scheme.equalsIgnoreCase("file");

- Ashutosh Chauhan

On June 30, 2012, 6:15 p.m., Kanna Karanam wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5687/
---

(Updated June 30, 2012, 6:15 p.m.)

Review request for hive, Carl Steinbach, Edward Capriolo, and Ashutosh Chauhan.

Description
---

Instead of restricting resource download to s3, s3n, and hdfs, make it open for any external file system. This addresses bug HIVE-3146. https://issues.apache.org/jira/browse/HIVE-3146

Diffs
---

  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1355510
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 1355510

Diff: https://reviews.apache.org/r/5687/diff/

Testing
---

Yes. All unit tests passed.

Thanks,
Kanna Karanam
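The reviewer's suggestion can be sketched with java.net.URI alone (the review proposes Hadoop's Path#toURI; this JDK-only stand-in is an approximation for well-formed paths): a resource is external when it has a scheme and that scheme is not "file".

```java
import java.net.URI;

// JDK-only approximation of the review suggestion. Method name isExternal is
// hypothetical; the real SessionState code is not reproduced here.
public class ResourceScheme {
    static boolean isExternal(String value) {
        String scheme = URI.create(value).getScheme();
        return (scheme != null) && !scheme.equalsIgnoreCase("file");
    }

    public static void main(String[] args) {
        System.out.println(isExternal("hdfs://nn:9000/tmp/udf.jar"));  // true
        System.out.println(isExternal("s3n://bucket/udf.jar"));        // true
        System.out.println(isExternal("file:///tmp/udf.jar"));         // false
        System.out.println(isExternal("/tmp/udf.jar"));                // false: no scheme
    }
}
```

This avoids maintaining a whitelist regex of schemes (s3, s3n, hdfs, ...): any new file system with a URI scheme is accepted automatically.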
[jira] [Updated] (HIVE-3146) Support external hive tables whose data are stored in Azure blob store/Azure Storage Volumes (ASV)
[ https://issues.apache.org/jira/browse/HIVE-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-3146:
-----------------------------------
    Status: Open  (was: Patch Available)

Support external hive tables whose data are stored in Azure blob store/Azure Storage Volumes (ASV)
--------------------------------------------------------------------------------------------------

                Key: HIVE-3146
                URL: https://issues.apache.org/jira/browse/HIVE-3146
            Project: Hive
         Issue Type: Sub-task
         Components: Windows
   Affects Versions: 0.10.0
           Reporter: Kanna Karanam
           Assignee: Kanna Karanam
             Labels: Windows
            Fix For: 0.10.0
        Attachments: HIVE-3146.1.patch.txt, HIVE-3146.2.patch.txt

Support external hive tables whose data are stored in Azure blob store/Azure Storage Volumes (ASV)
Re: Connect Squirrel SQL Client to access Hive tables
Check HIVE-3100, which is to include SQLLine as a command-line tool for Hive.

thanks
Prasad

On Sun, Jul 1, 2012 at 11:58 AM, Raihan Jamal jamalrai...@gmail.com wrote:
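For the JDBC part of the question, a hedged sketch using the HiveServer1-era driver. This assumes a Hive server is listening on port 10000 of the host and that the Hive JDBC driver jars are on the classpath; `table_name` is a placeholder for a real table:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: query Hive over JDBC from any machine instead of running the CLI
// over PuTTY. Host name is from the question above; the server port and empty
// credentials are assumptions about a default HiveServer setup.
public class HiveJdbcExample {
    static String hiveJdbcUrl(String host) {
        return "jdbc:hive://" + host + ":10000/default";   // HiveServer1-era URL form
    }

    public static void main(String[] args) {
        String url = hiveJdbcUrl("ares-ingest.vip.name.com");
        try {
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            try (Connection con = DriverManager.getConnection(url, "", "");
                 Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery("select * from table_name LIMIT 1")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        } catch (Exception e) {
            // Without the driver jars or a running server, report instead of crashing.
            System.out.println("could not connect: " + e);
        }
    }
}
```

Squirrel SQL Client should be able to use the same driver class and connection URL once the Hive JDBC jars are registered as a driver in its UI.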