[jira] [Updated] (HIVE-3218) When big table has two or more partitions on SMBJoin it fails at runtime

2012-07-01 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-3218:


Status: Open  (was: Patch Available)

This happens with bucket map join, too. It would be better to fix it along with 
SMB join.

 When big table has two or more partitions on SMBJoin it fails at runtime
 

 Key: HIVE-3218
 URL: https://issues.apache.org/jira/browse/HIVE-3218
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3218.1.patch.txt


 {noformat}
 drop table hive_test_smb_bucket1;
 drop table hive_test_smb_bucket2;
 create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds 
 string) clustered by (key) sorted by (key) into 2 buckets;
 create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds 
 string) clustered by (key) sorted by (key) into 2 buckets;
 set hive.enforce.bucketing = true;
 set hive.enforce.sorting = true;
 insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') 
 select key, value from src;
 insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') 
 select key, value from src;
 insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') 
 select key, value from src;
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = 
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 SELECT /*+ MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN 
 hive_test_smb_bucket2 b ON a.key = b.key;
 {noformat}
 which makes the following bucket join context:
 {noformat}
 Alias Bucket Output File Name Mapping:
 
 hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/00_0
  0
 
 hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/01_0
  1
 
 hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/00_0
  0
 
 hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/01_0
  1
 {noformat}
 and fails with this exception:
 {noformat}
 java.lang.RuntimeException: Hive Runtime Error while closing operators
   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
 output from: 
 hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.01_0
  to: 
 hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/01_0
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
   ... 8 more
 {noformat}
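 A minimal, self-contained illustration of the failure mode above (hypothetical helper names; the real logic lives in the SMB join path and FileSinkOperator): when the output file name is derived from the bucket number alone, every partition of the big table maps its bucket b to the same name, so the second partition's commit rename collides with the first.

 ```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class BucketNameCollision {
    // Sketch: file names here are illustrative. Because the name depends only
    // on the bucket number and ignores the partition, two big-table
    // partitions produce the same target names.
    static List<String> outputNames(List<String> partitions, int buckets) {
        List<String> names = new ArrayList<>();
        for (String partition : partitions) {
            for (int b = 0; b < buckets; b++) {
                names.add(b + "_0"); // partition is ignored in the name
            }
        }
        return names;
    }

    static boolean hasCollision(List<String> names) {
        return new HashSet<>(names).size() < names.size();
    }
}
 ```

 With partitions ds=2010-10-14 and ds=2010-10-15 and two buckets, all four output names collapse to two distinct values, matching the duplicate targets in the mapping above.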

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-3221) HiveConf.getPositionFromInternalName does not support more than single digit column numbers

2012-07-01 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-3221:
---

Status: Patch Available  (was: Open)

 HiveConf.getPositionFromInternalName does not support more than single digit 
 column numbers
 --

 Key: HIVE-3221
 URL: https://issues.apache.org/jira/browse/HIVE-3221
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-3221.patch


 For positions above 9, HiveConf.getPositionFromInternalName only looks at the 
 last digit and thus causes collisions.
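 A sketch of the bug and the shape of a fix (the "_colN" internal-name format and both method bodies are assumptions for illustration; the real method is in HiveConf and is not shown in this thread):

 ```java
public class InternalNameDemo {
    // Buggy behavior as described: only the last character is read, so
    // "_col12" yields 2 and collides with "_col2".
    static int buggyPosition(String name) {
        return Character.getNumericValue(name.charAt(name.length() - 1));
    }

    // Fix idea: parse the entire numeric suffix, so "_col12" yields 12.
    static int fixedPosition(String name) {
        int i = name.length();
        while (i > 0 && Character.isDigit(name.charAt(i - 1))) {
            i--;
        }
        return Integer.parseInt(name.substring(i));
    }
}
 ```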

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-3221) HiveConf.getPositionFromInternalName does not support more than single digit column numbers

2012-07-01 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-3221:
---

Attachment: HIVE-3221.patch

 HiveConf.getPositionFromInternalName does not support more than single digit 
 column numbers
 --

 Key: HIVE-3221
 URL: https://issues.apache.org/jira/browse/HIVE-3221
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-3221.patch


 For positions above 9, HiveConf.getPositionFromInternalName only looks at the 
 last digit and thus causes collisions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-3221) HiveConf.getPositionFromInternalName does not support more than single digit column numbers

2012-07-01 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404728#comment-13404728
 ] 

Sushanth Sowmyan commented on HIVE-3221:


Now getting a new arc error on starting anew:

--
tundra:hive sush$ arc diff --jira HIVE-3221 
PHP Fatal error:  Call to undefined method ArcanistGitAPI::amendGitHeadCommit() 
in /Users/sush/dev/hive.git/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php on 
line 169

Fatal error: Call to undefined method ArcanistGitAPI::amendGitHeadCommit() in 
/Users/sush/dev/hive.git/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php on 
line 169
--

Still, the attached patch includes a unit test and is fairly straightforward. 
Hopefully it can be reviewed easily.

 HiveConf.getPositionFromInternalName does not support more than single digit 
 column numbers
 --

 Key: HIVE-3221
 URL: https://issues.apache.org/jira/browse/HIVE-3221
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-3221.patch


 For positions above 9, HiveConf.getPositionFromInternalName only looks at the 
 last digit and thus causes collisions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-942) use bucketing for group by

2012-07-01 Thread Lianhui Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404730#comment-13404730
 ] 

Lianhui Wang commented on HIVE-942:
---

I think in HIVE-931 the group by keys must be the same as the sort keys.
But in the case where the group by keys contain the sort keys, it may be
possible to complete the aggregation using a hash table on the mapper.
For example:
t is a bucketed table, sorted by c1, c2.
SQL: select t.c1, t.c2, t.c3, sum(t.c4) from t group by t.c1, t.c2, t.c3.
I think generally only the hash table on the mapper is used, so nothing
needs to be done on the reducer.
 

 use bucketing for group by
 --

 Key: HIVE-942
 URL: https://issues.apache.org/jira/browse/HIVE-942
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain

 Group by on a bucketed column can be completely performed on the mapper if 
 the split can be adjusted to span the key boundary.
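 The idea in this issue can be sketched as follows (the row format and method are illustrative assumptions, not Hive's actual representation): if each split holds every row for the group keys it contains, which bucketing on the group-by column can guarantee, then a per-mapper hash table already produces final aggregates and no reduce phase is needed.

 ```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideGroupBy {
    // Per-mapper aggregation: since no key spans two splits in the bucketed
    // case, these sums are final and can be emitted directly.
    static Map<String, Long> groupBySum(List<String[]> rows) {
        Map<String, Long> agg = new HashMap<>();
        for (String[] row : rows) {          // row[0] = group key, row[1] = value
            agg.merge(row[0], Long.parseLong(row[1]), Long::sum);
        }
        return agg;
    }
}
 ```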

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Connect Squirrel SQL Client to access Hive tables

2012-07-01 Thread Raihan Jamal

 I am new to Hive, MapReduce, and Hadoop, so I am a newbie at this.
 I am using Putty to connect to a Hive table and access records in the
 tables. What I did is: I opened Putty, typed the host name `
 ares-ingest.vip.name.com`, and clicked `Open`. Then I entered my
 username and password, followed by a few commands to get to the Hive
 shell. Below is what I did:

 $ bash
 bash-3.00$ hive
 Hive history
 file=/tmp/rjamal/hive_job_log_rjamal_201207010451_1212680168.txt
 hive> set mapred.job.queue.name=hdmi-technology;
 hive> select * from table LIMIT 1;

 So my question is:

 **Is there any other way I can do the same thing from a SQL client such
 as SQL Developer or SQuirreL SQL Client instead of doing it from the
 command prompt? If there is, what is the step-by-step process, given
 that in my example I am logging in to `ares-ingest.vip.name.com` from
 Putty?**

 Likewise, if I need to do this through a JDBC program on my Windows
 machine, how can I do it? That is, using a JDBC program, how can I
 access Hive tables and get results back, the way I already can with
 Oracle tables? My only confusion is that I am using the hostname
 `ares-ingest.vip.name.com` to log in through Putty. I hope the
 question is clear. Any suggestion will be appreciated.

 **In short, my question is: can I do the same thing from any SQL client
 instead of logging in through Putty?**


 -Raihan Jamal



[jira] [Updated] (HIVE-3172) Remove the duplicate JAR entries from the (“test.classpath”) to avoid command line exceeding char limit on windows

2012-07-01 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3172:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Kanna!

 Remove the duplicate JAR entries from the (“test.classpath”) to avoid command 
 line exceeding char limit on windows 
 ---

 Key: HIVE-3172
 URL: https://issues.apache.org/jira/browse/HIVE-3172
 Project: Hive
  Issue Type: Sub-task
  Components: Tests, Windows
Affects Versions: 0.10.0
 Environment: Windows
Reporter: Kanna Karanam
Assignee: Kanna Karanam
  Labels: Windows
 Fix For: 0.10.0

 Attachments: HIVE-3172.1.patch.txt, HIVE-3172.2.patch.txt, 
 HIVE-3172.3.patch.txt


 The maximum length of the DOS command string is 8191 characters (in recent 
 Windows versions; see http://support.microsoft.com/kb/830473). The following 
 entries in “build-common.xml” add a lot of duplicate JAR entries to 
 “test.classpath”, so it exceeds the max character limit on Windows very 
 easily. 
 <!-- Include build/dist/lib on the classpath before Ivy and exclude hive jars 
 from Ivy to make sure we get the local changes when we test Hive -->
 <fileset dir="${build.dir.hive}/dist/lib" includes="*.jar" 
 erroronmissingdir="false" 
 excludes="**/hive_contrib*.jar,**/hive-contrib*.jar,**/lib*.jar"/>
 <fileset dir="${hive.root}/build/ivy/lib/test" includes="*.jar" 
 erroronmissingdir="false" excludes="**/hive_*.jar,**/hive-*.jar"/>
 <fileset dir="${hive.root}/build/ivy/lib/default" includes="*.jar" 
 erroronmissingdir="false" excludes="**/hive_*.jar,**/hive-*.jar"/>
 <fileset dir="${hive.root}/testlibs" includes="*.jar"/>
 Proposed solution (workaround):
 1) Include all JARs from dist\lib, excluding 
 **/hive_contrib*.jar,**/hive-contrib*.jar,**/lib*.jar
 2) Select the specific (missing) JARs from the test and other folders (this 
 includes the Hadoop-*.jar files)
 Thanks
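 The deduplication step of the workaround can be sketched as plain string handling (method and class names are hypothetical; the real change is in the Ant build, not Java):

 ```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class ClasspathDedup {
    // Drop duplicate classpath entries while preserving first-seen order,
    // so the generated command line stays under the ~8191-character
    // Windows limit cited above.
    static String dedup(String classpath, String separator) {
        Set<String> unique =
            new LinkedHashSet<>(Arrays.asList(classpath.split(separator)));
        return String.join(separator, unique);
    }
}
 ```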

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira





[jira] [Assigned] (HIVE-3145) Lock support for Metastore calls

2012-07-01 Thread Andrew Chalfant (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Chalfant reassigned HIVE-3145:
-

Assignee: Andrew Chalfant

 Lock support for Metastore calls
 

 Key: HIVE-3145
 URL: https://issues.apache.org/jira/browse/HIVE-3145
 Project: Hive
  Issue Type: Improvement
  Components: Locking, Metastore
Affects Versions: 0.10.0
Reporter: David Goode
Assignee: Andrew Chalfant
Priority: Minor
 Attachments: HIVE3145_lock.diff

   Original Estimate: 168h
  Remaining Estimate: 168h

 Added locking to the metastore calls. It is currently failing some unit 
 tests, I think due to improper configuration; this needs to be resolved and 
 new unit tests added. Some code cleanup may also be wanted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: Allow to download resources from any external File Systems to local machine.

2012-07-01 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5687/#review8776
---



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
https://reviews.apache.org/r/5687/#comment18552

Instead of a regex, it might be better to use a URI to parse the string:
String scheme = new Path(value).toURI().getScheme();
return (scheme != null) && !scheme.equalsIgnoreCase("file");
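A plain-JDK sketch of this suggestion (the actual patch would use org.apache.hadoop.fs.Path; java.net.URI is substituted here so the example is self-contained, and the method name is hypothetical):

```java
import java.net.URI;

public class SchemeCheck {
    // True when the resource lives on an external (non-local) file system:
    // scheme-less paths and file: URIs are local, everything else external.
    static boolean isExternal(String value) {
        String scheme = URI.create(value).getScheme();
        return scheme != null && !scheme.equalsIgnoreCase("file");
    }
}
```

This accepts any scheme (hdfs, s3, s3n, asv, ...) rather than a fixed whitelist, which matches the intent of the review request.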


- Ashutosh Chauhan


On June 30, 2012, 6:15 p.m., Kanna Karanam wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/5687/
 ---
 
 (Updated June 30, 2012, 6:15 p.m.)
 
 
 Review request for hive, Carl Steinbach, Edward Capriolo, and Ashutosh 
 Chauhan.
 
 
 Description
 ---
 
 Instead of restricting resources download to s3, s3n, hdfs make it open 
 for any external file systems.
 
 
 This addresses bug HIVE-3146.
 https://issues.apache.org/jira/browse/HIVE-3146
 
 
 Diffs
 -
 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  1355510 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
  1355510 
 
 Diff: https://reviews.apache.org/r/5687/diff/
 
 
 Testing
 ---
 
 Yes. All unit tests passed.
 
 
 Thanks,
 
 Kanna Karanam
 




[jira] [Updated] (HIVE-3146) Support external hive tables whose data are stored in Azure blob store/Azure Storage Volumes (ASV)

2012-07-01 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3146:
---

Status: Open  (was: Patch Available)

 Support external hive tables whose data are stored in Azure blob store/Azure 
 Storage Volumes (ASV)
 --

 Key: HIVE-3146
 URL: https://issues.apache.org/jira/browse/HIVE-3146
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Affects Versions: 0.10.0
Reporter: Kanna Karanam
Assignee: Kanna Karanam
  Labels: Windows
 Fix For: 0.10.0

 Attachments: HIVE-3146.1.patch.txt, HIVE-3146.2.patch.txt


 Support external hive tables whose data are stored in Azure blob store/Azure 
 Storage Volumes (ASV)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Connect Squirrel SQL Client to access Hive tables

2012-07-01 Thread Prasad Mujumdar
   Check HIVE-3100, which is to include SQLLine as a command-line tool for
Hive.

thanks
Prasad

On Sun, Jul 1, 2012 at 11:58 AM, Raihan Jamal jamalrai...@gmail.com wrote:

 I am new to Hive, MapReduce and Hadoop, so I am newbie in this.
 I am using Putty to connect to hive table and access records in the tables.
 So what I did is- I opened Putty and in the host name I typed- `
 ares-ingest.vip.name.com` and then I click `Open`. And then I entered my
 username and password and then few commands to get to Hive sql. Below is
 the list what I did

 $ bash
 bash-3.00$ hive
 Hive history
 file=/tmp/rjamal/hive_job_log_rjamal_201207010451_1212680168.txt
 hive> set mapred.job.queue.name=hdmi-technology;
 hive> select * from table LIMIT 1;

 So my question is-

 **Is there any other way I can do the same thing in any Sql client like Sql
 Developer or Squirel SQL Client instead of doing it from the command
 prompt. And if it is there then what is the step by step process to do this
 considering my example as I am logging to `ares-ingest.vip.name.com` from
 Putty .**

 And same thing if I need to do through JDBC Program in my windows machine
 then how I can do it. Means with the use of JDBC Program, how I can access
 Hive tables and get the result back. As I know how I can do this with the
 oracle tables. But the only confusion I have is, as I am using this
 hostname `ares-ingest.vip.name.com` to log into Putty. I am hoping the
 question is clear. Any suggestion will be appreciated.

 **In short my question is- Can I do the same thing in any SQLClient instead
 of logging from the Putty?**


 -Raihan Jamal