Re: select failed in Hive. java.io.FileNotFoundException
Set the Hive classpath correctly.

*Thanks & Regards*
∞ Shashwat Shriparv

On Mon, May 13, 2013 at 1:10 PM, gautier vivard gaut.viv...@gmail.com wrote:

Hey, I'm a beginner in the Hadoop world, and more specifically in Hive (I work with hive-0.9.0 on Ubuntu, on MapR M3 in standalone mode). I'm training on Hive and have an issue with one query:

SELECT year FROM records;

Automatically selecting local only mode for query
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/mapr/mapr_20130506044545_e2ba972e-84a5-4e68-a33b-580be4a8d866.log
java.io.FileNotFoundException: /var/mapr/cluster/mapred/jobTracker/staging/mapr406767829/.staging ()
    at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:518)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:330)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:875)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:869)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:869)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:843)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:693)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Job Submission failed with exception 'java.io.FileNotFoundException(/var/mapr/cluster/mapred/jobTracker/staging/mapr406767829/.staging ())'
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID: Stage-1
Logs: /tmp/mapr/hive.log
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

But when I try:

select * from records;

it works fine. I read that that's because there are no reduce tasks in this query. Any ideas?

Thanks,
Gautier
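(Not from the thread, but a workaround sometimes suggested for local-mode staging-directory failures like the one above — an assumption, not a confirmed fix for this MapR M3 setup — is to disable Hive's automatic local mode, so the job is submitted to the cluster instead of touching the local staging path:)

```sql
-- Hypothetical workaround sketch: hive.exec.mode.local.auto is the Hive
-- property behind "Automatically selecting local only mode"; turning it
-- off makes Hive submit the job to the cluster instead of running locally.
SET hive.exec.mode.local.auto=false;
SELECT year FROM records;
```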
Re: select failed in Hive. java.io.FileNotFoundException
Thanks for your response. I'm really a beginner, so I don't really understand what you mean by "correctly".

2013/5/13 shashwat shriparv dwivedishash...@gmail.com:
> Set the Hive classpath correctly.
Trying to write a custom HiveOutputFormat
Hi guys, I'm currently writing my own HiveOutputFormat, as I would like to write the output of Hive queries into a specific protobuf format my team is using. I have managed to do this; however, the Writable object I get from Hive as the result of a SELECT query is of type Text. This means I have to split the string to find my fields, but that's very error prone, especially if some fields are strings that may contain spaces. My question is: how do I get a Hive Writable that gives me each field of each result row? Thank you, rui
Re: Who is the hive admin user?
Thanks a lot Owen, Lefty. Sorry for the late reply; I got a bit busy during the weekend. So, what I understand is:

1. To enable security in Hive you need a minimum of Hive 0.10 and Kerberos enabled.
2. The best option is to protect the HDFS directories that the data is stored in.
3. The user roles in Hive are advisory only. Everyone is an admin, so anyone can grant anyone additional permissions. But since permissions on the HDFS directories can't be changed by just any user, the data will be secure.

Will everyone being an admin be changed in a future version of Hive? Is it difficult to implement? I guess it would be more helpful if there were a single admin or a group of predefined admins.

Thanks, Austin

On Sat, May 11, 2013 at 12:50 AM, Lefty Leverenz le...@hortonworks.com wrote:

Hive 0.10's metastore server security is documented in the revised Authorization wiki, which your link didn't go to because the name had been changed from auth to Authorization (sorry about that): https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization. The jira is HIVE-3705 https://issues.apache.org/jira/browse/HIVE-3705. – Lefty

On Fri, May 10, 2013 at 8:24 AM, Owen O'Malley omal...@apache.org wrote:

Unfortunately, the roles in Hive are advisory only. Effectively everyone is an admin who can grant anyone (including themselves) additional permissions. If you need security, the best option is to protect the HDFS directories that the data is stored in. Set the HDFS owner, group, and permissions so that the users have read/write permission as desired. Don't forget to set things at both the database directory and table directory levels. Then you need to configure hive.security.metastore.authorization.manager with org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider, which uses HDFS permissions to control access to databases and tables. By setting it up this way, the HDFS permissions will be enforced by the NameNode and keep each group from reading each other's data. The metastore will use the same HDFS permissions to prevent access to the other groups' database metadata. Of course, all of this assumes you have Kerberos turned on for your Hadoop cluster. It also requires a minimum of Hive 0.10. Hope it helps, Owen

On Fri, May 10, 2013 at 1:19 AM, Austin Chungath austi...@gmail.com wrote:

Hi all, this seems silly, but I couldn't get any reliable information even after a few minutes of googling. If I am creating user roles and groups in Hive, which user should I be doing it with? Is there some configuration in hive-site.xml which sets a user as Hive admin? Right now I see that any user can create user roles and groups, or am I missing something? I am using the following link as reference: https://cwiki.apache.org/Hive/languagemanual-auth.html Thanks, Austin
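(Owen's configuration advice above would look roughly like this in hive-site.xml — a fragment only; the property name and class come straight from his message, the rest of the file is elided:)

```xml
<!-- hive-site.xml: delegate metastore authorization to HDFS permissions -->
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>
```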
Re: Trying to write a custom HiveOutputFormat
You may need to implement a custom SerDe.

Regards,
Ramki
Re: different outer join plan between hive 0.9 and hive 0.10
This time I cherry-picked HIVE-3464, HIVE-4212, HIVE-4206 and some related commits, and the above explain results now match between hive 0.9 and hive 0.10, thanks! But I'm confused by this error message: JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer join(s) cannot have more than 16 aliases"). Does this mean that in hive 0.10, when we have more than 16 outer joins, the query plan will still have some bugs? I tested the SQL below and found the explain result still differs between hive 0.9 and hive 0.10.

explain select sum(a.value) val from default.test_join a
left outer join default.test_join b on a.key = b.key
left outer join default.test_join c on a.key = c.key
left outer join default.test_join d on a.key = d.key
left outer join default.test_join e on a.key = e.key
left outer join default.test_join f on a.key = f.key
left outer join default.test_join g on a.key = g.key
left outer join default.test_join h on a.key = h.key
left outer join default.test_join i on a.key = i.key
left outer join default.test_join j on a.key = j.key
left outer join default.test_join k on a.key = k.key
left outer join default.test_join l on a.key = l.key
left outer join default.test_join m on a.key = m.key
left outer join default.test_join n on a.key = n.key
left outer join default.test_join u on a.key = u.key
left outer join default.test_join v on a.key = v.key
left outer join default.test_join w on a.key = w.key
left outer join default.test_join x on a.key = x.key
left outer join default.test_join z on a.key = z.key

-- wzc1989
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Friday, March 29, 2013 at 9:34 AM, Navis류승우 wrote:

The problem is a mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212, HIVE-3464) and is still not completely fixed even in trunk. It will be fixed shortly.

2013/3/29 wzc wzc1...@gmail.com: The bug remains even if I apply the patch in HIVE-4206 :( The explain result hasn't changed.

2013/3/28 Navis류승우 navis@nexr.com: It's a bug (https://issues.apache.org/jira/browse/HIVE-4206). Thanks for reporting it.

2013/3/24 wzc wzc1...@gmail.com:

Recently we tried to upgrade our Hive from 0.9 to 0.10, but found some of our Hive queries became almost 7 times slower. One such query consists of multiple outer joins of tables on the same key. By looking into the query, we found the query plans generated by hive 0.9 and hive 0.10 are different. Here is the test case:

use default;
create table test_join ( `key` string, `value` string );
explain select sum(a.value) val from default.test_join a
left outer join default.test_join b on a.key = b.key
left outer join default.test_join c on a.key = c.key
left outer join default.test_join d on a.key = d.key
left outer join default.test_join e on a.key = e.key
left outer join default.test_join f on a.key = f.key
left outer join default.test_join g on a.key = g.key

The explain from hive 0.9:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage
...
Reduce Operator Tree:
  Join Operator
    condition map:
      Left Outer Join0 to 1
      Left Outer Join0 to 2
      Left Outer Join0 to 3
      Left Outer Join0 to 4
      Left Outer Join0 to 5
      Left Outer Join0 to 6
    condition expressions:
      0 {VALUE._col1} 1 2 3 4 5 6
...

while the explain from hive 0.10:

STAGE DEPENDENCIES:
  Stage-6 is a root stage
  Stage-1 depends on stages: Stage-6
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage
...
Reduce Operator Tree:
  Join Operator
    condition map:
      Left Outer Join0 to 1
      Left Outer Join0 to 2
    condition expressions:
      0 {VALUE._col0} {VALUE._col1} 1 2
...
Reduce Operator Tree:
  Join Operator
    condition map:
      Left Outer Join0 to 1
      Left Outer Join0 to 2
      Left Outer Join0 to 3
      Left Outer Join0 to 4
    condition expressions:
      0 {VALUE._col9} 1 2 3 4

It seems that hive 0.9 uses only one stage/job to process all the outer joins, but hive 0.10 splits them into two stages. When running this kind of query on hive 0.10 in production, some reducers get stuck in the second outer join stage. I can't find any parameter to change the query plan; can anyone give me some hints? Thanks!
HADOOP and Query Capabilities
Had a quick question with regard to querying Hadoop data:

1. What tools are available to query Hadoop data in real time vs batch?
2. I believe Hive provides a batch interface; I'm not sure what tools within Hive support query capabilities against Hadoop.
3. Besides Hive, are there any other tools to query Hadoop data (ad-hoc queries)?
4. Finally, what skill set is required to use Hive or the alternate tools? Can business users use these tools?

Thanks for your insight.

Regards,
-nalin
Re: Trying to write a custom HiveOutputFormat
Cool, thanks for the clarification guys. I'll get on with the implementation of another SerDe. :)

@Edward - Thanks for the link. I actually already have a protobuf SerDe that I implemented, which allows passing any protobuf schema into it (as long as it's proto 2.4.1) and also has some other features, like mimicking the maps that proto doesn't have. I'll reply back to this thread with my progress.

Thanks guys, rui

On Mon, May 13, 2013 at 2:54 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

You need to use a combination of output format and serde; this might allow you to do something like present struct objects to the input format rather than Text objects. You may want to take a look at the protobuf input format we use: https://github.com/edwardcapriolo/hive-protobuf/ You could reverse the logic here and design an output format.
Re: Trying to write a custom HiveOutputFormat
You could also look at OrcSerde and how it works:
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java

Basically, on serialize OrcSerde just wraps the row and object inspector in a fake Writable, which is passed down to the OutputFormat. On deserialize it does the reverse and just passes back the object from the InputFormat.

-- Owen
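(The "fake Writable" trick Owen describes can be sketched without any Hive or Hadoop dependencies. Everything below is illustrative: the `Writable` interface is a stand-in for org.apache.hadoop.io.Writable, and `RowWrapper` is a hypothetical analogue of OrcSerde's private row class. The point is only to show how a SerDe can hand a structured row, rather than a flattened Text, through the Writable-typed API to an output format:)

```java
import java.util.Arrays;
import java.util.List;

// Simplified sketch of the OrcSerde pattern: serialize() returns a wrapper
// that merely carries the row object; the matching output format unwraps it
// and reads the fields directly, so nothing is ever flattened to text.
public class FakeWritableSketch {

    // Stand-in for org.apache.hadoop.io.Writable (write/readFields omitted,
    // because this wrapper is never actually serialized by the framework).
    interface Writable {}

    // Hypothetical wrapper, analogous to the private row class in OrcSerde.
    static final class RowWrapper implements Writable {
        final Object row;
        RowWrapper(Object row) { this.row = row; }
    }

    // "SerDe" side: wrap the structured row instead of rendering it to text.
    static Writable serialize(Object row) {
        return new RowWrapper(row);
    }

    // "OutputFormat" side: unwrap and access individual fields by position.
    static List<?> unwrap(Writable w) {
        return (List<?>) ((RowWrapper) w).row;
    }

    public static void main(String[] args) {
        // A row whose string field contains spaces -- no splitting needed.
        List<Object> row = Arrays.asList(1, "a value with spaces");
        Writable w = serialize(row);
        List<?> fields = unwrap(w);
        System.out.println(fields.get(0) + "|" + fields.get(1));
    }
}
```

(With this shape, a protobuf output format could read each field from the unwrapped row and emit it into the protobuf message directly, avoiding the error-prone string splitting described earlier in the thread.)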
Re: HADOOP and Query Capabilities
Sorry for the typo in the third answer. I meant: question 1 covers this.

Warm Regards,
Tariq
cloudfront.blogspot.com

On Mon, May 13, 2013 at 10:13 PM, Mohammad Tariq donta...@gmail.com wrote:

Hello Nalin, please find my comments embedded below:

1. What tools are available to Query Hadoop data in real time vs batch?
Real time: Cloudera Impala. Batch: Hive and Pig. You might wanna have a look at Apache Drill http://incubator.apache.org/projects/drill.html as well.

2. I believe HIVE provides a batch interface, not sure on what tools within HIVE support the query capabilities against HADOOP?
I'm sorry, I didn't quite get this one.

3. Besides HIVE, are there any other Query tools to query HADOOP data (ad-hoc queries)?
I think question covers this one.

4. Finally, what skill set is required to use HIVE or other alternate tools? Can business users use these tools?
I think it should be pretty easy for business folks. It would be helpful if you have some SQL knowledge.

Warm Regards,
Tariq
cloudfront.blogspot.com
Re: different outer join plan between hive 0.9 and hive 0.10
The error message means Hive cannot merge joins of 16+ aliases containing an outer join into a single stage. It was 8-way originally (HIVE-3411) but was expanded to 16 later. Check https://issues.apache.org/jira/browse/HIVE-3411 for details.
Re: different outer join plan between hive 0.9 and hive 0.10
> hive cannot merge joins of 16+ aliases with outer join into single stage.

In our use case we full outer join one table against all the other tables to produce one big table, which may exceed the 16 outer join limit and so gets split into multiple stages under hive 0.10. It becomes very slow under hive 0.10, while we ran such queries fine under hive 0.9. I believe this is due to the difference in query plans. I wonder why hive 0.10 cannot merge joins of 16+ aliases into a single stage while hive 0.9 doesn't have such an issue. Could you explain this or give me some hints? Thanks!

-- wzc1989
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Tuesday, May 14, 2013 at 12:26 PM, Navis류승우 wrote:

The error message means hive cannot merge joins of 16+ aliases with outer join into single stage. It was 8 way originally (HIVE-3411) but expanded to 16 later. Check https://issues.apache.org/jira/browse/HIVE-3411 for details.