Re: select failed in Hive. java.io.FileNotFoundException

2013-05-13 Thread shashwat shriparv
Set the hive class path correctly.

*Thanks  Regards*

∞
Shashwat Shriparv



On Mon, May 13, 2013 at 1:10 PM, gautier vivard gaut.viv...@gmail.comwrote:

 Hey,

 I'm beginner in the hadoop world, and more specifically in Hive (I work on
 hive-0.9.0 on Ubuntu). I work on MapR M3 in standalone mode. I'm training
 on hive and have some issues with one query :

  SELECT year FROM records;
 Automatically selecting local only mode for query
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Num 'java.io.FileNotFoundExceptionber of reduce tasks is set to 0 since 
 there's no reduce operator
 Execution log at: 
 /tmp/mapr/mapr_20130506044545_e2ba972e-84a5-4e68-a33b-580be4a8d866.log
 java.io.FileNotFoundException: 
 /var/mapr/cluster/mapred/jobTracker/staging/mapr406767829/.staging ()
 at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:518)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:330)
 at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
 at 
 org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:875)
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:869)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:869)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:843)
 at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
 at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:693)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
 Job Submission failed with exception 
 'java.io.FileNotFoundException(/var/mapr/cluster/mapred/jobTracker/staging/mapr406767829/.staging
  ())'
 Execution failed with exit status: 2
 Obtaining error information

 Task failed!
 Task ID:
   Stage-1

 Logs:

 /tmp/mapr/hive.log
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask

 But when I tried :

  select * from records;

 It works fine. I read that's because there is no reduce tasks in this
 query.
 Any idea?

 Thank's

 Gautier



Re: select failed in Hive. java.io.FileNotFoundException

2013-05-13 Thread gautier vivard
Thank's for your response.
I'm really a beginner, so I don't really understand what you mean by :
correctly


2013/5/13 shashwat shriparv dwivedishash...@gmail.com

 Set the hive class path correctly.

 *Thanks  Regards*

 ∞
 Shashwat Shriparv



 On Mon, May 13, 2013 at 1:10 PM, gautier vivard gaut.viv...@gmail.comwrote:

 Hey,

 I'm beginner in the hadoop world, and more specifically in Hive (I work
 on hive-0.9.0 on Ubuntu). I work on MapR M3 in standalone mode. I'm
 training on hive and have some issues with one query :

  SELECT year FROM records;
 Automatically selecting local only mode for query
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Num 'java.io.FileNotFoundExceptionber of reduce tasks is set to 0 since 
 there's no reduce operator
 Execution log at: 
 /tmp/mapr/mapr_20130506044545_e2ba972e-84a5-4e68-a33b-580be4a8d866.log
 java.io.FileNotFoundException: 
 /var/mapr/cluster/mapred/jobTracker/staging/mapr406767829/.staging ()
 at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:518)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:330)
 at 
 org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
 at 
 org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:875)
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:869)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:869)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:843)
 at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
 at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:693)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
 Job Submission failed with exception 
 'java.io.FileNotFoundException(/var/mapr/cluster/mapred/jobTracker/staging/mapr406767829/.staging
  ())'
 Execution failed with exit status: 2
 Obtaining error information

 Task failed!
 Task ID:
   Stage-1

 Logs:

 /tmp/mapr/hive.log
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask

 But when I tried :

  select * from records;

 It works fine. I read that's because there is no reduce tasks in this
 query.
 Any idea?

 Thank's

 Gautier





Trying to write a custom HiveOutputFormat

2013-05-13 Thread Rui Martins
Hi guys,

I'm currently writing my on HiveOutputFormat as I would like to write the
output of hive queries into a specific protobuf format my team is using.
I have managed to do this however, the Writable object I get from Hive as a
result of a SELECT query is of type Text. This means that I have to split
the string to find my fields but that's very error prone, specially if some
fields are strings that may contain spaces.

My question is:
1) How do I get a Hive Writable that gives me each field of each result row?

Thank you,
rui


Re: Who is the hive admin user?

2013-05-13 Thread Austin Chungath
Thanks a lot Owen  Lefty. Sorry for the late reply, got a bit busy during
the weekend.

So... what I understand is:
1. To enable security in Hive you need minimum of Hive 0.10 and Kerberos
enabled.
2. The best option is to protect the HDFS directories that the data is
stored in.
3. The user roles in Hive are advisory only. Everyone is an admin so anyone
can grant anyone additional permissions. But since permissions on the HDFS
dir can't be changed by any user, the data will be secure.

Will everyone being an admin be changed in a future version of Hive? Is it
difficult to implement? I guess it will be more helpful if there was a
single or a group of predefined admins.

Thanks,
Austin




On Sat, May 11, 2013 at 12:50 AM, Lefty Leverenz le...@hortonworks.comwrote:

 Hive 0.10's metastore server security is documented in the revised
 Authorization wiki, which your link didn't go to because the name had been
 changed from auth to Authorization (sorry about that):
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization
 .

 The jira is HIVE-3705 https://issues.apache.org/jira/browse/HIVE-3705.

 – Lefty



 On Fri, May 10, 2013 at 8:24 AM, Owen O'Malley omal...@apache.org wrote:

 Unfortunately, the roles in Hive are advisory only. Effectively everyone
 is an admin who can grant anyone (including themselves) additional
 permissions.

 If you need security, the best option is to protect the HDFS directories
 that the data is stored in. Set the HDFS owner, group, and permissions so
 that the users have read/write permission as desired. Don't forget to set
 things at both the database directory and table directory levels.

 Then you need to configure hive.security.metastore.authorization.manager
 with 
 org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider,
 which uses HDFS permissions to control access to databases and tables.

 By setting it up this way, the HDFS permissions will be enforced by the
 NameNode and keep each group from reading each others data. The metastore
 will use the same HDFS permissions to prevent access to the other groups
 databases' metadata.

 Of course, all of this assumes you have Kerberos turned on for your
 Hadoop cluster. It also requires a minimum of Hive 0.10.

 Hope it helps,
Owen


 On Fri, May 10, 2013 at 1:19 AM, Austin Chungath austi...@gmail.comwrote:

 Hi all,

 This seems silly but I couldn't get any reliable information even after
 a few minutes of googling.
 If I am creating user roles and groups in hive, which user should I be
 doing it with?

 Is there some configuration in hive-site.xml which sets a user as hive
 admin?

 Right now I see that any user can create user roles and groups or am I
 missing something?

 I am using the following link as reference
 https://cwiki.apache.org/Hive/languagemanual-auth.html

 Thanks,
 Austin






Re: Trying to write a custom HiveOutputFormat

2013-05-13 Thread Ramki Gmail
You may need to implement a custom SerDe.

Regards,
Ramki


On May 13, 2013, at 5:14 AM, Rui Martins ruibmart...@gmail.com wrote:

 Hi guys,
 
 I'm currently writing my on HiveOutputFormat as I would like to write the 
 output of hive queries into a specific protobuf format my team is using. 
 I have managed to do this however, the Writable object I get from Hive as a 
 result of a SELECT query is of type Text. This means that I have to split the 
 string to find my fields but that's very error prone, specially if some 
 fields are strings that may contain spaces.
 
 My question is:
 1) How do I get a Hive Writable that gives me each field of each result row?
 
 Thank you,
 rui


回复: different outer join plan between hive 0.9 and hive 0.10

2013-05-13 Thread wzc1989
This time i cherry-pick HIVE-3464, HIVE-4212, HIVE-4206 and some related 
commits and the above explain result matches in hive 0.9 and hive 0.10, thanks!
But I confuse about this error msg:

 JOINNODE_OUTERJOIN_MORETHAN_16(10142, Single join node containing outer 
join(s)  +
  cannot have more than 16 aliases),


does this mean in hive0.10 when we have more than 16 outer join the query plan 
will still have some bug?
I test the sql below and find the explain result still diff between hive 0.9 
and hive 0.10.  

explain select
sum(a.value) val
from default.test_join a
left outer join default.test_join b on a.key = b.key
left outer join default.test_join c on a.key = c.key
left outer join default.test_join d on a.key = d.key
left outer join default.test_join e on a.key = e.key
left outer join default.test_join f on a.key = f.key
left outer join default.test_join g on a.key = g.key
left outer join default.test_join h on a.key = h.key
left outer join default.test_join i on a.key = i.key
left outer join default.test_join j on a.key = j.key
left outer join default.test_join k on a.key = k.key
left outer join default.test_join l on a.key = l.key
left outer join default.test_join m on a.key = m.key
left outer join default.test_join n on a.key = n.key
left outer join default.test_join u on a.key = u.key
left outer join default.test_join v on a.key = v.key
left outer join default.test_join w on a.key = w.key
left outer join default.test_join x on a.key = x.key
left outer join default.test_join z on a.key = z.key



--  
wzc1989
已使用 Sparrow (http://www.sparrowmailapp.com/?sig)


在 2013年3月29日星期五,上午9:34,Navis류승우 写道:

 The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212,
 HIVE-3464) and still not completely fixed even in trunk.
  
 Will be fixed shortly.
  
 2013/3/29 wzc wzc1...@gmail.com (mailto:wzc1...@gmail.com):
  The bug remains even if I apply the patch in HIVE-4206 :( The explain
  result hasn't change.
   
   
  2013/3/28 Navis류승우 navis@nexr.com (mailto:navis@nexr.com)

   It's a bug (https://issues.apache.org/jira/browse/HIVE-4206).

   Thanks for reporting it.

   2013/3/24 wzc wzc1...@gmail.com (mailto:wzc1...@gmail.com):
Recently we tried to upgrade our hive from 0.9 to 0.10, but found some
of
our hive queries almost 7 times slow. One of such query consists
multiple
table outer join on the same key. By looking into the query, we found
the
query plans generate by hive 0.9 and hive 0.10 are different. Here is
the
example:
 
testcase:
 
use default;
create table test_join (
`key` string,
`value` string
);
 
explain select
sum(a.value) val
from default.test_join a
left outer join default.test_join b on a.key = b.key
left outer join default.test_join c on a.key = c.key
left outer join default.test_join d on a.key = d.key
left outer join default.test_join e on a.key = e.key
left outer join default.test_join f on a.key = f.key
left outer join default.test_join g on a.key = g.key
 
 
the explain of hive 0.9:
 
STAGE DEPENDENCIES:
 
Stage-1 is a root stage
 
Stage-2 depends on stages: Stage-1
 
Stage-0 is a root stage
 
...
 
Reduce Operator Tree:
 
Join Operator
 
condition map:
 
Left Outer Join0 to 1
 
Left Outer Join0 to 2
 
Left Outer Join0 to 3
 
Left Outer Join0 to 4
 
Left Outer Join0 to 5
 
Left Outer Join0 to 6
 
condition expressions:
 
0 {VALUE._col1}
 
1
 
2
 
3
 
4
 
5
 
6
 
..
 
 
while the explain of hive 0.10:
 
STAGE DEPENDENCIES:
 
Stage-6 is a root stage
 
Stage-1 depends on stages: Stage-6
 
Stage-2 depends on stages: Stage-1
 
Stage-0 is a root stage
 
...
 
Reduce Operator Tree:
 
Join Operator
 
condition map:
 
Left Outer Join0 to 1
 
Left Outer Join0 to 2
 
condition expressions:
 
0 {VALUE._col0} {VALUE._col1}
 
1
 
2
 
...
 
Reduce Operator Tree:
 
Join Operator
 
condition map:
 
Left Outer Join0 to 1
 
Left Outer Join0 to 2
 
Left Outer Join0 to 3
 
Left Outer Join0 to 4
 
condition expressions:
 
0 {VALUE._col9}
 
1
 
2
 
3
 
4
 

 
 
It seems like hive 0.9 use only one stage/job to process all outer joins
but
hive 0.10 split them into two stage. When running such kind of query on
hive0.10 in production, in the second stage of outer join process, some
reducer stucks.
 
I can't find any param to change the query plain , can anyone give me
some
hint?
 
Thanks!  



HADOOP and Query Capabilities

2013-05-13 Thread Nalin Khosla
Had a quick question wrt to querying HADOOP data;

1. What tools are available to Query Hadoop data in real time vs batch? 

2. I believe HIVE provides a batch interface, not sure on what tools within 
HIVE support the query capabilities against HADOOP ?

3. Besides HIVE, are there any other Query tools to query HADOOP data (ad-hoc 
queries) ?

4. Finally, what skill set is required to use HIVE or other alternate tools ? 
Can business users uses these tools?

Thanks for your insight.

Regards,
-nalin

Re: Trying to write a custom HiveOutputFormat

2013-05-13 Thread Rui Martins
Cool, thanks for the clarification guys. I'll get on with the
implementation of another SerDe. :)

@Edward - Thanks for the link. I actually already have a protobuffer SerDe
that I implemented that allows passing any protobuffer schema into it (as
long as it's proto 2.4.1) and also has some other features like mimic'ing
Maps that proto doesn't have.

I'll reply back to this thread with my progress.

Thanks guys,
rui


On Mon, May 13, 2013 at 2:54 PM, Edward Capriolo edlinuxg...@gmail.comwrote:

 You need to use a combination of output format and serde, this might allow
 you to do something like present struct objects to the input format rather
 then Text objects.

 You may want to take a look at the protobuf input format we use:
 https://github.com/edwardcapriolo/hive-protobuf/

 You could reverse the logic here and design an output format.


 On Mon, May 13, 2013 at 8:14 AM, Rui Martins ruibmart...@gmail.comwrote:

 Hi guys,

 I'm currently writing my on HiveOutputFormat as I would like to write the
 output of hive queries into a specific protobuf format my team is using.
 I have managed to do this however, the Writable object I get from Hive as
 a result of a SELECT query is of type Text. This means that I have to split
 the string to find my fields but that's very error prone, specially if some
 fields are strings that may contain spaces.

 My question is:
 1) How do I get a Hive Writable that gives me each field of each result
 row?

 Thank you,
 rui





Re: Trying to write a custom HiveOutputFormat

2013-05-13 Thread Owen O'Malley
You could also look at the OrcSerde and how it works.

https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java

Basically, OrcSerde on serialize just wraps the row and object inspector
in a fake writable. That is passed down to the OutputFormat. On
deserialize it does the reverse and just passes back the object from the
InputFormat.

-- Owen


On Mon, May 13, 2013 at 6:54 AM, Edward Capriolo edlinuxg...@gmail.comwrote:

 You need to use a combination of output format and serde, this might allow
 you to do something like present struct objects to the input format rather
 then Text objects.

 You may want to take a look at the protobuf input format we use:
 https://github.com/edwardcapriolo/hive-protobuf/

 You could reverse the logic here and design an output format.


 On Mon, May 13, 2013 at 8:14 AM, Rui Martins ruibmart...@gmail.comwrote:

 Hi guys,

 I'm currently writing my on HiveOutputFormat as I would like to write the
 output of hive queries into a specific protobuf format my team is using.
 I have managed to do this however, the Writable object I get from Hive as
 a result of a SELECT query is of type Text. This means that I have to split
 the string to find my fields but that's very error prone, specially if some
 fields are strings that may contain spaces.

 My question is:
 1) How do I get a Hive Writable that gives me each field of each result
 row?

 Thank you,
 rui





Re: HADOOP and Query Capabilities

2013-05-13 Thread Mohammad Tariq
Sorry for the typo in the 3rd answer. I meant, question 1 covers this.

Warm Regards,
Tariq
cloudfront.blogspot.com


On Mon, May 13, 2013 at 10:13 PM, Mohammad Tariq donta...@gmail.com wrote:

 Hello Nalin,

  Please find my comments embedded below :

 1. What tools are available to Query Hadoop data in real time vs batch?
  Real time : Cloudera Impala, Batch : Hive and Pig..You might wanna have
 a look at Apache Drill http://incubator.apache.org/projects/drill.htmlas 
 well.

 2. I believe HIVE provides a batch interface, not sure on what tools
 within HIVE support the query capabilities against HADOOP ?
  I'm sorry, I didn't quite get this one.

 3. Besides HIVE, are there any other Query tools to query HADOOP data
 (ad-hoc queries) ?
 I think question covers this one.

 4. Finally, what skill set is required to use HIVE or other alternate
 tools ? Can business users uses these tools?
 I think it should be pretty easy for business folks. It would be helpful
 if you have some SQL knowledge.

 Warm Regards,
 Tariq
 cloudfront.blogspot.com


 On Mon, May 13, 2013 at 10:04 PM, Nalin Khosla nalin.kho...@rogers.comwrote:

  Had a quick question wrt to querying HADOOP data;

 1. What tools are available to Query Hadoop data in real time vs batch?

 2. I believe HIVE provides a batch interface, not sure on what tools
 within HIVE support the query capabilities against HADOOP ?

 3. Besides HIVE, are there any other Query tools to query HADOOP data
 (ad-hoc queries) ?

 4. Finally, what skill set is required to use HIVE or other alternate
 tools ? Can business users uses these tools?

 Thanks for your insight.

 Regards,
 -nalin








Re: 回复: different outer join plan between hive 0.9 and hive 0.10

2013-05-13 Thread Navis류승우
The error message means hive cannot merge joins of 16+ aliases with
outer join  into single stage. It was 8 way originally (HIVE-3411) but
expanded to 16 later.

Check https://issues.apache.org/jira/browse/HIVE-3411 for details.

2013/5/14 wzc1989 wzc1...@gmail.com:
 This time i cherry-pick HIVE-3464, HIVE-4212, HIVE-4206 and some related
 commits and the above explain result matches in hive 0.9 and hive 0.10,
 thanks!
 But I confuse about this error msg:

  JOINNODE_OUTERJOIN_MORETHAN_16(10142, Single join node containing outer
 join(s)  +
   cannot have more than 16 aliases),

 does this mean in hive0.10 when we have more than 16 outer join the query
 plan will still have some bug?
 I test the sql below and find the explain result still diff between hive 0.9
 and hive 0.10.

 explain select
 sum(a.value) val
 from default.test_join a
 left outer join default.test_join b on a.key = b.key
 left outer join default.test_join c on a.key = c.key
 left outer join default.test_join d on a.key = d.key
 left outer join default.test_join e on a.key = e.key
 left outer join default.test_join f on a.key = f.key
 left outer join default.test_join g on a.key = g.key
 left outer join default.test_join h on a.key = h.key
 left outer join default.test_join i on a.key = i.key
 left outer join default.test_join j on a.key = j.key
 left outer join default.test_join k on a.key = k.key
 left outer join default.test_join l on a.key = l.key
 left outer join default.test_join m on a.key = m.key
 left outer join default.test_join n on a.key = n.key
 left outer join default.test_join u on a.key = u.key
 left outer join default.test_join v on a.key = v.key
 left outer join default.test_join w on a.key = w.key
 left outer join default.test_join x on a.key = x.key
 left outer join default.test_join z on a.key = z.key


 --
 wzc1989
 已使用 Sparrow

 在 2013年3月29日星期五,上午9:34,Navis류승우 写道:

 The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212,
 HIVE-3464) and still not completely fixed even in trunk.

 Will be fixed shortly.

 2013/3/29 wzc wzc1...@gmail.com:

 The bug remains even if I apply the patch in HIVE-4206 :( The explain
 result hasn't change.


 2013/3/28 Navis류승우 navis@nexr.com


 It's a bug (https://issues.apache.org/jira/browse/HIVE-4206).

 Thanks for reporting it.

 2013/3/24 wzc wzc1...@gmail.com:

 Recently we tried to upgrade our hive from 0.9 to 0.10, but found some
 of
 our hive queries almost 7 times slow. One of such query consists
 multiple
 table outer join on the same key. By looking into the query, we found
 the
 query plans generate by hive 0.9 and hive 0.10 are different. Here is
 the
 example:

 testcase:

 use default;
 create table test_join (
 `key` string,
 `value` string
 );

 explain select
 sum(a.value) val
 from default.test_join a
 left outer join default.test_join b on a.key = b.key
 left outer join default.test_join c on a.key = c.key
 left outer join default.test_join d on a.key = d.key
 left outer join default.test_join e on a.key = e.key
 left outer join default.test_join f on a.key = f.key
 left outer join default.test_join g on a.key = g.key


 the explain of hive 0.9:

 STAGE DEPENDENCIES:

 Stage-1 is a root stage

 Stage-2 depends on stages: Stage-1

 Stage-0 is a root stage

 ...

 Reduce Operator Tree:

 Join Operator

 condition map:

 Left Outer Join0 to 1

 Left Outer Join0 to 2

 Left Outer Join0 to 3

 Left Outer Join0 to 4

 Left Outer Join0 to 5

 Left Outer Join0 to 6

 condition expressions:

 0 {VALUE._col1}

 1

 2

 3

 4

 5

 6

 ..


 while the explain of hive 0.10:

 STAGE DEPENDENCIES:

 Stage-6 is a root stage

 Stage-1 depends on stages: Stage-6

 Stage-2 depends on stages: Stage-1

 Stage-0 is a root stage

 ...

 Reduce Operator Tree:

 Join Operator

 condition map:

 Left Outer Join0 to 1

 Left Outer Join0 to 2

 condition expressions:

 0 {VALUE._col0} {VALUE._col1}

 1

 2

 ...

 Reduce Operator Tree:

 Join Operator

 condition map:

 Left Outer Join0 to 1

 Left Outer Join0 to 2

 Left Outer Join0 to 3

 Left Outer Join0 to 4

 condition expressions:

 0 {VALUE._col9}

 1

 2

 3

 4

 


 It seems like hive 0.9 use only one stage/job to process all outer joins
 but
 hive 0.10 split them into two stage. When running such kind of query on
 hive0.10 in production, in the second stage of outer join process, some
 reducer stucks.

 I can't find any param to change the query plain , can anyone give me
 some
 hint?

 Thanks!




回复: different outer join plan between hive 0.9 and hive 0.10

2013-05-13 Thread wzc1989
hive cannot merge joins of 16+ aliases with outer join into single stage.
In our use case we use one table full outer join all other table to produce one 
big table, which may exceed 16 outer join limits and will be split into multi 
stage under hive 0.10.
It become very slow under hive 0.10 while we run such query well under hive 0.9.
I believe it's due to the diff of query plan. I wonder why hive 0.10 cannot 
merge join 16+ aliases into single stage while hive 0.9 doesn't have such 
issue. could you explain this or give me some hint?

Thanks!  

--  
wzc1989
已使用 Sparrow (http://www.sparrowmailapp.com/?sig)


在 2013年5月14日星期二,下午12:26,Navis류승우 写道:

 The error message means hive cannot merge joins of 16+ aliases with
 outer join into single stage. It was 8 way originally (HIVE-3411) but
 expanded to 16 later.
  
 Check https://issues.apache.org/jira/browse/HIVE-3411 for details.
  
 2013/5/14 wzc1989 wzc1...@gmail.com (mailto:wzc1...@gmail.com):
  This time i cherry-pick HIVE-3464, HIVE-4212, HIVE-4206 and some related
  commits and the above explain result matches in hive 0.9 and hive 0.10,
  thanks!
  But I confuse about this error msg:
   
  JOINNODE_OUTERJOIN_MORETHAN_16(10142, Single join node containing outer
  join(s)  +
  cannot have more than 16 aliases),
   
  does this mean in hive0.10 when we have more than 16 outer join the query
  plan will still have some bug?
  I test the sql below and find the explain result still diff between hive 0.9
  and hive 0.10.
   
  explain select
  sum(a.value) val
  from default.test_join a
  left outer join default.test_join b on a.key = b.key
  left outer join default.test_join c on a.key = c.key
  left outer join default.test_join d on a.key = d.key
  left outer join default.test_join e on a.key = e.key
  left outer join default.test_join f on a.key = f.key
  left outer join default.test_join g on a.key = g.key
  left outer join default.test_join h on a.key = h.key
  left outer join default.test_join i on a.key = i.key
  left outer join default.test_join j on a.key = j.key
  left outer join default.test_join k on a.key = k.key
  left outer join default.test_join l on a.key = l.key
  left outer join default.test_join m on a.key = m.key
  left outer join default.test_join n on a.key = n.key
  left outer join default.test_join u on a.key = u.key
  left outer join default.test_join v on a.key = v.key
  left outer join default.test_join w on a.key = w.key
  left outer join default.test_join x on a.key = x.key
  left outer join default.test_join z on a.key = z.key
   
   
  --
  wzc1989
  已使用 Sparrow
   
  在 2013年3月29日星期五,上午9:34,Navis류승우 写道:
   
  The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212,
  HIVE-3464) and still not completely fixed even in trunk.
   
  Will be fixed shortly.
   
  2013/3/29 wzc wzc1...@gmail.com (mailto:wzc1...@gmail.com):
   
  The bug remains even if I apply the patch in HIVE-4206 :( The explain
  result hasn't change.
   
   
  2013/3/28 Navis류승우 navis@nexr.com (mailto:navis@nexr.com)
   
   
  It's a bug (https://issues.apache.org/jira/browse/HIVE-4206).
   
  Thanks for reporting it.
   
  2013/3/24 wzc wzc1...@gmail.com (mailto:wzc1...@gmail.com):
   
  Recently we tried to upgrade our hive from 0.9 to 0.10, but found some
  of
  our hive queries almost 7 times slow. One of such query consists
  multiple
  table outer join on the same key. By looking into the query, we found
  the
  query plans generate by hive 0.9 and hive 0.10 are different. Here is
  the
  example:
   
  testcase:
   
  use default;
  create table test_join (
  `key` string,
  `value` string
  );
   
  explain select
  sum(a.value) val
  from default.test_join a
  left outer join default.test_join b on a.key = b.key
  left outer join default.test_join c on a.key = c.key
  left outer join default.test_join d on a.key = d.key
  left outer join default.test_join e on a.key = e.key
  left outer join default.test_join f on a.key = f.key
  left outer join default.test_join g on a.key = g.key
   
   
  the explain of hive 0.9:
   
  STAGE DEPENDENCIES:
   
  Stage-1 is a root stage
   
  Stage-2 depends on stages: Stage-1
   
  Stage-0 is a root stage
   
  ...
   
  Reduce Operator Tree:
   
  Join Operator
   
  condition map:
   
  Left Outer Join0 to 1
   
  Left Outer Join0 to 2
   
  Left Outer Join0 to 3
   
  Left Outer Join0 to 4
   
  Left Outer Join0 to 5
   
  Left Outer Join0 to 6
   
  condition expressions:
   
  0 {VALUE._col1}
   
  1
   
  2
   
  3
   
  4
   
  5
   
  6
   
  ..
   
   
  while the explain of hive 0.10:
   
  STAGE DEPENDENCIES:
   
  Stage-6 is a root stage
   
  Stage-1 depends on stages: Stage-6
   
  Stage-2 depends on stages: Stage-1
   
  Stage-0 is a root stage
   
  ...
   
  Reduce Operator Tree:
   
  Join Operator
   
  condition map:
   
  Left Outer Join0 to 1
   
  Left Outer Join0 to 2
   
  condition expressions:
   
  0 {VALUE._col0} {VALUE._col1}
   
  1
   
  2
   
  ...
   
  Reduce