RE: How to make the query compiler not determine the number of reducers?

Ryabin, Thomas Mon, 30 Apr 2012 10:27:40 -0700

Edward,

That is not working for me; I get a syntax error:


hive> select * from employees,stores;
FAILED: Parse Error: line 1:18 mismatched input ',' expecting EOF near 'employees'

-Thomas


-----Original Message-----
From: Edward Capriolo [mailto:[email protected]] 
Sent: Monday, April 30, 2012 1:08 PM
To: [email protected]
Subject: Re: How to make the query compiler not determine the number of
reducers?

You are trying to create a Cartesian product.

select * FROM table1,table2

should do that. You do not need a join clause.
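
For reference, the goal described in this thread (every employee/store pair, spread evenly over the reducers) can be sketched outside Hive. The Python below only illustrates the arithmetic and is not a description of how Hive schedules work: the sample values, `udf_stub`, `cross_join`, and `distribute` are hypothetical names, and round-robin assignment is an assumption chosen for simplicity.

```python
from itertools import product

# Hypothetical sample data matching the row counts mentioned in the thread:
# 3 rows in "employees" and 5 rows in "stores".
employees = ["alice", "bob", "carol"]
stores = ["s1", "s2", "s3", "s4", "s5"]


def udf_stub(name, store):
    # Stand-in for the custom UDF; the real test_udf is not shown in the thread.
    return f"{name}@{store}"


def cross_join(left, right):
    # Cartesian product: every row of `left` paired with every row of `right`.
    return list(product(left, right))


def distribute(pairs, num_reducers):
    # Assumed round-robin assignment of pairs to reducers (illustration only).
    buckets = [[] for _ in range(num_reducers)]
    for i, pair in enumerate(pairs):
        buckets[i % num_reducers].append(pair)
    return buckets


pairs = cross_join(employees, stores)
buckets = distribute(pairs, 3)

for reducer_id, bucket in enumerate(buckets):
    results = [udf_stub(name, store) for name, store in bucket]
    print(reducer_id, len(bucket), results)
```

With 3 x 5 = 15 pairs and 3 reducers, each bucket in this sketch holds 5 pairs, which is the even split the original question asks for.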

On Mon, Apr 30, 2012 at 12:53 PM, Ryabin, Thomas
<[email protected]> wrote:
> The query I am executing is:
>
> select test_udf(name, store) from employees join stores;
>
>
>
> My goal for this query is to run every combination of employees.name and
> stores.store through my test_udf, and have Hadoop spread the computation
> among the reducers. So if I have 5 rows in the "stores" table and 3 rows in
> the "employees" table then there would be 15 combinations, and if I had 3
> reducers then ideally each reducer would get 5 combinations.
>
>
>
> I created the tables with these commands:
>
> create external table employees(row_key string, name string)
> stored by 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
> with serdeproperties ("cassandra.columns.mapping" = ":key,name",
> "cassandra.ks.name" = "test",
> "cassandra.cf.name" = "employees");
>
> create external table stores(row_key string, store string)
> stored by 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
> with serdeproperties ("cassandra.columns.mapping" = ":key,store",
> "cassandra.ks.name" = "test",
> "cassandra.cf.name" = "stores");
>
>
>
> I am using Cassandra as the storage mechanism. I have tried using the ON
> operator with my query like so:
>
> select test_udf(name, store) from employees join stores on
> (employees.name = stores.store);
>
>
>
> and in this case Hive creates 3 reduce tasks, but nothing gets done because
> there are no matching keys. Is there a way to accomplish what I am trying to
> do by using "distribute by", "cluster by", and/or bucketed tables, or
> something else?
>
>
>
> Thanks,
>
> Thomas
>
>
>
>
>
> From: Bejoy KS [mailto:[email protected]]
> Sent: Monday, April 30, 2012 10:15 AM
> To: [email protected]
> Subject: Re: How to make the query compiler not determine the number of
> reducers?
>
>
>
> Thomas,
>
> That need not be the case; raising the number of map tasks may have no
> effect on the number of reduce tasks. Maybe we can help you out if you could
> provide some details, such as:
> - the query you are executing
> - describe formatted on the tables involved in query
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> ________________________________
>
> From: "Ryabin, Thomas" <[email protected]>
>
> Date: Mon, 30 Apr 2012 10:06:01 -0400
>
> To: <[email protected]>
>
> ReplyTo: [email protected]
>
> Subject: RE: How to make the query compiler not determine the number of
> reducers?
>
>
>
> I tried using this to set the number of reduce tasks to 2, but it doesn't
> work for me. In my case the Hive query always creates 8 map tasks and 1
> reduce task. Could the number of reduce tasks be limited by the number of
> map tasks, so that if I wanted 2 reduce tasks I would need to increase the
> number of map tasks to 16 in my case?
>
>
>
> -Thomas
>
>
>
> From: Bejoy KS [mailto:[email protected]]
> Sent: Saturday, April 28, 2012 1:43 AM
> To: [email protected]
> Subject: Re: How to make the query compiler not determine the number of
> reducers?
>
>
>
> Hi Thomas
> Hive automatically sets the number of reducers for you, but you can easily
> override it at the CLI. Before executing your query, run:
>
> hive> SET mapred.reduce.tasks=n;
>
> where n is the required number of reducers.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> ________________________________
>
> From: "Ryabin, Thomas" <[email protected]>
>
> Date: Fri, 27 Apr 2012 16:48:25 -0400
>
> To: <[email protected]>
>
> ReplyTo: [email protected]
>
> Subject: How to make the query compiler not determine the number of
> reducers?
>
>
>
> Hi,
>
>
>
> When I run a query that uses a custom UDF I made, one of the lines it prints
> out is:
>
> Number of reduce tasks determined at compile time: 1
>
>
>
> And this causes the MapReduce job to have only 1 reducer. Is there a way to
> make it so the compiler does not determine the number of reduce tasks to
> create, so I can specify the number myself?
>
>
>
> The query in question is:
>
> select test_udf(name, store) from employees join stores;
>
>
>
> Thanks,
>
> Thomas
