Edward,

That is not working for me, I get a syntax error:
hive> select * from employees,stores;
FAILED: Parse Error: line 1:18 mismatched input ',' expecting EOF near 'employees'

-Thomas

-----Original Message-----
From: Edward Capriolo [mailto:[email protected]]
Sent: Monday, April 30, 2012 1:08 PM
To: [email protected]
Subject: Re: How to make the query compiler not determine the number of reducers?

You are trying to create a Cartesian product. select * FROM table1,table2 should do that. You do not need a join clause.

On Mon, Apr 30, 2012 at 12:53 PM, Ryabin, Thomas <[email protected]> wrote:
> The query I am executing is:
>
> select test_udf(name, store) from employees join stores;
>
> My goal for this query is to run every combination of employees.name and
> stores.store through my test_udf, and have Hadoop spread the computation
> among the reducers. So if I have 5 rows in the "stores" table and 3 rows in
> the "employees" table then there would be 15 combinations, and if I had 3
> reducers then ideally each reducer would get 5 combinations.
>
> I created the tables with these commands:
>
> create external table employees(row_key string, name string)
> stored by 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
> with serdeproperties ("cassandra.columns.mapping" = ":key,name",
> "cassandra.ks.name" = "test",
> "cassandra.cf.name" = "employees");
>
> create external table stores(row_key string, store string)
> stored by 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
> with serdeproperties ("cassandra.columns.mapping" = ":key,store",
> "cassandra.ks.name" = "test",
> "cassandra.cf.name" = "stores");
>
> I am using Cassandra as the storage mechanism. I have tried using the ON
> operator with my query like so:
>
> select test_udf(name, store) from employees join stores on (employees.name = stores.store);
>
> and in this case Hive creates 3 reduce tasks, but nothing gets done because
> there are no matching keys. Is there a way to accomplish what I am trying to
> do by using "distribute by", "cluster by", and/or bucketed tables, or
> something else?
>
> Thanks,
> Thomas
>
> From: Bejoy KS [mailto:[email protected]]
> Sent: Monday, April 30, 2012 10:15 AM
> To: [email protected]
> Subject: Re: How to make the query compiler not determine the number of reducers?
>
> Thomas,
>
> It needn't be the case, raising your map tasks may not have any effect on
> reduce tasks. May be we can help you out if you could provide some details
> like :
> - the query you are executing
> - describe formatted on the tables involved in query
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> ________________________________
>
> From: "Ryabin, Thomas" <[email protected]>
> Date: Mon, 30 Apr 2012 10:06:01 -0400
> To: <[email protected]>
> ReplyTo: [email protected]
> Subject: RE: How to make the query compiler not determine the number of reducers?
>
> I tried using this to set the number of reduce tasks to 2, but it doesn't
> work for me. In my case the Hive query always creates 8 map tasks and 1
> reduce task. Could the number of reduce tasks be limited by the number of
> map tasks, so that if I wanted 2 reduce tasks I would need to increase the
> number of map tasks to 16 in my case?
>
> -Thomas
>
> From: Bejoy KS [mailto:[email protected]]
> Sent: Saturday, April 28, 2012 1:43 AM
> To: [email protected]
> Subject: Re: How to make the query compiler not determine the number of reducers?
>
> Hi Thomas
> Hive automatically sets the number of reducers for you. But you can easily
> override them at CLI. Before executing your query
> hive>SET mapred.reduce.tasks=n;
>
> Where n is the required num of reducers.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> ________________________________
>
> From: "Ryabin, Thomas" <[email protected]>
> Date: Fri, 27 Apr 2012 16:48:25 -0400
> To: <[email protected]>
> ReplyTo: [email protected]
> Subject: How to make the query compiler not determine the number of reducers?
>
> Hi,
>
> When I run a query that uses a custom UDF I made, one of the lines it prints
> out is:
>
> Number of reduce tasks determined at compile time: 1
>
> And this causes the MapReduce job to have only 1 reducer. Is there a way to
> make it so the compiler does not determine the number of reduce tasks to
> create, so I can specify the number myself?
>
> The query in question is:
>
> select test_udf(name, store) from employees join stores;
>
> Thanks,
> Thomas
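
For reference, the distribution described in the thread (3 employees x 5 stores = 15 combinations, split evenly over 3 reducers) can be sketched in plain Python. This only illustrates the desired arithmetic; it is not how Hive assigns rows to reducers, and the sample values and the function names cross_product and assign_round_robin are hypothetical, made up for this sketch.

```python
from itertools import product


def cross_product(employees, stores):
    """Return every (name, store) pair, i.e. the Cartesian product of the inputs."""
    return list(product(employees, stores))


def assign_round_robin(pairs, num_reducers):
    """Deal pairs out to reducers in turn, so bucket sizes differ by at most one."""
    buckets = [[] for _ in range(num_reducers)]
    for i, pair in enumerate(pairs):
        buckets[i % num_reducers].append(pair)
    return buckets


# Hypothetical sample data: 3 employees and 5 stores, matching the counts in the thread.
employees = ["emp1", "emp2", "emp3"]
stores = ["store1", "store2", "store3", "store4", "store5"]

pairs = cross_product(employees, stores)
buckets = assign_round_robin(pairs, 3)

# Print the total number of combinations and the size of each reducer's share.
print(len(pairs), [len(b) for b in buckets])
```

With these inputs there are 15 pairs and each of the 3 buckets receives 5 of them, which is the even split the original question is asking for.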
