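Dear Mahsa,

You need to increase the data size to benefit from Hadoop. Hadoop creates input splits based on the configured split size, which defaults to the 64 MB block size. Each split is processed by one map task, so an input file smaller than 64 MB gets only a single map task. With two tables this small you end up with just two map tasks, which is too little work to spread across the cluster.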
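To make the arithmetic concrete, here is a rough sketch in Python of how many splits (and therefore map tasks) your two tables would produce. It assumes one split per 64 MB of each file and ignores details of the real split calculation such as the minimum/maximum split settings or any combining input format, so treat it as an estimate only. The function name and table labels are made up for illustration.

```python
import math

MB = 1024 * 1024

def estimate_splits(file_size_bytes, split_size_bytes=64 * MB):
    """Estimate the number of input splits for one file.

    Simplified model: one split per split_size_bytes, rounded up.
    This is an approximation, not Hadoop's exact algorithm.
    """
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / split_size_bytes)

# Table sizes taken from the question (1.7 MB and 4.2 MB).
table_sizes = {"table1": int(1.7 * MB), "table2": int(4.2 * MB)}

splits = {name: estimate_splits(size) for name, size in table_sizes.items()}
total_map_tasks = sum(splits.values())

print(splits)           # one split per table
print(total_map_tasks)  # 2 map tasks in total
```

Under these assumptions each table yields one split, which matches the two map tasks you saw in the logs. A single file would need to exceed 64 MB before it is divided among more than one mapper.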
Thanks & Regards,
Saurabh Bhutyani
Call : 9820083104
Gtalk: [email protected]

On Mon, Aug 20, 2012 at 6:33 PM, Mahsa Mofidpoor <[email protected]> wrote:
> Hello,
>
> I run a simple join (select col_list from table1 join table2 on
> (join_condition)) on both single-node and multi-node setups. The table
> sizes are 1.7 MB and 4.2 MB respectively. It takes more time to execute
> the query on the cluster than to run it on a single-node Hadoop setup.
> I checked the map logs and saw that both map tasks run on the master
> node.
> Do I need to increase the data in order to benefit from the multi-node
> capacity?
> How can I make sure that my data is distributed on all the nodes?
>
> Thank you in advance for your assistance.
>
> Regards,
> Mahsa
