Re: running a job on single-node setup takes less time than running on a cluster

Saurabh bhutyani Mon, 20 Aug 2012 09:15:42 -0700

Dear Mahsa,

You need to increase the data size to benefit from Hadoop. Hadoop divides
the input into splits based on the configured split size, which defaults
to the HDFS block size of 64 MB. An input file smaller than 64 MB is
processed by a single map task, so with data this small there is very
little work to spread across the cluster.
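
To make the arithmetic concrete, here is a rough sketch of how the number
of map tasks follows from file size and split size. It is a simplification,
not Hadoop's actual split logic: it assumes one split per 64 MB chunk of
each file and ignores things like combined input formats. The function name
estimate_map_tasks is made up for illustration.

```python
import math

MB = 1024 * 1024

def estimate_map_tasks(file_sizes_bytes, split_size_bytes=64 * MB):
    """Rough estimate: each non-empty file yields ceil(size / split_size) splits.

    Hypothetical helper for illustration only; real split computation in
    Hadoop depends on the input format and its configuration.
    """
    total = 0
    for size in file_sizes_bytes:
        if size > 0:
            total += math.ceil(size / split_size_bytes)
    return total

# The two tables from the question: 1.7 MB and 4.2 MB.
small_tables = [int(1.7 * MB), int(4.2 * MB)]
print(estimate_map_tasks(small_tables))      # 2 (one map task per file)

# A single 10 GB file, for comparison.
print(estimate_map_tasks([10 * 1024 * MB]))  # 160
```

With only two map tasks there is nothing for the extra nodes to do, and
the cluster run still pays the coordination overhead.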


Thanks & Regards,
Saurabh Bhutyani

Call  : 9820083104
Gtalk: [email protected]



On Mon, Aug 20, 2012 at 6:33 PM, Mahsa Mofidpoor <[email protected]> wrote:

> Hello,
>
> I ran a simple join (select col_list from table1 join table2 on
> (join_condition)) on both single-node and multi-node setups. The table
> sizes are 1.7 MB and 4.2 MB respectively. It takes more time to execute
> the query on the cluster than to run it on a single-node Hadoop setup.
> I checked the map logs and saw that both map tasks ran on the master
> node.
> Do I need to increase the data in order to benefit from the multi-node
> capacity?
> How can I make sure that my data is distributed on all the nodes?
>
> Thank you in advance for your assistance.
>
> Regards,
> Mahsa
>
