Re: running a job on single-node setup takes less time than running on a cluster

Mahsa Mofidpoor Mon, 20 Aug 2012 11:32:33 -0700

Thanks, Saurabh.

On Mon, Aug 20, 2012 at 12:15 PM, Saurabh Bhutyani <[email protected]> wrote:


> Dear Mahsa,
>
> You need to increase the data size to benefit from Hadoop. Hadoop divides
> the input into splits based on a configured size, 64 MB by default, and
> runs one map task per split. If each of your input files is smaller than
> 64 MB, each one gets only a single map task, so there is very little work
> to spread across the cluster.
>
> Thanks & Regards,
> Saurabh Bhutyani
>
> Call  : 9820083104
> Gtalk: [email protected]
>
>
>
> On Mon, Aug 20, 2012 at 6:33 PM, Mahsa Mofidpoor <[email protected]> wrote:
>
>> Hello,
>>
>> I run a simple join (select col_list from table1 join table2 on
>> (join_condition)) on both a single-node and a multi-node setup. The table
>> sizes are 1.7 MB and 4.2 MB respectively. It takes more time to execute
>> the query on the cluster than on the single-node Hadoop setup.
>> I checked the map logs and saw that both map tasks ran on the master
>> node.
>> Do I need to increase the data size in order to benefit from the
>> multi-node capacity?
>> How can I make sure that my data is distributed across all the nodes?
>>
>> Thank you in advance for your assistance.
>>
>> Regards,
>> Mahsa
>>
>
>
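
To make the point about splits concrete, here is a rough sketch of the arithmetic, not Hadoop's actual code. It assumes one map task per split, a split size equal to the 64 MB default mentioned above, and splits that never span files. The function names are made up for illustration, and real input formats (including any that combine small files) may behave differently.

```python
import math

MB = 1024 * 1024
SPLIT_SIZE = 64 * MB  # the 64 MB default mentioned in the thread


def estimate_splits(file_size_bytes, split_size=SPLIT_SIZE):
    """Approximate number of splits for one file (simplified model)."""
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / split_size)


def estimate_map_tasks(file_sizes_bytes, split_size=SPLIT_SIZE):
    """Assume one map task per split and that splits never span files."""
    return sum(estimate_splits(size, split_size) for size in file_sizes_bytes)


# The two tables from the question: 1.7 MB and 4.2 MB.
print(estimate_map_tasks([1.7 * MB, 4.2 * MB]))  # 2

# A single 1 GB input file for comparison.
print(estimate_map_tasks([1024 * MB]))  # 16
```

Under these assumptions the two small tables yield only two map tasks, which matches the two map tasks seen in the logs. With so little work, job startup and scheduling overhead on the cluster can easily outweigh any gain from parallelism.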
