Hello, I ran a simple join (select col_list from table1 join table2 on (join_condition)) on both a single-node and a multi-node setup. The table sizes are 1.7 MB and 4.2 MB, respectively. The query takes longer to execute on the cluster than on the single-node Hadoop setup. I checked the map logs and saw that both map tasks ran on the master node. Do I need to increase the data size in order to benefit from the multi-node capacity? How can I make sure that my data is distributed across all the nodes?
Thank you in advance for your assistance. Regards, Mahsa
