Advantage/disadvantage of dbm vs join vs HBase

2015-06-07 Thread Kiet Tran
Hi, I have a roughly 5 GB file where each row is a key, value pair. I would like to use this as a hashmap against another large set of files. From searching around, one way to do it would be to turn it into a dbm like DBD and put it into a distributed cache. Another is by joining the data. A third
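For readers comparing the options, the dbm approach mentioned in this message can be sketched roughly as follows. This is only an illustration using Python's standard `dbm` module, not anything posted in the thread: the function names `build_lookup` and `join_records` are hypothetical, the sample data is made up, and shipping the resulting file to each task through the distributed cache is left out entirely.

```python
import dbm
import os
import tempfile


def build_lookup(pairs, path):
    """Write (key, value) string pairs into an on-disk dbm file at `path`."""
    with dbm.open(path, "c") as db:  # "c" creates the file if it is missing
        for key, value in pairs:
            db[key.encode("utf-8")] = value.encode("utf-8")


def join_records(records, path):
    """Yield (key, payload, looked_up_value) for records whose key is in the dbm.

    Records with no matching key are skipped, as in an inner join.
    """
    with dbm.open(path, "r") as db:  # read-only, as a mapper would use it
        for key, payload in records:
            raw = db.get(key.encode("utf-8"))
            if raw is not None:
                yield key, payload, raw.decode("utf-8")


if __name__ == "__main__":
    # Made-up sample data standing in for the key/value file and the large input.
    with tempfile.TemporaryDirectory() as workdir:
        lookup_path = os.path.join(workdir, "lookup")
        build_lookup([("a", "1"), ("b", "2")], lookup_path)
        for row in join_records([("a", "x"), ("c", "y"), ("b", "z")], lookup_path):
            print(row)
```

The point of the on-disk dbm is that each task looks keys up from a local file instead of holding the whole table in memory. Whether that beats a reduce-side join depends on the data and the cluster, which this sketch does not try to measure.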

Re: Advantage/disadvantage of dbm vs join vs HBase

2015-06-07 Thread Ted Yu
Do you have HBase running in your cluster? I ask this because bringing HBase as a new component into your deployment incurs operational overhead which you may not be familiar with. Cheers On Sun, Jun 7, 2015 at 2:53 PM, Kiet Tran ktt...@gmail.com wrote: Hi, I have a roughly 5 GB file where

Re: How Shuffle and sorting is done in Map-Reduce

2015-06-07 Thread adarsh deshratnam
Here is the algo explanation: http://theory.stanford.edu/~sergei/papers/soda10-mrc.pdf Thanks, Adarsh D On Sun, Jun 7, 2015 at 11:47 AM, yeshwanth kumar yeshwant...@gmail.com wrote: underlying algo

Re: WELCOME to user@hadoop.apache.org

2015-06-07 Thread J. Rottinghuis
On each node you can configure how much memory is available for containers to run. On the other hand, for each application you can configure how large containers should be. For MR apps, you can set the container size separately for mappers, reducers, and the app master itself. YARN will determine through scheduling
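To make the relationship between the two settings concrete, here is a rough arithmetic sketch. It assumes memory is the only constraint and ignores vcores and any rounding the scheduler applies to requests. The function name and all numbers are hypothetical, not values from this thread. In Hadoop 2.x the node-side limit is normally `yarn.nodemanager.resource.memory-mb`, and the per-application sizes for MR are `mapreduce.map.memory.mb`, `mapreduce.reduce.memory.mb`, and `yarn.app.mapreduce.am.resource.mb`.

```python
def containers_per_node(node_memory_mb, container_memory_mb):
    """Return how many containers of one size fit in a node's container memory.

    Memory-only estimate: vcores and scheduler rounding are not modeled.
    """
    if container_memory_mb <= 0:
        raise ValueError("container size must be positive")
    return node_memory_mb // container_memory_mb


if __name__ == "__main__":
    # Hypothetical node-side setting (memory available for containers on one node).
    node_memory_mb = 8192
    # Hypothetical per-application container sizes for an MR job.
    container_sizes_mb = {"mapper": 1024, "reducer": 2048, "app master": 1536}
    for role, size_mb in container_sizes_mb.items():
        count = containers_per_node(node_memory_mb, size_mb)
        print(f"{role}: up to {count} containers of {size_mb} MB per node")
```

In practice a node runs a mix of container sizes at once, so these per-role counts are upper bounds rather than a prediction of what the scheduler will place.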

Re: How Shuffle and sorting is done in Map-Reduce

2015-06-07 Thread yeshwanth kumar
Hi Adarsh, thanks for the links, but those links don't have info about the underlying algos. Thanks, -Yeshwanth On Sun, Jun 7, 2015 at 12:07 AM, adarsh deshratnam adarsh.deshrat...@gmail.com wrote: On Sun, Jun 7, 2015 at 5:17 AM, yeshwanth kumar yeshwant...@gmail.com wrote: shuffle and

Re: Advantage/disadvantage of dbm vs join vs HBase

2015-06-07 Thread Kiet Tran
Nope. I have never used HBase before. I'm also new to Hadoop in general. I'll be running the MapReduce job on EMR. Disregarding what I'm familiar with, I'd also like to know when one would do one thing vs another. Maybe it's something we can only tell from experimenting around, but it sounds like