Hi,
I have a roughly 5 GB file where each row is a key, value pair. I
would like to use this as a hashmap for lookups against another large
set of files. From searching around, one way to do it would be to turn
it into a dbm like DBD and put it in the distributed cache. Another is
joining the data. A third is to load it into HBase and use it for lookups.
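For the distributed cache route, something like this is roughly what I have
in mind (just a sketch: the file name lookup.txt, the /data HDFS path, and
the tab-separated format are all made up for illustration). It reads the
side file into an in-memory HashMap, which only works if the file fits in
memory; for the full 5 GB file the setup() would instead open an on-disk
store such as a dbm from the cached local path.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapSideJoin {

  public static class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> lookup = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
      // Files added via job.addCacheFile() are localized on each node and
      // symlinked into the task's working directory under their base name.
      URI[] cacheFiles = context.getCacheFiles();
      String localName = new Path(cacheFiles[0].getPath()).getName();
      try (BufferedReader in = new BufferedReader(new FileReader(localName))) {
        String line;
        while ((line = in.readLine()) != null) {
          String[] kv = line.split("\t", 2);      // assumed tab-separated
          if (kv.length == 2) {
            lookup.put(kv[0], kv[1]);
          }
        }
      }
    }

    @Override
    protected void map(LongWritable offset, Text value, Context context)
        throws IOException, InterruptedException {
      String[] kv = value.toString().split("\t", 2);
      if (kv.length == 2) {
        String match = lookup.get(kv[0]);
        if (match != null) {
          // Emit the joined record; no reducer is involved.
          context.write(new Text(kv[0]), new Text(kv[1] + "\t" + match));
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "map-side join");
    job.setJarByClass(MapSideJoin.class);
    job.setMapperClass(JoinMapper.class);
    job.setNumReduceTasks(0);                      // map-only, so no shuffle
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.addCacheFile(new URI("/data/lookup.txt")); // hypothetical path to the side file
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Since the job is map-only there is no shuffle at all, which is the main
appeal of this approach over a reduce-side join.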
Do you have HBase running in your cluster?
I ask this because bringing HBase into your deployment as a new component
incurs operational overhead that you may not be familiar with.
Cheers
Here is the algo explanation
http://theory.stanford.edu/~sergei/papers/soda10-mrc.pdf
Thanks,
Adarsh D
On Sun, Jun 7, 2015 at 11:47 AM, yeshwanth kumar yeshwant...@gmail.com
wrote:
the underlying algorithm of the shuffle and sort phase
On each node you can configure how much memory is available for containers
to run. For each application, on the other hand, you can configure how large
its containers should be; for MR apps you can separately set the size of the
mappers, the reducers, and the app master itself.
YARN will then determine through scheduling how many containers fit on each
node. A rough sketch of the relevant settings is below.
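As an example, with the standard Hadoop 2 property names (the numbers below
are placeholders, not recommendations, and the per-node capacity is normally
a cluster-side setting in yarn-site.xml rather than something you set from
job code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ContainerSizing {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Per-application container sizes; YARN rounds requests up to its
    // minimum allocation, so these are requests, not hard guarantees.
    conf.setInt("mapreduce.map.memory.mb", 2048);            // each map container
    conf.setInt("mapreduce.reduce.memory.mb", 4096);         // each reduce container
    conf.setInt("yarn.app.mapreduce.am.resource.mb", 1536);  // the MR app master

    // Per-node capacity, set cluster-side in yarn-site.xml, e.g.:
    //   yarn.nodemanager.resource.memory-mb = 8192
    // The scheduler then decides how many containers of the sizes above
    // fit on each node.

    Job job = Job.getInstance(conf, "sized job");
    // ... mapper, reducer, input and output configured as usual ...
  }
}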
Hi Adarsh,
thanks for the links, but they don't have the info about the underlying
algorithms.
Thanks,
-Yeshwanth
On Sun, Jun 7, 2015 at 12:07 AM, adarsh deshratnam
adarsh.deshrat...@gmail.com wrote:
On Sun, Jun 7, 2015 at 5:17 AM, yeshwanth kumar yeshwant...@gmail.com
wrote:
what is the underlying algorithm used in the shuffle and sort phase?
Nope. I have never used HBase before. I'm also new to Hadoop in
general. I'll be running the MapReduce job on EMR.
Setting aside what I'm already familiar with, I'd also like to know when one
would choose one approach over another. Maybe it's something you can only
tell by experimenting, but it sounds like