I am researching a Hadoop solution for an existing application that requires a 
directory structure full of data for processing.
To make the Hadoop solution work, I need to deploy the data directory to each DN 
when the job is executed. I know this isn't new and is commonly done with the 
Distributed Cache.
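For reference, here is roughly what I have in mind, a sketch against the classic 
DistributedCache API (the archive path and link name are placeholders):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheJobSetup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ship the data directory as a single archive; the framework
        // unpacks it locally on every node that runs a task for this job.
        DistributedCache.createSymlink(conf); // expose it under the link name
        DistributedCache.addCacheArchive(
            new URI("/user/john/data-dir.tar.gz#data"), conf); // placeholder path
        Job job = new Job(conf, "process-data-dir");
        // ... mapper/reducer and input/output setup as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }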
Based on experience, what are the common file sizes deployed in a Distributed 
Cache? I know smaller is better, but how big is too big? I have read that the 
larger the cache deployed, the more startup latency there is. I also assume 
other factors play into this.
What I know so far:
- Default local.cache.size = 10 GB
- Range of desirable sizes for the Distributed Cache = 10 KB - 1 GB??
- Distributed Cache is normally not used if larger than = ____?
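In case it matters, my understanding is that the cache size cap is a per-node 
property in mapred-site.xml; the value shown is the 10 GB default, in bytes:

    <property>
      <name>local.cache.size</name>
      <value>10737418240</value>
    </property>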
Another option: put the data directories on each DN ahead of time and provide 
the location to the TaskTracker? (Rough sketch below.)
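If I went that route, I picture passing the local path through the job 
configuration and picking it up in the mapper, something like this (the 
property name and path are made up, and the directory would have to exist 
at the same path on every DN):

    import java.io.File;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class DataDirMapper extends Mapper<LongWritable, Text, Text, Text> {
      private File dataDir;

      @Override
      protected void setup(Context ctx) {
        // "john.data.dir" is a made-up property; it would be set at submit
        // time, e.g. conf.set("john.data.dir", "/data/app-input").
        dataDir = new File(ctx.getConfiguration().get("john.data.dir"));
      }
      // map() would then read whatever it needs from dataDir ...
    }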
Thanks,
John