Thanks for the reply. I used the cat command this time; the results are still not great.
In my test, file hadoop003.log is cached while hadoop010.log is not cached.

-bash-4.1$ /hadoop/hadoop-2.3.0/bin/hadoop fs -ls
-rw-r--r--   3 hdfs hadoop  209715206 2014-03-06 18:14 hadoop003.log
-rw-r--r--   3 hdfs hadoop  209715272 2014-03-07 14:37 hadoop010.log

-bash-4.1$ hdfs cacheadmin -listDirectives -stats -path hadoop003.log
Found 1 entry
ID POOL     REPL EXPIRY  PATH                      BYTES_NEEDED  BYTES_CACHED  FILES_NEEDED  FILES_CACHED
 5 wptest1     3  never  /user/hdfs/hadoop003.log     629145618     629145618             1             1

Run 1st time:
-bash-4.1$ time /hadoop/hadoop-2.3.0/bin/hadoop fs -cat hadoop003.log > /tmp/aa
real 0m4.881s
user 0m4.805s
sys  0m1.468s

-bash-4.1$ time /hadoop/hadoop-2.3.0/bin/hadoop fs -cat hadoop010.log > /tmp/aa
real 0m6.479s
user 0m4.777s
sys  0m1.312s

Run 2nd time:
-bash-4.1$ time /hadoop/hadoop-2.3.0/bin/hadoop fs -cat hadoop003.log > /tmp/aa
real 0m4.751s
user 0m4.685s
sys  0m1.313s

-bash-4.1$ time /hadoop/hadoop-2.3.0/bin/hadoop fs -cat hadoop010.log > /tmp/aa
real 0m4.916s
user 0m4.779s
sys  0m1.378s

I did not see much cache improvement. Please advise. Thanks

On Tue, Mar 11, 2014 at 3:55 PM, Colin McCabe <[email protected]> wrote:

> On Fri, Mar 7, 2014 at 7:37 AM, hwpstorage <[email protected]> wrote:
> > Hello,
> >
> > It looks like the HDFS caching does not work well.
> > The cached log file is around 200 MB. The Hadoop cluster has 3 nodes,
> > each has 4 GB of memory.
> >
> > -bash-4.1$ hdfs cacheadmin -addPool wptest1
> > Successfully added cache pool wptest1.
> >
> > -bash-4.1$ /hadoop/hadoop-2.3.0/bin/hdfs cacheadmin -listPools
> > Found 1 result.
> > NAME     OWNER  GROUP  MODE       LIMIT      MAXTTL
> > wptest1  hdfs   hdfs   rwxr-xr-x  unlimited  never
> >
> > -bash-4.1$ hdfs cacheadmin -addDirective -path hadoop003.log -pool wptest1
> > Added cache directive 1
> >
> > -bash-4.1$ time /hadoop/hadoop-2.3.0/bin/hadoop fs -tail hadoop003.log
> > real 0m2.796s
> > user 0m4.263s
> > sys  0m0.203s
> >
> > -bash-4.1$ time /hadoop/hadoop-2.3.0/bin/hadoop fs -tail hadoop003.log
> > real 0m3.050s
> > user 0m4.176s
> > sys  0m0.192s
> >
> > It is weird that the cache status shows 0 bytes cached:
> >
> > -bash-4.1$ /hadoop/hadoop-2.3.0/bin/hdfs cacheadmin -listDirectives -stats -path hadoop003.log -pool wptest1
> > Found 1 entry
> > ID POOL     REPL EXPIRY  PATH                      BYTES_NEEDED  BYTES_CACHED  FILES_NEEDED  FILES_CACHED
> >  1 wptest1     1  never  /user/hdfs/hadoop003.log     209715206             0             1             0
>
> If you take a look at this output, you can see that nothing is actually
> cached.
>
> One way to figure out why this is happening is to look at the logs of the
> NameNode and DataNode. Some of the relevant logs are at DEBUG or
> TRACE level, so you may need to turn up the log levels. The
> CacheReplicationMonitor and FsDatasetCache classes are good places to
> start.
>
> Also be sure to check that you have set dfs.datanode.max.locked.memory.
>
> As Andrew commented, "hadoop tail" is not a good command to use for
> measuring performance, since you have a few seconds of Java startup
> time, followed by any HDFS setup time, followed by reading a single
> kilobyte of data. If you want to use the shell, the simplest thing to
> do is to use cat and read a large file, so that those startup costs
> don't dominate the measurement.
>
> best,
> Colin
>
> > -bash-4.1$ file /hadoop/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0
> > /hadoop/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0: ELF 64-bit LSB shared
> > object, x86-64, version 1 (SYSV), dynamically linked, not stripped
> >
> > I also tried the word count example with the same file. The execution time
> > is always 40 seconds.
> > (The map/reduce job without cache is 42 seconds)
> >
> > Is there anything wrong?
> > Thanks a lot
>
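For reference, a minimal hdfs-site.xml sketch of the locked-memory setting Colin mentions, which the DataNode needs before it can cache any blocks (the 256 MB value is only an illustrative assumption; pick something that fits your nodes, and note it must not exceed the DataNode user's `ulimit -l`):

```xml
<!-- hdfs-site.xml on each DataNode: upper bound, in bytes, on memory the
     DataNode may lock for HDFS centralized caching. The default is 0,
     which disables caching entirely. 268435456 = 256 MB, an assumed
     example value, not a recommendation. -->
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>268435456</value>
</property>
```

After changing this, the DataNodes need a restart for the new limit to take effect.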

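On the measurement side, writing each run to /tmp/aa mixes local-disk write time into the numbers; redirecting to /dev/null and comparing throughput is a bit cleaner. A small sketch, using the byte counts and times from this thread (the helper is generic shell; the hadoop command line in the comment is the one from this thread):

```shell
#!/bin/sh
# Convert a byte count and an elapsed time into MB/s, so cached and
# uncached runs can be compared as throughput instead of raw seconds.
throughput_mb_s() {
    bytes=$1
    seconds=$2
    awk -v b="$bytes" -v s="$seconds" 'BEGIN { printf "%.1f\n", b / s / 1048576 }'
}

# Numbers from this thread: hadoop003.log, 209715206 bytes in 4.881 s.
throughput_mb_s 209715206 4.881   # prints 41.0

# On the cluster itself, time reads to /dev/null to avoid local writes:
#   time /hadoop/hadoop-2.3.0/bin/hadoop fs -cat hadoop003.log > /dev/null
```

At roughly 41 MB/s for both files, the runs look bounded by the client pipe rather than by disk vs. memory on the DataNodes, which would explain why caching shows little effect here.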