I have been running SLive tests over the past couple of days to compare NameNode performance on different filesystems. Our setup includes:
- CentOS 6.6
- CDH 5.3.0 with Java 7
- 2 NameNodes in an HA configuration
- Both NameNodes on the same type of hardware
- JournalNodes running on the two NameNode servers, plus one JournalNode on another node (3 total)
- NameNodes and JournalNodes writing to different devices on the NameNode machines

For the tests, I used SLive to create a large number of files so that the NameNode would be managing a rather large namespace. Then I ran tests varying the number of mappers and the total number of operations, but always limiting each test to 30 minutes. The tests ran with a 70% read ratio and a mix of the other operations making up the remaining 30%. (A representative invocation is sketched at the bottom of this message.)

When changing filesystems, we saved off the fsimage and related files, reformatted the devices backing the name, edits, checkpoint, and journal directories, remounted them, put the fsimage and related files back in their original locations, and restarted the NameNodes. We made sure to change nothing else between runs. (The rebuild commands are also sketched below.) We tested ext3, ext4, and xfs, and when we changed the filesystem type we changed it to the same type on all 3 machines (2 NN + 1 JN).

Using the total number of operations completed during the 30-minute test, ext4 seemed to be the best choice: with ext3 we completed about 1% fewer operations on average, and with xfs about 30% fewer. I was a little shocked by this, so we ran the xfs tests three times, attempting to tune the XFS filesystem and mount options, with no success. Thinking about this a little: XFS is supposed to have superior parallel-write performance due to its multiple inode tables.

So, some questions:

1. Is XFS known to be slower for single-threaded write performance because of the multiple inode tables?
2. Are the writes to the edits and journal directories multi-threaded? I know that they are synced, but is there a single writer?
3. Is the total number of operations completed the correct metric to take from SLive?

Next, I wanted to test the penalty for an HA NameNode. So we took the best-performing setup (ext4) and changed it to non-HA, with a single NameNode on one of the same NN machines. Running the same tests, the single NN completed far fewer (roughly 10x fewer) total operations in the same amount of time. That does not seem correct. What is different about the I/O paths for a single NN (other than the writes to the JNs)? What did I do incorrectly?
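For reference, a representative SLive invocation looked something like this (flags from memory and numbers illustrative -- we varied -maps and -ops between runs; the jar path is for a CDH package install, and -duration is in seconds, so 1800 = 30 minutes; the per-operation percentages sum to 100 with reads at 70%):

  hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
      SliveTest \
      -maps 64 -ops 500000 -duration 1800 \
      -read 70 -create 10 -delete 5 -rename 5 -mkdir 5 -ls 5 \
      -baseDir /slive -resFile /tmp/slive.results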
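And the per-run filesystem rebuild was along these lines (illustrative: /dev/sdb1 and the mount point are placeholders for our actual devices and directories, and the XFS mount options shown are the kind of thing we experimented with, not a definitive list):

  # xfs runs; the ext3/ext4 runs were the same idea with mkfs.ext3/mkfs.ext4
  umount /data/1/dfs/nn
  mkfs.xfs -f /dev/sdb1
  mount -o noatime,nobarrier,logbufs=8,logbsize=256k /dev/sdb1 /data/1/dfs/nn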
Thanks,
Dave