Hadoop: dfs.namenode.name.dir and dfs.datanode.data.dir

rammohan ganapavarapu Thu, 09 Jun 2016 16:07:25 -0700

Hi,

I am trying to understand how these two properties behave when I use
multiple disks/mount points.


For example, I have a server with three 100 GB disks mounted on /data1,
/data2, and /data3. If I use them for both data.dir and name.dir, do I get
~300 GB of total disk space for the data, or do I get only 100 GB, with the
other two disks used for redundancy only?
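
For concreteness, the configuration in question would look roughly like the
following hdfs-site.xml fragment. This is only a sketch: the dfs/nn and
dfs/dn subdirectories are hypothetical names chosen so the two properties do
not point at the same directory; only the mount points and the property
names come from the setup described above.

```xml
<!-- hdfs-site.xml (sketch). The dfs/nn and dfs/dn subdirectory names are
     hypothetical; only /data1, /data2, /data3 come from the setup above. -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <!-- comma-delimited list: one directory per mount point -->
    <value>/data1/dfs/nn,/data2/dfs/nn,/data3/dfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <!-- comma-delimited list: one directory per mount point -->
    <value>/data1/dfs/dn,/data2/dfs/dn,/data3/dfs/dn</value>
  </property>
</configuration>
```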

This is the description I got from the Hadoop docs:
dfs.namenode.name.dir:

Determines where on the local filesystem the DFS name node should store the
name table(fsimage). If this is a comma-delimited list of directories then
the name table is replicated in all of the directories, for redundancy.

dfs.datanode.data.dir:

Determines where on the local filesystem an DFS data node should store its
blocks. If this is a comma-delimited list of directories, then data will be
stored in all named directories, typically on different devices. The
directories should be tagged with corresponding storage types
([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies. The default
storage type will be DISK if the directory does not have a storage type
tagged explicitly. Directories that do not exist will be created if local
filesystem permission allows.

From the above description, I understand that only the NameNode name table
gets replicated across the 3 disks, but I am not sure how it works if I
have multiple disks for the data dir.

I want to use all available disks (3 disks, ~300 GB total) in a server for
data, so can I just use a comma-separated directory list, or should I use
RAID or LVM to combine those disks?

Thanks,
Ram
