Thanks for taking a look, Josh. It appears the HDFS client process must have execute permission on that directory, presumably because it has to traverse the directory to reach the socket. Adding o+x to the parent directory of dfs.domain.socket.path (in my case, /var/run/hadoop-hdfs) seems to have resolved the issue. At least, the DN came up without complaint and the DataNode metric "ReadsFromLocalClient" is now non-zero. I'll keep an eye on my logs and let you know if I run into anything else.
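For anyone who finds this thread later, here is a rough sketch of the check I would run before restarting anything. It is not part of Hadoop or HBase; the function names are made up for illustration, and it only looks at the "other" permission bits, on the assumption that the client user (hbase here) is neither the owner nor in the owning group of the directory.

```python
import os
import stat


def others_can_traverse(directory):
    """Return True if the 'other' execute (search) bit is set on directory."""
    mode = os.stat(directory).st_mode
    return bool(mode & stat.S_IXOTH)


def blocking_ancestors(socket_path):
    """List every ancestor directory of socket_path that lacks o+x.

    Hypothetical helper: it inspects mode bits only, so it ignores ACLs,
    ownership, and group membership.
    """
    blocked = []
    current = os.path.dirname(os.path.abspath(socket_path))
    while True:
        if not others_can_traverse(current):
            blocked.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return blocked
```

Calling blocking_ancestors() with the value of dfs.domain.socket.path should return an empty list once the directory is traversable; any directory it reports is a candidate for chmod o+x.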

Thanks,
-n

On Sun, Nov 6, 2016 at 12:43 PM, Josh Elser <[email protected]> wrote:

> Did some quick searches out of curiosity which state that unix filesystem
> permissions should be sufficient (the hbase user would not need to be in
> the hdfs group).
>
> Is the permission on /var/run/hadoop-hdfs set correctly? (hbase user could
> do that same `ls`)
>
>
> Nick Dimiduk wrote:
>
>> That closing question should have been "so I add the _hbase_ user to the
>> hdfs group?"
>>
>> On Thursday, November 3, 2016, Nick Dimiduk <[email protected]> wrote:
>>
>> Hello there,
>>>
>>> I'm setting up a new cluster and I notice in my RS startup logs an
>>> ominous warning.
>>>
>>> 2016-11-03 21:52:14,624 WARN [RS_LOG_REPLAY_OPS-r103u3:16020-0]
>>> shortcircuit.DomainSocketFactory: error creating DomainSocket
>>> java.net.ConnectException: connect(2) error: Permission denied when
>>> trying to connect to '/var/run/hadoop-hdfs/dn.socket'
>>>
>>> My hdfs-site.xml (not hbase-site.xml) has:
>>>
>>> <property>
>>>   <name>dfs.client.read.shortcircuit</name>
>>>   <value>true</value>
>>> </property>
>>> <property>
>>>   <name>dfs.client.read.shortcircuit.streams.cache.size</name>
>>>   <value>4096</value>
>>> </property>
>>> <property>
>>>   <name>dfs.domain.socket.path</name>
>>>   <value>/var/run/hadoop-hdfs/dn.socket</value>
>>> </property>
>>>
>>> lsof on the RS process shows me that libhadoop.so has been loaded.
>>>
>>> $ sudo lsof -p $(cat /var/run/hbase/hbase-hbase-regionserver.pid) | grep 'libhadoop\.so'
>>> java 33581 hbase mem REG 252,0 137720 2492109
>>> /usr/hdp/2.3.2.0-2950/hadoop/lib/native/libhadoop.so.1.0.0
>>>
>>> Given the hbase user's group membership,
>>>
>>> $ sudo -u hbase groups
>>> hbase hadoop
>>>
>>> Permission on the socket itself looks suspicious
>>>
>>> $ sudo ls -la /var/run/hadoop-hdfs
>>> total 0
>>> drwx------  2 hdfs root  60 Nov  2 22:56 .
>>> drwxr-xr-x 23 root root 760 Nov  3 21:52 ..
>>> srw-rw-rw-  1 hdfs hdfs   0 Nov  2 22:56 dn.socket
>>>
>>> Our book has this word of caution: "Be careful about permissions for the
>>> directory that hosts the shared domain socket; dfsclient will complain if
>>> open to other than the hbase user." But this seems inaccurate in that I'm
>>> seeing HDFS complain if the directory is open to a user other than hdfs.
>>>
>>> So what's the correct solution here? Do I add hdfs user to the hdfs
>>> group? That sounds too permissive.
>>>
>>> Thanks,
>>> Nick
>>>
