[jira] [Commented] (FLINK-3418) RocksDB HDFSCopyFromLocal util doesn't respect our Hadoop security configuration
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159172#comment-15159172 ]

ASF GitHub Bot commented on FLINK-3418:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1687

> RocksDB HDFSCopyFromLocal util doesn't respect our Hadoop security
> configuration
>
>                 Key: FLINK-3418
>                 URL: https://issues.apache.org/jira/browse/FLINK-3418
>             Project: Flink
>          Issue Type: Bug
>          Components: state backends
>            Reporter: Robert Metzger
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>
> As you can see, for example, in {{YARNTaskManagerRunner}}, our TaskManagers
> run inside a special UserGroupInformation.doAs() call.
> With that call, we manually change the user from the one starting the
> YARN NodeManager (our containers are part of that process tree) to the user
> who submitted the job.
> For example, on my cluster the NodeManager runs as "yarn", but "robert"
> submits the job. For regular file access, "robert" accesses the files in
> HDFS, even though "yarn" runs the process.
> {{HDFSCopyFromLocal}} does not properly initialize these settings, so
> "yarn" tries to access the files, leading to the following exception:
> {code}
> Caused by: java.lang.RuntimeException: Error while copying to remote FileSystem: SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/yarn/nm/usercache/robert/appcache/application_1455632128025_0010/filecache/17/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode="/user/robert/rocksdb/5b7ad8b04048e894ef7bf341856681bf":robert:supergroup:drwxr-xr-x
> 	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
> 	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
> 	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
> 	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
> 	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6599)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6581)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6533)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4337)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4307)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4280)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:853)
> 	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:321)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:601)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> {code}
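The fix discussed in this thread performs the blocking copy on a dedicated helper thread, because HDFS can deadlock when the thread doing the copy is interrupted, and hands any failure back through a shared container. Below is a simplified, self-contained sketch of that pattern only; the class and method names are hypothetical, and the real implementation is Flink's `HDFSCopyFromLocal`:

```java
import java.util.concurrent.atomic.AtomicReference;

public class ShieldedCopy {

    /**
     * Runs a blocking action (standing in for the HDFS copy) on a helper
     * thread and waits for it to finish, deferring any interrupt until the
     * action is done. The interrupt flag is restored before returning.
     */
    public static void runShielded(Runnable action, String threadName) {
        final AtomicReference<Throwable> asyncError = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            try {
                action.run();
            } catch (Throwable t) {
                asyncError.set(t); // hand the failure back to the caller
            }
        }, threadName);
        worker.start();

        boolean interrupted = false;
        while (true) {
            try {
                worker.join(); // wait until the copy has really finished
                break;
            } catch (InterruptedException e) {
                interrupted = true; // remember the interrupt, keep waiting
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt(); // restore the flag
        }

        Throwable t = asyncError.get();
        if (t != null) {
            throw new RuntimeException("Error while copying to remote FileSystem", t);
        }
    }
}
```

The key point is the join loop: an interrupt delivered to the waiting thread is remembered and re-asserted afterwards, but never reaches the thread that is inside the HDFS client.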
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1515#comment-1515 ]

ASF GitHub Bot commented on FLINK-3418:
---------------------------------------

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1687#issuecomment-187705826

    Thanks for your work @aljoscha. Changes look good to me. I'll address Robert's concern and then merge the PR.
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158884#comment-15158884 ]

ASF GitHub Bot commented on FLINK-3418:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1687#discussion_r53780485

--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/util/HDFSCopyFromLocal.java ---
@@ -17,41 +17,50 @@
  */
 package org.apache.flink.streaming.util;
 
-import org.apache.flink.util.ExternalProcessRunner;
+import org.apache.flink.runtime.fs.hdfs.HadoopFileSystem;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 
-import java.io.DataInputStream;
 import java.io.File;
-import java.io.FileInputStream;
 import java.net.URI;
+import java.util.ArrayList;
+import java.util.List;
 
 /**
- * Utility for copying from local file system to a HDFS {@link FileSystem} in an external process.
- * This is required since {@code FileSystem.copyFromLocalFile} does not like being interrupted.
+ * Utility for copying from local file system to a HDFS {@link FileSystem}.
  */
 public class HDFSCopyFromLocal {
-	public static void main(String[] args) throws Exception {
-		String hadoopConfPath = args[0];
-		String localBackupPath = args[1];
-		String backupUri = args[2];
-
-		Configuration hadoopConf = new Configuration();
-		try (DataInputStream in = new DataInputStream(new FileInputStream(hadoopConfPath))) {
-			hadoopConf.readFields(in);
-		}
-		FileSystem fs = FileSystem.get(new URI(backupUri), hadoopConf);
+	public static void copyFromLocal(final File localPath,
+			final URI remotePath) throws Exception {
+		// Do it in another Thread because HDFS can deadlock if being interrupted while copying
+		String threadName = "HDFS Copy from " + localPath + " to " + remotePath;
 
-		fs.copyFromLocalFile(new Path(localBackupPath), new Path(backupUri));
-	}
+		final List<Exception> asyncException = new ArrayList<>();
--- End diff --

    Agreed. `ArrayList` will allocate an array of 10 elements when you initialize the ArrayList without an initial capacity. So either initialize the ArrayList with a size of `1` or use Robert's approach.
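The review point above is that the list only ever holds one exception, so it should not be left to grow a default ten-slot backing array. A minimal, self-contained sketch of the sized-to-one variant; the helper name and structure here are illustrative, not the PR's actual code:

```java
import java.util.ArrayList;
import java.util.List;

public class SingleSlotExceptionList {

    /** Runs the task on a new thread and returns the exception it threw, if any. */
    public static Exception runAndCollect(Runnable task) {
        // Initial capacity 1: the list only ever holds the single async exception.
        final List<Exception> asyncException = new ArrayList<>(1);

        Thread worker = new Thread(() -> {
            try {
                task.run();
            } catch (Exception e) {
                asyncException.add(e);
            }
        });
        worker.start();
        try {
            // join() establishes happens-before, so the add() above is visible here
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return asyncException.isEmpty() ? null : asyncException.get(0);
    }
}
```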
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158845#comment-15158845 ]

ASF GitHub Bot commented on FLINK-3418:
---------------------------------------

Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1687#issuecomment-187692550

    +1 to merge this change
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158843#comment-15158843 ]

ASF GitHub Bot commented on FLINK-3418:
---------------------------------------

Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1687#discussion_r53777248

--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/util/HDFSCopyFromLocal.java ---
+	final List<Exception> asyncException = new ArrayList<>();
--- End diff --

    If you have cases like this again in the future: I use Flink's `Tuple1` for those cases. It's probably cheaper than creating a new ArrayList.
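Robert's `Tuple1` suggestion is Flink-specific. Outside Flink, `java.util.concurrent.atomic.AtomicReference` is a common one-slot container for handing a failure across threads, with visibility guaranteed by the atomic set/get. A hedged sketch of that alternative (class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicReference;

public class OneSlotFailure {

    /** Runs the task on a worker thread; returns the failure it threw, or null. */
    public static Throwable captureFailure(Runnable task) {
        // One mutable slot, no list allocation; set()/get() are atomic and
        // safely publish the throwable from the worker to this thread.
        final AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                failure.set(t);
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failure.get();
    }
}
```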
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157523#comment-15157523 ]

ASF GitHub Bot commented on FLINK-3418:
---------------------------------------

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/1687#issuecomment-187328780

    Ahh, a stupid beginner's mistake. Fixed it.
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157502#comment-15157502 ]

ASF GitHub Bot commented on FLINK-3418:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1687#issuecomment-187323846

    Fails on checkstyle, otherwise looks good...
[jira] [Commented] (FLINK-3418) RocksDB HDFSCopyFromLocal util doesn't respect our Hadoop security configuration
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157046#comment-15157046 ]

ASF GitHub Bot commented on FLINK-3418:
---
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/1687#issuecomment-187202268

    Should be the correct code now. :smile:
[jira] [Commented] (FLINK-3418) RocksDB HDFSCopyFromLocal util doesn't respect our Hadoop security configuration
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157029#comment-15157029 ]

ASF GitHub Bot commented on FLINK-3418:
---
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1687#discussion_r53629014

    --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/util/HDFSCopyFromLocal.java ---
{code}
@@ -26,32 +25,46 @@
 import java.io.File;
 import java.io.FileInputStream;
 import java.net.URI;
+import java.util.ArrayList;
+import java.util.List;

 /**
- * Utility for copying from local file system to a HDFS {@link FileSystem} in an external process.
- * This is required since {@code FileSystem.copyFromLocalFile} does not like being interrupted.
+ * Utility for copying from local file system to a HDFS {@link FileSystem}.
  */
 public class HDFSCopyFromLocal {
-	public static void main(String[] args) throws Exception {
-		String hadoopConfPath = args[0];
-		String localBackupPath = args[1];
-		String backupUri = args[2];
-
-		Configuration hadoopConf = new Configuration();
-		try (DataInputStream in = new DataInputStream(new FileInputStream(hadoopConfPath))) {
-			hadoopConf.readFields(in);
-		}
-		FileSystem fs = FileSystem.get(new URI(backupUri), hadoopConf);
+	public static void copyFromLocal(final File hadoopConfPath, final File localPath, final URI remotePath) throws Exception {
+		// Do it in another Thread because HDFS can deadlock if being interrupted while copying
-		fs.copyFromLocalFile(new Path(localBackupPath), new Path(backupUri));
-	}
+		String threadName = "HDFS Copy from " + localPath + " to " + remotePath;
+
+		final List<Exception> asyncException = new ArrayList<>();
+
+		Thread copyThread = new Thread(threadName) {
+			@Override
+			public void run() {
+				try {
+					Configuration hadoopConf = new Configuration();
{code}
    --- End diff --

    Ah dammit, I pushed the wrong stuff. Give me a sec.
[jira] [Commented] (FLINK-3418) RocksDB HDFSCopyFromLocal util doesn't respect our Hadoop security configuration
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157026#comment-15157026 ]

ASF GitHub Bot commented on FLINK-3418:
---
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1687#discussion_r53628814

    --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/util/HDFSCopyFromLocal.java ---
    (the same hunk quoted in the reply above; the comment is attached to the line {{Configuration hadoopConf = new Configuration();}})
    --- End diff --

    This is only loading the Hadoop configuration from the classpath, not from the Flink configuration or environment variables. I think our filesystem code has a method to try the environment variables and the config as well.
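The point of the review comment above is that {{new Configuration()}} only picks up whatever XML files happen to be on the classpath. The environment-variable probing it alludes to can be sketched roughly as follows; the class and method names here are hypothetical illustrations, not Flink's actual filesystem helper:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of an environment-variable lookup for the Hadoop conf directory.
 * Names are illustrative; Flink's real helper is not reproduced here.
 */
public class HadoopConfLookup {

    /** Returns candidate conf directories, most specific first. */
    public static List<File> candidateConfDirs(Map<String, String> env) {
        List<File> candidates = new ArrayList<>();
        String confDir = env.get("HADOOP_CONF_DIR");
        if (confDir != null) {
            candidates.add(new File(confDir));
        }
        String home = env.get("HADOOP_HOME");
        if (home != null) {
            // Both layouts have been shipped by Hadoop distributions.
            candidates.add(new File(home, "conf"));
            candidates.add(new File(home, "etc/hadoop"));
        }
        return candidates;
    }
}
```

With a real Hadoop {{Configuration}}, each existing candidate directory's {{core-site.xml}} and {{hdfs-site.xml}} would then be added via {{conf.addResource(...)}} before opening the {{FileSystem}}.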
[jira] [Commented] (FLINK-3418) RocksDB HDFSCopyFromLocal util doesn't respect our Hadoop security configuration
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157012#comment-15157012 ]

ASF GitHub Bot commented on FLINK-3418:
---
GitHub user aljoscha opened a pull request:

    https://github.com/apache/flink/pull/1687

    [FLINK-3418] Don't run RocksDB copy utils in external process

    This was causing too many problems with security tokens and YARN. Now the RocksDB backup runs in a thread, but these threads are no longer interrupted on closing. The threads terminate on their own because the copy operation fails with a FileNotFoundException once the state directories are cleaned up.

    This also removes the ExternalProcessRunner because it is no longer needed and using it caused too many headaches.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink rocksdb-security-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1687.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1687

commit 421e851f0da752da6ec9d14988759b06680ced03
Author: Aljoscha Krettek
Date:   2016-02-17T11:34:51Z

    [FLINK-3418] Don't run RocksDB copy utils in external process

    This was causing too many problems with security tokens and YARN. Now the RocksDB backup runs in a thread, but these threads are no longer interrupted on closing. The threads terminate on their own because the copy operation fails with a FileNotFoundException once the state directories are cleaned up.

    This also removes the ExternalProcessRunner because it is no longer needed and using it caused too many headaches.
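The approach the pull request describes can be sketched without Hadoop on the classpath: run the copy in a dedicated thread, never interrupt that thread, and hand any failure back to the caller through a shared list. A minimal sketch with illustrative names, where {{Files.copy}} stands in for {{fs.copyFromLocalFile}}:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Sketch of the PR's pattern: copy in a separate thread that is joined,
 * not interrupted, with asynchronous exceptions collected for the caller.
 * Names are illustrative, not Flink's API.
 */
public class UninterruptedCopy {

    public static void copy(Path from, Path to) throws Exception {
        final List<Exception> asyncException = new CopyOnWriteArrayList<>();

        Thread copyThread = new Thread("Copy from " + from + " to " + to) {
            @Override
            public void run() {
                try {
                    // In Flink this would be fs.copyFromLocalFile(...).
                    Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING);
                } catch (IOException e) {
                    asyncException.add(e);
                }
            }
        };

        copyThread.start();
        // join() rather than interrupt(): if the copy must be aborted,
        // deleting the source directory makes it fail with a
        // FileNotFoundException instead of a deadlock-prone interrupt.
        copyThread.join();

        if (!asyncException.isEmpty()) {
            throw asyncException.get(0);
        }
    }
}
```

The key design point, per the PR description, is that HDFS clients can deadlock when interrupted mid-copy, so cancellation is expressed through the filesystem (cleanup of the state directories) rather than through thread interruption.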
[jira] [Commented] (FLINK-3418) RocksDB HDFSCopyFromLocal util doesn't respect our Hadoop security configuration
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148906#comment-15148906 ]

Stephan Ewen commented on FLINK-3418:
---
Something like this needs to be added: https://github.com/apache/flink/blob/master/flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala#L1370

This runs code as privileged code under the authenticated user.
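The {{UserGroupInformation.doAs(...)}} call referenced above is built on the JDK's {{Subject.doAs}} mechanism, which is also visible in the stack trace ({{javax.security.auth.Subject.doAs}}). A minimal sketch of that mechanism only, with hypothetical names; a real fix would use Hadoop's {{UserGroupInformation}}, which additionally carries the security tokens:

```java
import java.security.Principal;
import java.security.PrivilegedExceptionAction;
import javax.security.auth.Subject;

/**
 * Sketch of running an action "as" a given user via the JDK Subject API,
 * the mechanism underlying Hadoop's UserGroupInformation.doAs(...).
 * Names are illustrative, not Flink's or Hadoop's API.
 */
public class DoAsSketch {

    public static String copyAsUser(String userName) throws Exception {
        // Build a Subject for the job-submitting user ("robert" in the report).
        Subject subject = new Subject();
        subject.getPrincipals().add((Principal) () -> userName);

        // Everything inside the action runs with 'subject' attached, the way
        // the HDFS copy should run as the submitting user instead of "yarn".
        return Subject.doAs(subject, (PrivilegedExceptionAction<String>) () -> {
            // In the real fix: FileSystem.get(...).copyFromLocalFile(...)
            return "copy ran as " + userName;
        });
    }
}
```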
[jira] [Commented] (FLINK-3418) RocksDB HDFSCopyFromLocal util doesn't respect our Hadoop security configuration
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148896#comment-15148896 ]

Aljoscha Krettek commented on FLINK-3418:
---
Where is the logic that does the user change? Could this simply be added to the copy utilities?
[jira] [Commented] (FLINK-3418) RocksDB HDFSCopyFromLocal util doesn't respect our Hadoop security configuration
[ https://issues.apache.org/jira/browse/FLINK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148878#comment-15148878 ]

Robert Metzger commented on FLINK-3418:
---
Currently, even making the directory accessible for the user running the NM ("yarn") doesn't solve the problem because {{initializeForJob()}} creates a directory with the user who submitted the job "robert".